ProCan Software Engineering

Research Overview

The mission of the ProCan Software Engineering team is to provide powerful but frictionless computational capabilities, so that our scientists can focus on the science, unconstrained by the limitations of their computing environment, while ensuring data security and data quality.

Modern biological research is as much about number crunching and sophisticated data analytics as it is about peering down a microscope.

In the case of ProCan, a significant amount of computation needs to happen before the raw output from our mass spectrometers can be used to answer clinical questions. At the start of the ProCan journey, this computation was handled by individual researchers; however, as ProCan grew, it became clear that for the program to scale, this computational step would need to become 'business as usual', and that required automation.

ProCan's Software Engineering team has created a highly automated computational pipeline for proteomics data processing. The pipeline uses a hybrid Kubernetes cluster to manage computing resources and execute the individual parts of the computational flow. This Kubernetes environment can draw on both cloud computing capacity, via AWS EC2 instances, and capacity in our own on-site High Performance Computing facility at Westmead. The team also built custom software, which we call SWON, to manage and track all aspects of the computational runs.

In addition to the computational pipeline, the Software Engineering team manages all of ProCan's on-site storage (more than 2 petabytes), as well as the hybrid-cloud environments used for ad-hoc computation by the Proteomics and Cancer Data Science teams.

The Software Engineering team balances DevOps support of the ProCan environments with major application development. Current projects include the integration of a digital pathology platform (Omero) for secure, web-based external pathology review of samples; the development of a federated query database (POCKet) to access all of ProCan's proteomic, genomic, clinical and other data; and the launch of an end-to-end Laboratory Information Management System (ProLIMS).