Research Overview

ProCan’s Software Engineering team has developed and maintains the IT infrastructure which enables and underpins ProCan’s outcomes.

For each injection into the mass spectrometer, around 1 gigabyte (GB) of data is generated; at full throughput, this amounts to 100 GB of raw data to manage, curate, and process per day. By the end of the program, ProCan will have more than 200 terabytes (TB) of data from the mass specs alone – that's a 2 with 14 zeroes after it – yet this is still significantly less than the digital images our Pathology team is capturing, each of which can be up to 10 GB.
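The storage arithmetic above can be sketched in a few lines; the figures come directly from the text, and everything else is just unit conversion:

```python
# Back-of-envelope data volumes using the figures quoted above.
GB_PER_INJECTION = 1       # ~1 GB of raw data per mass spec injection
INJECTIONS_PER_DAY = 100   # full throughput
TOTAL_TB = 200             # projected raw mass spec data by end of program

daily_gb = GB_PER_INJECTION * INJECTIONS_PER_DAY  # 100 GB per day
total_bytes = TOTAL_TB * 10**12                   # 2 x 10^14 bytes

print(daily_gb)      # 100
print(total_bytes)   # 200000000000000 -- a 2 with 14 zeroes
```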

ProCan relies on a large computational infrastructure, ranging from a 500-core high-performance computer with 750 TB of storage to an auto-scaling cluster that runs on Amazon's cloud platform. Processing each raw data file from the mass specs through a pipeline of machine-learning algorithms takes on the order of an hour, and at the end a matrix of data is produced that lists the abundance of each peptide for each injection in a given study. The computational infrastructure also needs to ensure that metadata is captured at every step along the way, whether it is the buffer used during sample preparation, the mass spec instrument operator, the time of day, or the pipeline software version. This metadata is crucial to ensuring data is reproducible and analyses are repeatable, and it is also important for the normalisation and batch correction that must be performed in the next stage of analysis.
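The kind of per-run metadata record described above can be sketched as a simple immutable structure; the field names and values here are illustrative assumptions, not ProCan's actual schema:

```python
from dataclasses import dataclass, asdict

# Hypothetical sketch of the metadata captured for one pipeline run.
# Field names are illustrative only, drawn from the examples in the text.
@dataclass(frozen=True)
class RunMetadata:
    sample_buffer: str        # buffer used during sample preparation
    instrument_operator: str  # mass spec instrument operator
    acquired_at: str          # acquisition timestamp (ISO 8601)
    pipeline_version: str     # software version of the processing pipeline

# Example record for a single injection's processing run.
meta = RunMetadata(
    sample_buffer="urea",
    instrument_operator="J. Smith",
    acquired_at="2021-06-01T09:30:00",
    pipeline_version="2.4.1",
)
print(asdict(meta))
```

Keeping such a record frozen and attached to every output makes the run reproducible: rerunning the same file with the same `pipeline_version` should yield the same peptide-abundance matrix, and fields like `sample_buffer` feed the downstream batch correction.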

Lab Head

Michael Dausmann

Team Leader, ProCan Software Engineering

Team Members

Ori Livson
Software Engineer, ProCan Software Engineering
Muhammad Ramzan
DevOps Engineer, ProCan Software Engineering
Michael Hecker
Backend Engineer (HPC, LIMS & Data Lake), ProCan Software Engineering