ProCan’s Software Engineering team has developed, and continues to maintain, the IT infrastructure that enables and underpins ProCan’s outcomes.
For each injection into the mass spec, around 1 gigabyte (GB) of data is generated; at full throughput, this gives 100 GB of raw data to manage, curate, and process per day. By the end of the program, ProCan will have more than 200 terabytes (TB) of data from the mass specs alone – in bytes, that’s a 2 with 14 zeroes after it – yet this is significantly less data than the digital images our Pathology team is capturing, each of which can be up to 10 GB.
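The figures above can be checked with some back-of-the-envelope arithmetic. This short sketch uses only the round numbers quoted in the text (1 GB per injection, roughly 100 injections per day, 200 TB over the program); the constants are illustrative, not instrument specifications.

```python
# Decimal units, as storage is usually counted
GB = 10**9
TB = 10**12

per_injection = 1 * GB            # ~1 GB of raw data per injection
daily_raw = 100 * per_injection   # ~100 GB per day at full throughput
program_total = 200 * TB          # >200 TB over the whole program

# 200 TB is 2 * 10**14 bytes: a 2 followed by 14 zeroes
print(daily_raw // GB)   # 100
print(program_total)     # 200000000000000
```

At 100 GB per day, accumulating 200 TB corresponds to roughly 2,000 full-throughput days of acquisition.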
ProCan relies on a large computational infrastructure, ranging from a 500-core high-performance computer with 750 TB of storage to an auto-scaling cluster that runs on Amazon’s cloud platform. It takes on the order of an hour to process each raw data file from the mass specs through a pipeline of machine-learning algorithms; at the end of this pipeline, a matrix of data is produced that lists the abundance of each peptide for each injection in a given study. Furthermore, the computational infrastructure needs to ensure that metadata is captured at every step along the way, whether it is the buffer used during sample preparation, the mass spec instrument operator, the time of day, or the pipeline software version. This metadata is crucial for ensuring that data is reproducible and analyses are repeatable, and it is also important for the normalisation and batch correction that must be performed in the next stage of analysis.
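The per-injection metadata described above can be pictured as a simple structured record. This is a hypothetical sketch only: the field names and values are illustrative and not ProCan’s actual schema, but they cover the examples the text mentions (preparation buffer, instrument operator, time of day, pipeline software version).

```python
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass(frozen=True)
class InjectionMetadata:
    """Illustrative metadata captured for one mass-spec injection."""
    sample_id: str         # identifier linking back to the sample
    buffer: str            # buffer used during sample preparation
    operator: str          # mass spec instrument operator
    acquired_at: datetime  # time of day the injection was run
    pipeline_version: str  # software version of the processing pipeline

# Hypothetical example record
meta = InjectionMetadata(
    sample_id="S-0001",
    buffer="urea",
    operator="operator-a",
    acquired_at=datetime(2020, 1, 1, 9, 30),
    pipeline_version="1.4.2",
)

# Serialising the record alongside the peptide-abundance matrix is what
# makes a run reproducible and lets batch correction group injections
# by preparation conditions later on.
record = asdict(meta)
```

Keeping such a record immutable (`frozen=True`) and storing it with the results means any downstream analysis can always be traced back to the exact conditions and software that produced the data.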
Team Leader, ProCan Software Engineering
Leader, ProCan Software Engineering
Software Engineer, ProCan Software Engineering
DevOps Engineer, ProCan Software Engineering
(Biomed Eng) Backend Engineer (HPC, LIMS & Data Lake)