ProCan Publications

  • Poulos, R. C., Cai, Z., Robinson, P. J., Reddel, R. R., Zhong, Q. (2022). Opportunities for pharmacoproteomics in biomarker discovery. Proteomics. Publication link.
  • Boys, E.L., Liu, J., Robinson, P.J., Reddel, R.R. (2022). Clinical applications of mass spectrometry-based proteomics in cancer: Where are we? Proteomics. Publication link.
  • Reddel, R.R., Aref, A. (2022). Targeting brain cancer: A new drug is designed to hit an old target with increased precision. Science, Vol 377, Issue 6605 pp. 467-468. Publication link.
  • Gonçalves, E., Poulos, R.C., Cai, Z., Barthorpe, S., Manda, S.S., Lucas, N., Beck, A., Bucio-Noble, D., Dausmann, M., Hall, C., Hecker, M., Koh, J., Lightfoot, H., Mahboob, S., Mali, I., Morris, J., Richardson, L., Seneviratne, A.J., Shepherd, R., Sykes, E., Thomas, F., Valentini, S., Williams, S.G., Wu, Y., Xavier, D., MacKenzie, K.L., Hains, P.G., Tully, B., Robinson, P.J., Zhong, Q., Garnett, M.J., Reddel, R.R. (2022). Pan-cancer proteomic map of 949 human cell lines. Cancer Cell, Volume 40, Issue 8, 835-849.e8. Publication link.
  • Cai, S., Poulos, R.C., Liu, J. and Zhong, Q. (2022). Machine learning for multi-omics data integration in cancer. iScience. Publication link.
  • Manda, S.S., Noor, Z., Hains, P.G., Zhong, Q. (2021). PIONEER: Pipeline for Generating High-Quality Spectral Libraries for DIA-MS Data. Current Protocols. Publication link.
  • Akila J Seneviratne, Sean Peters, David Clarke, Michael Dausmann, Michael Hecker, Brett Tully, Peter G Hains and Qing Zhong (2021). Improved identification and quantification of peptides in mass spectrometry data via chemical and random additive noise elimination (CRANE). Bioinformatics btab563. Publication Link.

    The output of electrospray ionisation - liquid chromatography mass spectrometry (ESI-LC-MS) is influenced by multiple sources of noise and major contributors can be broadly categorised as baseline, random and chemical noise. Noise has a negative impact on the identification and quantification of peptides, which influences the reliability and reproducibility of MS-based proteomics data. Most attempts at denoising have been made on either spectra or chromatograms independently, thus important two-dimensional information is lost because the mass-to-charge ratio and retention time dimensions are not considered jointly. This paper presents a novel technique for denoising raw ESI-LC-MS data via two-dimensional undecimated wavelet transform, which is applied to proteomics data acquired by data-independent acquisition MS (DIA-MS). We demonstrate that denoising DIA-MS data results in the improvement of peptide identification and quantification in complex biological samples.
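To make the idea of denoising jointly across both dimensions concrete, here is a minimal sketch of a two-dimensional undecimated wavelet denoiser. It is not the CRANE implementation: it assumes a single-level Haar wavelet, periodic edges, and plain soft thresholding with a hand-picked threshold, and every function name is hypothetical. Rows stand in for retention-time scans and columns for mass-to-charge bins.

```python
# Hypothetical single-level 2D undecimated Haar transform (periodic edges).
# Rows stand in for retention-time scans, columns for m/z bins.

def analysis(x):
    """Undecimated Haar step: low-pass and high-pass outputs, same length as x."""
    n = len(x)
    lo = [(x[i] + x[(i + 1) % n]) / 2 for i in range(n)]
    hi = [(x[i] - x[(i + 1) % n]) / 2 for i in range(n)]
    return lo, hi

def synthesis(lo, hi):
    """Inverse step: average the two redundant reconstructions of each sample."""
    n = len(lo)
    return [(lo[i] + hi[i] + lo[i - 1] - hi[i - 1]) / 2 for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def analysis_rows(m):
    pairs = [analysis(row) for row in m]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def synthesis_rows(lo, hi):
    return [synthesis(l, h) for l, h in zip(lo, hi)]

def swt2_haar(img):
    """Return the four subbands (stored transposed) of a 2D intensity map."""
    low, high = analysis_rows(img)              # filter along m/z
    ll, lh = analysis_rows(transpose(low))      # then along retention time
    hl, hh = analysis_rows(transpose(high))
    return ll, lh, hl, hh

def iswt2_haar(ll, lh, hl, hh):
    low = transpose(synthesis_rows(ll, lh))
    high = transpose(synthesis_rows(hl, hh))
    return synthesis_rows(low, high)

def soft(v, t):
    """Soft thresholding: shrink the magnitude of v by t, clamping at zero."""
    mag = abs(v) - t
    return 0.0 if mag <= 0 else (mag if v > 0 else -mag)

def denoise(img, threshold):
    """Shrink the three detail subbands, keep the approximation, and invert."""
    ll, lh, hl, hh = swt2_haar(img)
    shrink = lambda band: [[soft(v, threshold) for v in row] for row in band]
    return iswt2_haar(ll, shrink(lh), shrink(hl), shrink(hh))

# Flat background with one isolated low-amplitude spike standing in for noise.
signal = [[10.0] * 8 for _ in range(8)]
signal[3][3] = 10.4
cleaned = denoise(signal, threshold=1.0)
```

Because the transform is undecimated, every subband keeps the full size of the input, so a threshold of zero reproduces the input exactly and a positive threshold spreads the isolated spike across its neighbours in both dimensions instead of treating each scan or chromatogram on its own.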

  • Tully, B. (2020). Toffee - a highly efficient, lossless file format for DIA-MS. Sci Rep 10, 8939. PMID:32488104.

Described here is ‘toffee’, an open file format for mass spectrometry data with lossless compression that gives file sizes similar to the original vendor format. mzML and toffee are shown to be equivalent when processing data using OpenSWATH algorithms, and the new data access patterns that toffee provides enable novel applications. For instance, a peptide-centric deep-learning pipeline for peptide identification is proposed. Specifically in the context of ProCan, toffee reduces our long-term storage costs from >$60k per month to around $6k per month -- a critical development for long-term sustainability of bio-bank scale proteomics.

The cancer tissue proteome has enormous potential as a source of novel predictive biomarkers in oncology. Progress in the development of mass spectrometry (MS)‐based tissue proteomics now presents an opportunity to exploit this by applying the strategies of comprehensive molecular profiling and big‐data analytics that have been refined in other fields of ‘omics research. ProCan (ProCan is a registered trademark) is a program aiming to generate high‐quality tissue proteomic data across a broad spectrum of cancer types. It is based on data‐independent acquisition–MS proteomic analysis of annotated tissue samples sourced through collaboration with expert clinical and cancer research groups. The practical requirements of a high‐throughput translational research program have shaped the approach that ProCan is taking to address challenges in study design, sample preparation, raw data acquisition, and data analysis. The ultimate goal is to establish a large proteomics knowledge‐base that, in combination with other cancer ‘omics data, will accelerate cancer research.

  • Peters S, Hains PG, Lucas N, Robinson PJ, Tully B. A Case Study and Methodology for OpenSWATH Parameter Optimization Using the ProCan90 Data Set and 45 810 Computational Analysis Runs. J Proteome Res. 2019 18(3):1019-1031 PubMed: 30652484

In the current study, we show how ProCan90, a curated data set of HEK293 technical replicates, can be used to optimize the configuration options for algorithms in the OpenSWATH pipeline. Furthermore, we use this case study as a proof of concept for horizontal scaling of such a pipeline to allow 45 810 computational analysis runs of OpenSWATH to be completed within four and a half days on a budget of US $10 000. Through the use of Amazon Web Services (AWS), we have successfully processed each of the ProCan 90 files with 506 combinations of input parameters. In total, the project consumed more than 340 000 core hours of compute and generated in excess of 26 TB of data. Using the resulting data and a set of quantitative metrics, we show an analysis pathway that allows the calculation of two optimal parameter sets, one for a compute rich environment (where run time is not a constraint), and another for a compute poor environment (where run time is optimized). For the same input files and the compute rich parameter set, we show a 29.8% improvement in the number of quality protein (>2 peptide) identifications found compared to the current OpenSWATH defaults, with negligible adverse effects on quantification reproducibility or drop in identification confidence, and a median run time of 75 min (103% increase). For the compute poor parameter set, we find a 55% improvement in the run time from the default parameter set, at the expense of a 3.4% decrease in the number of quality protein identifications, and an intensity CV decrease from 14.0% to 13.7%.
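The headline figures in this abstract can be turned into per-run quantities with a little arithmetic. The sketch below uses only the numbers quoted above and rests on stated assumptions: that the full US $10 000 budget was spent, that terabytes are decimal, and that the "more than" and "in excess of" figures are treated as exact, so the results are rough lower bounds. All variable names are illustrative.

```python
# Figures quoted in the abstract; several are stated as lower bounds.
RUNS = 45_810
BUDGET_USD = 10_000            # assumption: the whole budget was spent
CORE_HOURS = 340_000           # "more than 340 000"
DATA_TB = 26                   # "in excess of 26 TB"
WALL_CLOCK_HOURS = 4.5 * 24    # "four and a half days"

per_run = {
    "cost_usd": BUDGET_USD / RUNS,
    "core_hours": CORE_HOURS / RUNS,
    "data_gb": DATA_TB * 1000 / RUNS,   # assumption: decimal terabytes
}

# Average number of cores that must be busy to fit the work in the wall-clock time.
avg_cores_busy = CORE_HOURS / WALL_CLOCK_HOURS

# A 103% increase means the compute-rich median is 2.03 times the default median.
RICH_MEDIAN_MIN = 75
default_median_min = RICH_MEDIAN_MIN / (1 + 1.03)

for name, value in per_run.items():
    print(f"{name} per run: {value:.3f}")
print(f"average cores busy: {avg_cores_busy:.0f}")
print(f"implied default median run time: {default_median_min:.1f} min")
```

Under these assumptions each run costs roughly US $0.22 and about 7.4 core hours, a little over 3,100 cores are busy on average across the four and a half days, and the default OpenSWATH parameters imply a median run time of about 37 min.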

  • Lucas, N., Robinson, A.B., Marcker, E.M., Mahboob, S., Xavier, D., Xue, J., Balleine, R.L., deFazio, A., Hains, P.G. & Robinson, P.J. (2018). Accelerated Barocycler Lysis and Extraction (ABLE) sample preparation for clinical proteomics by mass spectrometry. J Proteome Res 18, 399-401. PMID:30444966

We have developed a streamlined proteomic sample preparation protocol termed Accelerated Barocycler Lysis and Extraction (ABLE) that substantially reduces the time and cost of tissue sample processing. ABLE is based on pressure cycling technology (PCT) for rapid tissue solubilization and reliable, controlled proteolytic digestion. Here, a previously reported PCT based protocol was optimized using 1–4 mg biopsy punches from rat kidney. The tissue denaturant urea was substituted with a combination of sodium deoxycholate (SDC) and N-propanol. ABLE produced comparable numbers of protein identifications in half the sample preparation time, being ready for MS injection in 3 h compared with 6 h for the conventional urea based method. To validate ABLE, it was applied to a diverse range of rat tissues (kidney, lung, muscle, brain, testis), human HEK 293 cell lines, and human ovarian cancer samples, followed by SWATH-mass spectrometry (SWATH-MS). There were similar numbers of quantified proteins between ABLE-SWATH and the conventional method, with greater than 70% overlap for all sample types, except muscle (58%). The ABLE protocol offers a standardized, high-throughput, efficient, and reproducible proteomic preparation method that when coupled with SWATH-MS has the potential to accelerate proteomics analysis to achieve a clinically relevant turn-around time.