Research Overview

Molecular regulations in cellular systems are central to health and disease. The Computational Systems Biology Group, led by Dr Pengyi Yang, focuses on developing computational and statistical models to reconstruct molecular networks and model their regulations in differentiation and development. To translate computational predictions to biological findings, the group also focuses on experimentally validating hypotheses generated from computational models.

Lab Head

Pengyi Yang

Group Leader, Computational Systems Biology
Available for Student Supervision

Team Members

Han Kim
PhD Student in Computational Biology
Di Xiao
PhD Student in Computational Biology

Research Projects


Molecular trans-regulatory networks (TRNs), comprising cell signalling, transcriptional, translational, and (epi)genomic regulation, are central to health and disease. A major initiative in our group is to integrate trans-omic datasets generated by state-of-the-art mass spectrometry (MS) and next-generation sequencing (NGS) from various cell systems to reconstruct TRNs and to understand how different regulatory machineries (e.g. signalling, transcription, and epigenomics) co-operate to define cell states, functions, and fates.

We have previously developed various computational methods to integrate the multi-layered trans-omic datasets generated during the naive to formative pluripotency transition in embryonic stem cells (ESCs) (Yang et al. Cell Systems, 2019). Our current research project aims to further this study by developing methods to characterise signalling cascades, transcriptional networks, and protein networks and their cross-talk, with the aim of answering the following questions:

  • How do different layers of regulations talk to each other in controlling stem cell fate?
  • Can we accurately predict stem cell differentiation trajectories based on their TRNs?
  • What are the key mechanisms by which stem/progenitor cells establish identities and make cell fate decisions?

Single-cell omics are becoming the next wave of development in biotechnology, promising to revolutionise our ability to study biological systems at an unprecedented resolution. Our group is working on multiple methodological development and lab experiment projects with the goal of characterising cellular systems and diseases at the single-cell level.

On the methodology front, we have recently developed a computational method together with Prof. Jean Yang's group for integrating multiple single-cell RNA-seq datasets (Lin et al. PNAS, 2019). Our current research project aims to extend this work by developing a suite of data processing, cell type characterisation, and network reconstruction methods and tools for single-cell omic data. In parallel, we are planning to conduct experiments to profile single cells in ESC populations and during their differentiation to multiple cell lineages. Research findings from these projects will directly contribute to our aim of addressing the three questions raised in Theme I.

Computational and statistical methods are at the core of our research. To tackle complex biological questions using heterogeneous omic data generated from various biotechnologies, our group specialises in developing novel computational methods for analysing (i) MS-based proteomic and phosphoproteomic data, and (ii) NGS-based RNA-seq, ChIP-seq, and Hi-C data.

Building on our long-term success in computational methodology innovation, the group is developing various machine learning and deep learning methods targeted at specific biological questions and omic data types. Examples of our recent developments include a knowledge-based unsupervised learning method for kinase identification (Yang et al. PLoS Computational Biology, 2015) and a semi-supervised learning method for kinase-substrate prediction (Yang et al. Bioinformatics, 2016) from phosphoproteomic data. Continued innovation in computational and statistical methods will be a key force of our group in answering fundamental biological questions.

Note on Publications Below

Full List is in Reverse Chronological Order

Total peer-reviewed articles: 63; First and/or corresponding author: 40/63 (63.5%)


† Co-first author

* Corresponding/Co-corresponding author

# Lead bioinformatician

IF (5-year impact factor, Thomson Reuters 2019)



Ensemble deep learning in bioinformatics.

Cao Y, Geddes T, Yang J, Yang P* (2020). Nature Machine Intelligence, doi:10.1038/s42256-020-0217-y.

CiteFuse enables multi-modal analysis of CITE-seq data.

Kim T, Lin Y, Geddes T, Yang J, Yang P* (2020). Bioinformatics, 36(14):4137-4143. (IF: 9.9)

Temporal ordering of omics and multiomic events inferred from time series data.

Kaur S, Vuong J, Peters T, Luu L, Yang P, Krycer J, O’Donoghue S (2020). npj Systems Biology and Applications, 6:22. (IF: 4.3)

scClassify: sample size estimation and multiscale classification of cells using single and multiple reference.

Lin Y, Cao Y, Kim H, Salim A, Speed T, Lin D, Yang P*, Yang J* (2020). Molecular Systems Biology, 16(6): e9389. (IF: 10.0)

Transcriptional network dynamics during the progression of pluripotency revealed by integrative statistical learning.

Kim H, Osteil P, Humphrey S, Cinghu S, Oldfield A, Patrick E, Wilkie E, Peng G, Suo S, Jothi R, Tam P, Yang P* (2020). Nucleic Acids Research, 48(4):1828-1842. (IF: 11.8)

scReClassify: post hoc cell type classification of single-cell RNA-seq data.

Kim T, Lo K, Geddes T, Kim H, Yang J, Yang P* (2019). BMC Genomics, 20:913. (IF: 4.1)

Global redox proteome and phosphoproteome analysis reveals redox switch in Akt.

Su Z†, Burchfield J†, Yang P†, Humphrey S, Yang G, Francis D, Yasmin S, Shin S, Norris D, Kearney A, Astore M, Scavuzzo J, Fisher-Wellman K, Wang Q, Parker B, Neely G, Vafaee F, Chiu J, Yeo R, Hogg P, Fazakerley D, Nguyen L, Kuyucak S, James D (2019). Nature Communications, 10:5486. (IF: 13.6)

Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.

Geddes T, Kim T, Nan L, Burchfield J, Yang J, Tao D, Yang P* (2019). BMC Bioinformatics, 20:660. (IF: 3.2)

scDC: single cell differential composition analysis.

Cao Y, Lin Y, Ormerod J, Yang P, Yang J, Lo K (2019). BMC Bioinformatics, 20:721. (IF: 3.2)

Data independent acquisition proteomic analysis can discriminate between actinic keratosis, Bowen’s disease and cutaneous squamous cell carcinoma.

Azimi A, Yang P, Ali M, Howard V, Mann G, Kaufman K, Fernandez-Penas P (2019). Journal of Investigative Dermatology, 140(1):212-222. (IF: 6.9)

  • Recommended by F1000 Prime, doi:10.3410/f.736074090.793562184

Evaluating stably expressed genes in single cells.

Lin Y, Ghazanfar S, Strbenac D, Wang A, Patrick E, Lin D, Speed T, Yang J*, Yang P* (2019). GigaScience, 8(9):giz106. (IF: 7.7)

NF-Y controls fidelity of transcription initiation at gene promoters through maintenance of the nucleosome-depleted region.

Oldfield A, Henriques T, Kumar D, Burkholder A, Cinghu S, Paulet D, Bennett B, Yang P, Scruggs B, Lavender C, Rivals E, Adelman K, Jothi R (2019). Nature Communications.

Multi-omic profiling reveals dynamics of the phased progression of pluripotency.

Yang P*, Humphrey S*, Cinghu S, Pathania R, Oldfield A, Kumar D, Perera D, Yang J, James D, Mann M, Jothi R* (2019). Cell Systems, 8(5):427-445.

  • Recommended by F1000 Prime, doi:10.3410/f.735727099.793563652

scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets.

Lin Y, Ghazanfar S, Wang K, Gagnon-Bartsch J, Lo K, Su X, Han Z, Ormerod J, Speed T, Yang P*, Yang J* (2019). Proceedings of the National Academy of Sciences of the United States of America, 116(20):9775-9784. (IF: 10.6)

QCMAP: An interactive web-tool for performance diagnosis and prediction of LC-MS Systems.

Kim T, Chen I, Parker B, Humphrey S, Crossett B, Cordwell S, Yang P*, Yang J* (2019). Proteomics, 19(13):1900068. (IF: 3.5)

A proteome- and lipidome-wide systems genetic analysis of hepatic lipid metabolism.

Parker B†, Calkin A†, Seldin M†, Keating M, Tarling E, Yang P, Moody S, Liu Y, Zerenturk E, Needham E, Jayawardana K, Pan C, Mellet N, Weir J, Lazarus R, Lusis A, Meikle P, James D, Vallim T, Drew B (2019). Nature, 567:187-193. (IF: 46.5)

Impact of similarity metrics on single-cell RNA-seq data clustering.

Kim T, Chen I, Lin Y, Wang A, Yang J, Yang P* (2019). Briefings in Bioinformatics, 20(6):2316-2326. (IF: 7.5)

MiR-93-5p is a novel predictor of coronary in-stent restenosis.

O'Sullivan J, Neylon A, Fahey E, Yang P, McGorrian C, Blake G (2019). Heart Asia, 11(1):e011134.

An uncertainty visual analytics framework for fMRI functional connectivity.

Ridder M, Klein K, Yang J, Yang P, Lagopoulos J, Hickie I, Bennett M, Kim J (2019). Neuroinformatics, 17(2):211-223. (IF: 5.0)

AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications.

Yang P*, Ormerod J, Liu W, Ma C, Zomaya A, Yang J (2019). IEEE Transactions on Cybernetics, 49(5):1932-1943. (IF: 10.1)

Mitochondrial CoQ deficiency is a common driver of mitochondrial oxidants and insulin resistance.

Fazakerley D, Chaudhuri R, Yang P, Maghzal G, Thomas K, Krycer J, Humphrey S, Parker B, Fisher-Wellman K, Meoli C, Hoffman N, Diskin C, Burchfield J, Cowley M, Kaplan W, Modrusan Z, Kolumam G, Yang H, Chen D, Samocha-Bonet D, Greenfield J, Hoehn K, Stocker R, James D (2018). eLife, 7:e3211. (IF: 8.2)

Intragenic enhancers attenuate host gene expression.

Cinghu S†, Yang P†, Kosak J, Conway A, Kumar D, Oldfield A, Adelman K, Jothi R (2017). Molecular Cell, 68(1):104-117. (IF: 16.1)

  • Highlighted in Nature Reviews Genetics, doi:10.1038/nrg.2017.90, 2017
  • Highlighted in Nature Reviews Molecular Cell Biology, doi:10.1038/nrm.2017.111, 2017

An improved Akt reporter reveals intra- and inter-cellular heterogeneity and oscillations in signal transduction.

Norris DM†, Yang P†, Krycer JR, Fazakerley DJ, James DE, Burchfield JG (2017). Journal of Cell Science, 130:2757-2766. (IF: 4.9)

Integrative analysis identifies co-dependent gene expression regulation of BRG1 and CHD7 at distal regulatory sites in embryonic stem cells.

Yang P*, Oldfield A, Kim T, Yang A, Yang J, Ho J* (2017). Bioinformatics, 33(13):1916-1920. (IF: 9.9)

CNOT3-dependent mRNA deadenylation safeguards the pluripotent state.

Zheng X, Yang P#, Lackford B, Bennett B, Wang L, Li H, Wang Y, Miao Y, Foley J, Fargo D, Jin Y, Williams C, Jothi R, Hu G (2016). Stem Cell Reports, 7(5):897-910. (IF: 6.6)

mTORC1 is a major regulatory node in the FGF21 signaling network in adipocytes.

Minard A, Tan S, Yang P#, Fazakerley D, Domanova W, Parker B, Humphrey S, Jothi R, Stöckli J, James D (2016). Cell Reports, 17(1):29-36. (IF: 8.8)

Phosphoproteomics data annotation using hypothesis driven kinase perturbation analysis.

Yang P, Patrick E, Humphrey SJ, Ghazanfar S, Jothi R, James DE, Yang YH (2016). Proteomics, 16(13):1868-1871. (IF: 3.5)

Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data.

Yang P*, Humphrey SJ, James DE, Yang YH, Jothi R* (2016). Bioinformatics, 32(2):252-259. (IF: 9.9)

NoisyGOA: Noisy GO annotations prediction using taxonomic and semantic similarity.

Lu C, Wang J, Zhang Z, Yang P, Yu G (2016). Computational Biology and Chemistry, 65:203-211. (IF: 1.8)

Unraveling kinase activation dynamics using kinase-substrate relationships from temporal large-scale phosphoproteomics studies.

Domanova W, Krycer J, Chaudhuri R, Yang P, Vafaee F, Fazakerley D, Humphrey S, James D, Kuncic Z (2016). PLoS One, 11(6):e0157763. (IF: 3.2)

Knowledge-based analysis for detecting key signaling events from time-series phosphoproteomics data.

Yang P*, Zheng X, Jayaswal V, Hu G, Yang YH, Jothi R* (2015). PLoS Computational Biology, 11(8):e1004403. (IF: 5.3)

Global phosphoproteomic analysis of human skeletal muscle reveals a network of exercise-regulated kinases and AMPK substrates.

Hoffman N, Parker B, Chaudhuri R, Fisher-Wellman K, Kleinert M, Humphrey S, Yang P, Holliday M, Trefely S, Fazakerley D, Stockli J, Burchfield J, Jensen T, Jothi R, Kiens B, Wojtaszewski J, Richter E, James DE (2015). Cell Metabolism, 22(5):922-935. (IF: 24.3)

  • Recommended by Faculty of 1000 Biology

DNMT1 is essential for mammary and cancer stem cell maintenance and tumorigenesis.

Pathania R, Ramachandran S, Elangovan S, Padia R, Yang P#, Cinghu S, Veeranan-Karmegam R, Fulzele S, Pei L, Chang C-S, Choi J-H, Shi H, Manicassamy S, Prasad PD, Sharma S, Ganapathy V, Jothi R, Thangaraju M (2015). Nature Communications, 6:6910. (IF: 13.6)

  • Recommended by Faculty of 1000 Biology

Histone-fold domain protein NF-Y promotes chromatin accessibility for cell type-specific master transcription factors.

Oldfield AJ†, Yang P†, Conway AE, Cinghu S, Freudenberg JM, Yellaboina S, Jothi R (2014). Molecular Cell, 55(5):708-722. (IF: 16.1)

Direction pathway analysis of large-scale proteomics data reveals novel features of the insulin action pathway.

Yang P†, Patrick E†, Tan SX, Fazakerley DJ, Burchfield J, Gribben C, Prior MJ, James DE, Yang YH* (2014). Bioinformatics, 30(6):808-814. (IF: 9.9)

ISL1 regulates PPARγ activation and early adipogenesis via BMP4-dependent and independent mechanisms.

Ma X, Yang P#, Kaplan WH, Lee BH, Wu LE, Yang YH, Yasunaga M, Sato K, Chisholm DJ, James DE (2014). Molecular and Cellular Biology, 34(19):3607-3617. (IF: 4.0)

Fip1 regulates mRNA alternative polyadenylation to promote stem cell self-renewal.

Lackford B, Yao C, Charles GM, Weng L, Zheng X, Choi E, Xie X, Wan J, Xing Y, Freudenberg JM, Yang P, Jothi R, Hu G, Shi Y (2014). EMBO Journal, 33(8):878-889. (IF: 10.4)

Sample subset optimization techniques for imbalanced and ensemble learning problems in bioinformatics applications.

Yang P*, Yoo PD, Fernando J, Zhou BB, Zhang Z, Zomaya AY (2014). IEEE Transactions on Cybernetics, 44(3):445-455. (IF: 10.1)

Dynamic adipocyte phosphoproteome reveals Akt directly regulates mTORC2.

Humphrey SJ, Yang G, Yang P#, Fazakerley DJ, Stockli J, Yang YH, James DE (2013). Cell Metabolism, 17(6):1009-1020. (IF: 24.3)

Re-Fraction: a machine learning approach for deterministic identification of protein homologs and splice variants in large-scale MS-based proteomics.

Yang P, Humphrey SJ, Fazakerley DJ, Prior MJ, Yang G, James DE, Yang YH* (2012). Journal of Proteome Research, 11(5):3035-3045. (IF: 3.9)

Improving X!Tandem on peptide identification from mass spectrometry by self-boosted Percolator.

Yang P*, Ma J, Wang P, Zhu Y, Zhou BB, Yang YH* (2012). IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(5):1273-1280. (IF: 2.7)

OCAP: an open comprehensive analysis pipeline for iTRAQ.

Wang P, Yang P, Yang YH (2012). Bioinformatics, 28(10):1404-1405. (IF: 9.9)

Gene-gene interaction filtering with ensemble of filters.

Yang P†,*, Ho JWK†, Yang YH, Zhou BB* (2011). BMC Bioinformatics, 12:S10. (IF: 3.2)

A genetic ensemble approach for gene-gene interaction identification.

Yang P*, Ho JWK, Zomaya AY, Zhou BB* (2010). BMC Bioinformatics, 11:524. (IF: 3.2)

A dynamic wavelet-based algorithm for pre-processing mass spectrometry data.

Wang P, Yang P, Arthur J, Yang YH (2010). Bioinformatics, 26(18):2242-2249. (IF: 9.9)

Hierarchical kernel mixture models for the prediction of AIDS disease progression using HIV structural gp120 profiles.

Yoo PD, Ho YS, Ng J, Charleston M, Saksena NK, Yang P, Zomaya AY (2010). BMC Genomics, 11:S4. (IF: 4.1)

A review of ensemble methods in bioinformatics.

Yang P*, Yang YH, Zhou BB, Zomaya AY (2010). Current Bioinformatics, 5(4):296-308. (IF: 1.2)

A clustering based hybrid system for biomarker selection and sample classification of mass spectrometry data.

Yang P*, Zhang Z, Zhou BB, Zomaya AY (2010). Neurocomputing, 73:2317-2331. (IF: 4.0)

A multi-filter enhanced genetic ensemble system for gene selection and sample classification of microarray data.

Yang P*, Zhou BB, Zhang Z, Zomaya AY (2010). BMC Bioinformatics, 11:S5. (IF: 3.2)

A particle swarm based hybrid system for imbalanced medical data sampling.

Yang P*, Xu L, Zhou BB, Zhang Z, Zomaya AY (2009). BMC Genomics, 10:S34. (IF: 4.1)

An embedded two-layer feature selection approach for microarray data analysis.

Yang P*, Zhang Z* (2009). IEEE Intelligent Informatics Bulletin, 10:24-32.

An agent-based hybrid system for microarray data analysis.

Zhang Z, Yang P, Wu X, Zhang C (2009). IEEE Intelligent Systems, 24(5):53-63. (IF: 4.0)

An ensemble of classifiers with genetic algorithm based feature selection.

Zhang Z*, Yang P* (2008). IEEE Intelligent Informatics Bulletin, 9:18-24.

Tang T, Wu H, Bao W, Yang P, Yuan D, Zhou B (2020) New parallel algorithms for all pairwise computation on large HPC clusters. Proceedings of the 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT). IEEE, 196-201.

Yang P, Liu W, Yang J (2017) Positive unlabeled learning via wrapper-based adaptive sampling. Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI). 3273-3279.

Yang P, Liu W, Zhou BB, Chawla S, Zomaya AY (2013) Ensemble-based wrapper methods for feature selection and class imbalance learning. Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Lecture Notes in Artificial Intelligence 7818, Springer, 544-555.

Yang P, Zhang Z, Zhou BB, Zomaya AY (2011) Sample subsets optimization for classifying imbalanced biological data. Proceedings of the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Lecture Notes in Artificial Intelligence 6635, Springer, 333-344.

Li L, Yang P, Qu L, Zhang Z, Cheng P (2010) Genetic algorithm-based multi-objective optimisation for QoS-aware web services composition. Proceedings of the 4th International Conference on Knowledge Science, Engineering and Management (KSEM). Lecture Notes in Artificial Intelligence 6291, Springer, 549-554.

Yang P, Tao L, Xu L, Zhang Z (2009) Multiagent framework for bio-data mining. Proceedings of the Fourth International Conference on Rough Set and Knowledge Technology (RSKT). Lecture Notes in Computer Science 5589, Springer, 200-207.

Yang P, Zhang Z (2008) A clustering based hybrid system for mass spectrometry data analysis. Proceedings of Pattern Recognition in Bioinformatics (PRIB). Lecture Notes in Bioinformatics 5265, Springer, 98-109.

Yang P, Zhang Z (2008) A hybrid approach to selecting susceptible single nucleotide polymorphisms for complex disease analysis. Proceedings of BioMedical and Engineering Informatics (BMEI). IEEE, 214-218.

Yang P, Zhang Z (2007) Hybrid methods to select informative gene sets in microarray data classification. Proceedings of the 20th Australian Joint Conference on Artificial Intelligence (AI). Lecture Notes in Artificial Intelligence 4830, Springer, 811-815.

Yang P, Yang YH, Zhou BB, Zomaya AY (2013) Stability of feature selection algorithms and ensemble feature selection methods in bioinformatics. In Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data, Wiley, New Jersey, USA, 333-352.