In a collaboration between researchers at INESC-ID, Instituto Superior Técnico, University of Lisbon, Portugal, and ProCan® at Children’s Medical Research Institute in Sydney, a new artificial intelligence-based method has been developed to significantly improve our understanding of how cancer cells behave.
Described in a paper published in the prestigious scientific journal Nature Communications, the deep learning method, known as Multi-Omic Synthetic Augmentation (MOSA), can extract valuable additional information from the large sets of data about cancer cells that laboratory scientists have painstakingly collected.
Omics refers to any type of big data about biological systems. For example, genomics refers to data about DNA and proteomics to data about proteins. Multi-omics refers to collections of two or more sets of omic data, from which many new insights can be obtained using advanced computational methods.
MOSA is designed to deal with a common problem in multi-omic databases called sparsity, meaning gaps in the available data. In the example studied in the Nature Communications paper, because of the complexities of collecting omic data, the cancer cells in the set had results for anywhere between two and seven types of omic data, which meant that the dataset was incomplete.
The ProCan researchers involved in the collaboration included lead author Dr Zhaoxiang (Simon) Cai and joint senior author Associate Professor Qing Zhong. Senior author Dr Emanuel Gonçalves is an Assistant Professor at Instituto Superior Técnico, University of Lisbon. Scientists from the Wellcome Sanger Institute, Cambridge, UK, also contributed to the study.
Dr Cai explained that their MOSA method was able to artificially synthesise data to fill gaps in the multi-omic data for more than 1500 cancer cell lines, representing a wide range of cancer types, expanding the total dataset by 32.7% at a much lower cost than would be required to perform the laboratory tests. The advantage of using the resulting combination of real and synthetic data is that it is often superior to the real data alone for training machine learning models.
“We showed that the augmented data resulted in increased accuracy in predicting how cancer cells would respond to anticancer treatments and provided more opportunities to discover new potential drug targets”, said Dr Cai.
Professor Roger Reddel, who is a study co-author and a co-director of ProCan, said, “This is a significant step towards ProCan’s goal of being able to predict what treatment any individual patient’s cancer will respond to, so we can assist cancer clinicians choose the best available treatment for each of their patients.”