Transforming public patient omic data into precision oncology targets: A comprehensive pan-cancer approach

October 16, 2024

Authors: Eleonore Fox¹, Lea Meunier¹, Guillaume Appe¹, Abdelkader Behdenna¹, Lucas Hensen¹, Akpeli Nordor¹, Solene Weill¹, Camille Marijon¹. ¹Epigene Labs, Paris, France

Background

The shift toward precision oncology requires the identification of novel, highly specific drug targets. Publicly available transcriptomic data offer a rich resource for identifying such targets, yet they remain largely underutilized. To address this, we present a scalable, data-driven platform for pan-cancer antigen target discovery leveraging the untapped potential of public transcriptomic data, along with extensive biological and pharmaceutical knowledge.

Methods

We integrated 299 microarray datasets using our AI-augmented, human-supervised pipeline for clinical data curation and transcriptomic data normalization. We then used our open-source batch effect correction tool, PyComBat, to aggregate them into 15 indication-specific cohorts. The resulting cohorts profile 20,347 genes and are annotated with 45 curated clinical data elements. They are exceptionally large, encompassing 15,500 tumor and healthy tissue samples, and exceed the size of TCGA projects by a factor of 2.1. The cohorts are also more representative of patient populations, including an average of 3.2 histological subtypes each, compared to only 1.2 in the individual datasets.
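To illustrate what batch effect correction does when datasets are merged into a cohort, the following is a minimal sketch, not the PyComBat implementation. It removes only the per-dataset mean shift for a single gene, whereas ComBat-style methods also adjust scale and use empirical Bayes estimates shared across genes. The function name `center_batches` and the toy values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def center_batches(expression, batches):
    """Remove per-batch mean shifts for one gene, keeping the overall mean.

    expression: expression values for one gene, one per sample (log scale)
    batches:    batch label for each sample (e.g. source dataset ID)
    Simplified location-only adjustment; NOT the ComBat algorithm.
    """
    if len(expression) != len(batches):
        raise ValueError("expression and batches must have the same length")

    grand_mean = mean(expression)

    # Group values by batch to compute each batch's mean.
    values_by_batch = defaultdict(list)
    for value, batch in zip(expression, batches):
        values_by_batch[batch].append(value)
    batch_means = {b: mean(v) for b, v in values_by_batch.items()}

    # Shift every sample so that all batches share the grand mean.
    return [v - batch_means[b] + grand_mean for v, b in zip(expression, batches)]


# Hypothetical toy data: two datasets measuring the same gene with an offset.
expression = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batches = ["A", "A", "A", "B", "B", "B"]
corrected = center_batches(expression, batches)
print(corrected)  # [7.0, 8.0, 9.0, 7.0, 8.0, 9.0]
```

After the adjustment, the two toy datasets have the same mean, while the within-dataset differences between samples are unchanged.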

Results

To handle cancer heterogeneity, we stratified our cohorts into patient subpopulations based on transcriptomic profiles using consensus clustering analysis, interpreted with clinical data and pathway analysis. We then applied our target discovery pipeline, which starts with differential gene expression analysis, followed by proteomic filters that limit anticipated cytotoxicity and focus on cell-surface-bound proteins. An average of 35 and 48 relevant antigen targets were identified at the indication and cluster levels, respectively. These included targets already described in the literature, e.g., CD19 in acute lymphoblastic leukemia and BCMA in multiple myeloma. Finally, we characterized the hundreds of candidate targets using bulk and single-cell transcriptomic data, proteomic data, and biological knowledge to evaluate their safety, efficacy, and robustness.
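The sequence of differential expression followed by filtering can be sketched as below. This is an illustrative simplification under stated assumptions, not the platform's pipeline: it uses a Welch t-statistic on log2 expression values, and the thresholds, the gene names, and the `surface_genes` and `excluded_genes` sets are hypothetical placeholders for the proteomic filters described above.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if standard_error == 0:
        return 0.0  # no variability: treat as uninformative in this sketch
    return (mean(a) - mean(b)) / standard_error


def candidate_targets(tumor, healthy, surface_genes, excluded_genes,
                      min_log2fc=1.0, min_t=2.0):
    """Return (gene, log2 fold change, t) for genes passing all filters.

    tumor, healthy: dict mapping gene -> list of log2 expression values
    surface_genes:  genes assumed to encode cell-surface proteins
    excluded_genes: genes assumed to raise a toxicity concern
    Thresholds are arbitrary illustrative defaults.
    """
    hits = []
    for gene, tumor_values in tumor.items():
        if gene not in healthy:
            continue
        if gene not in surface_genes or gene in excluded_genes:
            continue
        # Values are already on a log2 scale, so the difference of means
        # is the log2 fold change.
        log2fc = mean(tumor_values) - mean(healthy[gene])
        t = welch_t(tumor_values, healthy[gene])
        if log2fc >= min_log2fc and t >= min_t:
            hits.append((gene, log2fc, t))
    # Rank by effect size, largest first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy data (log2 expression), four samples per group.
tumor = {
    "GENE_A": [8.0, 8.5, 9.0, 8.5],   # surface, overexpressed
    "GENE_B": [5.0, 5.2, 4.8, 5.0],   # surface, unchanged
    "GENE_C": [9.0, 9.5, 10.0, 9.5],  # overexpressed, not surface
    "GENE_D": [7.0, 7.5, 8.0, 7.5],   # surface, overexpressed, excluded
}
healthy = {
    "GENE_A": [4.0, 4.5, 5.0, 4.5],
    "GENE_B": [5.0, 5.1, 4.9, 5.0],
    "GENE_C": [3.0, 3.5, 4.0, 3.5],
    "GENE_D": [2.0, 2.5, 3.0, 2.5],
}
hits = candidate_targets(
    tumor, healthy,
    surface_genes={"GENE_A", "GENE_B", "GENE_D"},
    excluded_genes={"GENE_D"},
)
print([gene for gene, _, _ in hits])  # ['GENE_A']
```

In this toy example only GENE_A survives: GENE_B is not differentially expressed, GENE_C fails the surface filter, and GENE_D is removed by the exclusion list. A real analysis would also correct for multiple testing across the thousands of genes profiled.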

Conclusion

By encompassing both data integration and target identification, our platform can be scaled to any cancer type and antigen-targeting modality, demonstrating its potential to accelerate oncology drug discovery.