Integration of single-cell RNA-Seq and CyTOF data characterises heterogeneity of rare cell subpopulations
Repapi E., Agarwal D., Napolitani G., Sims D., Taylor S.
Background: The simultaneous measurement of cellular proteins and transcriptomes of single cell data has become an exciting new possibility with the advent of highly multiplexed multi-omics methodologies. However, mass cytometry (CyTOF) is a well-established, affordable technique for the analysis of proteomic data, which is well suited for the discovery and characterisation of very rare subpopulations of cells with a wealth of publicly available datasets. Methods: We present and evaluate the multimodal integration of single cell RNA-Seq and CyTOF datasets coming from both matched and unmatched samples, using two publicly available datasets. Results: We demonstrate that the integration of well annotated CyTOF data with single cell RNA sequencing can aid in the identification and annotation of cell populations with high accuracy. Furthermore, we show that the integration can provide imputed measurements of protein markers which are comparable to the current gold standard of antibody derived tags (ADT) from CITE-Seq for both matched and unmatched datasets. Using this methodology, we identify and transcriptionally characterise a rare subpopulation of CD11c positive B cells in high resolution using publicly available data and we unravel its heterogeneity in a single cell setting without the need to sort the cells in advance, in a manner which had not been previously possible. Conclusions: This approach provides the framework for using available proteomic and transcriptomic datasets in a unified and unbiased fashion to assist ongoing and future studies of cellular characterisation and biomarker identification.