Pathologists examine stained specimens under a microscope to diagnose a multitude of diseases. With recent advances in scanner technology, an entire tissue glass slide can now be scanned and saved as a digital image known as a whole slide image (WSI). The digitization of glass slides enables the use of digital image analysis tools to evaluate WSIs. Machine Learning (ML) and, more specifically, Deep Learning (DL) have piqued practitioners' interest because they deliver state-of-the-art results without the need for feature engineering. This performance comes at a cost, however: training deep models requires massive amounts of data that must be manually labeled by domain experts. Hand-labeled datasets take time and resources to create, especially when specialized knowledge is required, as in medicine. Practitioners are therefore increasingly focusing on techniques that require less supervision. Furthermore, given current hardware limitations, the sheer size of digital slides impedes the application of state-of-the-art deep learning models. Consequently, most learning methods require pixel-level annotation and are best suited to simplified scenarios, such as working with small, manageable images.
In this thesis, two methods for representing WSIs from weakly labeled data are proposed to address the challenges of WSI representation learning in digital pathology. First, a pipeline for learning WSI representations at low magnification is proposed. This pipeline enables the low-cost application of deep learning methods at the slide level: a WSI is embedded into a fixed-length, compact feature vector using a deep model, making it suitable for computer vision tasks such as classification and search. Second, a multitask multi-instance learning paradigm based on Vision Transformers (ViTs) is proposed to learn visual descriptors by learning to predict gene expression from H&E WSIs. Not only does this approach connect the tissue morphology and transcriptomics domains, but it also generates a WSI representation that taps into the wealth of information contained in the molecular signature of the input. As a result, visual representations can be learned using rich gene-level data as the ultimate source of biological information, while also providing a mechanism for translating visual knowledge to transcriptomics.
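To make the multi-instance idea concrete, the following is a minimal, purely illustrative sketch (not the thesis implementation) of how a bag of patch embeddings from one WSI can be pooled with attention into a single fixed-length slide vector, which a regression head then maps to gene expression values. All dimensions and weight matrices here are hypothetical placeholders; in the actual method the embeddings come from a ViT and the weights are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the thesis): each WSI yields a "bag"
# of patch embeddings; a slide-level head regresses gene expression.
n_patches, embed_dim, n_genes = 64, 32, 8

patch_embeddings = rng.normal(size=(n_patches, embed_dim))  # one bag per WSI

# Attention-based MIL pooling: learnable weights shown as random placeholders.
w_attn = rng.normal(size=(embed_dim, 1))
scores = patch_embeddings @ w_attn            # (n_patches, 1) relevance scores
scores -= scores.max()                        # numerical stability
alpha = np.exp(scores) / np.exp(scores).sum() # softmax over patches
slide_vector = (alpha * patch_embeddings).sum(axis=0)  # fixed-length WSI embedding

# Slide-level regression head predicting expression of n_genes genes.
w_head = rng.normal(size=(embed_dim, n_genes))
predicted_expression = slide_vector @ w_head

print(slide_vector.shape, predicted_expression.shape)
```

The key design point this sketch captures is that no patch-level labels are needed: supervision (here, gene expression) attaches only to the slide-level vector, and the attention weights decide how much each patch contributes.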
Finally, the proposed models are trained and evaluated using renal cell cancer subtyping as a case study. The Cancer Genome Atlas (TCGA), a publicly available cancer repository, is used to train the proposed models. Their performance is demonstrated by comparison with state-of-the-art models developed for WSI classification, search, and gene expression prediction. The generalizability of the trained models is further demonstrated by testing them on an independent (external) test cohort.