Future Capstones
The projects described below would be appropriate for satisfying the capstone requirements in the Master of Science in Data Science program. If you’re interested in working on one of these projects, please contact Dr. Baggett (jbaggett at uwlax.edu).
Improving Breast Ultrasound Lesion Segmentation with Generative AI
The largest hurdle to training models for segmentation (tracing an outline) of lesions in breast ultrasound images is obtaining adequate training data. It may be possible to augment existing training data by using generative AI to draw new backgrounds around already-segmented lesions. One approach to doing this is described in this Medium article. In this project you would use Stable Diffusion to augment segmentation training data for breast lesions in ultrasound imagery.
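As a rough illustration of the "new background around an existing lesion" idea, the sketch below builds the mask an inpainting model would typically be given: repaint everything except the lesion and a small protective margin. The function name `background_inpaint_mask`, the `margin` parameter, and the toy 8x8 lesion are assumptions made for this example; the generative step itself is not shown.

```python
import numpy as np

def background_inpaint_mask(lesion_mask: np.ndarray, margin: int = 2) -> np.ndarray:
    """Return a boolean mask that is True where pixels may be repainted.

    lesion_mask: 2-D boolean array, True inside the segmented lesion.
    margin: number of pixels around the lesion that are also kept untouched
            (hypothetical safeguard so the lesion boundary is not altered).
    """
    protected = lesion_mask.astype(bool)
    for _ in range(margin):
        padded = np.pad(protected, 1, mode="constant", constant_values=False)
        # 4-neighbour dilation: a pixel is protected if it or a direct neighbour is.
        protected = (
            padded[1:-1, 1:-1]
            | padded[:-2, 1:-1]
            | padded[2:, 1:-1]
            | padded[1:-1, :-2]
            | padded[1:-1, 2:]
        )
    return ~protected

# Toy example: a 2x2 "lesion" in an 8x8 image.
lesion = np.zeros((8, 8), dtype=bool)
lesion[3:5, 3:5] = True
repaint = background_inpaint_mask(lesion, margin=1)
```

The original segmentation mask stays valid for the augmented image because the lesion pixels are never repainted.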
Using Self-Supervision to Pretrain AI Models for Lesion Detection in Breast Ultrasound Images
Getting labeled breast ultrasound images is time-consuming and expensive since it requires expert radiologists to do the labeling. Self-supervision is an unsupervised approach for pretraining deep-learning models. The backbone of the pretrained model can then be used to build classification and segmentation models, which have been shown to perform nearly as well as fully supervised models. You would test this approach using breast ultrasound images. Related approaches are described in the papers “Is it Time to Replace CNNs with Transformers for Medical Images?” and “Robust and Efficient Medical Imaging with Self-Supervision”.
Robustness of Feature Extraction for Downstream Modeling
Given a breast ultrasound image that displays a lesion, together with a digital mask that shows which pixels belong to the lesion, we can extract quantitative and qualitative features from the image to use as predictors in a machine learning model for predicting malignancy. We have such a model that performs quite well, but to be useful in practice it will need lesion boundaries that are generated (segmented) automatically. We need to understand how the performance of the predictive model changes due to inaccuracies in the estimated boundary. To get an idea of what this might look like, see the paper “BIRADS Features-Oriented Semi-supervised Deep Learning for Breast Ultrasound Computer-Aided Diagnosis”. You can ask Dr. Baggett for a copy of this paper.
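One way to start such a robustness study is to perturb the expert mask in a controlled way and record how far the perturbed mask is from the original. The sketch below grows or shrinks a mask by one pixel and reports the Dice overlap and the lesion area; the helper names `shift_edges` and `dice` and the square toy lesion are assumptions for illustration, and the downstream feature extraction and malignancy model are not shown.

```python
import numpy as np

def shift_edges(mask: np.ndarray, grow: bool) -> np.ndarray:
    """Dilate (grow=True) or erode (grow=False) a boolean mask by one pixel.

    Uses 4-neighbour connectivity. Erosion is computed as the complement of
    the dilated background, and assumes the lesion does not touch the border.
    """
    m = mask if grow else ~mask
    p = np.pad(m, 1, mode="constant", constant_values=False)
    out = p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    return out if grow else ~out

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two boolean masks."""
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)

# Toy example: a 10x10 square "lesion" in a 20x20 image.
lesion = np.zeros((20, 20), dtype=bool)
lesion[5:15, 5:15] = True

grown = shift_edges(lesion, grow=True)
shrunk = shift_edges(lesion, grow=False)

for name, perturbed in [("grown", grown), ("shrunk", shrunk)]:
    print(name, "area:", int(perturbed.sum()), "dice:", round(dice(lesion, perturbed), 3))
```

In the actual project, each perturbed mask would be passed through the feature extraction and the predictive model, so that model performance can be plotted against Dice.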
Explore Semi-Supervised Learning for Breast Ultrasound Deep Learning Models
For all of the images/studies in our dataset we know the pathology of the lesion (benign or malignant) because the lesion has been biopsied. However, there are many other characteristics of the lesion that we’d like to predict, such as whether the boundary is smooth or irregular. Since labeling these characteristics requires much time and expertise, it isn’t feasible to collect the labels for the entire dataset. In this project you’ll explore using semi-supervised learning to train deep learning models when only part of the dataset is labeled. Both convolutional neural network and transformer models should be trained.
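Pseudo-labeling is one common semi-supervised technique and is shown here only as an example; the project is not tied to it. A model trained on the labeled subset predicts on the unlabeled images, and only confident predictions are kept as extra training labels. The function name `select_pseudo_labels`, the 0.9 threshold, and the probability values are assumptions chosen for this sketch.

```python
import numpy as np

def select_pseudo_labels(probs: np.ndarray, threshold: float = 0.9):
    """Pick confident predictions on unlabeled images to use as pseudo-labels.

    probs: array of shape (n_samples, n_classes) with predicted class
           probabilities for the unlabeled images.
    Returns the indices of the kept samples and their pseudo-labels.
    """
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)

# Made-up predictions for three unlabeled images (smooth vs. irregular boundary).
probs = np.array([
    [0.95, 0.05],
    [0.60, 0.40],
    [0.08, 0.92],
])
idx, labels = select_pseudo_labels(probs)
```

Here the second image is left out because neither class reaches the threshold; the other two would be added to the training set with their predicted labels.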
Explore Self-Supervised Learning for Breast Ultrasound Deep Learning Models
Vision transformer models require large datasets to train from scratch, and transfer learning may not work well if the target data (breast ultrasound images) is very different from the data used to train the model (natural images, ImageNet). Self-supervised learning replaces the classification task of the model with an alternative task so that the core of the model learns the features of the target data. Once the model is trained on a self-supervised task, the trained model is used to initialize the classification model. The idea for this project is to follow the approach in “Is it Time to Replace CNNs with Transformers for Medical Images?” but to apply it to breast ultrasound images.
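To make the "alternative task" concrete, the sketch below prepares data for rotation prediction, a simple pretext task in which the model must guess how many quarter turns were applied to each image. It is used here purely as an illustration and is not necessarily the pretext task used in the paper cited above. The function name `make_rotation_batch` and the random toy images are assumptions, and the images are assumed to be square so the rotated copies can be stacked.

```python
import numpy as np

def make_rotation_batch(images: np.ndarray, rng: np.random.Generator):
    """Build inputs and targets for a rotation-prediction pretext task.

    images: array of shape (n, h, w) with h == w (square, unlabeled images).
    Returns the rotated images and integer labels k in {0, 1, 2, 3}, where
    each image was rotated by k * 90 degrees. No human labels are needed.
    """
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k=int(k)) for img, k in zip(images, labels)])
    return rotated, labels

rng = np.random.default_rng(0)
images = rng.random((6, 32, 32))  # stand-in for unlabeled ultrasound crops
rotated, rot_labels = make_rotation_batch(images, rng)
```

A backbone trained to predict `rot_labels` from `rotated` could then be reused to initialize the classification model.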
Refine Multi-task Model for Predicting Malignancy, BI-RADS Characteristics, and BI-RADS score
Josh Jarvey, an MSDS graduate, implemented a variation of the multitask model found in “BI-RADS-Net: An Explainable Multitask Learning Approach for Cancer Diagnosis in Breast Ultrasound Images” for his capstone project. The model should be refined and retrained using new data since the previous dataset was too small and incorrectly labeled. Since it’s unlikely that we’ll ever have a large dataset with complete BI-RADS labels, it would be helpful to apply semi-supervised learning to training this model.
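When BI-RADS labels are incomplete, a multitask model can still be trained on the labels that do exist by masking the missing entries out of the loss. The sketch below shows that idea for binary tasks only. This is a simplification, since real BI-RADS characteristics may have more than two categories, and it is not taken from the existing capstone code. The function name `masked_bce` and all numbers are assumptions for illustration.

```python
import numpy as np

def masked_bce(probs, targets, label_mask, eps=1e-7):
    """Mean binary cross-entropy over labeled (sample, task) entries only.

    probs, targets, label_mask: arrays of shape (n_samples, n_tasks).
    label_mask is 1 where a label exists and 0 where it is missing.
    Missing targets should hold a placeholder such as 0 (not NaN).
    """
    p = np.clip(probs, eps, 1 - eps)
    per_entry = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    n_labeled = label_mask.sum()
    if n_labeled == 0:
        return 0.0
    return float((per_entry * label_mask).sum() / n_labeled)

# Two images, two tasks (e.g. malignancy and one BI-RADS characteristic).
probs = np.array([[0.9, 0.2], [0.3, 0.5]])
targets = np.array([[1.0, 0.0], [0.0, 0.0]])
label_mask = np.array([[1.0, 1.0], [1.0, 0.0]])  # second image lacks the BI-RADS label

loss = masked_bce(probs, targets, label_mask)
```

Because the last entry is masked, the model's prediction for the unlabeled characteristic has no effect on `loss`.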