Building a Web Application to detect Breast Cancer in Ultrasound images
A friendly guide to easily do Object Detection on medical images
Last month was designated as Pink Month, dedicated to raising awareness of breast cancer and encouraging women to attend preventive medical visits. The mortality of this disease can decrease significantly if it is diagnosed early, and many lives can be saved.
The most common way to detect breast lesions is through imaging diagnosis, which can be obtained with different methods, such as magnetic resonance imaging, mammography, and breast ultrasound. As you may deduce, the accurate identification of breast cancer is crucial.
In this context, deep learning techniques can represent a useful tool for the automatic identification and diagnosis of breast cancer. A lot of research has been done in this field and there are public image datasets available for solving this type of task.
In this article, I am going to build a web application with Streamlit to detect breast cancer in ultrasound images, with the model trained using Datature’s tools. Our benchmark dataset is called the Breast Ultrasound Images Dataset and can be found on Kaggle.
Table of Contents:
- Part 1: Training Model on ultrasound images
- Part 2: Build Web Application with Streamlit
Requirements:
Before going further, you need to create an account on Datature’s platform. Then you can log in to your account and create a new project, as in the screenshot below. In particular, you need to specify the Project Name and Type, which can be Object Detection or Instance Segmentation.
In this case, we select the Object Detection option since we want to find objects in an image and then assign them the correct label, benign or malignant. We are going to work on a dataset containing breast images available on Kaggle.
This dataset has three different classes: normal, benign, and malignant images. In this project, we are going to focus only on two classes, the benign and the malignant categories. Moreover, the dataset includes the mask images, also called ground truth images, together with the original images for each category.
Part 1: Training Model on ultrasound images
This part is split up into the following steps:
- Upload and Annotate images
- Create Workflow
- Visualize Model Performance
- Download Tensorflow Artifact
- Run the predict.py script
1. Upload and Annotate images
Let’s upload the images of benign and malignant cases by clicking the Upload button.
Since the dataset contains binary masks, which are not compatible with the platform, I used a python script to derive the bounding boxes from these masks:
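The original script is not reproduced here, but a minimal sketch of such a conversion could look like the following. It assumes the masks are white-on-black PNGs whose filenames end in `_mask.png` next to the original images (the naming scheme and the inclusive-pixel width/height convention are assumptions, so adapt them to your copy of the dataset):

```python
import csv
import glob
import os

import numpy as np
from PIL import Image

def mask_to_bbox(mask):
    """Derive (x_min, y_min, width, height) from a binary mask array."""
    ys, xs = np.where(mask > 0)
    if xs.size == 0:
        return None  # empty mask: nothing to annotate
    x_min, y_min = int(xs.min()), int(ys.min())
    # Width and height are measured as an inclusive pixel span.
    return x_min, y_min, int(xs.max()) - x_min + 1, int(ys.max()) - y_min + 1

def build_annotation_csv(image_dir, out_csv):
    """Write one CSV row per mask: filename, xmin, ymin, width, height, label."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "xmin", "ymin", "width", "height", "label"])
        for mask_path in sorted(glob.glob(os.path.join(image_dir, "*_mask.png"))):
            mask = np.array(Image.open(mask_path).convert("L"))
            bbox = mask_to_bbox(mask)
            if bbox is None:
                continue
            filename = os.path.basename(mask_path).replace("_mask", "")
            label = "benign" if filename.startswith("benign") else "malignant"
            writer.writerow([filename, *bbox, label])
```

Running `build_annotation_csv` over the benign and malignant folders produces a CSV in the format shown below.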
This file has CSV format, in which each row corresponds to the annotation associated with one image of the dataset. The bounding box is defined by four values in pixels: x_min, y_min, width, and height. In addition to these four values, we also have the label, which can be benign or malignant.
filename,xmin,ymin,width,height,label
benign (1).png,288,134,51,26,benign
benign (10).png,24,67,209,197,benign
benign (11).png,367,132,72,38,benign
benign (12).png,343,178,78,44,benign
benign (13).png,182,107,355,198,benign
benign (14).png,511,149,111,54,benign
Afterwards, we can import the annotations with the option “CSV Four Corner — Bounding Box”, as shown in Datature’s documentation.
We can visualize the images with the corresponding annotations in the Annotator. Isn’t it amazing? We can have a complete overview of our annotated ultrasound images.
Moreover, Datature’s annotator allows you to annotate manually in case there are images without a label and bounding box. This feature is surely valuable when working on a computer vision project.
2. Create Workflow
It’s time to train our model! We usually need to write many lines of code to import the dataset, apply data augmentation techniques, and, lastly, train a deep learning model to solve an object detection task. Datature simplifies this procedure by letting you create a Workflow, a visual scheme that shows all the steps of the machine learning life cycle:
The process with the different phases is very intuitive:
- Right-click Datasets → Project dataset to split the dataset into training and validation sets.
- Right-click Augmentations and select the techniques you prefer to apply transformations to the images. You can also preview the resulting augmentations by clicking the “Preview Augmentations” button at the bottom right.
- Right-click Models → RetinaNet models → Retina MobileNet V2 640x640. If you click on the created model box, you can also set the batch size and the number of iterations. In this case, we just change the default batch size to 4.
After defining the steps of the workflow, we can press the button “Run Training”, which will show the options to train the model, such as the GPU type and the CheckPoint Strategy.
Once we have selected the options, we can click the “Start Training” button and can finally begin the fun part! Take a break, it will take between 15 and 30 minutes to train the model on the breast cancer images.
3. Visualize Model Performance
From the plot, we can observe that both the training and validation losses slowly decrease as the iterations increase. Moreover, it’s clear that precision and recall are improving over time.
As in many projects, looking only at the metrics is not enough to really understand how a model is performing. In this amazing platform, there is a tool called Advanced Evaluation that permits comparing the ground truth with the predicted bounding boxes. For example, this functionality can be important to understand if the model is confusing a benign nodule with a malignant one, like in the screenshot below.
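Comparing a ground-truth box with a predicted one usually boils down to the intersection over union (IoU) measure. Here is a minimal sketch, assuming boxes in the same (x_min, y_min, width, height) pixel format as our annotation CSV:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, width, height)."""
    ax0, ay0, aw, ah = box_a
    bx0, by0, bw, bh = box_b
    # Corners of the intersection rectangle.
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax0 + aw, bx0 + bw), min(ay0 + ah, by0 + bh)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```

A prediction is commonly counted as a correct localization when its IoU with the ground-truth box exceeds a cutoff such as 0.5; if the boxes overlap well but the labels differ, the model is confusing benign with malignant.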
4. Download Tensorflow Artifact
After the training is over and we have checked the final metrics, we can finally export the artifact. Just go to the artifact page, select the “…” button on the box of the artifact, press “Export Artifact”, and select the option “Generate Tensorflow Model”. After a few seconds, you can click Download.
This functionality will allow us to make predictions on new images outside the platform using a Python script. Consequently, we can also later create the web application, which will have access to Datature’s prediction platform API, called Portal.
5. Run the predict.py script
As I mentioned earlier, we want to make predictions on other ultrasound images that weren’t passed to the model during training. To do this, there are some requirements to satisfy:
- Unzip the exported artifact.
- Download the requirements.txt and predict.py files from Datature’s GitHub repository.
- Set up a Python environment with a Python version between 3.6 and 3.8.
We should also have a project structure like this:
Don’t worry! tf_xxx.ipynb and predict_hub.py are optional; we just need predict.py.
In addition to the listed folders and files, we need to create an input folder, which contains the test images, and an output folder that will store the final predictions.
This is the final structure of our project:
We just need to run the predict.py script in the terminal with the following commands:
cd model_architecture
python predict.py --input "../input" --output "../output" --width 640 --height 640 --threshold 0.7 --model "../saved_model" --label "../label_map.pbtxt"
Output:
Model loaded, took 6.812467813491821 seconds...
Predicting for ../input/malignant (90).png...
Predicting for ../input/benign (53).png...
Saving predicted images to ../output/benign (53).png...
Predicting for ../input/benign (51).png...
Saving predicted images to ../output/benign (51).png...
Predicting for ../input/malignant (71).png...
Predicting for ../input/benign (52).png...
Predicting for ../input/malignant (111).png...
Saving predicted images to ../output/malignant (111).png...
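The --threshold 0.7 flag controls which detections are kept: only boxes whose confidence score passes the cutoff end up drawn on the output images. A minimal sketch of that filtering step, assuming the model returns parallel arrays of boxes, scores, and class ids as TensorFlow object detection models commonly do:

```python
import numpy as np

def filter_detections(boxes, scores, classes, threshold=0.7):
    """Keep only the detections whose confidence score meets the threshold."""
    keep = scores >= threshold
    return boxes[keep], scores[keep], classes[keep]

# Example with three candidate detections, only two above the 0.7 cutoff.
boxes = np.array([[0.1, 0.2, 0.4, 0.5], [0.3, 0.3, 0.6, 0.7], [0.2, 0.1, 0.5, 0.4]])
scores = np.array([0.92, 0.41, 0.78])
classes = np.array([1, 2, 2])
kept_boxes, kept_scores, kept_classes = filter_detections(boxes, scores, classes)
```

Raising the threshold gives fewer but more confident boxes; for images where no detection passes the cutoff (like some of the files in the log above), nothing is saved to the output folder.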
As I said before, the output folder contains the model prediction for each image contained in the input folder. This example shows an image with a benign nodule, which was correctly identified as benign by the model:
We can also visualize an image showing a malignant nodule, which was correctly predicted as malignant.
Great! We were able to make predictions on new images. The next step will be to do the same with a web application.
Part 2: Build Web Application with Streamlit
We can finally build our web application to detect breast cancer from ultrasound images. Streamlit is the Python library we are going to use to create the web app with a few lines of code. We’ll also include the functions used in the predict.py script. All the code for the app is contained in the app.py script.
These lines of code summarize well the steps to build the app:
- Use the argparse library to define default arguments, such as the path of the model and the path of the label map.
- Load the trained model.
- Create the title of the app.
- Upload an image, which can be in jpg or png format.
- Create a “Detect Breast Cancer” button. After it’s pressed, we visualize the prediction on the right half of the web app.
If a file is uploaded and the button is pressed, we visualize the predicted boxes on the test image when there are any; otherwise, we show the image without annotations.
Final thoughts:
Congratulations! You’ve successfully created a web application that harnesses the power of computer vision to detect breast cancer from ultrasound images. If you have any further interests that could benefit from computer vision models, Datature can be a wonderfully extensive and accessible tool to facilitate your goals.
Thank you for reading, and I hope that you will be inspired by this project to try out computer vision for yourself!
Check out the code in my GitHub repository.
Disclaimer: This dataset is licensed under the Public Domain (CC0) license.
Did you like my article? Become a member and get unlimited access to new data science posts every day! It’s an indirect way of supporting me without any extra cost to you. If you are already a member, subscribe to get emails whenever I publish new data science and python guides!