Developing a preliminary diagnosis model that can potentially fight the pandemic
Left: COVID-19-positive x-ray. Right: streptococcal infection. (Both are licensed CC-NC-SA.) Both images exhibit pneumonia, so what tells them apart? Images were collected as part of an open-source dataset being built.
On March 11, four days before this article was written, the World Health Organization (W.H.O.) declared coronavirus disease 2019 (COVID-19) a pandemic, characterized by the rapid, global spread of the novel coronavirus. As governments scramble to close borders, implement contact tracing, and raise awareness of personal hygiene in an effort to contain the virus, its spread is unfortunately still expected to increase until a vaccine can be developed and deployed, owing to the differing standards with which each country implements these policies.
As daily cases increase throughout the world, one significant factor limiting diagnosis is the duration of pathology tests for the virus, which are carried out in laboratories, usually in city centers, and demand time-consuming precision. This causes significant problems, chiefly that carriers cannot be isolated early, so they are able to infect more people during that critical period of unrestricted movement. Another problem is the cost of implementing the current diagnosis procedure at large scale. Arguably, the most vulnerable are people in remote areas of developing countries, who generally have inferior healthcare and access to diagnosis. A single infection can be devastating for these communities, and access to diagnosis will at least give them a fighting chance against the virus.
The difference between urban and rural areas is not just the number of people; it's also the access to healthcare resources. In a time of pandemic, this lack of access may be deadly. (Left: gaobo via Flickr. Right: danboarder via Flickr. Both are licensed CC BY-NC-ND 2.0.)
A new study by Wang et al. shows the promise of using deep learning to screen for COVID-19 in Computed Tomography (CT) scans, and it has been recommended as a practical component of the pre-existing diagnosis system. The study used transfer learning with an Inception Convolutional Neural Network (CNN) on 1,119 CT scans, recording internal and external validation accuracies of 89.5% and 79.3%, respectively. The main goal is to allow the model to extract the radiological features present in COVID-19.
While the study achieved stunning accuracy with their model, I decided to train and implement a model using a different architecture in the hopes of improving accuracy. I chose Chest Radiograph images (CXRs) over CT scans for two reasons:
1. Getting a CXR is more accessible than getting a CT scan, especially in rural and isolated areas, and there will be more potential data available.
2. In the event radiologists and medical professionals become incapacitated from containing the virus (e.g., if they fall sick themselves), A.I. systems are essential to continue administering diagnoses.
The main obstacle to using CXRs rather than CT scans as a diagnostic source is the lack of visually verifiable detail: COVID-19 signs, such as pulmonary nodules, that are easily seen in a CT scan are much harder to see in a CXR. I want to test whether a model with enough layers can detect those features in lower-quality but more practical images. My model is thus a proof of concept for whether a ResNet CNN can effectively detect COVID-19 using relatively inexpensive CXRs.
COVID-19 lung scan datasets are currently limited, but the best dataset I have found, and the one I used for this project, is the COVID-19 open-source dataset. It consists of COVID-19 images scraped from publicly available research, as well as lung images of other pneumonia-causing diseases such as SARS, Streptococcus, and Pneumocystis. If you have any proper scan images the repository can accept, along with their citations and metadata, please contribute to building the dataset to improve the AI systems that will rely on it.
I trained my model only on Posteroanterior (PA) views of CXRs, the most common type of x-ray scan. I used transfer learning on a ResNet-50 CNN (my loss exploded after a few epochs on a ResNet-34), with a total of 339 images for training and validation. All of the implementation was done using fastai and PyTorch.
Expected caveat: the data is heavily skewed by the lack of COVID-19 images. Counts taken after a random 25% validation split; the test set was fixed at 78 images from the start. Image by author.
The data is heavily skewed given the lack of available public data at the time of writing (35 COVID-19 images versus 226 non-COVID-19 images, the latter including both normal and sick lungs). I grouped all the non-COVID-19 images together because I had only sparse images for the individual diseases. I then increased the number of x-ray scans labelled "Other" using images of healthy lungs from this Kaggle dataset* before randomly splitting off 25% of the data for validation. The training set consisted of 196 images, the validation set of 65 images, and the test set of 78 images (drawn entirely from the extra dataset). The external dataset was verified not to contain any images repeated from the open-source dataset. All images were resized to 512 x 512 pixels, which performed better than 1024 x 1024 in my experience building a different classifier.
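For concreteness, the split arithmetic works out as follows (a minimal sketch; the actual split was done randomly on the image files themselves):

```python
# 35 COVID-19 scans + 226 "Other" scans pooled for training/validation
# (the fixed 78-image test set is held out separately).
n_covid, n_other = 35, 226
n_total = n_covid + n_other        # 261 images

# A random 25% of the pooled images go to validation.
n_valid = round(n_total * 0.25)    # 65 validation images
n_train = n_total - n_valid        # 196 training images
print(n_train, n_valid)            # 196 65
```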
Left: first iteration of training, with 2 epochs. Training loss is far higher than validation loss, a clear sign of underfitting. Right: second iteration of training, run for 30 epochs; columns are the same as on the left. Image by author.
Running the first few epochs using fastai's implementation of the one-cycle policy (fit_one_cycle) showed that underfitting was a problem from the start. I decided to increase the number of epochs and find a suitable range for the learning rate. For the rest of the training, I pursued an aggressive strategy: train for a high number of epochs, re-adjust the learning rate using fastai's learning-rate finder, and keep training with even more epochs until the training loss dropped to a level comparable to the validation loss.
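For readers unfamiliar with the one-cycle policy: it warms the learning rate up to a chosen maximum early in training and then anneals it back down. Below is my own simplified sketch of such a schedule, with illustrative parameter values, not fastai's exact implementation:

```python
import math

def one_cycle_lr(step, total_steps, lr_max=1e-3, pct_start=0.25,
                 div=25.0, div_final=1e4):
    """Cosine warm-up from lr_max/div to lr_max over the first
    pct_start of training, then cosine annealing down to
    lr_max/div_final."""
    warm = int(total_steps * pct_start)
    if step < warm:
        t, lo, hi = step / max(1, warm), lr_max / div, lr_max
    else:
        t = (step - warm) / max(1, total_steps - warm)
        lo, hi = lr_max, lr_max / div_final
    return lo + (hi - lo) * (1 - math.cos(math.pi * t)) / 2

schedule = [one_cycle_lr(s, 100) for s in range(100)]
# The rate peaks a quarter of the way through, then decays to near zero.
```

The learning-rate finder is then used between training runs to pick a sensible `lr_max` for the next cycle.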
At last, after dozens of iterations, I managed to bring the training loss down to levels similar to the validation loss while maintaining high accuracy. The final internal validation accuracy of the model was 93.8%.
Left: on the validation set, the model performed well, with a few false positives (FP) and false negatives (FN). Right: on the test set, the model produced zero FNs but three FPs. The skewed distribution really affects the outcome for true positives, and more data is needed for verification. Columns represent the same labels as in the left confusion matrix. Image by author.
The model performed very well on the validation set, as shown by the confusion matrix above, with only two false positives and two false negatives. Moving on to the test data: of 78 images, 75 were correctly predicted, an external validation accuracy of 96.2%. While there were three false positives in the test data, the model's prediction was close to 50% for each of these cases, suggesting it was confused by some features of those x-rays but not decisive in concluding they were positive for COVID-19. The most surprising result came with the true positive cases: the model had never trained on these images, yet it still predicted with 99% certainty that the lungs in them were positive for COVID-19.
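Both reported accuracies follow directly from the confusion-matrix counts:

```python
# Validation: 65 images, 2 false positives and 2 false negatives.
val_total, val_errors = 65, 2 + 2
val_acc = (val_total - val_errors) / val_total
print(f"{val_acc:.1%}")   # 93.8%

# Test: 78 images, 3 false positives, 0 false negatives.
test_total, test_errors = 78, 3
test_acc = (test_total - test_errors) / test_total
print(f"{test_acc:.1%}")  # 96.2%
```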
This is unprecedented accuracy for a deep learning model classifying COVID-19 scans, but it is merely a preliminary experiment with an obvious lack of x-ray data, and it has not been validated by external health organizations or professionals. I will tune the model against false negatives in the future, although I am more concerned about false positives, which the scarcity of data currently prevents me from tuning further.
To this end, I am making my code, the model, and the results available and accessible for researchers to use. I plan to update the model by testing it on a more balanced dataset once more data becomes publicly accessible, and to implement Grad-CAM to visualize which features the model concentrates on. I would love for the model to be externally validated and used beyond being a pet project, especially at this very urgent time. Feel free to reach out with questions, or if you can help me connect with an organization or professional that can potentially benefit from, and improve, the model.
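For anyone wanting to attempt that visualization themselves, the core of Grad-CAM is short: capture the activations and gradients of the last convolutional block with hooks, weight each activation channel by its spatially pooled gradient, and keep the positive part. A minimal sketch on a tiny stand-in network (the real target would be the ResNet-50's last conv block; all names here are illustrative):

```python
import torch
import torch.nn as nn

# Tiny stand-in CNN for illustration only.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, 2)

    def forward(self, x):
        f = self.features(x)
        return self.head(f.mean(dim=(2, 3)))  # global average pool + classify

model = TinyNet().eval()
store = {}
# Hooks capture the last conv block's activations and their gradients.
model.features.register_forward_hook(lambda m, i, o: store.update(acts=o))
model.features.register_full_backward_hook(lambda m, gi, go: store.update(grads=go[0]))

x = torch.randn(1, 3, 64, 64)
logits = model(x)
logits[0, logits.argmax()].backward()   # gradient of the predicted class

weights = store["grads"].mean(dim=(2, 3), keepdim=True)  # pool gradients per channel
cam = torch.relu((weights * store["acts"]).sum(dim=1))   # weighted activation map
cam = cam / (cam.max() + 1e-8)                           # normalize to [0, 1]
```

The resulting `cam` is a coarse heat map (one cell per feature-map location) that can be upsampled and overlaid on the x-ray to show where the model is looking.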
*I realized after training that a large portion of the training and test images are from pediatric patients, which may affect the model's performance on adult scans. However, its performance still needs to be verified with more data, and the model can be improved with future training.
*Note: I should have upsampled the positive cases to address the scarcity of those images, and I should have included more images of both bacterial and viral pneumonia to make the model more robust. Please do this if you try to improve the model.
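Upsampling here just means resampling the minority class with replacement until the classes are balanced. A minimal sketch (the filename lists are illustrative stand-ins for the actual image files):

```python
import random

random.seed(42)

# Stand-ins for the actual image files.
covid = [f"covid_{i}.png" for i in range(35)]
other = [f"other_{i}.png" for i in range(226)]

# Resample the minority class with replacement until it matches the majority.
covid_upsampled = covid + random.choices(covid, k=len(other) - len(covid))
balanced = covid_upsampled + other
print(len(covid_upsampled), len(other))  # 226 226
```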
*Since investigating radiological markers for COVID-19 is an active area of research with much still to uncover, I caution against interpreting the results as showing that the model has identified features unique to COVID-19.