The world’s largest database of CT scans with COVID-19 features has been compiled in Russia to train AI to diagnose the disease.
The new dataset, intended for training AI systems, contains more than 1,000 anonymised sets of chest CT scans. It expands on an earlier database of CT studies of patients with laboratory-confirmed infection, created by scientists at the Diagnostics and Telemedicine Centre.
The dataset is the largest to date. Every CT study in it is labelled according to a classification that reflects how the pathological abnormalities of COVID-19 manifest in lung tissue on chest computed tomography.
Developing AI algorithms for COVID-19
According to experts at the Diagnostics and Telemedicine Center, the database, with CT scans converted into the ‘research’ Neuroimaging Informatics Technology Initiative (NIfTI) format, is intended for developing artificial intelligence algorithms. The localisation markings (the areas of interest within which AI algorithms should detect pathology) can be used to train services that assist radiologists by pointing to ‘suspicious’ places on CT scans.
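For readers unfamiliar with NIfTI, the sketch below shows one way to read the image dimensions and voxel spacing from a NIfTI-1 header using only the Python standard library. It assumes the files follow the NIfTI-1 layout (a 348-byte header with `dim` at byte 40 and `pixdim` at byte 76); the function name `read_nifti1_header` and any file path are illustrative and not part of the published dataset.

```python
import gzip
import struct


def read_nifti1_header(path):
    """Return image shape and voxel spacing from a NIfTI-1 file (.nii or .nii.gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        header = f.read(348)
    if len(header) < 348:
        raise ValueError("file too short to hold a NIfTI-1 header")

    # sizeof_hdr (first 4 bytes) must equal 348; trying both byte orders
    # reveals the endianness the file was written with.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[:4])[0] == 348:
            break
    else:
        raise ValueError("sizeof_hdr is not 348: not a NIfTI-1 header")

    # The magic string is "n+1" for single-file images, "ni1" for header/image pairs.
    if header[344:348] not in (b"n+1\x00", b"ni1\x00"):
        raise ValueError("missing NIfTI-1 magic string")

    dim = struct.unpack(endian + "8h", header[40:56])      # dim[0] = number of axes
    pixdim = struct.unpack(endian + "8f", header[76:108])  # pixdim[1..] = grid spacing
    ndim = dim[0]
    # Spacing units are declared in the header's xyzt_units field (not parsed here);
    # millimetres are typical for CT.
    return {"shape": dim[1:1 + ndim], "spacing": pixdim[1:1 + ndim]}
```

In practice a dedicated library such as nibabel would normally be used instead; the point here is only that slice thickness and grid size are stored directly in the file header.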
Contour markings of the pathology can be used for automatic quantitative assessment of lung lesions, as well as for assessing changes between two CT studies of the same patient. The centre’s experts annotated 50 studies in which the pixel-level zones of ground-glass opacities and consolidations, both characteristic of COVID-19, are outlined on every CT slice showing lung tissue abnormalities. This is the most informative type of annotation of CT images for AI.
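As a rough illustration of what quantitative assessment from such pixel-level markings could look like, the sketch below computes the share of lung tissue affected and the lesion volume from binary masks, then compares two studies. It is a minimal example under stated assumptions: a separate lung mask is assumed to be available (the article does not say the dataset includes one), the masks are NumPy arrays of identical shape, and the function names are hypothetical.

```python
import numpy as np


def lesion_fraction(lesion_mask, lung_mask):
    """Fraction of lung voxels marked as lesion (0.0 to 1.0)."""
    lesion = np.asarray(lesion_mask, dtype=bool)
    lung = np.asarray(lung_mask, dtype=bool)  # assumed input, not from the dataset
    if lesion.shape != lung.shape:
        raise ValueError("lesion and lung masks must have the same shape")
    lung_voxels = int(lung.sum())
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    # Count only lesion voxels that fall inside the lung mask.
    return float((lesion & lung).sum()) / lung_voxels


def lesion_volume_ml(lesion_mask, spacing_mm):
    """Lesion volume in millilitres, given voxel spacing in mm along each axis."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(np.asarray(lesion_mask, dtype=bool).sum()) * voxel_mm3 / 1000.0


def change_between_studies(baseline_fraction, followup_fraction):
    """Change in affected lung fraction; negative values mean less involvement."""
    return followup_fraction - baseline_fraction
```

Comparing fractions rather than raw voxel counts makes two studies of the same patient comparable even if they were acquired with different slice thickness or field of view.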
Sergey Morozov, Chief Regional Radiology and Instrumental Diagnostics Officer of the Moscow Department of Health and CEO of the Diagnostics and Telemedicine Center, said: “The additional advantage of this dataset is that all CT scans included there were performed in primary healthcare facilities for the adult population. Besides that, it has been posted in public domain, and thin CT slices of up to one mm have already been converted into NIFTI format recognised among machine learning professionals.”