The Diagnostics and Telemedicine Center previously reported that its scientists had collected a first database of CT studies of patients with laboratory-confirmed infection, comprising about 50 CT scans of 20 patients. The current database is 20 times larger: it contains more than 1,000 anonymized sets of chest CT scans. The studies were collected in Moscow from March 1 to April 25, 2020 through the Unified Radiological Information Service (URIS), to which the diagnostic equipment of 80 Moscow healthcare institutions is connected.
Today the database is unique and has no analogues worldwide. For example, the dataset collected at the University of California San Diego has 349 single CT images from 216 patients, while the dataset collected in Moscow contains three-dimensional CT studies. The RAIOSS & Livon Saúde case set contains 10 CT scans so far. The constantly updated database of the Italian Radiological Society already holds more than 70 scans. The Radiological Society of North America's collection of new coronavirus infection cases is scattered and suitable only for familiarization. The British Society of Thoracic Radiology also has a database, but it too contains no more than a hundred studies.
The number of cases is not the only fundamental difference between the Russian database and foreign ones. Every CT study in the Moscow dataset is labeled according to a classification that reflects how the pathological abnormalities of COVID-19 manifest in lung tissue on chest computed tomography. The classification divides the studies into five broad groups, from CT-0 (normal, no CT signs of viral pneumonia) to CT-4 (diffuse ground glass opacities with more than 75% pulmonary parenchymal involvement). It was published in the guidelines for radiological diagnosis of COVID-19.
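As a rough illustration, such a grading can be thought of as a mapping from the share of affected lung parenchyma to a category. The sketch below is not the published classification: the text gives only the two ends of the scale (CT-0 for no signs, CT-4 for more than 75% involvement), so the intermediate cut-offs of 25% and 50% and the function name `ct_grade` are assumptions made for the example. Real grading also relies on qualitative signs that a single percentage does not capture.

```python
def ct_grade(involvement_percent: float) -> str:
    """Map percent of affected lung parenchyma to a CT-0..CT-4 label.

    Only CT-0 (no signs) and CT-4 (> 75 %) come from the article.
    The 25 % and 50 % boundaries are ASSUMED for illustration and
    should be checked against the published guidelines.
    """
    if not 0.0 <= involvement_percent <= 100.0:
        raise ValueError("involvement_percent must be within 0..100")
    if involvement_percent == 0.0:
        return "CT-0"  # no CT signs of viral pneumonia
    if involvement_percent > 75.0:
        return "CT-4"  # stated in the article
    if involvement_percent > 50.0:
        return "CT-3"  # assumed boundary
    if involvement_percent > 25.0:
        return "CT-2"  # assumed boundary
    return "CT-1"


# Example: grade a few hypothetical involvement values.
grades = [ct_grade(p) for p in (0, 10, 30, 60, 80)]
```

A study-level label of this kind is what an automatic sorting system would be trained to predict.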
According to experts of the Diagnostics and Telemedicine Center, the database, with CT scans converted into the "research" NIfTI format, is intended for developing artificial intelligence algorithms. Study-level labeling of cases is suitable for preparing automatic patient sorting systems. Labeling of localizations (the areas of interest within which artificial intelligence algorithms should detect pathology) can be used to train services that help a radiologist by pointing out "suspicious" places on CT scans. Contour labeling of pathology can be used for automatic quantitative assessment of lung lesions, as well as for assessing dynamics between two CT studies of the same patient.
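The quantitative assessment mentioned above can be sketched as counting how many lung voxels fall inside the lesion contour, and the dynamics as the difference of that share between two studies. The example below is a minimal sketch under stated assumptions: masks are plain nested lists indexed as [slice][row][column] with 0/1 values, a separate lung mask is available, and the function names `involvement_percent` and `involvement_change` are hypothetical. In practice the masks would be read from NIfTI files with a dedicated library rather than typed in by hand.

```python
from itertools import chain


def _flatten(volume):
    """Iterate over all voxels of a volume stored as [slice][row][col]."""
    return chain.from_iterable(chain.from_iterable(volume))


def involvement_percent(lung_mask, lesion_mask):
    """Percent of lung voxels that are also marked as lesion."""
    lung_voxels = 0
    affected = 0
    for lung, lesion in zip(_flatten(lung_mask), _flatten(lesion_mask)):
        if lung:
            lung_voxels += 1
            if lesion:
                affected += 1
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    return 100.0 * affected / lung_voxels


def involvement_change(lung_a, lesion_a, lung_b, lesion_b):
    """Change in involvement (percentage points) from study A to study B."""
    return involvement_percent(lung_b, lesion_b) - involvement_percent(lung_a, lesion_a)


# Toy data: one slice of 2 x 4 voxels, all of them inside the lung.
lung = [[[1, 1, 1, 1], [1, 1, 1, 1]]]
lesion_first = [[[1, 1, 0, 0], [0, 0, 0, 0]]]     # 2 of 8 voxels affected
lesion_followup = [[[1, 1, 1, 1], [0, 0, 0, 0]]]  # 4 of 8 voxels affected

first = involvement_percent(lung, lesion_first)
followup = involvement_percent(lung, lesion_followup)
delta = involvement_change(lung, lesion_first, lung, lesion_followup)
```

Restricting the count to lung voxels keeps the result comparable between patients with different scan coverage. Comparing two studies voxel by voxel would additionally require registering them to each other, which this sketch does not attempt.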
In addition, the Center's experts annotated 50 studies (5% of the total) in detail: on each CT slice with lung tissue abnormalities, the pixel regions of ground glass opacities and consolidations specific to COVID-19 are marked. This is the most informative type of CT image annotation for artificial intelligence.
"The additional advantage of this dataset is that all CT scans included there were performed in primary healthcare facilities for the adult population. Besides that, it has been made publicly available, and thin CT slices of up to 1 mm have already been converted into the NIfTI format recognized among machine learning professionals," said Sergey Morozov, Chief Regional Radiology and Instrumental Diagnostics Officer of the Moscow Department of Health and CEO of the Diagnostics and Telemedicine Center, Moscow.
The creation of the Russian dataset of CT scans of COVID-19 patients is part of the large Moscow Experiment on the use of computer vision technologies in radiology, which launched in February and will run until the end of this year. Detailed information can be found on the project's website https:/