
Frontiers in Neuroscience

Aging of the global population results in an increasing number of people with dementia.

MCI causes cognitive impairments with a minimal impact on instrumental activities of daily life (Petersen et al., 1999, 2018).

In this work, we partitioned MCI into progressive MCI (pMCI) and stable MCI (sMCI), which are retrospective diagnostic terms based on the clinical follow-up according to the DSM-5 criteria (American Psychiatric Association, 2013).

Distinguishing between pMCI and sMCI plays an important role in the early diagnosis of dementia, which can assist clinicians in proposing effective therapeutic interventions for the disease process (Samper-González et al., 2018).

These changes could be quantified with the help of medical imaging techniques such as magnetic resonance (MR) and positron-emission tomography (PET) (Correa et al., 2009).

For instance, T1-weighted magnetic resonance imaging (T1-MRI) provides high-resolution information about brain structure, making it possible to accurately measure structural metrics such as thickness, volume, and shape.

This work aims to distinguish AD or potential AD patients from cognitively unimpaired (CN) subjects accurately and automatically, using medical images of the hippocampal area and recent deep learning techniques, as this facilitates fast preclinical diagnosis.

As for 18F-FDG-PET, Silveira and Marques (2010) proposed a boosting learning method that used a mixture of simple classifiers to perform voxel-wise feature selection.

A 2013 study took regional MRI volumes, PET intensities, cerebrospinal fluid (CSF) biomarkers, and genetic information as features and implemented random-forest-based classification.

The studies mentioned above mostly follow three basic steps in the diagnosis algorithms, namely segmentation, feature extraction and classification.

Finally, these features are fed to the classification step so that the classifiers can learn useful diagnostic information and make predictions for given test subjects.

Benefiting from the rapid development of computer science and the accumulation of clinical data, deep learning has recently become a popular and useful method in the field of medical imaging.

The general applications of deep learning in medical imaging are mainly feature extraction, image classification, object detection, segmentation and registration (Litjens et al., 2017).

A 2018 study proposed a cascaded CNN based on T1-MRI and FDG-PET, which used a 3D CNN to extract features and a 2D CNN to combine multi-modality features for task-specific classification.

Previous studies showed promising potential for AD diagnosis, and we therefore propose a deep learning framework in our work to perform the feature extraction and classification steps simultaneously.

The main contributions of our work are listed below: (1) We show that segmentation of key substructures, such as the hippocampi, is not a prerequisite for CNN-based classification.

(4) We introduce a new framework to combine complementary information from multiple modalities in our proposed network for the classification tasks of CN vs. AD and sMCI vs. pMCI.

Among these biomarkers, shrinkage of the hippocampi is the best-established MRI biomarker for staging the progression of AD (Jack et al., 2011a), and so far the only MRI biomarker qualified for the enrichment of clinical trials (Hill et al., 2014).

The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD).

The Segmented dataset, containing MR images and corresponding segmentation results, was chosen to verify the effect of segmentation, and the Paired dataset, containing MR and PET images, was chosen to verify the effect of multi-modality images.

For the same subject, we paired the MRI with the PET scan that (a) had the closest acquisition date, (b) was acquired within 1 year of the MRI scan, and (c) carried the same diagnosis as the MRI at the time of the scan.

After that, we registered the template image to the other MR images in the Paired dataset using an affine transformation and used the corresponding affine matrix to determine the ROI center for each MR image.
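To make this step concrete, the following is a minimal sketch of such template-to-subject ROI localization, assuming the SimpleITK library; the metric, optimizer settings, and helper name are illustrative choices rather than the authors' implementation.

```python
# Hypothetical sketch: map a hippocampal ROI center defined on a template
# image into a subject's MR image via affine registration (SimpleITK assumed).
import SimpleITK as sitk

def roi_center_in_subject(template_path, subject_path, template_center_mm):
    fixed = sitk.ReadImage(template_path, sitk.sitkFloat32)   # template
    moving = sitk.ReadImage(subject_path, sitk.sitkFloat32)   # subject MR

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=32)
    reg.SetOptimizerAsRegularStepGradientDescent(learningRate=1.0,
                                                 minStep=1e-4,
                                                 numberOfIterations=200)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetInitialTransform(
        sitk.CenteredTransformInitializer(
            fixed, moving, sitk.AffineTransform(3),
            sitk.CenteredTransformInitializerFilter.GEOMETRY))

    affine = reg.Execute(fixed, moving)
    # The optimized transform maps template (fixed) physical points into the
    # subject (moving) image space, so the template ROI center lands on the
    # corresponding anatomical location in the subject image.
    subject_center_mm = affine.TransformPoint(template_center_mm)
    # Convert the physical point (x, y, z in mm) to a voxel index for cropping.
    return moving.TransformPhysicalPointToIndex(subject_center_mm)
```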

The MaskedImage group is made up of MR images masked by binary labels; it takes both the original images and the segmentation results for the hippocampi as inputs.
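A minimal sketch of how such a masked input can be produced, assuming NIfTI volumes and the nibabel library (file names are placeholders):

```python
# Minimal sketch of the MaskedImage idea: zero out every voxel outside the
# hippocampus segmentation so the network sees only the segmented structure.
import nibabel as nib
import numpy as np

mri = nib.load("subject_t1.nii.gz")                  # placeholder file name
seg = nib.load("subject_hippocampus_labels.nii.gz")  # placeholder file name

image = mri.get_fdata()
mask = seg.get_fdata() > 0        # binary hippocampus mask
masked_image = image * mask       # intensities kept only inside the mask

nib.save(nib.Nifti1Image(masked_image.astype(np.float32), mri.affine),
         "subject_t1_masked.nii.gz")
```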

By comparing the classification performance using these three datasets, it can be judged whether the segmentation results have an important effect on AD diagnosis.

The group generated using the first method is called the Small Receptive Field (SmallRF) group, which has the same receptive field as the ROI of the MR images, with 1 mm isotropic spacing.

The group generated using the second method is called the Big Receptive Field (BigRF) group, which has the same orientation and ROI center but a 2 mm isotropic spacing in each dimension, and thus a larger receptive field at a lower spatial resolution.
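The difference between the two groups can be illustrated with a hypothetical ROI-resampling helper, sketched below with SimpleITK; the grid size, the axis-aligned origin computation, and the helper name are simplifying assumptions rather than the authors' code.

```python
# Illustrative sketch (assumed helper): extract an ROI of fixed voxel size
# around a given physical center at either 1 mm (SmallRF) or 2 mm (BigRF)
# isotropic spacing, keeping the original orientation.
import SimpleITK as sitk

def extract_roi(image, center_mm, size_voxels=(96, 96, 48), spacing_mm=1.0):
    spacing = (spacing_mm,) * 3
    # For simplicity this origin math assumes a near axis-aligned orientation;
    # a full implementation would rotate the offset by the direction matrix.
    origin = [c - 0.5 * (n - 1) * spacing_mm
              for c, n in zip(center_mm, size_voxels)]
    resampler = sitk.ResampleImageFilter()
    resampler.SetOutputSpacing(spacing)
    resampler.SetSize(size_voxels)
    resampler.SetOutputOrigin(origin)
    resampler.SetOutputDirection(image.GetDirection())
    resampler.SetInterpolator(sitk.sitkLinear)
    return resampler.Execute(image)

# SmallRF: same physical extent as the 1 mm MR ROI; BigRF: the same grid size
# at 2 mm spacing, hence twice the physical extent but coarser resolution.
# roi_small = extract_roi(pet_img, center, spacing_mm=1.0)
# roi_big   = extract_roi(pet_img, center, spacing_mm=2.0)
```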

After the data processing, the datasets were randomly split into training, validation, and testing sets according to patient IDs, to ensure that all scans from the same patient appear in only one set.

Finally, 70% of a dataset was used as the training set, 10% as the validation set, and 20% as the testing set by random sampling.
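A patient-level split of this kind can be expressed with scikit-learn's GroupShuffleSplit, as in the following sketch (the column names and the toy index are illustrative):

```python
# Sketch of a 70/10/20 subject-level split: every patient's scans fall into
# exactly one of the training, validation, or testing sets.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def split_by_patient(df, group_col="patient_id", seed=0):
    # First carve off 20% of the patients for testing ...
    outer = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=seed)
    trainval_idx, test_idx = next(outer.split(df, groups=df[group_col]))
    trainval = df.iloc[trainval_idx]
    # ... then 12.5% of the remaining 80% (i.e., 10% overall) for validation.
    inner = GroupShuffleSplit(n_splits=1, test_size=0.125, random_state=seed)
    train_idx, val_idx = next(inner.split(trainval, groups=trainval[group_col]))
    return trainval.iloc[train_idx], trainval.iloc[val_idx], df.iloc[test_idx]

# Toy index: one row per scan, several scans per patient (illustrative only).
scans = pd.DataFrame({"patient_id": [1, 1, 2, 3, 3, 4, 5, 6, 7, 8],
                      "image_path": [f"scan_{i}.nii.gz" for i in range(10)]})
train_df, val_df, test_df = split_by_patient(scans)
```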

A convolutional neural network (CNN; LeCun et al., 1995) is a deep feedforward neural network composed of multiple layers of artificial neurons, with excellent performance in large-scale image processing.

CNNs are trained with the backpropagation algorithm; a CNN usually consists of multiple convolutional layers, pooling layers, and fully connected layers, and connects to the output units through fully connected layers or other kinds of layers.

Compared to other deep feedforward networks, CNNs have fewer connections and fewer parameters because the convolution kernel is shared among pixels, and they are therefore easier to train and more popular.

With CNNs prospering in the field of computer vision, a number of attempts have been made to improve the original network structure to achieve better accuracy.

VGG (Simonyan and Zisserman, 2014) is a neural network based on AlexNet (Krizhevsky et al., 2012), and it achieved a 7.3% top-5 error rate in the 2014 ILSVRC competition (Russakovsky et al., 2015).

Different from traditional CNNs, the VGG work evaluated very deep convolutional networks for large-scale image classification, yielding significantly more accurate CNN architectures that achieve excellent performance even when used as part of relatively simple pipelines.

During the training, batch normalization (Ioffe and Szegedy, 2015) was deployed in the convolutional layers and dropout (Hinton et al., 2015) was deployed in fully connected layers to avoid overfitting.

The accuracies and receiver operating characteristic (ROC) curves of the classification on the validation data were then calculated, and the model with the best accuracy was chosen to be the final classifier.

In order to determine the proper data type for network training, we designed two experiments and evaluated the classification performances of models when they were fed with different data types.

The input of the network is a region of 96 × 96 × 48 voxels, and the network contains eight convolutional layers, five max-pooling layers, and three fully connected layers.
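The following PyTorch sketch shows one way a network of this shape could be laid out, with batch normalization after each convolution and dropout in the fully connected part as described above; the channel widths, kernel sizes, and dropout rate are our assumptions, not the authors' exact configuration.

```python
# Illustrative VGG-style 3D CNN: 8 convolutional layers, 5 max-pooling layers,
# and 3 fully connected layers for a 96 x 96 x 48 single-channel input.
import torch
import torch.nn as nn

def conv_bn(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),      # batch normalization in the conv layers
        nn.ReLU(inplace=True),
    )

class HippocampalCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn(1, 16), nn.MaxPool3d(2),                     # -> 48x48x24
            conv_bn(16, 32), nn.MaxPool3d(2),                    # -> 24x24x12
            conv_bn(32, 64), conv_bn(64, 64), nn.MaxPool3d(2),   # -> 12x12x6
            conv_bn(64, 128), conv_bn(128, 128), nn.MaxPool3d(2),  # -> 6x6x3
            conv_bn(128, 256), conv_bn(256, 256), nn.MaxPool3d(2),  # -> 3x3x1
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 3 * 3 * 1, 512), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(512, 128), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(128, num_classes),     # dropout only in the FC part
        )

    def forward(self, x):                    # x: (batch, 1, 96, 96, 48)
        return self.classifier(self.features(x))

# logits = HippocampalCNN()(torch.randn(2, 1, 96, 96, 48))  # -> shape (2, 2)
```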

Additionally, CNNs can extract useful features directly from raw images, as they show a strong ability to locate key points in object detection tasks for natural images (Ren et al., 2015).

To verify the effect of segmentation, we segmented the T1-MR images of the AD and cognitively unimpaired subjects with the MALP-EM algorithm (Ledig et al., 2015) and obtained the Segmented dataset of 2861 subjects, containing both the MR images and the corresponding segmentations.

The results indicate that the segmentation results do contain information needed for the classification; however, segmentation is not necessary for the classification task, since the CNN is able to learn useful features without voxel-wise labels.

Due to the limitation of GPU RAM and computational capacity, it was difficult to take the entire image as the network input, so our proposed network only considered a region of 96 × 96 × 48 voxels.

However, the ROIs of PET images varied, as studies also attached great significance to metabolic changes in cortices, e.g., temporal lobes (Mosconi et al., 2006, 2008).

Furthermore, for multi-modality classification tasks, the SmallRF group might be better, because the PET images in the SmallRF group were aligned voxel-wise with the paired MR images, which could help the network better locate spatial features.

The information a classifier can obtain from a single modality is limited, as one medical imaging method can only profile one or a few aspects of the pathological changes of AD, which is far from complete.

Additionally, among the three multi-modality models, the model trained with strategy B1 had the highest accuracy and sensitivity, while the model trained with strategy B2 had the highest specificity and AUC.

They used patch-based information from the whole brain to train and test their models, and they integrated the information from the two modalities by concatenating the feature maps (Liu et al., 2015).

A 2015 study used a manifold-regularized multitask feature learning method to preserve both the relations among the data modalities and the distribution within each modality.

Thus, we constructed a deeper neural network and fed it with medical images of higher resolution, as we supposed that the hippocampal area itself can serve as a favorable reference in AD diagnosis.

Among these three types of models, the model trained using strategy B1, in which the MR and PET images were input separately to the convolutional layers but with their convolutional kernels shared, performed the best in the CN vs. AD task.
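A minimal sketch of this shared-kernel fusion idea is given below in PyTorch: a single convolutional encoder is applied to both modalities, and the resulting feature vectors are concatenated before the fully connected classifier. The encoder is passed in as an argument, and the layer sizes are illustrative rather than the authors' architecture.

```python
# Sketch of strategy B1: MR and PET pass through the *same* convolutional
# encoder (weights shared), then their features are concatenated and fed to
# a fully connected classifier.
import torch
import torch.nn as nn

class SharedKernelFusionNet(nn.Module):
    def __init__(self, encoder, feat_dim, num_classes=2):
        super().__init__()
        self.encoder = encoder                 # one encoder used for both modalities
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, mr, pet):
        f_mr = torch.flatten(self.encoder(mr), start_dim=1)    # shared weights
        f_pet = torch.flatten(self.encoder(pet), start_dim=1)  # same kernels applied to PET
        return self.classifier(torch.cat([f_mr, f_pet], dim=1))
```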

New imaging modalities of interest include T2-MRI (Rombouts et al., 2005), 11C-PIB-PET (Zhang et al., 2014), and other PET agents such as those for amyloid protein imaging (Glenner and Wong, 1984).

Also, the features extracted by a CNN are hard for humans to comprehend; however, methods such as attention mechanisms (Jetley et al., 2018) can visualize and analyze the activation maps of the model, and future work along these lines could improve the classification performance and help discover new medical imaging biomarkers.

Since its launch more than a decade ago, the landmark public-private partnership has made major contributions to AD research, enabling the sharing of data between researchers around the world.

Interpretable classification of Alzheimer’s disease pathologies with a convolutional neural network pipeline

CNNs operate most effectively when trained on datasets exceeding tens of thousands of example images [38].

Using open-source image analysis tools (see Methods), we developed an automated preprocessing procedure (Fig. 2a and Supplementary Figs. 1, 2) to normalize slide color and generate bounding boxes around all immunohistochemically (IHC)-stained objects within WSIs.

As the native resolution of a WSI is too large (typically 50,000 by 50,000 pixels at ×20 magnification) to use as the direct input for CNNs, we designed the dataset to contain uniform 256 × 256 pixel tiles centered on individual plaque candidates.
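The tiling step can be sketched as follows, assuming the WSI has already been color-normalized and the plaque-candidate centers are available from the bounding-box detection; the padding behavior and variable names are our assumptions.

```python
# Hypothetical sketch: crop a 256 x 256 pixel tile centered on each detected
# plaque candidate from a color-normalized WSI held as a numpy array.
import numpy as np

def extract_tiles(wsi: np.ndarray, candidate_centers, tile=256):
    half = tile // 2
    # Pad so that candidates near the slide border still yield full tiles.
    padded = np.pad(wsi, ((half, half), (half, half), (0, 0)), mode="constant")
    tiles = []
    for row, col in candidate_centers:
        r, c = row + half, col + half          # shift indices for the padding
        tiles.append(padded[r - half:r + half, c - half:c + half, :])
    return np.stack(tiles)                     # (n_candidates, 256, 256, 3)
```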

We trained an intermediate CNN to classify objects based on the Phase I dataset, then used its predictions to prioritize an additional set of 101,671 unprocessed tiles in favor of cored plaques and CAAs for manual annotation (see Supplementary Fig. 4).

At ×20 magnification, a single 256 × 256 pixel tile (128 microns) could contain more than one object, so we trained multi-task CNNs for multi-label classification: the CNNs were asked to determine the presence or absence of all morphologies in each tile.
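A minimal sketch of this multi-label setup, assuming the three morphologies discussed here (cored plaque, diffuse plaque, CAA) and an assumed 512-dimensional CNN feature vector: each output unit gets an independent sigmoid and binary cross-entropy term, so several morphologies can be predicted for the same tile.

```python
# Multi-label head: one sigmoid output per morphology, trained with binary
# cross-entropy so labels are not mutually exclusive.
import torch
import torch.nn as nn

LABELS = ["cored_plaque", "diffuse_plaque", "caa"]   # assumed label order

head = nn.Linear(512, len(LABELS))                   # 512-d CNN features assumed
criterion = nn.BCEWithLogitsLoss()                   # independent per-label losses

features = torch.randn(8, 512)                       # stand-in CNN feature batch
targets = torch.randint(0, 2, (8, len(LABELS))).float()
loss = criterion(head(features), targets)
probs = torch.sigmoid(head(features))                # per-morphology probabilities
```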

We combined the Phase I and Phase II datasets, then randomly split the resulting 70,000 tiles (66,030 annotated and 3970 IHC-negative) into training (from 29 WSIs) and validation (from 4 WSIs) sets, while stratifying by case (i.e., WSI source) to ensure that models generalize to new cases.

The resulting CNN model trained on 61,370 example tiles achieved a validation-set performance of 0.983 area under the receiver operating characteristic curve (AUROC) (Supplementary Fig. 7a) and an area under the precision-recall curve (AUPRC) of 0.845 (Supplementary Fig. 7b).
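For reference, these two validation metrics can be computed per morphology with scikit-learn as in the short sketch below (the labels and scores are stand-ins); average precision is used here as the usual summary of the precision-recall curve.

```python
# Per-label AUROC and AUPRC from ground-truth labels and sigmoid scores.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0, 1, 1, 0, 1, 0])                 # one morphology's labels
y_score = np.array([0.1, 0.9, 0.7, 0.3, 0.8, 0.2])    # CNN sigmoid outputs

auroc = roc_auc_score(y_true, y_score)                # area under the ROC curve
auprc = average_precision_score(y_true, y_score)      # precision-recall summary
```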

In the first study, we randomly selected subsets of the 61,370-example training dataset, maintaining stratification by case (i.e., WSI source), and plotted model performance as a function of the number of training examples (Fig. 4c).

Each random selection was repeated five times, and a fresh model trained each time, for a total of 90 independently trained and evaluated CNN models with identical architectures.

In this cohort, diffuse plaques are densely distributed across the gray matter, whereas cored plaques are predominantly located in deeper and lower cortical layers, in accordance with known neuroanatomic distributions [1].

Interestingly, this agreement map highlights the limitations of bounding-box annotations: the correct cored-plaque prediction shown is nonetheless penalized in this view (red halo) for accurately tracing the rounded boundaries of the actual plaque instead of anticipating its square ground-truth bounding box.

Stepping further out to regions of 3840 microns (Fig. 6b), these maps (see Supplementary Fig. 9 for additional examples) visualize the results plotted in Fig. 4, with the complementary addition of tissue landmarks, prediction clustering, and neuroanatomic localization.

Model performance does not change based on the nearby neuroanatomic architecture in these examples, although occasional clusters of co-localized false-positive (orange) cored plaque predictions can appear (e.g., Supplementary Fig. 9).

For instance, in Fig. 7a, the activation map for cored-plaque prediction highlights a dense and compact Aβ center (yellow arrow), whereas for the diffuse task the activation map highlights the off-center diffuse object (red arrow).

Lastly, Fig. 7d, which contains both a diffuse (red arrow) and a cored plaque (yellow arrow), shows cored-task activation maps localizing to the amyloid core, with broader feature activations for the diffuse and CAA tasks.

Crucially, Guided Grad-CAM activation mapping may highlight certain image features as salient because they help determine that an object is not present in the image: Despite strong localized activation for cored and diffuse maps in Fig. 7c at the punctate deposit (red arrow), the CNN predicts that neither plaque is present.
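Guided Grad-CAM maps of this kind can be generated, for example, with the Captum library, as in the following sketch; the backbone model, target layer, and target index are placeholders and not the paper's own implementation.

```python
# Sketch: Guided Grad-CAM saliency for one output (e.g., the "cored" task)
# using Captum, with a stand-in torchvision backbone.
import torch
from torchvision.models import resnet18
from captum.attr import GuidedGradCam

model = resnet18(num_classes=3).eval()          # stand-in multi-label CNN
explainer = GuidedGradCam(model, model.layer4)  # attribute w.r.t. the last conv block

tile = torch.randn(1, 3, 256, 256, requires_grad=True)
cored_map = explainer.attribute(tile, target=0)  # saliency for output index 0
```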

When the patch occludes an important feature such as the amyloid core of a cored plaque (Fig. 7a, yellow arrow), the model fails to predict the object correctly: cored-task confidence drops to zero (blue dot on red background, yellow arrow).

Occluding less cored-task-relevant regions, such as within the off-center diffuse stain (red arrow), has little effect, as indicated by the solid red coloring in the cored-task's confidence map for this area.

For example, in occlusion maps, occluding the leftmost plaque decreases diffuse-task confidence (Fig. 7d, light blue region in the diffuse-task map, red arrow), whereas the same prediction gains confidence (red region, yellow arrow) when the cored plaque’s core is occluded.
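The occlusion analysis follows the standard sliding-patch procedure, sketched below for a multi-label CNN with sigmoid outputs; the patch size, stride, and fill value are assumptions.

```python
# Occlusion-sensitivity sketch: slide a gray patch over the tile, re-run the
# CNN, and record the confidence for one task at each patch position.
import torch

def occlusion_map(model, tile, task_idx, patch=32, stride=16, fill=0.5):
    """Confidence of `task_idx` when each region of `tile` is occluded."""
    model.eval()
    rows = list(range(0, tile.shape[2] - patch + 1, stride))
    cols = list(range(0, tile.shape[3] - patch + 1, stride))
    heat = torch.zeros(len(rows), len(cols))
    with torch.no_grad():
        for i, r in enumerate(rows):
            for j, c in enumerate(cols):
                occluded = tile.clone()
                occluded[:, :, r:r + patch, c:c + patch] = fill   # gray patch
                heat[i, j] = torch.sigmoid(model(occluded))[0, task_idx]
    return heat   # low values mark regions the model relies on for this task
```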

Combined with the 10 separate hold-out WSIs from Phase III, we found this 30-WSI blinded hold-out set demonstrated strong correlation between the automated and manual scoring approaches, such that CNN-based scores significantly discriminated existing semi-quantitative categories (Fig. 8b).