Parkinson’s Classification and Feature Extraction from Diffusion Tensor Images

Parkinson’s disease (PD) affects over 6.2 million people around the world. Despite its prevalence, there is still no cure, and diagnostic methods are extremely subjective, relying on observation of physical motor symptoms and response to treatment protocols. Other neurodegenerative diseases can manifest similar motor symptoms and often too much neuronal damage has occurred before motor symptoms can be observed. The goal of our study is to examine diffusion tensor images (DTI) from Parkinson’s and control patients through linear dynamical systems and tensor decomposition methods to generate features for training classification models. Diffusion tensor imaging emphasizes the spread and density of white matter in the brain. We will reduce the dimensionality of these images to allow us to focus on the key features that differentiate PD and control patients. We show through our experiments that these approaches can result in good classification accuracy (90%), and indicate this avenue of research has a promising future.


Parkinson's Disease
Parkinson's disease (PD) is one of the most common neurodegenerative disorders. The disease mainly affects the motor system, and its symptoms can include shaking, slowness of movement, and reduced fine motor skills. As of 2015, an estimated 6.2 million people globally were afflicted with the disease [vos2016]. Its cause is largely unknown; some treatments are available, but no cure has yet been found. Early diagnosis of PD is a topic of keen interest to diagnosticians and researchers alike. Currently, Parkinson's is diagnosed based on the presence of observable motor symptoms and on the change in symptoms in response to medications that target dopaminergic receptors, such as Levodopa [svein-bjornsdottir2016]. The problem with this approach is that it relies on treating symptoms instead of preventing them. Once motor symptoms present, at least 60% of neurons have been affected and there is little likelihood of healing them fully. Additionally, early diagnosis would help reduce the likelihood of misdiagnosis as another motor neuron disease.

Parkinson's Progression Markers Initiative Datasets
The Parkinson's Progression Markers Initiative (PPMI) [marek2011] is a clinical study designed to identify PD biomarkers and contribute towards new and better treatments for the disease. The cohort consists of approximately 400 de novo, untreated PD subjects and 200 healthy subjects followed longitudinally for clinical, imaging, and biospecimen biomarker assessment. The PPMI data set is the collection of biomarker data gathered from this longitudinal study of Parkinson's and control subjects. The study has thus far collected DaT (dopamine transporter) scan, MRI (magnetic resonance imaging), fMRI (functional magnetic resonance imaging), and CT (computerized tomography) scan data from several hundred subjects at 6-month intervals. Data collection began in 2010, funded by the Michael J. Fox Foundation. The dataset chosen for this paper was PPMI's Diffusion Tensor Imaging (DTI) records. DTI has been shown to be a promising avenue for exploring biomarkers in Parkinsonian symptoms and can provide unique insights into brain network connectivity. Moreover, the DTI data was one of PPMI's cleanest and largest datasets and was thus expected to be one of the most useful for further analysis. A DTI record is a four-dimensional dataset comprising a time series of three-dimensional images of the brain. PPMI's DTIs generally consisted of 65 time slices, each taken approximately five seconds apart. This method tracks the movement of water in the brain over the discrete time steps, creating a representation of the brain that emphasizes the white matter structures [soares2013].

Parkinson's Disease
A variety of tools currently exist for diagnosis of Parkinson's through pre-motor symptoms. For example, Parkinson's seems to measurably affect olfactory sensitivity prior to presenting motor symptoms, more so than other motor neuron diseases, as illustrated by the University of Pennsylvania Smell Identification Test (UPSIT) [chaudhuri2016]. While more work is needed to refine tests like these, it is one example that demonstrates the feasibility of earlier diagnosis of Parkinson's disease. The PPMI holds that discovery of one or more biomarkers for PD is a critical step for developing treatments for the disease. In [chahine2016], a search of existing PD articles relating to objective biomarkers found several potential candidates, including biofluids, peripheral tissue, imaging, genetics, and technology-based objective motor testing. Dinov et al. [dinov2016] explored both model-based and model-free approaches for PD classification and prediction, jointly processing imaging, genetic, clinical, and demographic data. They were able to develop a full data-processing pipeline enabling modeling of all the data available from PPMI, and found that model-free approaches such as support vector machines (SVM) and K-nearest-neighbor (KNN) outperformed model-based techniques like logistic regression in terms of predictive accuracy. Several of these classifiers generated specificity exceeding 96% when all data available from the dataset was aggregated and used. One interesting finding was a notable increase in accuracy when using group-size rebalancing techniques to counteract the effect of cohort sample-size disparities (there are many more patients than control subjects), increasing accuracy in one SVM classifier from 75.9% to 96.3%. Researchers in [baytas2017] recognized the inherent difficulty of using time-series analysis techniques on longitudinal data collected at irregularly spaced intervals and proposed a new Long Short-Term Memory (LSTM) technique: Time-Aware LSTM (T-LSTM). In [simuni2016] it was found that the subgroup PD classification of tremor dominant (TD) versus postural instability gait disorder dominant (PIGD) has substantial variability, especially in the early stages of diagnosis. For this reason, no attempt was made in this paper to include subtype assignment; we only learn a binary Yes/No PD classification.
State-of-the-art Parkinson's classification results were reported by [adeli2017] in early 2017 through use of a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques, this allowed them to select the features that best benefit the classification scheme in the kernel space as opposed to the original input feature space. They analyzed MRI and SPECT data of 538 subjects from the PPMI database and obtained a diagnosis accuracy of 70.5% with MRI-generated features and 95.6% with SPECT-generated features. The authors speculated that their non-linear feature selection was the reason they outperformed other methods on this non-linear classification problem. Other researchers [banerjee2016] were able to achieve 98.53% accuracy using ensemble learning methods trained on T1-weighted MRI data. However, Banerjee used several feature extraction methods based on domain knowledge to preprocess the data, including image registration, segmentation, and volumetric analysis.
The present research strikes a balance between feature selection and domain knowledge. While our autoregressive model does rely on a basic understanding of the relevance of time in diffusion tensor imaging, we do not use any other domain-specific knowledge to inform our feature extraction. Our hope is to build a generalizable approach that can be applied to similarly structured data both within and outside the domain of biomedical image analysis. Additionally, we want to improve the models trained on MRI data without domain-specific knowledge. This is because MRI is a far less invasive brain imaging method than SPECT, which relies on an injected radioactive tracer and must therefore be used at a limited frequency. The multiple MRI modalities also offer versatility in examining biological structures.

Tensor and Matrix Decomposition
Matrix decomposition has been used in a variety of computer vision applications in recent years, including the analysis of facial features. It offers another means of quantifying the features that describe the relationships between values in a 2D space and can be generalized to a variety of applications. The key point is that decomposition offers a powerful means of simultaneously evaluating the relationships of values in a space of two or more dimensions. In higher-dimensional spaces, tensor decomposition is used, where tensors are a generalization of matrices [rabanser2017]. Matrix decomposition can be described as a means of separating a matrix into several component matrices whose product yields the original matrix. For example, when solving a system of linear equations, you might formulate the problem as
$$Ax = b$$
where $A$ is a matrix and $x$ and $b$ are vectors. Applying a matrix decomposition to $A$ lets us solve the system more efficiently: if $A$ factors into two matrices, we can multiply $b$ by the inverse of one factor and then solve the remaining, simpler system in $x$ with the other factor, which requires significantly fewer operations [rabanser2017]. The same idea carries over to machine learning applications, where increased model complexity often results in exponential increases in the number of computations. This also limits the application of new algorithms and pipelines: those that are too complex, and consequently require too many operations, become too computationally intensive to be practical in some cases. We can also choose specific types of decomposition that preserve unique information about the original matrix while reducing its size. In the case of singular value decomposition, we seek the factorization
$$A = U S V^T$$
where $A$ is the original matrix of size $m \times n$, $U$ is a matrix of size $m \times n$ with orthonormal columns, $S$ is a diagonal matrix of size $n \times n$, and $V^T$ is an orthogonal matrix of size $n \times n$. This generalization of the eigendecomposition is useful in compressing matrices without losing information. It will come into play in our final experiment, which uses linear dynamical systems to extract features from the DTIs. Extending the premise of singular value decomposition (SVD) to higher-order matrices, or tensors, we come to Tucker decomposition.
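The factorization above can be checked numerically. The following is a minimal sketch using NumPy on a small random matrix (the matrix, its size, and the choice k = 2 are illustrative assumptions, not values from this study). It shows the factor shapes, the exact reconstruction from all singular values, and the rank-k approximation obtained by keeping only the largest ones.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))  # illustrative matrix, not study data

# Thin SVD: U is m x n with orthonormal columns, s holds the n singular
# values (the diagonal of S), and Vt is the n x n orthogonal matrix V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S = np.diag(s)

# With all singular values kept, the factors multiply back to A.
A_rebuilt = U @ S @ Vt
reconstruction_error = np.linalg.norm(A - A_rebuilt)

# Keeping only the k largest singular values gives a rank-k approximation.
k = 2
A_k = U[:, :k] @ S[:k, :k] @ Vt[:k, :]
truncation_error = np.linalg.norm(A - A_k)

print(U.shape, S.shape, Vt.shape)
print(reconstruction_error, truncation_error)
```

Note that the reconstruction is exact only when every singular value is retained; the truncated version trades a controlled amount of error for a smaller representation.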
Similarly to SVD, Tucker decomposition is used to compress tensors, and can be applied to any tensor of three or more dimensions. This is illustrated using a tensor of three dimensions in Figure 1. The core tensor resulting from the decomposition keeps the same number of dimensions as the original, but each dimension is scaled down to the size specified. We are thus able to use it as a means of scaling brain images down to a set of representative features without breaking them into specific regions of interest.
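To make the shape behavior of the core tensor concrete, here is a hedged sketch of a truncated higher-order SVD (HOSVD), one standard way of computing a Tucker decomposition, written in plain NumPy. The helper names (`unfold`, `mode_dot`, `truncated_hosvd`, `reconstruct`) and the toy tensor are assumptions made for illustration; the tensorly routine used in this study may refine the factors iteratively, so its output need not match this sketch.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: move `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)  # contract over the last axis
    return np.moveaxis(moved @ matrix.T, -1, mode)

def truncated_hosvd(tensor, ranks):
    """Return (core, factors) such that core.shape == tuple(ranks)."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode.
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :rank])
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)  # project onto the reduced basis
    return core, factors

def reconstruct(core, factors):
    """Expand the core back to the original shape using the factors."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_dot(out, U, mode)
    return out

# Toy three-way tensor standing in for an image volume (not study data).
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 9, 10))
core, factors = truncated_hosvd(image, ranks=(3, 4, 5))
features = core.ravel()  # flattened core used as a feature vector
print(core.shape, features.size)
```

The core keeps three dimensions, each shrunk to the requested size, and flattening it gives a fixed-length feature vector regardless of the content of the image.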

Methods
We conducted two main experiments. We examine both Tucker tensor decomposition and a linear dynamical systems approach to reduce the number of dimensions and scale down the diffusion tensor images. The goal is to evaluate the two approaches by the quality of the features they extract. To this end, the final feature vectors produced by each method are passed to a random forest classifier, and the accuracy of the trained model is measured on the task of predicting whether a subject belongs to the control or the Parkinson's (PD) group.
The objective is to represent the original DTI as an abstracted tensor that is the product of one of the dimensionality reduction techniques used in each experiment.

Algorithm Selection
To guide our selection of a classifier, we used the Python package TPOT [olson2016]. TPOT uses genetic algorithms to iteratively generate, select, and evaluate classification pipelines. We evaluated 10 generations of pipelines with a population size of 100 in each and found that random forest classification was the most successful at predicting Parkinson's from the generated features. Given the success of the random forest classifier, we considered that we might further improve our accuracy by reducing the number of features used from the generated set. Because we are focused on differences in a relatively small number of specific brain regions, we reasoned that only a small number of features would be relevant. To test this theory, we used three different methods to reduce the dimensionality of our feature set to 20 components: linear principal component analysis (PCA), linear discriminant analysis (LDA), and kernel PCA using a radial basis function (RBF).
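The comparison described above could be set up along the following lines. This is a sketch under stated assumptions: it uses scikit-learn (the source names only TPOT and the method families, not the library), synthetic random features in place of the PPMI-derived ones, and arbitrary hyperparameters. One caveat worth flagging: scikit-learn's `LinearDiscriminantAnalysis` can return at most `n_classes - 1` components, so for a binary PD/control problem it yields a single component rather than 20.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the extracted feature vectors (NOT PPMI data).
rng = np.random.default_rng(0)
n_subjects, n_features = 120, 200
X = rng.standard_normal((n_subjects, n_features))
y = rng.integers(0, 2, size=n_subjects)  # 0 = control, 1 = PD
X[y == 1, :5] += 1.0                     # small class-dependent shift

reducers = {
    "none": None,
    "pca": PCA(n_components=20),
    "kernel_pca_rbf": KernelPCA(n_components=20, kernel="rbf"),
    # Two classes allow at most n_classes - 1 = 1 LDA component.
    "lda": LinearDiscriminantAnalysis(n_components=1),
}

scores = {}
for name, reducer in reducers.items():
    forest = RandomForestClassifier(n_estimators=50, random_state=0)
    model = forest if reducer is None else make_pipeline(reducer, forest)
    # Fitting the reducer inside the pipeline keeps it within each CV fold.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Placing the reducer inside the pipeline means it is refit on each training fold, which avoids leaking held-out subjects into the learned projection.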

Experiment I
Using the tensorly package [kossaifi2019], a Tucker decomposition is applied to each brain image. This approach to tensor decomposition was selected because it produces a single core tensor that is representative of, but scaled down from, the original diffusion tensor image. Additionally, Tucker decomposition, unlike other forms of tensor decomposition, is significantly better at preserving features specific to the tensor being decomposed, which is why it also has applications in compression algorithms. Choosing it over other tensor decomposition methods allows us to scale down each image while focusing on the features and regions of interest that are specific to that image. In this experiment we decompose each brain image from a shape of (65, 100, 116, 116) to (10, 10, 10, 10), so that every image yields the same number of features.

Experiment II
This experiment breaks the feature extraction down further and evaluates another approach: linear dynamical systems. We scale down each coronal slice in the images and then evaluate the change over time. Scaling down the coronal slices allows us to more efficiently build a transition model representing the flow of water over the time steps of the image. This lets us build a three-dimensional representation of the brain from the images that shows the flow of water and the distribution of white matter in the brain. We then use the resulting transition matrix as the features passed to the classification pipeline. The nature of linear dynamical systems allows us to directly model the flow of water via the net change over time in the DTI.
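One common way to obtain such a transition matrix, sketched here as an assumption rather than as the exact procedure used in this study, is to project the frames onto a low-dimensional SVD basis and then fit the matrix that best maps each hidden state to the next in the least-squares sense. The function name `fit_lds`, the number of states, and the random frames are all hypothetical.

```python
import numpy as np

def fit_lds(frames, n_states):
    """Fit x[t+1] ~= A @ x[t] and y[t] ~= C @ x[t] to a frame sequence.

    frames: array of shape (T, ...) with one image or volume per time step.
    Returns the transition matrix A, observation matrix C, and states X.
    """
    T = frames.shape[0]
    Y = frames.reshape(T, -1).T  # (pixels, T): one column per time step
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                         # spatial basis
    X = np.diag(s[:n_states]) @ Vt[:n_states]   # hidden states, (n_states, T)
    # Least-squares solution of X[:, 1:] ~= A @ X[:, :-1].
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C, X

# Toy sequence: 65 time steps of a 12 x 12 slice (random, not study data).
rng = np.random.default_rng(0)
frames = rng.standard_normal((65, 12, 12))
A, C, X = fit_lds(frames, n_states=10)
features = A.ravel()  # flattened transition matrix as a feature vector
print(A.shape, features.size)
```

Because the transition matrix has a fixed size of n_states by n_states, every subject yields a feature vector of the same length, independent of the spatial resolution of the slices.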

Experiment I
While we were able to achieve an accuracy of 94% immediately, we were not able to improve on this by further reducing the produced features with various dimensionality reduction methods.
In fact, it appears that in some cases, such as linear discriminant analysis (LDA), additional dimensionality reduction adversely affects classifier performance. Exploring a slice of the output core tensor at one 'time' point suggests that the output of the tensor decomposition might be likened to a stack of slices that focus on the regions of interest in the original image. This is supported by examining several corresponding pairs of decomposed core slices and original slices.

Experiment II
We were able to achieve an accuracy of 82% with the random forest classifier alone. This outperforms previous benchmarks in training classifiers on synthetic features derived from MR images. By comparison, [cole2016] achieved only 70% accuracy at best on synthetic features generated from T1-weighted MRI scans. Furthermore, based on the F-measure scores across the experimental conditions, we can reasonably say that our model is not skewed as a consequence of the uneven distribution of the data. The PPMI data is heavily skewed toward Parkinson's patients, an imbalance that was also addressed by rebalancing the classes through oversampling of the controls. We reasoned that we could speed up model training and improve accuracy by reducing the number of synthetic features we retained. We initially tried linear PCA and LDA to perform the dimensionality reduction. However, these actually hurt performance, resulting in test accuracies of 81% and 74% respectively. Based on this, we considered that non-linear dimensionality reduction would be more effective. To this end we used kernel PCA with an RBF kernel, which improved accuracy to 89%.
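The class rebalancing mentioned above can be illustrated with simple random oversampling. This is a minimal sketch, assuming duplication of randomly chosen minority-class rows with replacement; the source does not state which oversampling scheme was used, and the function name `oversample_minority` and the random features are hypothetical. The class sizes (421 PD, 213 controls) are the subject counts reported for the data set.

```python
import numpy as np

def oversample_minority(X, y, rng):
    """Duplicate random minority-class rows until both classes are balanced."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=deficit, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

rng = np.random.default_rng(0)
y = np.array([1] * 421 + [0] * 213)        # 1 = PD, 0 = control
X = rng.standard_normal((len(y), 8))       # placeholder features
X_bal, y_bal = oversample_minority(X, y, rng)
print(np.bincount(y_bal))
```

In practice the oversampling would usually be applied to the training split only, so that duplicated control subjects cannot appear in both the training and the test data.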

Discussion
In summary, we can conclude that dimensionality reduction is a useful method for extracting meaningful features from brain imaging. Furthermore, the impressive performance of these features in machine learning applications indicates that at least some subset of these features strongly correlates with the patient group.
While not explored in this paper, it would be interesting to investigate why LDA seemed to cause a drop in classifier performance on the tensor decomposition features while traditional PCA did not. It would likewise be interesting to investigate why both PCA and LDA decreased classifier performance on the features produced from linear dynamical systems. In particular, it would be worth examining the collinearity between the class and the features that shapes the output of the LDA step, since LDA seems to be stuck producing one strong feature and ignoring the rest.
Additionally, it would be interesting to explore the effect of various preprocessing methods on outcomes, and to systematically obscure the data in order to evaluate which features of the raw pixel data are being highlighted by the tensor decomposition and linear dynamical systems steps.

Fig. 2: (left): Slice from original brain image at a specific time point; (right): Corresponding slice from tensor decomposition output

TABLE 1: Classification accuracy of features generated from Tucker decomposition after various additional dimensionality reduction techniques are applied.

TABLE 2: Classification accuracy of features generated from linear dynamical systems after various additional dimensionality reduction techniques are applied. The data is heavily skewed toward Parkinson's individuals, with a majority of our data set coming from Parkinson's patients (421 subjects) versus controls (213 subjects).