1 Introduction

Parkinson's disease is a chronic, progressive neurodegenerative disorder that impairs the performance of daily activities [1]. The disease is primarily characterized by motor symptoms such as tremor, slowness of movement (bradykinesia), muscle rigidity, and postural instability [2,3,4]. In addition, patients experience nonmotor symptoms that further increase disability levels [5]. Disease severity is closely associated with disability, with loss of independent function reported at specific score ranges on the Unified Parkinson's Disease Rating Scale (UPDRS) [6]. Unfortunately, the disease has no cure, but medications and surgical treatments can significantly alleviate its symptoms and enhance patients' quality of life [7]. According to the World Health Organisation, Parkinson's disease can lead to physical disability, which highlights the importance of early intervention and rehabilitation to mitigate disability and improve long-term outcomes.

Artificial Intelligence (AI) has become increasingly pervasive across diverse domains, revolutionizing industries through applications in finance, data analysis, smart agriculture, natural language processing, sentiment analysis [8, 9], medicine [10], healthcare [11] and more. Traditionally, disease diagnosis has relied on the experience of physicians, but thanks to tremendous advances in computational power and hardware, AI and Machine Learning (ML) algorithms now play a critical role in the medical field [10] and in decision support systems that provide valuable insights and recommendations to healthcare professionals. Furthermore, these technologies have revolutionized treatment strategies and advanced medical diagnostics for many diseases, including cardiovascular diagnosis [12, 13], breast cancer [14, 15], diabetes [16,17,18,19], brain tumours [20, 21], liver illnesses [22], COVID-19 [23, 24], skin disease classification [25], and Alzheimer's and Parkinson's disease [26], among others.

AI techniques and ML algorithms exploit the measurable differences between healthy people and people with Parkinson's disease to predict the illness, leveraging data analysis, pattern recognition and predictive modelling to identify early signs, monitor progression and develop treatment approaches. Pen-pressure variation and drawing speed are signs of hand tremor caused by the patients' impaired control of movement [27,28,29]. Accordingly, this paper proposes a hybrid classifier for early prediction of Parkinson's disease via hand tremors based on drawing spirals and waveforms.

The spiral-wave dataset used in this work is relatively small, while Convolutional Neural Networks (CNNs) typically require large amounts of data for effective training. Even when data augmentation is employed, a CNN model may still overfit the training dataset, achieving high accuracy on the training set but poor performance on unseen data, because the model can learn the noise or outliers in the training dataset. Several studies in the literature have proposed classifiers to predict Parkinson's disease from drawings (or sketches). These classifiers are often based on transfer learning (TL) and fine-tuning of pre-trained models such as VGG16, VGG19 and ResNet50. Additionally, some studies have proposed ML classifiers that predict Parkinson's disease using features extracted manually from drawing images or through separate feature-engineering pipelines. This work therefore aims to automate feature extraction using convolutional layers, reduce computational complexity and increase classification speed by reducing the number of trainable parameters (VGG16, for example, has about 138 million parameters), and enhance performance by combining deep learning techniques with ML algorithms.

In this work, we developed, trained and evaluated two Convolutional Neural Network (CNN) models to predict PD using a spiral and wave dataset. Using the transfer learning (TL) approach, we froze their convolutional layers and replaced the last layers (the fully connected artificial neural network) with K-Nearest Neighbours (KNN) classifiers, exploiting the advantages of both DL and ML to build strong hybrid CNN-KNN models for predicting PD, reduce computational complexity and increase classification speed. After training and evaluation, these CNN-KNN architectures predicted faster and more accurately than the first two models. Based on the last two models (Spiral-CNN-KNN and Wave-CNN-KNN), we propose an ensemble voting classifier that predicts Parkinson's disease from the average of the prediction probabilities of the spiral and wave drawings. The contributions of this work can be summarized as follows:

  • Provide insight into the possibility of diagnosing Parkinson's disease from hand sketching, based on tremors, in the early stages, which is crucial to enhancing the quality of life of individuals suffering from this disease.

  • Present a summary of the available databases for training disease-diagnosis models, supporting decision-makers in predicting Parkinson's disease in its early stages so that early intervention and rehabilitation can mitigate disability and improve long-term outcomes.

  • Provide the best adjustment of data augmentation parameters to increase the limited training dataset's size and diversity, improving the generalization and robustness of a model while avoiding the deformation of the data, which would otherwise degrade performance.

  • Build an ensemble classifier for predicting Parkinson's disease based on a hybrid architecture that combines DL and ML using TL to enhance the prediction.

  • Clarify the seriousness of misclassification in the medical field and propose solutions to address these critical problems, which can be fatal, primarily when the model classifies a person as healthy while he/she is actually carrying the disease.

This paper is organized as follows: Sect. 2 provides an overview of the state of the art concerning the available approaches for diagnosing Parkinson's disease and the related works. Sect. 3 details the materials and methodology employed to develop and evaluate the proposed classifier. The results of the development and evaluation processes and the findings are presented in Sect. 4. Finally, Sect. 5 concludes the paper with the research outcomes and essential findings discussed throughout the previous sections.

2 Related work

Diagnosing Parkinson's disease often involves a multi-faceted approach, which includes conducting a thorough physical examination, evaluating neurological symptoms, and utilizing imaging techniques like MRI or CT scans to identify possible abnormalities in the brain. Sometimes, healthcare professionals may also employ dopamine transporter imaging and genetic testing as supplementary diagnostic tools. Laboratory tests may be conducted to eliminate the possibility of other conditions presenting similar symptoms. Typically, the diagnosis of Parkinson's disease is established by a specialist, such as a neurologist, and this process may entail multiple appointments and tests to validate the diagnosis and establish an optimal treatment plan.

Recently, Artificial Intelligence techniques and Machine Learning algorithms have played a primary role in medical diagnostics, improving the diagnosis of many diseases and, in particular, of Parkinson's disease through signs and symptoms such as speech disorders [30,31,32], handwriting disorders [33], EEG signals [34, 35], hand tremor [36], nocturnal breathing signals [37], smell signatures [38], early diagnosis using brain MRI scans [39, 40], urine biomarker discovery via metabolomics and machine learning [41], and sketching of spirals and waves.

Patients with Parkinson's disease (PD) face challenges in executing motor-based tasks, such as writing and drawing, due to alterations in the neuronal mechanisms that control body (or limb) movement. This symptom has therefore attracted the attention of the research community, which has built several handwriting and drawing datasets in order to find patterns, discover contrasts between PD patients and healthy people, and use these differences for early PD prediction. PaHaW (Parkinson's Disease Handwriting) is a composite handwriting and drawing dataset collected from 75 persons (PD/healthy: 37/38; 19 males and 18 females / 20 males and 18 females, respectively). The dataset acquisition was a collaborative effort involving St. Anne's University Hospital in Brno, Czech Republic, and the Movement Disorders Centre at the First Department of Neurology, Masaryk University [42, 43]. HandPD is a dataset of spiral and meander sketching collected from 92 participants (PD/healthy: 74/18; 59 males and 15 females / 6 males and 12 females) at Botucatu Medical School, São Paulo State University, Brazil. It contains 368 composite spiral and meander samples, obtained by asking every volunteer to repeat each drawing four times. NewHandPD, an upgraded version of HandPD, is a spiral, meander, circle and signal dataset collected from 66 participants (PD/healthy: 31/35; 21 males and 10 females / 18 males and 17 females) by asking every person to repeat the spiral, meander and circle drawings four times, while the signals (handwritten dynamics) were gathered with the smart pen used for sketching. Both HandPD and NewHandPD are available online (Footnote 1).

Moreover, Spiral and Wave (Footnote 2) is a composite dataset comprising 102 samples (PD/healthy: 51/51), in which the participants were asked to draw an Archimedean spiral and a sinusoidal wave [44]. Many other Parkinson's-disease datasets exist that we do not detail here, such as cube-triangle, handwriting, acoustic [45], accelerometer and gyroscope datasets.

The study [46] aimed to differentiate individuals at different stages of Parkinson's disease by analyzing speed and pen pressure in sketches. The authors recruited and assessed 55 volunteers (27 PD patients and 28 healthy persons), asking each to draw a spiral on an A3 sheet. They extracted features from the sketches and established a correlation factor with Parkinson's disease severity. The study validated the methods using the Mann–Whitney test, revealing a significant difference in the correlation factor across Parkinson's disease stages.

The authors of the study [47] developed two CNN architectures for classifying Parkinson's disease using two separate spiral and wave sketch datasets. They then built an ensemble voting classifier based on two sub-classifiers, Random Forest (RF) and Logistic Regression (LR), and trained this ensemble using the prediction probabilities of the two CNN architectures. The model provided an accuracy of 93.3%, a sensitivity of 94% and a precision of 93.5%.

The paper [48] compared three approaches for predicting Parkinson's disease via hand tremor using two datasets, Spiral-Wave and Cube-Triangle. In the first approach, the architectures of Inception-ResNet-v2, Xception, Inception-v3, MobileNet-v2, ResNet50 and VGG19 were trained from scratch with random weights. In the second, Transfer Learning (TL) with fine-tuning was applied to the same datasets and pre-trained models. In the last, two shallow CNNs were built, trained and evaluated. The authors found that TL with fine-tuning provided the best accuracies, 91.6% and 100% on the first and second datasets, respectively.

In paper [49], the authors used the transfer learning (TL) approach to fine-tune the pre-trained VGG-19 model and adapted it for predicting Parkinson's disease using the spiral-wave dataset. The model was trained and evaluated with fourfold and tenfold cross-validation. With fourfold cross-validation, it achieved accuracies of 86.5% and 87.3% on the spiral and wave sets, respectively, while with tenfold cross-validation it reached 88.5% on the spiral set and 88% on the wave set.

Also, Drotar published several studies predicting Parkinson's disease with different techniques and classifiers [42, 43, 50,51,52,53]. Pereira likewise focused on analysing handwriting movements and extracting features for predicting Parkinson's disease using different deep-learning models and machine-learning algorithms [54,55,56,57].

The study [58] introduces a system for PD diagnosis based on pre-trained CNN models, transfer learning, and bilinear pooling. The study uses CNN architectures such as EfficientNet-B0, MobileNet-V2, and a custom CNN model, initially trained on ImageNet and adapted with TL. These are combined with bilinear pooling, forming three Bilinear CNN models applied to DaTSCAN images of PD. Using 2720 images from the Parkinson's Progression Marker Initiative (PPMI) dataset, the Bilinear CNN EfficientNet-B0-MobileNet-V2 model achieved the highest accuracy of 98.47% compared with other methods.

The study [44] sought to enhance the objective assessment of tremor in Parkinson's disease by incorporating histograms of oriented gradients (HOG) in the analysis of sinusoidal and spiral handwriting patterns. To automate tremor detection in participants with Parkinson's disease, the authors employed the HOG descriptor to extract features from spiral and wave drawings and fed them to four classifiers: a support vector machine, K-Nearest Neighbours (KNN), a random forest and a one-dimensional CNN. The latter reached an accuracy of 83.1%, the best among the classifiers used in the study.

The scientific research community continues to seek robust models for the early prediction of Parkinson's disease using different datasets, techniques and classifiers [59, 60] in order to improve the quality of life of people with the illness. In this section, we reviewed related work and datasets that reflect the state of the art in this field. Table 1 compares several studies to highlight the techniques and approaches mentioned in this section.

Table 1 Comparison between the classifiers and approaches of related works in the literature

3 Materials and methods

3.1 Disability levels and stages of Parkinson’s disease

Parkinson's disease is a progressive neurological disorder that impacts daily life and can lead to disability [1] in its advanced stages. According to the Parkinson's Foundation, the progression of Parkinson's disease can be summarised into five stages.

In stage 1, individuals exhibit mild symptoms that predominantly affect one side of the body. These symptoms can include tremors, changes in posture, walking difficulties, and alterations in facial expressions. As they progress to stage 2, these symptoms intensify, impacting both sides of the body or the midline, manifesting as challenges in walking and maintaining proper posture. However, living independently remains feasible at this stage, although everyday tasks become increasingly cumbersome.

By stage 3, considered the mid-stage, the hallmark becomes a notable loss of balance, especially during turns or when pushed. Falls become more frequent, and while motor symptoms continue to escalate, the individual can still maintain an independent lifestyle, albeit with growing restrictions in their daily activities. Disabilities are mild to moderately pronounced. In stage 4, symptoms reach a fully developed state and become severely incapacitating. Individuals can still stand and walk without aid but may rely on devices like canes or walkers for safety. They require considerable assistance for daily tasks, making independent living unviable. Finally, in stage 5, the most debilitating phase, stiffness in the legs can render standing or walking impossible. Individuals become either bedridden or confined to a wheelchair and need external assistance. Constant care becomes imperative for all daily activities.

3.2 Proposed classifier global overview

Unfortunately, Parkinson's disease has no cure, but medications and surgical treatments can significantly alleviate its symptoms [7]. Therefore, early diagnosis is crucial to enhance the quality of life of individuals suffering from this disease. One of the early symptoms of Parkinson's disease is hand tremor, and for early diagnosis we rely on a composite index of drawing speed and pen pressure captured in the Spiral-Wave dataset. In this work, we propose an ensemble voting classifier based on a hybrid CNN-KNN architecture to predict Parkinson's disease via spiral-wave drawing.

As shown in Fig. 1, the flowchart represents the stages and techniques used to reach the target. The first step of the development process is obtaining, processing and splitting the dataset. Next, we augmented this dataset to train the model and evaluate its performance. Then, hyperparameter tuning was used to obtain the best architecture for the model, helping it to learn effectively on the training set and provide optimal performance on an unseen dataset.

Fig. 1
figure 1

Flowchart of developing process of a proposed classifier to predict PD via hand tremor

3.3 Dataset description

The Spiral and Wave dataset used in this study is available on the Kaggle repository. It has 102 composite samples (102 spiral images and 102 wave images) of two classes, Parkinson's and healthy (PD/healthy: 51/51), with a mean age in years (mean ± SD) of 67.65 ± 9.10 for PD and 67.05 ± 8.40 for healthy participants. According to the study [44], the original dataset was created by Paulo Folador and his partners at the Federal University of Uberlândia, Brazil. Figure 2 shows samples of the spiral-wave drawings of the two classes, Parkinson's and healthy.

Fig. 2
figure 2

Samples from spiral and wave sets of the two classes

The creator of this dataset was inspired by the study [46], which found that Parkinson's disease could be detected by asking the patient to draw a spiral and then tracking the drawing speed and the pen pressure. The authors mentioned that the drawing speed was slower and the pen pressure lower among Parkinson's patients, and that this was especially pronounced for patients with more acute or advanced forms of the disease [61].

The dataset consists of 204 pre-processed images and is pre-split into training and testing sets: the spiral set has 102 images (72 training, 30 testing) and the wave set has 102 images (72 training, 30 testing). Table 2 summarizes the dataset type, splitting, age average and number of participants, including the people with Parkinson's disease and the control group (healthy).

Table 2 Description of spiral and wave dataset of Parkinson’s disease

3.4 Dataset visualization

Hand tremor, drawing speed and pen pressure may vary from person to person [46]. A skeleton image can be generated to understand the dissimilarities between the spiral-wave drawings of the two classes. This skeleton image can then be transformed into a new data frame, where each row represents the coordinates of a non-zero pixel in the respective image.

Creating these skeletons involves several steps: each image in the dataset is read as grayscale, and each pixel is subtracted from one to invert the image, turning the background black and the sketch white. The image is then smoothed with a median filter and thresholded using Yen's method. These operations refine and segment the images, ensuring that the drawings are distinctly extracted and that the drawing pixels are separated from the noise. The final operation is skeletonization using the pre-developed algorithm from the skimage package. For more details, the source code is available on the Kaggle repository (Footnote 3).
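To make the pipeline concrete, the sketch below reproduces these steps with scikit-image; it is a minimal illustration, and the file path is a hypothetical placeholder rather than a path from the repository.

```python
# A minimal sketch of the skeletonization pipeline described above,
# assuming scikit-image (skimage) is installed.
import numpy as np
from skimage import io, filters, morphology

def skeleton_coordinates(path):
    img = io.imread(path, as_gray=True)          # grayscale in [0, 1]
    inverted = 1.0 - img                         # black background, white sketch
    smoothed = filters.median(inverted)          # suppress scanning noise
    binary = smoothed > filters.threshold_yen(smoothed)  # Yen's threshold
    skeleton = morphology.skeletonize(binary)    # one-pixel-wide skeleton
    # Each row of the returned array holds the (row, col) coordinates of a
    # non-zero skeleton pixel, as used for the data-frame visualization.
    return np.column_stack(np.nonzero(skeleton))

coords = skeleton_coordinates("spiral/healthy/V01HE01.png")  # hypothetical path
```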

Figure 3 shows the results of the skeletonization process, reflecting the entire dataset in one image. As we can see, all drawings start and finish at a single point. The drawings of the control group (healthy individuals) are more regular and smoother than those of the people with Parkinson's disease, and classification algorithms can detect these tremors or irregularities to predict Parkinson's disease.

Fig. 3
figure 3

Visualization of dissimilarities of spiral and wave dataset’s classes (healthy and Parkinson) using skeletonization approach

3.5 Dataset augmentation

The spiral images and wave images used in this study were resized to 256 × 256 pixels and 256 × 512 pixels (height × width), respectively. To prepare the dataset for training, increase its diversity, improve performance on unseen examples and enhance the reliability of deep learning (DL) models, we applied data augmentation, a valuable technique that helps combat overfitting, enhances model generalization [62] and extracts the optimal benefit from this limited Spiral-Wave dataset [63]. For this task, we used the pre-developed data augmentation model from TensorFlow. The augmentation parameters were selected based on previous studies [47] and on the functionality of each parameter for enriching the drawing dataset [63], with conditions that prevent deformation of the images, using a trial-and-error approach. The parameters are shifting (top, bottom, right and left), brightness range, shearing range, zoom-in, zoom-out and rotation in both directions. The remaining parameters, such as horizontal and vertical flip, were left at their default values, which do not change the original image. Table 3 lists the parameters applied to both the spiral and wave image augmentation models.

Table 3 Data augmentation parameters of spiral and wave images

We generated fifteen new images from every original image in the training set, giving ((15 × 72) + 72 = 1152) spiral images and 1152 wave images for training, while the testing set (30 + 30 = 60 images) remained unchanged. Every generated image is unique thanks to the random combination of parameters drawn from the mentioned ranges.
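The sketch below illustrates how such an augmentation set-up can be expressed with TensorFlow/Keras' ImageDataGenerator; it is a minimal illustration, and the numeric values are stand-ins for the ranges reported in Table 3 rather than the exact values used in this work.

```python
# A minimal sketch of the augmentation set-up; parameter values are illustrative.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,            # rotation in both directions
    width_shift_range=0.05,       # left/right shifting
    height_shift_range=0.05,      # top/bottom shifting
    shear_range=0.05,             # shearing range
    zoom_range=0.10,              # zoom-in and zoom-out
    brightness_range=(0.8, 1.2),  # brightness range
    fill_mode="nearest",
)

def augment(image, n_variants=15):
    """Return the original image plus n_variants randomly augmented copies."""
    flow = augmenter.flow(np.expand_dims(image, axis=0), batch_size=1)
    return [image] + [next(flow)[0] for _ in range(n_variants)]
```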

As the results of this augmentation in Fig. 4a show, some spiral-wave images were distorted and some lost part of their content because the sketches are close to the image borders. To avoid this deformation, we propose padding the images before applying the data augmentation. If padding P is applied to an image of size W × H pixels, the output image has dimensions (W + 2P) × (H + 2P). Because we aimed to preserve all the information in the middle and on the borders of the images, we added 25 pixels (P = 25) on the left, right, top and bottom of all images, using a constant colour of (239, 239, 239) for the (R, G, B) channels. Figure 4b and Fig. 4c show the padding approach and the results of this process, respectively.

Fig. 4
figure 4

The proposed solution for deformation of images. (a): displays an example of images deformation while data augmentation, (b): Images Padding approach and (c): Original images versus Padded Images of spiral and wave dataset, twenty-five pixels were added to each side with constant colours equal to 239 for each channel (RGB)

After padding the images, we repeated the data augmentation with the same parameter settings listed in Table 3. Both the original (non-padded) images and the padded images are used for training and evaluating our models so that their effect on performance can be compared.

As mentioned previously, twenty-five pixels were added to each side (top, bottom, right and left) of the original image with a constant colour of 239 for each (RGB) channel to match the background as closely as possible. Figure 4c shows two examples of the original and padded images. The padded images were then resized to 256 × 256 and 256 × 512 for the spiral and wave images, respectively. As the examples show, the images retained most of their characteristics and resolution thanks to the minor change they underwent.
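The padding step itself can be illustrated with a few lines of OpenCV, as sketched below; the file paths and names are hypothetical placeholders.

```python
# A minimal sketch of the padding step, assuming OpenCV (cv2) is available.
import cv2

P = 25                          # pixels added to each side
BACKGROUND = (239, 239, 239)    # constant grey background

img = cv2.imread("spiral/healthy/V01HE01.png")            # hypothetical path
padded = cv2.copyMakeBorder(img, P, P, P, P,
                            borderType=cv2.BORDER_CONSTANT, value=BACKGROUND)
# A W x H image becomes (W + 2P) x (H + 2P); it is then resized back to the
# model's input size (256 x 256 for spirals, 256 x 512 for waves).
resized = cv2.resize(padded, (256, 256))
cv2.imwrite("spiral_padded/healthy/V01HE01.png", resized)
```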

3.6 Performance evaluation

The confusion matrix is especially valuable because it does not provide just a single metric, like accuracy, but offers insight into the types of mistakes the model makes. The performance of all classifiers used in this paper was therefore evaluated using multiple criteria, including accuracy, specificity, sensitivity, precision, F1-score, and the Matthews correlation coefficient (MCC). All these key indicators are calculated from the confusion matrix in Table 4.

Table 4 Confusion matrix

In the context of a classifier for Parkinson's disease, several vital terms [64] describe its performance. True Positives (TP) are cases where the classifier accurately identifies a patient with Parkinson's; the actual condition and the classifier's prediction concur in diagnosing the disease. Conversely, True Negatives (TN) represent instances where the classifier correctly confirms that an individual does not have Parkinson's and is genuinely healthy. However, not all predictions are accurate. False Positives (FP) occur when the classifier mistakenly indicates that a healthy person has Parkinson's; such errors can lead to unwarranted stress and potentially unnecessary medical interventions. On the other hand, False Negatives (FN) represent a scenario where the classifier overlooks the presence of Parkinson's in a patient who indeed has the disease, potentially leading to a lack of necessary medical attention. In medical diagnostics, minimizing both FP and FN is paramount, given the significant ramifications they can have on patient care and well-being.

Several performance metrics, such as accuracy, precision, recall, and the F1 score, can be derived from the confusion matrix to evaluate the classifier's performance more holistically, as illustrated by the short computational sketch following the list below.

  • Accuracy: a metric that quantifies the overall correctness of the classifier in predicting the actual labels across both the Parkinson's disease class and the healthy class; it is calculated using Eq. 1.

    $${\mathrm{Accuracy}}=\frac{TP+TN}{TP+TN+FP+FN}\times 100$$
    (1)
  • Recall (Sensitivity or True Positive Rate): quantifies the model's capability to capture all actual positive cases (Parkinson's) and to avoid misclassifying positive cases, as defined in Eq. 2.

    $${\mathrm{Sensitivity}}=\frac{TP}{TP+FN}\times 100$$
    (2)
  • Precision: provides insight into the reliability of the positive classifications made by the model; it reflects the classifier's ability to avoid misclassifying members of the control group (healthy instances) as Parkinson's, as shown in Eq. 3.

    $${\mathrm{Precision}}=\frac{TP}{TP+FP}\times 100$$
    (3)
  • F1-score: represents the harmonic mean of precision and recall. It offers a balanced measure between the two metrics, which is especially useful with uneven class distributions, and is calculated using Eq. 4.

    $$F1=2\times \frac{{Recall}\times{Precision}}{{Recall}+{Precision}}$$
    (4)
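The short sketch below computes the indicators defined in Eqs. 1–4 from illustrative confusion-matrix counts; it is a plain-Python illustration, not part of the evaluation pipeline.

```python
# A minimal sketch of Eqs. 1-4 from illustrative confusion-matrix counts.
TP, TN, FP, FN = 14, 15, 0, 1

accuracy  = (TP + TN) / (TP + TN + FP + FN) * 100            # Eq. 1
recall    = TP / (TP + FN) * 100                             # Eq. 2 (sensitivity)
precision = TP / (TP + FP) * 100                             # Eq. 3
f1        = 2 * recall * precision / (recall + precision)    # Eq. 4

print(f"{accuracy:.2f} {recall:.2f} {precision:.2f} {f1:.2f}")
# 96.67 93.33 100.00 96.55
```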

3.7 Classifiers development overview

In this work, we propose a Convolutional Neural Network with K-Nearest Neighbours (CNN-KNN) architecture (Fig. 5) instead of the standard Convolutional Neural Network architecture (Fig. 6) to predict Parkinson's disease from hand tremors using spiral and wave sketching. Combining a CNN with traditional ML algorithms can offer several advantages in specific scenarios. CNNs excel at processing grid-structured data such as images or audio, extracting meaningful features through convolutional and pooling layers, whereas traditional ML algorithms are distinguished by their speed and robust performance and are well suited to structured tabular data. We adopted this hybrid architecture to leverage the strengths of both approaches and build a robust and trustworthy model for predicting Parkinson's disease compared with the models in the previous literature.

Fig. 5
figure 5

The proposed Convolutional Neural Network with K-Nearest Neighbours algorithm (CNN-KNN) architecture

Fig. 6
figure 6

Standard architecture of Convolutional Neural Network

3.8 The development of the proposed CNN architecture

Firstly, two convolutional neural network (CNN) models were constructed with identical architectures for the Spiral and Wave datasets, as illustrated in Fig. 7: Fig. 7a depicts the architecture of the spiral model, while Fig. 7b shows the architecture of the wave model. These diagrams show all the hyperparameters, including the size of the input layers, convolution layers, kernels, max pooling, activation functions, and output. We selected these hyperparameters using the pre-developed Random Search approach available in the TensorFlow packages, comparing and sorting the testing results. The search ranges were 8 to 46 for the numbers of filters, 3, 5 or 7 for the kernel size, 'same' for padding, 'sigmoid', 'tanh' or 'ReLU' for the activation functions, 50 to 3000 units for the dense layers and 'SoftMax' for the two-unit output layer. Moreover, we used 'Adam' as the optimizer with a learning rate of 0.00001 and 'sparse categorical cross-entropy' as the loss function. We then trained and evaluated each model separately on the spiral and wave sets and saved these models for later use.
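For illustration, the sketch below expresses an architecture of this shape in Keras; the filter counts and kernel sizes are assumptions (the actual values were selected by random search), while the dense-layer sizes, optimizer, learning rate and loss follow the description above.

```python
# A minimal sketch of a CNN of the described shape: three convolution +
# max-pooling blocks, a flatten layer, 3000/1000/500/100 dense units and a
# two-unit SoftMax output. Filter counts and kernel sizes are illustrative;
# with a 256 x 256 input, the last block (8 filters) flattens to
# 32 * 32 * 8 = 8192 features, matching the number reported in Sect. 3.9.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(256, 256, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(8, 5, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(3000, activation="relu"),
        layers.Dense(1000, activation="relu"),
        layers.Dense(500, activation="relu"),
        layers.Dense(100, activation="relu"),
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```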

Fig. 7
figure 7

The proposed CNN models for PD classification based on spiral and wave images. (a) represents the architecture and the hyperparameters of the spiral model, while (b) depicts the architecture of the wave model

3.9 The development of the proposed CNN-KNN architecture

Based on the models in the previous section, we loaded the pre-trained models. These models consist of a sequence of layers: an input layer, three convolutional-plus-pooling blocks, a flatten layer, dense layers of (3000, 1000, 500, 100) units, and one output layer.

To preserve the values of the trained parameters, we froze all layers up to and including the flatten layer and removed all subsequent dense layers. The idea is to use the convolutional layers of the pre-trained models as feature extractors after removing the last layers. By passing all images through the frozen convolutional layers, we extracted relevant features from the augmented spiral and wave datasets separately.

Passing the spiral dataset through the spiral model up to its flatten layer produced a new numerical dataset consisting of 1152 samples and 8192 features, and the same result was obtained from the wave model using the augmented wave images.

Each new numerical dataset was then used to train a KNN model separately. The first classifier is the spiral KNN, whose hyperparameters were set to (n_neighbors: 4, weights: 'distance', algorithm: 'brute', leaf_size: 25, p: 2). The second classifier is the wave KNN, whose hyperparameters were set to (n_neighbors: 2, weights: 'distance', algorithm: 'brute', leaf_size: 33, p: 1). These hyperparameters were selected manually from the testing results using a trial-and-error approach.
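The sketch below illustrates this feature-extraction and KNN-training step for the spiral branch; the model filename, the flatten-layer name ("flatten", Keras' default) and the X_train / y_train arrays are assumptions introduced for illustration.

```python
# A minimal sketch of the feature-extraction + KNN step for the spiral branch.
import tensorflow as tf
from sklearn.neighbors import KNeighborsClassifier

cnn = tf.keras.models.load_model("spiral_cnn.h5")        # illustrative filename
extractor = tf.keras.Model(inputs=cnn.input,
                           outputs=cnn.get_layer("flatten").output)
extractor.trainable = False                              # freeze the convolutional base

features = extractor.predict(X_train)                    # expected shape: (1152, 8192)

spiral_knn = KNeighborsClassifier(n_neighbors=4, weights="distance",
                                  algorithm="brute", leaf_size=25, p=2)
spiral_knn.fit(features, y_train)
```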

We automated the proposed architecture (CNN-KNN) to receive the spiral test images one by one, extract their features and feed them to the KNN classifier for prediction, with the same process for the wave model. The final prediction is based on the average of the prediction probabilities produced by the spiral and wave classifiers for one spiral image and one wave image.

Figure 8 represents the proposed CNN-KNN architecture to predict Parkinson’s disease. This architecture consists of three main sections: Input, Feature Extraction and Voting Classifier.

  • Input: It performs image preprocessing such as padding, resizing, normalization and data augmentation to feed the model with significant, uniform and relevant images.

  • Feature Extraction: It represents the convolutional layers that play a crucial role in feature extraction and provide flat numerical features that will be the foundation for subsequent classifiers (KNNs).

  • Voting Classifier: It is an ensemble voting classifier which consists of two sub-classifiers and produces the final prediction based on the average of the prediction probabilities.

Fig. 8
figure 8

The proposed ensemble voting classifier of PD via spiral and wave drawing based on hybrid CNN-KNN architectures

Table 5 lists the optimal hyperparameters of all the elements used in this study. Note that the spiral and wave models have the same architectures, while the KNN classifiers have different hyperparameters. All these hyperparameters were selected using the random search approach or manually, based on training and testing outcomes, to reach the required performance.

Table 5 The optimal hyperparameters of the models and classifiers used in this work

3.10 Feature extraction based convolutional layers

Convolutional layers in neural networks are instrumental for feature extraction from an image. Figure 9 clarifies how these layers scan the input data, capturing local patterns and features using learnable filters (kernels). During training, the filters adapt their weights through backpropagation and gradient descent, enabling them to specialize in detecting task-relevant features. The output of the convolutional layers consists of feature maps, representing high-level representations of the input data. These hierarchical features, ranging from edges to objects, can then be used in subsequent layers for tasks such as classification or regression. Overall, convolutional layers play a vital role in deep learning models, extracting meaningful features that enhance performance and generalization.

Fig. 9
figure 9

Feature extraction of RGB image based on convolutional layer, activation function and MaxPooling

According to Eq. 5, the dimensions of the output matrices of a convolutional layer can be calculated from the size of the input image (width: Col, height: Row, number of channels: Nc), the kernel size (width, height and depth of the kernels: Fw, Fh and Fc) and the number of filters (Fn), taking into consideration the padding of the image (p) and the stride of the kernel across it (s).

$$\begin{array}{c}\left[Col,Row,{N}_{c}\right]*\left[{F}_{w},{F}_{h},{F}_{c}\right]=\\ \left[\left[\frac{Col+2p-{F}_{w}}{s}+1\right],\left[\frac{Row+2p-{F}_{h}}{s}+1\right],{F}_{n}\right]\end{array}$$
(5)

In Conv2D, a filter must have the same depth as the input matrix and produces one 2D matrix, according to the default architecture of TensorFlow. As shown in Fig. 9, the input is an RGB image of size (6 × 6 × 3) and there are two kernels of size (3 × 3 × 3). Each layer (channel) of a kernel scans one channel of the input image, and each kernel produces one primary matrix of size (4 × 4).
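This output shape can be verified directly in TensorFlow, as the short check below shows.

```python
# A quick check of the example above: a 6 x 6 x 3 input convolved with two
# 3 x 3 x 3 kernels (stride 1, no padding) yields a 4 x 4 x 2 output,
# in line with Eq. 5.
import tensorflow as tf

x = tf.random.uniform((1, 6, 6, 3))                     # one RGB "image"
conv = tf.keras.layers.Conv2D(filters=2, kernel_size=3,
                              strides=1, padding="valid")
print(conv(x).shape)                                    # (1, 4, 4, 2)
```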

Contrary to the correlation operation, the convolution operation uses a filter rotated by 180° before applying it to the input. This operation is known as kernel flipping, and it ensures that the operation aligns with the mathematical definition of convolution and simplifies its formulation.

The pixel values of each primary matrix can be calculated using Eq. 6, whereas the pixel values of the final matrix are calculated by adding the bias to the sum of the corresponding pixels of the primary matrices [65, 66], as defined in Eq. 7. In this example, the convolution returns two matrices of size (4 × 4 × 1) because two filters are used.

$$G\left[n,m\right]=\left({f}^{*}h\right)\left[n,m\right]=\sum_{y=0}^{{F}_{h}-1}\sum_{x=0}^{{F}_{W}-1}h\left[x,y\right]f\left[x,y\right]$$
(6)
$${G}_{\mathrm{out}}\left(n,m\right)=\sum_{z=0}^{{\mathrm{depth}}-1}\left(\left(\sum_{y=0}^{{F}_{h}-1}\sum_{x=0}^{{F}_{W}-1}h\left[x,y\right]*f[x,y]\right)+{\mathrm{bias}}\right)$$
(7)

where the Receptive Field in each channel of the input image has been denoted by f, one layer of the kernel (filter) by h and the primary matrix by G[n, m] with the indexes of columns and rows (n, m) respectively.

Moreover, the Rectified Linear Unit (ReLU) defined in Eq. 8 has been used as the activation function. It introduces non-linearity into the network, allowing it to learn complex patterns and, in some cases, improving learning efficiency. This function replaces the negative values in the output matrix with zero [67].

$$f\left(x\right)={x}^{+}=max\left(0,x\right)$$
(8)

Then, we used Max pooling to reduce the dimensions of output matrices while retaining their essential features [66]. Max pooling [68] is a subsampling operation that takes the maximum value from a group of values in a matrix.

We used a (2 × 2) max-pooling operation, given by Eq. 9, where y is the maximum value within R, a (2 × 2) window (or filter) that slides over the input matrix elements \({x}_{ij}\).

$$y={max}_{\left(i,j\right)\in R}{x}_{ij}$$
(9)

The output matrices become the input of the next convolutional layer, and the previous processes repeat until the flatten layer, which reshapes the most prominent features of the last matrices into a one-dimensional vector to be used as input for the fully connected layers or any ML algorithm.

3.11 Ensemble voting classifier based on KNN algorithm

In this work, we used the k-Nearest Neighbours (k-NN) classifier, a simple, instance-based supervised learning algorithm. Instead of constructing a general model during the training phase, as many other algorithms do, it memorizes the entire training dataset, and predictions for a new data point are made by considering the k training examples closest to that point [69, 70]. For a new, unseen instance, the algorithm searches for the k training samples closest to the point, with the distance determined by the hyperparameter p (p = 1 or p = 2) of the most common distance equations: when p equals one, the algorithm uses the Manhattan distance given by Eq. 10, and when p equals two, the distance is calculated using the Euclidean distance given by Eq. 11. To unify the two, the Minkowski distance [70] generalizes both the Euclidean and the Manhattan distances, as defined in Eq. 12; a short numerical check follows the equations below.

$${D}_{\mathrm{Manhattan}}\left(X,Y\right)={\Vert x-y\Vert }_{1}={\sum }_{i=1}^{n}\mid {x}_{i}-{y}_{i}\mid$$
(10)
$${D}_{\mathrm{Euclidean}}\left(X,Y\right)=\sqrt{{\sum }_{i=1}^{n}{\left({x}_{i}-{y}_{i}\right)}^{2}}$$
(11)
$${D}_{\mathrm{Minkowski}}\left(X,Y\right)={\left({\sum }_{i=1}^{n}{\left|{x}_{i}-{y}_{i}\right|}^{p}\right)}^\frac{1}{p}$$
(12)
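The check below confirms numerically that the Minkowski distance reduces to the Manhattan and Euclidean distances for p = 1 and p = 2; the vectors are illustrative.

```python
# A small check that the Minkowski distance (Eq. 12) reduces to the Manhattan
# distance for p = 1 (Eq. 10) and the Euclidean distance for p = 2 (Eq. 11).
from scipy.spatial import distance

x, y = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
print(distance.minkowski(x, y, p=1), distance.cityblock(x, y))   # 5.0 5.0
print(distance.minkowski(x, y, p=2), distance.euclidean(x, y))   # 3.606 3.606
```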

As mentioned, the weights hyperparameter in the two KNN algorithms has been set to (distance). Thus, the influence of each neighbour on the prediction is weighted by the inverse of their distance to the query point. This means that closer neighbours will significantly influence the prediction more than those further away. For each of the k neighbours, the weight \({w}_{i}\)​ of the \({i}^{th}\) neighbour is calculated using Eq. 13, where \(x\) is the query point and \({x}_{i}\) is the \({i}^{th}\) neighbour.

$${w}_{i}=\frac{1}{{\mathrm{Distance}}\left(x,{x}_{i}\right)}$$
(13)

For classification, instead of each of the k neighbours getting one vote, each neighbour gets \({w}_{i}\) votes, and the class with the highest cumulative weighted vote is the predicted class. The proposed ensemble voting classifier is built from two independent KNN models: a spiral KNN to predict spiral images and a wave KNN to predict wave images. The final prediction of Parkinson's disease is the average of the probabilities returned by each model, as shown in Fig. 10; each model returns the probabilities in two columns corresponding to the classes (Parkinson's and healthy).
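A minimal sketch of this soft-voting step is given below; the classifier and feature variables are assumptions standing in for the fitted spiral/wave KNNs and the flattened CNN features of one person's drawings, and both KNNs are assumed to have been fitted on identically encoded labels.

```python
# A minimal sketch of the soft-voting step: average the two KNNs' probability
# vectors and take the class with the highest averaged probability.
import numpy as np

def predict_person(spiral_knn, wave_knn, spiral_feat, wave_feat):
    p_spiral = spiral_knn.predict_proba(spiral_feat.reshape(1, -1))[0]
    p_wave = wave_knn.predict_proba(wave_feat.reshape(1, -1))[0]
    p_avg = (p_spiral + p_wave) / 2.0       # average of the two probability vectors
    return spiral_knn.classes_[int(np.argmax(p_avg))], p_avg
```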

Fig. 10
figure 10

The proposed ensemble voting classifier through spiral and wave images based on KNN algorithm

The resulting value for each class is the probability that the input sample belongs to that class based on the weighted votes of its k nearest neighbours. To compute the weighted vote for each class c (Parkinson's or healthy), the weights of the neighbours belonging to class c are summed using Eq. 14, where yi is the actual label of the i-th neighbour and c is the target class. These weighted votes are then converted into probabilities by dividing the weighted vote for each class by the total weight of the k neighbours using Eq. 15, where P(c|x) is the probability of class c given the input sample x.

$${\mathrm{WeightedVote}}\left(c\right)={\sum }_{i:{{\mathrm{y}}}_{i}={\mathrm{c}}}{w}_{i}$$
(14)
$$P\left(c|x\right)=\frac{{\mathrm{WeightedVote}}\left(c\right)}{{\sum }_{i=1}^{k}{w}_{i}}$$
(15)

In mathematical terms, for a given class c and an input sample x, Eq. 16 provides the class probability, where \({w}_{i}\) is the weight of the \({i}^{th}\) neighbour and \(I\left({y}_{i}=c\right)\) is an indicator function equal to one if the label \({y}_{i}\) of the \({i}^{th}\) neighbour is c and zero otherwise; a short numerical sketch follows the equation.

$$P\left(c|x\right)=\frac{\sum_{i\in \mathrm{neighbours\ of\ }x}{w}_{i}\times I\left({y}_{i}=c\right)}{\sum_{i\in \mathrm{neighbours\ of\ }x}{w}_{i}}$$
(16)
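The short sketch below evaluates Eqs. 13–16 on illustrative neighbour distances and labels.

```python
# A minimal sketch of Eqs. 13-16: distance-weighted class probabilities from
# the k nearest neighbours (distances and labels are illustrative values).
import numpy as np

distances = np.array([0.5, 1.0, 2.0, 4.0])                   # k = 4 neighbours
labels = np.array(["parkinson", "parkinson", "healthy", "parkinson"])

weights = 1.0 / distances                                    # Eq. 13
for c in ("healthy", "parkinson"):
    p = weights[labels == c].sum() / weights.sum()           # Eqs. 14-16
    print(c, round(p, 3))
# healthy 0.133
# parkinson 0.867
```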

4 Results and discussion

This section presents the results of seven models: (1) a CNN model trained and tested on the original spiral set (Spiral-CNN No-Padding); (2) a CNN model trained and tested on the original wave set (Wave-CNN No-Padding); (3) a CNN model trained and tested on the padded spiral set (Spiral-CNN Padding); (4) a CNN model trained and tested on the padded wave set (Wave-CNN Padding); (5) a hybrid CNN-KNN model trained and tested on the padded spiral set (Spiral-CNN-KNN); (6) a hybrid CNN-KNN model trained and tested on the padded wave set (Wave-CNN-KNN); and (7) the Ensemble Voting Classifier developed from the last two models.

4.1 Spiral and wave models with No-Padding dataset

In the methodology, we mentioned that we developed two CNN models and tuned their hyperparameters; these models were trained, validated and tested using the original spiral and wave dataset. Training and validation curves are graphical representations commonly used in ML and DL to monitor the performance of a model during training. Figure 11a shows the training and validation accuracy curves of the spiral model on the No-Padding spiral set (Spiral-CNN No-Padding) over thirty successive epochs, and Fig. 11b shows the corresponding loss curves, which indicate how far the predictions were from the desired targets.

Fig. 11
figure 11

The results of the CNN Model using the No-Padding spiral set (a) represents the accuracy curves of training and validation, while (b) represents loss curves of training and validation

This model was then evaluated on unseen images (the test set) consisting of 30 images. Of these, 22 images were correctly classified as true positives (TP) or true negatives (TN), indicating the model's capability to identify healthy people and PD patients with an accuracy of 73.3%. Although the model demonstrated promising performance by correctly classifying 22 images, it also exhibited limitations, misclassifying 4 images as false positives (FP) and 4 as false negatives (FN). This suggests room for improving the model's sensitivity and precision, which are crucial for minimizing misclassifications. Figure 12 shows the symmetric confusion matrix of the model.

Fig. 12
figure 12

Confusion Matrix of CNN Model using the No-Padding spiral set

The wave model (Wave-CNN No-Padding) was likewise trained, validated and evaluated using the No-Padding wave set. The training and validation accuracy curves are shown in Fig. 13a, and the loss curves in Fig. 13b. As the figures show, the accuracy curves increased and stabilized at high values closer to each other than those of the spiral model, whereas the loss curves decreased and stabilized at low values. This convergence is reflected in the model's performance on the test set: as shown in the confusion matrix in Fig. 14, the model misclassified 2 images as FP and 4 images as FN.

Fig. 13
figure 13

The results of the CNN model using the No-Padding wave set (a) represents the accuracy curves of training and validation, while (b) represents loss curves of training and validation

Fig. 14
figure 14

Confusion matrix of CNN model using the No-Padding wave set

4.2 Spiral and wave models with padded dataset

When we augmented the spiral and wave dataset used with the previous models, we noticed that some images had been distorted and some had lost part of their content because the sketches were close to the image borders, as mentioned previously. Furthermore, the outcomes of the previous models show high misclassification caused by the deformation of the images during data augmentation. For these reasons, we padded all images to avoid excessive deformation, with the aim of obtaining meaningful images and preserving all the information in the middle of these images. These padded images were used to train and evaluate two CNN models.

The first model is the spiral model with the padded set (Spiral-CNN Padding). We trained and evaluated it using the padded spiral images and obtained more accurate performance. Figure 15a shows the training and validation accuracy curves, and Fig. 15b the loss curves. The confusion matrix in Fig. 16 presents the key performance indicators obtained on the testing set.

Fig. 15
figure 15

The results of the CNN model using the padded spiral set (a) represents the accuracy curves of training and validation, while (b) represents loss curves of training and validation

Fig. 16
figure 16

Confusion matrix of CNN Model using the padded spiral set

The second model is the wave model with the padded set (Wave-CNN Padding). This model was trained and evaluated using the padded wave images, and we observed that it learned meaningful patterns from the padded wave set and generalized well to the validation and testing sets. Figure 17a shows the training and validation accuracy curves, and Fig. 17b the loss curves. The confusion matrix in Fig. 18 presents the key performance indicators obtained on the testing set. The proposed solution minimizes the models' misclassification and increases the accuracy from 80% to 86.66%.

Fig. 17
figure 17

The results of the CNN model using the padded wave set (a) represents the accuracy curves of training and validation, while (b) represents loss curves of training and validation

Fig. 18
figure 18

Confusion matrix of CNN model using the padded wave set

4.3 Spiral and wave CNN-KNN architecture with padded dataset

Transfer learning (TL) is a deep-learning technique that leverages the knowledge gained from pre-trained models to solve new, related tasks. We aimed to enhance the performance of our models using the pre-trained CNN models trained on the padded spiral and wave datasets.

After removing the last layers, we used the convolutional layers of the pre-trained models as feature extractors: by passing all images through the frozen convolutional layers, we extracted relevant features from the images. The flattened numerical features formed a new dataset that was used to train the k-Nearest Neighbours (KNN) algorithm.

This hybrid CNN-KNN architecture was trained and evaluated using the padded spiral set; its performance is presented in the confusion matrix in Fig. 19. The second hybrid model was trained and evaluated using only the padded wave set, and the confusion matrix in Fig. 20 reflects how its performance increased compared with both wave models.

Fig. 19
figure 19

Confusion matrix of hybrid CNN-KNN architecture using the padded spiral set

Fig. 20
figure 20

Confusion matrix of hybrid CNN-KNN architecture using the padded wave set

4.4 Ensemble voting classifier

This work aims to predict Parkinson's disease via hand tremors using two different approaches, spiral sketching and wave sketching. To predict the disease from the spiral and wave drawings, we built two separate models, Spiral-CNN-KNN and Wave-CNN-KNN, which provide two independent predictions.

To obtain a single decision on whether or not a person has Parkinson's disease, we built, trained and evaluated an Ensemble Voting Classifier based on both the spiral and wave CNN-KNN architectures, whose final decision is the average of the probabilities of the sub-classifiers. This hybrid system was then evaluated using unseen images (the test set) comprising 60 combined samples, with each individual represented by one spiral and one wave image; the final classification decision is determined by averaging the predictions of the spiral- and wave-based sub-systems. As shown in Fig. 21, 29 individuals were correctly classified as true positives (TP) or true negatives (TN), indicating the model's capability to identify healthy people and PD patients with an accuracy of 96.67%. Although the model demonstrated strong performance by correctly classifying 29 individuals, it also exhibited a limitation, misclassifying 1 individual as healthy while he/she actually has Parkinson's disease (FN). The proposed hybrid system provides high performance compared with the previous models, and these results demonstrate the potential of our hybrid approach for effectively identifying individuals with PD using both spiral and wave images.

Fig. 21
figure 21

Confusion matrix of the proposed ensemble voting classifier of PD via spiral and wave drawing based on hybrid CNN-KNN architectures

Table 6 compares the key performance indicators of the seven models: the CNN models trained and tested on the original spiral and wave sets, the CNN models trained and tested on the padded spiral and wave sets, the hybrid CNN-KNN models trained and tested on the padded spiral and wave sets, and the Ensemble Voting Classifier developed from the last two models.

Table 6 Comparison between the six models developed as a foundation for the proposed classifier and the proposed ensemble voting classifier

For contrast and easy comparison between models across the different key performance indicators, Fig. 22 provides a visual representation of the models' performance side-by-side.

Fig. 22
figure 22

Visualization chart of the key performance indicators of models

Moreover, Fig. 23 shows three trends that support a practical analysis of the performance improvement of the spiral models, the wave models and the Ensemble Voting Classifier, which reached 96.67% accuracy, 93.33% sensitivity (recall), 100% precision and an F1-score of 96.55%.

Fig. 23
figure 23

Upward trends of performance enhancement of models divided into three groups

Table 7 compares the proposed classifier (CNN-KNN Ensemble Voting Classifier) with several classifiers from the previous literature in terms of publication date, the dataset used for training and evaluation, the feature-extraction methodology and the performance of each classifier.

Table 7 Comparison between the proposed classifier and related works in previous literature (only hand drawing dataset)

Diagnosing Parkinson's disease typically involves a comprehensive evaluation of multiple symptoms and clinical indicators rather than relying on a single symptom such as speech disorders, hand tremors or brain MRI scans. Parkinson's disease is a complex neurodegenerative disorder, and its diagnosis is often based on a combination of motor and non-motor symptoms. As elucidated in the literature review section, certain diagnostic methodologies, including the analysis of speech disorders [30] and scans [58], exhibit high accuracy; however, relying solely on these methods is insufficient for a precise diagnosis. Hence, the diversity of AI- and ML-based diagnostic methods for predicting Parkinson's disease can guide the construction of a complete early-diagnostic embedded system that predicts the disease from several symptoms.

The findings can be summarized in four points: 1) hyperparameter tuning is a crucial step in building a robust model compatible with a custom dataset, achieving high performance, reducing overfitting and enhancing generalization to unseen data (the test set); 2) deformation of the dataset during augmentation may negatively impact performance, and as a solution we padded all original images to preserve their essential information; 3) the CNN-KNN architecture may increase performance in some cases by exploiting the advantages of convolutional layers as feature extractors and ML algorithms as classifiers; 4) ensemble voting classifiers combine predictions from multiple individual models and can often achieve higher accuracy and better overall performance than a single model, because they combine the strengths and weaknesses of spiral drawing and wave drawing to make the final decision about people with Parkinson's disease and healthy people.

5 Conclusion and future work

In this study, we proposed a novel hybrid CNN-KNN voting classifier for predicting Parkinson's disease through hand tremors using a spiral-wave sketching dataset. Our classifier combines the advantages of both deep learning and machine learning techniques and leverages the strengths and weaknesses of spiral drawing and wave drawing to make accurate predictions. It achieved 96.67% accuracy, 93.33% sensitivity and 100% precision on our dataset, higher than existing methods that use only one type of drawing or one type of model. Our work enhances medical services by providing a fast and reliable tool for diagnosing Parkinson's disease at an early stage. However, our work also has limitations and challenges that must be addressed in future research. First, our dataset is relatively small, which may affect the performance and robustness of our classifier; we plan to collect more data from different sources and apply more advanced data augmentation techniques to increase the diversity and quality of our data. Second, our classifier is based on hand tremors only, which may not capture all the symptoms and features of Parkinson's disease; we intend to incorporate other modalities, such as voice, gait, or facial expressions, to improve our classifier's comprehensiveness and accuracy. Third, our classifier has not been tested on actual patients or in clinical settings, which may raise ethical or practical issues; we aim to conduct more experiments and evaluations with medical experts and patients to validate our classifier's effectiveness and usability. We hope our work will inspire more research on Parkinson's disease prediction using hand tremors and pave the way for developing more advanced and intelligent systems for enhancing medical services.