18IT048 — Practical_Exam_Work

Mansi Khatri
4 min read · Nov 17, 2021

Task-1:
Dataset Description using Orange tool.
What needs to be done to improve the accuracy of the classification result for the given dataset? Get the maximum classification accuracy possible by performing the following methods.

Data Set Name: Autistic Spectrum Disorder Screening Data for Adult

https://archive.ics.uci.edu/ml/machine-learning-databases/00426/

Dataset information

Autistic Spectrum Disorder (ASD) is a neurodevelopmental condition associated with significant healthcare costs, and early diagnosis can significantly reduce these. Unfortunately, waiting times for an ASD diagnosis are lengthy and procedures are not cost-effective. The economic impact of autism and the increase in the number of ASD cases across the world reveal an urgent need for the development of easily implemented and effective screening methods. Therefore, a time-efficient and accessible ASD screening method is urgently needed to help health professionals and to inform individuals whether they should pursue a formal clinical diagnosis. The rapid growth in the number of ASD cases worldwide necessitates datasets related to behavioral traits.

In this dataset, we record ten behavioral features (AQ-10-Adult) plus ten individual characteristics that have proved to be effective in detecting ASD cases from controls in behavior science.

Data information

Encoding

Here we apply encoding, which replaces the categorical features with numeric ones, so the encoded data is now all in the form of 0 or 1. After encoding, the data is in a form that machine learning models can work with easily.

Applying Encoding
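The article does this step in Orange. As a rough illustration of what encoding to 0/1 means, here is a minimal plain-Python sketch of one-hot encoding. The helper `one_hot_encode`, the sample rows, and the column names are assumptions made for the example, not taken from the Orange workflow.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with 0/1 indicator columns."""
    # Collect the distinct categories found in this column.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one indicator column per category.
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"gender": "f", "jaundice": "no"},
    {"gender": "m", "jaundice": "yes"},
]
encoded_rows = one_hot_encode(one_hot_encode(rows, "gender"), "jaundice")
```

Each original column becomes one indicator column per category, so every value in the encoded rows is 0 or 1.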

Normalization

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values.

For normalization, I rescaled all the numeric data to the interval [0,1].
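Rescaling to [0,1] is commonly done with min-max normalization, x' = (x - min) / (max - min). The sketch below shows that formula in plain Python. The function name and the sample ages are assumptions for illustration, and Orange may handle edge cases differently.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical ages, used only to demonstrate the rescaling.
ages = [17, 24, 36, 64]
normalized_ages = min_max_normalize(ages)
```

The relative spacing between values is preserved; only the scale changes.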

Missing value handling

Here we have three options: fill the missing values with the average or most frequent value, replace them with a random value, or remove the rows that contain missing values. The most effective of these is to fill in the average values.

Removed missing values and applied normalization to the interval [0,1]
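To make the options concrete, here is a small plain-Python sketch of average/most-frequent imputation and of dropping incomplete rows. It assumes missing entries are represented as `None`. The function names and sample values are hypothetical and do not reflect how Orange implements this internally.

```python
from statistics import mean, mode


def impute(values, strategy="mean"):
    """Fill None entries with the mean (numeric) or the most frequent value."""
    present = [v for v in values if v is not None]
    fill = mean(present) if strategy == "mean" else mode(present)
    return [fill if v is None else v for v in values]


def drop_missing(rows):
    """Keep only the rows that have no None values."""
    return [row for row in rows if None not in row.values()]


# Hypothetical columns with one missing entry each.
ages = impute([20, None, 30, 40], strategy="mean")
ethnicity = impute(["White", None, "Asian", "White"], strategy="most_frequent")
complete_rows = drop_missing([{"age": 20}, {"age": None}, {"age": 40}])
```

Imputation keeps every row, while dropping rows shrinks the dataset, which matters when many rows have a missing value.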

Feature Selection

Feature selection gives us the attributes that are most important for deciding the target value.

feature selection
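One common way to score attributes is information gain, which measures how much knowing a feature reduces the entropy of the target. The article does not say which scoring method was used, so treat the sketch below as one possible approach. The tiny feature table and labels are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical data: one feature separates the classes, the other does not.
labels = ["YES", "YES", "NO", "NO"]
features = {
    "gender": ["f", "m", "f", "m"],
    "A9_Score": [1, 1, 0, 0],
}
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
```

Features at the top of `ranking` carry the most information about the target, and the lowest-ranked ones are candidates for removal.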

After applying pre-processing,

we can see that the models were already very accurate before the preprocessing techniques were applied. Looking at one particular model in the figure below, however, the random forest model reaches 100% accuracy after preprocessing, which it did not reach before.

The testing model was applied to the dataset before and after data preprocessing.

Confusion Matrix

Confusion Matrix before and after data preprocessing.
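For readers unfamiliar with it, a confusion matrix counts how often each actual class was predicted as each class, and accuracy is the share of correct predictions. The sketch below computes both for a binary target in plain Python. The label values and sample predictions are invented for illustration and are not results from this dataset.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive="YES"):
    """Count true/false positives and negatives for a binary target."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


def accuracy(cm):
    """Fraction of predictions that match the actual class."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total


# Invented labels, only to show the computation.
actual = ["YES", "NO", "YES", "NO", "NO"]
predicted = ["YES", "NO", "NO", "NO", "YES"]
cm = confusion_matrix(actual, predicted)
acc = accuracy(cm)
```

Comparing the matrices before and after preprocessing shows which kind of error, false positives or false negatives, the preprocessing removed.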

Task 2

Generate the Dashboard of the preprocessed dataset from task-1.
Find the maximum data insights by plotting a bar chart, boxplot, pie plot, and stack plot using PowerBI dashboard visualization.

1. Pie chart: shows the countries Argentina and Aruba by Class/ASD, that is, the number of people who have ASD in each country.

2. Donut chart: shows the ratio of autism = No by gender, female and male, which is 56.17 and 53.83 percent respectively.

3. Stack chart: shows the countries Angola and Azerbaijan by their A9 score where autism = Yes.

4. Column bar chart: shows the result for people whose relation is Others and who are female.

5. KPI: shows the case where relation is Relative and gender is female when jaundice is No.
