18IT048 — Practical_Exam_Work
Task-1:
Dataset Description using Orange tool.
What needs to be done to improve the accuracy of the classification result for the given dataset? Get the maximum classification accuracy possible by applying the following methods.
Data Set Name: Autistic Spectrum Disorder Screening Data for Adult
https://archive.ics.uci.edu/ml/machine-learning-databases/00426/
Dataset information
Autistic Spectrum Disorder (ASD) is a neurodevelopmental condition associated with significant healthcare costs, and early diagnosis can significantly reduce these. Unfortunately, waiting times for an ASD diagnosis are lengthy and procedures are not cost-effective. The economic impact of autism and the increase in the number of ASD cases across the world reveal an urgent need for the development of easily implemented and effective screening methods. Therefore, a time-efficient and accessible ASD screening method is urgently needed to help health professionals and to inform individuals whether they should pursue a formal clinical diagnosis. The rapid growth in the number of ASD cases worldwide necessitates datasets related to behavioral traits.
In this dataset, we record ten behavioral features (AQ-10-Adult) plus ten individual characteristics that have proved to be effective in detecting ASD cases from controls in behavior science.
Encoding
Here we apply encoding, which replaces the categorical features with numeric indicator features, so that all of the data is in the form of 0 or 1. After encoding, machine learning models can process the data easily.
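The encoding in this report was done in Orange. As a rough illustration of the idea only, the sketch below one-hot encodes a single categorical column in plain Python. The function name one_hot_encode and the two toy rows are assumptions made for the example; they are not part of the Orange workflow or taken from the real data.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with 0/1 indicator columns."""
    # Collect the distinct categories once so every row gets the same columns.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator per category: 1 if the row has it, else 0.
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Toy rows (illustrative values only).
rows = [
    {"gender": "f", "A1_Score": 1},
    {"gender": "m", "A1_Score": 0},
]
encoded_rows = one_hot_encode(rows, "gender")
print(encoded_rows)
```

Each category becomes its own column, so no artificial ordering is introduced between categories.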
Normalization
Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values.
To normalize, I rescaled all of the data to the interval [0, 1].
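Rescaling to [0, 1] is commonly done with min-max normalization, x' = (x - min) / (max - min). The sketch below shows that formula in plain Python. It assumes this is the scaling that was applied; the function name min_max_normalize and the sample ages are made up for the example.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range to scale; map everything to 0.
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


# Illustrative values only, not taken from the dataset.
ages = [20, 30, 40, 60]
normalized_ages = min_max_normalize(ages)
print(normalized_ages)  # [0.0, 0.25, 0.5, 1.0]
```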
Missing value handling
There are three ways to handle missing values: fill them with the average or most frequent value, replace them with a random value, or remove the rows that contain them. Of these, filling in average values was the most effective.
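The three options can be sketched in plain Python as follows. This is an illustration under assumptions, not the Orange implementation: missing values are represented here as None, and the helper names and sample values are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def impute_mean(values):
    """Fill missing numeric values with the average of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def impute_most_frequent(values):
    """Fill missing categorical values with the most common known value."""
    known = [v for v in values if v is not None]
    fill = Counter(known).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def impute_random(values, rng):
    """Fill each missing value with a randomly chosen known value."""
    known = [v for v in values if v is not None]
    return [rng.choice(known) if v is None else v for v in values]


def drop_rows_with_missing(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Illustrative values only.
filled_ages = impute_mean([20, None, 40])
filled_ethnicity = impute_most_frequent(["Asian", None, "Black", "Asian"])
random_ages = impute_random([20, None, 40], random.Random(0))
complete_rows = drop_rows_with_missing(
    [{"age": 20, "gender": "f"}, {"age": None, "gender": "m"}]
)
```

Removing rows is the simplest option but discards data, which is one reason filling in values is often preferred on a small dataset.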
Feature Selection
Feature selection identifies the attributes that are most important for predicting the target value.
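One common way to score how important an attribute is for the target is information gain. The report does not state which scoring method was used in Orange, so the sketch below is only an assumed example of the general idea. The feature names and toy values are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Toy data: the first feature separates the classes perfectly, the second not at all.
labels = ["YES", "YES", "NO", "NO"]
features = {
    "A9_Score": [1, 1, 0, 0],
    "gender": ["f", "m", "f", "m"],
}
gains = {name: information_gain(values, labels) for name, values in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking)
```

Features at the top of the ranking are kept; those with a gain close to zero carry little information about the target.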
After applying pre-processing, we can see that the models were already quite accurate before the preprocessing techniques were applied. Looking at individual models, however, the figure below shows that the random forest model reaches 100% accuracy after preprocessing, which it did not reach before.
Confusion Matrix
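A confusion matrix counts, for each actual class, how many instances were predicted as each class, and accuracy is the share of counts on its diagonal. The sketch below computes both in plain Python. The labels and predictions are invented for illustration and are not the results obtained in Orange.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of instances on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Invented example: five instances, one NO case wrongly predicted as YES.
actual = ["YES", "NO", "NO", "YES", "NO"]
predicted = ["YES", "NO", "YES", "YES", "NO"]
matrix = confusion_matrix(actual, predicted, labels=["NO", "YES"])
print(matrix)            # [[2, 1], [0, 2]]
print(accuracy(matrix))  # 0.8
```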
Task 2
Generate the Dashboard of the preprocessed dataset from task-1.
Find the maximum data insights by plotting a bar chart, box plot, pie chart, and stacked chart using Power BI dashboard visualization.
1. Pie chart: shows Class/ASD for the countries Argentina and Aruba, that is, the number of people who have ASD in these countries.
2. Donut chart: shows the ratio of austim = No by gender, which is 56.17 percent for female and 53.83 percent for male.
3. Stacked chart: shows the countries Angola and Azerbaijan by A9 score where autism = yes.
4. Column bar chart: shows the result for people whose relation is Others and whose gender is female.
5. KPI: shows the records where the relation is Relative and the gender is female when jundice is no.