1. Web Scraping Using Python
Web scraping Flipkart’s smart band product listings
What is Web Scraping?
Web scraping is the use of a program or algorithm to extract and process large amounts of data from the web. Whether you are a data scientist, an engineer, or anybody who analyzes large datasets, the ability to scrape data from the web is a useful skill to have. Suppose you find data on the web and there is no direct way to download it. Web scraping with Python lets you extract that data into a form that can be imported and used elsewhere.
- I have used Python’s Beautiful Soup module for data extraction.
- For data manipulation and cleaning, I have used Python’s Pandas library.
What is Beautiful Soup?
Beautiful Soup is a Python library used in web scraping to pull data out of HTML and XML files. It builds a parse tree from the page source, which lets us extract data in a hierarchical and more readable way. With Beautiful Soup we can fetch data by HTML tag, class, id, CSS selector, and in many other ways.
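To make the lookup styles mentioned above concrete, here is a minimal sketch. The HTML snippet, the class names, and the id are all made up for illustration and are not taken from any real site.

```python
from bs4 import BeautifulSoup

# A tiny hand-written HTML snippet (hypothetical, not from a real page).
html = """
<div id="catalog">
  <div class="item"><span class="name">Band A</span><span class="price">1,999</span></div>
  <div class="item"><span class="name">Band B</span><span class="price">2,499</span></div>
</div>
"""

# Build the parse tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

by_tag = soup.find_all("span")                   # every <span> element
by_class = soup.find_all("span", class_="name")  # filter by class attribute
by_id = soup.find(id="catalog")                  # single element looked up by id
by_css = soup.select("div.item > span.price")    # CSS selector

names = [tag.get_text() for tag in by_class]
prices = [tag.get_text() for tag in by_css]
print(names)
print(prices)
```

Each call returns tag objects (or a list of them), so `get_text()` is what turns a match into a plain string.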
Steps to perform web scraping on any website
Step: 1
First, we import the necessary modules (pandas, csv, bs4, requests).
The requests library lets us fetch the content of a given URL.
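The imports for this step might look like the following. This is a sketch of a typical setup, assuming the third-party packages `requests`, `pandas`, and `beautifulsoup4` are already installed; the `pd` alias is only a common convention.

```python
import csv                     # standard-library CSV reader/writer
import requests                # HTTP client used to download the page
import pandas as pd            # tabular data handling and cleaning
from bs4 import BeautifulSoup  # HTML parsing
```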
Step: 2
After importing the necessary modules, specify the URL that contains the data and pass it to requests.get() to get the HTML of the page.
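A minimal sketch of this step is shown below. The search URL, the `fetch_soup` helper name, the User-Agent header, and the timeout value are assumptions made for illustration, so adjust them to the page you are scraping.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical search URL; replace it with the page you actually want to scrape.
URL = "https://www.flipkart.com/search?q=smart+band"


def fetch_soup(url, timeout=10):
    """Download a page and return it as a parsed BeautifulSoup tree."""
    # Assumption: some sites reject requests that lack a browser-like User-Agent.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return BeautifulSoup(response.text, "html.parser")
```

Calling `fetch_soup(URL)` performs the download and returns the parse tree used in the following steps. Checking the status code before parsing avoids silently scraping an error page.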
Step: 3
Once we have the HTML page, we have to decide which data we want to fetch and create one list for each item. Here I have extracted the smart band name, the price, and the discount currently available on the band.
Step: 4
Here, I first select the class block that contains all the details of a particular smart band, such as the band name, price, rating, reviews, and discount.
The browser's inspector shows the class name of any item on the page.
Then, for each smart band, we append the extracted values to the corresponding lists created earlier.
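The loop described above can be sketched as follows. To keep the example runnable offline it parses a hand-written snippet instead of the live page, and the class names `product-card`, `title`, `price`, and `discount` are hypothetical placeholders for whatever the inspector shows on the real page.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for a results page; all class names are hypothetical.
html = """
<div class="product-card">
  <div class="title">Band A</div><div class="price">Rs. 1,999</div><div class="discount">20% off</div>
</div>
<div class="product-card">
  <div class="title">Band B</div><div class="price">Rs. 2,499</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One list per field; index i in every list describes the same product.
band_names, prices, discounts = [], [], []


def text_or_none(parent, class_name):
    """Return the stripped text of the first matching child, or None if absent."""
    tag = parent.find("div", class_=class_name)
    return tag.get_text(strip=True) if tag else None


# Each card is the block that holds all details of one smart band.
for card in soup.find_all("div", class_="product-card"):
    band_names.append(text_or_none(card, "title"))
    prices.append(text_or_none(card, "price"))
    discounts.append(text_or_none(card, "discount"))

print(band_names, prices, discounts)
```

Appending `None` when a field is missing (the second band has no discount here) keeps the three lists the same length, which matters later when they are combined into one table.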
Step: 5
I added this step because the downloaded .csv file contained some unwanted text, which turned out to be caused by the Rs. symbol. Removing that symbol from the prices makes the output more readable.
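One possible way to strip the currency marker is sketched below. The `strip_currency` helper and the sample prices are hypothetical, and handling the "₹" sign in addition to "Rs." is an assumption about how the price may appear in the scraped text.

```python
def strip_currency(price_text, symbols=("Rs.", "₹")):
    """Remove currency markers and surrounding whitespace from a price string."""
    cleaned = price_text
    for symbol in symbols:
        cleaned = cleaned.replace(symbol, "")
    return cleaned.strip()


# Sample values for illustration only.
prices = ["Rs. 1,999", "₹2,499"]
clean_prices = [strip_currency(p) for p in prices]
print(clean_prices)  # ['1,999', '2,499']
```

The thousands separator is left in place here; remove the commas as well if the prices need to be converted to numbers.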
Step: 6
The next step is to convert the lists into a DataFrame and take a quick look at the first 5 rows using Pandas.
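A minimal sketch of this conversion follows. The sample values and the column names `Name`, `Price`, and `Discount` are made up for illustration.

```python
import pandas as pd

# Sample lists standing in for the scraped data (hypothetical values).
band_names = ["Band A", "Band B", "Band C"]
prices = ["1,999", "2,499", "999"]
discounts = ["20% off", None, "10% off"]

# Each dictionary key becomes a column; the lists must have equal length.
df = pd.DataFrame({"Name": band_names, "Price": prices, "Discount": discounts})

# head() returns the first 5 rows by default (all 3 rows here).
print(df.head())
```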
Step: 7
The last step is to write the data to a .csv file so that it can be reused, for example in software or an app that shows users a comparison chart.
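Writing the file could look like the sketch below. The file name `smart_bands.csv`, the sample rows, and the `utf-8-sig` encoding are assumptions; that encoding is chosen because it usually helps spreadsheet programs detect UTF-8 text correctly.

```python
import pandas as pd

# Sample data standing in for the scraped table (hypothetical values).
df = pd.DataFrame(
    {
        "Name": ["Band A", "Band B"],
        "Price": ["1,999", "2,499"],
        "Discount": ["20% off", "10% off"],
    }
)

# index=False leaves out the row numbers, so the file holds only the data columns.
df.to_csv("smart_bands.csv", index=False, encoding="utf-8-sig")

# Read the file back to confirm that it was written as expected.
round_trip = pd.read_csv("smart_bands.csv", encoding="utf-8-sig")
print(round_trip)
```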