1. Web Scraping Using Python

Mansi Khatri
4 min read · Jul 27, 2021


Web scraping smart band products on Flipkart

What is Web Scraping?

Web scraping is the use of a program or algorithm to extract and process large amounts of data from the web. Whether you are a data scientist, an engineer, or anybody who analyzes large datasets, the ability to scrape data from the web is a useful skill to have. If you find data on the web and there is no direct way to download it, web scraping with Python lets you extract that data into a form that can be imported and analyzed.

  • I have used Python’s Beautiful Soup module for data extraction.
  • For data manipulation and cleaning, I have used Python’s pandas library.

What is Beautiful Soup?

Beautiful Soup is a Python library used to pull data out of HTML and XML files. It builds a parse tree from the page source, which lets us extract data in a hierarchical and more readable manner. With Beautiful Soup we can fetch data by HTML tag, class, id, CSS selector, and in many other ways.
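To make those lookup styles concrete, here is a minimal sketch. The HTML snippet, the class names, and the product names are all made up for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# A tiny hand-written HTML snippet (hypothetical, not taken from a real page).
html = """
<div id="catalog">
  <div class="product"><a class="name" href="/p/1">Band A</a><span class="price">Rs. 1,999</span></div>
  <div class="product"><a class="name" href="/p/2">Band B</a><span class="price">Rs. 2,499</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# By tag: the first <a> element in the document.
first_link = soup.find("a")

# By class: every <div> whose class attribute includes "product".
products = soup.find_all("div", class_="product")

# By id: the single element with id="catalog".
catalog = soup.find(id="catalog")

# By CSS selector: the text of every price nested inside a product block.
prices = [tag.get_text(strip=True) for tag in soup.select("div.product span.price")]

print(first_link.get_text())
print(len(products))
print(catalog.name)
print(prices)
```

`find` returns the first match (or `None`), while `find_all` and `select` return lists, so the latter two are the natural choice when a page repeats the same block for every product.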

Steps to perform web scraping on any website

Step: 1

First, we import the necessary modules (pandas, csv, bs4, requests).

Using the requests library, we can fetch the content from the URL given.

Importing the necessary modules
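The original screenshot is not available, so the following is only a sketch of what the import cell might look like. The third-party packages are published on PyPI as `pandas`, `beautifulsoup4`, and `requests`; `csv` ships with Python.

```python
import csv                      # standard-library CSV reader/writer

import pandas as pd             # tabular data cleaning and export
import requests                 # HTTP client used to download the page
from bs4 import BeautifulSoup   # HTML parser
```

If the data is exported through pandas, the `csv` import may turn out to be optional.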

Step: 2

After importing the necessary modules, specify the URL of the page that contains the data and pass it to requests.get() to get the HTML of the page.

URL of Website
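A possible shape for this step is sketched below. The URL is a hypothetical search address used only as a placeholder (the article's actual URL is not shown), and the `User-Agent` header and the helper name `fetch_html` are assumptions of this sketch.

```python
import requests

# Hypothetical placeholder URL; substitute the page you actually want to scrape.
URL = "https://www.flipkart.com/search?q=smart+band"

# Assumed header: some sites reject requests that lack a browser-like User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML source as text."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text
```

Calling `html = fetch_html(URL)` would then give the page source as a string, ready to hand to `BeautifulSoup(html, "html.parser")`.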

Step: 3

After getting the HTML page, we have to decide which data we want to fetch and create a list for each item. Here, I extract the smart band name, the price, and the discount currently available on the band.

List of data that we want to extract

Step: 4

Here, I first select the block (identified by its class) that contains all the details of a particular smart band, such as the band name, price, rating, reviews, and discount.

Using the browser's inspector, we can find the class name of any item.

Next, we append each extracted value to the corresponding list created in Step 3, so that entries at the same position in each list belong to the same smart band.
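Steps 3 and 4 could be sketched as follows. Because the class names on a live shopping page are site-specific and change over time, this sketch runs against a small stand-in HTML string; `product-card`, `product-name`, `product-price`, and `product-discount` are made-up names, as is the helper `text_or_none`.

```python
from bs4 import BeautifulSoup

# Stand-in HTML with made-up class names; inspect the live page for the real ones.
html = """
<div class="product-card">
  <div class="product-name">Band A</div>
  <div class="product-price">Rs. 1,999</div>
  <div class="product-discount">33% off</div>
</div>
<div class="product-card">
  <div class="product-name">Band B</div>
  <div class="product-price">Rs. 2,499</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 3: one list per field we want to keep.
names, prices, discounts = [], [], []


def text_or_none(block, class_name):
    """Return the stripped text of the first matching child, or None if absent."""
    tag = block.find("div", class_=class_name)
    return tag.get_text(strip=True) if tag else None


# Step 4: walk over each product block and append its fields to the lists.
for card in soup.find_all("div", class_="product-card"):
    names.append(text_or_none(card, "product-name"))
    prices.append(text_or_none(card, "product-price"))
    discounts.append(text_or_none(card, "product-discount"))

print(names, prices, discounts)
```

Appending `None` for a missing field (the second band has no discount here) keeps the three lists the same length, which matters later when they become columns of one table.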

Step: 5

I added this step because the downloaded .csv file contained some unwanted text. I found that it was caused by the Rs. symbol, so I used this code to remove the symbol and make the data more readable.

Removing the Rs. Symbol
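One way to do this cleaning is sketched below. It assumes prices arrive as strings such as `₹1,999` or `Rs. 1,999` with whole-rupee amounts; the function name `clean_price` and the sample values are illustrative only.

```python
import re

# Matches the rupee sign, "Rs" / "Rs.", thousands separators, and whitespace.
_CURRENCY_NOISE = re.compile(r"₹|Rs\.?|,|\s")


def clean_price(raw: str) -> int:
    """Strip the currency marker and separators, leaving a whole-rupee integer."""
    return int(_CURRENCY_NOISE.sub("", raw))


# Made-up sample values covering the assumed input formats.
prices = ["₹1,999", "Rs. 2,499", "Rs 999"]
cleaned = [clean_price(p) for p in prices]
print(cleaned)
```

Storing the result as an integer rather than a trimmed string also makes it possible to sort or compare prices later.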

Step: 6

The next step is to convert the lists into a pandas DataFrame and take a quick look at the first five rows.

First five records that we have fetched
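A minimal sketch of this conversion, using made-up values in place of the scraped lists and assumed column names:

```python
import pandas as pd

# Placeholder data standing in for the lists filled during scraping.
names = ["Band A", "Band B", "Band C", "Band D", "Band E", "Band F"]
prices = [1999, 2499, 999, 3499, 1499, 2999]
discounts = ["33% off", "20% off", None, "10% off", "50% off", None]

# Each list becomes one column; all lists must have the same length.
df = pd.DataFrame({"Name": names, "Price": prices, "Discount": discounts})

# head() returns the first five rows by default.
print(df.head())
```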

Step: 7

The last step is to save the data to a .csv file so that it can be used in real applications, for example software or an app that shows users a comparison chart.

Convert the data into .csv File
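The export might look like the sketch below. The file name `smart_bands.csv`, the sample rows, and the choice of the `utf-8-sig` encoding are assumptions; that encoding is one common way to help spreadsheet programs display non-ASCII characters correctly.

```python
import os
import tempfile

import pandas as pd

# Placeholder rows standing in for the scraped data.
df = pd.DataFrame(
    {"Name": ["Band A", "Band B"], "Price": [1999, 2499], "Discount": ["33% off", "20% off"]}
)

# Written to the system temp directory so the sketch runs anywhere; any path works.
csv_path = os.path.join(tempfile.gettempdir(), "smart_bands.csv")

# index=False omits the row-number column from the file.
df.to_csv(csv_path, index=False, encoding="utf-8-sig")

# Read the file back to confirm the round trip.
restored = pd.read_csv(csv_path, encoding="utf-8-sig")
print(restored)
```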
