
Web Scraping

  • Writer: Abhishek
  • Jun 8
  • 2 min read

Updated: Jun 18


As a data analyst, you will often come across interesting data on a website that you would like to analyze. Copying it by hand is rarely feasible, so you automate the collection by writing code that does it for you. This process is called web scraping.


However, scraping can raise serious ethical and legal concerns if performed irresponsibly. This report provides an overview of web scraping, highlights an example of legal scraping, and emphasizes the importance of adhering to cyber laws and website policies.


2. How Web Scraping Works


Web scraping involves:


  • Sending an HTTP request to a website.

  • Downloading the HTML content.

  • Parsing and extracting relevant data using tools like:

    • BeautifulSoup (Python)

    • Scrapy (Python)

    • Selenium (automated browser control)
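The three steps above can be sketched in a few lines of Python. This is only a minimal illustration: it uses the standard library (urllib.request and html.parser) rather than the tools listed above, so it runs without installing anything. The names fetch_html and extract_links, and the User-Agent string, are made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Steps 1 and 2: send an HTTP GET request and download the HTML."""
    # The User-Agent value is a hypothetical example; identify your own scraper.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Step 3: parse the HTML and collect (text, href) for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (link text, href) pairs found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

Calling extract_links(fetch_html(url)) on a page you are allowed to scrape chains the three steps together. In practice, a library such as BeautifulSoup makes the parsing step much shorter.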


3. Importance of Abiding by Cyber Laws When Scraping


3.1 Website Terms of Service (ToS)

Most websites publish clearly outlined Terms of Service. Ignoring them may constitute:

  • Breach of contract

  • Unauthorized access under various cyber laws.


3.2 Legal Risks

The consequences of illegal web scraping depend on the jurisdiction:

  • In the United States, violating the Computer Fraud and Abuse Act (CFAA) can result in civil or criminal liability.

  • In Europe, scraping personal data without consent may violate the GDPR.

  • In India, scraping without authorization can fall foul of the Information Technology (IT) Act.


3.3 Ethical Concerns

  • Web scraping can involve sending hundreds of thousands of requests to a server, which can overload it and disrupt its services.
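One common way to reduce this load is to pause between requests. The sketch below is an assumption about how that might look, not a prescribed method: fetch_politely is a hypothetical helper, the one-second default delay is an arbitrary choice, and fetch stands for whatever download function you already use.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    `fetch` is any callable that downloads one URL. `sleep` is injectable
    so the pause can be replaced in tests.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results.append(fetch(url))
    return results
```

With three URLs and the default settings, the helper makes three requests and pauses twice, so the server never sees a burst of back-to-back calls.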


3.4 Respecting Robots.txt

  • Most websites publish a public file, robots.txt, at the root of the website. The file tells web crawlers whether, and where, they are allowed.

  • Ethical scrapers should always respect these directives.
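Python's standard library can check these directives for you through urllib.robotparser. The sketch below parses a robots.txt given as text. The rules in EXAMPLE_ROBOTS_TXT, the example.com URLs, and the helper name is_allowed are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every crawler may visit everything except /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Against a live site, you would instead call set_url() with the address of the site's robots.txt, then read(), and then can_fetch() before each request.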


4. Web Scraping Example


Website: Books to Scrape

Description: To avoid ethical and legal concerns, we will use this website, which was created specifically for practicing web scraping. It contains a catalog of books with details like price, availability, and ratings.


Why it's legal:

  • The site is explicitly designed for scraping.

  • Its robots.txt file allows scraping.

  • The owner encourages its use for educational and testing purposes.


Objective: To extract book title, price, and availability information from the homepage.


We will now write the code. I used Google Colab for this project and then exported the results to an Excel file.


The following link takes you to a GitHub repository where you can find the complete code needed to scrape the data from the website: https://github.com/asorari09/Web-scraping
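The repository holds the actual code used for this project. Separately from it, here is a rough sketch of how the extraction step could look with only the standard library. It runs on a small hand-written HTML fragment, not on the live site. The fragment assumes each book sits in an article tag with class product_pod, with the title in a link's title attribute and the price and availability in p tags with classes price_color and availability. The book titles and prices in it are invented. It writes CSV text, which Excel can open, rather than a native Excel file.

```python
import csv
import io
from html.parser import HTMLParser

# Hand-written sample; the markup layout is an assumption, the values are made up.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="book-1.html" title="Example Book One">Example Book ...</a></h3>
  <p class="price_color">£10.00</p>
  <p class="instock availability"><i class="icon-ok"></i>
      In stock
  </p>
</article>
<article class="product_pod">
  <h3><a href="book-2.html" title="Example Book Two">Example Book ...</a></h3>
  <p class="price_color">£24.50</p>
  <p class="instock availability"><i class="icon-ok"></i>
      In stock
  </p>
</article>
"""


class BookParser(HTMLParser):
    """Collect title, price, and availability from each product_pod article."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._book = None    # record for the <article> currently open
        self._field = None   # "price" or "availability" while inside that <p>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self._book = {"title": "", "price": "", "availability": ""}
        elif self._book is None:
            return
        elif tag == "a" and attrs.get("title"):
            self._book["title"] = attrs["title"]
        elif tag == "p" and "price_color" in classes:
            self._field = "price"
        elif tag == "p" and "availability" in classes:
            self._field = "availability"

    def handle_data(self, data):
        if self._book is not None and self._field is not None:
            self._book[self._field] += data

    def handle_endtag(self, tag):
        if tag == "p":
            self._field = None
        elif tag == "article" and self._book is not None:
            # Collapse runs of whitespace left over from the HTML layout.
            self.books.append({k: " ".join(v.split()) for k, v in self._book.items()})
            self._book = None


def parse_books(html):
    """Return one dict per book found in the HTML string."""
    parser = BookParser()
    parser.feed(html)
    return parser.books


def books_to_csv(books):
    """Serialize the book records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price", "availability"])
    writer.writeheader()
    writer.writerows(books)
    return buffer.getvalue()
```

Here parse_books(SAMPLE_HTML) returns one record per book, and books_to_csv turns those records into text you can save with a .csv extension.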


The results are as follows:


[Image: scraped results]


5. Conclusion


Web scraping is a powerful tool for deriving valuable insights from online data. However, it must be conducted responsibly, ethically, and legally.

By using an open, scrape-friendly website like Books to Scrape, as in the example above, or by leveraging official APIs, developers and researchers can avoid risk while benefiting from the immense possibilities of web scraping.

 
 
 
