Web Scraping with Python using BeautifulSoup
Introduction to Web Scraping
Web scraping is the automated extraction of data from websites so that it can be analyzed or stored. It is useful when working with publicly available web data for research or learning purposes.
Required Libraries
Install these packages if not already available:
pip install requests
pip install beautifulsoup4
Basic Scraping Example
First, import the necessary modules:
import requests
from bs4 import BeautifulSoup
Here's a simple example to fetch a webpage:
target_url = 'http://example.com'
page_response = requests.get(target_url, timeout=10)  # timeout avoids waiting indefinitely on a slow server
print(f"Status code: {page_response.status_code}")
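The fetch step above can be wrapped in a small helper that handles timeouts and HTTP errors. This is a minimal sketch, not a definitive implementation: the function name `fetch_html`, the 10-second default timeout, and the optional `session` parameter are assumptions made for illustration.

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Return the page body as text, or None if the request fails.

    `session` is a hypothetical hook: any object with a requests-style
    get(url, timeout=...) method, such as requests.Session().
    """
    http = session or requests
    try:
        response = http.get(url, timeout=timeout)
        # Raise an exception for 4xx/5xx status codes
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    return response.text
```

Calling `fetch_html('http://example.com')` should return the page's HTML as a string when the request succeeds, and None when the connection fails or the server returns an error status.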
Parsing Page Content
Extract and parse HTML content:
page_content = page_response.content
parsed_content = BeautifulSoup(page_content, 'html.parser')
# Extract page title
print(f"Page title: {parsed_content.title.string}")
# Find specific elements
main_content = parsed_content.find(id="main-content")
if main_content is not None:  # find() returns None when no element matches
    for item in main_content.find_all('h2'):
        print(item.text)
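To try the parsing step without a network connection, the same calls can be run against an inline HTML string. The sketch below assumes a made-up page (`SAMPLE_HTML`) and a hypothetical helper name (`extract_headings`); only `BeautifulSoup`, `find`, `find_all`, and `get_text` come from the library itself.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <div id="main-content">
      <h2>First heading</h2>
      <h2>Second heading</h2>
    </div>
  </body>
</html>
"""


def extract_headings(html, container_id="main-content"):
    """Return the text of every <h2> inside the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        # No matching element: return an empty list instead of failing
        return []
    return [h2.get_text(strip=True) for h2 in container.find_all("h2")]


print(extract_headings(SAMPLE_HTML))  # ['First heading', 'Second heading']
```

Returning an empty list when the container is missing keeps the caller's loop simple, since pages often lack the element a scraper expects.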
Practical Applications
- Extract news headlines from news portals
- Gather product information from e-commerce sites
- Collect research data from academic websites
Practice Exercises
- Scrape the top 10 movie titles from a movie ranking site
- Extract table data from a sample scraping practice website
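As a starting point for the table exercise, here is one possible way to turn an HTML table into a list of rows. It is a sketch under stated assumptions: the table markup in `SAMPLE_TABLE` and the function name `table_to_rows` are invented for illustration, and a real practice site will likely need different selectors.

```python
from bs4 import BeautifulSoup

# Made-up table used only for illustration
SAMPLE_TABLE = """
<table>
  <tr><th>Rank</th><th>Title</th></tr>
  <tr><td>1</td><td>Movie A</td></tr>
  <tr><td>2</td><td>Movie B</td></tr>
</table>
"""


def table_to_rows(html):
    """Return the first table in the HTML as a list of rows of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Collect header (<th>) and data (<td>) cells in document order
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


for row in table_to_rows(SAMPLE_TABLE):
    print(row)
```

The first row holds the header cells, so the remaining rows can be paired with it to build dictionaries if named fields are more convenient.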