Analyzing and Visualizing Retail Sales Data with Python
Prerequisites and Objectives
This exercise requires the installation of the numpy, pandas, and matplotlib libraries. The primary goals are to perform operations on CSV files, conduct data analysis using pandas, and create visualizations with matplotlib.
Generating Simulated Sales Data
The following script creates a CSV file named data.csv containing simulated daily sales data for a retail store throughout the year 2022.
import csv
import random
from datetime import date, timedelta
output_file = 'data.csv'
base_sales = 500
with open(output_file, 'w', newline='') as file:
writer = csv.writer(file)
writer.writerow(['Date', 'Revenue']) # Write header
current_date = date(2022, 1, 1)
for day in range(365):
daily_revenue = base_sales + (day * 5) + random.randint(0, 99)
writer.writerow([current_date.isoformat(), daily_revenue])
current_date += timedelta(days=1)
Data Loading and Cleaning
Load the generated data into a pandas DataFrame and remove any rows with missing values.
import pandas as pd
sales_data = pd.read_csv('data.csv')
sales_data.dropna(inplace=True)
Creating a Daily Revenue Line Chart
Generate a line plot showing daily revenue trends and save it as daily_sales.jpg.
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
plt.plot(sales_data['Date'], sales_data['Revenue'], color='steelblue')
plt.title('Daily Store Revenue for 2022')
plt.xlabel('Date')
plt.ylabel('Revenue')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('daily_sales.jpg', dpi=100)
plt.show()
Craeting a Monthly Revenue Bar Chart
Aggregate the data by month, create a bar chart, and save it as monthly_sales.jpg.
sales_data['Date'] = pd.to_datetime(sales_data['Date'])
sales_data['Month'] = sales_data['Date'].dt.month
monthly_totals = sales_data.groupby('Month')['Revenue'].sum()
plt.figure(figsize=(10, 6))
plt.bar(monthly_totals.index, monthly_totals.values, color='seagreen')
plt.title('Total Revenue by Month')
plt.xlabel('Month')
plt.ylabel('Total Revenue')
plt.xticks(range(1, 13))
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.savefig('monthly_sales.jpg')
plt.show()
Identifying the Month with the Largest Revenue Increase
Calculate the month-over-month revenue change, find the largest increase, and write the corresponding month to a text file.
# Calculate revenue for each month
monthly_revenue = sales_data.groupby('Month')['Revenue'].sum().reset_index()
# Calculate the difference from the previous month
monthly_revenue['Previous_Month_Revenue'] = monthly_revenue['Revenue'].shift(1)
monthly_revenue['Increase'] = monthly_revenue['Revenue'] - monthly_revenue['Previous_Month_Revenue']
# Find the month with the maximum increase
max_increase_row = monthly_revenue.loc[monthly_revenue['Increase'].idxmax()]
target_month = int(max_increase_row['Month'])
# Write result to file
with open('peak_growth_month.txt', 'w') as f:
f.write(f'{target_month}')
Creating a Quarterly Revenue Pie Chart
Group the data by fiscal quarter, create a pie chart showing the distribution, and save it as quarterly_distribution.jpg.
# Assign quarter based on month
sales_data['Quarter'] = sales_data['Date'].dt.quarter
quarterly_totals = sales_data.groupby('Quarter')['Revenue'].sum()
quarter_labels = ['Q1', 'Q2', 'Q3', 'Q4']
colors = ['gold', 'lightcoral', 'lightskyblue', 'lightgreen']
explode = (0.05, 0, 0, 0) # Slightly separate the first slice
plt.figure(figsize=(8, 8))
plt.pie(quarterly_totals.values, labels=quarter_labels, colors=colors,
autopct='%1.1f%%', startangle=140, explode=explode)
plt.title('2022 Revenue Distribution by Quarter')
plt.savefig('quarterly_distribution.jpg')
plt.show()