This project is a comprehensive SQL + Python case study using a real-world e-commerce dataset to showcase business intelligence, data analysis, and storytelling skills. It is designed to demonstrate end-to-end capabilities for potential clients on UpWork or hiring managers.
- Source: Brazilian E-Commerce Public Dataset by Olist
- Description: Includes orders, customers, payments, reviews, products, sellers, and geolocation data from a multi-vendor marketplace.
- Analyze growth trends: revenue, order volume, and average order value
- Segment customers using RFM analysis
- Assess retention and churn behavior over time
- Deliver visual insights and business-ready metrics
- PostgreSQL
- Python (Pandas, Matplotlib, Seaborn)
- Jupyter Notebook
- VSCode (SQLTools Extension)
- Tracked total revenue and order volume by month
- Calculated and visualized average order value over time
- Output saved as
monthly_revenue.csv,monthly_order_count.csv, andmonthly_avg_order_value.csv
SELECT
TO_CHAR(o.order_purchase_timestamp, 'YYYY-MM') AS order_year_month,
ROUND(SUM(oi.price + oi.freight_value), 2) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_status = 'delivered'
GROUP BY order_year_month
ORDER BY order_year_month;
Figure 1: This line chart displays the total revenue generated from delivered orders over time. It highlights seasonal fluctuations, growth patterns, and potential dips in sales performance.
- Recency, Frequency, and Monetary scores calculated using PostgreSQL
- Customers classified into business segments:
| RFM Profile | Segment |
|---|---|
| R ≥ 3, F ≥ 3, M ≥ 3 | Champions |
| R ≥ 3, F ≥ 2 | Loyal |
| R = 4, F = 1 | New Customers |
| R ≤ 2, F ≥ 3 | At Risk |
| M ≤ 2, F ≤ 2 | Low Value |
| else | Other |
-
Segment-level metrics saved and visualized
-
Outputs:
customer_segments.csv,segment_metrics.csv, andsegment_analysis.png
Figure 2: The chart illustrates how different customer groups contribute to business value — helping identify high-volume vs. high-value targets like Champions and Loyal Customers, while spotlighting low-revenue or at-risk groups for strategic re-engagement.
This visual combines:
- 🔷 Blue bars: total number of customers in each segment
- 🟧 Orange bars: total revenue (in thousands of BRL) generated by that segment
- 🟢 Green line: average spend per customer in the segment
- Built cohort-based retention analysis by month
WITH customer_orders AS (
SELECT
c.customer_unique_id,
o.order_id,
TO_CHAR(o.order_purchase_timestamp, 'YYYY-MM') AS order_month
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_status = 'delivered'
),
cohorts AS (
SELECT
customer_unique_id,
MIN(order_month) AS cohort_month
FROM customer_orders
GROUP BY customer_unique_id
)
SELECT
c.customer_unique_id,
ch.cohort_month,
cu.order_month
FROM customer_orders cu
JOIN cohorts ch ON cu.customer_unique_id = ch.customer_unique_id
JOIN customers c ON c.customer_unique_id = cu.customer_unique_id;- Created a retention heatmap showing repeat behavior over time
Figure 3: This heatmap visualizes customer retention by monthly cohorts. The low-intensity values across most cohorts confirm that a majority of customers made only a single purchase and did not return in subsequent months. This insight highlights a key business challenge: limited repeat engagement, suggesting potential for targeted retention strategies or loyalty initiatives.
All key trends and patterns are visualized using matplotlib and seaborn for easy interpretation and presentation.
project-root/
│
├── data/ # Raw CSV files from Kaggle
├── sql/
│ ├── create_tables.sql # SQL schema creation
│ └── query_tables/ # Output CSVs and visual assets
├── notebooks/
│ └── ecommerce_analysis.ipynb # Full end-to-end notebook
└── README.md
│
└── figs/ # Saved plotted figures
- SQL-based data modeling and analytics
- Python data wrangling and visualization
- RFM segmentation and cohort analysis
- Business insights communication
- Reproducible project structure for clients or employers
🧠 Use This Project To:
- Showcase data storytelling on UpWork
- Submit a polished example in data analyst interviews
- Customize for future marketplace or CRM analytics projects
Created with 💼 job-readiness in mind.