A real-world, end-to-end data engineering project — from raw SQL Server data to a polished Power BI semantic model, using Medallion Architecture, PySpark, and metadata-driven pipelines.
🎯 What this series covers:
- End-to-end architecture on Microsoft Fabric (Lakehouse + Warehouse + Pipelines)
- Medallion Architecture — Landing (Bronze), Transient (Silver), Curated (Gold)
- Metadata-driven pipeline design with config tables for scalable ingestion
- Two-workspace strategy — Dev vs Prod separation for enterprise governance
- PySpark notebooks, Delta Lake upserts, and T-SQL Warehouse writes
- Star Schema + DAX Semantic Model consumed by Power BI
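The metadata-driven pattern from the list above can be sketched in plain Python: a config table lists the source tables, and one generic loop drives every load. This is a minimal sketch under assumed names (the table names, config columns, and `@last_watermark` parameter are illustrative, not the project's actual schema; in Fabric the config would live in a Lakehouse or Warehouse table and the loop in a pipeline ForEach).

```python
# Sketch of metadata-driven ingestion: a config "table" (here a list of
# dicts) drives one generic loop instead of one hand-built pipeline per table.

CONFIG = [  # illustrative entries, not the project's actual config schema
    {"source_table": "dbo.bookings", "target": "landing.bookings", "load_type": "full"},
    {"source_table": "dbo.trips", "target": "landing.trips",
     "load_type": "incremental", "watermark_col": "trip_end_ts"},
    {"source_table": "dbo.drivers", "target": "landing.drivers", "load_type": "full"},
]

def build_query(entry: dict) -> str:
    """Build the extraction query for one config entry (stands in for the copy activity)."""
    if entry["load_type"] == "incremental":
        # Incremental loads filter on a per-table watermark column.
        return (f"SELECT * FROM {entry['source_table']} "
                f"WHERE {entry['watermark_col']} > @last_watermark")
    return f"SELECT * FROM {entry['source_table']}"

def run_pipeline(config: list[dict]) -> dict[str, str]:
    """One generic loop: onboarding a new source table is just a new config row."""
    return {entry["target"]: build_query(entry) for entry in config}

queries = run_pipeline(CONFIG)
```

The payoff of this design is that ingestion scales with config rows, not with pipeline count.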
🏗️ The Business Problem:
A fast-growing ride-sharing company needed to turn raw transactional SQL Server data into an analytics platform that could answer questions like: Which hours see the most cancellations? Which drivers perform best? Where is revenue leaking?
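The first of those questions reduces to a group-by-hour aggregation over booking events. A minimal pure-Python sketch (the field names `booked_at` and `status` are assumptions; the series answers this in PySpark and DAX over the star schema):

```python
from collections import Counter
from datetime import datetime

# Sample booking rows; field names are illustrative, not the source schema.
bookings = [
    {"booked_at": datetime(2024, 5, 1, 18, 5), "status": "cancelled"},
    {"booked_at": datetime(2024, 5, 1, 18, 40), "status": "cancelled"},
    {"booked_at": datetime(2024, 5, 1, 9, 15), "status": "completed"},
    {"booked_at": datetime(2024, 5, 1, 18, 55), "status": "completed"},
]

def cancellations_by_hour(rows: list[dict]) -> list[tuple[int, int]]:
    """Count cancelled bookings per hour of day, busiest hours first."""
    counts = Counter(r["booked_at"].hour for r in rows if r["status"] == "cancelled")
    return counts.most_common()
```

In the curated layer the same logic becomes a fact-table aggregation sliced by a time dimension.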
📐 Architecture Overview:
- Source: On-Premises SQL Server (cab bookings, trips, drivers)
- Bridge: On-Premises Data Gateway → Microsoft Fabric
- Ingestion: pl_ingest_landing → Lakehouse (Landing schema · Delta)
- Transform: PySpark Notebooks → Transient → Curated layers
- Warehouse: dwh_taxi_business (edw schema · T-SQL)
- Reporting: Semantic Model + Power BI (Star Schema)
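The Transient-to-Curated hop above leans on Delta Lake upserts. The MERGE semantics can be sketched in plain Python keyed on a business key (the `driver_id` key and column names are assumptions; in the notebooks this maps to Delta's `DeltaTable.merge(...).whenMatchedUpdateAll().whenNotMatchedInsertAll()`):

```python
def upsert(target: dict[int, dict], updates: list[dict], key: str = "driver_id") -> dict[int, dict]:
    """Plain-Python sketch of MERGE semantics: rows whose key matches an
    existing row are updated in place; unmatched rows are inserted."""
    merged = {k: dict(v) for k, v in target.items()}
    for row in updates:
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return merged

# Curated table before the merge (illustrative data).
curated = {1: {"driver_id": 1, "name": "Asha", "rating": 4.8}}
incoming = [
    {"driver_id": 1, "rating": 4.9},                 # matched -> update
    {"driver_id": 2, "name": "Ben", "rating": 4.5},  # not matched -> insert
]
curated = upsert(curated, incoming)
```

The same matched/not-matched split is what keeps the Curated layer idempotent when a pipeline run is replayed.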