End-to-End Data Engineering Portfolio

Oil Price Pipeline

12 sequential stages, from raw market data to an interactive portfolio: a PostgreSQL data warehouse, a DuckDB/Parquet medallion lakehouse, a FastAPI service with dual PostgreSQL and DuckDB backends, Docker, Kubernetes, CI/CD, and Prometheus monitoring.

Yahoo Finance → Ingestor → PostgreSQL → Lakehouse → DuckDB → FastAPI → Portfolio


Architecture

System Diagrams

Six architecture diagrams generated with Mermaid CLI.

Stage Explorer

12 Stages

Each stage lists its implementation details, tech stack, and key decisions.

Designed a dimensional data warehouse using a Star Schema with one central fact table and three dimension tables. Followed Kimball methodology: surrogate integer keys, SCD Type 2 for slowly changing commodity attributes, and a date dimension pre-populated with 10+ years of calendar data.

PostgreSQL, SQL, Star Schema
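The SCD Type 2 handling described above can be sketched in Python, with the standard-library sqlite3 module standing in for PostgreSQL. This is a minimal sketch under assumptions, not the project's code: the columns other than `commodity_key`, `valid_from`, and `valid_to`, the `9999-12-31` sentinel for the current row, the half-open validity interval, the example ticker, and the helper name `upsert_commodity_scd2` are all hypothetical.

```python
import sqlite3
from datetime import date

# Hypothetical dim_commodity layout (sqlite3 stands in for PostgreSQL here).
DDL = """
CREATE TABLE dim_commodity (
    commodity_key INTEGER PRIMARY KEY,  -- surrogate integer key
    symbol        TEXT NOT NULL,        -- natural key, e.g. a Yahoo ticker (assumed)
    name          TEXT NOT NULL,
    unit          TEXT NOT NULL,
    valid_from    TEXT NOT NULL,        -- inclusive start date (ISO 8601)
    valid_to      TEXT NOT NULL         -- exclusive end date, sentinel marks current
);
"""
OPEN_END = "9999-12-31"  # assumed sentinel marking the current version


def upsert_commodity_scd2(conn: sqlite3.Connection, symbol: str, name: str,
                          unit: str, effective: date) -> int:
    """Apply an SCD Type 2 change and return the current surrogate key."""
    eff = effective.isoformat()
    current = conn.execute(
        "SELECT commodity_key, name, unit FROM dim_commodity "
        "WHERE symbol = ? AND valid_to = ?",
        (symbol, OPEN_END),
    ).fetchone()
    if current is not None:
        key, old_name, old_unit = current
        if (old_name, old_unit) == (name, unit):
            return key  # nothing changed: reuse the current version
        # Attributes changed: close the current version on the effective date.
        conn.execute(
            "UPDATE dim_commodity SET valid_to = ? WHERE commodity_key = ?",
            (eff, key),
        )
    # Insert the new version as the open-ended current row.
    cur = conn.execute(
        "INSERT INTO dim_commodity (symbol, name, unit, valid_from, valid_to) "
        "VALUES (?, ?, ?, ?, ?)",
        (symbol, name, unit, eff, OPEN_END),
    )
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    upsert_commodity_scd2(conn, "BZ=F", "Brent Crude", "USD/bbl", date(2020, 1, 1))
    upsert_commodity_scd2(conn, "BZ=F", "Brent Crude Oil", "USD/bbl", date(2022, 1, 1))
    for row in conn.execute("SELECT * FROM dim_commodity"):
        print(row)
```

Reloading unchanged attributes returns the existing surrogate key, while a changed attribute closes the old row and opens a new one, so fact rows loaded earlier keep pointing at the version that was current when they were recorded.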

Key Highlights

  • dim_date with ISO weekday, quarter, and fiscal-year columns
  • dim_commodity with SCD Type 2 (valid_from / valid_to)
  • UNIQUE constraint on (date_key, commodity_key, source_key)
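The highlights above can be sketched together, again with sqlite3 in place of PostgreSQL: a generator that pre-populates `dim_date`, and a hypothetical `fact_price` table whose UNIQUE constraint on `(date_key, commodity_key, source_key)` makes re-running a load idempotent through `ON CONFLICT ... DO NOTHING`, syntax that both PostgreSQL and SQLite 3.24+ accept. The YYYYMMDD `date_key`, the column names, the October fiscal-year start, and the helpers `date_rows` and `load_prices` are assumptions for illustration.

```python
import sqlite3
from datetime import date, timedelta

FISCAL_YEAR_START_MONTH = 10  # assumption: fiscal year starts in October

DDL = """
CREATE TABLE dim_date (
    date_key     INTEGER PRIMARY KEY,  -- YYYYMMDD (assumed convention)
    full_date    TEXT NOT NULL,
    iso_weekday  INTEGER NOT NULL,     -- Monday = 1 ... Sunday = 7
    quarter      INTEGER NOT NULL,     -- calendar quarter 1-4
    fiscal_year  INTEGER NOT NULL
);
CREATE TABLE fact_price (
    date_key      INTEGER NOT NULL REFERENCES dim_date (date_key),
    commodity_key INTEGER NOT NULL,
    source_key    INTEGER NOT NULL,
    close_price   REAL,
    UNIQUE (date_key, commodity_key, source_key)
);
"""


def date_rows(start: date, end: date):
    """Yield one dim_date row per calendar day from start to end inclusive."""
    day = start
    while day <= end:
        # Fiscal years are labelled by the calendar year in which they end.
        starts_new_fy = (FISCAL_YEAR_START_MONTH > 1
                         and day.month >= FISCAL_YEAR_START_MONTH)
        yield (
            int(day.strftime("%Y%m%d")),
            day.isoformat(),
            day.isoweekday(),
            (day.month - 1) // 3 + 1,
            day.year + 1 if starts_new_fy else day.year,
        )
        day += timedelta(days=1)


def load_prices(conn: sqlite3.Connection, rows) -> None:
    """Insert fact rows, silently skipping any grain that is already loaded."""
    conn.executemany(
        "INSERT INTO fact_price (date_key, commodity_key, source_key, close_price) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT (date_key, commodity_key, source_key) DO NOTHING",
        rows,
    )
```

Loading the same batch twice leaves the fact table unchanged, which is what makes a retried ingestion run safe.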

Live Dashboard

Data Visualisation

Charts connect to the live API when available, falling back to pre-computed sample data on this static site.
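The live-or-sample behaviour is a simple fallback pattern. The site itself is built with Next.js, so the Python below only illustrates the pattern and is not the frontend's code: it tries a JSON endpoint with a short timeout and returns caller-supplied sample data on common network or decoding errors. The function name `fetch_with_fallback` and its parameters are hypothetical.

```python
import json
import urllib.request


def fetch_with_fallback(url: str, fallback, timeout: float = 2.0):
    """Return (data, source): live JSON from url, else the bundled fallback."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp), "live"
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, refused connections and timeouts.
        # ValueError covers malformed URLs and invalid JSON bodies.
        return fallback, "sample"
```

A chart would render `data` either way and could use `source` to label itself as showing live or sample figures.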

API Reference

REST Endpoints

5 data endpoints across the dual PostgreSQL and DuckDB backends, each with documented parameters and a sample response.

Technologies

Full Stack

20 technologies across data engineering, API, infrastructure, and frontend.

Data

PostgreSQL

Data warehouse — Star Schema, stored procedures

DuckDB

In-process analytics over Parquet (Gold layer)

Apache Parquet

Columnar storage format for lakehouse layers

PyArrow

Schema-typed Parquet writes with Hive partitioning

yfinance

Yahoo Finance OHLCV data source
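Hive partitioning, used for the PyArrow writes, stores each partition in a `key=value` directory so that readers such as DuckDB can recover partition columns from the path alone. The stdlib sketch below only illustrates that layout; the partition columns (`commodity`, `year`), the `lake/silver/prices` base path, the ticker value, and the helper names are assumptions, and the project's actual writer is PyArrow rather than this code.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, unquote


def hive_partition_dir(base: str, **partitions) -> PurePosixPath:
    """Build a Hive-style directory such as base/commodity=X/year=2024."""
    path = PurePosixPath(base)
    for column, value in partitions.items():
        # Percent-encode values so characters like '/' cannot add directory levels.
        path /= f"{column}={quote(str(value), safe='')}"
    return path


def parse_hive_partitions(path: str) -> dict:
    """Recover {column: value} from every key=value segment of a path."""
    values = {}
    for segment in PurePosixPath(path).parts:
        column, sep, value = segment.partition("=")
        if sep:
            values[column] = unquote(value)
    return values
```

Values round-trip as strings in this sketch, so a typed partition column such as `year` needs an explicit cast on read.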

API

FastAPI

Async REST API with dual PostgreSQL + DuckDB backends

Pydantic v2

Request validation, settings management

psycopg v3

Async PostgreSQL adapter with connection pooling

structlog

Structured JSON logging

Infra

Docker

Multi-service containerization with named volumes

Kubernetes

29 manifests: Deployments, Jobs, PVCs, HPA

Helm

Templated chart with environment-specific values

Prometheus

5 metrics scraped from MetricsMiddleware

Grafana

Auto-provisioned 10-panel API dashboard

GitHub Actions

6-job CI/CD: lint → test → build → deploy

ruff

Python linter + formatter (rules E/W/F/I/UP/B/SIM)

Frontend

Next.js 14

Static portfolio with App Router

Recharts

Interactive data visualisation charts

Framer Motion

Scroll-triggered animations and transitions

Mermaid

6 architecture diagrams exported to PNG