Get to know me

About Me

I work at the intersection of data engineering and analytics, designing pipelines that keep stakeholders out of CSV hell. My day-to-day includes building Azure Databricks Medallion architectures, orchestrating ingestion with ADF, and modelling trusted Power BI datasets that Finance, Operations, and Marketing use every day.

I enjoy turning ambiguous business questions into concrete models, metrics, and dashboards – from near real-time drink league boards at The DRG to SCD2 dimensions in Databricks and governed Snowflake warehouses. My portfolio reflects end-to-end thinking: ingestion, modelling, optimisation, BI, and documentation.

Experience

~4 years across Data Engineering and Data Analysis

Education

MSc in Advanced Computer Science with Data Science (UK) with a focus on cloud, Python, and modern BI; strong foundation in statistics and SQL.

Currently

Building Azure-centric pipelines at The DRG, expanding portfolio projects in Databricks, dbt, Snowflake and streaming.

Recent journey

Work Experience

Data Engineer

The DRG · UK

May 2024 – Present

  • Designed parameterised ADF pipelines + Databricks notebooks for EPOS, bookings and rota data, enabling idempotent, drift-tolerant ingestion into Bronze/Silver layers.
  • Migrated MySQL workloads to Azure Database for PostgreSQL with partitioning and indexing, cutting key BI query latency by ~35%.
  • Automated daily extraction from ResDiary, Rotaready and other APIs, standardising file layouts and removing manual CSV wrangling.
  • Modelled governed gold datasets for Power BI (e.g. Drink League Board, voucher tracking), cutting refresh failures and boosting stakeholder adoption.
  • Reduced manual reporting by ~16 hours/week through orchestration, scheduled refreshes and distribution to 20+ sites.
  • Cut daily data refresh time by ~50%, improving BI readiness for Finance and Ops.

Data Analyst

NSEIT Limited

Apr 2021 – Mar 2022

  • Automated SQL/Python ingestion and preprocessing, saving ~10 hrs/week and speeding release cycles.
  • Co-designed a governed preprocessing framework, standardising rules and reducing rework across releases.
  • Implemented recurring reconciliations and data quality checks, improving stakeholder trust in reports.
  • Built 20+ reports and dashboards supporting operational and regulatory decision-making.

Data Analyst

Swami Vivekanand Education and Research Centre

Mar 2020 – Mar 2021

  • Centralised student and operational data into structured models for analysis and reporting.
  • Built dashboards to track admissions, performance and attendance, helping optimise academic planning.
  • Collaborated directly with non-technical stakeholders, translating requirements into metrics and visuals.

What I work with

Skills

Cloud / Platform

Azure · Snowflake

Azure (ADLS Gen2, Databricks, ADF, Logic Apps, SQL DB, SQL Warehouse), Snowflake for analytics warehousing and dbt-driven transformations.

Azure Databricks · Azure ADF · ADLS Gen2 · Snowflake

Data Engineering

Pipelines

ETL/ELT, CDC, Medallion architecture, Delta Live Tables, SCD2 dims, Type-1 facts, data quality rules and observability.

Medallion · Delta Lake · DLT · SCD2

Orchestration

Automation

Azure Data Factory (Copy, Mapping Data Flows, SHIR), Logic Apps, GitHub Actions and Databricks Asset Bundles for CI/CD.

ADF Pipelines · GitHub Actions · Databricks Asset Bundles

Programming & Query

Python · SQL

Python (Pandas, requests, automation), SQL (PostgreSQL, MySQL), DAX, REST / HTTP APIs, JSON / XML handling.

Python · PostgreSQL · DAX · REST APIs

Analytics & BI

Insights

Power BI (DAX, Row-Level Security, modelling), advanced Excel, and clear stakeholder-facing data stories.

Power BI · RLS · Excel

Other

Ecosystem

Docker, Git/GitHub, documentation & runbooks, plus a strong focus on reproducible, production-friendly data workflows.

Docker · GitHub · Runbooks

Selected work

Projects

A mix of Azure, Databricks, ADF, dbt, Snowflake and streaming projects that demonstrate end-to-end thinking.

End to End Azure Databricks Data Engineering Project Azure · Databricks · DLT

End to End Azure Databricks Data Engineering Project

This project is a full Azure Data Engineering build that ingests from a cloud-hosted Azure SQL Database into ADLS Gen2 using Azure Data Factory (ADF) with incremental loading and backfilling (not a full refresh). Data is refined in Azure Databricks with Spark Structured Streaming + Autoloader, governed by Unity Catalog, and modeled into a star schema with Slowly Changing Dimensions (SCD Type 2). The Gold layer is curated via Delta Live Tables (DLT), and deployments follow CI/CD best practices using Databricks Asset Bundles and GitHub. Logic Apps provide email alerts on ADF failures. The project also covers the full resource setup (RG, Storage with bronze/silver/gold, ADF, SQL DB, Databricks workspace).
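The incremental-loading pattern described above can be sketched in plain Python. This is an illustrative sketch, not the project's actual ADF/Databricks code: the `incremental_extract` helper and the `modified_at` column name are assumptions standing in for the watermark column tracked by the pipeline.

```python
from datetime import datetime

def incremental_extract(rows, last_watermark):
    """Return only rows newer than the stored watermark, plus the new watermark.

    Re-running with the same watermark re-selects the same slice, so the
    load is repeatable; the watermark only advances after a successful batch.
    """
    batch = [r for r in rows if r["modified_at"] > last_watermark]
    new_watermark = max((r["modified_at"] for r in batch), default=last_watermark)
    return batch, new_watermark

source = [
    {"id": 1, "modified_at": datetime(2024, 5, 1)},
    {"id": 2, "modified_at": datetime(2024, 5, 3)},
    {"id": 3, "modified_at": datetime(2024, 5, 5)},
]

# Only rows changed after the last run (2024-05-02) are picked up.
batch, wm = incremental_extract(source, datetime(2024, 5, 2))
```

Backfilling then amounts to running the same extraction with an earlier watermark, rather than truncating and reloading the target.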

Airbnb Lakehouse Pipeline S3 Snowflake dbt AWS S3· Snowflake · dbt

Airbnb Lakehouse Pipeline S3 Snowflake dbt

This project showcases a full ELT pipeline built with AWS S3 → Snowflake → dbt, designed around a Bronze/Silver/Gold modeling approach. I ingested Airbnb-style CSV datasets into Snowflake using stages and COPY INTO, then used dbt to build incremental raw models (Bronze), cleaned and enriched models (Silver), and analytics-ready models (Gold). The Gold layer includes a metadata-driven One Big Table (OBT) for maintainable joins and a star schema powered by SCD2 snapshots to preserve dimensional history for accurate time-based reporting.
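The SCD2 history that the dbt snapshots maintain boils down to a simple rule: expire the open record when a tracked value changes, and append a new open version. A minimal Python sketch of that rule, with hypothetical `listing_id`/`value`/`valid_from`/`valid_to` column names:

```python
from datetime import date

def scd2_apply(history, incoming, key, today):
    """Apply SCD Type-2 semantics: close changed rows, append new versions."""
    current = {r[key]: r for r in history if r["valid_to"] is None}
    out = list(history)
    for row in incoming:
        cur = current.get(row[key])
        if cur is not None and cur["value"] == row["value"]:
            continue                      # unchanged: keep the open record
        if cur is not None:
            cur["valid_to"] = today       # expire the previous version
        out.append({**row, "valid_from": today, "valid_to": None})
    return out

history = [{"listing_id": 7, "value": 100,
            "valid_from": date(2024, 1, 1), "valid_to": None}]
# A changed value for listing 7 expires the old row and opens a new one.
history = scd2_apply(history, [{"listing_id": 7, "value": 120}],
                     "listing_id", date(2024, 6, 1))
```

Point-in-time reporting then filters on `valid_from`/`valid_to` instead of overwriting dimensional attributes.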

Real Time Stock Market Data Pipeline Kafka · Snowflake · dbt

Real Time Stock Market Data Pipeline

I built a real-time Stock Market Data Pipeline that streams live quotes end-to-end for fast, reliable analytics. Kafka ingests events to an S3 raw landing zone, and Airflow micro-batches them into Snowflake via stage-based PUT/COPY with idempotent loads. In Snowflake, dbt applies the Medallion pattern (Bronze/Silver/Gold), converting JSON VARIANT into clean, tested, lineage-tracked models. The entire stack runs in Docker Compose, enabling one-command spin-up and easy portability across environments. The result is ~1-minute KPIs and candlestick views for BI, with strong governance, observability, and clear separation of concerns.
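Idempotent micro-batch loading, as used above, means a replayed batch must not double-load data. A toy sketch of that guarantee in pure Python (the file name and `load_batch` helper are hypothetical; in the real pipeline Snowflake's load history plays the role of the `already_loaded` set):

```python
def load_batch(files, already_loaded, table):
    """Load only files not seen before, so re-runs are no-ops (idempotent)."""
    loaded = []
    for name, rows in files:
        if name in already_loaded:
            continue                      # re-delivered file: skip it
        table.extend(rows)
        already_loaded.add(name)
        loaded.append(name)
    return loaded

table, seen = [], set()
batch = [("quotes_0001.json", [{"sym": "AAPL", "px": 190.1}])]

load_batch(batch, seen, table)
load_batch(batch, seen, table)   # replayed delivery: nothing loads twice
```

The same property is what lets Airflow safely retry a failed task without producing duplicate rows downstream.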

Databricks Medallion Lakehouse — Declarative Pipelines (DLT) Databricks · DLT

Databricks Medallion Lakehouse — Declarative Pipelines (DLT)

This project delivers an end-to-end Databricks Lakehouse built entirely with Lakeflow Declarative Pipelines (DLT). It implements the Medallion pattern: Bronze landing with Expectations, Silver enrichment via Auto-CDC (Type-1 upserts) exposed through stable views, and Gold with SCD2 dimensions, a Type-1 fact table, and a full-history materialized business view. The project supports streaming and batch in one pipeline, comes with reusable utilities, and provides SQL scripts to seed/increment data plus screenshots of each stage.
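Two of the ideas above, expectations on Bronze and Type-1 upserts into Silver, can be illustrated outside DLT. This is a hedged sketch with invented names (`expect_valid`, `type1_upsert`, an `order_id`/`qty` schema), not the pipeline's actual DLT code:

```python
def expect_valid(rows, predicate):
    """Split rows into (kept, dropped), mirroring a drop-on-violation expectation."""
    kept = [r for r in rows if predicate(r)]
    dropped = [r for r in rows if not predicate(r)]
    return kept, dropped

def type1_upsert(target, updates, key):
    """Type-1 semantics: the newest version of each key overwrites the old."""
    merged = {r[key]: r for r in target}
    for r in updates:
        merged[r[key]] = r                # no history kept, unlike SCD2
    return list(merged.values())

silver = [{"order_id": 1, "qty": 2}]
incoming = [{"order_id": 1, "qty": 3}, {"order_id": 2, "qty": -1}]

# The negative-quantity row fails the expectation and is quarantined.
kept, dropped = expect_valid(incoming, lambda r: r["qty"] > 0)
silver = type1_upsert(silver, kept, "order_id")
```

In the real pipeline, the `dropped` side is what the expectation metrics surface for observability, while the Gold SCD2 dimensions preserve the history that Type-1 discards.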

Azure Data Factory Hybrid Medallion Lakehouse Azure · ADF · Medallion

Azure Data Factory Hybrid Medallion Lakehouse (On-Prem + API + SQL)

This project is a production-style Azure Data Factory (ADF) build that ingests data from on-prem file shares, REST APIs, and Azure SQL Database into ADLS Gen2. It models the data using the Medallion architecture: Bronze – fast, schema-light landing (CSV/JSON/Parquet/Delta) for raw capture; Silver – standardized Delta with data cleaning, type casting, derivations, and upsert logic keyed by business IDs; Gold – curated business views (joins, aggregations, dense ranking, Top-N) refreshed with overwrite. Orchestration is handled by a Parent Pipeline that chains the three ingestion paths and kicks off transformations. Hybrid connectivity to the on-prem share is enabled via Self-Hosted Integration Runtime (SHIR). A publish folder is included for ARM-based CI/CD.
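The Gold-layer "dense ranking, Top-N" logic mentioned above has a compact equivalent outside SQL. A minimal sketch, assuming a hypothetical per-site revenue table (in ADF this runs as a Mapping Data Flow / SQL transformation, not Python):

```python
def dense_rank_top_n(rows, value_key, n):
    """Dense-rank rows by value (ties share a rank) and keep ranks <= n."""
    ranked = []
    rank, prev = 0, None
    for row in sorted(rows, key=lambda r: r[value_key], reverse=True):
        if row[value_key] != prev:
            rank += 1                 # dense: no gaps after ties
            prev = row[value_key]
        if rank > n:
            break
        ranked.append({**row, "rank": rank})
    return ranked

sales = [
    {"site": "A", "revenue": 900},
    {"site": "B", "revenue": 1200},
    {"site": "C", "revenue": 900},
    {"site": "D", "revenue": 500},
]

# B ranks 1; A and C tie at rank 2; D falls outside the Top-2.
top2 = dense_rank_top_n(sales, "revenue", 2)
```

Because ties share a rank, a Top-N view can return more than N rows, which is usually the desired behaviour for leaderboard-style business views.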

dbt Databricks SCD2 Data Quality dbt · Databricks

dbt Databricks SCD2 Data Quality

A production-style dbt Core project on Databricks (Unity Catalog) implementing the Medallion architecture (Bronze → Silver → Gold). Raw sources are declared in YAML and landed to Bronze, standardized and joined in Silver, and delivered as analytics marts in Gold with SCD Type-2 history via dbt snapshots. Quality is enforced through generic, singular, and custom generic tests, plus seeds for lookups and Jinja macros for reusable logic. The repo uses environment-driven profiles.yml, clear lineage, incremental dbt models, and a CI-friendly dbt build flow, making it reliable, auditable, and deployment-ready.

More projects coming soon…

Get in touch

Contact Me

Whether it’s a data engineering role, a project collaboration, or just a chat about Azure & Databricks, I’d be happy to connect.

Online

Actively exploring data roles in the UK & EU

Best way to reach me: email or a short LinkedIn message with context about your stack and challenges.