Data Engineer with a passion for building at scale

Ranjan Kumar Choubey

I am a Data Engineer at Deloitte, currently building enterprise-scale data pipelines for Walmart's data platform on Google Cloud. I engineer ingestion pipelines that load 100+ tables from SAP systems into BigQuery, build Scala-based transformation JARs that process billions of rows, and orchestrate 20+ Airflow DAGs.

I hold an M.Tech in Computer Science (Data Science) from the Indian Statistical Institute, Kolkata — where I secured All India Rank 11 in the entrance exam. Before ISI, I spent 2 years at Tata Power building backend APIs, data pipelines, ML models, and automated reporting systems.

3+ Years Experience
AIR 11 ISI Entrance Exam
100+ Tables Ingested
20+ Airflow DAGs

Where I've worked

My professional journey in data engineering and software development.

Consultant (Data Engineer), Deloitte
May 2025 – Present
Client: Walmart — Enterprise Data Platform (Ingestion & Consumption Layer)
  • Engineered ingestion pipelines for 100+ tables: SAP-ECC/EWM/CAR → ICDS → GCS Raw Zone → Catalog Zone → BigQuery External Tables for downstream analytics.
  • Built Scala-based JARs to process billions of rows, encapsulating complex BigQuery transformation logic for the consumption layer.
  • Optimized SQL query logic across the consumption layer, reducing query execution time by 40% and cutting compute costs on BigQuery.
  • Designed and managed 20+ Apache Airflow DAGs with retry logic, dependency management, SLA monitoring, and failure alerting.
  • Automated DAG and CCM configuration by building a Python tool that detects column sensitivity, maps schemas, and provisions GCS buckets.
  • Maintained CI/CD pipelines using Git for version-controlled deployments across dev, staging, and production.
Data Science Intern, Allstate India
May 2024 – Jul 2024
  • Optimized baseline XGBoost model for payment default prediction, improving accuracy from 77% to 85% via geographic feature engineering.
  • Analyzed geographic data to identify default trends; presented to Bengaluru & US teams, leading to 3 targeted model enhancements.
Lead Engineer, Tata Power
Sep 2022 – Aug 2023
  • Created ML forecasting model for day-ahead electricity trading on IEX, achieving 88% prediction accuracy.
  • Engineered automated pipeline integrating real-time plant telemetry (PI Server) with IEX market data, reducing manual collection by 50%.
  • Designed backend REST APIs using FastAPI and Node.js to serve data for 10+ Power BI dashboards.
Graduate Engineer Trainee, Tata Power
Aug 2021 – Sep 2022
  • Deployed ML models to on-premise servers, managing the full lifecycle from development to production with CI/CD.
  • Built email automation for employee onboarding that extracts and parses onboarding details and updates the database, eliminating manual entry for 100+ employees/year.
  • Automated Power BI reporting for HR, operations, and engineering teams, reducing manual effort by 20%.

Academic background

Strong foundation in computer science and data science.

M.Tech — Computer Science (Data Science)
Indian Statistical Institute, Kolkata
2023 – 2025
AIR 11 in ISI Entrance
AIR 1 EWS Category
B.Tech — Information Technology
BIT Sindri, Dhanbad
2017 – 2021
CGPA 8.89, Rank 1
BITSAA Scholar '18 & '19

Tech stack

Technologies and tools I work with daily.

Languages
Python, Scala, SQL, Bash
Data Engineering
Apache Airflow, Apache Spark, ETL/ELT, Data Modeling, CDC
Cloud & Warehousing
GCP, BigQuery, GCS, Dataproc
Databases
BigQuery, PostgreSQL, MySQL, SQL Server
Backend & APIs
FastAPI, Node.js, REST APIs
DevOps & Tools
Docker, Git, CI/CD, Linux, Power BI
ML & Analytics
Machine Learning, Deep Learning, PyTorch, NLP

Real projects, real impact

Highlights from my professional experience.

Deloitte / Walmart

Enterprise Data Pipeline

End-to-end ingestion pipeline for 100+ SAP tables, flowing through the GCS Raw Zone and Catalog Zone into BigQuery External Tables for downstream analytics.

BigQuery, GCS, Airflow, Scala, Python

Deloitte / Walmart

DAG Config Automation Tool

Python tool that auto-detects column sensitivity, maps target schemas, renames SAP columns with business descriptions, and provisions GCS buckets — replacing hours of manual setup.

Python, Airflow, Automation, GCS

Deloitte / Walmart

BigQuery Query Optimization

Optimized SQL logic across the consumption layer for large-scale datasets, reducing query execution time by 40% and cutting compute costs.

BigQuery, SQL, Performance

Tata Power

IEX Energy Trading Forecast

ML forecasting model for day-ahead electricity trading on Indian Energy Exchange, achieving 88% accuracy with real-time PI Server integration.

Python, ML, FastAPI, Power BI

Tata Power

Automated Reporting System

End-to-end automated Power BI dashboards for HR, customer operations, and engineering teams. REST APIs with FastAPI and Node.js serving 10+ dashboards.

Power BI, FastAPI, Node.js, REST APIs

Allstate India

Payment Default Prediction

Optimized an XGBoost model, improving payment default prediction accuracy from 77% to 85% through geographic feature engineering and hyperparameter tuning.

XGBoost, Python, Feature Engineering

Let's connect

Open to Data Engineering opportunities at product companies.