Portfolio

Projects

End-to-end data projects — from raw, messy data to decisions that matter.

5 Completed
7 Projects
100% Open Source
01
Churn Analytics · RFM · Feature Engineering · Visualization
✓ Completed

Customer Churn Analysis & Segmentation

Why do customers leave — and who is most at risk right now? This end-to-end project answers both questions using machine learning and RFM segmentation, turning raw e-commerce data into a clear, actionable retention playbook.

The Problem

The business had no systematic way to identify which customers were drifting toward churn or to distinguish loyal customers from one-time buyers — making retention efforts scattershot and expensive.

My Approach

Combined RFM feature engineering with a logistic regression churn model optimised for recall. K-Means clustering then grouped customers into actionable segments, such as loyal, at-risk, and inactive, which are surfaced through an interactive dashboard.
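As a rough illustration of the RFM step, the sketch below turns a transaction log into recency, frequency, and monetary values per customer. The record layout, customer IDs, and reference date are hypothetical assumptions, not the project's actual schema.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, order_date, amount).
# The record layout and the reference date below are illustrative assumptions.
transactions = [
    ("c1", date(2024, 1, 5), 120.0),
    ("c1", date(2024, 3, 1), 80.0),
    ("c2", date(2023, 11, 20), 40.0),
    ("c3", date(2024, 2, 14), 300.0),
    ("c3", date(2024, 2, 28), 150.0),
    ("c3", date(2024, 3, 10), 90.0),
]


def rfm_features(rows, reference_date):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_order, frequency, monetary = {}, {}, {}
    for customer, order_date, amount in rows:
        # Keep the most recent order date seen for each customer.
        if customer not in last_order or order_date > last_order[customer]:
            last_order[customer] = order_date
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0.0) + amount
    return {
        customer: ((reference_date - last_order[customer]).days,
                   frequency[customer],
                   monetary[customer])
        for customer in last_order
    }


features = rfm_features(transactions, reference_date=date(2024, 3, 31))
print(features["c2"])  # (132, 1, 40.0)
```

These raw values would usually be scaled before they feed a logistic regression or K-Means, since spend and day counts sit on very different ranges.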

Key Outcomes

  • High-recall churn model that minimises missed at-risk customers
  • Inactivity & refund behaviour identified as strongest churn signals
  • 5 distinct segments enabling targeted, cost-efficient retention campaigns
  • Dashboard connecting each insight directly to a recommended action
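One common way to make a classifier "high-recall" is to lower its decision threshold until a target recall is met on validation data. The sketch below assumes that approach; the scores, labels, and the 0.75 target are made up, and the page does not say exactly how the project tuned for recall.

```python
def recall_at(threshold, scores, labels):
    """Share of actual churners (label 1) whose churn score is >= threshold."""
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    caught = sum(1 for s in positive_scores if s >= threshold)
    return caught / len(positive_scores)


def pick_threshold(scores, labels, target_recall=0.9):
    """Highest observed score usable as a cut-off that still reaches target_recall.

    Recall can only rise as the threshold falls, so scanning candidate
    thresholds from high to low and stopping at the first hit finds the
    strictest cut-off that meets the target.
    """
    for threshold in sorted(set(scores), reverse=True):
        if recall_at(threshold, scores, labels) >= target_recall:
            return threshold
    return min(scores)


# Hypothetical validation-set churn probabilities and true labels (1 = churned).
scores = [0.95, 0.80, 0.72, 0.60, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(pick_threshold(scores, labels, target_recall=0.75))  # 0.6
```

Picking the strictest threshold that still meets the recall target keeps false alarms, and therefore wasted retention spend, as low as the recall goal allows.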

Tech Stack

Python · Pandas · scikit-learn · K-Means · Seaborn · Logistic Regression · Tableau
Ecommerce_project_portfolio.twb
Customer Churn Dashboard

Interactive Dashboard

Open in Tableau Desktop or the free Tableau Public Desktop to explore segment views, churn breakdown, and every filter.

Download Dashboard (.twb)

Free viewer: Tableau Public Desktop

ML Model — Live Demo
Launch Churn Predictor →
Logistic Regression · K-Means Clustering

Input a customer's RFM profile and get an instant churn probability score, customer segment label, and a recommended retention action — powered by the model trained in this project.

Inputs: Days inactive · Order frequency · Total spend
Outputs: Churn % · Segment · Action
02
Fraud Analytics · Machine Learning · scikit-learn · Visualization
✓ Completed

Fraud Detection Command Centre

Financial fraud hides in behavioural patterns — not just transaction amounts. This end-to-end system engineers the right signals, trains a model to catch them, and surfaces everything through a real-time command centre built for fraud analysts.

The Problem

Rule-based fraud detection flags too many false positives and misses evolving patterns. The challenge was to build a behavioural model that stays accurate on imbalanced, imperfect real-world data, and to communicate its findings clearly to non-technical stakeholders.

My Approach

Engineered behavioural features such as transaction velocity, device/IP repetition, and time-window anomalies. Applied SMOTE to address class imbalance, compared three models, selected Random Forest for its precision-recall balance, and fed its output into Tableau.
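Transaction velocity is typically computed as a rolling count per account. The sketch below assumes events arrive as `(account_id, unix_timestamp)` pairs sorted by time and uses a one-hour trailing window; both the event format and the window length are illustrative assumptions rather than the project's exact feature definition.

```python
from collections import defaultdict, deque


def velocity_per_window(events, window_seconds=3600):
    """Count each account's transactions in the trailing window.

    events: (account_id, unix_timestamp) pairs sorted by timestamp.
    Returns one count per event, including the event itself.
    """
    recent = defaultdict(deque)
    counts = []
    for account, ts in events:
        window = recent[account]
        window.append(ts)
        # Evict timestamps that are window_seconds or more older than this event.
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        counts.append(len(window))
    return counts


# Hypothetical event stream: account "a" bursts, account "b" is quiet.
events = [("a", 0), ("a", 600), ("b", 700), ("a", 1200), ("a", 4000)]
print(velocity_per_window(events))  # [1, 2, 1, 3, 3]
```

A deque per account keeps each update cheap, because stale timestamps only ever leave from the front of the window.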

Key Outcomes

  • Random Forest achieving strong precision-recall on imbalanced data
  • High-frequency burst patterns identified as primary fraud signal
  • Device and IP repetition flags catching repeat offenders
  • Tableau dashboard showing fraud trends, risk entities & financial impact
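Device and IP repetition can be expressed as "how many distinct accounts share this entity". The sketch below flags any device or IP used by at least a chosen number of accounts; the field names and the threshold of 3 are hypothetical choices for illustration.

```python
from collections import defaultdict


def shared_entity_flags(transactions, key, min_accounts=3):
    """Flag entities (a device ID or IP) used by at least min_accounts distinct accounts."""
    accounts_per_entity = defaultdict(set)
    for tx in transactions:
        accounts_per_entity[tx[key]].add(tx["account"])
    return {entity: len(accounts) >= min_accounts
            for entity, accounts in accounts_per_entity.items()}


# Hypothetical transactions; field names are illustrative.
txs = [
    {"account": "a1", "device": "d1", "ip": "10.0.0.1"},
    {"account": "a2", "device": "d1", "ip": "10.0.0.1"},
    {"account": "a3", "device": "d1", "ip": "10.0.0.2"},
    {"account": "a4", "device": "d2", "ip": "10.0.0.2"},
]
print(shared_entity_flags(txs, "device"))  # {'d1': True, 'd2': False}
```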

Tech Stack

Python · Random Forest · XGBoost · SMOTE · Tableau · Feature Engineering
Fraud_project_portfolio.twb
Fraud Detection Dashboard

Interactive Dashboard

Open in Tableau Desktop or the free Tableau Public Desktop to explore fraud trends, entity risk scores, and estimated financial impact.

Download Dashboard (.twb)

Free viewer: Tableau Public Desktop

ML Model — Live Demo
Launch Fraud Detector →
Random Forest · SMOTE · Feature Engineering

Simulate a transaction by adjusting velocity, amount, device signals, and timing. The model returns an instant fraud probability score, primary risk signal, and recommended action.

Inputs: Transaction amount · Velocity / hour · Devices used
Outputs: Fraud % · Risk level · Action
03
ROI Analytics · Visualization · Machine Learning · Data Cleaning
✓ Completed

Multi-Channel Marketing Attribution & ROI Optimization

A data-driven case study analysing multi-channel customer journeys and campaign performance to quantify channel contribution and predict revenue outcomes. Combines probabilistic attribution modelling with machine learning to support more efficient marketing budget allocation.

The Problem

Traditional attribution models such as last-click oversimplify customer journeys, misrepresenting the true contribution of channels. Businesses risk misallocating marketing budgets and overlooking inefficiencies in campaign performance.

My Approach

Built a Markov Chain attribution framework to capture transition behaviour across channels and quantify contribution using removal effects. In parallel, trained a tuned Random Forest revenue prediction model with time-aware validation for realistic forecasting.
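The removal-effect idea can be shown on a toy first-order Markov chain: estimate transition probabilities from journeys, compute the probability of reaching a conversion, then recompute it with each channel removed (treated as a drop-off). The journeys below are invented, and this is a generic sketch of the technique rather than the project's implementation.

```python
from collections import defaultdict

ABSORBING = ("conv", "null")


def transition_probs(paths):
    """First-order transition probabilities from journeys ending in 'conv' or 'null'."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        states = ["start"] + path
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {src: {dst: n / sum(nxt.values()) for dst, n in nxt.items()}
            for src, nxt in counts.items()}


def conversion_prob(probs, removed=None, iterations=200):
    """P(reach 'conv' from 'start'); a removed channel behaves like 'null'."""
    value = defaultdict(float)
    value["conv"] = 1.0
    for _ in range(iterations):  # fixed-point iteration on the absorbing chain
        for state, nxt in probs.items():
            if state == removed:
                continue  # stays at 0: journeys entering it never convert
            value[state] = sum(p * (0.0 if dst == removed else value[dst])
                               for dst, p in nxt.items())
    return value["start"]


def attribution_shares(paths):
    """Normalised removal effects per channel."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = {c for path in paths for c in path if c not in ABSORBING}
    effects = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}


# Hypothetical journeys; each ends in a conversion or a drop-off.
paths = [
    ["search", "email", "conv"],
    ["search", "null"],
    ["social", "search", "conv"],
    ["social", "null"],
]
shares = attribution_shares(paths)
print({c: round(s, 3) for c, s in sorted(shares.items())})
# {'email': 0.273, 'search': 0.545, 'social': 0.182}
```

Unlike last-click, which would credit only `email` and `search` here, the removal effect also gives `social` credit for feeding journeys into `search`.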

Key Outcomes

  • Quantified true channel contribution using Markov Chain attribution
  • Achieved ~0.79 R² in revenue prediction on unseen data
  • CPA identified as the dominant driver of revenue
  • Demonstrated that efficiency metrics outperform raw engagement metrics
  • Built ROI-driven marketing budget allocation framework
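Time-aware validation means every test block lies strictly after the data used to train on it. The expanding-window splitter below is a minimal stdlib sketch of that idea, similar in spirit to scikit-learn's `TimeSeriesSplit`; the row counts and fold sizes are arbitrary assumptions.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs over time-ordered rows.

    Each test block comes strictly after its training rows, and the training
    window grows by one block per split, so no future data leaks backwards.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough rows for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        yield list(range(test_start)), list(range(test_start, test_start + test_size))


# Ten time-ordered rows (e.g. weeks of campaign data), three folds of two rows each.
for train_idx, test_idx in expanding_window_splits(10, n_splits=3, test_size=2):
    print(len(train_idx), test_idx)
# 4 [4, 5]
# 6 [6, 7]
# 8 [8, 9]
```

A shuffled K-fold split on the same data would let the model see later weeks while predicting earlier ones, which tends to inflate R².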

Tech Stack

Python · Markov Chain · Random Forest · Feature Engineering · Power BI
Multi-Channel Marketing Attribution
Marketing Attribution Dashboard

Interactive Dashboard

Open in Power BI Desktop to explore marketing attribution insights, customer journey behaviour, and data-driven opportunities for maximising campaign ROI.

Download Dashboard (.pbix)

Free viewer: Power BI Desktop

ML Model — Live Demo
Launch ROI Predictor →
Random Forest · Markov Chain Attribution

Input campaign metrics — spend, impressions, CPA, and channel mix — and get a predicted revenue outcome plus recommended budget reallocation based on channel attribution scores.

Inputs: Channel spend · CPA / conversions · Channel mix
Outputs: Revenue $ · ROI % · Best channel
04
Supply Chain Analytics · Demand Forecasting · Time Series · Visualization
⚡ In Progress

Demand Forecasting & Inventory Optimization System

Addressing critical supply chain inefficiencies — stockouts, excess inventory, poor forecast accuracy — by combining advanced time series forecasting with inventory optimisation frameworks (Safety Stock, ROP, EOQ) and a Power BI command centre.

The Problem

Approximately 12% of SKU-days experience stockouts causing lost sales, while capital is tied up in slow-moving inventory. Excel-based forecasting fails to capture seasonality, and supplier delays further distort demand signals and disrupt inventory decisions.

My Approach

Compared ARIMA and Prophet forecasting models to improve demand prediction accuracy. Assessed inventory performance across SKUs and warehouses, evaluated supplier delays, and designed an optimisation framework incorporating Safety Stock, Reorder Point, and EOQ models.
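The three inventory models named above have standard textbook forms. The sketch below assumes independent, normally distributed daily demand and a fixed lead time; the SKU figures are hypothetical, and the project may use a variant (for example one that also models lead-time variability).

```python
from math import sqrt
from statistics import NormalDist


def inventory_policy(avg_daily_demand, daily_demand_std, lead_time_days,
                     service_level, annual_demand, order_cost, holding_cost):
    """Textbook safety stock, reorder point and EOQ.

    Assumes independent, normally distributed daily demand and a fixed lead time.
    holding_cost is per unit per year; order_cost is per order.
    """
    z = NormalDist().inv_cdf(service_level)  # cycle service level -> z-score
    safety_stock = z * daily_demand_std * sqrt(lead_time_days)
    reorder_point = avg_daily_demand * lead_time_days + safety_stock
    eoq = sqrt(2 * annual_demand * order_cost / holding_cost)
    return safety_stock, reorder_point, eoq


# Hypothetical SKU: 20 units/day (sd 5), 9-day lead time, 95% service level.
ss, rop, eoq = inventory_policy(20, 5, 9, 0.95, annual_demand=20 * 365,
                                order_cost=50, holding_cost=2)
print(round(ss, 1), round(rop, 1), round(eoq))  # 24.7 204.7 604
```

Raising the service level increases only the safety-stock term; EOQ depends solely on the ordering and holding costs.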

Key Outcomes

  • Significantly improved forecast accuracy over the Excel baseline
  • 1,900+ stockout events identified impacting product availability
  • Thousands of delayed shipments flagged affecting supply reliability
  • Seasonal demand patterns captured with ARIMA and Prophet forecasting models
  • Inventory optimisation framework (SS, ROP, EOQ) designed for better control
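The page does not state which accuracy metric backs the comparison with the Excel baseline. As one plausible choice for SKU-level demand, the sketch below uses WAPE (weighted absolute percentage error); the demand series and both forecasts are made-up numbers.

```python
def wape(actual, forecast):
    """Weighted absolute percentage error: total absolute error / total actual demand.

    Unlike MAPE it does not blow up on zero-demand days, which are common at SKU level.
    """
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when total actual demand is zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total_actual


# Hypothetical four-period demand, a flat baseline and a model forecast.
actual = [10, 20, 30, 40]
flat_baseline = [25, 25, 25, 25]
model_forecast = [12, 18, 31, 37]
print(wape(actual, flat_baseline), wape(actual, model_forecast))  # 0.4 0.08
```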

Tech Stack

Python · Prophet · ARIMA · Safety Stock · ROP · EOQ · Feature Engineering · Power BI
Demand Forecasting & Inventory Optimization
Inventory Optimization Dashboard

Interactive Dashboard

Open in Power BI Desktop to explore demand forecasts, stockout risks, supplier reliability, and inventory performance across warehouses.

Download Dashboard (.pbix)

Free viewer: Power BI Desktop

ML Model — Live Demo
Launch Inventory Simulator →
Prophet · ARIMA · Safety Stock · EOQ · ROP

Input SKU demand data — average daily demand, lead time, and holding costs — and get a demand forecast, safety stock recommendation, reorder point, and optimal order quantity instantly.

Inputs: Daily demand + lead time · Holding cost + order cost · Service level
Outputs: Forecast · EOQ · Safety Stock & ROP
05
Credit Risk Analytics · Machine Learning · Classification · Risk Scoring · Visualization
⚡ In Progress

Customer Credit Risk Assessment Model

Addressing high default rates, limited credit data, and ineffective approval systems by redesigning credit risk modeling through feature engineering, proxy target construction, and machine learning — enabling more accurate and risk-sensitive lending decisions.

The Problem

The lending system was underperforming, with a default rate of 18.3% against a 9.5% industry benchmark. A large portion of applicants lacked credit scores despite being creditworthy, while a single approval threshold applied across all loan sizes reduced decision precision. Informal income (35% of applicants) made traditional assessment unreliable, and the existing default variable carried only a weak predictive signal.

My Approach

Redesigned the modeling framework by reconstructing the target variable using a composite risk scoring approach. Introduced controlled randomness within credit score bands to reduce leakage, engineered key financial features such as loan-to-income and debt-to-income ratios, addressed class imbalance using SMOTE, and trained Logistic Regression, Random Forest, and XGBoost models.
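The two ratio features mentioned above are simple to compute, but informal income means the denominator may be missing. The sketch below returns `None` in that case so a later imputation step can decide what to do; the argument names and the applicant figures are hypothetical.

```python
def credit_ratios(annual_income, loan_amount, monthly_debt_payments):
    """Loan-to-income and debt-to-income ratios.

    Returns None values when income is missing or non-positive, leaving the
    choice of imputation to a later step.
    """
    if not annual_income or annual_income <= 0:
        return {"loan_to_income": None, "debt_to_income": None}
    return {
        "loan_to_income": loan_amount / annual_income,
        # DTI compares recurring debt service to income over the same period (here, a year).
        "debt_to_income": monthly_debt_payments * 12 / annual_income,
    }


# Hypothetical applicant.
print(credit_ratios(annual_income=60_000, loan_amount=30_000, monthly_debt_payments=1_000))
# {'loan_to_income': 0.5, 'debt_to_income': 0.2}
```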

Key Outcomes

  • Achieved up to 0.76 F1-score with XGBoost, balancing ~77% recall and ~75% precision
  • Improved detection of high-risk customers compared to baseline models
  • Reduced target leakage through stochastic feature engineering
  • Transformed an unreliable target into a structured, learnable risk signal
  • Demonstrated clear precision–recall trade-offs aligned with financial risk management
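F1 is the harmonic mean of precision and recall, so the reported figures can be cross-checked directly. Using the approximate ~75% precision and ~77% recall stated for XGBoost:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate precision and recall reported for the XGBoost model.
print(round(f1_score(0.75, 0.77), 2))  # 0.76
```

Because the harmonic mean is pulled toward the smaller value, F1 only stays high when precision and recall are both high.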

Tech Stack

Python · Pandas · NumPy · scikit-learn · XGBoost · SMOTE · Feature Engineering
Customer Credit Risk
Credit Risk Dashboard

Interactive Dashboard

Open in Power BI Desktop to explore customer risk segmentation, model predictions, and approval scenarios.

Download Dashboard (.pbix)

Free viewer: Power BI Desktop

ML Model — Live Demo
Launch Credit Risk Assessor →
Classification Model · Risk Scoring

Input borrower financial profile — income, debt ratio, credit history, loan amount — and get an instant default probability score, risk tier classification, and lending recommendation.

Income + debt ratioDefault %
Credit historyRisk tier
Loan amountRecommendation
06
Logistics Analytics · Regression · Feature Engineering · Machine Learning
⚡ In Progress

Delivery Downtime Prediction & Route Optimization

Predicting logistics delivery delays using operational, driver, and environmental signals to reduce missed SLAs, improve customer satisfaction, and support smarter routing decisions.

The Problem

Logistics companies struggle to accurately predict delivery delays due to dynamic factors such as traffic congestion, driver performance, and environmental conditions. Poor predictions lead to missed SLAs, customer dissatisfaction, and inefficient planning.

My Approach

Built a regression-based ML system to predict delivery delay (in hours) from operational and behavioral data. Key engineered features included proxy variables for traffic congestion and weather risk, derived from GPS speed patterns and historical delay behavior. Tree-based ensemble models were trained to capture nonlinear relationships across routes, time, and driver performance.
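One way to derive a congestion proxy from GPS speeds alone is to compare each reading with the route's "free-flow" speed. The sketch below approximates free flow as the 85th percentile of observed speeds per route; that percentile, the ping format, and the sample speeds are all assumptions, since the page does not give the project's exact formula.

```python
from collections import defaultdict


def percentile(values, q):
    """Simple nearest-rank percentile (q in [0, 1]) on a sorted copy."""
    ordered = sorted(values)
    return ordered[round(q * (len(ordered) - 1))]


def congestion_index(pings, free_flow_q=0.85):
    """Per-ping congestion proxy from GPS speeds.

    pings: (route_id, speed_kmh). Free-flow speed for each route is taken as a
    high percentile of its observed speeds; the index is 1 - speed / free_flow,
    clipped at 0, so 0 means free-flowing and values near 1 mean crawling.
    """
    speeds_by_route = defaultdict(list)
    for route, speed in pings:
        speeds_by_route[route].append(speed)
    free_flow = {r: percentile(s, free_flow_q) for r, s in speeds_by_route.items()}
    # A route whose free-flow speed is 0 is treated as fully congested.
    return [max(0.0, 1 - speed / free_flow[route]) if free_flow[route] > 0 else 1.0
            for route, speed in pings]


# Hypothetical pings on one route; the 30 km/h reading looks congested.
pings = [("r1", 60), ("r1", 55), ("r1", 30), ("r1", 58), ("r1", 62)]
print([round(x, 2) for x in congestion_index(pings)])  # [0.0, 0.08, 0.5, 0.03, 0.0]
```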

Key Outcomes

  • Achieved R² ≈ 0.70 with a Random Forest Regressor
  • Engineered traffic congestion and weather risk proxies without external APIs
  • Identified and removed leaking features that had inflated R² to ~0.99
  • Demonstrated strong impact of driver performance and route conditions on delays
  • Built a realistic, production-aligned feature pipeline
  • Established a robust baseline for delay prediction in logistics systems
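An R² near 0.99 on a noisy operational target is a classic leakage symptom. A cheap first-pass screen is to flag features that correlate almost perfectly with the target, as sketched below. The threshold, feature names, and values are hypothetical, the page does not describe how leakage was actually found, and this screen only catches near-linear leaks; the real test is whether each feature is known at prediction time.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def suspect_leakage(features, target, threshold=0.95):
    """Names of features whose |correlation| with the target exceeds threshold."""
    return sorted(name for name, values in features.items()
                  if abs(pearson(values, target)) > threshold)


# Hypothetical delay hours and two candidate features.
delay_hours = [1.0, 3.0, 0.5, 4.0, 2.0]
features = {
    "distance_km": [120, 300, 80, 200, 260],
    # Derived from the actual arrival time, so unavailable at prediction time.
    "arrival_minus_planned_h": [1.1, 2.9, 0.6, 4.1, 1.9],
}
print(suspect_leakage(features, delay_hours))  # ['arrival_minus_planned_h']
```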

Tech Stack

Python · Pandas · NumPy · scikit-learn · XGBoost · Power BI
Delivery Downtime Prediction & Route Optimization
Logistics Dashboard

Interactive Dashboard

Open in Power BI Desktop to explore delivery delay drivers, predictive risk signals, and operational improvement opportunities.

Download Dashboard (.pbix)

Free viewer: Power BI Desktop

ML Model — Live Demo
Launch Delay Predictor →
Delay Prediction · Route Optimisation

Input shipment parameters — distance, carrier, weather conditions, and historical delay rate — and get a predicted delivery delay probability, estimated delay window, and route recommendation.

Inputs: Distance + carrier · Weather + time window · Historical delays
Outputs: Delay % · ETA adjustment · Best route
07
Visualization · Tableau · Excel · Time Series
✓ Completed

Superstore Sales Executive Dashboard

A single-screen command centre that turns four years of Superstore data into executive-ready insights — combining performance analysis with forward-looking forecasts so decision-makers never have to dig through a spreadsheet again.

The Problem

Leadership needed visibility into which regions, segments, and product categories were driving — or dragging — profitability, with no easy way to project where sales were heading.

My Approach

Built a KPI-focused Tableau dashboard consolidating sales, profit, and order data. Layered in ARIMA/Prophet time series models for 12-month forecasting, and flagged loss-making categories directly in the viz.
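In Tableau, flagging loss-makers would typically be a calculated field along the lines of `SUM([Profit]) < 0` driving colour. The Python sketch below shows the same aggregation logic outside Tableau; the order lines are made-up numbers, not figures from the Superstore dataset.

```python
from collections import defaultdict


def loss_making_subcategories(rows):
    """rows: (sub_category, profit) order lines. Returns loss-making sub-categories, worst first."""
    totals = defaultdict(float)
    for sub_category, profit in rows:
        totals[sub_category] += profit
    losers = [s for s, p in totals.items() if p < 0]
    return sorted(losers, key=lambda s: totals[s])


# Made-up order lines; not figures from the Superstore dataset.
rows = [("Tables", -120.0), ("Tables", 40.0), ("Phones", 300.0),
        ("Bookcases", -15.0), ("Binders", 25.0)]
print(loss_making_subcategories(rows))  # ['Tables', 'Bookcases']
```

Aggregating before flagging matters: a sub-category can contain profitable orders and still lose money overall, as `Tables` does here.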

Key Outcomes

  • Loss-making sub-categories surfaced and flagged automatically
  • Regional drill-downs revealing underperforming territories
  • 12-month forecast enabling proactive stock and budget planning
  • Time-to-insight reduced from hours of Excel work to one dashboard

Tech Stack

Tableau · Excel · ARIMA · Prophet · Data Cleaning
SuperStoreSales_Project1.twb
Superstore Dashboard

Interactive Dashboard

Open in Tableau Desktop or the free Tableau Public Desktop to explore every filter, drill-down, and forecast.

Download Dashboard (.twb)

Free viewer: Tableau Public Desktop

Jupyter Notebook Preview