Projects | TimothyMaina

01

Churn Analytics · RFM · Feature Engineering · Visualization

✓ Completed

Customer Churn Analysis & Segmentation

Why do customers leave — and who is most at risk right now? This end-to-end project answers both questions using machine learning and RFM segmentation, turning raw e-commerce data into a clear, actionable retention playbook.

The Problem

The business had no systematic way to identify which customers were drifting toward churn or to distinguish loyal customers from one-time buyers — making retention efforts scattershot and expensive.

My Approach

Combined RFM feature engineering with a logistic regression churn model optimised for recall. K-Means clustering grouped customers into actionable segments (including loyal, at-risk, and inactive), surfaced through an interactive dashboard.
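A minimal sketch of the RFM step, assuming raw orders arrive as (customer, date, amount) records. The sample `orders`, the `rfm_table` helper, and the snapshot date are illustrative assumptions, and the code uses only the standard library for brevity; the project's own feature definitions (built with Pandas) may differ.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order records: (customer_id, order_date, amount).
orders = [
    ("C1", date(2024, 1, 5), 40.0),
    ("C1", date(2024, 3, 1), 60.0),
    ("C2", date(2023, 11, 20), 25.0),
]

def rfm_table(orders, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in orders:
        # Track the most recent purchase date per customer.
        if customer not in last_order or day > last_order[customer]:
            last_order[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }

rfm = rfm_table(orders, snapshot=date(2024, 3, 31))
```

Each customer ends up with three numbers (days since last order, order count, total spend) that can feed both the churn classifier and the clustering step.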

Key Outcomes

High-recall churn model tuned to minimise missed at-risk customers
Inactivity & refund behaviour identified as strongest churn signals
5 distinct segments enabling targeted, cost-efficient retention campaigns
Dashboard connecting each insight directly to a recommended action

Tech Stack

Python · Pandas · scikit-learn · K-Means · Seaborn · Logistic Regression · Tableau

Ecommerce_project_portfolio.twb

Interactive Dashboard

Open in Tableau Desktop or the free Tableau Public Desktop to explore segment views, churn breakdown, and every filter.

Download Dashboard (.twb)

Free viewer: Tableau Public Desktop

Logistic Regression · K-Means Clustering

Input a customer's RFM profile and get an instant churn probability score, customer segment label, and a recommended retention action — powered by the model trained in this project.

Days inactive → Churn %

Order frequency → Segment

Total spend → Action


02

Fraud Analytics · Machine Learning · scikit-learn · Visualization

✓ Completed

Fraud Detection Command Centre

Financial fraud hides in behavioural patterns — not just transaction amounts. This end-to-end system engineers the right signals, trains a model to catch them, and surfaces everything through a real-time command centre built for fraud analysts.

The Problem

Rule-based fraud detection flags too many false positives and misses evolving patterns. The challenge: build a behavioural model accurate on imbalanced, imperfect real-world data and communicate findings clearly to non-technical stakeholders.

My Approach

Engineered behavioural features — transaction velocity, device/IP repetition, time-window anomalies. Applied SMOTE for class imbalance, compared three models, and selected Random Forest for its precision-recall balance, feeding output into Tableau.
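To make the behavioural features concrete, here is a rough sketch of two of them: a trailing one-hour transaction velocity and a per-device repetition count. The sample `transactions`, the one-hour window, and both helper names are assumptions for illustration, not the project's actual pipeline.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical transactions for one account: (timestamp, device_id).
transactions = [
    (datetime(2024, 5, 1, 10, 0), "dev-a"),
    (datetime(2024, 5, 1, 10, 10), "dev-a"),
    (datetime(2024, 5, 1, 10, 20), "dev-b"),
    (datetime(2024, 5, 1, 13, 0), "dev-a"),
]

def velocity_per_hour(transactions, window=timedelta(hours=1)):
    """For each transaction (in time order), count transactions in the trailing window."""
    times = sorted(t for t, _ in transactions)
    counts = []
    start = 0
    for end, current in enumerate(times):
        # Advance the window start past anything older than `window`.
        while current - times[start] > window:
            start += 1
        counts.append(end - start + 1)
    return counts

def device_repetition(transactions):
    """Number of times each device appears across the transactions."""
    return Counter(device for _, device in transactions)

velocity = velocity_per_hour(transactions)
devices = device_repetition(transactions)
```

A burst of activity shows up as a rising velocity count, while a device that keeps reappearing accumulates a high repetition count.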

Key Outcomes

Random Forest achieving strong precision-recall on imbalanced data
High-frequency burst patterns identified as primary fraud signal
Device and IP repetition flags catching repeat offenders
Tableau dashboard showing fraud trends, risk entities & financial impact

Tech Stack

Python · Random Forest · XGBoost · SMOTE · Tableau · Feature Engineering

Fraud_project_portfolio.twb

Interactive Dashboard

Open in Tableau Desktop or the free Tableau Public Desktop to explore fraud trends, entity risk scores, and estimated financial impact.

Download Dashboard (.twb)

Free viewer: Tableau Public Desktop

Random Forest · SMOTE · Feature Engineering

Simulate a transaction by adjusting velocity, amount, device signals, and timing. The model returns an instant fraud probability score, primary risk signal, and recommended action.

Transaction amount → Fraud %

Velocity / hour → Risk level

Devices used → Action


03

ROI Analytics · Visualization · Machine Learning · Data Cleaning

✓ Completed

Multi-Channel Marketing Attribution & ROI Optimization

A data-driven case study analysing multi-channel customer journeys and campaign performance to quantify channel contribution and predict revenue outcomes. Combines probabilistic attribution modelling with machine learning to support more efficient marketing budget allocation.

The Problem

Traditional attribution models such as last-click oversimplify customer journeys, misrepresenting the true contribution of channels. Businesses risk misallocating marketing budgets and overlooking inefficiencies in campaign performance.

My Approach

Built a Markov Chain attribution framework to capture transition behaviour across channels and quantify contribution using removal effects. In parallel, trained a tuned Random Forest revenue prediction model with time-aware validation for realistic forecasting.
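The removal-effect idea can be sketched as follows, assuming a first-order Markov chain estimated from observed journeys. The toy `journeys`, the state labels, and the helper names are illustrative assumptions; this is not the project's code, and real journey data would need more care (repeated touches, sparse channels, zero baseline conversion).

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"

# Hypothetical journeys: (ordered channel touches, converted?).
journeys = [
    (["search", "email"], True),
    (["search"], False),
    (["email"], True),
    (["social", "search"], False),
]

def transition_probs(journeys):
    """Estimate first-order transition probabilities from observed journeys."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in journeys:
        path = [START, *channels, CONV if converted else NULL]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

def conversion_prob(probs, removed=None, sweeps=200):
    """P(reach CONV from START); a removed channel behaves like NULL."""
    value = defaultdict(float)
    value[CONV] = 1.0
    for _ in range(sweeps):  # fixed-point iteration on the absorbing chain
        for src, dsts in probs.items():
            if src == removed:
                continue
            value[src] = sum(
                p * value[dst] for dst, p in dsts.items() if dst != removed
            )
    return value[START]

def removal_effects(journeys):
    """Share of conversions attributed to each channel via removal effects."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = [s for s in probs if s != START]
    effects = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(effects.values())
    # Normalise so the shares sum to 1.
    return {c: e / total for c, e in effects.items()}

shares = removal_effects(journeys)
```

In this toy data, removing email wipes out every conversion, so it earns the largest share (0.6), ahead of search (0.3) and social (0.1). Last-click attribution would have credited email with everything.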

Key Outcomes

Quantified true channel contribution using Markov Chain attribution
Achieved ~0.79 R² in revenue prediction on unseen data
CPA identified as the dominant driver of revenue
Demonstrated that efficiency metrics outperform raw engagement metrics
Built an ROI-driven marketing budget allocation framework

Tech Stack

Python · Markov Chain · Random Forest · Feature Engineering · Power BI

Multi-Channel Marketing Attribution

Interactive Dashboard

Open in Power BI Desktop to explore marketing attribution insights, customer journey behaviour, and data-driven opportunities for maximising campaign ROI.

Download Dashboard (.pbix)

Free viewer: Power BI Desktop

Random Forest · Markov Chain Attribution

Input campaign metrics — spend, impressions, CPA, and channel mix — and get a predicted revenue outcome plus recommended budget reallocation based on channel attribution scores.

Channel spend → Revenue $

CPA / conversions → ROI %

Channel mix → Best channel


04

Supply Chain Analytics · Demand Forecasting · Time Series · Visualization

⚡ In Progress

Demand Forecasting & Inventory Optimization System

Addressing critical supply chain inefficiencies — stockouts, excess inventory, poor forecast accuracy — by combining advanced time series forecasting with inventory optimisation frameworks (Safety Stock, ROP, EOQ) and a Power BI command centre.

The Problem

Approximately 12% of SKU-days experience stockouts, causing lost sales, while capital is tied up in slow-moving inventory. Excel-based forecasting fails to capture seasonality, and supplier delays further distort demand signals and disrupt inventory decisions.

My Approach

Compared ARIMA and Prophet forecasting models to improve demand prediction accuracy. Assessed inventory performance across SKUs and warehouses, evaluated supplier delays, and designed an optimisation framework incorporating Safety Stock, Reorder Point, and EOQ models.
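For reference, the three inventory models named above reduce to short textbook formulas. The sketch below uses the standard forms (EOQ = sqrt(2DS/H), safety stock = z · σ_d · sqrt(L), ROP = d · L + safety stock), assuming normally distributed daily demand and a fixed lead time. The input values are illustrative only, not figures from the project, and the project's own framework may use different variants.

```python
from math import sqrt
from statistics import NormalDist

def eoq(annual_demand, order_cost, holding_cost_per_unit):
    """Economic order quantity: sqrt(2 * D * S / H)."""
    return sqrt(2 * annual_demand * order_cost / holding_cost_per_unit)

def safety_stock(service_level, daily_demand_std, lead_time_days):
    """z * sigma_d * sqrt(L), assuming normal demand and a fixed lead time."""
    z = NormalDist().inv_cdf(service_level)
    return z * daily_demand_std * sqrt(lead_time_days)

def reorder_point(avg_daily_demand, lead_time_days, safety):
    """Expected demand over the lead time plus the safety buffer."""
    return avg_daily_demand * lead_time_days + safety

# Illustrative inputs only, not figures from the project.
q = eoq(annual_demand=10_000, order_cost=50.0, holding_cost_per_unit=4.0)
ss = safety_stock(service_level=0.95, daily_demand_std=8.0, lead_time_days=9)
rop = reorder_point(avg_daily_demand=40.0, lead_time_days=9, safety=ss)
```

With these inputs the order quantity comes out at 500 units, and a 95% service level adds roughly 39 units of safety stock on top of the 360 units of expected lead-time demand.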

Key Outcomes

Improved forecast accuracy significantly over Excel baseline
1,900+ stockout events identified impacting product availability
Thousands of delayed shipments flagged affecting supply reliability
Seasonal demand patterns captured using advanced forecasting
Inventory optimisation framework (SS, ROP, EOQ) designed for better control

Tech Stack

Python · Prophet · ARIMA · Safety Stock · ROP · EOQ · Feature Engineering · Power BI

Demand Forecasting & Inventory Optimization

Interactive Dashboard

Open in Power BI Desktop to explore demand forecasts, stockout risks, supplier reliability, and inventory performance across warehouses.

Download Dashboard (.pbix)

Free viewer: Power BI Desktop

Prophet · ARIMA · Safety Stock · EOQ · ROP

Input SKU demand data — average daily demand, lead time, and holding costs — and get a demand forecast, safety stock recommendation, reorder point, and optimal order quantity instantly.

Daily demand + lead time → Forecast

Holding cost + order cost → EOQ

Service level → Safety Stock & ROP


05

Credit Risk Analytics · Machine Learning · Classification · Risk Scoring · Visualization

⚡ In Progress

Customer Credit Risk Assessment Model

Addressing high default rates, limited credit data, and ineffective approval systems by redesigning credit risk modeling through feature engineering, proxy target construction, and machine learning — enabling more accurate and risk-sensitive lending decisions.

The Problem

The lending system was underperforming, with a default rate of 18.3% against an industry benchmark of 9.5%. A large portion of applicants lacked credit scores despite being creditworthy, while a single approval threshold applied across all loan sizes reduced decision precision. Informal income (35% of applicants) made traditional assessment unreliable, and the existing default variable showed weak predictive signal.

My Approach

Redesigned the modeling framework by reconstructing the target variable using a composite risk scoring approach. Introduced controlled randomness within credit score bands to reduce leakage, engineered key financial features such as loan-to-income and debt-to-income ratios, addressed class imbalance using SMOTE, and trained Logistic Regression, Random Forest, and XGBoost models.
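As a small illustration of the ratio features mentioned above, the sketch below computes loan-to-income and debt-to-income for one applicant. The field names, the sample values, and the choice to return `None` for missing or zero income are assumptions made for this example, not the project's actual handling.

```python
def loan_to_income(loan_amount, annual_income):
    """Loan size relative to annual income; None when income is unusable."""
    if annual_income is None or annual_income <= 0:
        return None
    return loan_amount / annual_income

def debt_to_income(monthly_debt, monthly_income):
    """Share of monthly income already committed to debt repayments."""
    if monthly_income is None or monthly_income <= 0:
        return None
    return monthly_debt / monthly_income

# Hypothetical applicant record; field names are illustrative.
applicant = {"loan_amount": 12_000, "annual_income": 48_000, "monthly_debt": 1_000}

lti = loan_to_income(applicant["loan_amount"], applicant["annual_income"])
dti = debt_to_income(applicant["monthly_debt"], applicant["annual_income"] / 12)
```

Guarding against zero or missing income matters here because a sizeable share of applicants report informal income.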

Key Outcomes

Achieved up to 0.76 F1-score with XGBoost, balancing ~77% recall and ~75% precision
Improved detection of high-risk customers compared to baseline models
Reduced target leakage through stochastic feature engineering
Transformed an unreliable target into a structured, learnable risk signal
Demonstrated clear precision–recall trade-offs aligned with financial risk management
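The reported F1 follows directly from the stated precision and recall, since F1 is their harmonic mean. A quick check, using the rounded figures quoted above:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded figures from the outcomes list above.
f1 = f1_score(precision=0.75, recall=0.77)
```

Rounded to two decimals this gives 0.76, consistent with the headline score.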

Tech Stack

Python · Pandas · NumPy · Scikit-learn · XGBoost · SMOTE · Feature Engineering

Customer Credit Risk

Interactive Dashboard

Open in Power BI Desktop to explore customer risk segmentation, model predictions, and approval scenarios.

Download Dashboard (.pbix)

Free viewer: Power BI Desktop

Classification Model · Risk Scoring

Input a borrower's financial profile (income, debt ratio, credit history, loan amount) and get an instant default probability score, risk tier classification, and lending recommendation.

Income + debt ratio → Default %

Credit history → Risk tier

Loan amount → Recommendation


06

Logistics Analytics · Regression · Feature Engineering · Machine Learning

⚡ In Progress

Delivery Downtime Prediction & Route Optimization

Predicting logistics delivery delays using operational, driver, and environmental signals to reduce missed SLAs, improve customer satisfaction, and support smarter routing decisions.

The Problem

Logistics companies struggle to accurately predict delivery delays due to dynamic factors such as traffic congestion, driver performance, and environmental conditions. Poor predictions lead to missed SLAs, customer dissatisfaction, and inefficient planning.

My Approach

Built a regression-based ML system to predict delivery delay hours using operational and behavioral data. Key engineered features included proxy variables for traffic congestion and weather risk derived from GPS speed patterns and historical delay behavior. Tree-based ensemble models were trained to capture nonlinear relationships across routes, time, and driver performance.
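One way a congestion proxy could be derived from GPS speeds alone is to compare each trip's average speed with the typical speed on its route. The sketch below is a guess at that idea, not the project's actual feature: the `trips` data, the use of the median as "typical", and the clipping to [0, 1] are all assumptions. For brevity the median is taken over all trips; a leakage-safe version would use only trips recorded before the prediction time.

```python
from collections import defaultdict
from statistics import median

# Hypothetical trip summaries: (route_id, average GPS speed in km/h).
trips = [("R1", 60.0), ("R1", 58.0), ("R1", 30.0), ("R2", 45.0), ("R2", 44.0)]

def congestion_proxy(trips):
    """Per-trip slowdown relative to the route's median speed, clipped to [0, 1]."""
    by_route = defaultdict(list)
    for route, speed in trips:
        by_route[route].append(speed)
    typical = {route: median(speeds) for route, speeds in by_route.items()}
    return [
        min(1.0, max(0.0, 1 - speed / typical[route]))
        for route, speed in trips
    ]

proxy = congestion_proxy(trips)
```

Trips at or above the route's typical speed score 0, while the 30 km/h trip on R1 scores about 0.48, marking it as heavily slowed.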

Key Outcomes

Achieved R² ≈ 0.70 using Random Forest Regressor
Engineered traffic congestion and weather risk without external APIs
Identified and removed data leakage features that had inflated R² to ~0.99
Demonstrated strong impact of driver performance and route conditions on delays
Built a realistic, production-aligned feature pipeline
Established a robust baseline for delay prediction in logistics systems

Tech Stack

Python · Pandas · NumPy · Scikit-learn · XGBoost · Power BI

Delivery Downtime Prediction & Route Optimization

Interactive Dashboard

Open in Power BI Desktop to explore delivery delay drivers, predictive risk signals, and operational improvement opportunities.

Download Dashboard (.pbix)

Free viewer: Power BI Desktop

Delay Prediction · Route Optimisation

Input shipment parameters — distance, carrier, weather conditions, and historical delay rate — and get a predicted delivery delay probability, estimated delay window, and route recommendation.

Distance + carrier → Delay %

Weather + time window → ETA adjustment

Historical delays → Best route


07

Visualization · Tableau · Excel · Time Series

✓ Completed

Superstore Sales Executive Dashboard

A single-screen command centre that turns four years of Superstore data into executive-ready insights — combining performance analysis with forward-looking forecasts so decision-makers never have to dig through a spreadsheet again.

The Problem

Leadership needed visibility into which regions, segments, and product categories were driving or dragging profitability, and had no easy way to project where sales were heading.

My Approach

Built a KPI-focused Tableau dashboard consolidating sales, profit, and order data. Layered in ARIMA/Prophet time series models for 12-month forecasting, and flagged loss-making categories directly in the viz.
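The loss-making flag itself lives in the Tableau workbook. The Python below only illustrates the underlying rule (total profit per sub-category below zero) on made-up rows. The row layout and the `loss_makers` helper are assumptions for this example.

```python
from collections import defaultdict

# Made-up rows: (sub_category, sales, profit). Not Superstore figures.
rows = [
    ("Sub-category A", 500.0, 80.0),
    ("Sub-category B", 300.0, -45.0),
    ("Sub-category B", 200.0, 15.0),
    ("Sub-category C", 150.0, 10.0),
]

def loss_makers(rows):
    """Return sub-categories whose total profit is negative, worst first."""
    profit = defaultdict(float)
    for sub_category, _sales, p in rows:
        profit[sub_category] += p
    losers = [(name, total) for name, total in profit.items() if total < 0]
    return sorted(losers, key=lambda item: item[1])

flagged = loss_makers(rows)
```

Only sub-category B is flagged here: one profitable order does not offset its larger loss.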

Key Outcomes

Loss-making sub-categories surfaced and flagged automatically
Regional drill-downs revealing underperforming territories
12-month forecast enabling proactive stock and budget planning
Time-to-insight reduced from hours of Excel work to one dashboard

Tech Stack

Tableau · Excel · ARIMA · Prophet · Data Cleaning

SuperStoreSales_Project1.twb

Interactive Dashboard

Open in Tableau Desktop or the free Tableau Public Desktop to explore every filter, drill-down, and forecast.

Download Dashboard (.twb)

Free viewer: Tableau Public Desktop
