Kevin Waithaka Mugweru

Senior Data Scientist | Data & Business Analytics

Hi, I'm Kevin! Welcome to my corner of the internet. I'm a senior data scientist with 8 years of experience, including a 3-year detour into Operations and Strategic Management, a twist that lets me combine business insight with technical execution. I thrive on turning complex data into actionable strategy, and I can move comfortably between senior data scientist, data analyst, and business-focused analytics roles. Explore my work, and if something sparks your interest, let's connect!

Portfolio

This portfolio showcases an end-to-end analytics platform built from the ground up, showing how an organization's entire data function can be architected for technical rigor and business impact. The platform starts with synthetic data generation that simulates production-grade transaction patterns, flows through real-time data engineering pipelines with automated orchestration, and scales to production ML APIs serving risk assessment models. It integrates statistical experimentation frameworks and rigorous testing methodology, and culminates in unified business intelligence dashboards that turn technical infrastructure into strategic advantage. Each component applies modern data engineering principles, from distributed processing and MLOps automation to experimental design that supports data-driven decision making.

Simtom - Synthetic Data Generation Platform

Python | FastAPI | Async Streaming | Plugin Architecture | Temporal Modeling

An open-source API that generates realistic data for testing ML models in production environments

  • ✅ BNPL transaction generator with risk scoring
  • ✅ Multiple arrival patterns (Poisson, NHPP, Burst)
  • ✅ FastAPI streaming endpoints with time compression and dual-mode data generation (historical and current)
  • 🔄 Inter-feature correlation modeling for improved ML signal strength
  • 📝 Additional domain-specific generators
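The NHPP arrival pattern listed above can be illustrated with Lewis-Shedler thinning: candidate events are drawn from a homogeneous Poisson process at a ceiling rate, and each one is kept with probability rate(t) / rate_max. This is a minimal sketch, not Simtom's actual implementation; the `daily_rate` profile (2 to 18 transactions per hour, peaking at noon) and the function names are hypothetical.

```python
import math
import random


def daily_rate(t: float) -> float:
    """Hypothetical arrival rate in transactions/hour; t is hours since midnight.
    Ranges from 2/hour at midnight to 18/hour at noon."""
    return 10.0 - 8.0 * math.cos(2 * math.pi * t / 24.0)


def simulate_nhpp(rate_fn, rate_max: float, horizon: float, seed=None) -> list:
    """Non-homogeneous Poisson arrivals on [0, horizon) via Lewis-Shedler thinning.

    rate_max must be an upper bound on rate_fn over the whole horizon.
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    while True:
        # Next candidate from a homogeneous Poisson process at the ceiling rate.
        t += rng.expovariate(rate_max)
        if t >= horizon:
            return arrivals
        accept_prob = rate_fn(t) / rate_max
        if accept_prob > 1.0:
            raise ValueError("rate_max is not an upper bound for rate_fn")
        # Keep the candidate with probability rate(t) / rate_max.
        if rng.random() < accept_prob:
            arrivals.append(t)


# One simulated day; the expected count is the integral of daily_rate, i.e. 240.
arrivals = simulate_nhpp(daily_rate, rate_max=18.0, horizon=24.0, seed=42)
print(len(arrivals), "arrivals")
```

A plain Poisson pattern is the special case of a constant rate function; a burst pattern could be approximated by adding short spikes to the rate function, as long as rate_max still bounds it.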
Flit Data Platform - Enterprise Data Warehouse

dbt | BigQuery | Redis | Python | Airflow | Data Engineering | ETL/ELT

Centralized data engineering platform serving experimentation, ML, BI, and AI teams with real-time Redis caching and comprehensive dbt transformations for analytics.

  • ✅ Complete dbt transformations for historical data ingestion and ML preparation
  • ✅ Synthetic data generation framework with experiment overlay
  • ✅ Live user assignment system for A/B testing experiments
  • ✅ Redis service handling real-time production transactions
  • 🔄 Daily batch uploads from Redis to BigQuery for long-term storage
  • 📝 Airflow orchestration for ML model retraining pipelines
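For the live user-assignment bullet, one common approach (assumed here, not confirmed by the project) is deterministic hash bucketing: hashing the experiment id together with the user id gives every user a stable, roughly uniform bucket without storing any state, and gives each experiment an independent split. The function name, id format, and 50/50 default split are illustrative.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str,
                   variants=("control", "treatment"), weights=(0.5, 0.5)) -> str:
    """Deterministically map a user to a variant (hypothetical helper).

    The same (experiment_id, user_id) pair always yields the same variant.
    """
    if len(variants) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("variants and weights must align and weights must sum to 1")
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 15 hex digits (60 bits) of the hash to a float in [0, 1).
    bucket = int(digest[:15], 16) / 16**15
    cumulative = 0.0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if bucket < cumulative:
            return variant
    return variants[-1]  # guard against floating-point rounding in the cumulative sum


print(assign_variant("user-123", "checkout-fee-test"))
```

Because assignment is a pure function of the ids, any service (the API, the warehouse, a dashboard) can recompute it and agree without a shared lookup table.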
Flit Experiments - Statistical A/B Testing Framework

Python | Power Analysis | CUPED | Sequential Testing | Multi-Armed Bandits | Statistical Inference

Production-grade experimentation platform with advanced statistical methodology including CUPED variance reduction, sequential analysis, and automated business decision frameworks.

  • ✅ CLI-powered power analysis engine for sample size determination
  • ✅ Statistical testing suite (Welch's t-test, Mann-Whitney U, bootstrap CI)
  • ✅ CUPED variance reduction and stratified randomization
  • ✅ Sequential testing with an automated business recommendation engine
  • 🔄 Multi-armed bandit implementation and factorial designs
  • 📝 O'Brien-Fleming sequential boundaries for early stopping rules
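Two of the techniques above reduce to short formulas. The sketch below is illustrative, not the framework's actual CLI or code, and its function names are hypothetical. It computes the per-group sample size for a two-sided comparison of means under a normal approximation, `n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2`, and applies the CUPED adjustment `y_adj = y - theta * (x - mean(x))` with `theta = cov(x, y) / var(x)`, where x is a pre-experiment covariate. The demo data is synthetic.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance


def sample_size_per_group(delta: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group n to detect a mean difference `delta` with a two-sided test.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma**2 / delta**2)


def cuped_adjust(y: list, x: list) -> list:
    """CUPED: y_adj = y - theta * (x - mean(x)), theta = cov(x, y) / var(x).

    x is a pre-experiment covariate (e.g. each user's prior spend). theta is
    estimated on the pooled sample so the same adjustment applies to both arms.
    """
    mx, my = fmean(x), fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    theta = cov / var  # the 1/n factors cancel in the ratio
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


print(sample_size_per_group(delta=0.2, sigma=1.0))  # 393 users per group

# Synthetic check: the outcome is strongly correlated with the covariate.
rng = random.Random(0)
x = [rng.gauss(100, 15) for _ in range(2000)]
y = [0.8 * xi + rng.gauss(0, 5) for xi in x]
y_adj = cuped_adjust(y, x)
print(pvariance(y), pvariance(y_adj))  # adjusted variance is much smaller
```

Because theta is the least-squares slope of y on x, the adjusted metric keeps the same mean while its variance shrinks by a factor of about 1 - rho^2, where rho is the correlation between x and y; that reduction translates directly into a smaller required sample size.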
Flit ML API - Production ML Platform

FastAPI | MLflow | Redis | Docker | Railway | SHAP | Multi-Model Serving

Live production ML API serving BNPL risk models through FastAPI endpoints with sub-100ms inference, champion/challenger testing, and a comprehensive MLOps pipeline.

  • ✅ Live 4-model ensemble (Ridge, Logistic, Elastic Net, Voting) serving inference via API
  • ✅ Docker deployment on Railway cloud with a shadow-mode controller
  • ✅ Redis prediction logging and MLflow experiment tracking infrastructure
  • 🔄 MLflow experiments for model monitoring and A/B testing workflows
  • 🔄 Automated model drift detection and performance tracking
  • 📝 Automated retraining pipelines triggered by performance thresholds
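Drift detection can be approached in several ways. One widely used option, assumed here rather than taken from the project, is the Population Stability Index, `PSI = sum_i (a_i - e_i) * ln(a_i / e_i)`, which compares the share of risk scores falling in each bin at training time (e) with the share in live traffic (a). The equal-width bins over [0, 1], the epsilon floor, and the names below are illustrative choices.

```python
import math


def psi(expected: list, actual: list, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of scores in [0, 1]."""

    def proportions(scores: list) -> list:
        counts = [0] * bins
        for s in scores:
            # Clamp so scores <= 0 land in the first bin and scores >= 1 in the last.
            idx = min(max(int(s * bins), 0), bins - 1)
            counts[idx] += 1
        total = len(scores)
        # Floor empty bins at eps to avoid log(0) and division by zero.
        return [max(c / total, eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 1000 for i in range(1000)]  # training-time scores, spread evenly
live = [0.5 + i / 2000 for i in range(1000)]  # live scores shifted toward high risk
print(round(psi(reference, reference), 3), round(psi(reference, live), 3))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift; any threshold used to trigger retraining would need to be tuned to the model and its traffic.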
Flit GPT - RAG Documentation System

Planning

Python | LangChain | ChromaDB | Ollama/Llama2 | Streamlit | Vector Search

Retrieval-augmented generation system enabling natural language queries across technical documentation and business knowledge. Components include a vector embedding pipeline for document indexing, a conversational interface with context retention, and semantic search over multi-format documents with citation tracking.
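The retrieval-and-citation step of a RAG pipeline can be sketched without any framework. In this toy, a bag-of-words counter stands in for a real embedding model and a plain dict stands in for ChromaDB; it is not LangChain's or ChromaDB's API, and the document ids and contents are hypothetical. Passages are ranked by cosine similarity to the query, and their ids travel with the text so the generated answer can cite its sources.

```python
import math
import re
from collections import Counter

# Hypothetical document store: id -> text chunk.
DOCS = {
    "cuped.md": "CUPED reduces variance using pre-experiment covariates",
    "redis.md": "Redis caches live transactions before the daily BigQuery upload",
    "bnpl.md": "The BNPL risk model scores each transaction",
}


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: dict, k: int = 2) -> list:
    """Return the top-k (doc_id, score) pairs, highest similarity first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query: str, docs: dict, k: int = 2) -> str:
    """Assemble retrieved chunks, tagged with their ids, into an LLM prompt."""
    hits = retrieve(query, docs, k)
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id, _ in hits)
    return f"Answer using only the sources below and cite their ids.\n{context}\n\nQuestion: {query}"


print(build_prompt("How does CUPED reduce variance?", DOCS, k=1))
```

In a fuller system the prompt would be sent to the local model (for example Llama 2 via Ollama), and the bracketed ids give the citation trail back to the source documents.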

Flit Main - Unified Business Intelligence Platform

Planning

Streamlit | Python | Multi-API Integration | Data Visualization | Real-time Dashboards

Centralized business intelligence platform aggregating insights across the data warehouse, experimentation results, ML model performance, and AI assistant usage. A unified dashboard pulls real-time metrics from multiple microservices through API integration, featuring executive KPI monitoring, experiment result visualization, model drift detection, and cross-platform analytics with automated reporting.

Contact Me