Most teams stop at accuracy. I ask: why did the model decide that — and can you prove it?
I'm a Machine Learning Engineer specializing in Explainable AI (XAI) and clinical ML systems. My work sits at the boundary between statistical rigor and software engineering — building pipelines that are not only high-performing but accountable.
My research uncovered what I call the Explainability Paradox: visually convincing saliency maps that fail causal validity tests. That finding is now under peer review.
Most evaluation stops at accuracy_score. TrustLens goes deeper.
A single analyze() call surfaces calibration drift, subgroup bias, failure patterns, and representation quality — the things that matter in production, but don't appear on leaderboards.
from trustlens import analyze
report = analyze(model, X_val, y_val, y_prob=proba)
# → Calibration · Bias · Failure Modes · Representation
Live on PyPI · Built with production CI/CD (multi-Python testing, Ruff, MyPy) · Active contributor community
→ Full writeup | PyPI package | Repository
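Of the checks listed above, calibration is the easiest to illustrate in a few lines. The sketch below computes an expected calibration error (ECE) for binary labels with equal-width bins. It is not TrustLens's implementation; `expected_calibration_error` and its parameters are hypothetical names chosen for this example.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE: weighted mean gap between predicted probability
    and observed positive rate, computed per equal-width bin.

    Illustrative only; not taken from the TrustLens codebase.
    """
    bins = [[] for _ in range(n_bins)]
    for label, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 is clamped into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((label, p))

    n = len(y_true)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue  # empty bins contribute nothing
        mean_conf = sum(p for _, p in bucket) / len(bucket)
        frac_pos = sum(label for label, _ in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(frac_pos - mean_conf)
    return ece


# A model that says 0.95 for every sample but is right half the time
# is badly calibrated even though its accuracy looks acceptable.
overconfident = expected_calibration_error([1, 1, 0, 0], [0.95] * 4)
```

A score near zero means predicted probabilities track observed frequencies; a large score flags over- or under-confidence that an accuracy number alone would hide.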
Quantitative Faithfulness Benchmarking of CNNs vs. Vision Transformers: Implications for Clinical Trustworthiness
I trained three models (VGG16, ViT-B/16, and a custom CNN), ran GradCAM++ and EigenCAM on a chest X-ray dataset, and found something counterintuitive: visually plausible heatmaps lacked causal validity. A 6-dimensional benchmark combined with Pixel Deletion (AOPC/AUC) showed that patch-based Transformer attention was causally faithful where the CNN heatmaps were not, even though the CNN heatmaps looked more "correct" to the human eye. I call this the Explainability Paradox.
Metrics used: Sparsity · Entropy · Inter-Method Agreement · AOPC/AUC · Bonferroni-corrected non-parametric testing
→ Project writeup | Repository
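The Pixel Deletion idea can be sketched on a toy example: remove pixels in order of claimed importance and track how fast the model's score falls. This is a minimal sketch under stated assumptions, not the benchmark code from the project. The flat pixel list, the linear `toy_model`, and the function names are all hypothetical, and published AOPC variants differ in how they normalize the average.

```python
def deletion_curve(model, pixels, saliency, steps=4, baseline=0.0):
    """Scores after each deletion step, most-salient pixels removed first."""
    order = sorted(range(len(pixels)), key=lambda i: saliency[i], reverse=True)
    perturbed = list(pixels)
    scores = [model(perturbed)]  # score on the untouched input
    per_step = max(1, len(pixels) // steps)
    for start in range(0, len(order), per_step):
        for i in order[start:start + per_step]:
            perturbed[i] = baseline  # "delete" the pixel
        scores.append(model(perturbed))
    return scores


def aopc(scores):
    """Mean drop from the unperturbed score across all deletion steps."""
    drops = [scores[0] - s for s in scores[1:]]
    return sum(drops) / len(drops)


# Hypothetical stand-in for a classifier: a fixed weighted sum of pixels.
WEIGHTS = [0.5, 0.3, 0.15, 0.05]

def toy_model(pixels):
    return sum(w * p for w, p in zip(WEIGHTS, pixels))


image = [1.0, 1.0, 1.0, 1.0]
faithful_map = WEIGHTS                    # ranks pixels as the model uses them
unfaithful_map = list(reversed(WEIGHTS))  # ranks pixels in the wrong order

faithful_aopc = aopc(deletion_curve(toy_model, image, faithful_map))
unfaithful_aopc = aopc(deletion_curve(toy_model, image, unfaithful_map))
# The faithful ranking yields the larger AOPC.
```

A heatmap can look plausible and still score poorly here, because the test only rewards maps whose top-ranked pixels actually move the model's output.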
| System | Stack | Live | Highlight |
|---|---|---|---|
| CardioSense-AI | XGBoost · FastAPI · Docker · Optuna | 🟢 Live | 90.16% acc · 0.9524 AUC · "Least Effort Path" optimizer for patient intervention |
| Breast Cancer MLOps Suite | Random Forest · Z-Score Drift · Streamlit | 🟢 Live | 98.2% acc · Real-time out-of-distribution detection |
| Respiratory Disease Classifier | VGG16 · ViT-B/16 · GradCAM++ · LIME | Research | 99% recall for COVID-19 · Explainability Paradox discovery |
| Apple Sales Intelligence | Scikit-Learn · SciPy SLSQP · Streamlit | 🟢 Live | Constrained optimization for hardware-mix revenue maximization |
| Patient Safety Guardian | Gemini 2.5 Pro · Google ADK · Streamlit | 🟢 Live | Kaggle Agents Intensive · Multi-agent clinical safety net · 100% critical interaction detection |
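The Z-Score drift check listed for the Breast Cancer MLOps Suite can be illustrated roughly as follows. This is a sketch of the general technique, assuming per-feature statistics taken from training data and a fixed threshold of 3. The function names and the threshold are hypothetical and not drawn from the deployed system.

```python
from statistics import mean, stdev


def fit_reference(rows):
    """Per-feature (mean, sample std) computed from training rows."""
    columns = list(zip(*rows))
    return [(mean(col), stdev(col)) for col in columns]


def flag_out_of_distribution(row, reference, threshold=3.0):
    """Indices of features whose absolute z-score exceeds the threshold."""
    flagged = []
    for i, (value, (mu, sigma)) in enumerate(zip(row, reference)):
        if sigma == 0:
            # Feature was constant in training: any change counts as drift.
            if value != mu:
                flagged.append(i)
            continue
        if abs(value - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up training rows with two features, for illustration only.
train = [[1.0, 10.0], [2.0, 10.5], [3.0, 9.5], [2.0, 10.0]]
reference = fit_reference(train)

typical = flag_out_of_distribution([2.0, 10.0], reference)   # nothing flagged
shifted = flag_out_of_distribution([2.5, 20.0], reference)   # second feature flagged
```

A per-feature check like this is cheap enough to run on every incoming request, though it only catches shifts in individual features, not changes in how features relate to one another.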
I write derivation-first articles — intuition before formulas, full proofs included. No hand-waving.
| Article | What It Covers |
|---|---|
| Gradient Descent | Partial derivatives · learning rates · convergence from first principles |
| Lagrange Multipliers | Constrained optimization · dual problems · geometric intuition |
| Bias–Variance Trade-Off | The fundamental tension between model simplicity and prediction accuracy |
| Linear Regression | OLS derivation · normal equations · assumption breakdown |
| Logistic Regression | Sigmoid · MLE · cross-entropy loss gradient derivation |
| Area | Tools |
|---|---|
| ML / DL | PyTorch · XGBoost · Scikit-Learn · VGG16 · ViT · Optuna |
| XAI | SHAP · LIME · GradCAM++ · EigenCAM · Pixel Deletion (AOPC/AUC) |
| MLOps | FastAPI · Docker · GitHub Actions CI/CD · Streamlit · REST APIs |
| Data Engineering | Python · SQL · Pandas · NumPy · PCA · K-Means · Plotly |
| Drift Detection | Z-Score · Counterfactual Analysis · Synthetic Stress Testing |
"In God we trust. All others must bring data." — W. Edwards Deming
If your model can't explain itself, it has no business making decisions.



