Projects and Blog
Local RAG-Powered Semantic Search Engine for Open-Source Epidemiology Documentation
Built an end-to-end, offline-capable Retrieval-Augmented Generation (RAG) pipeline and semantic search backend for the Epiverse-Connect ecosystem using a two-stage retrieval system and local LLMs.
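The description does not say what each retrieval stage uses. A common pattern is a cheap vector recall over the whole corpus followed by a more expensive reranker (often a cross-encoder) applied only to the shortlist. The sketch below assumes precomputed document vectors and uses a hypothetical term-overlap scorer as a stand-in for the reranker. `two_stage_search`, `term_overlap`, and the toy corpus are illustrative names and data, not the project's code.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def term_overlap(query: str, text: str) -> float:
    """Hypothetical stand-in reranker: fraction of query terms present in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def two_stage_search(
    query: str,
    query_vec: Vector,
    corpus: Dict[str, Tuple[str, Vector]],
    rerank_score: Callable[[str, str], float],
    k_candidates: int = 20,
    k_final: int = 5,
) -> List[str]:
    # Stage 1: fast vector recall over every document, keep a short candidate list.
    candidates = sorted(
        corpus,
        key=lambda doc_id: cosine(query_vec, corpus[doc_id][1]),
        reverse=True,
    )[:k_candidates]
    # Stage 2: the slower scorer reorders only the shortlisted candidates.
    reranked = sorted(
        candidates,
        key=lambda doc_id: rerank_score(query, corpus[doc_id][0]),
        reverse=True,
    )
    return reranked[:k_final]


# Toy corpus: doc id -> (text, embedding). Vectors are made up for illustration.
corpus = {
    "a": ("estimate the reproduction number from case data", [0.8, 0.2, 0.1]),
    "b": ("plot epidemic curves", [0.9, 0.1, 0.0]),
    "c": ("reproduction number in the docs index", [0.0, 0.1, 0.9]),
}
results = two_stage_search(
    "reproduction number", [1.0, 0.0, 0.0], corpus, term_overlap,
    k_candidates=2, k_final=3,
)
print(results)  # ['a', 'b']
```

Here stage 1 ranks "b" above "a" and drops "c" entirely, and stage 2 then promotes "a" because it matches the query terms. Only the shortlist pays the cost of the expensive scorer, which is the main reason to split retrieval into two stages when running on local hardware.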
Semantic Search Backend for Resource-Constrained Environments
Semantic search backend built with FastAPI on text embeddings from multi-qa-MiniLM-L6-cos-v1. Stateless scheduled jobs (Python/R) on Azure Functions handle data acquisition and embedding generation, and a REST API serves queries against the knowledge base.
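According to its model card, multi-qa-MiniLM-L6-cos-v1 produces unit-length embeddings, so a plain dot product equals cosine similarity. The sketch below assumes the scheduled job stores normalized vectors keyed by document id and that the query path scores them with a dot product and a top-k selection. `build_index` and `top_k` are hypothetical names, and the 2-d toy vectors stand in for real model output; the FastAPI endpoint would simply call `top_k` on the embedded query.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(raw: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """What a stateless scheduled job might emit: unit vectors keyed by doc id."""
    return {doc_id: l2_normalize(v) for doc_id, v in raw.items()}


def top_k(
    index: Dict[str, List[float]], query_vec: Sequence[float], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k best (doc_id, score) pairs; with unit vectors, dot product == cosine."""
    q = l2_normalize(query_vec)
    scores = ((doc_id, sum(a * b for a, b in zip(q, v))) for doc_id, v in index.items())
    return heapq.nlargest(k, scores, key=lambda item: item[1])


index = build_index({"intro": [3.0, 4.0], "api": [0.0, 2.0], "faq": [1.0, 0.0]})
hits = top_k(index, [0.0, 1.0], k=2)
print([doc_id for doc_id, _ in hits])  # ['api', 'intro']
```

Normalizing once at indexing time keeps the query path to a single multiply-and-sum per document, which suits a small backend with no dedicated vector database.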
GutGenome Intelligence: LLM-Powered Microbiome Diagnostic System
Developed an LLM pipeline that transforms raw 16S rRNA gene sequencing data into clinically grounded veterinary diagnostic reports. The system uses Google's Gemini LLM to interpret complex taxonomic profiles, automating the analysis of dysbiosis scores, functional pathways (such as SCFA and LPS activity), and neuroinflammatory markers. It delivers actionable health insights, including dietary sensitivity predictions and FMT suitability assessments, bridging the gap between raw genomic data and clinical decision-making.
Customer Review Dashboard Using Topic Modeling
Built an interactive dashboard in Streamlit using Python and NLP techniques (sentiment analysis, topic modeling) to analyze unstructured customer reviews, uncovering key insights for product and marketing strategy.
Customer Conversion Prediction to Optimize Ad Spend
Built a supervised machine learning model to predict whether a user who installs the app will convert. Achieved 93% accuracy on the test set.
NGO Impact Assessment
Consulting project for VSM to give them insight into their program implementation and evaluation strategies. Conducted an impact assessment and used data storytelling to streamline their grant reporting and fundraising strategy.