About me

Hello, I'm Mohan Bhosale, a Data Scientist with 4+ years of experience building ML systems that actually ship to production. I'm currently completing my Master's in Data Science at Northeastern University (GPA: 3.96) and most recently worked as a Data Scientist Co-op at Cohere Health, where I built predictive models analyzing 10M+ healthcare claims on AWS SageMaker and implemented end-to-end MLOps pipelines that helped optimize $8.5M in annual medical expenses.

My industry experience spans healthcare, ad-tech, and ed-tech. At MediaMint, I designed real-time Spark/Kafka pipelines, built customer segmentation models that lifted engagement by 25%, and led A/B testing that drove a 30% increase in ad performance. At Allround Club, I built a hybrid recommendation engine with TensorFlow and Apache Spark that boosted course purchases by 25% and ran churn analysis that cut attrition by 12%. I don't just train models. I build the data infrastructure, run the experiments, and deliver dashboards that stakeholders actually use.

My recent work focuses on LLMs and multi-agent AI systems in production settings. I've architected ClassifyAI, an 8-agent LLM system that automates full ML classification pipelines via LangGraph. I built HIMAS, a federated learning platform for privacy-preserving ICU mortality prediction that won 1st Place at the Google Cambridge MLOps Hackathon. I've also developed a multi-modal video ad classifier (F1: 0.81) using PyTorch and Transformers for Prof. Yakov Bart, and built Extended-Reality remote assistance systems with LLM-driven avatars in Prof. Mallesham's EXP research lab. My toolkit covers the full pipeline: PySpark ETL, feature engineering, LangChain/RAG, Docker/Kubernetes deployments, and cloud infrastructure across AWS, GCP, and Azure.

I'm looking for roles where I can combine deep ML expertise with real engineering discipline to solve hard problems at scale, especially in healthcare AI, ML infrastructure, or any team where data science means shipping production systems, not just notebooks.

What I'm doing

  • Building Production-Ready ML Solutions

    Experienced in architecting and deploying scalable machine learning models in production environments. Most recently, I worked with healthcare claims data to build predictive models using PySpark and ML frameworks on AWS SageMaker, optimizing millions in medical expenses while improving patient outcomes.

  • Research

    Actively engaged in research across AI, computer vision, and NLP. My research interests include multi-modal learning, generative AI, and large language models. I've developed multi-modal video ad classifiers and RAG-based document analysis systems.

  • Data Analytics

    Expert in extracting actionable insights from complex, large-scale datasets using advanced statistical methods and visualization tools. Proficient in designing real-time data pipelines, conducting A/B testing, and creating comprehensive dashboards that drive data-informed decision-making and business strategy.

  • Problem Solving

    Specialized in tackling intricate data science problems across diverse domains. From healthcare optimization to marketing automation, I consistently deliver innovative solutions that enhance efficiency, reduce costs, and drive measurable business impact through advanced analytics and machine learning techniques.

Resume

Latest Resume Link

Education

  1. Northeastern University Boston, MA, US

    Master of Science in Data Science Jan, 2024 - May, 2026

    Relevant Coursework: Machine Learning (math-oriented), Statistics (R), Deep Learning, Computer Vision, Knowledge Graphs and LLMs, Competitive Programming, Applied Algorithms (Python), Advanced Database Systems (SQL, PostgreSQL), Engineering Cloud Computing.

  2. VIT University Vellore, TN, India

    Bachelor of Technology Jul, 2017 - May, 2021

    Relevant Coursework: Computational Thinking and Problem Solving (C), Computer Programming (C++), Linear Algebra, Computer System Architecture, Calculus.

Work Experience

  1. Cohere Health Boston, MA, USA

    Data Scientist Jan 2025 - Aug 2025

    A fast-paced clinical intelligence startup where I built and improved data science solutions for the prior authorization process of the healthtech platform.


    • Built predictive models using PySpark and ML frameworks (TensorFlow/PyTorch) on AWS SageMaker for the prior authorization process, analyzing 10M+ healthcare claims records to optimize $8.5M in annual medical expenses and reduce patient wait times

    • Implemented end-to-end MLOps pipelines integrating S3 data lakes and AWS Glue ETL workflows, with automated model

    • Designed Tableau dashboards tracking 15+ KPIs for clinical programs, reducing reporting time by 40% through parameterized SQL queries

  2. MediaMint Hyderabad, Telangana, India

    Data Scientist Feb, 2022 - Jul, 2023

    A digital marketing firm where I leveraged data science techniques to significantly increase digital campaign performance, driving higher engagement and ROI.


    • Improved forecasting accuracy by 30% by developing and deploying time series and machine learning models such as ARIMA, KNN, and SVR to predict campaign performance

    • Boosted client ROI by 20% by designing and deploying interactive Tableau dashboards for marketing campaign analysis, enabling data-driven decision-making and optimization strategies

    • Optimized data access and storage by tuning SQL queries on MySQL Server, improving data retrieval efficiency by 25%

    • Automated report generation with 10+ Python scripts that streamlined analysis and reporting for marketing platforms like CM360 and DV360, reducing manual effort by 60% and improving report accuracy

    • Led a cross-functional initiative to integrate predictive analytics into marketing strategies, resulting in a 15% increase in campaign engagement rates through targeted customer segmentation

  3. Allround Club Bangalore, KA, India

    Data Analyst Jan, 2021 - Feb, 2022

    An early-stage ed-tech startup where I managed a team and employed data science techniques to significantly grow the customer base for the application.


    • Developed a robust recommendation system by applying K-means clustering algorithms, which enhanced customer experience and increased product engagement by 35%

    • Increased website conversion rates by 20% through detailed analysis and optimization of website traffic, using highly targeted strategies in a startup environment

    • Collaborated with the sales team to implement data-driven strategies, contributing to a 25% revenue increase by aligning sales initiatives with market and data insights

    • Delivered 20+ comprehensive reports to stakeholders, showcasing month-over-month growth, sales trends, and future predictions, which supported strategic decision-making and highlighted potential growth areas

    • Initiated and led a project to refine customer segmentation, which improved marketing effectiveness by tailoring campaigns to specific customer groups based on data insights, ultimately increasing customer acquisition by 18%

My skills

Languages
  • Python
  • C++
  • Java
  • R (RStudio)
  • SQL
  • MATLAB
  • JavaScript
Frameworks & Libraries
  • PyTorch
  • TensorFlow
  • Keras
  • OpenCV
  • OpenAI (GPT)
  • Llama 2
  • Hugging Face
  • NumPy
  • Pandas
  • scikit-learn
  • SciPy
  • Matplotlib
  • Seaborn
  • spaCy
  • NLTK
  • Power BI
Tools & OS
  • PostgreSQL
  • Docker
  • Apache Hadoop
  • Apache Spark
  • PySpark
  • Tableau
  • HTML
  • CSS
  • Linux
Cloud
  • Azure Databricks
  • Azure Data Factory
  • Azure Data Lake
  • Salesforce Marketing Cloud
  • Salesforce CRM
  • AWS SageMaker
  • AWS EC2
  • AWS S3
  • AWS Lambda

Projects