Hello World!

I am Soham Naik.

Scroll down to learn more about me!

About Me

I specialize in creating innovative solutions in machine learning, NLP, and computer vision. My experience ranges from developing data extraction and RAG pipelines to automating CI/CD processes using Azure DevOps. I’m passionate about transforming data into actionable insights and continuously expanding my technical expertise to tackle complex challenges.


Education

Master's in Information Management (GPA: 3.79/4.0)
University of Washington, 2023-2025
Bachelor's in Computer Engineering (GPA: 3.81/4.0)
Pune University, 2019-2023

Skills

Skills I have acquired over the past three years:

Languages: Python, R, SQL, HTML, CSS
Frameworks: PyTorch, Azure, LangChain, Flask, Spark
Tools: Pandas, NumPy, Matplotlib, Seaborn, Docker, Tableau
Gen-AI: RAG, Haystack, Chain-of-Thought (CoT) Prompting, Hugging Face, Transformers, API Integrations
DS Skills: EDA, ETL, Deep Learning, Data Visualization, Predictive Modelling, Statistical Modelling, Hypothesis Testing

Work Experience

Johnson & Johnson

Data Scientist (Capstone Project)

Jan 2025 - Present

A global leader in healthcare products and services.

Key Responsibilities

  • Performing ETL on 1M+ data points using Pandas and PySpark to process large-scale datasets efficiently and derive actionable KPIs
  • Designing interactive Tableau dashboards for ad-hoc analyses that communicate complex trends to nontechnical users, driving 2x user engagement
  • Collaborating with data scientists on K-Means clustering and kernel density estimation (KDE) to improve the prediction accuracy of Bayesian classifiers

Castor

AI Engineer (Co-Op)

Jun 2024 - Present

Healthcare SaaS platform specializing in clinical research.

Key Responsibilities

  • Developed a scalable data extraction pipeline using Python, the MS Copilot API, and prompt engineering to transform electronic health record (EHR) data, achieving a 50% cost reduction through automated processing
  • Built a RAG pipeline in LangChain using vector stores, making document retrieval for source document verification 30% faster
  • Set up a CI/CD pipeline on AWS to automate testing, building, and deploying Docker containers, reducing deployment time by 70%

UW NLP Lab

GenAI Research Intern

Feb 2024 - Jul 2024

Academic lab focused on NLP and LLM research.

Key Responsibilities

  • Built a logistic regression model in PyTorch to evaluate fine-tuned Mistral-7B LLMs from Hugging Face for bias in resume selection tasks
  • Applied NLP techniques (text similarity, keyword matching, and feature extraction) to improve predictive accuracy, validating the bias hypothesis with 96% accuracy

Metro AG

Data Science Intern

Mar 2023 - Jun 2023

Global wholesale company serving professional customers.

Key Responsibilities

  • Developed a SARIMA forecasting model for sales prediction, improving forecast accuracy by 20%
  • Built recommendation systems for B2B customers
  • Used Python, TensorFlow, scikit-learn and NumPy to develop an ID authentication tool, eliminating 20 hours of weekly manual processing workload
  • Implemented an innovative solution to extract and transform high-volume data entering PowerApps via SQL, saving $120,000 in annual licensing costs

Projects

Project 1

Ingredient Analyzer

A web application that lets users upload images of ingredient labels, extracts the text using OCR, and analyzes the ingredients with the Claude API to determine whether each one is healthy, is an artificial sweetener, or raises potential concerns.

Project 2

Forecasting Model

This project utilizes the SARIMA (Seasonal AutoRegressive Integrated Moving Average) model to forecast sales trends for a retail dataset. It preprocesses historical sales data, identifies seasonal patterns, and optimizes SARIMA parameters for accurate predictions. The model helps retailers anticipate demand fluctuations and optimize inventory.

Project 3

Gun Detection

This project implements a deep learning-based gun detection system using the YOLO (You Only Look Once) object detection algorithm. The model is trained on a dataset of firearm images to identify guns accurately in real time, making it suitable for security and surveillance applications.

Publications

Say Hello