Professional Me

Prathmesh Lonkar

Data Science • Analytics • Machine Learning • Generative AI

About Me

I'm Prathmesh Lonkar, a Data Scientist based in Los Angeles with a Master's in Applied Data Science from USC. I specialize in Data Mining, Predictive Analytics, GenAI, ETL, Data Warehousing, Machine Learning, and NLP. My blend of academic knowledge and hands-on experience equips me to deliver data-driven insights and solutions for strategic decision-making.

🌟 Beyond the technical side, I bring empathy, reliability, and a “let-us-get-it-done” mindset to every team I’m part of. I believe in having fun, taking ownership of my work, and always trusting in my abilities. Known for my adaptability, precision, and strong work ethic, I bring a unique blend of creativity, curiosity, and analytical thinking to every project. Outside of work, I look after my physical and mental well-being through strength training and meditation, seeking outer strength and inner peace!

Languages

My favorite languages for coding and data manipulation.

Databases

My preferred databases for building scalable applications.

Libraries

My go-to libraries for machine learning and data analysis.

Dashboards

My preferred tools for creating interactive dashboards and data visualizations.

AI & MLOps

My tools for AI development and ML deployment.

Tools

My essential tools for development and collaboration.

Cloud

My cloud platforms for hosting and deployment.

Big Data

My preferred technologies for large-scale data processing and analytics.


Work Experience

Jun 2024 - Dec 2024

GenAI Data Scientist Intern

Tenable Inc.

Spearheaded the development of Tenable VulnAdvisor, an AI-powered tool that automates CVE plugin metadata content creation, reducing manual remediation time by 80%. Integrated OpenAI LLMs with Retrieval-Augmented Generation (RAG), FAISS vector search, and Streamlit to generate plugin descriptions, solutions, and synopses. Accelerated vulnerability management workflows, enabling Tenable researchers to prioritize and remediate security issues more efficiently.

LLMs RAG FAISS Streamlit OpenAI MLOps LangChain AI Automation

Research & Development Data Scientist Intern

Tenable Inc.

Collaborated on a defense-focused initiative for the U.S. Department of Defense’s IAVM program, analyzing large-scale CVE datasets to identify high-risk vulnerability patterns and trends. Developed a predictive model with 95% recall for IAVM forecasting, enabling early remediation and reducing analysis time by 3–4 days per case. Built a user-friendly Streamlit dashboard to visualize priority CVEs, giving Tenable researchers actionable insights they could act on quickly. Earned an internship extension for successful project delivery.

Data Collection SQL Data Visualization Snowflake Data Analysis Streamlit Pandas ML MLOps
Feb 2024 - Mar 2024

VC Research & Data Analytics Extern

Energy Innovation Capital

Performed in-depth market analysis of the Carbon Management energy sector to identify emerging technology trends, evaluate competitors, and uncover growth opportunities. Conducted due diligence on a prospective startup and prepared a strategic investment summary assessing market viability and financial potential. Delivered a data-driven presentation using MS Excel and Power BI to visualize key insights, supporting partner-level investment decisions.

Data Visualization Market Analysis Due Diligence Investment Research MS Excel Power BI
Jun 2021 - Jun 2023

Senior Product Engineer (ETL Data Analyst)

LTIMindtree Ltd.

Led the design and implementation of robust ETL solutions for two U.S.-based enterprise clients, optimizing data migration workflows and improving data processing speed by 30%. Built and maintained 100+ automated data pipelines using Python, SQL, and Fosfor Spectra, enabling seamless integration across cloud (Azure), CRM (Salesforce), file-based systems (SFTP), and Snowflake databases. Engineered a scalable, high-volume data migration framework using Salesforce BULK API, deployed across 50+ production pipelines. Drove product adoption by training 120+ individuals & stakeholders (including an Indian National Security Organization) and identified 70+ critical bugs with actionable UX and process improvement suggestions.

ETL Data Migration SQL Salesforce BULK API Python Data Cleaning Data Preprocessing Snowflake Stakeholder Training
Jan 2021 - May 2021

Data Science, ML & Product Intern

Trisim Technologies Pvt. Ltd.

Engineered the backend for a no-code Data Science/ML toolkit that enabled users to perform data analysis, visualization, model training & deployment via a user-friendly UI — reducing analysis time by over 50%. Developed modular Flask-based APIs to support end-to-end machine learning workflows. Collaborated on front-end design and interface usability, crafting a streamlined and accessible product for non-technical users.

Python Data Science Flask Machine Learning Product Development UI/UX Data Visualization

Featured Projects

YouTube

Social Analytics with YouTube Trending Data

Python Social Analytics Pandas EDA Feature Engineering Seaborn Matplotlib Random Forest Engagement Optimization A/B Testing Welch's t-test YouTube

Analyzed 3.3M+ trending YouTube video records (6.5GB dataset) to uncover patterns across video categories, upload timing, and engagement behavior. Engineered 13 new features, built a Random Forest model to predict view counts, and implemented A/B tests comparing post timing (morning vs evening, weekday vs weekend) to provide posting recommendations. Uncovered actionable strategies to improve reach and engagement for content creators based on temporal trends and audience interaction.
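The timing comparison above rests on Welch's t-test, which compares two group means without assuming equal variances. The following is a minimal sketch in plain Python, not the project's code: the function name `welch_t_test` and the sample engagement values are hypothetical and chosen only for illustration. It computes the t statistic and the Welch-Satterthwaite degrees of freedom; a p-value would additionally need a t-distribution CDF, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    se_a, se_b = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df


# Hypothetical engagement rates, NOT taken from the project dataset.
morning = [0.042, 0.051, 0.038, 0.047, 0.044]
evening = [0.055, 0.061, 0.049, 0.072, 0.058, 0.066]

t_stat, dof = welch_t_test(morning, evening)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A negative t statistic here means the first group (morning) has the lower mean; unequal group sizes are fine because each group contributes its own variance term.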

Check it out!

AWS

Custom Neural Network from Scratch

Python NumPy Neural Networks TensorFlow Sigmoid Gradient Descent Backpropagation Loss Function Jupyter

Built a custom neural network from the ground up using Python and NumPy to simulate a single-neuron, sigmoid-activated model. Trained on a dummy insurance dataset, the model includes custom backpropagation, manual gradient descent, and performance benchmarking against TensorFlow (Keras). Matched the TensorFlow model's metrics and results, validated the learned weights, and visualized learning progress and predictions through plots and prediction curves.
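To illustrate the idea, here is a minimal sketch of a single sigmoid neuron trained with batch gradient descent on log loss. It is not the project's code: it uses plain Python instead of NumPy to stay dependency-free, handles only one input feature, and the function names, learning rate, epoch count, and toy data (scaled age versus whether insurance was bought) are all assumptions made for this example.

```python
from math import exp, log


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def log_loss(xs, ys, w, b):
    """Mean binary cross-entropy of the neuron on the data."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1 - eps)
        total -= y * log(p) + (1 - y) * log(1 - p)
    return total / len(xs)


def train_neuron(xs, ys, lr=0.5, epochs=5000):
    """Batch gradient descent for one weight and one bias."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # For sigmoid + log loss, dLoss/dz simplifies to (prediction - label).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical toy data: age / 100 and a 0/1 "bought insurance" label.
ages = [0.22, 0.25, 0.28, 0.46, 0.47, 0.52, 0.56, 0.62]
bought = [0, 0, 0, 1, 1, 1, 1, 1]

w, b = train_neuron(ages, bought)
print("loss before:", log_loss(ages, bought, 0.0, 0.0))
print("loss after: ", log_loss(ages, bought, w, b))
```

Comparing the learned `w` and `b` against a one-unit Keras `Dense` layer with a sigmoid activation is one way to run the kind of benchmark described above.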

Check it out!

AWS

End-to-End MLOps Workflow with AWS & Python

Python AWS SageMaker S3 Lambda API Gateway AWS CLI Boto3 Scikit-learn Model Deployment Jupyter Cloud

Built a complete ML pipeline using AWS S3, SageMaker, Lambda, and API Gateway — starting from dataset preprocessing and upload to S3, to model training, deployment, and real-time inference. Used Boto3 SDK to manage S3 storage, IAM roles, and SageMaker resources. Trained a Random Forest classifier on SageMaker, deployed it to a real-time endpoint, and validated predictions via REST API calls routed through API Gateway and AWS Lambda.

Check it out!

HF Agent GAIA

AI Agent Certified with GAIA Benchmark

Python HuggingFace LlamaIndex SmolAgents LangGraph OpenAI Tool-Augmented Reasoning OCR Whisper Gradio GAIA Benchmark

Engineered an AI agent certified on the GAIA (General AI Assistant) Benchmark via a HuggingFace Certification assessment for general-purpose assistants. Integrated tools like DuckDuckGo, Wikipedia search, Whisper (audio transcription), and Tesseract (OCR) for multi-modal task solving. Achieved a 35% score with full automation: file parsing, web search, YouTube transcription, and structured result submission via HuggingFace APIs.

Check it out!

Consumer Analysis

Consumer Behavior & Sales Analysis – Snack Industry

Python Pandas Seaborn Matplotlib EDA Customer Segmentation Sales Insights Feature Engineering T-Tests Retail Analytics

Analyzed 260K+ transactions and 72K+ customers to uncover sales trends by packet size, brand, and customer segment. Identified 175g packs and 'Kettle' as top performers, with Christmas-period spikes driving seasonal demand. Applied feature engineering and t-tests to reveal higher spend-per-transaction among younger mainstream buyers, guiding targeted marketing and inventory planning.

Check it out!

DataWhiz

DataWhiz – Conversational AI Agent for Instant Data Insights

Python LangChain OpenAI GPT-4 LLM AI Agent Streamlit Pandas Conversational AI

Developed a conversational AI agent enabling users to upload CSV files and query data in natural language. Leveraged LangChain and OpenAI GPT-4 to process datasets with over 10,000 rows, delivering responses within ~5 seconds. Successfully tested across finance, sales, and HR datasets, handling diverse data types including numeric, categorical, and date/time.

Check it out!

Customer Churn

Customer Churn Prediction – B2B Energy Sector

Python Pandas Matplotlib Seaborn Scikit-learn Feature Engineering Business Recommendation

Developed an XGBoost model achieving ~90% test accuracy to predict customer churn in a B2B energy services context. Engineered 30+ features, including net margin and forecasted consumption, and conducted statistical analysis to identify key churn drivers. Proposed targeted retention strategies, such as a 20% discount for high-value at-risk customers, based on model insights.

Check it out!

AI Bias BERT

Bias Detection in Named Entity Recognition using BERT

Python PyTorch Transformers Hugging Face NLP Bias Analysis Fairness in AI Data Visualization

Investigated gender bias in BERT-based NER models using 139 years of U.S. census name data. Built a PyTorch pipeline to process 1,000+ gendered prompts, revealing 2× higher Type-3 error rates for female names in PERSON entity recognition. Visualized bias trends across templates, professions, and time periods.

Check it out!

DocuBot

DocuBot: Intelligent PDF Query System

LangChain LLM NLP Text Processing Vector Database OpenAI GPT model PyPDF2 FAISS Embedding

Developed a conversational PDF query system using LangChain, OpenAI GPT models, and FAISS for efficient document retrieval. Optimized document parsing and embedding with PyPDF2 and a chunk overlap strategy for better context retention. Integrated FAISS vector storage for accurate retrieval, enabling precise, context-aware responses to user queries.
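The chunk overlap strategy mentioned above can be sketched as follows. This is a plain-Python stand-in rather than LangChain's text splitters or the project's actual code: the function name `chunk_text`, the character-based sizes, and the default values are assumptions for illustration. Each chunk repeats the last `overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Tiny example so the overlap is easy to see.
for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
    print(chunk)
```

A larger overlap retains more context per chunk at the cost of more chunks to embed and store.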

Check it out!

Recommendation

YelpRec: Scalable Hybrid Recommendation System

Python PySpark Spark RDD XGBoost CatBoost Hybrid Recommendation System Collaborative Filtering

Built a hybrid recommendation system using item-based collaborative filtering and machine learning models (XGBoost, CatBoost) on the Yelp dataset. Implemented a scalable data pipeline with PySpark and Spark RDDs to process over 1 million records, enabling rapid experimentation and optimized model performance with an RMSE of 0.9798.
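As a rough illustration of how a hybrid score and its RMSE can be computed, the sketch below blends two rating predictions with a fixed weight. The weighted-average blend is an assumption, since the description does not say how the collaborative filtering and model outputs were combined; the function names, the weight `alpha`, and the ratings are hypothetical.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


def blend(cf_scores, model_scores, alpha=0.5):
    """Weighted average: alpha on collaborative filtering, the rest on the model."""
    return [alpha * c + (1 - alpha) * m for c, m in zip(cf_scores, model_scores)]


# Hypothetical star ratings, NOT taken from the Yelp dataset.
actual = [4.0, 3.0, 5.0, 2.0]
cf_pred = [3.5, 3.5, 4.0, 3.0]
model_pred = [4.2, 2.8, 4.6, 2.5]

hybrid = blend(cf_pred, model_pred, alpha=0.3)
print("CF RMSE:    ", rmse(cf_pred, actual))
print("Model RMSE: ", rmse(model_pred, actual))
print("Hybrid RMSE:", rmse(hybrid, actual))
```

In practice the blend weight would be tuned on a validation split rather than fixed by hand.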

Check it out!

NLP

Customer Review Sentiment Analysis & Buying Behavior Prediction

Python NLP Sentiment Analysis Web Scraping Classification Models EDA Feature Engineering NLTK BeautifulSoup

Led an end-to-end analytics project analyzing web-scraped customer reviews from 1000+ flights using NLP techniques. Built a sentiment classifier using NLTK to label feedback as positive, neutral, or negative. Engineered features and developed a classification model to predict customer buying behavior, uncovering key drivers through EDA and visualizations. Delivered insights via presentation-ready visualizations and reports.

Check it out!

Customer Demographic

Customer Segmentation & Marketing Strategy Analysis

Python Power BI MS Excel Data Cleaning Customer Segmentation Data Quality

Conducted a comprehensive analysis of customer demographics and transaction data to identify high-value customer segments. Performed data quality assessments, cleaned and transformed datasets, and developed interactive Power BI dashboards. Provided strategic marketing recommendations based on insights derived from customer behavior and purchasing patterns.

Check it out!

NASA

NASA HiRISE Martian Frost Detection

Python TensorFlow Keras CNN GPU Optimization Data Augmentation Image Classification Deep Learning Transfer Learning

Developed a CNN-based deep learning model to classify frost presence in high-resolution Martian images from NASA's HiRISE dataset, achieving up to 95% test accuracy. Implemented transfer learning with EfficientNetB0, ResNet50, and VGG16 architectures. Applied data augmentation and preprocessing techniques to enhance model performance and generalization.

Check it out!

Recommendation

APV-Database: SQL & NoSQL Database from Scratch

Python CLI Query Parsing Dynamic Hashing NoSQL Relational DB

Co-developed a custom DBMS supporting both relational and NoSQL models, featuring a human-readable query language and CLI interface. Engineered dynamic hashing with linear probing and chaining for optimized data retrieval. Implemented full CRUD operations, grouping, aggregation, joins, and filtering for NoSQL design.
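The hashing idea can be illustrated with a small in-memory index. This sketch is a simplification and not the project's implementation: it uses separate chaining only and doubles the bucket array when the load factor passes a threshold, which is simpler than true dynamic hashing schemes such as extendible or linear hashing. The class name, method names, and default parameters are hypothetical.

```python
class ChainedHashIndex:
    """Hash index with separate chaining that doubles its buckets as it fills."""

    def __init__(self, initial_buckets=4, max_load=0.75):
        self._buckets = [[] for _ in range(initial_buckets)]
        self._size = 0
        self._max_load = max_load

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self._max_load:
            self._resize()

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self):
        # Double the bucket count and rehash every stored pair.
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:
                self._bucket(k).append((k, v))

    def __len__(self):
        return self._size


index = ChainedHashIndex()
for row_id in range(100):
    index.put(row_id, {"id": row_id, "name": f"row-{row_id}"})
print(index.get(42))
```

Keeping the load factor bounded keeps each chain short, so lookups stay close to constant time as the index grows.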

Check it out!

Turbine

Turbine Failure Data Analysis

Python Pandas Matplotlib Seaborn Time Series Analysis Data Visualization ARIMA Predictive Analytics

Conducted exploratory data analysis on turbine failure datasets to identify failure patterns and predict potential breakdowns. Utilized time series analysis and visualizations to uncover trends, aiding in the development of preventive maintenance strategies.

Check it out!

Patents

Five-Bar Mechanism Based Quadruped

Patent No. 529583, Indian Patent Office, Issued March 2024

The proposed invention belongs to the field of Mechatronics and relates to a five-bar mechanism-based quadruped. It has a separate five-bar mechanism for each leg, and the actuators can be arranged in two ways: parallel to each other or co-axial. The quadruped is driven by eight motors, two per leg. The invention's advantages include its ability to walk straight, turn, cross obstacles, and climb slopes.


Certifications

Present

McKinsey Forward Program

McKinsey

June 2025

Introduction to Generative AI

Amazon Web Services

May 2025

AI Agent Certification

HuggingFace

May 2025

AI Agent Fundamentals

HuggingFace

May 2025

Deloitte Data Analytics Job Simulation

Forage | Deloitte

February 2025

BCG X Data Science Job Simulation

Forage | BCG X

December 2023

British Airways Data Science Job Simulation

Forage | British Airways

November 2023

KPMG Data Analytics Consulting Virtual Internship

Forage | KPMG

July 2022

IBM Data Science Professional

Coursera | IBM

June 2022

Data Science Foundations

IBM

May 2022

Big Data Foundations

IBM

May 2022

Hadoop Foundations

IBM

May 2022

Spark Foundations

IBM

April 2022

Python Specialization

Coursera | University of Michigan

December 2021

Microsoft Certified Azure Fundamentals

Microsoft

February 2021

Data Visualization

Kaggle

January 2021

Data Cleaning

Kaggle

January 2021

Intermediate Machine Learning

Kaggle

July 2020

Leadership & Emotional Intelligence

Coursera | Indian School of Business

May 2020

Successful Presentation

Coursera | University of Colorado Boulder

May 2020

Business Writing

Coursera | University of Colorado Boulder

April 2020

Graphic Design

Coursera | University of Colorado Boulder