I'm Prathmesh Lonkar, a Data Scientist based in Los Angeles with a Master's in Applied Data Science from USC. I specialize in Data Mining, Predictive Analytics, GenAI, ETL, Data Warehousing, Machine Learning, and NLP. My blend of academic knowledge and hands-on experience equips me to deliver data-driven insights and solutions for strategic decision-making.
I specialize in:
🌟 Beyond the technical side, I bring empathy, reliability, and a “let's-get-it-done” mindset to every team I’m part of. I believe in having fun, taking ownership of my work, and always trusting in my abilities. Known for my adaptability, precision, and strong work ethic, I bring a unique blend of creativity, curiosity, and analytical thinking to every project. Outside of work, I look after my physical and mental well-being through strength training and meditation, seeking outer strength and inner peace!
My favorite languages for coding and data manipulation.
My preferred databases for building scalable applications.
My go-to libraries for machine learning and data analysis.
My preferred tools for creating interactive dashboards and data visualizations.
My tools for AI development and ML deployment.
My essential tools for development and collaboration.
My cloud platforms for hosting and deployment.
My preferred technologies for large-scale data processing and analytics.
Spearheaded the development of Tenable VulnAdvisor, an AI-powered tool that automates CVE plugin metadata content creation, reducing manual remediation time by 80%. Integrated OpenAI LLMs with Retrieval-Augmented Generation (RAG), FAISS vector search, and Streamlit to generate plugin descriptions, solutions, and synopses. Accelerated vulnerability management workflows, enabling Tenable researchers to prioritize and remediate security issues more efficiently.
Collaborated on a defense-focused initiative for the U.S. Department of Defense’s IAVM program, analyzing large-scale CVE datasets to identify high-risk vulnerability patterns and trends. Developed a predictive model with 95% recall for IAVM forecasting, enabling early remediation and reducing analysis time by 3–4 days per case. Built a user-friendly Streamlit dashboard to visualize priority CVEs, giving Tenable researchers actionable insights to act on quickly. Earned an internship extension for successful project delivery.
Performed in-depth market analysis of the Carbon Management energy sector to identify emerging technology trends, evaluate competitors, and uncover growth opportunities. Conducted due diligence on a prospective startup and prepared a strategic investment summary assessing market viability and financial potential. Delivered a data-driven presentation using MS Excel and Power BI to visualize key insights, supporting partner-level investment decisions.
Led the design and implementation of robust ETL solutions for two U.S.-based enterprise clients, optimizing data migration workflows and improving data processing speed by 30%. Built and maintained 100+ automated data pipelines using Python, SQL, and Fosfor Spectra, enabling seamless integration across cloud (Azure), CRM (Salesforce), file-based systems (SFTP), and Snowflake databases. Engineered a scalable, high-volume data migration framework using the Salesforce Bulk API, deployed across 50+ production pipelines. Drove product adoption by training 120+ individuals and stakeholders (including an Indian National Security Organization) and identified 70+ critical bugs with actionable UX and process improvement suggestions.
Engineered the backend for a no-code Data Science/ML toolkit that enabled users to perform data analysis, visualization, model training & deployment via a user-friendly UI — reducing analysis time by over 50%. Developed modular Flask-based APIs to support end-to-end machine learning workflows. Collaborated on front-end design and interface usability, crafting a streamlined and accessible product for non-technical users.
Analyzed 3.3M+ trending YouTube video records (6.5GB dataset) to uncover patterns across video categories, upload timing, and engagement behavior. Engineered 13 new features, built a Random Forest model to predict view counts, and implemented A/B tests comparing post timing (morning vs evening, weekday vs weekend) to provide posting recommendations. Uncovered actionable strategies to improve reach and engagement for content creators based on temporal trends and audience interaction.
Built a custom neural network from the ground up using Python and NumPy to simulate a single-neuron, sigmoid-activated model. Trained on a dummy insurance dataset, the model includes custom backpropagation, manual gradient descent, and performance benchmarking against TensorFlow (Keras). Achieved identical metrics and results, validated weights, and visualized learning progress and predictions through plots.
Built a complete ML pipeline using AWS S3, SageMaker, Lambda, and API Gateway, covering dataset preprocessing and upload to S3, model training, deployment, and real-time inference. Used the Boto3 SDK to manage S3 storage, IAM roles, and SageMaker resources. Trained a Random Forest classifier on SageMaker, deployed it to a real-time endpoint, and validated predictions via REST API calls routed through API Gateway and AWS Lambda.
Engineered an AI agent certified on the GAIA (General AI Assistant) Benchmark via a HuggingFace Certification assessment for general-purpose assistants. Integrated tools like DuckDuckGo, Wikipedia search, Whisper (audio transcription), and Tesseract (OCR) for multi-modal task solving. Achieved a 35% score with full automation: file parsing, web search, YouTube transcription, and structured result submission via HuggingFace APIs.
Analyzed 260K+ transactions and 72K+ customers to uncover sales trends by packet size, brand, and customer segments. Identified 175g packs and 'Kettle' as top performers, with holiday spikes driving seasonal demand during Christmas. Applied feature engineering and t-tests to reveal higher spend-per-transaction among younger mainstream buyers, guiding targeted marketing and inventory planning.
Developed a conversational AI agent enabling users to upload CSV files and query data in natural language. Leveraged LangChain and OpenAI GPT-4 to process datasets with over 10,000 rows, delivering responses within ~5 seconds. Successfully tested across finance, sales, and HR datasets, handling diverse data types including numeric, categorical, and date/time.
Developed an XGBoost model achieving ~90% test accuracy to predict customer churn in a B2B energy services context. Engineered 30+ features, including net margin and forecasted consumption, and conducted statistical analysis to identify key churn drivers. Proposed targeted retention strategies, such as a 20% discount for high-value at-risk customers, based on model insights.
Investigated gender bias in BERT-based NER models using 139 years of U.S. census name data. Built a PyTorch pipeline to process 1,000+ gendered prompts, revealing 2× higher Type-3 error rates for female names in PERSON entity recognition. Visualized bias trends across templates, professions, and time periods.
Developed a conversational PDF query system using LangChain, OpenAI GPT models, and FAISS for efficient document retrieval. Optimized document parsing and embedding with PyPDF2 and a chunk overlap strategy for better context retention. Integrated FAISS vector storage for accurate retrieval, enabling precise, context-aware responses to user queries.
Built a hybrid recommendation system using item-based collaborative filtering and machine learning models (XGBoost, CatBoost) on the Yelp dataset. Developed a scalable data pipeline with PySpark and Spark RDDs to process over 1 million records, enabling rapid experimentation and optimized model performance with an RMSE of 0.9798.
Led an end-to-end analytics project analyzing web-scraped customer reviews from 1000+ flights using NLP techniques. Built a sentiment classifier using NLTK to label feedback as positive, neutral, or negative. Engineered features and developed a classification model to predict customer buying behavior, uncovering key drivers through EDA and visualizations. Delivered insights via presentation-ready visualizations and reports.
Conducted a comprehensive analysis of customer demographics and transaction data to identify high-value customer segments. Performed data quality assessments, cleaned and transformed datasets, and developed interactive Power BI dashboards. Provided strategic marketing recommendations based on insights derived from customer behavior and purchasing patterns.
Developed a CNN-based deep learning model to classify frost presence in high-resolution Martian images from NASA's HiRISE dataset, achieving up to 95% test accuracy. Implemented transfer learning with EfficientNetB0, ResNet50, and VGG16 architectures. Applied data augmentation and preprocessing techniques to enhance model performance and generalization.
Co-developed a custom DBMS supporting both relational and NoSQL models, featuring a human-readable query language and a CLI. Engineered dynamic hashing with linear probing and chaining for optimized data retrieval. Implemented full CRUD operations, grouping, aggregation, joins, and filtering for NoSQL design.
Conducted exploratory data analysis on turbine failure datasets to identify failure patterns and predict potential breakdowns. Utilized time series analysis and visualizations to uncover trends, aiding in the development of preventive maintenance strategies.
Patent No. 529583, Indian Patents Office, Issued March 2024
The proposed invention belongs to the field of Mechatronics and relates to a five-bar mechanism-based quadruped. It has a separate five-bar mechanism for each leg and can be arranged in two ways: with the actuators parallel to each other or with the actuators co-axial. The quadruped is driven by eight motors, two per leg. Its advantages include the ability to walk straight, turn, cross obstacles, and climb slopes.