Hello, I'm Nik

I'm a Data Scientist

Teaching machines to do the heavy lifting, one magical dataset at a time!

I don't just analyze data — I listen to it.

02. About Me

I'm Nikhil Premachandra Rao

Atlanta, Georgia
26 years old
Data Scientist
5+ years of work experience

Data Scientist with 5+ years of experience designing and deploying end-to-end data and machine learning solutions across finance and healthcare domains. I specialize in transforming raw data into actionable insights through predictive modeling, cloud-native pipelines, and scalable analytics systems. Skilled in Python, SQL, Airflow, Snowflake, and Azure ML, I combine a strong foundation in mathematics and machine learning with hands-on expertise in AWS, dbt, and real-time data platforms. Passionate about building intelligent systems that enhance decision-making, drive operational efficiency, and create measurable business impact.

03. Timeline

Oct 2025 - Present
KATBOTZ
Data Analyst

Aug 2023 - Sept 2025
University of Massachusetts Dartmouth
Master of Science - Data Science

May 2022 - Jul 2023
HCL Tech
Data Scientist

Jan 2020 - Apr 2022
Oracle Cerner
Software Intern

May 2019 - Oct 2019
CMTI
Software Engineer Intern

May 2018 - Nov 2018
Arcapsis
Freelance Data Scientist

2016 - 2020
Siddaganga Institute of Technology
Bachelor of Engineering - Computer Science

04. Noteworthy Projects

Featured Project .01

AutoAPI

AutoAPI is a web-based application that automates machine learning model building and deployment. It includes database integration, a user-friendly dashboard, and APIs for error diagnostics and additional functionality.

Python, Flask, FastAPI, Git, Docker

Featured Project .02

ModelMind

This project automates ML model selection by profiling datasets and predicting the best-performing algorithm based on the first 10 rows. It detects task type (regression/classification), applies rule-based filters, evaluates multiple models, and recommends the top performers based on R² and RMSE scores.

Python, SK-learn, Pandas, Auto ML Logic, Model Evaluation
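
The task-type detection and metric-based ranking described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `max_classes` threshold, and the tie-breaking rule (higher R² first, then lower RMSE) are all assumptions made for the example.

```python
import math

def detect_task_type(sample, max_classes=5):
    """Guess the task type from a small sample of target values.

    Hypothetical heuristic: non-numeric targets, or a handful of distinct
    integer-valued targets, are treated as class labels.
    """
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in sample):
        return "classification"
    distinct = set(sample)
    if all(float(v).is_integer() for v in distinct) and len(distinct) <= max_classes:
        return "classification"
    return "regression"

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rank_models(y_true, predictions):
    """Rank models by R² (descending), breaking ties by RMSE (ascending).

    `predictions` maps a model name to its predicted values.
    """
    scored = [(name, r2_score(y_true, y_pred), rmse(y_true, y_pred))
              for name, y_pred in predictions.items()]
    return sorted(scored, key=lambda s: (-s[1], s[2]))
```

In a real pipeline the predictions would come from fitted models evaluated on held-out data; here they are passed in directly to keep the sketch self-contained.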
Featured Project .03

LLM-Powered CSV Chatbot

This project enables natural language querying and analysis of CSV datasets using a local LLM (Ollama). Users can upload datasets and ask questions, which are interpreted by a Flask-based chatbot that wraps, routes, and executes Python code dynamically. Automated environment setup and cleanup scripts streamline deployment and maintenance.

Python, Flask, Pandas, Ollama, LLM Integration, Shell Scripting

Other Noteworthy Projects

IMDB Sentiment Analysis

A deep dive into movie review sentiment analysis using classic ML and transformer-based models. Includes TF-IDF + Logistic Regression, fine-tuned BERT/RoBERTa, misclassification analysis, visualizations, and a live sentiment prediction demo.

Python, BERT, RoBERTa, TF-IDF, Transformers
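
To illustrate the TF-IDF step of the classic ML baseline, here is a minimal pure-Python sketch. It is an assumption-laden toy: whitespace tokenization, length-normalized term frequency, and a smoothed IDF are choices made for this example, and the `tfidf` function name is hypothetical. The project itself presumably relies on a library vectorizer.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Term frequency is the count divided by document length; IDF is
    smoothed as log((1 + n) / (1 + df)) + 1 so no term gets zero weight.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1
           for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append({term: (count / total) * idf[term]
                        for term, count in counts.items()})
    return vectors
```

Terms that appear in every review (such as "movie") receive the lowest weight, while terms that distinguish reviews (such as "great" or "terrible") are weighted higher, which is what makes the vectors useful as input to a linear classifier.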

Predict Diabetes Web App

This project uses machine learning to build a model for early diabetes detection, with comprehensive data analysis and a web-based interface.

Flask, Scikit-learn, Pandas, Healthcare ML

Purchase Prediction using XGBoost

An XGBoost-based machine learning model to predict customer purchase behavior (CH or MM) using the OJ dataset from the ISLR2 library. Includes data preprocessing, model training, and evaluation with a 50:50 train-test split.

XGBoost, R, ISLR2, Classification

AppSuccess Predictor

This project predicts whether a smartphone user will download an app after clicking a mobile ad using Random Forest. It includes data preprocessing, feature engineering, model building, and evaluation with key insights on feature importance.

Random Forest, R, Mobile Analytics, Feature Engineering

Note Forgery Detection

ML project for detecting genuine vs. forged banknotes using wavelet-based feature extraction from 400x400 grayscale images, trained on the A6DATA.csv dataset for financial security applications.

Computer Vision, Wavelet Analysis, Image Processing, Security ML

Wage Prediction Model

Analyzing wage data with visualizations and applying multiple models for prediction and performance comparison.

Random Forest, XGBoost, Caret, ggplot2

Flask Web Application for Data Analysis and Prediction

A Flask-based web app that processes and visualizes data, and provides predictions based on a trained linear regression model.

Flask, Pandas, Docker, Scikit-learn, Kube

XGBoost Model for Predicting Customer Purchase Behavior

Training an XGBoost model on the OJ dataset for binary classification, followed by feature importance analysis, model evaluation, and hyperparameter tuning.

XGBoost, Caret, ggplot2, pROC

Random Forest Model for App Download Prediction

Building a random forest model to predict app download probability, including feature engineering and model performance evaluation.

Random Forest, Caret, pROC, ggplot2

Logistic Regression for Classifying Genuine vs Forged Banknotes

Building and evaluating a logistic regression model to classify banknotes based on wavelet-transformed features and their entropy.

Caret, pROC, ggplot2, GLM

Genetic Algorithm for Solving the Traveling Salesman Problem (TSP)

Implementing a Genetic Algorithm with Selection, Crossover, Mutation, and Fitness Evaluation to solve the Traveling Salesman Problem.

Python, NumPy, Random, Euclidean Distance Formula
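
The four ingredients named above (selection, crossover, mutation, fitness evaluation) can be sketched in a few dozen lines. This is one possible arrangement under stated assumptions, not the project's implementation: tournament selection, ordered crossover, swap mutation, and elitism are choices made for this example, and all function names and default parameters are hypothetical.

```python
import math
import random

def tour_length(tour, cities):
    """Fitness: total Euclidean length of the closed tour (lower is better)."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def tournament_select(population, cities, rng, k=3):
    """Selection: pick k random tours and keep the shortest."""
    return min(rng.sample(population, k), key=lambda t: tour_length(t, cities))

def ordered_crossover(parent_a, parent_b, rng):
    """Crossover: copy a slice from parent_a, fill the rest in parent_b's order."""
    n = len(parent_a)
    lo, hi = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[lo:hi + 1] = parent_a[lo:hi + 1]
    kept = set(parent_a[lo:hi + 1])
    remaining = iter(city for city in parent_b if city not in kept)
    return [city if city is not None else next(remaining) for city in child]

def swap_mutation(tour, rng, rate=0.1):
    """Mutation: with probability `rate`, swap two cities in a copy of the tour."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def solve_tsp(cities, pop_size=60, generations=200, seed=0):
    """Run the GA and return (best_tour, best_length)."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = min(population, key=lambda t: tour_length(t, cities))
    for _ in range(generations):
        next_gen = [best]  # elitism: the best tour always survives
        while len(next_gen) < pop_size:
            child = ordered_crossover(tournament_select(population, cities, rng),
                                      tournament_select(population, cities, rng),
                                      rng)
            next_gen.append(swap_mutation(child, rng))
        population = next_gen
        best = min(population, key=lambda t: tour_length(t, cities))
    return best, tour_length(best, cities)
```

Because the best tour is carried over unchanged each generation, the best length never gets worse from one generation to the next. Ordered crossover is used so that every child remains a valid permutation of the cities.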

CSV Query Performance

Benchmarking SQL query performance on CSV datasets of varying sizes to assess how file size impacts query efficiency.

Python, Pandas, NumPy, SQLite, PyArrow
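
A benchmark of this kind can be sketched with the standard library alone. The example below is a simplified stand-in for the project's setup: the synthetic three-column schema, the in-memory SQLite database, and the helper names `make_csv`, `time_query`, and `benchmark` are all assumptions introduced for illustration.

```python
import csv
import io
import sqlite3
import time

def make_csv(n_rows):
    """Build a synthetic CSV (id, category, value) of the requested size."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "category", "value"])
    for i in range(n_rows):
        writer.writerow([i, f"cat{i % 5}", i * 0.5])
    return buffer.getvalue()

def time_query(csv_text, query):
    """Load the CSV into in-memory SQLite and time only the query itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER, category TEXT, value REAL)")
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", reader)
    start = time.perf_counter()
    rows = conn.execute(query).fetchall()
    elapsed = time.perf_counter() - start
    conn.close()
    return rows, elapsed

def benchmark(sizes, query):
    """Return {row_count: seconds} for the same query at each dataset size."""
    return {n: time_query(make_csv(n), query)[1] for n in sizes}
```

A call such as `benchmark([1_000, 10_000, 100_000], "SELECT category, AVG(value) FROM data GROUP BY category")` gives a rough picture of how query time scales with row count. Load time is deliberately excluded from the measurement, and a single run per size is noisy, so repeated runs would be needed for trustworthy numbers.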

05. Certifications

Supervised Machine Learning: Regression and Classification

DeepLearning.AI, Stanford University

June 2024

ID: MCQEAHC8LAGS

Advanced Learning Algorithms

DeepLearning.AI, Stanford University

Sept 2024

ID: WN99E17LTHMI

Unsupervised Learning, Recommenders, Reinforcement Learning

DeepLearning.AI, Stanford University

Sept 2024

ID: IGZ9WVSUZCYZ

ChatGPT Prompt Engineering

DeepLearning.AI, Stanford University

Jan 2025

ID: 822e8d23-6e6a-4cea-b49c-b195106074d5

Machine Learning

DeepLearning.AI, Stanford University

June 2024

ID: PCEFF94NBT9G

Deep Learning

NVIDIA, Deep Learning Institute

April 2025

ID: M0S7oiZMQcO9R966P9O6-Q

06. Tech Stack

Python, R, Java, C, C++, VB, Shell, Ant, Ruby, PyTorch, Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn, Flask, PowerBI, Data Models, Exploratory Data Analysis (EDA), Data Visualization, Data Cleaning and Preprocessing, Advanced Mathematical Statistics
Statistical Modeling, Hypothesis Testing, Reg Analysis, Linear Algebra, Hadoop, HDFS, MapReduce, Kafka, Hive, Avro, Oozie, Huechops, AWS (EMR, Step Functions, S3, Lambda, CloudWatch, IAM), Docker, Kubernetes, Jenkins, Spinnaker, CI/CD, SQL Server, Oracle, HBase, Firebase, MongoDB, JIRA
Git, IntelliJ, Postman, Maven, Gradle, Splunk, Deep learning, Data mining, Data analytics, ETL, Communication skills, Teamwork, Critical Thinking, Strategic planning, Problem-solving, Adaptability

07. Contact me

Slide Into My Inbox!

Got a question, a wild idea, or just want to send me a meme?

Hit me up—or even better, let's grab a coffee (virtual or real, your choice)!
