Research Data Scientist · Michigan Medicine, Ann Arbor, MI

Turning complex data into scientific discovery

Research data scientist at the intersection of data science and bioinformatics, helping research teams develop computational infrastructure and analytics solutions for complex biological systems. Specializing in neural activity analysis, multi-omics integration, and scalable HPC pipelines.

40% · Reduction in Processing Time · HPC infrastructure optimization
25% · Accuracy Improvement · Neural decoding models
25M+ · Records Analyzed · Large-scale data processing
9 · Publications · First-author & collaborative
5+ · Team Members Led · Cross-functional collaboration
Featured Research Contribution · Princeton University Press
Large-Scale Social Network Research
Contributed analytical research to Dr. Elizabeth Bruch's landmark study of large-scale social network behavior — analyzing interaction patterns across 25 million+ users to uncover the geographic and cultural dynamics of modern human partnering. One of the largest social network analyses of its kind ever conducted.
Princeton University Press · Forthcoming
Research focus & domain expertise

I'm a Research Data Scientist at the Michigan Neuroscience Institute, Michigan Medicine, where I develop AI-driven analytics pipelines that transform large-scale multimodal biomedical data into actionable scientific insights.

My work spans the full data science stack — from designing HIPAA-compliant clinical databases and HPC-optimized pipelines, to building generative AI knowledge bases and interactive research dashboards. I've led cross-functional teams of 5 scientists while securing over $500K in competitive research grants.

I thrive at the intersection of machine learning, biology, and clinical research — translating complex analytical findings into strategies that drive real scientific progress and policy.

M.S. Computer Science (Data Science) · DePaul University, 2017  ·  B.E. Computer Science & Engineering · Gujarat Technological University, 2014

Neural
Neuroscience & Neural Data
Real-time neural signal analysis, behavioral decoding, high-dimensional time-series processing from electrophysiology recordings.
Neural Decoding · Electrophysiology · Behavioral Analysis · TIFF Imaging
Omics
Bioinformatics & Genomics
Multi-omics integration, single-cell & spatial transcriptomics, reproducible bioinformatics pipelines for genomic data.
Single-cell RNA-seq · Spatial Transcriptomics · Multi-omics · Bioconductor
AI/ML
Machine Learning & AI
Supervised/unsupervised learning, deep learning, generative AI, LLM fine-tuning, causal inference, and NLP pipelines.
PyTorch · TensorFlow · Generative AI · LLM Fine-tuning · NLP
EHR
Clinical & Health Data
HIPAA/IRB-compliant research, EHR analytics, longitudinal clinical study design, REDCap database management.
EHR Analytics · HIPAA Compliance · REDCap · IRB Protocols
NetSci
Social Network & Survey Data
Large-scale social network analysis, multilingual NLP, sentiment analysis, and survey data modeling across millions of users.
Network Analysis · Sentiment Analysis · Multilingual NLP · Survey Modeling
HPC
Data Engineering & Infrastructure
HPC cluster optimization, ETL pipeline design, relational databases, Docker, and scalable cloud-ready data workflows.
HPC Clusters · SQL / MySQL · Docker · ETL Pipelines · Git
Stats
Statistical Modeling
Multilevel modeling, causal inference, longitudinal analysis, reproducible statistical workflows in R and Python.
Multilevel Models · Causal Inference · R / lmer · SciPy
Viz
Visualization & Dashboards
Interactive research dashboards, publication-quality figures, and real-time data visualization for scientific audiences.
Plotly · Tableau · R Shiny · ggplot2
Where I've made impact
Research Data Scientist
Michigan Neuroscience Institute, Michigan Medicine · Ann Arbor, MI
Aug 2020 – Present
  • Built full-stack analytical dashboard processing 30–50GB TIFF datasets with real-time video synchronization and interactive cell selection, cutting neural activity research workflow time by 70%
  • Designed scalable ML pipelines for neural decoding using 50K+ time frames & 600+ features, automating behavioral prediction workflows across high-dimensional datasets
  • Developed a GenAI literature retrieval knowledge base using Llama 3.2 fine-tuned on large-scale medical research texts for semantic insight generation
  • Led HIPAA/IRB-compliant longitudinal clinical study — designed database to manage patient enrollment, interventions, and outcomes
  • Scaled HPC clusters to reduce data processing time by 40% and lower computing costs by $10K
  • Secured over $500K in competitive research funding through grant writing and technical analysis planning
  • Led and mentored a team of 5 research scientists — training junior staff, conducting code reviews, and ensuring timely delivery of all project milestones
Research Data Analyst
Complex Systems, University of Michigan · Ann Arbor, MI
Mar 2018 – Aug 2020
  • Analyzed social network data from 25M+ users to uncover large-scale interaction patterns — research contributed to a forthcoming Princeton University Press publication by Dr. Elizabeth Bruch
  • Built NLP-based de-identification pipeline ensuring regulatory compliance and accelerating human-subjects data processing
  • Designed relational databases and automated ETL workflows, reducing data preparation time by 50%
  • Developed reproducible statistical analyses and visualizations that improved cross-disciplinary collaboration and decision-making
Research Data Analyst
DePaul University · Chicago, IL
Sep 2016 – Mar 2018
  • Led NLP-driven multilingual sentiment analysis (English & Spanish) from employee survey data, producing actionable organizational insights
  • Visualized emotional patterns and attitudes using R and Tableau, driving measurable improvements in employee satisfaction strategy
Data Analytics & ML Intern
Presence Health · Chicago, IL
Jun – Aug 2016
  • Analyzed 500K+ EHRs under HIPAA using time series, logistic regression & clustering — contributed to a 15% improvement in targeted healthcare outcomes across Illinois
  • Built predictive time-series models and interactive dashboards for regional health metric forecasting
  • Implemented cross-hospital patient matching algorithm improving research data integration efficiency
Projects & Research

Neuroscience · ML
Neural Decoding & Behavioral Prediction
Scalable ML pipeline processing 50K+ time frames and 600+ features from hippocampal CA1 calcium imaging to decode behavioral states and automate prediction workflows.
600+ features · 50K+ time frames · PyTorch · Sklearn
Social Network · NLP · Princeton Press
Large-Scale Social Network Analysis
Analyzed interaction patterns across 25M+ users to uncover geographic and cultural dynamics of human partnering behavior — contributed to a forthcoming Princeton University Press publication.
25M+ users · Princeton Press · R · Python
Generative AI · Research Tools
GenAI Research Knowledge Base
Fine-tuned Llama 3.2 on large-scale medical research texts to build a semantic literature retrieval and notes generation system for biomedical research.
Llama 3.2 · Fine-tuned LLM · RAG pipeline
Healthcare Analytics · EHR · HIPAA
Healthcare Outcomes Prediction
Analyzed 500K+ EHRs under HIPAA using time series analysis, logistic regression and clustering to identify community health barriers across Illinois.
15% outcome uplift · 500K+ EHRs · HIPAA compliant
Bioinformatics · Genomics
Multi-omics & Spatial Transcriptomics
Integrated single-cell and spatial transcriptomics datasets to map gene expression in neurological tissue with reproducible Bioconductor pipelines.
Single-cell RNA-seq · Spatial omics · R · Bioconductor
NLP · Sentiment Analysis · Published
Multilingual Sentiment Analysis
NLP pipeline for English and Spanish employee survey data. Published in International Hospitality Review. Drove measurable improvements in organizational strategy.
Published · EN + ES · R · Tableau
Key publications
eNeuro · 2022
First Author
Forebrain Glucocorticoid Receptor Overexpression Alters Behavioral Encoding of Hippocampal CA1 Pyramidal Cells in Mice
Gavade S, Wei Q, Johnston C, Kounelis-Wuillaume S, et al.  ·  doi: 10.1523/ENEURO.0126-22.2022
Psychoneuroendocrinology · 2024
Ventral Subiculum Control of Avoidance Behavior and Hypothalamic-Pituitary-Adrenal Axis Reactivity via the Bed Nucleus of the Stria Terminalis in Male and Female Mice
Marsh JS, Teixeira C, Gavade S, Spencer-Segal JL  ·  doi: 10.1016/j.psyneuen.2024.107229
Frontiers in Behavioral Neuroscience · 2023
Corticosterone Enhances Formation of Non-Fear but Not Fear Memory During Infectious Illness
Hill A, Johnston C, Agranoff I, Gavade S, Spencer-Segal J  ·  doi: 10.3389/fnbeh.2023.1144173
International Hospitality Review · 2018
Co-Author
Translating Emotional Insights from Hospitality Employees' Comments: Using Sentiment Analysis to Understand Job Satisfaction
Young LM, Gavade SR  ·  doi: 10.1108/IHR-08-2018-0007
Google Scholar Profile · Presented at 5+ national & international conferences
REDCap Certified · PEERRS: Human Subjects Protections · CITI Core IRB Training
Let's collaborate

Open to research collaborations, consulting on data science for biomedical and social research, and senior research data scientist roles. Let's build something that matters.

Based in Ann Arbor, MI
Open to hybrid & remote opportunities

Verified. Reproducible. p < 0.05.