About me

Hello and welcome!

I’m a passionate researcher with a Ph.D. in Measurement & Statistics from the University of Washington. I’m currently a Machine Learning (ML) postdoctoral researcher at Truveta, where I focus on predicting cancer risk by applying ML and large language models (LLMs) to patients’ timestamped medical histories.

During my doctoral studies, I developed deep expertise in quantitative methods, with an emphasis on predictive models, multilevel analysis, latent variables, and social network analysis. My dissertation explored the intersection of factor analysis and machine learning, assessing the performance of novel exploratory factor modeling techniques through Monte Carlo simulations.

As a postdoctoral fellow at the Harvard Graduate School of Education, I advanced statistical methods for text analysis in randomized controlled trials (RCTs). I developed the R package ‘rcttext’ to automate human coding tasks and analyze large-scale text data from RCTs, applying advanced statistical techniques for more robust insights. I also led research comparing the performance of LLMs with traditional machine learning algorithms for writing quality estimation and stance classification.

Beyond my methodological work, I am dedicated to educational, behavioral, and psychological science research that promotes equity and drives social change. I firmly believe in the transformative power of research to make a positive impact on society. In my free time, I enjoy spending time with family and friends, playing soccer, and continuously learning new things.

Research Projects

Predicting Cancer Risk Using Medical Histories
Developing Statistical Methods for Text Analysis in RCTs (see the related projects)
Investigating Large Language Models vs. Machine Learning for Automated Essay Scoring (see the project slides)
Extending Factor Analysis with Machine/Deep Learning Insights (see the project slides)
Examining International Students’ Professional Development, Well-being, and Experiences with Racism & Discrimination
Comparing Exponential Random Graph Models and Latent Space Models, Two Approaches to Predicting Tie Formation in a Network, Including Missing Data Handling (see the project slides)

Youngwon Kim
