About me

I am a first-year master's student in Data Science at Harvard. I recently graduated from UCLA with a double major in Statistics and Math/Econ. My research interests lie in statistical machine learning, especially its applications in genomics and bioinformatics. During my undergraduate research, I was fortunate to be advised by Prof. Jessica Jingyi Li and Prof. Mathieu Bauchy. I have also interned at Adobe, Thumbtack, TAL Education Group, and onomy.

My research focuses on developing statistical tools for single-cell RNA sequencing (scRNA-seq) data. I built machine-learning-based cell type similarity trees to reduce ambiguity and subjectivity in cell type annotation, and I designed an R package for scGTM (Cui et al., 2022), which uses a model motivated by generalized additive models (GAMs) to fit gene expression trends along cell pseudotime.

Beyond applying statistics to biology, I am also interested in model interpretability and robustness. I used symbolic regression to predict materials' fracture energy from geometric features, addressing the interpretability limitations of convolutional neural networks (CNNs).

Here are the machine learning projects I worked on as an intern:

  1. Sales Lead Score Model
  2. Click Sequence Clustering
  3. Reddit Sentiment Analysis and Topic Modeling
  4. Markov Chain x Customer Adoption Journey