
On Thursday, I was lucky to present Fused Extended Two-Way Fixed Effects to the causal inference reading group at USC. I’m very grateful to Angela Zhou, Zijun Gao, and Dennis Shen for hosting me, Jacob Bien for putting me in…
I have a new paper on arXiv (link) that proposes a novel machine learning estimator for difference-in-differences with staggered adoptions, fused extended two-way fixed effects (FETWFE). Its main advantage over existing methods is that it is more efficient. Unlike existing…
On Saturday I gave the presentation “Predicting Purchases, Rare Diseases, and More: Using Ordinal Regression to Estimate Rare Event Probabilities” at Data Con LA 2023. I discussed using the proportional odds ordinal regression model to improve the estimation of probability…
Simulation studies (sometimes called synthetic data experiments or Monte Carlo simulations) are useful tools for generating evidence about whether a statistical claim is true. Recently I taught a tutorial on the basics of simulation studies…
I’m excited to announce that “Predicting Rare Events by Shrinking Towards Proportional Odds” has been accepted to the Fortieth International Conference on Machine Learning (ICML 2023)! In the paper, we propose PRESTO, a novel method for improving classification in the…
In a 2022 research paper that I wrote with my advisor Jacob Bien, we proposed a novel feature selection method called cluster stability selection. Cluster stability selection is a method for identifying features that are useful for predicting a response…
I recently took the first-year screening exam for Ph.D. students in the statistics group in the Department of Data Sciences and Operations at Marshall. Since I started applying to grad school, I’ve been writing up review notes to help me…
I was a part of Team Save the WoRld along with Faizan Haque, Javier Orraca, Sam Park, and Shruhi Desai in the OCRUG Hackathon 2019 held at UC Irvine on May 18th and 19th. (In fact, I am writing this…
Today I gave an in-class presentation at USC on two papers in multi-task learning (or multivariate regression, that is, linear regression where the response is a vector rather than a single number). You could simply train a separate model for each response, but when…
In another post, I described how I fit a model to predict how well Buffer, a digital subscription-based firm that publicly releases much of its financial data, retains its customers. I used a methodology developed by Daniel McCarthy, Peter Fader, and Bruce Hardie (paper…