Author: gfaletto

I have a new paper on arXiv (link) that proposes a novel machine learning estimator for difference-in-differences with staggered adoptions, fused extended two-way fixed effects (FETWFE). Its main advantage over existing methods is that it is more efficient. Unlike existing methods, it leverages our knowledge that treatment effects for nearby times are likely to be…

Presentation at Data Con LA 2023 on PRESTO

Aug 14, 2023

On Saturday I gave the presentation “Predicting Purchases, Rare Diseases, and More: Using Ordinal Regression to Estimate Rare Event Probabilities” at Data Con LA 2023. I discussed using the proportional odds ordinal regression model to improve probability estimates in classification with class imbalance. I built up to discussing PRESTO, the method developed…

How to conduct a synthetic data experiment

Jul 8, 2023

Simulation studies (sometimes called synthetic data experiments or Monte Carlo simulations) are useful tools for generating evidence about whether a statistical claim is true. Recently I taught a tutorial on the basics of simulation studies for undergraduate students as part of the USC JumpStart program. I taught the basics…

PRESTO accepted to ICML 2023

Apr 27, 2023

I’m excited to announce that “Predicting Rare Events by Shrinking Towards Proportional Odds” has been accepted to the Fortieth International Conference on Machine Learning (ICML 2023)! In the paper, we propose PRESTO, a novel method for improving classification in the class imbalance setting. You can read my brief summary of the paper on Twitter.

cssr R Package

Mar 13, 2023

In a 2022 research paper that I wrote with my advisor Jacob Bien, we proposed a novel feature selection method called cluster stability selection. Cluster stability selection is a method for identifying features that are useful for predicting a response variable. It has applications in medical research (including genomics and genetics), economics, analyzing survey data,…

My Math Review Notes

Jul 3, 2019

I recently took the first-year screening exam for Ph.D. students in the statistics group in the Department of Data Sciences and Operations at Marshall. Since I started applying to grad school, I’ve been writing up review notes to help me with math. Originally the purpose of the notes was to help me study for the…

Our Entry in the OCRUG Hackathon 2019

May 19, 2019

I was a part of Team Save the WoRld along with Faizan Haque, Javier Orraca, Sam Park, and Shruhi Desai in the OCRUG Hackathon 2019 held at UC Irvine on May 18th and 19th. (In fact, I am writing this blog post at the tail end of our time before we present our results!) The…

Presentation on Multi-Task Learning

Apr 1, 2019

Today I gave an in-class presentation at USC on two papers in multi-task learning (or multivariate regression: linear regression when the response is a vector rather than one number). You could simply train a separate model for each response, but when the responses are related, there are advantages to considering them all at the same time…

The McCarthy/Fader/Hardie Model for Customer Retention

Nov 20, 2018
