Time: 3:00pm to 5:00pm
Venue: Room 7-208, 7/F, Lau Ming Wai Academic Building
Many contemporary large-scale applications involve building interpretable models linking a large set of potential covariates to a response in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively control the fraction of false discoveries even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address this practical problem, we propose a new framework of \textit{model-free} knockoffs, which reinterprets, from a different perspective, the knockoff procedure (Barber and Candès, 2015) originally designed for controlling the false discovery rate in linear models. The key innovation of our method is to construct knockoff variables probabilistically instead of geometrically. This enables model-free knockoffs to deal with arbitrary (and unknown) conditional models and any dimension, including when the dimensionality $p$ exceeds the sample size $n$, whereas the original knockoff procedure is constrained to homoscedastic linear models with $n \ge p$. Our approach requires that the design matrix be random (independent and identically distributed rows) with a known covariate distribution, although we show our procedure to be robust to unknown or estimated distributions. To our knowledge, no other procedure solves the \textit{controlled} variable selection problem in such generality, but in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case-control study of Crohn's disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data. This is joint work with Emmanuel Candès, Yingying Fan and Lucas Janson.
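The sketch below is only an illustration of the idea summarized in the abstract, for the special case of a Gaussian design with known mean and covariance: knockoff copies are sampled from the conditional distribution of the covariates (the probabilistic construction), feature statistics come from an $\ell_1$-penalized logistic fit (one possible choice for a binary response), and selection uses the knockoff+ threshold at a target level $q$. The function names, the equicorrelated choice of $s$, the lasso-coefficient-difference statistic, and the toy data are illustrative assumptions, not the authors' exact implementation.
\begin{verbatim}
# Minimal sketch of model-free (model-X) knockoffs for a Gaussian design.
# Assumptions: X rows are i.i.d. N(mu, Sigma) with (mu, Sigma) known;
# statistic and tuning choices below are illustrative, not the paper's code.
import numpy as np
from numpy.linalg import cholesky, eigvalsh, inv
from sklearn.linear_model import LogisticRegression

def gaussian_knockoffs(X, mu, Sigma, rng):
    """Sample knockoff copies for rows of X assumed i.i.d. N(mu, Sigma)."""
    p = Sigma.shape[0]
    # Equicorrelated s, shrunk slightly to keep the conditional covariance PD.
    s = np.full(p, 0.99 * min(2.0 * eigvalsh(Sigma).min(), 1.0))
    D, Sinv = np.diag(s), inv(Sigma)
    cond_mean = mu + (X - mu) @ Sinv @ (Sigma - D)   # E[X_knockoff | X]
    cond_cov = 2.0 * D - D @ Sinv @ D                # Cov[X_knockoff | X]
    Z = rng.standard_normal(X.shape)
    return cond_mean + Z @ cholesky(cond_cov).T

def knockoff_statistics(X, X_tilde, y):
    """W_j = |beta_j| - |beta_j_knockoff| from an L1-penalized logistic fit."""
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    fit.fit(np.hstack([X, X_tilde]), y)
    beta = np.abs(fit.coef_.ravel())
    p = X.shape[1]
    return beta[:p] - beta[p:]

def knockoff_plus_select(W, q=0.1):
    """Select features via the knockoff+ threshold at nominal FDR level q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.where(W >= t)[0]
    return np.array([], dtype=int)

# Toy usage: 20 of p = 200 covariates affect a binary response, n = 500.
rng = np.random.default_rng(0)
n, p = 500, 200
Sigma = 0.3 ** np.abs(np.subtract.outer(range(p), range(p)))  # AR(1) covariance
mu = np.zeros(p)
X = rng.multivariate_normal(mu, Sigma, size=n)
beta = np.zeros(p); beta[:20] = 1.5
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
X_tilde = gaussian_knockoffs(X, mu, Sigma, rng)
selected = knockoff_plus_select(knockoff_statistics(X, X_tilde, y), q=0.1)
print("selected:", selected)
\end{verbatim}
The key point the sketch is meant to convey is that nothing in the selection step depends on a linear model for the response: only the covariate distribution is used to build the knockoffs, and the same knockoff+ threshold applies to whatever feature statistic is plugged in.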
Jinchi Lv is McAlister Associate Professor in Business Administration in the Data Sciences and Operations Department of the Marshall School of Business at the University of Southern California, Associate Professor in the Department of Mathematics at USC, and an Associate Fellow of the USC Dornsife Institute for New Economic Thinking (INET). He received his Ph.D. in Mathematics from Princeton University in 2007 under the supervision of Professor Jianqing Fan. His research interests include deep learning, causal inference, personalized medicine and choices, scalable Bayesian inference, large-scale inference and false discovery rate control, networks, high-dimensional statistics, big data problems, statistical machine learning, neuroscience and business applications, and financial econometrics. His papers have been published in journals in statistics, economics, information theory, biology, and computer science, and one of them was published as a Discussion Paper in the Journal of the Royal Statistical Society Series B (2008). He has served as an associate editor of the Annals of Statistics (2013-present) and Statistica Sinica (2008-2016). He is the recipient of the Royal Statistical Society Guy Medal in Bronze (2015), the NSF Faculty Early Career Development (CAREER) Award (2010), the USC Marshall Dean's Award for Research Excellence (2009), and the Zumberge Individual Award from USC's James H. Zumberge Faculty Research and Innovation Fund (2008).