2021-2022 Invited Speaker Series

Spring 2022

Stories about Statistics

Dr. John Bailer | Department of Statistics, Miami University

Date: April 22, 2022, virtual via Zoom

Abstract: Statistics as a discipline has changed dramatically over the last ⅓ of a century. Reflections on changes in research, teaching, curriculum, university life and more are presented from a personal, idiosyncratic and occasionally biographical perspective. There may even be some talk about statistics behind the stories and the stories behind the statistics.

Creating Three Types of Websites with Blogdown

Dr. Yihui Xie | RStudio

Date: April 7, 2022, virtual via Zoom

Abstract: This talk will be a tutorial that teaches you how to get started with blogdown to create and maintain a personal website. I will show three choices of website themes (minimal, intermediate, and sophisticated, respectively). To gain the most from this talk, the audience should have basic knowledge of Git and GitHub (make sure you know how to push to GitHub). If time permits and anyone is interested, I can also talk briefly about the pagedown package.

Clustering Higher-Order Data

Dr. Paul McNicholas | McMaster University, Ontario, Canada

Date: March 29, 2022, virtual via Zoom

Abstract: There is an extensive body of literature on clustering univariate and multivariate data. However, attention to the use of multidimensional arrays for clustering has thus far been limited to two-dimensional arrays, i.e., matrices or order-two tensors. Work on clustering data matrices, or three-way data, is presented before an approach for clustering multi-way data is introduced. The latter is based on a finite mixture of multidimensional arrays, i.e., a finite mixture of d-dimensional arrays, for d>2. For both matrix- and tensor-variate approaches, the Gaussian component approach is introduced first, but approaches that use non-Gaussian components are also discussed. Simulated and real data are used for illustration.

A Peek into Statistics in Precision Medicine via Interaction Trees and Forest

Dr. Xiaogang Su | Department of Mathematical Sciences, University of Texas at El Paso

Date: March 8, 2022, virtual via Zoom

Abstract: Precision medicine aims to integrate comprehensive patient data (including medical/health records, demographics, genetic and environmental information, and lifestyles) to deepen disease understanding, aid drug discovery, and optimize delivery of personalized therapies. To advance precision medicine, a thorough assessment of heterogeneous treatment effects is essential. Concerning data collected from randomized trials, we explore stratified and individualized treatment effects with a machine learning approach: interaction trees (IT) and random forest of interaction trees (RFIT). While IT and RFIT inherit many useful features of CART and random forests (RF), substantial modifications and expansions are made for enhancement. These include a smooth sigmoid surrogate (SSS) splitting method, leaf fusion for tree model determination, valid statistical inference, aggregated grouping for refined stratification, and an infinitesimal jackknife (IJ) method to compute the standard error of each estimated individualized treatment effect. An empirical illustration of the proposed techniques is made via analysis of quality of life (QoL) data collected from a randomized intervention trial with breast cancer survivors.

Central Quantile Subspace and Its Applications

Dr. Eliana Christou | Department of Mathematics and Statistics, the University of North Carolina at Charlotte

Date: March 3, 2022, virtual via Zoom

Abstract: Quantile regression (QR) is becoming increasingly popular due to its relevance in many scientific investigations. There is a great amount of work on linear and nonlinear QR models. Specifically, nonparametric estimation of the conditional quantiles has received particular attention due to its model flexibility. However, nonparametric QR techniques are limited in the number of covariates. Dimension reduction offers a solution to this problem by considering low-dimensional smoothing without specifying any parametric or nonparametric regression relation. The existing dimension reduction techniques focus on the entire conditional distribution. We, on the other hand, turn our attention to dimension reduction techniques for conditional quantiles and introduce a new method for reducing the dimension of the predictor X. The performance of the methodology is demonstrated through simulation examples and data applications, especially to financial data. Finally, various extensions of the method are presented, such as nonlinear dimension reduction and the use of categorical predictors.

Fall 2021

Mapping The Fed’s Reaction Function with Directed Acyclic Graphs

Dr. James Caton | North Dakota State University

Date: October 15, 2021 from 11:45am-12:45pm, virtual via Zoom

Abstract: Under the monetary policy framework introduced by Benjamin Bernanke during the 2008 financial crisis, the value of currency in circulation as a proportion of the value of the assets side of the balance sheet has become a choice variable for implementing policy. Before 2008, changes in this variable appear to be incidental, thus providing a natural experiment to evaluate the effects of changes before and after implementation of the new policy framework. An inflation rate that is less than the target inflation rate and an unemployment rate in excess of the target unemployment rate lead policymakers to lower the federal funds rate target and increase the size of the balance sheet in excess of the value of circulating currency. The intention of such expansion is to reduce the interest rate without also lifting short-run inflation. Likewise, changes in the reverse directions lead to a lifting of the federal funds rate target and a shrinking of the balance sheet. While the response of policy to changing economic conditions is straightforward, the response of the macroeconomy to these policy changes needs clear elaboration. We present directed acyclic graphs (DAGs) to evaluate the response of monetary policy to changing economic indicators. We are interested in mapping the effects of changes in the federal funds rate as well as the effects of changes in the value of assets held by the Federal Reserve. We evaluate the causal chains presented in each DAG by considering the relevant partial correlations for structurally connected variables as well as causal force across time as indicated by corresponding vector autoregressions.

Robust Estimates of Insurance Misrepresentation through Kernel Quantile Regression Mixtures

Dr. Jianxi Su | Purdue University

Date: Friday, November 19, 2021 from 11:45am-12:45pm, virtual via Zoom

Abstract: Identifying fraud in insurance claims has been a very active research area in actuarial and data sciences in recent years. Another equally important yet much less studied topic is misrepresentation identification. Misrepresentation occurs when a policy applicant makes untrue statements on certain rating factors so as to alter the insurance eligibility and/or premium.

This talk pertains to a class of mixture models based on quantile regression in reproducing kernel Hilbert spaces for studying insurance misrepresentation. Compared to the existing parametric approaches, the proposed framework features a more flexible statistical structure, which could alleviate the risk of model misspecification, and is at the same time more robust to outliers in the data. The proposed framework can not only estimate the prevalence of misrepresentation in the data, but also help identify the most suspicious individuals for validation purposes.

Growth Dynamics for Plant High-Throughput Phenotyping Studies Using Hierarchical Functional Data Analysis

Dr. Yuhang Xu | Bowling Green State University

Date: December 3, 2021 from 11:45am-12:45pm, virtual via Zoom

Abstract: In modern high-throughput plant phenotyping, images of plants of different genotypes are repeatedly taken throughout the growing season, and phenotypic traits of plants (e.g., plant height) are extracted through image processing. It is of interest to recover whole trait trajectories and their derivatives at both genotype and plant levels based on observations made at irregular discrete time points. We propose to model trait trajectories using hierarchical functional principal component analysis (HFPCA) and show that the problem of recovering derivatives of the trajectories is reduced to estimating derivatives of eigenfunctions, which is solved by differentiating eigenequations. Simulation studies show that the proposed procedure performs better than its competitors in terms of recovering both trait trajectories and their derivatives. Interesting characteristics of plant growth dynamics are revealed in the application to a modern plant phenotyping study.