The projects for the 2019-2020 academic year are posted below. Information about how to apply for this program is posted here. Projects from previous years can be viewed on the project archive page.

**Project: Inference for stochastic Frank-Wolfe algorithms** (Faculty Mentor: Krishna Balasubramanian, Statistics)

The Frank-Wolfe algorithm, also known as the conditional gradient method, is a popular algorithm for constrained optimization. However, when it is used in a statistical context, little is known about its inferential aspects. The student will be expected to perform numerical simulations, running the Frank-Wolfe algorithm on some test cases to explore the distributional behavior of its output. Depending on interest, some theoretical studies might also be considered. Anticipated number of students: 1.
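For readers new to the method, the following is a minimal sketch of a deterministic Frank-Wolfe iteration, not the project's own code. It assumes the constraint set is the probability simplex, uses the standard step size 2/(t+2), and runs on a hypothetical test problem (minimizing the squared distance to a point `b`); the function name `frank_wolfe_simplex` is illustrative. A stochastic variant would replace `grad` with a noisy gradient estimate.

```python
def frank_wolfe_simplex(grad, dim, n_iters=200):
    """Minimize a smooth convex function over the probability simplex."""
    x = [1.0 / dim] * dim  # start at the centre of the simplex
    for t in range(n_iters):
        g = grad(x)
        # Linear minimization step: over the simplex, <g, s> is minimized
        # at the vertex indexed by the smallest gradient coordinate.
        i_min = min(range(dim), key=lambda i: g[i])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        # Convex combination of the current iterate and the chosen vertex,
        # so the iterate never leaves the simplex.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_min] += gamma
    return x


# Hypothetical test problem: minimize 0.5 * ||x - b||^2 over the simplex.
# Because b already lies in the simplex, the minimizer is b itself.
b = [0.1, 0.6, 0.3]


def grad(x):
    return [xi - bi for xi, bi in zip(x, b)]


x_hat = frank_wolfe_simplex(grad, dim=3, n_iters=2000)
print(x_hat)
```

Repeating such a run many times with noisy gradients, and recording the final iterate each time, is one simple way to look at the distributional behavior of the output.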

**Project: Network outlier detection** (Faculty Mentor: Can Le, Statistics)

Network analysis has become an important topic of research in many areas, including social science, computer science and statistics. A network dataset typically consists of a set of nodes and a set of links between the nodes. An outlier can be defined as a node that differs significantly from other nodes and can cause serious problems in statistical analysis. In this project we consider the problem of identifying and removing such outliers from network data. Specifically, we propose to rank nodes by using the weights in the Pietsch factorization theorem, which can be computed by solving an optimization problem. The Pietsch factorization theorem is an important result in functional analysis and has been successfully applied to the network community detection problem. This project aims to extend the applicability of this result to the network outlier detection problem for more general network settings. Anticipated number of students: up to 4.

**Project: Exploring the microbiome and compositional data** (Faculty Mentor: Hans Mueller, Statistics)

Participating students will study and implement methods for the analysis of compositional data, with emphasis on (a) longitudinal versions and (b) microbiome data. *Prerequisites*: Very good programming skills with R, strong mathematical background. Anticipated number of students: up to 3.

**Project: Exploring principal components analysis via bootstrap methods** (Faculty Mentor: Miles Lopes, Statistics)

Principal components analysis (PCA) is one of the most widely used tools for data analysis. In essence, it is concerned with extracting information from the eigenvalues/vectors of covariance matrices, and using this information to represent data in a way that is easier to understand. In particular, it is important to construct confidence intervals for the unknown eigenvalues of the true covariance matrix associated with the data. To construct these intervals in practice, it is common to use "bootstrap methods", but it is sometimes hard to know when these methods actually work. In this project, the student will carry out numerical experiments to investigate situations where bootstrap methods are not well understood. *Prerequisites:* The student needs to be comfortable with a programming language such as R, Python, or Matlab. Also, it would be very helpful for the student to have taken STA 135 (Multivariate Data Analysis). Anticipated number of students: 1.
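As an illustration of the kind of numerical experiment described above, here is a minimal sketch of a percentile bootstrap interval for the largest covariance eigenvalue. It is not the project's method: it assumes two-dimensional data so that the eigenvalue has a closed form, and the function names, the simulated Gaussian sample, and the choice of 1000 resamples are all hypothetical.

```python
import math
import random


def top_eigenvalue_2d(data):
    """Largest eigenvalue of the 2x2 sample covariance of (x, y) pairs."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    # Closed form for the larger eigenvalue of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2.0
    return half_trace + math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)


def bootstrap_percentile_ci(data, stat, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the observations with replacement and recompute the statistic.
    reps = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    alpha = 1.0 - level
    low = reps[math.floor(alpha / 2 * (n_boot - 1))]
    high = reps[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return low, high


# Simulated sample (an assumption for illustration): independent Gaussian
# coordinates with standard deviations 2 and 1, so the true top eigenvalue is 4.
rng = random.Random(1)
sample = [(rng.gauss(0, 2.0), rng.gauss(0, 1.0)) for _ in range(200)]

lam_hat = top_eigenvalue_2d(sample)
ci_low, ci_high = bootstrap_percentile_ci(sample, top_eigenvalue_2d)
print(lam_hat, (ci_low, ci_high))
```

Repeating this over many simulated samples and counting how often the interval covers the true eigenvalue gives an empirical check of whether the bootstrap works in a given setting.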

**Project: Higher-dimensional data analysis using autocorrelation wavelets via Julia** (Faculty Mentor: Naoki Saito, Mathematics and Statistics)

We will extend autocorrelation wavelets to higher-dimensional data analysis, e.g., image analysis and multiple time series. Signal representations using autocorrelation wavelets are redundant and non-orthogonal, yet they have certain desirable properties compared to conventional wavelet transforms, e.g., complete symmetry without losing vanishing moments; edge detection and characterization capabilities; and shift invariance. The 1D version of the autocorrelation wavelets has been successfully implemented and tested in the Julia programming language by the previous RTG project. However, extending the 1D version to higher-dimensional data is nontrivial. For example, the most interesting aspect of this project is how to develop efficient detectors for oriented edges in images. This project also has a link with visual neuroscience, and we plan to have group reading on such material. Participants will implement the code in the Julia programming language. *Prerequisites:* Participants are expected to have some experience with and knowledge of the Julia programming language. Experience with the version control system "git" is preferable but not mandatory. Anticipated number of students: 2.

**Project: Topological analysis of cancer genomic data** (Faculty Mentor: Javier Arsuaga, Molecular and Cellular Biology & Mathematics)

Cancer is driven by genes that either promote cell division or fail to stop it. How these genes are regulated and coordinated to eventually trigger cancer progression remains to be determined. In this project, students will develop methods in Topological Data Analysis to identify regulation mechanisms of genes in breast cancer. Anticipated number of students: 1.

**Project: Analysis of high-dimensional proteomics data using networks and topological data analysis** (Faculty Mentors: Dietmar Kuelz, Animal Science; Javier Arsuaga, Mathematics and Molecular and Cellular Biology; Wolfgang Polonik, Statistics)

The goal of this project is to analyze differential protein expression in response to environmental conditions in fish, with the aim of addressing real scientific questions. The high-dimensional data will be analyzed using (i) a certain type of network analysis (the k-core method and refinements thereof) and (ii) methods from topological data analysis (TDA). *Prerequisites:* Basic knowledge of statistics and linear algebra and good computing skills are expected. Some background in topology would be helpful but is not necessary. Anticipated number of students: 1 or 2.
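For orientation, the k-core of a network is the largest subgraph in which every node has at least k neighbors. The following is a minimal sketch of the standard peeling procedure, assuming an undirected graph stored as a dictionary of neighbor sets. The function name `k_core` and the toy graph are hypothetical, and the refinements mentioned in the project description are not covered.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbors.
    """
    # Work on a copy so the caller's graph is left unchanged.
    alive = {node: set(nbrs) for node, nbrs in adjacency.items()}
    # Start with every node whose degree is already below k.
    queue = [node for node, nbrs in alive.items() if len(nbrs) < k]
    while queue:
        node = queue.pop()
        if node not in alive:
            continue  # already removed via an earlier queue entry
        for nbr in alive.pop(node):
            if nbr in alive:
                alive[nbr].discard(node)
                # Removing a node can push its neighbors below the threshold.
                if len(alive[nbr]) < k:
                    queue.append(nbr)
    return set(alive)


# Toy graph (illustrative only): a triangle A-B-C with a pendant node D on C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(k_core(graph, 2))
```

In this toy graph the pendant node D is peeled away for k = 2, leaving the triangle, and no node survives for k = 3.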