Current Projects

The projects for the 2018-2019 academic year are listed below. Applications to participate in these projects will open soon. Projects from previous years can be viewed on the project archive page.

Project 1: Applied Functional Data Analysis

Faculty Mentor: Hans-Georg Müller (Statistics)

Participating students will analyze data that consist of samples of trajectories from various areas, such as growth curves, house price trajectories, life tables, or neurocognitive scores and fMRI as a function of age from the Human Connectome Project. Students will download and pre-process data, apply methodology using existing R packages, develop visualizations of data and their analysis, identify problems where new technology is required, and participate in the development of solutions for these problems. Students are expected to participate in the project for 3+ quarters, will enroll in STA 199 for each quarter during the project period, and will aim to write a scientific report and prepare a presentation upon completing the project.

Prerequisites: Strong programming skills, especially in R (STA 141 a plus), and excellent knowledge of linear models, multivariate methods, and probability (STA 106, 108, 135, 130A/131A).

Project 2: Data analysis using autocorrelation wavelets via Julia

Faculty Mentor: Naoki Saito (Mathematics)

We will explore the so-called autocorrelation wavelets as a tool for data analysis, including time series analysis. Signal representations using autocorrelation wavelets are redundant and non-orthogonal, yet they have certain desirable properties compared to conventional wavelet transforms, e.g., complete symmetry without losing vanishing moments; edge detection and characterization capabilities; and shift invariance. Participants will implement the code in the Julia programming language, migrating from our preliminary MATLAB code, and examine various nonstationary time series datasets.

Prerequisites: Participants are expected to have some programming experience in and knowledge of at least one of MATLAB, R, Python, or Julia. (Note that experience with Julia is not required to undertake this project.)

Number of students: 2

Project 3: On the interplay between sampling and optimization

Faculty Mentor: Krishna Balasubramanian (Statistics)

Optimization and sampling are arguably the computational backbones of frequentist and Bayesian statistics, respectively. The goal of this project is to explore the interplay between optimization and sampling techniques for large-scale data analysis. In particular, the use of higher-order geometric information for optimization and sampling will be examined. Computational connections between Bayesian and frequentist statistics will also be examined. The RTG students will participate, depending on their interests and expertise, in at least one of the following: software development, real-world data analysis, algorithm development, and theoretical analysis.
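One well-known way to see the link between optimization and sampling is to compare gradient descent with the unadjusted Langevin algorithm: the two updates differ only by an injected Gaussian noise term. The sketch below is an illustration of that general idea, not the specific methods the project will study. The quadratic potential, step size, and function names are assumptions chosen for this example.

```python
import math
import random
import statistics


def grad_potential(x, mean=3.0, scale=1.0):
    """Gradient of U(x) = (x - mean)^2 / (2 * scale^2).

    U is the negative log-density of a Gaussian (up to a constant);
    the Gaussian target is an assumption made for this illustration.
    """
    return (x - mean) / scale**2


def gradient_descent(x0, step, n_steps):
    """Optimization: follow the negative gradient to the minimizer of U."""
    x = x0
    for _ in range(n_steps):
        x -= step * grad_potential(x)
    return x


def langevin_samples(x0, step, n_steps, burn_in=1000, seed=0):
    """Sampling: the same gradient step plus Gaussian noise of variance 2 * step.

    This is the unadjusted Langevin algorithm; for a finite step size its
    stationary distribution only approximates the target exp(-U).
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for i in range(n_steps + burn_in):
        noise = math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        x = x - step * grad_potential(x) + noise
        if i >= burn_in:
            samples.append(x)
    return samples


if __name__ == "__main__":
    mode = gradient_descent(x0=0.0, step=0.1, n_steps=200)
    draws = langevin_samples(x0=0.0, step=0.1, n_steps=20000)
    print("gradient descent estimate of the mode:", mode)
    print("Langevin sample mean:", statistics.fmean(draws))
    print("Langevin sample standard deviation:", statistics.stdev(draws))
```

Gradient descent returns a single point (the mode), whereas the Langevin iterates spread out around it and approximate the whole distribution.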

Number of students: up to 2

Project 4: Analysis of high-dimensional proteomics data using networks and topological data analysis

Faculty Mentors: Dietmar Kueltz (Animal Science), Javier Arsuaga (Mathematics and Molecular and Cellular Biology), Wolfgang Polonik (Statistics)

The goal of this project is to analyze differential protein expression in fish in response to environmental conditions, in order to address real scientific questions. The high-dimensional data will be analyzed using (i) a certain type of network analysis (the k-core method and refinements thereof) and (ii) methods from topological data analysis (TDA).
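For readers unfamiliar with the k-core idea mentioned above, the following minimal sketch computes core numbers on a toy graph using the standard peeling procedure (repeatedly removing a node of minimum remaining degree). The node names and edges are hypothetical and do not come from the project's data, and the refinements used in the project are not shown.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours} from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def core_numbers(adjacency):
    """Return each node's core number by peeling minimum-degree nodes.

    A node has core number k if it belongs to the k-core (the largest
    subgraph in which every node has degree at least k) but not to the
    (k+1)-core.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # core numbers never decrease during peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def k_core(adjacency, k):
    """Return the set of nodes whose core number is at least k."""
    return {node for node, c in core_numbers(adjacency).items() if c >= k}


if __name__ == "__main__":
    # Hypothetical toy network: a triangle P1-P2-P3 with a tail P3-P4-P5.
    toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
    graph = build_adjacency(toy_edges)
    print(core_numbers(graph))
    print(sorted(k_core(graph, 2)))
```

In this toy graph the triangle forms the 2-core, while the tail nodes are peeled away first and receive core number 1.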

Prerequisites: Basic knowledge of statistics and linear algebra, as well as good computing skills, are expected. Some background in topology would be helpful but is not necessary.

Number of students: up to 5

Project 5: Checking the Quality of Solutions to Optimization Problems

Faculty Mentor: Miles Lopes (Statistics)

In many applications, it is necessary to optimize a function that depends on random variables. In this type of situation, the solution of the optimization problem is itself random, and it will vary over repeated experiments. For this reason, it is important to understand how much the random solution fluctuates, and to estimate its "typical quality". In this project, the student will study a recently proposed method for checking the quality of random solutions. The goal will be to implement this method in some examples, and possibly develop improvements that may lead to a better method.
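To make the idea of a "random solution" concrete, here is a small sketch under simple assumptions: the objective is a sample average of squared losses, so its minimizer is the sample mean, and the data are simulated Gaussian draws. It shows how the solution varies over repeated experiments and how a generic bootstrap can estimate that variation from a single sample. This is not the recently proposed method the project refers to; all function names are illustrative.

```python
import random
import statistics


def solve_sample_problem(sample):
    """Minimize the empirical objective (1/n) * sum((x - z)^2) over x.

    For squared loss the minimizer is the sample mean.
    """
    return statistics.fmean(sample)


def excess_risk(x, true_mean):
    """Expected objective at x minus its value at the true optimum.

    For squared loss this gap equals (x - true_mean)^2.
    """
    return (x - true_mean) ** 2


def repeated_experiments(n_obs, n_reps, true_mean=2.0, seed=0):
    """Solve the problem on n_reps independent simulated samples."""
    rng = random.Random(seed)
    solutions = []
    for _ in range(n_reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
        solutions.append(solve_sample_problem(sample))
    return solutions


def bootstrap_spread(sample, n_boot=500, seed=1):
    """Estimate the solution's standard deviation from one sample by resampling."""
    rng = random.Random(seed)
    n = len(sample)
    boot = [solve_sample_problem(rng.choices(sample, k=n)) for _ in range(n_boot)]
    return statistics.stdev(boot)


if __name__ == "__main__":
    solutions = repeated_experiments(n_obs=100, n_reps=300)
    gaps = [excess_risk(x, true_mean=2.0) for x in solutions]
    print("spread of solutions over experiments:", statistics.stdev(solutions))
    print("average excess risk:", statistics.fmean(gaps))

    rng = random.Random(42)
    one_sample = [rng.gauss(2.0, 1.0) for _ in range(100)]
    print("bootstrap estimate of the spread:", bootstrap_spread(one_sample))
```

In practice only one sample is available, which is why a resampling estimate of the spread, rather than truly repeated experiments, is the quantity of interest.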

Prerequisites: The student needs to be comfortable programming in R or MATLAB. Experience with (or interest in) solving numerical optimization problems is desirable.

Number of students: 1

Project 6: Topological analysis of chromosome conformation capture data

Faculty Mentor: Javier Arsuaga (Mathematics and Molecular and Cellular Biology)

Genomes are highly condensed in all organisms, and their three-dimensional organization remains to be determined. Theoretical studies of polymers in confined volumes predict that chromosomes should be highly entangled; entanglement, however, inhibits function. In this project we will develop statistical methods to study the interplay between chromosome structure and its topology. We will use chromosome conformation capture (CCC) data to produce 3D reconstructions of different organisms and determine whether these reconstructions are consistent with topological measures derived from the data. The project will mostly use tools from topological data analysis (TDA), including persistent homology and knot theory invariants.

Prerequisites: Excellent programming skills, experience in managing large datasets, interest in molecular biology, and working knowledge of statistical inference and linear models. GPA of at least 3.5.

Number of students: 1

Project 7: Analysis of data from wearable devices

Faculty Mentor: Jane-Ling Wang (Statistics)

Wearable devices, such as Fitbit and Apple Watch, have facilitated continuous or quasi-continuous recording of activity data over many days and for a large number of individuals. They constitute a rich source of health data, but the sheer volume and complexity of such data pose great challenges to researchers. Since the data are recorded over a time period, they can be treated as functional data, which are data in the form of functions. The aim of this project is to download, process, and analyze data generated from a wearable device, and to study patterns of daily activity and their relation to health outcomes. In particular, we will explore approaches developed for functional data and develop new methods that are specifically tailored to wearable device data.

Prerequisites: Strong programming skills and the ability or willingness to learn to handle large, complex data.

Number of students: up to 3