Current Projects

The projects for the 2017-2018 academic year are listed below. Applications to participate in these projects are closed. Projects from previous years can be viewed on the project archive page.

Project 1: Analysis of Complex Functional Data

Faculty Mentor: Hans-Georg Müller

The goal of this project is to apply and extend existing methods for functional data with complex structure, such as point processes, densities, and functional data with spatio-temporal dependencies. The developments will be primarily data-driven. Possible data sources include brain imaging (fMRI), demography, and transportation; methods include appropriate versions of functional principal component analysis. The RTG students will download and preprocess data and participate in data analysis and code development.

Prerequisites: Required: strong computing skills (Python, MATLAB, or R), probability at the level of 131A, calculus and linear algebra, 106/108. Desirable but not required: 131BC, 135, 141A, 141BC, 106/108.

Number of students: up to 3

Project 2: Analysis and visualization of data from a social website for sharing music and memories

Faculty Mentor: Petr Janata (Center for Mind and Brain; Department of Psychology)

The aim of this project is to analyze and display data that are being collected on a social website called MEAMCentral, a place where people can associate memories with music. Depending on the interests of the project team, possible analyses include user interactions with the website, natural language processing of memory content, and music similarity based on associated memories or meta-information about the music. Music and memory data are stored in a graph database that can be queried using SPARQL or Prolog. Analyses may be written in R and/or Python, and results should be visualized in a web browser using the D3 JavaScript library.

Prerequisites: Facility with GitHub and R. Some prior experience with database querying (e.g., using SQL or SPARQL), Python, and D3 or JavaScript would be helpful.

Project 3: Survey of nonlinear dimension reduction

Faculty Mentor: Xiaodong Li

The goal of this project is to understand and explore the concepts, applications, and empirical behavior of nonlinear dimension reduction methods such as MDS, Isomap, LLE, and t-SNE, particularly for data visualization.

Prerequisites: STA106, STA108, MAT22A/MAT167.

Number of students: 3-4

Project 4: Analysis of high-dimensional proteomics data using networks and topological data analysis

Faculty Mentors: Javier Arsuaga, Dietmar Kueltz, Wolfgang Polonik

This project analyzes differential protein expression in fish in response to environmental conditions, with the goal of addressing real scientific questions. The high-dimensional data will be analyzed using (i) a certain type of network analysis (the k-core method and refinements thereof) and (ii) methods from topological data analysis (TDA). The participating students will be split into two groups, one working with networks and the other with TDA, but all participants are expected to collaborate and take part in joint discussions and training sessions.
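For readers unfamiliar with the k-core method mentioned above, the following is a minimal sketch in plain Python. The k-core of an undirected graph is the largest subgraph in which every node has at least k neighbors; it can be found by repeatedly deleting nodes of degree below k. The function name k_core, the node labels, and the toy graph are hypothetical illustrations and are not taken from the project or its data.

```python
def k_core(adjacency, k):
    """Return the set of nodes in the k-core of an undirected graph.

    `adjacency` maps each node to the set of its neighbours.
    """
    # Copy the neighbour sets so the caller's graph is left untouched.
    remaining = {node: set(neigh) for node, neigh in adjacency.items()}
    # Start with every node whose degree is already below k.
    to_remove = [node for node, neigh in remaining.items() if len(neigh) < k]
    while to_remove:
        node = to_remove.pop()
        if node not in remaining:
            continue  # this node was queued twice and is already gone
        for neighbour in remaining.pop(node):
            neigh_set = remaining[neighbour]
            neigh_set.discard(node)
            # Removing `node` may push a neighbour below the threshold.
            if len(neigh_set) < k:
                to_remove.append(neighbour)
    return set(remaining)


# Hypothetical toy graph: a triangle A-B-C with a chain C-D-E hanging off it.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

two_core = k_core(toy_graph, 2)  # the triangle survives, the chain is peeled off
```

In this toy graph the chain nodes D and E are removed one after the other, leaving the triangle as the 2-core. The refinements of the method referred to in the project description are not covered by this sketch.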

Prerequisites: Basic knowledge of statistics and linear algebra, as well as good computing skills, is expected. Some background in topology would be helpful but is not necessary.

Number of students: up to 6

Project 5: Causation, confounding and mediation

Faculty Mentor: Christiana Drake

Statisticians and data analysts get data from many sources. In agriculture, investigators often conduct experiments. Studies involving humans are more complicated. Clinical trials, which are mostly randomized, are used to assess the efficacy of new drugs. Epidemiologists and public health workers often study potentially harmful substances, and their studies are observational in nature. Biologists also often have to rely on observational studies to investigate biological processes in the progression of diseases. We will examine the ideas underlying causal inference in observational studies through a concept called the Rubin causal model. We will look at confounding as an obstacle to causal inference and at methods to address it. We will also define the concept of a mediator and study how it differs from a confounder. The concepts will be introduced through short readings and applied to data using SAS and R as needed.
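As a rough illustration of how a confounder can distort a comparison, the sketch below contrasts a crude risk difference with one standardized over the strata of a single measured binary confounder. The counts are made up for illustration only, the function names are hypothetical, and the adjustment is valid only under the assumption that this one variable captures all confounding. Python is used here for brevity, although the project itself names SAS and R.

```python
# Made-up counts for illustration only: (events, total) per arm within each
# stratum of one binary confounder Z. These are not data from the project.
strata = {
    "Z=0": {"treated": (10, 100), "control": (40, 400)},
    "Z=1": {"treated": (200, 400), "control": (50, 100)},
}


def risk(events, total):
    """Proportion of subjects who experienced the outcome."""
    return events / total


def crude_risk_difference(strata):
    """Risk difference that ignores the confounder by pooling all strata."""
    pooled = {}
    for arm in ("treated", "control"):
        events = sum(s[arm][0] for s in strata.values())
        total = sum(s[arm][1] for s in strata.values())
        pooled[arm] = risk(events, total)
    return pooled["treated"] - pooled["control"]


def standardized_risk_difference(strata):
    """Average the within-stratum differences, weighted by stratum size."""
    population = sum(s["treated"][1] + s["control"][1] for s in strata.values())
    adjusted = 0.0
    for s in strata.values():
        weight = (s["treated"][1] + s["control"][1]) / population
        adjusted += weight * (risk(*s["treated"]) - risk(*s["control"]))
    return adjusted


crude = crude_risk_difference(strata)
adjusted = standardized_risk_difference(strata)
```

With these invented counts the crude difference is 0.24, while the standardized difference is 0, because treatment is more common in the high-risk stratum even though it changes nothing within either stratum.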

Prerequisites: STA106/108 and STA131A-C, or equivalent

Number of students: up to 5