Statisticians help to design data collection plans, analyze data appropriately, and interpret and draw conclusions from those analyses. The central objective of the undergraduate major in Statistics is to equip students with the quantitative skills this work requires, skills that they can employ and build on in flexible ways.

Majors should understand 1) the fundamentals of probability theory, 2) statistical reasoning and inferential methods, 3) statistical computing, and 4) statistical modeling and its limitations, and should 5) have skill in description, interpretation and exploratory analysis of data by graphical and other means. Graduates are also expected 6) to learn to communicate effectively.

## Courses and Program Objectives

The statistics curriculum was designed to help students achieve these learning outcomes. Numbers in square brackets refer to those objectives enumerated above that are particularly relevant to the individual courses.

Statistics 133: Concepts in Computing with Data. [3,5,6]

This course focuses on how to use the computer to conduct a statistical analysis of data, including how to acquire, clean and organize data, analyze data using computationally intensive statistical methods, and report findings. Students gain experience in computing as a supporting skill for statistical practice and research. They learn how to use existing high-level general purpose software to create new algorithms and functionality and to express statistical ideas and computations, and they learn about different data technologies and tools, when to use them, and what their trade-offs are. Students acquire skills in basic numeracy, graphics, modern computationally intensive methods, and simulation. Programming concepts include variables, data types, trees, and control flow. Data technology topics include the digital representation of data, regular expressions for text manipulation, relational database management systems, the eXtensible Markup Language (XML), Web services for distributed functionality and methods, and Web publication. Extensive written reports are an integral part of the course.

Statistics 134: Concepts of Probability. [1]

This is an introduction to probability theory, aimed at students who have had at least one year of calculus. The course covers the laws of probability, expectation, and conditioning, as well as all the standard distributions of random variables both discrete and continuous. Functions of random variables - sums, order statistics, and so on - are studied thoroughly, as are limit laws such as the law of large numbers and the central limit theorem, and the standard models: Bernoulli trials, sampling with and without replacement, Poisson process, univariate and bivariate normal, covariance and correlation. The course serves as preparation for later more systematic study of mathematical statistics and stochastic processes.

Statistics 135: Concepts of Statistics. [2,4,6]

This is a comprehensive survey course in statistical theory and methodology, aimed at the goals of understanding the fundamental principles of statistical reasoning, achieving proficiency in data analysis, and developing written communication skills. To these ends, topics include descriptive statistics and data analysis, fundamental concepts of the theory of estimation and hypothesis testing, and methodology such as sampling, goodness-of-fit testing, analysis of variance, and least squares estimation. The laboratory includes computer-based data-analytic applications to a variety of subject matter and requires written reports.

Data/Statistics C140: Probability for Data Science. [1]

An introduction to probability, emphasizing the combined use of mathematics and programming to solve problems. Random variables, discrete and continuous families of distributions. Bounds and approximations. Dependence, conditioning, Bayes methods. Convergence, Markov chains. Least squares prediction. Random permutations, symmetry, order statistics. Use of numerical computation, graphics, simulation, and computer algebra.

Data/Statistics C102: Data, Inference, and Decisions

This course develops the probabilistic foundations of inference in data science, and builds a comprehensive view of the modeling and decision-making life cycle in data science including its human, social, and ethical implications. Topics include: frequentist and Bayesian decision-making, permutation testing, false discovery rate, probabilistic interpretations of models, Bayesian hierarchical models, basics of experimental design, confidence intervals, causal inference, Thompson sampling, optimal control, Q-learning, differential privacy, clustering algorithms, recommendation systems and an introduction to machine learning tools including decision trees, neural networks and ensemble methods.

Statistics 150: Stochastic Processes

This course is especially recommended for students with a strong interest in probability theory or stochastic models, including models in finance, ecology, epidemiology, geophysics and other fields. Random walks, discrete time Markov chains, Poisson processes. Further topics such as: continuous time Markov chains, queueing theory, point processes, branching processes, renewal theory, stationary processes, Gaussian processes.

Statistics 151A: Linear Modelling: Theory and Applications

This course is especially recommended for students with an interest in economics, social science, or statistical models and data analysis more generally. A coordinated treatment of linear and generalized linear models and their application. Linear regression, analysis of variance and covariance, random effects, design and analysis of experiments, quality improvement, log-linear models for discrete multivariate data, model selection, robustness, graphical techniques, productive use of computers, in-depth case studies.

Statistics 152: Sampling Surveys

This course is especially recommended for students with an interest in social science, marketing, and data collection more generally. Theory and practice of sampling from finite populations. Simple random, stratified, cluster, and double sampling. Sampling with unequal probabilities. Properties of various estimators including ratio, regression, and difference estimators. Error estimation for complex samples.

Statistics 153: Time Series

This course is especially recommended for students with an interest in physical science, communication and information theory, economics, finance, or actuarial work. An introduction to time series analysis in the time domain and spectral domain. Topics will include: estimation of trends and seasonal effects, autoregressive moving average models, forecasting, indicators, harmonic analysis, spectra.

Statistics 154: Modern Statistical Prediction and Machine Learning

Theory and practice of statistical prediction. Contemporary methods as extensions of classical methods. Topics: optimal prediction rules, the curse of dimensionality, empirical risk, linear regression and classification, basis expansions, regularization, splines, the bootstrap, model selection, classification and regression trees, boosting, support vector machines. Computational efficiency versus predictive performance. Emphasis on experience with real data and assessing statistical assumptions.

Statistics 155: Game Theory

This course is especially recommended for students with an interest in mathematics, optimization or strategy, including business decisions. General theory of zero-sum, two-person games, including games in extensive form and continuous games, illustrated by detailed study of examples.

Statistics 156: Causal Inference

This course will focus on approaches to causal inference using the potential outcomes framework. It will also use causal diagrams at an intuitive level. The main topics are classical randomized experiments, observational studies, instrumental variables, principal stratification and mediation analysis. Applications are drawn from a variety of fields including political science, economics, sociology, public health, and medicine. This course is a mix of statistical theory and data analysis. Students will be exposed to statistical questions that are relevant to decision and policy making.

Statistics 157: Seminar on Topics in Probability and Statistics

Substantial student participation required. The topics for each semester in which the course is offered will be announced by the middle of the preceding semester; see departmental bulletins. Recent topics include: Bayesian statistics, statistics and finance, random matrix theory, high-dimensional statistics.

Statistics 158: Experimental Design

This course will review the statistical foundations of randomized experiments and study principles for addressing common setbacks in experimental design and analysis in practice. We will cover the notion of potential outcomes for causal inference and the Fisherian principles for experimentation (randomization, blocking, and replication). We will also cover experiments with complex structures (clustering in units, factorial design, hierarchy in treatments, sequential assignment, etc.) and address practical complications in experiments, including noncompliance, missing data, and measurement error.

Statistics 159: Reproducible and Collaborative Data Science

A project-based introduction to statistical data analysis. Through case studies, computer laboratories, and a term project, students will learn practical techniques and tools for producing statistically sound and appropriate, reproducible, and verifiable computational answers to scientific questions. The course emphasizes version control, testing, process automation, code review, and collaborative programming. Software tools may include Bash, Git, Python, and LaTeX.

Statistics 165: Forecasting

Forecasting has been used to predict elections, climate change, and the spread of COVID-19. Poor forecasts led to the 2008 financial crisis. In our daily lives, good forecasting ability can help us plan our work, be on time to events, and make informed career decisions. This practically oriented class will provide students with tools to make good forecasts, including Fermi estimates, calibration training, base rates, scope sensitivity, and power laws.

Statistics majors pursue many different careers. Some go to graduate programs in Statistics or other mathematical or scientific disciplines, some to MBA programs, some become actuaries, some go into teaching, and some into industry or government.

The major has a great deal of flexibility for students to acquire the skills and knowledge to take the next step, through upper-division electives and a mandatory three-course "cluster" outside the department. Many Statistics majors are double and triple majors; quantitative upper-division courses in their other majors are often suitable for the cluster.

Students intending to pursue graduate work in Statistics are encouraged to take Mathematics courses for their cluster.

Students intending to become actuaries can take courses in Economics and Demography, and are encouraged to take Statistics 151A, 152 and 153 for their electives. Some of these count towards "Validation by Educational Experience."

There is a special track for students who intend to become teachers. Such students take four Mathematics courses to prepare them for teaching a broader Mathematics curriculum, and are required to take only two upper-division Statistics elective courses.

Students interested in MBA programs are encouraged to take Business or Economics courses for their cluster, and to take Statistics 151A, 153 and either 152 or 155 as their upper-division electives.