Undergraduate ML Courses

F18, S19
Calc II and (6.00 or 6.01)
Introduces principles, algorithms, and applications of machine learning from the point of view of modeling and prediction; formulation of learning problems; representation, over-fitting, generalization; clustering, classification, probabilistic modeling; and methods such as support vector machines, hidden Markov models, and neural networks. Students taking graduate version complete additional assignments. Meets with 6.862 when offered concurrently. Recommended prerequisites: 6.006 and 18.06. Enrollment may be limited; no listeners.

Calc II
Probabilistic modeling for problems of inference and machine learning from data, emphasizing analytical and computational aspects. Distributions, marginalization, conditioning, and structure; graphical and neural network representations. Belief propagation, decision-making, classification, estimation, and prediction. Sampling methods and analysis. Introduces asymptotic analysis and information measures. Computational laboratory component explores the concepts introduced in class in the context of contemporary applications.

(2.087, 6.0002, 6.01, 18.03, or 18.06) & (6.008, 6.041B, 14.30, 16.09, or 18.05)
Hands-on analysis of data demonstrates the interplay between statistics and computation. Includes four modules, each centered on a specific data set, and introduced by a domain expert. Provides instruction in specific, relevant analysis methods and corresponding algorithmic aspects. Potential modules may include medical data, gene regulation, social networks, finance data (time series), traffic, transportation, weather forecasting, policy, or industrial web applications. Projects address a large-scale data analysis question. Students taking graduate version complete additional assignments. Limited enrollment; priority to Statistics and Data Science minors and to juniors and seniors.

basic probability, e.g. 6.041A or equivalent
Introduction to the methodological foundations of data science, emphasizing basic concepts, but also modern methodologies. Learning of distributions and their parameters. Testing of multiple hypotheses. Linear and nonlinear regression and prediction. Classification. Learning of dynamical models. Uncertainty quantification. Model validation. Causal inference. Applications and case studies drawn from electrical engineering, computer science, the life sciences, finance, and social networks.

Graduate ML Courses

F18, S19
18.06 and (6.041B or 18.600)
Principles, techniques, and algorithms in machine learning from the point of view of statistical inference; representation, generalization, and model selection; and methods such as linear/additive models, active learning, boosting, support vector machines, non-parametric Bayesian methods, hidden Markov models, Bayesian networks, and convolutional and recurrent neural networks. Recommended prerequisite: 6.036 or other previous experience in machine learning.

6.041B, 6.867, 18.06
Among different approaches in modern machine learning, the course focuses on a regularization perspective and includes both shallow and deep networks. The content is roughly divided into two parts. In the first part, key algorithmic ideas are introduced, with an emphasis on the interplay between modeling and optimization aspects. Algorithms that will be discussed include classical regularization networks (regularized least squares, SVM, logistic regression), stochastic gradient methods, implicit regularization, sketching, sparsity-based methods and deep neural networks. In the second part, key ideas in statistical learning theory will be developed to analyze the properties of the various algorithms previously introduced. Classical concepts like generalization, uniform convergence and Rademacher complexities will be developed, together with topics such as bounds based on margin, stability, and privacy. The final part of the course focuses on deep learning networks. It will introduce an emerging theoretical framework addressing three key puzzles in deep learning: approximation theory -- which functions can be represented more efficiently by deep networks than shallow networks -- optimization theory -- why can stochastic gradient descent easily find global minima -- and machine learning -- whether classical learning theory can explain generalization in deep networks. It will also discuss connections with the architecture of visual cortex, which was the original inspiration of the layered local connectivity of modern networks and may provide ideas for future developments of deep learning.

18.06 and (6.008, 6.041B, or 6.436[J])
Introduction to statistical inference with probabilistic graphical models. Directed and undirected graphical models, and factor graphs, over discrete and Gaussian distributions; hidden Markov models, linear dynamical systems. Sum-product and junction tree algorithms; forward-backward algorithm, Kalman filtering and smoothing. Min-sum and Viterbi algorithms. Variational methods, mean-field theory, and loopy belief propagation. Particle methods and filtering. Building graphical models from data, including parameter estimation and structure learning; Baum-Welch and Chow-Liu algorithms. Selected special topics.

linear algebra and probability (e.g. 18.06/18.700 and 6.041/6.431)
In this research-oriented course we will introduce graphical models in the framework of exponential families. We will see that polynomial equations and combinatorial constraints naturally arise and call for algebraic and combinatorial methods to advance the statistical methodology. In particular, we will highlight the role of conic duality for Gaussian graphical models and polyhedral geometry for discrete graphical models. We will also develop methods for causal inference making use of the inherent combinatorial and algebraic structure in directed graphical models. Finally, we will discuss graphical models with hidden variables by highlighting the connections to tensor decompositions. The overarching goal of this course is to provide an overview of the interplay of techniques from combinatorics and applied algebraic geometry with problems arising in statistics, in particular in graphical models. Specific topics include exponential families, Gröbner bases, conditional independence ideals, Bayesian networks, determinantal varieties, and hyperbolic polynomials.

6.008, 6.041B, or 6.436[J]
Introduction to principles of Bayesian and non-Bayesian statistical inference. Hypothesis testing and parameter estimation, sufficient statistics; exponential families. EM algorithm. Log-loss inference criterion, entropy and model capacity. Kullback-Leibler distance and information geometry. Asymptotic analysis and large deviations theory. Model order estimation; nonparametric statistics. Computational issues and approximation techniques; Monte Carlo methods. Selected topics such as universal inference and learning, and universal features and neural networks.

Machine Learning for Healthcare
6.034 or 6.438 or 6.806 or 6.036 or 6.867 or 9.520
Introduces students to machine learning in healthcare, including the nature of clinical data and the use of machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, and improving clinical workflows. Topics include causality, interpretability, algorithmic fairness, time-series analysis, graphical models, deep learning and transfer learning. Guest lectures by clinicians from the Boston area and course projects with real clinical data emphasize subtleties of working with clinical data and translating machine learning into clinical practice. Limited to 55.

6.867 or 6.437 or 6.438, 6.041B, 18.06
As both the number of data sets and data set sizes grow, practitioners are interested in learning increasingly complex information and interactions from data. Probabilistic modeling in general, and Bayesian approaches in particular, provide a unifying framework for flexible modeling that includes prediction, estimation, and coherent uncertainty quantification. In this course, we will cover the modern challenges of Bayesian inference, including (but not limited to) speed of approximate inference, making use of distributed architectures, streaming data, and complex data interactions. We will study Bayesian nonparametric models, wherein model complexity grows with the size of the data; this allows us to learn, e.g., a greater diversity of topics as we read more documents from Wikipedia, identify more friend groups as we process more of Facebook's network structure, etc.

Modern machine learning systems are often built on top of algorithms that do not have provable guarantees, and it is the subject of debate when and why they work. In this class, we will focus on designing algorithms whose performance we can rigorously analyze for fundamental machine learning problems. We will cover topics such as nonnegative matrix factorization, tensor decomposition, sparse coding, learning mixture models, matrix completion, and inference in graphical models. Almost all of these problems are computationally hard in the worst case, and so developing an algorithmic theory is about (1) choosing the right models in which to study these problems and (2) developing the appropriate mathematical tools (often from probability, geometry or algebra) in order to rigorously analyze existing heuristics, or to design fundamentally new algorithms.


Probability and Statistics

Probabilistic Systems Analysis (Intro to Probability)
F18, S19
Fundamentals of Probability (Graduate)
Probability and Statistics
Probability and Random Variables (Undergraduate)
Calc II
Fundamentals of Statistics

Linear Algebra and Optimization

Linear Algebra (Undergraduate)
F18, S19
Matrix Methods in Data Analysis, Signal Processing, and ML (Undergraduate)
Optimization Methods (Undergraduate)
Introduction to Mathematical Programming (Graduate)