About This Course
This is an introductory-level course in supervised learning, with a
focus on regression and classification methods. The syllabus
includes: linear and polynomial regression, logistic regression and
linear discriminant analysis; cross-validation and the bootstrap,
model selection and regularization methods (ridge and lasso);
nonlinear models, splines and generalized additive models; tree-based
methods, random forests and boosting; support-vector machines. Some
unsupervised learning methods are discussed: principal components and
clustering (k-means and hierarchical).
This is not a math-heavy class, so we try to describe the methods
without heavy reliance on formulas and complex mathematics. We focus
on what we consider to be the important elements of modern data
analysis. Computing is done in R. There are lectures devoted to R,
giving tutorials from the ground up, and progressing with more
detailed sessions that implement the techniques in each chapter.
The lectures cover all the material in An Introduction to Statistical
Learning, with Applications in R by James, Witten, Hastie and
Tibshirani (Springer, 2013). The PDF of this book is available for free on the book website.
Prerequisites: first courses in statistics, linear algebra, and computing.
Available for download here.
Ismael Lemhadri is a PhD candidate in the Department of Statistics at Stanford University. Prior to joining Stanford, he completed his undergraduate studies in France at Ecole Polytechnique with a focus on Applied Mathematics, and also spent some time at Jump Trading in London, developing proprietary trading algorithms.
He is a recipient of the Monahan Foundation Fellowship and the Stanford Graduate Fellowship.
Mona Azadkia, Souvik Ray and Chenyang Zhong.
Course Production Team
Will Fithian and Sam Gross produced and formatted the quiz questions and review questions. Daniela Witten helped present some of the material in Chapter 5. Wes Choy managed the video production. Greg Maximov filmed and edited most of the course videos, as well as the interviews and group recordings. Greg Bruhns, Monica Diaz and Marc Sanders assisted with Open edX.
Frequently Asked Questions
Do I need to buy a textbook?
No. A free online version of An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013) is available from the book website. Springer has agreed to this, so there is no need to worry about copyright. You may not, however, distribute printed versions of this PDF file.
Are R and RStudio available for free?
Yes. You can get R for free from http://cran.us.r-project.org/. Typically it installs with a click. RStudio is also free, from http://www.rstudio.com/, and is similarly easy to install.