Machine Learning Lectures by Prof PS Sastry

More than 25 years in the making...

Why Yet Another Lecture Series on Machine Learning?

[RR] Machine Learning is an interesting field in that it draws on ideas from, and connects, so many disciplines. It is multidisciplinary in every sense of the word. It is not uncommon to find the same concept referred to by different names, often by the same lecturer (or book author), depending on the context. This can be confusing, even frustrating, for budding students of the field. (Learning ML could itself be a tough learning problem!) Only those experts with a good knowledge of the various connecting disciplines can bring out the clarity. Prof PS Sastry is one such rare person, often known only in smaller circles. (Read the Preface for more.) He brings a wealth of experience (more than 25 years) in ML and all related fields, and he has spent the time and effort to share it through these lectures. As you will see, he will not only teach you Machine Learning but also put you on a strong theoretical footing towards mastering it. I won't say more; the lectures will speak for themselves.

About the Lectures

[Sastry] The lectures are designed for graduate students (i.e., first-year ME or research students). They are intended to give students a fairly comprehensive view of the fundamentals of Machine Learning, in particular classification and regression. However, not all topics are covered. For example, we do not discuss decision tree classifiers. Also, the course deals with neural network models only from the point of view of classification and regression; for example, no recurrent neural network models (e.g., the Boltzmann machine) are included. The main reason for leaving out some topics is to keep the course content suitable for a one-semester course.

Pre-requisites

[RR]

Lecture Details

Lecture 1 - Introduction to Machine Learning / Pattern Recognition
[RR] This lecture gives a general introduction to the field. Don't be alarmed by the title Pattern Recognition: it is 'essentially' Machine Learning. As they say, the computer science community calls it ML, whereas electrical engineers call it Pattern Recognition. (Didn't I warn you the field is multidisciplinary?)

Topics discussed include i) the problem and example applications, ii) design of an ML/PR system, iii) the notation used, iv) the 2-class and M-class problems, v) an illustration of the 'Spot the Right Candidate' problem, vi) issues in designing classifiers, vii) learning from examples, viii) function learning and examples (a small sketch follows below), ix) the key concept of 'Generalization', x) the statistical view of ML/PR, xi) the optimal classifier.
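To give a taste of the 'learning from examples' and 'function learning' views mentioned above, here is a minimal Python sketch (my own illustration, not from the lecture): a 2-class classifier is just a function from a feature to a label, fit from labelled examples. The threshold rule, the toy data, and the function name fit_threshold are all assumptions made purely for illustration.

# Illustrative sketch (not from the lectures): learn a 2-class
# classifier of the form 'predict 1 if x > t else 0' from labelled
# examples, by picking the threshold t with fewest training errors.
def fit_threshold(xs, ys):
    candidates = sorted(xs)
    best_t, best_err = candidates[0], len(ys) + 1
    for t in candidates:
        # count training examples the rule 'x > t' misclassifies
        err = sum((x > t) != (y == 1) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]   # toy 1-D features (assumed)
ys = [0, 0, 0, 1, 1, 1]               # toy labels (assumed)
t = fit_threshold(xs, ys)
print(t)                # learned threshold: 3.0 on this data
print(int(4.5 > t))     # 'generalization': predicting on an unseen x

The learned rule is a function that can be applied to points never seen during training; how well it does there is exactly the generalization question the lecture introduces.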

Lecture 2 - Overview of Pattern Classifiers
[RR] This lecture gives a broad overview of classifiers (as defined in the last lecture).

This lecture discusses i) the Bayes classifier and its optimality, ii) challenges in implementing the Bayes classifier, iii) loss functions, iv) risk minimization, v) an overview of non-Bayes classifiers: the nearest neighbour classifier, discriminant functions, linear vs. nonlinear models, neural networks, decision trees, and SVMs.
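To make 'risk minimization' and the optimality of the Bayes classifier concrete, here is the standard textbook formulation (my rendering in LaTeX, not transcribed from the lecture):

% Conditional risk of deciding class i at x, for a loss L(i, j)
% incurred when deciding i while the true class is j:
R(i \mid x) = \sum_{j=1}^{M} L(i, j) \, P(j \mid x)

% The Bayes classifier minimizes this risk pointwise, which is
% exactly what makes it optimal:
h^*(x) = \arg\min_{i} R(i \mid x)

% Under 0-1 loss, L(i, j) = 1 - \delta_{ij}, this reduces to
% picking the most probable class given x:
h^*(x) = \arg\max_{j} P(j \mid x)

The practical challenge the lecture points to is that P(j | x) is unknown and must be estimated from data, which is what the later modules on density estimation address.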

(Under construction)
Module2 - Bayesian decision making and Bayes Classifier 
Lecture 3 - The Bayes Classifier for minimizing Risk
Lecture 4 - Estimating Bayes Error; Minimax and Neyman-Pearson classifiers
Module3 - Parametric Estimation of Densities
Lecture 5 - Implementing Bayes Classifier; Estimation of Class Conditional Densities
Lecture 6 - Maximum Likelihood estimation of different densities
Lecture 7 - Bayesian estimation of parameters of density functions, MAP estimates
Lecture 8 - Bayesian Estimation examples; the exponential family of densities and ML estimates
Lecture 9 - Sufficient Statistics; Recursive formulation of ML and Bayesian estimates
Module4 - Mixture Densities and EM Algorithm
Lecture 10 - Mixture Densities, ML estimation and EM algorithm
Lecture 11 - Convergence of EM algorithm; overview of Nonparametric density estimation
Module5 - Nonparametric density estimation
Lecture 11 - Convergence of EM algorithm; overview of Nonparametric density estimation
Lecture 12 - Nonparametric estimation, Parzen Windows, nearest neighbour methods
Module6 - Linear models for classification and regression
Lecture 13 - Linear Discriminant Functions; Perceptron -- Learning Algorithm and convergence proof
Lecture 14 - Linear Least Squares Regression; LMS algorithm
Lecture 15 - Adaline and LMS algorithm; General nonlinear least-squares regression
Lecture 16 - Logistic Regression; Statistics of least squares method; Regularized Least Squares
Lecture 17 - Fisher Linear Discriminant
Lecture 18 - Linear Discriminant functions for multi-class case; multi-class logistic regression
Module7 - Overview of statistical learning theory, Empirical Risk Minimization and VC-Dimension
Lecture 19 - Learning and Generalization; PAC learning framework
Lecture 20 - Overview of Statistical Learning Theory; Empirical Risk Minimization
Lecture 21 - Consistency of Empirical Risk Minimization
Lecture 22 - Consistency of Empirical Risk Minimization; VC-Dimension
Lecture 23 - Complexity of Learning problems and VC-Dimension
Lecture 24 - VC-Dimension Examples; VC-Dimension of hyperplanes
Module8 - Artificial Neural Networks for Classification and regression
Lecture 25 - Overview of Artificial Neural Networks
Lecture 26 - Multilayer Feedforward Neural networks with Sigmoidal activation functions
Lecture 27 - Backpropagation Algorithm; Representational abilities of feedforward networks
Lecture 28 - Feedforward networks for Classification and Regression; Backpropagation in Practice
Lecture 29 - Radial Basis Function Networks; Gaussian RBF networks
Lecture 30 - Learning Weights in RBF networks; K-means clustering algorithm
Module9 - Support Vector Machines and Kernel based methods
Lecture 31 - Support Vector Machines -- Introduction, obtaining the optimal hyperplane
Lecture 32 - SVM formulation with slack variables; nonlinear SVM classifiers
Lecture 33 - Kernel Functions for nonlinear SVMs; Mercer and positive definite Kernels
Lecture 34 - Support Vector Regression and ε-insensitive Loss function, examples of SVM learning
Lecture 35 - Overview of SMO and other algorithms for SVM; ν-SVM and ν-SVR; SVM as a risk minimizer
Lecture 36 - Positive Definite Kernels; RKHS; Representer Theorem
Module10 - Feature Selection, Model assessment and cross-validation
Lecture 37 - Feature Selection and Dimensionality Reduction; Principal Component Analysis
Lecture 38 - No Free Lunch Theorem; Model selection and model estimation; Bias-variance trade-off
Lecture 39 - Assessing Learnt classifiers; Cross Validation
Module11 - Boosting and Classifier ensembles
Lecture 40 - Bootstrap, Bagging and Boosting; Classifier Ensembles; AdaBoost
Lecture 41 - Risk minimization view of AdaBoost
Maintained by: Rajaraman K, Data Analytics Dept, I2R, A*STAR, Singapore