Lecture 01

Machine Learning (CSCI 5525)

Jan 27th, 2020

General Information


Course Staff

Office Hours

  • Steven Wu: Keller Hall 6-225E
    • Monday 4:00 – 5:00 PM
  • Dai Wei: Keller Hall 2-246
    • Wednesday 1:30 – 2:30 PM


Prerequisites

  • Ideally you will have completed
    • CSCI 5521, or
    • an equivalent introductory machine learning course.


  • You should also have
    • undergraduate level training or coursework in linear algebra, multivariate calculus, and basic probability and statistics, and
    • programming skills in Python (including Jupyter notebooks)


  • Q: “I took Andrew Ng’s ML course online. Then I have the background for this course, right?”



Communication

  • Canvas: We will be using Canvas for all assignments and grades. Please also post all questions as Canvas discussions instead of sending emails.


  • Email: Well, I am bad at email, so it is not the best way to reach me. If you do email, include the substring “CSCI 5525” in the subject line.

  • Please post questions about course material and homework assignments on Canvas first, and email only after a reasonable amount of time has passed without a response. Please use your UMN email account.



Homework

  • Five homeworks with both written and programming components.
  • Late homeworks will not be accepted. No grace days!
  • Your lowest homework score will be dropped.
  • Collaboration policy: you may discuss the homework with other students, but you must write up and code the solutions on your own! You must also mention the names of the students you discussed with.

Homework: Written components

  • Derivation and understanding of the algorithms

  • Submission guidelines:

    • All submissions must be in PDF. Please type up your written homework using LaTeX.
      • LaTeX is a high-quality typesetting system.
    • We will set up Overleaf templates.
      • Very easy to use!

Homework: Programming components

  • All programming will be in Python. Please get familiar with Python and Jupyter notebooks.

Grading policy

  • Homework 60% (15% each for the four homeworks that count after the lowest score is dropped)
  • Midterm 15%
  • Final 20%
  • Class Participation 5%


UNITE Videos

  • Lectures will be available on UNITE Media Portal (but with a delay)

Flu season

  • If you feel sick, consider skipping the lecture.


What is ML?

  • “Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead.” — Wikipedia Page on Machine Learning

Supervised learning

Labeled examples: (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)

  • each x_i is a feature vector (or representation) of an instance (e.g. image, audio, medical record)

  • each y_i is a task-specific label (e.g. cat versus dog images, male versus female voices, risk of lung cancer)

Goal: learn a predictor \hat f\colon X \rightarrow Y based on labeled examples, that accurately predicts the labels of future x.
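
As a concrete sketch of this setup, the snippet below fits a logistic regression predictor on synthetic labeled examples and checks its accuracy on held-out points. The data, the choice of scikit-learn, and all names here are illustrative assumptions, not part of the course materials.

    # Minimal supervised-learning sketch: learn \hat f from labeled examples
    # (x_i, y_i) and check how well it predicts labels of unseen x.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Feature vectors x_i in R^2 with binary labels y_i (two Gaussian blobs).
    n = 200
    X = np.vstack([rng.normal(-1.0, 1.0, size=(n // 2, 2)),
                   rng.normal(+1.0, 1.0, size=(n // 2, 2))])
    y = np.array([0] * (n // 2) + [1] * (n // 2))

    # Hold out "future" examples to estimate how \hat f generalizes.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    f_hat = LogisticRegression().fit(X_train, y_train)    # learn \hat f: X -> Y
    print("test accuracy:", f_hat.score(X_test, y_test))  # accuracy on unseen x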

Supervised learning topics

  • Linear regression
  • Logistic regression
  • Support vector machines
    • Constrained optimization, Lagrangian duality
    • Margin maximization
  • Non-linear methods: kernels
  • Neural networks
    • Optimization: (Stochastic) gradient descent
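
As a preview of the optimization tool listed last above, here is a minimal stochastic gradient descent sketch for least-squares linear regression. The synthetic data, step size, and iteration count are assumptions chosen purely for illustration.

    # SGD on the squared loss: each step uses the gradient at one random example.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic regression data: y = X w* + noise (illustrative).
    n, d = 500, 3
    X = rng.normal(size=(n, d))
    w_star = np.array([1.0, -2.0, 0.5])
    y = X @ w_star + 0.1 * rng.normal(size=n)

    w = np.zeros(d)
    lr = 0.01                                  # step size (assumed)
    for t in range(10000):
        i = rng.integers(n)                    # sample one training example
        grad = 2 * (X[i] @ w - y[i]) * X[i]    # gradient of (x_i . w - y_i)^2
        w -= lr * grad

    print(w)  # should be close to w_star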

Probably skipping

  • Nearest neighbors
  • Naive Bayes
  • other stuff you might have seen in CSCI 5521
  • We will not cover the cutting edge of deep learning. We plan to offer a new course dedicated to deep learning in Spring 2020.

The problem of over-fitting

  • Suppose we observe data (x_1, y_1),\ldots , (x_n, y_n) drawn from a distribution.
  • Train the following predictor: \hat f(X)= \begin{cases} y_i, & \text{if}\ X=x_i \\ \text{"Gopher!"}, & \text{otherwise} \end{cases}


  • \hat f has training error = 0, but errs on every example it hasn’t seen.
    • Well, except for Gopher.
  • How do we formally study this phenomenon?
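
The memorizing predictor above can be written out directly. The snippet below is a minimal sketch on synthetic, illustrative data, showing zero training error alongside a useless answer on every unseen point.

    import numpy as np

    rng = np.random.default_rng(0)

    # Training data drawn from a distribution (synthetic, illustrative).
    X_train = rng.normal(size=20)
    y_train = (X_train > 0).astype(int)

    # \hat f from the slide: return the stored label on training points,
    # and "Gopher!" everywhere else.
    memory = {float(x): int(lab) for x, lab in zip(X_train, y_train)}

    def f_hat(x):
        return memory.get(float(x), "Gopher!")

    train_err = np.mean([f_hat(x) != lab for x, lab in zip(X_train, y_train)])
    print("training error:", train_err)   # 0.0 by construction

    X_new = rng.normal(size=5)            # fresh draws, almost surely unseen
    print([f_hat(x) for x in X_new])      # "Gopher!" on every new example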

Machine learning theory

  • Generalization
    • Complexity measures of function classes
      • VC dimension
      • Rademacher complexity
  • Tools: concentration of measure
    • Chernoff bounds
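
For concreteness, one standard form of these concentration bounds (Hoeffding’s inequality, a Chernoff-type bound for i.i.d. variables in [0, 1]) is

\Pr\left[\left|\frac{1}{n}\sum_{i=1}^{n} Z_i - \mathbb{E}[Z_1]\right| \ge \epsilon\right] \le 2\exp(-2n\epsilon^2),

so the empirical average of n samples concentrates around its expectation at rate roughly 1/\sqrt{n}.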

Ensemble methods

Turning weak learners into strong learners.

  • Boosting method
    • Adaboost
  • Bagging = Bootstrap aggregating
    • Random Forest
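
As an illustrative sketch of bagging, the snippet below trains decision trees on bootstrap samples and aggregates them by majority vote. The dataset and every parameter are assumptions for illustration; scikit-learn also packages this idea directly (e.g. its RandomForestClassifier).

    # Bagging by hand: bootstrap-resample the training set, fit one tree per
    # resample, and predict by majority vote over the trees.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(0)
    trees = []
    for _ in range(25):
        idx = rng.integers(len(X_tr), size=len(X_tr))   # sample with replacement
        trees.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

    votes = np.mean([t.predict(X_te) for t in trees], axis=0)
    y_pred = (votes > 0.5).astype(int)                  # majority vote
    print("bagged accuracy:", np.mean(y_pred == y_te))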

Generative models

  • Variational Autoencoders (VAE)
  • Generative adversarial nets (GANs)

Interactive learning

  • Online learning
    • sequential decision-making
    • e.g. traffic routing, portfolio optimization
  • Multi-armed bandit learning
    • e.g. clinical trials, online advertising, content recommendation
  • Reinforcement learning
    • learner interacts with the environment
    • e.g. video games, educational software, healthcare decision making, robotics or people-facing applications
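
As an illustrative sketch of the bandit setting, the snippet below runs the classic epsilon-greedy strategy on synthetic Bernoulli arms. The arm means, epsilon, and horizon are assumptions for illustration, and epsilon-greedy is only one simple baseline among the algorithms the course may cover.

    # Epsilon-greedy: explore a random arm with probability eps, otherwise
    # pull the arm with the best running-mean reward so far.
    import numpy as np

    rng = np.random.default_rng(0)

    true_means = np.array([0.3, 0.5, 0.7])  # unknown to the learner (assumed)
    K = len(true_means)
    counts = np.zeros(K)                    # pulls per arm
    values = np.zeros(K)                    # running mean reward per arm
    eps = 0.1

    for t in range(5000):
        if rng.random() < eps:
            a = rng.integers(K)                      # explore
        else:
            a = int(np.argmax(values))               # exploit
        r = float(rng.random() < true_means[a])      # Bernoulli reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]     # incremental mean update

    print("estimated means:", values.round(2))       # best arm pulled most often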


Readings

  • Most lectures are paired with a reading.
  • These are optional, and lectures will not follow the readings exactly, but you will get more out of the lectures if you skim the readings beforehand (or afterwards).


Should I buy the books?

  • You are welcome to buy physical copies if you wish—they’re good books!
  • But the online versions will suffice for this course.