CS 4641/7641: Machine Learning (Summer 2020)

Course Information

Instructor: Mahdi Roozbahani
Head TA: Ruijia Wang (rwang@gatech.edu)
TA: Rodrigo Borela (rborelav@gatech.edu)
TA: Kevin Tynes (kdtynes@gatech.edu)
TA: Danrong Zhang (dzhang373@gatech.edu)
TA: Moamen Soliman (soliman@gatech.edu)

Course Overview

This course introduces techniques in machine learning with an emphasis on algorithms and their applications to real-world data. We will investigate the following question: how can we computationally extract useful knowledge from data for decision making and task support? The machine learning methods we cover are organized into three parts:

  1. Basic math for data science and machine learning

    • Linear algebra
    • Probability and statistics
    • Information theory
  2. Unsupervised machine learning for data exploration

    • Clustering analysis
    • Dimension reduction
    • Kernel density estimation
  3. Supervised learning for predictive data analysis

    • Tree-based models
    • Linear classification and regression
    • Neural networks

Prerequisites for this course include 1) basic knowledge of probability, statistics, and linear algebra; 2) basic programming experience in Python, especially with Jupyter Notebook.

Schedule

May 11, 2020
  • Topic: Course Overview; Math Basics: Linear Algebra; Class Video Lecture
  • Assignment Due: Piazza Signup
  • Readings: GT Honor Code

May 13, 2020
  • Topic: Math Basics: Linear Algebra; Class Notes; Class Video Lecture
  • Readings: Correlation vs Covariance; Linear Algebra Review by Zico Kolter

May 18, 2020
  • Topic: Math Basics: Probability and Statistics; Class Notes; Class Video Lecture
  • Assignment Due: AS1 Out
  • Readings: Probability Theory Review by Andrew Moore; The Differences Between Data, Information and Knowledge

May 20, 2020
  • Topic: Math Basics: Information Theory; Class Notes; Class Video Lecture
  • Readings: The Differences Between Data, Information and Knowledge

May 25, 2020
  • Topic: No Class - Memorial Day
  • Assignment Due: Team Creation Due

May 27, 2020
  • Topic: Data Analysis Toolbox; Project Information; Class Notes - Info theory and optimization; Class Video Lecture
  • Assignment Due: AS1 Due (Updated - 29th)
  • Readings: Visual Information Theory by Chris Olah; Spring 2020 projects; KKT for inequality constrained optimization; GitHub Pages; YAML Configuration; NumPy Tutorial; Matplotlib Tutorial; Project Examples; seaborn: statistical data visualization

June 01, 2020
  • Topic: Clustering Analysis and K-Means; K-Means Class Notes; Hierarchical Clustering; Hierarchical Class Notes; Class Video Lecture
  • Readings: Curse of dimensionality (Euclidean space example); Jupyter Notebook (Kmeans and DBSCAN); Understanding the concept of Hierarchical clustering Technique

June 03, 2020
  • Topic: Gaussian Mixture Model; Class Notes; Class Video Lecture
  • Assignment Due: AS2 Out; Start working on Project Proposal!
  • Readings: GitHub Student Application; Jupyter Notebook (Kmeans and DBSCAN); The Heilmeier Catechism; Overleaf for GT students

June 08, 2020
  • Topic: Density-Based Clustering; Class Notes; Class Video Lecture

June 10, 2020
  • Topic: Evaluation of Clustering Algorithms; Class Notes; Class Video Lecture

June 15, 2020
  • Topic: Density Estimation; Class Notes; Class Video Lecture
  • Assignment Due: Proposal: 3-minute recorded video and one-page report due
  • Readings: KDE interactive visualization; KDE sampling; KDE SKLearn and sampling; Jupyter Notebook Kernel Density Example

June 17, 2020
  • Topic: Dimension Reduction; Class Notes; Class Video Lecture
  • Readings: Image reconstruction using PCA; Feature extraction using PCA; PCA for images; PCA as linear combination of features; PCA and Linear Discriminant Analysis

June 22, 2020
  • Topic: Linear Regression; Class Notes; Class Video Lecture
  • Assignment Due: AS2 Due
  • Readings: Simple Linear Regression in Matrix Format; Adding Noise to Regression Predictors

June 24, 2020
  • Topic: Regularization and Linear Regression; Class Notes; Class Video Lecture
  • Assignment Due: AS3 Out

June 29, 2020
  • Topic: Naïve Bayes and Logistic Regression; Class Notes; Class Video Lecture

July 01, 2020
  • Topic: Decision Tree; Decision Tree Class Notes; Ensemble Learning and Random Forest; Ensemble Learning Class Notes; Class Video Lecture

July 06, 2020
  • Topic: Support Vector Machine; Support Vector Machine Class Notes; Class Video Lecture
  • Readings: KKT and SVM

July 08, 2020
  • Topic: Kernel Method \ SVM; Kernel Method \ SVM Class Notes; Class Video Lecture
  • Assignment Due: AS4 Out - Updated (Out at July 10th); AS3 Due - Updated (Due at July 10th)

July 13, 2020
  • Topic: Neural Networks (Forward pass and Back propagation); Class Notes; Class Video Lecture
  • Readings: NN Playground; The role of a hidden layer; Back propagation numerical example; More detailed introduction

July 15, 2020
  • Topic: Neural Networks and Deep learning (CNN); Class Notes; Class Video Lecture
  • Assignment Due: All projects (GitHub links and 7-minute BlueJeans recorded link) should be submitted by the end of the day, 11:59 pm
  • Readings: CNN Live Demo; A guide to an efficient way to build CNN and optimize its hyper-parameters; Back Propagation in CNN; Transfer learning in CNN; Project Scoring Guidance; Project Page and Video

July 20, 2020
  • Topic: Course Review; Class Notes; Class Video Lecture
  • Assignment Due: AS4 Due (You are allowed to resubmit your AS4 by the end of the day July 26 without any penalty)

Office Hours and Questions

  • Office Hours:

    • Instructor: I will stay on BlueJeans after each class
    • Ruijia: Friday - 4:30 to 5:30 pm
    • Danrong: Tuesday - 3:30 to 4:30 pm
    • Kevin: Monday - 12:30 to 1:30 pm
    • Rodrigo: Wednesday - 3:30 to 4:30 pm
    • Moamen: Thursday - 3:30 to 4:30 pm
    • We will have one-on-one office hours starting next week. Please follow the instructions on this Google Sheet to sign up for a 10-minute slot in a BlueJeans video meeting. If you need more than 10 minutes, please tell the TA that you will wait, so they can return to your BlueJeans meeting once they are done with other students. You just need to add your name, your question, and your BlueJeans meeting link. Please do not change any other part of the Google Sheet, and please do not enter other students' BlueJeans meetings. Each meeting is only for the student who created the link and the relevant TA.

  • Piazza will be the main place for course discussions and announcements. If you have questions, please ask them on Piazza first because 1) other students may have the same question; 2) you will get help much faster.

  • If it is something you would rather not discuss publicly on Piazza, you can use private messaging on Piazza.

  • Whenever you send a private message just to me on Piazza, please also add our Head TA in case I miss your message.

Grading

  • Assignments (50%)

    • There will be four assignments. Each one is designed to test your understanding of the algorithms taught. Assignments will include programming and written analysis.
    • You will need to submit all your assignments as ipynb files. In an ipynb file, you can use the Markdown text editor. Here is a quick guide on how to use Markdown in ipynb.
    • You are required to use Markdown and LaTeX for the written questions.
    • All assignments follow the “no-late” policy. Assignments received after the due time will receive zero credit.
    • Assignments include some bonus questions for undergrad students. Grad students are required to answer these questions, and for them they do not count as bonus points.
    • All students are expected to follow the Georgia Tech Academic Honor Code.
    • You can easily export your Jupyter Notebook to a Python file and import it into your preferred Python IDE to debug your assignment code.
    • You are not allowed to share any assignment code or answers with other students under any circumstances. Piazza is the best place to discuss assignments. Discussions are meant to improve understanding of the questions and should not directly answer them.
  • Quizzes (10%)

    • There will be about 15 quizzes throughout the semester.
    • We will count your 10 best quiz scores. Each of those quizzes is worth 1% of your final grade.
    • When we finish a topic, a quiz will be given on Canvas. For example, when we are done with logistic regression, we will have a quiz.
    • Quizzes are time-sensitive. You need to study the course lectures in order to receive a good grade.
    • Quizzes measure your understanding of the topics, so the questions will be mostly conceptual.
  • Project Proposal (10%)

    • A project proposal should be a one-page PDF (fewer than 500 words, single-spaced).
    • A project proposal should include:
      • Introduction/Background
      • Methods
      • Potential results
      • Discussion
      • At least three references (preferably peer reviewed)
    • The proposal serves as a checkpoint to make sure you are working on a proper machine-learning-related project.
    • Your group needs to submit a presentation of your proposal. Please provide us with a public link to a 3-minute recorded video.
  • Project (25%)

    • You are expected to complete a project on machine learning with real-life data. Your project needs to be clear about 1) the data you are using; 2) the problem you are attempting to solve; 3) the method you are using; 4) the results and conclusion you attain.
    • You will need to turn in a GitHub page for your project. The project presentation and report must be combined into one deliverable using a GitHub page. For the project presentation, you just need to scroll down your GitHub page as you present your project (make sure your images and graphs are clearly visible).
    • Each project needs to be completed in a team of 5 people (you will form your team on your own; if you cannot find a team, we will randomly assign you to one). Team members need to clearly state their contributions in the project report.
    • Each recorded presentation cannot exceed 7 minutes. If your presentation takes more than 7 minutes, you will lose points. You need to submit a public link to your recorded video.
    • Refer to Project hints for your project's template, instructions for creating a GitHub page, and some general hints for improving the accuracy of your predictive model.
    • If you are on a team of grad students, you are required to have both unsupervised and supervised learning in your project.
    • Google Colaboratory allows free access to run your Jupyter Notebook. I strongly suggest using it for your project, especially for teams that are going to employ deep learning.
  • Class participation (5%)

    • Attendance is mandatory.
    • Your class participation score will be based on attendance and possibly in-class quizzes. For some lectures, I will take attendance using Canvas. BlueJeans also provides us with the list of students who attended the whole lecture.
    • Participation in class discussions (including asking relevant questions in class, volunteering to answer questions on Piazza) will be considered when determining your final grade. It will be especially useful when you are right on the edge of two letter grades.
  • Bonus points

    • About bonus points: Bonus points will always be counted in a way that benefits your final grade. That is, if for some reason I need to curve the grades, bonus points will be applied to your grade after curving, not before.
    • Undergrads and grads: Piazza provides statistics that measure how involved each student has been in Piazza activities such as viewing posts, answering questions, and asking questions. We will use these statistics not only for a minor part of the Class Participation score but also to award bonus points. Bonus points will go to students who correctly answer other students' questions. At the end of the semester, we will define a minimum and maximum level of involvement across all students, and based on those, some students will receive up to 3% in bonus points. It is possible to receive less than 3% depending on your Piazza activity. The other way to earn up to 3% in bonus points is to answer the challenging questions that may appear in some of the homework assignments (HWs).
    • Undergrads: As you all know, the HWs have bonus points. The number of bonus points differs between HWs. For example, HW 1 may have 30 bonus points, HW 2 may have 20 bonus points, and so on. If you receive all the bonus points on all your HWs, we will add 5% to your final grade. Note that these are different from the challenging questions, which are bonus for both grads and undergrads.
  • Course Policy

    • Regrade policy: Disputes of grading on assignments, exams, and project must be discussed within one week of their return or grade posting. Should you find yourself having an issue with a grade, contact the TA first. After you talk with your TA, if you are not satisfied you may contact the course instructor.
    • Assignment extension: Assignments are designed so that students have several weeks to finish them. Assignments will not be extended under any circumstances. You will receive 0 credit if you submit an assignment after the deadline.
    • Missed exam policy: There will be no makeups for missed exams. Any request for an exception to this policy should be made in advance whenever possible. Requests should be due to incapacitating illness or an emergency such as a death in the family or something similarly serious, and should be accompanied by supporting documentation submitted to the Dean of Students. Excuses such as not being aware of the exam will not be considered.
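
The category weights in the Grading list above can be combined as in the following Python sketch. It is only an illustration, not the official grade computation: the function names are hypothetical, all scores are assumed to be fractions between 0 and 1, the four assignments are assumed to be equally weighted, quizzes that were not taken are assumed to count as zero, and any curve is left out (bonus points are simply added at the end, since they are applied after curving).

```python
# Category weights, in percent of the final grade, from the Grading list.
WEIGHTS = {
    "assignments": 50.0,
    "quizzes": 10.0,
    "proposal": 10.0,
    "project": 25.0,
    "participation": 5.0,
}


def quiz_component(quiz_scores, keep=10):
    """Return the quiz score as a fraction in [0, 1].

    Only the `keep` best quizzes count; if fewer were taken, the
    missing ones count as zero (an assumption of this sketch).
    """
    best = sorted(quiz_scores, reverse=True)[:keep]
    return sum(best) / keep


def final_grade(assignments, quiz_scores, proposal, project,
                participation, bonus_percent=0.0):
    """Return the final grade in percent.

    All scores are fractions in [0, 1]. The assignments are assumed
    to be equally weighted. `bonus_percent` is added last, after any
    curve; the curve itself is not modeled here.
    """
    fractions = {
        "assignments": sum(assignments) / len(assignments),
        "quizzes": quiz_component(quiz_scores),
        "proposal": proposal,
        "project": project,
        "participation": participation,
    }
    base = sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)
    return base + bonus_percent


# Hypothetical student: 12 quizzes taken and a 3% Piazza bonus.
grade = final_grade(
    assignments=[0.9, 0.8, 1.0, 0.9],
    quiz_scores=[1.0] * 8 + [0.5] * 4,
    proposal=1.0,
    project=0.8,
    participation=1.0,
    bonus_percent=3.0,
)
print(round(grade, 2))
```

Because only the 10 best quizzes are kept, the two lowest of the 12 quiz scores in this example have no effect on the result.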

Dataset Ideas (may need API, or scraping) Thanks to Polo

Office of Disability Services

The Georgia Institute of Technology has policies regarding disability accommodation, which are administered through the Office of Disability Services: http://disabilityservices.gatech.edu. Students with disabilities should contact this office to request classroom accommodations.

Resources

Recommended books:

Other resources, such as machine learning toolboxes and datasets, will be provided throughout the course.
