CX4240: Introduction to Computational Data Analysis (2019 Spring)

Course Information

Course Overview

This course introduces techniques for computational data analysis, with an emphasis on machine learning algorithms and their applications to real-world data. We will investigate the following question: how can useful knowledge be extracted from data computationally to support decision making and other tasks? The machine learning methods we cover are organized into three parts:

  1. Basic math for data science and machine learning

    • Linear algebra
    • Probability and statistics
    • Information theory
  2. Unsupervised machine learning for data exploration

    • Clustering analysis
    • Dimension reduction
    • Kernel density estimation
  3. Supervised learning for predictive data analysis

    • Tree-based models
    • Linear classification and regression
    • Neural networks

Prerequisites for this course include 1) basic knowledge of probability, statistics, and linear algebra; 2) basic programming experience, preferably in Python.

Schedule

Date Topic Assignment Due Readings
1/7/19 Course Overview Piazza Signup GT Honor Code
1/9/19 Math Basics: Linear Algebra Linear Algebra Review by Zico Kolter
1/14/19 Math Basics: Probability and Statistics Probability Theory Review by Andrew Moore
1/16/19 Math Basics: Information Theory AS1 Out Visual Information Theory by Chris Olah
1/21/19 No Class (Martin Luther King Day)
1/23/19 Data Analysis Toolbox NumPy Tutorial; Matplotlib Tutorial
1/28/19 Clustering Analysis and K-Means AS1 Due
1/30/19 Hierarchical Clustering
2/4/19 Density-Based Clustering Start working on the Project Proposal and finding team members Jupyter Notebook (K-Means and DBSCAN); The Heilmeier Catechism; Project Examples; seaborn: statistical data visualization
2/6/19 Gaussian Mixture Model AS2 Out
2/11/19 Evaluation of Clustering Algorithms
2/20/19 Midterm Review AS2 Due (extended by two days); Project Proposal Due
2/25/19 Midterm Exam
2/27/19 Density Estimation Python Notebook Example
3/4/19 Dimension Reduction Feature extraction using PCA; PCA for images
3/6/19 Linear Regression
3/11/19 Linear Regression AS3 Out
3/13/19 Naïve Bayes and Logistic Regression Project hints
3/18/19 No Class (Spring Break)
3/20/19 No Class (Spring Break)
3/25/19 Neural Networks (Guest Lecture)
3/27/19 Neural Networks (Forward pass and Back propagation) AS3 Due
4/1/19 Support Vector Machine
4/3/19 Support Vector Machine AS4 Out
4/8/19 Decision Tree and Random Forest
4/10/19 Decision Tree and Random Forest
4/15/19 Project Presentation
4/17/19 Project Presentation AS4 Due
4/22/19 Course Review
4/24/19 Reading Day Report Due
TBD Final Exam

Office Hours and Questions

  • Office Hours:

    • Instructor: Weds 3:30-4:20pm
    • TA Office Hour I: Mons 3:30-4:20pm
    • TA Office Hour II: Thurs 10:30-11:20am
  • Piazza will be the main place for course discussions and announcements. If you have a question, please ask it on Piazza first because 1) other students may have the same question; 2) you will get help faster than by sending email.

  • If it is something you would rather not discuss publicly on Piazza, send an email with CX4240 in the subject.

Grading

  • Assignments (50%)

    • There will be four assignments. Each one is designed to test your understanding of the algorithms taught in class and may be either programming or written analysis.
    • You will need to hand in the assignments at the beginning of the class on the due date.
    • All assignments follow the “no-late” policy. Assignments received after the due time will receive zero credit.
    • All students are expected to follow the Georgia Tech Academic Honor Code.
  • Project Proposal (5%)

    • A project proposal should be just one page.
    • A project proposal should include 1) Introduction/Background; 2) Methods; 3) Potential results; 4) Discussion; 5) At least three referenced papers.
    • The proposal serves as a checkpoint to make sure you are working on a suitable, machine learning related project.
  • Project (20%)

    • You are expected to complete a project on computational data analysis with real-life data. Your project needs to be clear about 1) the data you are using; 2) the problem you are attempting to solve; 3) the method you are using; 4) the results and conclusion you attain.
    • You will need to turn in a project report and also give an in-class presentation for your project. The project report and the presentation will each count for 10% of your final grade. The project presentation and report can be combined into one deliverable using a GitHub page.
    • Each project needs to be completed in a team of 2-4 people. Team members need to clearly claim their contributions in the project report.
    • Presentations cannot exceed 5 minutes. If your presentation runs over, you will be asked to stop at the 5-minute mark. There will be 1 minute for Q&A.
    • Three or more guest professors and PhD students, in addition to the TAs, will grade your presentations.
    • Refer to Project hints for your project's template and for hints on improving the accuracy of your predictive model.
  • Class participation (5%)

    • Your class participation score will be graded based on attendance and in-class quizzes.
    • Participation in class discussions (including asking relevant questions in class, volunteering to answer questions on Piazza) will be considered when determining your final grade. It will be especially useful when you are right on the edge of two letter grades.
  • Midterm Exam (10%)

    • The midterm exam will take place on Feb 25th in lieu of the regular class.
    • The midterm exam will be a written, open-book exam. Only paper materials (books, printed notes, etc.) may be used; no electronic devices are allowed except a calculator. We recommend preparing a one- or two-page cheat sheet for yourself.
    • There will be no make-up exams. You will get zero credit for a missed midterm exam.
  • Final Exam (10%)

    • The final exam will be at whatever time is scheduled for this class.
    • The final exam will be a written, open-book exam. Only paper materials (books, printed notes, etc.) may be used; no electronic devices are allowed except a calculator. We recommend preparing a one- or two-page cheat sheet for yourself.
    • Again, there will be no make-up exams. You will get zero credit for a missed final exam.
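
As a rough illustration of how the weights above combine, the following Python sketch computes a weighted course score. It assumes every component is scored on a 0-100 scale and that the four assignments are averaged into a single assignments score; neither detail is stated in this syllabus. The names `WEIGHTS` and `weighted_score` are hypothetical, and the sketch does not map scores to letter grades.

```python
# Weights copied from the Grading section. The project's 20% is split into
# report (10%) and presentation (10%), as described above.
WEIGHTS = {
    "assignments": 0.50,
    "project_proposal": 0.05,
    "project_report": 0.10,
    "project_presentation": 0.10,
    "participation": 0.05,
    "midterm": 0.10,
    "final": 0.10,
}


def weighted_score(scores):
    """Return the weighted course score.

    `scores` maps each component name in WEIGHTS to a score on an assumed
    0-100 scale. Assumption: the four assignments are averaged into one
    "assignments" score before calling this function.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for a hypothetical student.
example_scores = {
    "assignments": 90,
    "project_proposal": 100,
    "project_report": 85,
    "project_presentation": 80,
    "participation": 100,
    "midterm": 70,
    "final": 75,
}

if __name__ == "__main__":
    print(round(weighted_score(example_scores), 2))
```

Because assignments carry half of the total weight, a missed assignment (zero credit under the no-late policy) lowers this score far more than a comparable loss on any other single component.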

Resources

Recommended books:

Other resources, such as machine learning toolboxes and datasets, will be provided throughout the course.