CX4240: Introduction to Computational Data Analysis (2019 Spring)
Course Information
- Lecture time: Mons and Weds, 4:30pm-5:45pm
- Location: Klaus 1447
- Instructor: Mahdi Roozbahani
- Teaching Assistants: Hanjun Dai (hanjundai@gatech.edu) and Wendi Ren (wren44@gatech.edu)
- Piazza: https://piazza.com/class/jqeo3f7s5vc426
Course Overview
This course introduces techniques for computational data analysis, with an emphasis on machine learning algorithms and their applications to real-world data. We will investigate the following question: how can useful knowledge be extracted from data computationally for decision making and task support? We will focus on machine learning methods for computational data analysis, which are organized into three parts:
Basic math for data science and machine learning
- Linear algebra
- Probability and statistics
- Information theory
Unsupervised machine learning for data exploration
- Clustering analysis
- Dimension reduction
- Kernel density estimation
Supervised learning for predictive data analysis
- Tree-based models
- Linear classification and regression
- Neural networks
Prerequisites for this course include 1) basic knowledge of probability, statistics, and linear algebra; 2) basic programming experience, preferably in Python.
Schedule
Office Hours and Questions
Office Hours:
- Instructor: Weds 3:30-4:20pm
- TA Office Hour I: Mons 3:30-4:20pm
- TA Office Hour II: Thurs 10:30-11:20am
Piazza will be the main place for course discussions and announcements. If you have questions, please ask them on Piazza first because 1) other students may have the same question; 2) you will get help faster than by sending emails.
If it is something you would rather not discuss publicly on Piazza, send an email with CX4240 in the subject line.
Grading
Assignments (50%)
- There will be four assignments. Each one is designed to test your understanding of the algorithms taught in class. An assignment may be either programming or written analysis.
- You will need to hand in the assignments at the beginning of the class on the due date.
- All assignments follow the “no-late” policy. Assignments received after the due time will receive zero credit.
- All students are expected to follow the Georgia Tech Academic Honor Code.
Project Proposal (5%)
- A project proposal should be just one page.
- A project proposal should include 1) Introduction/Background; 2) Methods; 3) Potential results; 4) Discussion; 5) At least three referenced papers.
- The proposal is a checkpoint to make sure you are working on a suitable, machine-learning-related project.
Project (20%)
- You are expected to complete a project on computational data analysis with real-life data. Your project needs to be clear about 1) the data you are using; 2) the problem you are attempting to solve; 3) the method you are using; 4) the results and conclusion you attain.
- You will need to turn in a project report and also give an in-class presentation for your project. The project report and the presentation will each count for 10% of your final grade. The project presentation and report can be combined into one deliverable using a GitHub page.
- Each project needs to be completed in a team of 2-4 people. Team members need to clearly state their individual contributions in the project report.
- Each presentation is limited to 5 minutes. If your presentation runs longer, you will be asked to stop at the 5-minute mark. There will be 1 minute for Q&A.
- In addition to the TAs, three or more guest professors and PhD students will grade your presentations.
- Refer to Project hints for the project template and for tips on improving the accuracy of your predictive model.
Class participation (5%)
- Your class participation score will be based on attendance and in-class quizzes.
- Participation in class discussions (including asking relevant questions in class and volunteering to answer questions on Piazza) will be considered when determining your final grade. It will be especially useful when you are right on the border between two letter grades.
Midterm Exam (10%)
- The midterm exam will take place on Feb 25th in lieu of the regular class.
- The midterm exam will be a written, open-book exam. No electronic material can be used except a calculator. Only paper material can be used in the exam (books, printed notes, etc.). We recommend preparing a one- or two-page cheat sheet for yourself.
- There will be no make-up exams. You will get zero credit for your missed midterm exam.
Final Exam (10%)
- The final exam will be held at the final exam time scheduled for this class.
- The final exam will be a written, open-book exam. No electronic material can be used except a calculator. Only paper material can be used in the exam (books, printed notes, etc.). We recommend preparing a one- or two-page cheat sheet for yourself.
- Again, there will be no make-up exams. You will get zero credit for your missed final exam.
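As a rough illustration of how the weights above combine, here is a minimal Python sketch. The weights are the ones listed in this Grading section; the function name, the dictionary keys, and the example scores are hypothetical, and each component is assumed to be scored on a 0-100 scale. The syllabus does not specify how the combined score maps to a letter grade, so that step is left out.

```python
# Weights in percent, taken from the Grading section of this syllabus.
# The project weight (20) covers the report (10) and the presentation (10).
WEIGHTS = {
    "assignments": 50,
    "project_proposal": 5,
    "project": 20,
    "participation": 5,
    "midterm": 10,
    "final": 10,
}


def weighted_score(scores):
    """Combine per-component scores (assumed 0-100 each) into one 0-100 score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weights are whole percentages, so divide by 100 once at the end.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "assignments": 90,
    "project_proposal": 100,
    "project": 85,
    "participation": 100,
    "midterm": 80,
    "final": 70,
}

print(weighted_score(example))  # prints 87.0
```

Because assignments carry half of the total weight, a missed assignment (zero credit under the no-late policy) lowers the combined score more than a comparable slip in any other single component.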
Resources
Recommended books:
- Learning from Data, by Yaser S. Abu-Mostafa
- Machine Learning, by Tom Mitchell
- Pattern Recognition and Machine Learning, by Christopher Bishop
- Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, and Jian Pei
- The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
- Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Other resources, such as machine learning toolboxes and datasets, will be provided throughout the course.