CX4240: Introduction to Computational Data Analysis (2019 Summer)
Course Information
- Lecture time: Tuesdays and Thursdays, 2:35pm-4:25pm
- Location: Klaus 2443
- Piazza: https://piazza.com/class/jv16ic5pnsm1f2
Course Overview
This course introduces techniques for computational data analysis, with an emphasis on machine learning algorithms and their applications to real-world data. We will investigate the following question: how can we computationally extract useful knowledge from data for decision making and task support? We will focus on machine learning methods for computational data analysis, which are organized into three parts:
Basic math for data science and machine learning
- Linear algebra
- Probability and statistics
- Information theory
Unsupervised machine learning for data exploration
- Clustering analysis
- Dimension reduction
- Kernel density estimation
Supervised learning for predictive data analysis
- Tree-based models
- Linear classification and regression
- Neural networks
Prerequisites for this course include 1) basic knowledge of probability, statistics, and linear algebra; 2) basic programming experience in Python.
Schedule
Office Hours and Questions
Office Hours:
- Instructor: Thursdays 1:30-2:20pm (Klaus 1323)
- David Office Hour: Mondays 02:00-03:00pm
- Wendi Office Hour: Tuesdays 10:30-11:20am
- Aradhya Office Hour: Wednesdays 10:30-11:20am
Piazza will be the main place for course discussions and announcements. If you have questions, please ask them on Piazza first because 1) other students may have the same question; 2) you will get help faster than by sending emails.
If it’s something you would prefer not to discuss publicly on Piazza, you can use private messaging in Piazza.
Grading
Assignments (50%)
- There will be four assignments. Each one is designed to test your understanding of the algorithms taught in class. Assignments will include both programming and written analysis.
- You will need to submit all your assignments as ipynb files. In ipynb, you can write text in markdown cells. Here is a quick guideline on how to use markdown in ipynb.
- All assignments follow the “no-late” policy. Assignments received after the due time will receive zero credit.
- All students are expected to follow the Georgia Tech Academic Honor Code.
Project Proposal (5%)
- A project proposal should be just one page
- A project proposal should include:
- Introduction/Background
- Methods
- Potential results
- Discussion
- At least three referenced papers
- The proposal serves as a checkpoint to make sure you are working on a proper machine learning related project.
Project (20%)
- You are expected to complete a project on computational data analysis with real-life data. Your project needs to be clear about 1) the data you are using; 2) the problem you are attempting to solve; 3) the method you are using; 4) the results and conclusion you attain.
- You will need to turn in a GitHub page for your project. The project report and the presentation can be combined into one deliverable using this GitHub page. For the project presentation, you just need to scroll down on your GitHub page.
- Each project needs to be completed in a team of 2-4 people. Team members need to clearly claim their contributions in the project report.
- Each presentation cannot exceed N/A minutes. If your presentation takes more than N/A minutes, you will be asked to stop at the N/A-minute mark. There will be N/A minutes for Q&A.
- In addition to the TAs, three or more guest professors and PhD students will grade your presentations.
- Refer to Project hints for your project's template, creating GitHub page, and also some general hints to improve the accuracy of your predictive model.
Class participation (5%)
- Your class participation score will be based on attendance and possibly in-class quizzes.
- Participation in class discussions (including asking relevant questions in class and volunteering to answer questions on Piazza) will be considered when determining your final grade. It will be especially useful when you are right on the boundary between two letter grades.
Final Exam (20%)
- The final exam will be held at the assigned date/time for this class.
- The final exam will be a written, open-book exam. No electronic material can be used except a calculator. Only paper material can be used in the exam (books, printed notes, etc.). It would be better if you prepare a one- or two-page cheatsheet for yourself (let's save some trees).
- Again, there will be no make-up exams. You will get zero credit for your missed final exam.
Resources
Recommended books:
- Learning from Data, by Yaser S. Abu-Mostafa
- Pattern Recognition and Machine Learning, by Christopher Bishop
- Machine Learning, by Tom Mitchell
- Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, and Jian Pei
- The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
- Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Other resources, such as machine learning toolboxes and datasets, will be provided throughout the course.