Machine Learning for Predictive Data Analytics

Course: ELE364

Instructor: Dr. Niraj Jha

Fall 2019

Description of Course Goals and Curriculum

The goal of the course is to provide an introduction to machine learning for predictive data analytics. The topics discussed in the course include data exploration (data quality, sampling, normalization, etc.), information-based learning (decision trees, feature selection, boosting and bagging, etc.), similarity-based learning (nearest neighbor algorithm, feature space, etc.), probability-based learning (Bayesian prediction, Bayesian networks, etc.), error-based learning (linear regression, SVMs, etc.), and evaluation (measuring accuracy, misclassification rate, etc.). The course balances theory and application: many of the mathematical concepts discussed in class are later implemented in programming projects (using Jupyter Notebook and Python-based libraries). Overall, the course is a broad introduction to many machine learning concepts used in predictive data analytics.

Learning From Classroom Instruction

The classroom instruction for this course consists only of lectures (twice a week for 80 minutes). The lectures are effectively mandatory, since the weekly quizzes are administered at the start of lecture. The lectures are based on the lecture slides, which closely follow the textbook. Even though the lectures may seem repetitive, it is important to pay attention in class because Professor Jha occasionally covers material (proofs and other extensions) that is not included in the lecture notes and slides but will appear on the exams. Students are encouraged, but not required, to participate during lectures. Students should take detailed notes during class, especially on material not covered in the slides, since this will help with exam questions and greatly reduce the time it takes to complete the homework assignments.

Learning For and From Assignments

The weekly quizzes make up 10% of the final grade. Each quiz is usually a single problem meant to be completed in 15 minutes. The quizzes are open book, but students should at least skim the text before each one. The quizzes vary in difficulty, but most of them are closely based on the reading. The quizzes at the beginning of the semester are more conceptual, while the ones at the end are more mathematical.

The midterm makes up 20% of the final grade. The Fall 2019 midterm was probably more challenging than in previous years. Many of the questions were proof-based and had not been covered in class, so the class average was quite low. (However, I have heard that in previous semesters the midterm was much easier and more straightforward.) A background in mathematical proofs is helpful. Additionally, some exam questions tested small details covered only once in class, so it is important to take good notes.

The homework makes up 20% of the final grade. The homework is generally straightforward, and most answers can be found in the textbook. The homework is often more application-based than theoretical. For example, an assignment may ask you to work through a few iterations of an algorithm by hand. Students should work with peers, since it is easy to make small errors in the sometimes tedious calculations.

The small course projects make up 20% of the final grade. These are very basic programming assignments done in Jupyter Notebook using Python-based libraries (mostly scikit-learn). Experienced coders can finish each assignment in less than 10 minutes, but students less familiar with coding may need longer. Watch out for small tricks in the programming assignments, though, and read the instructions carefully!

The final exam makes up 30% of the final grade. The final was more straightforward than the midterm and consisted of a combination of theory-based and application-based questions. This exam did not have proof questions. I would recommend reading the textbook very carefully and memorizing as many details as possible, since some of the tested concepts were based on small details in the text.

External Resources

I would recommend attending office hours for help with the homework or the programming assignments. Additionally, since many of the topics discussed in this course are fundamental machine learning concepts, it is easy to find online tutorials and videos. For the programming assignments in particular, you will need to refer to the scikit-learn website for information on how to use its packages.

What Students Should Know About This Course For Purposes Of Course Selection

I would recommend this course to anyone who is interested in the basics of machine learning. The prerequisite for the course is a “background in linear algebra, probability, statistics, and differential/integral calculus.” Students with a more quantitative background have an advantage, but there were students from non-STEM majors taking the course as well. This course relates well to the COS and ORF departments, and counts as an elective for the SML certificate. If you are interested in machine learning but don’t have much experience or knowledge of its fundamentals, I would recommend taking this course before moving on to more advanced courses. However, if you are already familiar with all the topics listed on the syllabus, I would suggest taking other machine learning courses in the COS department, as you won’t gain much from this course and will likely find it too elementary.