Course: SML 201

Instructor: Daisy Huang

F 2017

### Description of Course Goals and Curriculum

This course covers the basics of how to conduct data analysis in R. Topics include data visualization, introductory statistics, confidence intervals, hypothesis testing, regression, and cross-validation. The course has no specific prerequisites, but some programming experience helps, since all assignments and projects are done in R. The content suits anyone interested in data-driven work in any discipline, including students in engineering, the natural sciences, the social sciences, and the humanities, and the problem sets and projects use data from several different fields. The course can be challenging, especially for students with no statistics or programming background. However, help is available at office hours, and the instructors usually go over the important R concepts during lecture and precept.

### Learning From Classroom Instruction

Lectures were 1 hour and 20 minutes long and met twice a week, with one 50-minute precept per week. Attending both is important: lectures sometimes had graded iClicker quizzes, and precept attendance counted toward the course grade. Lectures generally covered broad statistical concepts, with demonstrations and examples in R. Precepts went into more depth with specific R examples that often related to the assignments, and they were more flexible, since students could sometimes choose which problems to go over. To supplement lectures and precepts, readings were posted on Blackboard, some mandatory and some optional. These readings reinforced the concepts presented in lecture and overlapped considerably with material covered in other introductory statistics classes. I found it very helpful to work through some of the precept exercises on my own time, especially since they broke larger questions into smaller ones and came with solutions I could refer to.

### Learning For and From Assignments

The course has 4 assignments and 3 projects. All of these can be done in groups of up to three people, but students cannot work with the same group on every project. A typical assignment or project consists of 3-4 multi-part questions involving the analysis and visualization of a real dataset. As per the collaboration policy, students must restrict collaboration to their own group or partnership. The main difference between projects and assignments is that the projects are treated like take-home exams: little help is offered for them beyond broad conceptual guidance. Otherwise, projects are fairly similar to assignments. The class has no exams, since the projects essentially take their place. When working in a group, a good learning strategy is to make sure you can do each part on your own before collaborating, which reinforces your individual understanding of the concepts. It is also good practice to try debugging your own code before asking for help at office hours, so that you engage more with the material and better understand what your code is doing.

### External Resources

The lecture and precept notes are always posted on Blackboard and are a great resource for assignments and projects, since they show examples of R code and the resulting output. The lecturer and preceptors are also very accessible in office hours if students need help understanding concepts or working through assignments. The course also has a Piazza page where students can ask and answer general questions about assignments; questions about projects generally cannot be answered unless they are very general.

### What Students Should Know About This Course For Purposes Of Course Selection

This course is a requirement for some majors and counts toward the Statistics and Machine Learning certificate. The weekly time commitment varies with a student's experience with similar concepts, but the course is doable for students from any academic background. There are no exams, just assignments and projects. Lecture is essentially mandatory because of the iClicker quizzes, but a few quizzes are dropped to accommodate unavoidable conflicts with lecture. For a student who has taken another introductory statistics class, such as ORF 245, some of the concepts may be repetitive, but the course also introduces new concepts, which are emphasized toward the end. In addition, the class places a heavy emphasis on programming through its assignments.

Introduction to Data Science