
Summary

Below is the syllabus for the Spring 2021 offering of the course. Fall 2021 will be similar, though the class will be offered in person only.

INFO 2950 is an intro-level information science course on the foundations of data science. It covers topics including the standard Python data science stack, univariate and multivariate statistical analysis of small and medium-size datasets, regression methods, hypothesis testing, probability models, basic supervised and unsupervised machine learning, data visualization, and network analysis. Students who complete the course will be able to produce meaningful, data-driven analyses of real-world problems and will be prepared to begin more advanced work in data-intensive domains.

Texts

There is no required textbook for this class, though individual readings may be assigned throughout the semester. If you want additional information about course topics, we recommend the following books and resources:
• General introduction to Python: John Guttag, Introduction to Computation and Programming in Python
• General introduction to statistics: Allen B. Downey, Think Stats
• Principles of data science: Joel Grus, Data Science from Scratch
• Python data science stack: Jake VanderPlas, Python Data Science Handbook
• Ethical issues in data science: Princeton Dialogues on AI and Ethics (case studies)

Schedule

Week 1 (beginning Monday, Feb 8)
Intro and setup. HW 0 released.
• Reading: If you aren't familiar with Jupyter notebooks and JupyterLab, review the JupyterLab documentation before section on Friday.
• Monday (2/8): Intro
• Wednesday (2/10): More intro, advice, notebooks, data types

Week 2 (Feb 15)
Dataframes and Pandas. HW 0 due (2/18), project phase 0 due (2/17), HW 1 released.
• Reading: 10 minutes to Pandas
• Monday (2/15): Toward Pandas
• Wednesday (2/17): Pandas

Week 3 (Feb 22)
Summary statistics, grouping, basic visualization. HW 1 due, project phase I due, HW 2 released.
• Monday (2/22): COVID data case study
• Wednesday (2/24): Projects, COVID II, Avocados

Week 4 (March 1)
Correlation and covariance, transformations, joining data. HW 2 due, HW 3 released.
• Reading: Downey, chapters 2 (distributions) and 7 (relationships between variables). See "Texts" section above for link. Note: Read for the statistical concepts, not Downey's code. We will use standard Pandas and NumPy functions for our work.
• Monday (3/1): Covariance and correlation
• Wednesday (3/3): Correlation (continued) and bias

Week 5 (March 8)
Distance, similarity, clustering. No class on Wednesday. HW 3 due (Saturday, 3/13, at 11:59, due to wellness days), no new HW released (project work during section).
• Reading (optional): Python Data Science Handbook on k-means clustering
• Monday (3/8): Bias (continued), clustering
• Wednesday (3/10): No class (wellness day)

Week 6 (March 15)
Linear regression. HW 4 released, project phase II due.
• Monday (3/15): Clustering wrap-up, intro to linear regression
• Wednesday (3/17): Linear regression II

Week 7 (March 22)
Multiple regression. HW 4 due, HW 5 released, project phase II peer review due.
• Monday (3/22): Model evaluation, multiple linear regression
• Wednesday (3/24): Binary inputs, collinearity, logistic regression

Week 8 (March 29)
Hypothesis testing. HW 5 due, HW 6 released.
• Monday (3/29): Logistic regression
• Wednesday (3/31): Model evaluation
• Friday (4/2): Supplemental lecture videos to watch before Monday, 4/5 [code]:
  o Classification reports and why we model data
  o Permutation and p-values
  o Bootstrap resampling and confidence intervals
  o (Optional) How to read documentation

Week 9 (April 5)
Probabilistic models and simulation. HW 6 due, HW 7 released.
• Monday (4/5): Probabilistic models
• Wednesday (4/7): Model selection, Bayes' rule

Week 10 (April 12)
Dimension reduction, matrix decomposition. HW 7 due, project work during section, project phase III due, no new HW released.
• Monday (4/12): Hypothesis tests and distributions
• Wednesday (4/14): Dimension reduction and matrix factorization

Week 11 (April 19)
Supervised learning and text as data. No section on Friday (wellness day), no new homework for the remainder of the semester.
• Monday (4/19): Conclude dimension reduction and matrix factorization
• Wednesday (4/21): Text as data, Bayesian classifiers

Week 12 (April 26)
Text as data. No lecture on Monday (wellness day). Project phase IV due. Project and review work in section.
• Wednesday (4/28): Text as data II

Week 13 (May 3)
Networks. Project phase IV peer review due. Project work in section.
• Monday (5/3): Networks I
• Wednesday (5/5): Networks II

Week 14 (May 10)
Wrap-up. Project phase V (final submission) due. No section on Friday. Course concludes (no final exam).

Policies

Harassment and respect

All students are entitled to respect from course staff and from their fellow students. All staff are entitled to respect from students and from fellow staff members. Violations of this principle, whether large or small, will not be tolerated.

Respect means that your ideas are taken seriously, that you feel welcome in class settings (including in study groups and online fora), and that you are treated as a full, co-equal member of the class. Harassment describes any action, intentional or otherwise, that abridges the respect owed to every member of the class.

If you experience harassment in any form, or if you would like to discuss your experience in the class, please see me in office hours or contact me by email. The university also has reporting and counseling resources available, including those for sexual harassment and for other bias incidents.

Academic integrity

Each student in this course is expected to abide by the Cornell University Code of Academic Integrity.
Any work submitted by a student in this course for academic credit will be the student's own work unless specifically and explicitly permitted otherwise. Using other people's code is an important part of programming but, for group projects, the code should be substantially the work of the group members (except for standard libraries). Any code used in projects that was not written by the group members should be placed in separate files and clearly labeled with their source URLs. If you have benefitted from online resources such as StackOverflow,

