In an increasingly data-rich and data-driven world, graduate students need to know how to make decisions grounded in data science foundations. The Purdue College of Science, in a joint effort by the departments of Computer Science, Statistics, and Mathematics, is currently developing a core of seven five-week, online, 1-credit modules covering foundational data science topics. Additionally, a 1-credit Ethics of Data Science module, developed and offered by the Department of Philosophy, will complete the 8-credit program. We refer to the 8 modules as *connector modules* because they teach material from core disciplines that is essential preparation for students to pursue domain-specific data science research and to enroll in data science graduate courses.

Data Science uses quantitative and analytical methods to gain insights and make predictions from big data. With vast amounts of data being generated in every domain, job market demand for data scientists is growing fast. The Data Science Connector modules are designed to connect your training to Data Science, offering the foundations needed to jumpstart a career as a Data Scientist in your field of expertise. All 8 courses can be completed within 10 weeks. Alternatively, choose the specific courses that meet your training needs.

**Intended audience:** Purdue graduate students with an undergraduate degree in a STEM field or with business and programming experience.

**Intended outcome:** After successful completion of a combination of connector modules, students will be prepared to engage in data-science-oriented research within their domain. They may also be able to pursue additional graduate-level data science courses.

## Courses

### June 3 - July 5, 2019

#### STAT 59800 PS: Probability and Statistics

Lectures by Dr. Leonore Findsen

**Topics:** Intro to probability and statistics; random variables and distributions; exploratory data analysis and statistical inference; R.

**Prerequisite:** MA 261 Multivariate Calculus

#### MA 59800: Linear Algebra for Data Science

Lectures by Professor Jim McClure

**Topics**: Fundamentals of linear algebra for data science and data science applications. Includes matrix operations, eigenvalues and diagonalization, orthogonality and least squares.

**Prerequisite:** Basic knowledge of vectors (e.g. MA 16200 or MA 16600 Calculus II)

#### CS 59000 FCS: Foundations of Computer Science

**Instructor:** Ruby Tahboub, rtahboub@purdue.edu

**Topics:** Basic logic and proof methods; recursion and induction. Introduction to data structures. Sorting, searching, basic graph algorithms. Algorithm design and analysis techniques.

**Prerequisite:** Familiarity with Calculus at the level of MA 161 or MA 165 or equivalent

Lectures by Professor Ananth Grama

#### CS 59000 DEI: Data Engineering I

**Instructor:** Tony Bergstrom, bgstm@purdue.edu

**Topics:** Basic data manipulation; review of Python; introduction to Unix scripts; data cleaning; dealing with missing data; summarizing data.

**Prerequisites:**

- Programming experience in Python equivalent to material covered in CS 177, CS 501 or MGMT 58600.
- Familiarity with Calculus at the level of MA 161 or MA 165 or equivalent.
- MA 59800 Linear Algebra for Data Science or familiarity with Linear Algebra, in particular exposure to matrix multiplication and Gaussian elimination as covered in MA 262 or equivalent.

Lectures by Professor Jennifer Neville

### July 8 - August 9, 2019

#### CS 59000 DEII: Data Engineering II

**Instructor:** Tony Bergstrom, bgstm@purdue.edu

**Topics:** Relational databases; SQL; introduction to NoSQL systems; introduction to cloud computing; data security and privacy; access control; indexing.

**Prerequisites:**

- CS 59000 FCS: Foundations of Computer Science (see above)
- CS 59000 DEI: Data Engineering I (see above)
- Programming experience in an object-oriented language (Java, C++) and basic understanding of common data structures. Equivalent to material covered in CS 180.

Lectures by Professor Sunil Prabhakar

#### CS 59000 FDM: Foundations of Decision Making

**Instructor:** Dr. Romila Pradhan, rpradhan@purdue.edu

**Topics:** Sampling and reproducibility. Hypothesis testing (A/B testing), multiple hypothesis testing, data visualization, fairness and data biases.

**Prerequisites:**

- STAT 59800 PS: Probability and Statistics (see above)
- MA 59800: Linear Algebra for Data Science (see above)
- CS 59000 FCS: Foundations of Computer Science (see above)
- CS 59000 DEI: Data Engineering I (see above)

Lectures by Professor Jennifer Neville

#### CS 59000 NCDS: Numerical Computing for Data Science

**Instructor:** David Gleich, dgleich@purdue.edu

**Topics:** Numerical modeling of data, applications and methods of linear systems and eigenvalues on networks, massive matrix methods for data analysis including singular value decomposition, principal components, and regression; numerical optimization including linear programming and data.

**Prerequisites:**

- STAT 59800 PS: Probability and Statistics (see above)
- MA 59800: Linear Algebra for Data Science (see above)
- Programming experience in an object-oriented language (Java, C++) and basic understanding of common data structures. Equivalent to material covered in CS 180.

Lectures by Professor David Gleich

#### PHIL 29300 DL: Ethics for Data Science

Lectures by Professor Taylor Davis

**Topics:** Understanding of ethical questions and responsibilities. Focused on identifying and isolating ethical problems and issues likely to arise in work and professional environments.

The course first introduces a conceptual framework for understanding professional and ethical responsibility, then focuses on applying this framework through repeated practice with case studies. The capstone of the course is an original case study analysis focusing on a case from each student's own area of research.

There are no prerequisites.