The Data Science Option (DSO) is a set of additional requirements for students interested in data science. If you complete it, your degree title will be Doctor of Philosophy (Chemistry: Data Science). The goal of this option is to educate students in the foundations of data science so that they can apply those methods and techniques in current research. The Chemistry DSO is designed for students with little or no background in data science, computer science, or coding.
Curriculum
The requirements for the Chemistry DSO are as follows:
1) All students must take the following three classes, which are scheduled to be offered in an autumn/winter/spring sequence:
- CHEM 541/MSE 542 Data Science and Materials Informatics (3 credits, autumn)
- CHEM 542/MSE 543 Materials and Device Modeling (3 credits, winter) OR CHEM 565 – Computational Chemistry (3 credits, winter)
- CHEM 543/MSE 544 Big Data for Materials Science (3 credits, spring)
2) To broaden their exposure to a diverse range of research topics at the nexus of Chemistry and Data Science, students must register for and attend at least three quarters (academic terms) in total of the following interdisciplinary and community seminars:
- MOLENG 520/CHEM 597 Molecular Engineering Institute Seminar
- MOLENG 599A Seminar in Clean Energy (Clean Energy Institute Seminar)
- CHEME 599 Current Topics in Data Science (eScience Community Seminar)
Students may choose any combination of the three seminars. Seminar requirements from other training programs may also count toward this requirement; for example, Clean Energy Institute (CEI) graduate fellows, who are required to take the CEI seminar, may count it here.
TOTAL CREDITS
The total credit requirement for the Chem-DSO is 9 credits in graded coursework (3 courses with 3 credits each). The current Ph.D. requirements in Chemistry include 18 credits of graded coursework. Depending on the specific sub-division requirements, these 18 credits consist of a 9- or 12-credit core course sequence (3 or 4 classes) plus 9 or 6 credits, respectively, "for breadth" from a variety of options.
Courses taken for the Chem-DSO are in addition to the 18-credit requirement for the Chemistry Graduate Ph.D. Program (see Table 1 below).
Full requirements to earn the Doctor of Philosophy (Chemistry: Data Science)
Research progress requirements
- 2nd-year qualifying examination
- General examination
- Ph.D. dissertation (written thesis and oral defense)
90 credits of coursework (minimum)
- Required coursework (18 credits): including 9-12 credit core course sequence and 6-9 credits from breadth options
- Data Science coursework (9 credits)
- CHEM 541/MSE 542
- CHEM 542/MSE 543 OR CHEM 565
- CHEM 543/MSE 544
- Alternative courses available subject to approval by advisor
- Department of Chemistry general seminar (1 credit/quarter, max 18, no total credit requirement)
- CHEM 590 Seminar in General Chemistry
- Department of Chemistry divisional seminars (1 credit/quarter, max 18, no total credit requirement)
- CHEM 591 Seminar in Inorganic Chemistry
- CHEM 592 Seminar in Analytical Chemistry
- CHEM 593 Seminar in Organic Chemistry
- CHEM 595 Seminar in Physical Chemistry
- Department of Chemistry current research seminars (1 credit/quarter, max 18, no total credit requirement)
- CHEM 571 Current Research Topics in Inorganic Chemistry
- CHEM 573 Current Research Topics in Organic and Biological Chemistry
- CHEM 574 Current Research Topics in Spectroscopy
- CHEM 575 Current Research Topics in Theoretical and Computational Chemistry
- CHEM 578 Current Research Topics in Materials Chemistry
- Data Science seminars (3 credits)
- MOLENG 520/CHEM 597 Molecular Engineering Institute Seminar (1 credit/quarter, max 30)
- MOLENG 599A Seminar in Clean Energy (1 credit/quarter, max 30)
- CHEME 599 Current Topics in Data Science (1 credit/quarter, max 12)
- Preparation for 2nd year examination (9 credits)
- CHEM 581 (3 credits/quarter)
- Research Credit (27 credits minimum)
- CHEM 600: Independent Study or Research (no minimum credit requirement)
- CHEM 800: Doctoral Dissertation (27 credits over 3 quarters, enroll after completing general exam)
Table 1. Typical path to 90 credits
Course(s) | Standard Chem Ph.D. | Chem Ph.D. with DSO
Required coursework (Chem Ph.D.) | 18 | 18
Chemistry general seminar* | 6 | 6
Chemistry divisional seminar* | 6 | 6
Current research seminar* | 6 | 6
2nd year exam prep (CHEM 581) | 9 | 9
Independent study or research (CHEM 600)* | 18 | 6
Doctoral dissertation research (CHEM 800)* | 27 | 27
Required coursework (DSO) | | 9
Data science seminar | | 3
Total | 90 | 90
* Students typically complete more than the indicated number of credits for this course/area
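As a quick check of the arithmetic in Table 1, the minimal Python sketch below sums each column. The names `TABLE_1` and `column_totals` are illustrative only and are not part of any official planning tool; blank cells in the table are assumed to mean 0 credits.

```python
# Credit counts transcribed from Table 1 as (standard Ph.D., Ph.D. with DSO).
# Assumption: a blank cell in the table means 0 credits.
TABLE_1 = {
    "Required coursework (Chem Ph.D.)": (18, 18),
    "Chemistry general seminar": (6, 6),
    "Chemistry divisional seminar": (6, 6),
    "Current research seminar": (6, 6),
    "2nd year exam prep (CHEM 581)": (9, 9),
    "Independent study or research (CHEM 600)": (18, 6),
    "Doctoral dissertation research (CHEM 800)": (27, 27),
    "Required coursework (DSO)": (0, 9),
    "Data science seminar": (0, 3),
}


def column_totals(rows):
    """Return (standard_total, dso_total) summed over all rows."""
    standard = sum(std for std, _ in rows.values())
    dso = sum(with_dso for _, with_dso in rows.values())
    return standard, dso


standard_total, dso_total = column_totals(TABLE_1)
print(f"Standard Chem Ph.D.: {standard_total} credits")
print(f"Chem Ph.D. with DSO: {dso_total} credits")
```

Both columns come to 90 credits: in this typical path, the 12 credits of DSO coursework and seminars (9 + 3) are offset by 12 fewer credits of CHEM 600 (6 instead of 18).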
Alternative Course Options
A course for the Chem-DSO may be replaced by an equivalent or more advanced course in the same area upon approval. A student must submit an email petition to their Faculty Adviser and the Graduate Program Coordinator for approval. The joint Chem/MSE course sequence is relatively new (as of 2021), and we expect that some current students will have completed equivalent courses offered by other departments. Appropriate courses include (but are not limited to) the following:
General
CHEME 545/CHEM 545 – Data Science Methods for Clean Energy Research (3 credits)
Software Development
CHEME 546/CHEM 546 – Software Engineering for Molecular Data Scientists (3 credits)
CHEME 547/CHEM 547 – Data Science Capstone Project (3 credits)
CSE 583 – Software Development for Data Scientists (4 credits)
AMATH 581 – Scientific Computing (5 credits)
AMATH 583 – High-Performance Scientific Computing (5 credits); Prerequisite: AMATH 581
Statistics and Machine Learning
CSE 416/STAT 416 – Introduction to Machine Learning (4 credits); Prerequisites: (CSE 143 or CSE 160) and (STAT 311 or STAT 390)
STAT 435 – Introduction to Statistical Machine Learning (4 credits); Prerequisites: either STAT 341, STAT 390/MATH 390, or STAT 391; recommended: MATH 308
CSE 546 – Machine Learning (4 credits)
STAT 535 – Statistical Learning: Modeling, Prediction, and Computing (3 credits)
STAT 509 – Introduction to Mathematical Statistics: Econometrics I (5 credits)
STAT 512-513 – Statistical Inference (4 credits each)
AMATH 515 – Fundamentals of Optimization (5 credits)
ATM S 552 – Objective Analysis (3 credits)
Data Visualization & Data Management
CSE 414 – Introduction to Database Systems (4 credits); Prerequisite: CSE 143
CSE 544 – Principles of DBMS (4 credits)
CSE 442 – Data Visualization (4 credits); Prerequisite: CSE 332
CSE 412 – Introduction to Data Visualization (4 credits); Prerequisites: CSE 143 or CSE 163
CSE 512 – Data Visualization (4 credits)
IMT 562 – Interactive Information Visualization (4 credits)
INFO 474 – Interactive Information Visualization (5 credits); Prerequisites: INFO 343 or CSE 154; and CSE 143; and either Q METH 201, Q SCI 381, STAT 221/CS&SS 221/SOC 221, STAT 311, or STAT 390/MATH 390
HCDE/DATA 511 – Information Visualization/Data Visualization and Exploratory Analytics (4 credits)
HCDE 411 – Information Visualization (5 credits); Prerequisites: HCDE 308 and HCDE 310
ESS 420 – Introduction to GIS for the Earth Sciences (5 credits)
ESS 520 – Application in Geophysical Analysis with Python for the Earth Sciences (4 credits)
AMATH 582 – Computational Methods for Data Analysis (5 credits); Prerequisites: MATLAB and linear algebra
OCEAN 502 – Marine Geospatial Information Science (3 credits)
BIOL 519 – Data Science for Biologists (4 credits)