Data science is an amalgam of tools, techniques, and processes from statistics, computer science, signal processing, machine learning, …, chosen to form a powerful toolbox and a set of best practices for modern data analysis. Success stories of data science range from molecular biology, where it is used to understand single-cell RNA sequencing datasets, through physics, where it is used to detect new elementary particles, to governance and policymaking, where it is used to visualize, understand, and predict global migration flows. A Practical Introduction to Data Science is a first data science course for a varied audience that emphasizes concrete examples in Python. It assumes some familiarity with programming, ideally in Python. The course covers exploratory data analysis, causal reasoning, data visualization principles, fundamentals of statistics and probability, and machine learning, with many computational examples.

Timetable

Week Topic Resources Assignments
1 (20/2)

Introduction (slides, video)
Two exploratory vignettes (video; notebooks listed below)

NB: The videos are from last year, so ignore the administrative parts; the slides are up to date.

Readings:
Breiman: Statistical Modeling: The Two Cultures
Donoho: 50 Years of Data Science

Lecture notebooks:
meteorites (download files)
readbook (download files)
walk-readbook (download files)

Questionnaire

Exercise sheet 1 (download files)

2 (6/3) 

Tidy data with pandas + data visualization (slides, video 1/2, video 2/2)

Lecture notebooks:
tidy-data (download files)

Introduction to Gradescope and Exercise sheet 2 (video)

Exercise sheet 2 (download files); due 14/3 (4pm)

3 (13/3)

Visualization with Matplotlib + principles of dataviz (slides, video 1/2, video 2/2)

Lecture notebooks:
matplotlib-doodles (download files)
matplotlib-intro (download files)
anatomy-of-a-fig (download files)

Exercise session 3 (video)

Exercise sheet 3 (download files); due 21/3 (4pm)

4 (20/3)

Seaborn + visualizing distributions (slides, video 1/2, video 2/2)

Lecture notebooks:
seaborn-and-more (download files)
Hertzsprung-Russell (download files)
visualizing-distributions (download files)

Exercise sheet 4 (download files); due 28/3 (4pm)

5 (27/3)

Probability (slides, video 1/2, video 2/2)

Lecture notebooks:
Left-handed (download files)
Monty-hall (download files)

Exercise sheet 5 (download files)

6 (3/4)

Random variables, CLT, Poisson distribution (slides, video 1/2, video 2/2)

Interpreting the Central Limit Theorem (file)

Lecture notebooks:
towards-clt (download files)

Exercise session 6 (video)

Exercise sheet 6 (download files)

7 (17/4)

Poisson distribution, from CLT to confidence intervals (slides, video 1/2, video 2/2)

Lecture notebooks:
walk-bowel (download files)

Exercise sheet 7 (download files)

8 (24/4)

Confidence Intervals (slides 1/2, slides 2/2, video 1/2, video 2/2)

Lecture notebooks:
Confidence-Intervals & p-values (download files)

Exercise sheet 8 (download files)
Note: Please use the "pids_2023" kernel in JupyterHub from now on.

9 (1/5)

Exercise sheet 9 (download files)

10 (8/5)

Towards Regression via Correlation (slides, video 1/2, video 2/2)

Lecture notebooks:
correlation-regression (download files)

Exercise sheet 10 (download files)

11 (15/5)

Multiple regression, Simpson's paradox (slides, video)

Exercise sheet 11 (download files)

12 (22/5)

Slides, video 1/2, video 2/2

Lecture notebooks:
machine-learning (download files)

Practice exam

Communication

For discussions about the lectures and exercises, you can use the Piazza forum. It lets you ask questions and help each other in the precise context where issues arise. It's really great!

Class Time & Location

Lectures: Mondays @ 10:15am in Grosser Hörsaal (EO.16) / Vesalianum Seiteneingang
Recitations: Tuesdays @ 4:15pm in Grosser Hörsaal (EO.16) / Vesalianum Seiteneingang
Exam: June 23 @ 2pm to 4pm in DSGB Neubau, Sporthalle 1, Grosse Allee 6, 4052 Basel

Reading and Online Resources

The following books are pure gold:

  • Spiegelhalter, David. The Art of Statistics: Learning from Data. Penguin UK, 2019.
  • Pearl, Judea & Mackenzie, Dana. The Book of Why: The New Science of Cause and Effect. Basic Books, New York, 2018.
  • Irizarry, Rafael A. Introduction to Data Science: Data Analysis and Prediction Algorithms with R (book, course).

There are many great data science courses offered around the world, and the best ones are free:

The solutions to the exercises will be released on the GitHub page.

Contact

Lecturer

Prof. Dr. Ivan Dokmanić: ivan.dokmanic[at]unibas.ch

Teaching assistants

Ada Krasnovsky: ada.krasnovsky[at]unibas.ch
Alexandra Flora Spitzer: alexandra.spitzer[at]stud.unibas.ch
AmirEhsan Khorashadizadeh: amir.kh[at]unibas.ch
Valentin Debarnot: valentin.debarnot[at]unibas.ch