Data science is an amalgam of tools, techniques, and processes from statistics, computer science, signal processing, machine learning, …, chosen to form a powerful toolbox and a set of best practices for modern data analysis. Success stories of data science range from molecular biology, where it is used to understand single-cell RNA sequencing datasets, through physics, where it is used to detect new elementary particles, to governance and policymaking, where it is used to visualize, understand, and predict global migration flows. A Practical Introduction to Data Science is a first course in data science for a varied audience that emphasizes concrete examples in Python. It assumes some familiarity with programming, ideally in Python. The course covers exploratory data analysis, causal reasoning, data visualization principles, fundamentals of statistics and probability, and machine learning, with many computational examples.

Timetable

Week Date Topic Resources Assignment
1 22/2

Introduction (slides, video)
Three (two) exploratory vignettes (video 1)
 

Breiman: Statistical Modeling: The Two Cultures
Donoho: 50 Years of Data Science

Notebooks: walk-meteorites, walk-bowel

Questionnaire

Lecture notebooks: meteorites (files), readbook (files), walk-readbook (files)

2 1/3

Tidy data with pandas + data visualization (slides, video 1, video 2)
Python clinic, Q&A, video

Exercise sheet 1 (download files here)
Tidy data: Notebook and dataset (or JupyterHub)

Exercise sheet 1 solution (download files here)

Lecture notebook: tidy-data-in-pandas (files)
  8/3 Fasnachtsferien (carnival break)
3 15/3

Recap + data visualization (slides, video 1, video 2)
 

Exercise session 2, video

Exercise sheet 2 (download files here)

Exercise sheet 2 solution (download files here)

Lecture notebooks: billboard (files), matplotlib-intro (files)
4 22/3

Dataviz with Seaborn, visualizing distributions, principles of visualization (slides, video 1, video 2)

Exercise session 3 video

Exercise sheet 3 (download file here)

Exercise sheet 3 solution (download files here)

Lecture notebook: seaborn-and-more (files)
5 29/3

Intro to distributions, densities, outliers, probability (slides, video 1, video 2)

Exercise session 4 video

Exercise sheet 4 (download file here)

Exercise sheet 4 solution (download files here)

Lecture notebook: Lecture5 (files)
6 5/4

Intro to probability (slides, video 1, video 2)

Exercise session 5 video

Exercise sheet 5 (download file here)

Exercise sheet 5 solution (download files here)

Lecture notebook: monty-hall (files)
7 12/4      
8 19/4

CLT, LLN, binomial, Poisson, and normal distributions (slides, video 1, video 2)

Exercise session 6 video

Exercise sheet 6 (download file here)

Exercise sheet 6 solution (download files here)

Lecture notebook: towards-clt (files)
9 26/4

Random variation or real effect? #1
(video, slides)

Exercise session 7 video

Exercise sheet 7 (download file here)

Exercise sheet 7 solution (download file here)

Lecture notebooks: walk-bowel (files), bootstrap (files)
10 3/5

Confidence intervals and the bootstrap (video 1, video 2, slides)

Exercise session 8 video

Exercise sheet 8 (download file here)

Exercise sheet 8 solution (download file here)

Lecture notebook: confidence intervals (files)
11 10/5 p-values (briefly) and introduction to regression (video 1, video 2, slides)

Exercise sheet 9 (download file here)

Exercise sheet 9 solution (download file here)

 
12 17/5 Regression, confounding (video 1, video 2, slides)

Exercise sheet 10 (download file here)

Exercise sheet 10 solution (download file here)

Lecture notebook: correlation-regression (files)
13 24/5 Intro to ML (video 1, video 2, slides)

Exercise sheet 11 (download file here)

Exercise sheet 11 solution (download file here)

Lecture notebook: ML (files)
14 31/5

Wrap up ML and farewell! (video, slides)

Exercise session (video, slides)

Practice exam, correct answers

Lecture notebook: ML (files)
  24/6 Final exam @ Biozentrum

Communication and Collaborative Reading

For discussions about the lectures and exercises, you can use the Piazza forum. It lets you ask questions and help each other in the precise context where issues arise. It's really great!

Class Time & Location

The lecture will be given in the Bernoullianum, Grosser Hörsaal 148.

The lecture will take place from 4:15pm to 6pm.

The exercise session will take place from 2:15pm to 4pm.

The only exception: the exercise session on May 17 will be held in the Vesalianum, Seiteneingang (side entrance), Grosser Hörsaal (EO.16).

Reading and Online Resources

The following books are pure gold:

  • Spiegelhalter, David. The Art of Statistics: Learning from Data. Penguin UK, 2019.
  • Pearl, Judea & Mackenzie, Dana. The Book of Why: The New Science of Cause and Effect. Basic Books, New York, 2018.

There are many great data science courses offered around the world, and the best ones are free:

 

Contact

Lecturer

Prof. Dr. Ivan Dokmanić: ivan.dokmanic[at]unibas.ch

Teaching assistants

Phaina Koncebovski: phaina.koncebovski[at]stud.unibas.ch

AmirEhsan Khorashadizadeh: amir.kh[at]unibas.ch

Valentin Debarnot: valentin.debarnot[at]unibas.ch