Introduction to this book

In today’s world, data is generated at an unprecedented pace, and our ability to harness it is changing the way we live, work, and even think. Data science, an interdisciplinary field that blends statistics, computer science, and domain-specific knowledge, empowers us to extract insights from this vast ocean of data. As data science becomes increasingly essential across various industries and sectors, there is a growing need for skilled professionals who can make sense of data and transform it into actionable information. This book is designed to give you a broad yet practical, hands-on tour through the full spectrum of data science approaches.

There are many books, courses, and other learning materials on data science, aimed at different levels of expertise and different backgrounds. However, many of these resources assume a strong foundation in computer science, mathematics, or another quantitative discipline. Until recently, this reflected the most common route into the field, and it is also true for the author of this book. But this is changing. More and more universities and higher education programs are now training a new generation of data scientists: students who may have only limited prior experience in IT and may not come from a scientific background. This book is written for them. It aims to fill that gap by offering a comprehensive, hands-on introduction to data science for readers who are just beginning their journey or considering a future in this fascinating field.

This book was originally developed for a course taught to second-semester bachelor students, with only basic Python programming and mathematics as prerequisites. Since then, I have also started teaching data science to media informatics master students and will soon teach related topics in a new applied research program. This motivated me to expand and deepen the material so that it can serve the needs of these different groups, including students in applied computer science, many of whom have had little or no prior introduction to data science or machine learning. My hope is that this book does not require you to be an expert in computer science or to have a strong background in statistics in order to understand the concepts and techniques it covers. At the same time, I also hope to provide enough depth and enough anchor points for readers with stronger technical backgrounds to continue exploring the subject in greater detail.

The approach of this book is to build understanding from the ground up and to develop intuition through practicing and applying the concepts yourself. To make this possible, I will often omit some of the mathematical details, or address them only briefly. Instead, you will find many references to resources that allow you to dig deeper when needed. Depending on your learning style, occasionally following some of these paths to better understand the mathematics or algorithms behind the methods can greatly strengthen your sense of mastery (and, hard to believe for some, can be a lot of fun!). Yet, for a general understanding of the available data science “toolkit” this level of depth is often not strictly necessary.

What cannot be skipped, however, is hands-on practice. In my view, there is simply no way to become a data scientist without getting your hands dirty. Most chapters are built around Python code examples, and all analyses and visualizations in this book are generated by the code shown in those same chapters. This is not meant merely to demonstrate that these things can be done in Python. Rather, it is meant as a starting point and as an invitation for you to experiment with the code yourself. So do not be shy: run the code, change it, break it, fix it, and adapt it. That is the only real way to make this incredibly powerful toolbox your own.