In short, data science is about extracting, interpreting, and communicating relevant insights from complex data with the help of digital techniques.
Unlike long-established fields such as mathematics, physics, or history, data science is a relatively new term and area of study. If you ask ten data scientists to define their field, you will likely get ten different answers. Some might view it as a distinct discipline, others as a technical approach or mindset, and still others might consider it synonymous with statistics. Many authors have contributed definitions or descriptions of what data science is (Cao, 2017; Donoho, 2017; Blei & Smyth, 2017; Carmichael & Marron, 2018; Grus, 2019), but for now let’s start with the rather general and accessible description of data science as
the practice of gaining, interpreting, and communicating insights from complex data through digital techniques.
Many quantitative scientists would argue that they do similar work, as they strive to learn from data and use digital tools extensively. This overlap does not diminish the importance of data science; it simply indicates that many scientists must also be data scientists to stay current in their fields. Rapid advancements in digital techniques, including machine learning, are transforming many research areas.
Opinions on exactly what data science is vary, often depending on the application area. In consulting and business, data science might mean something different than in academia. However, one especially common way to illustrate data science is with a Venn diagram that shows it as the overlap between digital techniques, statistics, and domain expertise (Conway, 2010; Carmichael & Marron, 2018).

Figure 1. Venn diagram to indicate the intersection of fields for data science.
Data is Not New. Why Has Data Science Emerged Now?
Data has been a cornerstone of human understanding for millennia, from ancient civilizations keeping records of harvests and astronomy to modern businesses tracking sales and performance. It’s clear that data in itself is not a new concept. However, the emergence and ascendancy of data science as a discipline is a relatively recent phenomenon. So, why now?
The prominence of data science in today’s world can be attributed to several concurrent developments:
(1) The exponential increase in the volume of data generated. Thanks to digitalization and the rise of the Internet, mobile devices, and IoT (Internet of Things), we are producing data at a previously unimaginable scale. This big data presents both a challenge and an opportunity: the challenge is how to handle and process this vast amount of information, and the opportunity lies in the valuable insights that can be gleaned from it.
This is accompanied by an increased recognition of the importance of data-driven decision-making across diverse sectors (Provost & Fawcett, 2013). Various industries, governments, and institutions have realized that leveraging data can lead to increased efficiency, better-informed decisions, and a competitive advantage.
The existence (and appreciation) of ever larger amounts of data can be seen as the substrate for the rise of data science, but working properly with such data required a combination of several other developments (Figure 1).
(2) The evolution and expansion of statistical methodologies have been a key driver. Statistics provides the foundational principles and techniques for analyzing data, making inferences, and predicting future trends. In the era of big data, classical and modern statistical techniques form the backbone of most analyses in data science. The relationship between statistics and data science is so close that some statisticians have argued that data science should be understood as a modern extension or reframing of statistics. In the late 1990s, Jeff Wu famously proposed viewing statistics as “data science”, and the relationship between the two fields is still debated today (Carmichael & Marron, 2018). Despite the overlap, both terms still exist and usually mean related but different things (see also Hassani et al., 2021).
(3) The strides we’ve made in data handling capabilities have greatly facilitated the rise of data science. This includes the drastic advancements in computational power and storage capabilities that made it possible to collect, store, and analyze these massive datasets. But it also includes many developments from computer science, such as databases. Just a few decades ago, collecting, storing, and analyzing the vast amounts of data we deal with today would have been impractical, if not unimaginable.
(4) There has been significant progress in the field of algorithms, which also includes machine learning. Algorithms are central to many tools used in data science, from classical optimization methods and statistical estimation to modern machine learning and deep learning approaches. These advancements have opened up new possibilities for predictive analytics, automation, and artificial intelligence.
(5) Lastly, the increasingly important field of data visualization has greatly expanded the ways in which we can explore, understand, and communicate data. Effective data visualization makes complex data more comprehensible, accessible, and actionable. The development of powerful visualization tools enables us to present data in a visually compelling manner that fosters understanding and drives informed decisions (Munzner, 2014; Healy, 2018; Wilke, 2019).
So, while data is not new, the volume of data, our ability to process it, and the recognition of its value are. These changes have given rise to the burgeoning field of data science, marking a new era in our relationship with data.

Figure 1: The concurrent developments leading to Data Science [1].
A brief spotlight: the many facets of Data Science
Data science, by its very nature, stands at the bustling intersection of digital techniques, statistical methodologies, and domain expertise. It is a broad and incredibly diverse field with intricate links to many different sectors and disciplines. This diversity results in a wide variety of roles and responsibilities, each bringing unique skills and viewpoints to address an array of challenges and opportunities.
One of the key characteristics that makes data science so dynamic is its inherent multidisciplinarity. Data science isn’t just about dealing with numbers or coding; it’s about leveraging a suite of digital tools and statistical methods to draw insights from data, and applying these insights to a specific context or domain. A data scientist working in healthcare, for instance, might use different techniques and have a different focus than a data scientist working in retail or finance. The beauty of data science lies in this versatility: it is a field where skills and disciplines converge and collaborate.
Given the breadth and depth of the field, being a successful data scientist requires much more than just technical skills. A natural curiosity to explore and understand data, an openness to new ideas and methods, the eagerness to continuously learn and adapt, and most crucially, the ability to communicate and collaborate effectively are all vital attributes. After all, data science is a team sport. No single person can master all facets of data science. Instead, it’s about bringing diverse skills together, working with others, and learning from each other.
In the following pages, we will explore the multifaceted world of data science, its skills, and applications.
[1] Images of Thomas Bayes and Carl Friedrich Gauß taken from Wikipedia. In the case of Thomas Bayes, it is not even certain that the image actually shows him.
- Cao, L. (2017). Data Science: A Comprehensive Overview. ACM Computing Surveys, 50(3), 43:1–43:42. https://doi.org/10.1145/3076253
- Donoho, D. (2017). 50 Years of Data Science. Journal of Computational and Graphical Statistics, 26(4), 745–766. https://doi.org/10.1080/10618600.2017.1384734
- Blei, D. M., & Smyth, P. (2017). Science and data science. Proceedings of the National Academy of Sciences, 114(33), 8689–8692.
- Carmichael, I., & Marron, J. (2018). Data science vs. statistics: two cultures? Japanese Journal of Statistics and Data Science, 1(1), 117–138.
- Grus, J. (2019). Data science from scratch: first principles with Python. O’Reilly Media.
- Conway, D. (2010). The Data Science Venn Diagram. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
- Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and data-driven decision making. Big Data, 1(1), 51–59.
- Hassani, H., Beneki, C., Silva, E. S., Vandeput, N., & Madsen, D. Ø. (2021). The science of statistics versus data science: What is the future? Technological Forecasting and Social Change, 173, 121111.
- Munzner, T. (2014). Visualization analysis and design. CRC Press.
- Healy, K. (2018). Data visualization: a practical introduction. Princeton University Press.
- Wilke, C. O. (2019). Fundamentals of data visualization: a primer on making informative and compelling figures. O’Reilly Media.