Bibliography
Longbing Cao. Data Science: A Comprehensive Overview. ACM Computing Surveys, 50(3):43:1–43:42, June 2017. URL: https://dl.acm.org/doi/10.1145/3076253 (visited on 2023-09-05), doi:10.1145/3076253.
David Donoho. 50 Years of Data Science. Journal of Computational and Graphical Statistics, 26(4):745–766, October 2017. Publisher: Taylor & Francis. URL: https://doi.org/10.1080/10618600.2017.1384734 (visited on 2023-09-05), doi:10.1080/10618600.2017.1384734.
David M Blei and Padhraic Smyth. Science and data science. Proceedings of the National Academy of Sciences, 114(33):8689–8692, 2017.
Iain Carmichael and JS Marron. Data science vs. statistics: two cultures? Japanese Journal of Statistics and Data Science, 1(1):117–138, 2018.
Joel Grus. Data science from scratch: first principles with Python. O'Reilly Media, 2019.
Drew Conway. The data science venn diagram. 2010. URL: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram (visited on 2024-06-03).
Foster Provost and Tom Fawcett. Data science and its relationship to big data and data-driven decision making. Big data, 1(1):51–59, 2013.
Hossein Hassani, Christina Beneki, Emmanuel Sirimal Silva, Nicolas Vandeput, and Dag Øivind Madsen. The science of statistics versus data science: what is the future? Technological Forecasting and Social Change, 173:121111, 2021.
Tamara Munzner. Visualization analysis and design. CRC Press, 2014.
Kieran Healy. Data visualization: a practical introduction. Princeton University Press, 2018.
Claus O Wilke. Fundamentals of data visualization: a primer on making informative and compelling figures. O'Reilly Media, 2019.
Amit Datta, Michael Carl Tschantz, and Anupam Datta. Automated experiments on ad privacy settings: a tale of opacity, choice, and discrimination. 2015. arXiv:1408.6491.
Muhammad Ali, Piotr Sapiezynski, Miranda Bogen, Aleksandra Korolova, Alan Mislove, and Aaron Rieke. Discrimination through optimization. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW):1–30, November 2019. URL: https://doi.org/10.1145/3359301, doi:10.1145/3359301.
Reuters. Amazon scraps secret AI recruiting tool that showed bias against women. October 2018. URL: https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G (visited on 2023-06-05).
Kirsten Martin. Ethical Implications and Accountability of Algorithms. Journal of Business Ethics, 160(4):835–850, December 2019. URL: https://doi.org/10.1007/s10551-018-3921-3 (visited on 2023-06-09), doi:10.1007/s10551-018-3921-3.
Solon Barocas and Andrew D. Selbst. Big Data's Disparate Impact. California Law Review, 104(3):671–732, 2016. Publisher: California Law Review, Inc. URL: https://www.jstor.org/stable/24758720 (visited on 2023-06-09).
Catherine D'Ignazio and Lauren F. Klein. Data feminism. Strong Ideas series. The MIT Press, Cambridge, Massachusetts; London, England, 2020. ISBN 978-0-262-04400-4.
C. Stinson. Algorithms are not neutral. AI Ethics, 2:763–770, 2022. doi:10.1007/s43681-022-00136-w.
Sina Fazelpour and David Danks. Algorithmic bias: Senses, sources, solutions. Philosophy Compass, 16(8):e12760, 2021. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/phc3.12760 (visited on 2023-06-09), doi:10.1111/phc3.12760.
Shoshana Zuboff. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. Profile Books, 1st edition, 2019. ISBN 9781781256848.
Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning: Limitations and Opportunities. MIT Press, 2023.
Cathy O'Neil. Weapons of math destruction: How big data increases inequality and threatens democracy. Crown, 2017.
Mike Loukides, Hilary Mason, and D. J. Patil. Ethics and Data Science. O'Reilly Media, 1st edition, July 2018.
Executable Books Community. Jupyter Book. February 2020. URL: https://zenodo.org/record/4539666 (visited on 2023-06-06), doi:10.5281/zenodo.4539666.
Vladimir Pletser and Dirk Huylebrouck. The Ishango Artefact: the Missing Base 12 Link. Forma, 1999.
Philip Russom and others. Big data analytics. TDWI best practices report, fourth quarter, 19(4):1–34, 2011.
Martin Frické. The knowledge pyramid: a critique of the DIKW hierarchy. Journal of Information Science, 35(2):131–142, April 2009. Publisher: SAGE Publications Ltd. URL: https://doi.org/10.1177/0165551508094050 (visited on 2024-04-22), doi:10.1177/0165551508094050.
Ralf Otte, Boris Wippermann, Sebastian Schade, and Viktor Otte. Von Data Mining bis Big Data: Handbuch für die industrielle Praxis. Carl Hanser Verlag GmbH & Co. KG, München, 1st edition, July 2020. ISBN 978-3-446-45550-4.
Karl Popper. Karl Popper: Logik der Forschung. Akademie Verlag, July 2013. ISBN 978-3-05-006378-2. URL: https://www.degruyter.com/document/doi/10.1524/9783050063782/html (visited on 2023-09-04), doi:10.1524/9783050063782.
Thomas Bartelborth. Die erkenntnistheoretischen Grundlagen induktiven Schließens. Universität Leipzig, 2017. URL: https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa-220168.
Keith Lehrer. Theory of Knowledge: Second Edition. Routledge, New York, 2nd edition, September 2019. ISBN 978-0-429-49426-0. doi:10.4324/9780429494260.
Max Boisot and Agustí Canals. Data, information and knowledge: have we got it right? Journal of Evolutionary Economics, 14(1):43–67, January 2004. URL: https://doi.org/10.1007/s00191-003-0181-9 (visited on 2024-04-22), doi:10.1007/s00191-003-0181-9.
Claus Weihs and Katja Ickstadt. Data Science: the impact of statistics. International Journal of Data Science and Analytics, 6(3):189–194, November 2018. URL: https://doi.org/10.1007/s41060-018-0102-5 (visited on 2023-09-05), doi:10.1007/s41060-018-0102-5.
Rüdiger Wirth and Jochen Hipp. CRISP-DM: towards a standard process model for data mining. In Proceedings of the 4th international conference on the practical applications of knowledge discovery and data mining, volume 1, 29–39. Manchester, 2000.
Hilary Mason and Chris Wiggins. A Taxonomy of Data Science. Dataists, 2010. URL: http://www.dataists.com/2010/09/a-taxonomy-of-data-science/ (visited on 2023-09-05).
Philip Guo. Data Science Workflow: Overview and Challenges. 2022. URL: https://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext (visited on 2023-09-05).
The pandas development team. Pandas-dev/pandas: pandas. February 2020. URL: https://doi.org/10.5281/zenodo.3509134, doi:10.5281/zenodo.3509134.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
Justin Matejka and George Fitzmaurice. Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI '17, 1290–1294. New York, NY, USA, May 2017. Association for Computing Machinery. URL: https://doi.org/10.1145/3025453.3025912 (visited on 2023-09-07), doi:10.1145/3025453.3025912.
Thomas Piketty. Capital in the Twenty-First Century. Belknap Press, Cambridge, MA, April 2014. ISBN 978-0-674-43000-6.
David Lane, David Scott, Mikki Hebl, Rudy Guerra, Dan Osherson, and Heidi Zimmer. Introduction to Statistics. Citeseer, 2003. URL: https://open.umn.edu/opentextbooks/textbooks/459.
Peter Bruce and Andrew Bruce. Practical Statistics for Data Scientists. O'Reilly, Beijing Boston Farnham Sebastopol Tokyo, 1st edition, June 2017. ISBN 978-1-4919-5296-2.
Stanley H Chan. Introduction to probability for data science. Michigan Publishing Services, 2021. URL: https://probability4datascience.com/index.html.
F. J. Anscombe. Graphs in Statistical Analysis. The American Statistician, 27(1):17–21, February 1973. Publisher: Taylor & Francis. URL: https://www.tandfonline.com/doi/abs/10.1080/00031305.1973.10478966 (visited on 2024-04-22), doi:10.1080/00031305.1973.10478966.
Amit Saxena, Mukesh Prasad, Akshansh Gupta, Neha Bharill, Om Prakash Patel, Aruna Tiwari, Meng Joo Er, Weiping Ding, and Chin-Teng Lin. A review of clustering techniques and developments. Neurocomputing, 267:664–681, 2017.
Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, 413–422. IEEE, 2008.
Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. LOF: identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, 93–104. 2000.
Azzedine Boukerche, Lining Zheng, and Omar Alfandi. Outlier detection: methods, models, and classification. ACM Computing Surveys (CSUR), 53(3):1–37, 2020.
Abir Smiti. A critical overview of outlier detection methods. Computer Science Review, 38:100306, 2020.
Farzana Anowar, Samira Sadaoui, and Bassant Selim. Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, Isomap, LE, ICA, t-SNE). Computer Science Review, 40:100378, 2021.
Yingfan Wang, Haiyang Huang, Cynthia Rudin, and Yaron Shaposhnik. Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMap, and PaCMAP for data visualization. Journal of Machine Learning Research, 22(201):1–73, 2021.
Leland McInnes, John Healy, and James Melville. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
Allison Marie Horst, Alison Presmanes Hill, and Kristen B Gorman. palmerpenguins: Palmer Archipelago (Antarctica) penguin data. 2020. R package version 0.1.0. URL: https://allisonhorst.github.io/palmerpenguins/, doi:10.5281/zenodo.3960218.
Ilias Tougui, Abdelilah Jilbab, and Jamal El Mhamdi. Impact of the choice of cross-validation techniques on the results of machine learning-based diagnostic applications. Healthcare informatics research, 27(3):189, 2021.
Fabio Mendoza Palechor and Alexis De la Hoz Manotas. Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from colombia, peru and mexico. Data in Brief, 2019. URL: https://api.semanticscholar.org/CorpusID:201195793.
Sotiris Kotsiantis. Combining bagging, boosting, rotation forest and random subspace methods. Artificial intelligence review, 35:223–240, 2011.
Leo Breiman. Bagging predictors. Machine learning, 24:123–140, 1996.
Jafar Tanha, Yousef Abdi, Negin Samadi, Nazila Razzaghi, and Mohammad Asadpour. Boosting methods for multi-class imbalanced data classification: an experimental review. Journal of Big Data, 7:1–47, 2020.
Candice Bentéjac, Anna Csörgő, and Gonzalo Martínez-Muñoz. A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54:1937–1967, 2021.
Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1):119–139, 1997.
Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc., 2017. URL: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.
Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1):56–67, 2020.
Sebastian Raschka, Yuxi Hayden Liu, Vahid Mirjalili, and Dmytro Dzhulgakov. Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python. Packt Publishing Ltd, 2022.
Simon JD Prince. Understanding Deep Learning. MIT Press, 2023.
Nikhil Ketkar and Jojo Moolayil. Introduction to PyTorch. In Deep Learning with Python: Learn Best Practices of Deep Learning Models with PyTorch, pages 27–91, 2021.
Yuli Vasiliev. Natural language processing with Python and spaCy: A practical introduction. No Starch Press, 2020.
Narendra Kumar Gupta, Giuseppe Di Fabbrizio, and Patrick Haffner. Capturing the stars: predicting ratings for service and product reviews. In HLT-NAACL 2010. 2010.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. October 2013. arXiv:1310.4546 [cs, stat]. URL: http://arxiv.org/abs/1310.4546 (visited on 2023-06-12), doi:10.48550/arXiv.1310.4546.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. September 2013. arXiv:1301.3781 [cs]. URL: http://arxiv.org/abs/1301.3781 (visited on 2023-06-12), doi:10.48550/arXiv.1301.3781.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 2017.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, and others. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and others. Huggingface's transformers: state-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019.
Dan Jurafsky and James H Martin. Speech and language processing. 3rd ed. draft. 2024.
Lewis Tunstall, Leandro Von Werra, and Thomas Wolf. Natural language processing with transformers. O'Reilly Media, Inc., 2022.
Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using networkx. In Gaël Varoquaux, Travis Vaught, and Jarrod Millman, editors, Proceedings of the 7th Python in Science Conference, 11–15. Pasadena, CA USA, 2008.
Peter S Bearman, James Moody, and Katherine Stovel. Chains of affection: the structure of adolescent romantic and sexual networks. American journal of sociology, 110(1):44–91, 2004.
Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008.
Filippo Menczer, Santo Fortunato, and Clayton A Davis. A first course in network science. Cambridge University Press, 2020.
Keith McNulty. Handbook of Graphs and Networks in People Analytics: With Examples in R and Python. Routledge & CRC Press, 2022. ISBN 978-1-03-220497-0. URL: https://ona-book.org (visited on 2024-01-23).
Santo Fortunato. Community detection in graphs. Physics reports, 486(3-5):75–174, 2010.
Tiago P Peixoto. Descriptive vs. inferential community detection in networks: Pitfalls, myths and half-truths. Cambridge University Press, 2023.