Data Science is an interdisciplinary field that blends elements of Statistics, Computer Science and Mathematics. On this page I am sharing some learning resources that I found useful, in the hope that they help other learners get started. Most of these resources are freely available.

Computer Programming

When it comes to coding for data science, the choice of programming language usually boils down to Python or R. My current focus is on Python, and here are some resources that I can recommend.

Free Books

  • Think Python
    by Allen B. Downey
    Green Tea Press
    —–
    This book is one of the best introductions to Python programming for complete beginners that I’ve found. The basic concepts are explained very clearly, and there are plenty of programming exercises for practice. Available in PDF and HTML on the Green Tea Press website. Check out the other free titles from the same author (Think Bayes, Think Stats, Think Java and more).

  • Python for Everybody
    by Charles Severance
    —–
    Another excellent introduction for people new to programming. It serves as a textbook for courses available on Coursera, edX, FutureLearn and freeCodeCamp. I didn’t use this book when I had to learn Python from scratch, since I found out about it only recently, but I think it’s worth adding to the list. It can be downloaded from the author’s website. Also available online on Trinket.

  • Automate the boring stuff with Python
    by Al Sweigart
    No Starch Press
    —–
    This is a go-to book once you’ve mastered the basics of the language. It offers coding practice with interesting examples like sending email, manipulating files, web scraping, automating spreadsheets and image processing. There’s a free HTML version on the book’s website: automatetheboringstuff.com. More free Python books from the same author are available on inventwithpython.com.

Other Books (non-free)

  • Python for Data Analysis
    by Wes McKinney
    2017, O’Reilly
    —–
    The pandas library is what makes Python a data analysis powerhouse and this title is written by the creator of pandas himself.
    I used the second edition of this book. In my opinion it is already outdated, which shows how fast the library has evolved since the book’s publication. Even so, I have yet to find a better resource for learning pandas in detail. A third edition is on the way, and I would wait for that if I had to purchase a copy now.

  • Learning SQL: Generate, Manipulate, and Retrieve Data
    by Alan Beaulieu
    2020, O’Reilly
    —–
    Data scientists are often required to use SQL to handle large datasets.
    This text is not for absolute beginners, but it’s still one of the best resources for mastering SQL. I used the second edition of this book to get to grips with SQL years ago, and I keep going back to it when I need to refresh my knowledge.

Python MOOCs

There are several online courses that can help you build a foundation in coding with Python if you, like me, don’t come from a computing background. Here are the ones I liked most.

The next two courses are part of the Computational Thinking using Python program by MIT on edX:

Another good one:


Statistics

Free Books


Mathematics

Free Books

  • Mathematics for Machine Learning
    by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong
    2020, Cambridge University Press
    —–
    Not a text for beginners, since it requires a solid foundation in Calculus and Linear Algebra. It can be downloaded for free from mml-book.github.io/.

  • Introduction to Probability for Data Science
    by Stanley Chan
    2021, Michigan Publishing
    —–
    Read for free on probability4datascience.com/, or download the free PDF from Michigan Publishing.


Machine Learning

Free Books

  • An Introduction to Statistical Learning (with applications in R)
    by G. James, D. Witten, T. Hastie, R. Tibshirani
    Springer Verlag
    —–
    One of the best books around on Statistical Learning. It can be downloaded for free from its web page www.statlearning.com or from trevorhastie.github.io/ISLR/.

Other Books (non-free)

  • Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow
    by Aurélien Géron
    2019, O’Reilly
    —–
    At the time of writing, I have just started with this title, and I wish I had done so earlier! Beautifully and clearly written, it covers the most relevant Machine Learning topics from an applied point of view, with just the right amount of theory.

Free Online Learning


Data Science Online Learning Platforms

  • DataCamp

  • Dataquest
    So far, my favourite learning platform. It does not use videos, just clearly written text and exercises, which I prefer. The link above is a referral link; it should give you $15 off if you sign up.

  • 365DataScience