Data Science is an interdisciplinary field that blends elements of Statistics, Computer Science and Mathematics. On this page I am sharing some learning resources that I found useful, in the hope that they can help other learners get started. Most of these resources are freely available.
When it comes to coding for data science, the choice of programming language usually boils down to Python or R. My current focus is on Python, and here are some resources that I can recommend.
Think Python
by Allen B. Downey
Green Tea Press
This book is one of the best introductions to Python programming for complete beginners that I’ve found. The basic concepts are explained very clearly, and there are plenty of programming exercises for practice. Available in PDF and HTML on the Green Tea Press website. Check out the other free titles from the same author (Think Bayes, Think Stats, Think Java and more).
Python for Everybody
by Charles Severance
Another excellent introduction for people new to programming. It serves as a textbook for courses available on Coursera, edX, FutureLearn and freeCodeCamp. I didn’t actually use this book when I had to learn Python from scratch, since I found out about it only recently, but I think it’s worth adding to the list. It can be downloaded from the author’s website. Also available online on Trinket.
Automate the Boring Stuff with Python
by Al Sweigart
No Starch Press
This is a go-to book after you’ve mastered the basics of the language. It offers coding practice with interesting examples like sending email, manipulating files, web scraping, automating spreadsheets and image processing. There’s a free HTML version on the book’s website: automatetheboringstuff.com. More free Python books by the same author are available at inventwithpython.com.
Other Books (non-free)
Python for Data Analysis
by Wes McKinney
The pandas library is what makes Python a data analysis powerhouse and this title is written by the creator of pandas himself.
I used the second edition of this book. However, in my opinion, it’s already outdated: that’s how fast the library has evolved since the book’s publication. In any case, I have yet to find a better resource for learning pandas in detail. A third edition is on the way, and I would wait for that if I had to purchase a copy now.
Learning SQL: Generate, Manipulate, and Retrieve Data
by Alan Beaulieu
Data scientists are often required to use SQL to handle large datasets.
This text is not for absolute beginners, but it’s still one of the best resources to master the power of SQL. I used the second edition of this book to get to grips with SQL years ago, and I keep going back to it when I need to refresh my knowledge.
There are several online courses that can help lay the foundations of coding with Python if you, like me, don’t come from a computing background. Here are the ones I liked most.
The next two courses are part of the Computational Thinking using Python program by MIT on edX:
Another good one:
- Using Python for Research
by Harvard University on edX.
Introduction to Modern Statistics
by Mine Çetinkaya-Rundel and Johanna Hardin
Mathematics for Machine Learning
by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong
2020, Cambridge University Press
Not a text for beginners, since it requires a solid foundation in Calculus and Linear Algebra. It can be downloaded for free from mml-book.github.io/.
- An Introduction to Statistical Learning (with applications in R)
by G. James, D. Witten, T. Hastie, R. Tibshirani
One of the best books around on Statistical Learning. It can be downloaded for free from its web page www.statlearning.com or from trevorhastie.github.io/ISLR/.
Other Books (non-free)
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
by Aurélien Géron
At the time of writing, I have just started with this title: I wish I had done this earlier! Beautifully and clearly written, it covers the most relevant Machine Learning topics from an applied point of view with just the right amount of theory.
Free Online Learning
Andrew Ng’s Machine Learning course on Coursera
Probably the most popular MOOC on machine learning.