This is a new series on the blog where I go over some of the chapters from “An Introduction to Statistical Learning”. My objective is to leave these notes for myself and for future readers as a summarized version of the chapters, with more examples and applications than the original textbook, plus solutions to some of the exercises to really grasp the concepts. If I made any mistakes or you have a better suggestion, please leave it in the comments so I can correct it.
Background
I would be lying if I said, like some online courses do, that no mathematical background is required to follow along. Since this is statistical learning, to understand the topics clearly and to appreciate the complexity of both the problems and their solutions, I recommend the following background:
- Linear Algebra
- Calculus
- Probability and Statistics
- Computer science theory
As the title says, the textbook I plan to follow is “An Introduction to Statistical Learning” (ISL), which can be downloaded for free at https://www.statlearning.com/
What to expect
No one denies that neural networks provide a full array of tools to solve a variety of problems, but sometimes there is no need to go that deep. Much about neural networks remains to be discovered: the statistical theory behind them is still not well understood, so they work, but they can end up being a convoluted solution to an easier problem. In industry, people still use the data science methods that I will cover in this series of posts.
Here are a couple of insights from the Kaggle 2020 survey data that I consider important for anyone interested in getting into a data analysis position (the raw data is available at https://www.kaggle.com/competitions/kaggle-survey-2020/data):
- Python is the most used programming language for data science projects in industry. Hence I will follow the Python version of the book (ISL) and write the code snippets in Python so they can be replicated.
- Almost half of the people who answered the survey said they still use logistic and linear regression as their primary data science model.
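For readers who want to check a figure like this against the raw responses, here is a minimal sketch of how such a share could be tallied using only the standard library. It assumes that a multi-select question is spread over one column per option and that a cell is left empty when the option was not ticked. The function name `share_selecting`, the column names `Q17_Part_1` and `Q17_Part_2`, and the sample rows are all hypothetical and only illustrate the call; check the actual headers in the downloaded file, which may also carry an extra row of question text below the header that needs to be skipped.

```python
import csv
import io


def share_selecting(rows, option_column, question_prefix):
    """Fraction of respondents who ticked `option_column`, among those
    who ticked at least one option of the question `question_prefix`."""
    answered = 0
    selected = 0
    for row in rows:
        # Assumption: one column per option, empty when the option was not ticked.
        parts = [v for k, v in row.items() if k.startswith(question_prefix)]
        if not any((v or "").strip() for v in parts):
            continue  # respondent skipped the whole question
        answered += 1
        if (row.get(option_column) or "").strip():
            selected += 1
    return selected / answered if answered else 0.0


# Tiny made-up sample, only to show the call; this is NOT real survey data.
sample = io.StringIO(
    "Q17_Part_1,Q17_Part_2\n"
    "Linear or Logistic Regression,\n"
    ",Decision Trees or Random Forests\n"
    "Linear or Logistic Regression,Decision Trees or Random Forests\n"
    ",\n"
)
share = share_selecting(csv.DictReader(sample), "Q17_Part_1", "Q17_Part_")
print(f"{share:.2f}")
```

To run it on the real data, the `io.StringIO` sample would be replaced by an opened file handle for the downloaded CSV. Respondents who skipped the question are left out of the denominator, which is one possible choice; dividing by all respondents would give a smaller share.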
Having said that, I will try to post two lectures per week. That way there is also time to process the content and try the different experiments from the book.