A major theme throughout the book is the detection of overfitting. Techniques to manage overfitting are discussed in detail. These include data preprocessing, normalization, standardization, transformation of distributions, feature selection, train/test splits, cross-validation, goodness of fit, and error metrics.
Linear and nonlinear models are described, with detailed examples of their use on real data.
The illustrations are superb. Complete R code is included.
This book is a very readable handbook that I highly recommend to everyone developing predictive models.
I wish I'd had this book 10 years ago, and the discipline to have sat down and read it thoroughly. It is well written, has beautiful plots that are worthy of a book on visualization all by themselves, has great coverage of topics, and is easy to understand.
There is a natural comparison to be made to The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics). I found this book much, much better. Where ESLII was fractured and seemed to jump from point to point with no explanation, APM proceeded in a well-thought-out manner. ESLII used some nonstandard notation and assumptions, whereas APM used notation familiar to anyone with a background in statistics and linear algebra. To be fair, it may be that I'll return to ESL after having read APM and be able to bridge the leaps the authors made with material I've learned from this book.
The pros:
- Gives a solid introduction to the problem prediction is trying to solve.
- Provides a framework for evaluating prediction results, using a consistent data set across all problems.
- Has citations and references for further reading.
- Does a good job of contrasting machine learning (black-box models) with classical statistics (interpretability). See Breiman's "Statistical Modeling: The Two Cultures" paper for some great insights into this phenomenon.

The cons:
- A bit light on theory, especially proofs and details behind the models. I feel this is a bit of a pro, though, since the citations for the work are provided, and the theorems and proofs are there if you are interested in them.
My name is Matt. I'm an educator who focuses on data science in business applications. My background is in business and mechanical engineering, not computer science. I don't have a PhD. I'm an ordinary person who fell in love with data science. I've since started an education business that brings applied data science courses to business-minded people to help them solve real-world problems.
I purchased Applied Predictive Modeling after visiting a high-performance hedge fund that employs a number of brilliant minds. This book appeared in most of the workspaces, so I decided to pick up a copy and read it for myself.
I read the first half of APM on vacation, and honestly, I couldn't put it down. The book goes into detail on a wide range of models, many of which I'd never heard of before. Beyond this, APM provides the R code showing exactly how to implement the models. For me, this application focus is valuable.
The book weaves in many case studies, ranging from pharmaceuticals to business, and even using machine learning to find the optimal concrete formula.
I will say that this book is not for complete beginners, but once you get through the basics, this is a great book from two of the best minds in modeling. For beginners, I recommend R for Data Science.
Hope this helps,
Matt

While much of this was review for me, there are always gems to be found in comprehensive texts like this. I would have loved to have this book six or seven years ago. Even though I don't agree with the entirety of the espoused approach (see, e.g., Practical Data Science with R for an alternative approach to cross-validation and train/test/holdout sets), it is a valid one, and I highly recommend this book to anyone implementing supervised learning models. In particular, the author's caret package (a perfect companion to this book) provides a great basis for data and model pipelining that I would dearly love to see other ML frameworks adopt (scikit-learn is close, but not quite there). It will also provide a practical baseline for those building custom model pipelines and frameworks, or evaluating what is available off the shelf.