I hope the previous chapters gave you a basic but sound foundation for using machine learning models in research or in applications in your industry. Above all, I hope this book sparked your interest in exploring machine learning further.
This leads to the question: where to go next? What is a good strategy for extending your machine learning knowledge? There are many answers to this question. We will focus on six strategies:
Whatever strategy you use to extend your knowledge in machine learning, I wish you an enjoyable journey.