Appendix B — Datasets

In general, it is better to avoid datasets from Kaggle, the UCI Machine Learning Repository, and other commonly used sources. For the papers in Appendix D, you must not use a dataset from any of these sources. From a data science perspective, using a dataset as provided by such a source means that almost all of the important decisions have already been made, and they may be undocumented. From a career perspective, it does not set your portfolio apart, because many other people use the same datasets. Some alternatives include: