Online Appendix D — Datasets

Students often struggle to pick a dataset. In general, it is better to stay away from datasets on Kaggle, the UCI Machine Learning Repository, and other commonly used sources. From a data science perspective, using a dataset as it is made available by such a source means that almost all the important decisions have already been made for you, and those decisions may be undocumented. From a career perspective, it does not set your portfolio apart, because many other people use the same datasets. Some alternatives include: