Appendix B — Datasets

In general, it is better to avoid datasets from Kaggle, the UCI Machine Learning Repository, and other commonly used sources. From a data science perspective, using a dataset exactly as such a source provides it means that almost all the important decisions have already been made, and those decisions may be undocumented. From a career perspective, it does not set your portfolio apart, because so many other people use the same datasets. Some alternatives include: