DatasetHub
DatasetHub provides a simple, unified API for loading machine learning datasets
through functions like get_<dataset>(), each of which returns data in a familiar format.
Note
New here? Start with the Quick Start to load your first dataset.
The goal is to remove unnecessary friction when exploring new ML tasks and give both beginners and practitioners a consistent, predictable way to access well-documented datasets with ready-to-run baselines.
What DatasetHub solves
Working with open datasets is often harder than it should be. DatasetHub reduces this overhead by offering:
One unified API: get_<dataset>() for any supported task.
Easy entry point for students and beginners.
Detailed documentation with descriptions, sources, and examples.
Consistent structure across datasets (columns, target, metadata, baseline).
Starter baselines so you can experiment immediately.
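To make the "consistent structure" idea concrete, here is a minimal Python sketch. It does not call DatasetHub itself: DatasetBundle, get_toy_dataset(), and feature_columns() are hypothetical names invented for illustration, and the object actually returned by a get_<dataset>() function may be shaped differently. The sketch only shows how columns, target, metadata, and a baseline can travel together so that downstream code looks the same for every dataset.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasetBundle:
    """Hypothetical container; NOT DatasetHub's actual return type."""

    columns: list[str]
    target: str
    rows: list[dict[str, Any]] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)
    baseline: dict[str, Any] = field(default_factory=dict)


def get_toy_dataset() -> DatasetBundle:
    """Stand-in for a get_<dataset>() loader, using in-memory synthetic rows."""
    rows = [
        {"x": 0.0, "y": 0},
        {"x": 1.0, "y": 1},
        {"x": 2.0, "y": 1},
    ]
    # A trivial starter baseline: always predict the most common label.
    labels = [row["y"] for row in rows]
    majority_label, majority_count = Counter(labels).most_common(1)[0]
    return DatasetBundle(
        columns=["x", "y"],
        target="y",
        rows=rows,
        metadata={"source": "synthetic example", "n_rows": len(rows)},
        baseline={
            "model": "majority_class",
            "prediction": majority_label,
            "accuracy": majority_count / len(labels),
        },
    )


def feature_columns(bundle: DatasetBundle) -> list[str]:
    """Return every column except the target, in the original order."""
    return [name for name in bundle.columns if name != bundle.target]


if __name__ == "__main__":
    bundle = get_toy_dataset()
    print("features:", feature_columns(bundle))
    print("target:", bundle.target)
    print("baseline:", bundle.baseline)
```

Because every loader in this sketch returns the same fields, helper code such as feature_columns() never needs to know which dataset it received. That is the benefit the list above describes.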
Supported datasets
See all supported datasets here: Datasets
Or jump directly to the quick examples: