dataset_hub._core.get_data
- dataset_hub._core.get_data.get_data(dataset_name, task_type, verbose)[source]
Core backend function used by all .get_<dataset_name>() functions to load datasets.
- This function:
1. Loads the dataset configuration using ConfigFactory.
2. Instantiates the appropriate Provider via dataset_hub._core.provider.ProviderFactory.
3. Loads the dataset using dataset_hub._core.provider.
4. (Optional) Logs a link to the dataset documentation once per session if verbose is enabled (either via the argument or Library Settings).
- Parameters:
dataset_name (str) – The name of the dataset (corresponding to the YAML config file).
task_type (str) – The type of task (e.g., “classification”, “regression”).
verbose (bool, optional) – Whether to print dataset information and a link to the documentation. If None, the global library setting is used.
- Returns:
A consistent wrapper containing the loaded data.
Example:
    dataset = get_data("titanic", "classification")
    df = dataset["data"]  # pd.DataFrame
- Return type:
DataBundle
- Raises:
FileNotFoundError – If the dataset configuration YAML file is not found.
ValueError – If the provider type is unknown or misconfigured.
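Callers may want to treat the two documented exceptions differently. The following is a minimal sketch of one way to do that; `load_or_none` is a hypothetical helper, and `stub_loader` is a stand-in with the same positional arguments as get_data so that the example runs without the library installed.

```python
from typing import Any, Callable, Dict, Optional

Loader = Callable[[str, str], Dict[str, Any]]


def load_or_none(loader: Loader, dataset_name: str,
                 task_type: str) -> Optional[Dict[str, Any]]:
    """Return the loaded bundle, or None if either documented error occurs."""
    try:
        return loader(dataset_name, task_type)
    except FileNotFoundError as exc:
        # The configuration file for this dataset could not be found.
        print(f"Unknown dataset {dataset_name!r}: {exc}")
    except ValueError as exc:
        # The configuration exists but names an unusable provider.
        print(f"Bad provider configuration for {dataset_name!r}: {exc}")
    return None


def stub_loader(dataset_name: str, task_type: str) -> Dict[str, Any]:
    # Stand-in for get_data: only "titanic" is known here.
    if dataset_name != "titanic":
        raise FileNotFoundError(f"{task_type}/{dataset_name}.yaml")
    return {"data": []}
```

With the real library, the `get_data` function documented above could be passed in place of `stub_loader`.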