dataset_hub._core.get_data

dataset_hub._core.get_data.get_data(dataset_name, task_type, verbose)

Core backend function called by every get_<dataset_name>() function to load its dataset.

This function:
  1. Loads the dataset configuration using ConfigFactory.

  2. Instantiates the appropriate Provider via dataset_hub._core.provider.ProviderFactory.

  3. Loads the dataset through the provider instance created in step 2.

  4. (Optional) Logs a link to the dataset documentation once per session if verbose is enabled, either through the verbose argument or through the global Library Settings.
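Steps 1 to 3 can be pictured with the minimal, self-contained sketch below. Everything in it is hypothetical: CONFIGS stands in for the YAML files read by ConfigFactory, PROVIDERS stands in for ProviderFactory, and InlineProvider and DataBundle are simplified placeholders rather than the library's real classes. Logging (step 4) and error handling are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical in-memory stand-in for the per-dataset YAML configs.
CONFIGS: Dict[str, Dict[str, Any]] = {
    "toy": {"provider": "inline", "rows": [{"x": 1, "y": 0}, {"x": 2, "y": 1}]},
}


@dataclass
class DataBundle:
    """Hypothetical wrapper that supports bundle["data"] like the documented example."""
    parts: Dict[str, Any] = field(default_factory=dict)

    def __getitem__(self, key: str) -> Any:
        return self.parts[key]


class InlineProvider:
    """Hypothetical provider that serves rows embedded directly in the config."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def load(self, task_type: str) -> DataBundle:
        return DataBundle({"data": self.config["rows"], "task_type": task_type})


# Hypothetical registry playing the role of ProviderFactory.
PROVIDERS: Dict[str, Callable[[Dict[str, Any]], InlineProvider]] = {
    "inline": InlineProvider,
}


def get_data_sketch(dataset_name: str, task_type: str) -> DataBundle:
    config = CONFIGS[dataset_name]                     # 1. load the configuration
    provider = PROVIDERS[config["provider"]](config)   # 2. instantiate the provider
    return provider.load(task_type)                    # 3. load the dataset
```

Calling get_data_sketch("toy", "classification")["data"][0]["x"] returns 1. Keeping the provider lookup in a registry keyed by the config's provider type is what lets a single backend function serve every dataset.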

Parameters:
  • dataset_name (str) – The name of the dataset (corresponding to the YAML config file).

  • task_type (str) – The type of task (e.g., “classification”, “regression”).

  • verbose (bool, optional) – Whether to print dataset information and the documentation link. If None, the global library setting is used.
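The verbose fallback and the once-per-session logging can be sketched as follows. GLOBAL_VERBOSE, resolve_verbose, and maybe_log_docs are hypothetical names, not the library's Library Settings API, and the sketch assumes "once per session" means once per dataset name within a single Python process.

```python
import logging
from typing import Optional, Set

logger = logging.getLogger("dataset_hub_sketch")

# Hypothetical global setting standing in for the library's own settings.
GLOBAL_VERBOSE = True

# Dataset names whose documentation link has already been logged in this process.
_announced: Set[str] = set()


def resolve_verbose(verbose: Optional[bool]) -> bool:
    """An explicit argument wins; None falls back to the global setting."""
    return GLOBAL_VERBOSE if verbose is None else verbose


def maybe_log_docs(dataset_name: str, verbose: Optional[bool]) -> bool:
    """Log the documentation pointer at most once per dataset; return True if logged."""
    if not resolve_verbose(verbose) or dataset_name in _announced:
        return False
    _announced.add(dataset_name)
    logger.info("See the documentation for dataset %r", dataset_name)
    return True
```

With the global setting on, the first maybe_log_docs("titanic", None) call logs and returns True, a repeat call returns False, and passing verbose=False suppresses logging regardless of the global setting.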

Returns:

A consistent wrapper containing the loaded data.

Example:

from dataset_hub._core.get_data import get_data

dataset = get_data("titanic", "classification")
df = dataset["data"]  # pd.DataFrame

Return type:

DataBundle

Raises:
  • FileNotFoundError – If the dataset configuration YAML file is not found.

  • ValueError – If the provider type is unknown or misconfigured.
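The two documented failure modes can be reproduced and handled on the caller side with a sketch like the one below. KNOWN_PROVIDERS, find_config, check_provider, and try_load are hypothetical helpers, not part of dataset_hub; to stay within the standard library, the parsed config is passed in as a dict instead of being read from the YAML file.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical provider types; the real ProviderFactory defines its own set.
KNOWN_PROVIDERS = {"inline", "url"}


def find_config(config_dir: Path, dataset_name: str) -> Path:
    """Mirror the documented FileNotFoundError for a missing YAML config."""
    path = config_dir / f"{dataset_name}.yaml"
    if not path.is_file():
        raise FileNotFoundError(f"no config for {dataset_name!r} at {path}")
    return path


def check_provider(config: Dict[str, Any]) -> str:
    """Mirror the documented ValueError for an unknown provider type."""
    provider_type = config.get("provider")
    if provider_type not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider type: {provider_type!r}")
    return provider_type


def try_load(config_dir: Path, dataset_name: str, config: Dict[str, Any]) -> Optional[str]:
    """Caller-side pattern: report each documented exception separately."""
    try:
        find_config(config_dir, dataset_name)
        return check_provider(config)
    except FileNotFoundError as exc:
        print(f"dataset not available: {exc}")
    except ValueError as exc:
        print(f"bad dataset config: {exc}")
    return None
```

Catching the two exceptions separately lets a caller tell a missing dataset apart from a broken configuration.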