dataset_hub._core.provider.Provider

class dataset_hub._core.provider.provider.Provider(config)

Bases: ABC, Generic[UserDataT]

Abstract base class for all data providers.

A provider loads a dataset from some source (URL, file, built-in dataset, etc.) according to a configuration model defined in ConfigClass.

The provider lifecycle consists of:
  1. Normalization — convert a raw dict into a validated config dict using the dataclass ConfigClass.

  2. Optional transformation — post-process or enrich the normalized config.

  3. Data loading — implemented in load().
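The three lifecycle steps can be sketched as follows. This is a minimal, self-contained illustration built on stand-in classes, not the library's implementation. The hook names `_normalize` and `_transform`, the `UrlConfig` fields, and the `UrlProvider` class are all hypothetical, and the real `Provider` may validate its input differently.

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Any, Dict, Type


@dataclass
class ProviderConfig:
    """Stand-in for the base configuration class."""


class Provider(ABC):
    """Stand-in base class showing the order of the lifecycle steps."""

    ConfigClass: Type[ProviderConfig] = ProviderConfig

    def __init__(self, config: Dict[str, Any]) -> None:
        normalized = self._normalize(config)       # 1. normalization
        self.config = self._transform(normalized)  # 2. optional transformation

    def _normalize(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        # Building the dataclass fills in defaults and raises TypeError
        # for unknown or missing keys (an assumed validation strategy).
        return asdict(self.ConfigClass(**raw))

    def _transform(self, config: Dict[str, Any]) -> Dict[str, Any]:
        # Hypothetical hook; the default leaves the config unchanged.
        return config

    @abstractmethod
    def load(self) -> Any:  # 3. data loading
        ...


@dataclass
class UrlConfig(ProviderConfig):
    url: str
    timeout: float = 10.0


class UrlProvider(Provider):
    ConfigClass = UrlConfig

    def _transform(self, config: Dict[str, Any]) -> Dict[str, Any]:
        # Example enrichment: strip stray whitespace from the URL.
        return {**config, "url": config["url"].strip()}

    def load(self) -> Any:
        # A real provider would fetch the data; this sketch only echoes the config.
        return {"source": self.config["url"], "timeout": self.config["timeout"]}


provider = UrlProvider({"url": " https://example.com/data.csv "})
result = provider.load()
```

In this sketch, normalization and transformation both run in the constructor, so `provider.config` is already validated and enriched by the time `load()` is called.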

Parameters:

config (Dict[str, Any]) – The raw configuration dictionary, which is normalized against ConfigClass.

config

The validated and optionally transformed configuration dictionary.

Type:

Dict[str, Any]

ConfigClass

A dataclass defining the structure of the provider’s configuration. Must be overridden by subclasses.

Type:

Type[ProviderConfig]

abstract load()

Load and return the dataset according to the provider’s configuration.

This method must be implemented by all concrete providers.

Returns:

The loaded dataset object. Typically a pd.DataFrame for single-table datasets, but can be any data type (e.g., dict, list, graph, array).

Return type:

Any
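As a sketch of what a concrete `load()` might look like, the example below uses a stand-in abstract base (not the library's `Provider`) and a hypothetical `InlineCsvProvider` that parses CSV text held in its config under an assumed `text` key. It returns a list of dicts rather than a pd.DataFrame only to stay dependency-free.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Provider(ABC):
    """Stand-in base class; only the abstract load() contract is modeled."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    @abstractmethod
    def load(self) -> Any:
        """Return the dataset described by self.config."""


class InlineCsvProvider(Provider):
    """Hypothetical provider that reads CSV text stored under config['text']."""

    def load(self) -> List[Dict[str, str]]:
        reader = csv.DictReader(io.StringIO(self.config["text"]))
        return [dict(row) for row in reader]


rows = InlineCsvProvider({"text": "name,score\nada,9\nlin,7\n"}).load()

# The abstract base cannot be instantiated because load() is not implemented.
abstract_error = None
try:
    Provider({})
except TypeError as exc:
    abstract_error = exc
```

Because `load()` is declared abstract, Python refuses to instantiate any subclass that does not implement it, which is what enforces the "must be implemented" requirement.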

class dataset_hub._core.provider.provider.ProviderConfig

Bases: object

Base class for all provider configuration models.

Concrete provider configurations must inherit from this class. These dataclasses define the structure, defaults, and type hints for a provider’s configuration, and are used by the Provider class to validate and normalize incoming config dictionaries.
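A configuration model in this style might look like the sketch below. `ProviderConfig` here is a local stand-in, and `CsvConfig` with its fields is hypothetical. The `asdict(CsvConfig(**raw))` step is one assumed way a provider could validate and normalize a raw dict, not necessarily how the library does it.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ProviderConfig:
    """Stand-in for the base configuration class."""


@dataclass
class CsvConfig(ProviderConfig):
    """Hypothetical config: one required field and two with defaults."""

    path: str
    delimiter: str = ","
    encoding: Optional[str] = None


def normalize(raw: Dict[str, Any]) -> Dict[str, Any]:
    # Defaults are filled in; unknown or missing keys raise TypeError.
    return asdict(CsvConfig(**raw))


normalized = normalize({"path": "data.csv", "delimiter": ";"})
field_names = [f.name for f in fields(CsvConfig)]
```

Plain dataclasses do not enforce type hints at runtime, so this sketch checks only which keys are present. Checking value types would need additional code.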