dataset_hub._core.provider.Provider
- class dataset_hub._core.provider.provider.Provider(config)
Bases: ABC, Generic[UserDataT]
Abstract base class for all data providers.
A provider loads a dataset from some source (URL, file, built-in dataset, etc.) according to a configuration model defined in ConfigClass.
- The provider lifecycle consists of:
1. Normalization: convert a raw dict into a validated config dict using the ConfigClass dataclass.
2. Optional transformation: post-process or enrich the normalized config.
3. Data loading: implemented in load().
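The three lifecycle steps can be sketched in plain Python. This is a minimal, self-contained stand-in, not dataset_hub's actual implementation: the `_normalize` and `_transform` hook names, the use of `dataclasses.asdict` for normalization, and the example `RangeConfig`/`RangeProvider` classes are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Any, ClassVar, Dict, List, Type


@dataclass
class ProviderConfig:
    """Stand-in for the documented base configuration class."""


class Provider(ABC):
    """Hypothetical sketch of the provider lifecycle (not dataset_hub's code)."""

    ConfigClass: ClassVar[Type[ProviderConfig]] = ProviderConfig

    def __init__(self, config: Dict[str, Any]) -> None:
        normalized = self._normalize(config)       # 1. normalization
        self.config = self._transform(normalized)  # 2. optional transformation

    def _normalize(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        # Constructing the dataclass rejects unknown or missing keys
        # (TypeError) and fills in declared defaults.
        return asdict(self.ConfigClass(**raw))

    def _transform(self, config: Dict[str, Any]) -> Dict[str, Any]:
        # Hook for subclasses; the default returns the config unchanged.
        return config

    @abstractmethod
    def load(self) -> Any:                         # 3. data loading
        """Return the dataset described by self.config."""


@dataclass
class RangeConfig(ProviderConfig):
    # Hypothetical config: one required field and one with a default.
    stop: int
    start: int = 0


class RangeProvider(Provider):
    """Hypothetical provider that 'loads' a list of integers."""

    ConfigClass = RangeConfig

    def _transform(self, config: Dict[str, Any]) -> Dict[str, Any]:
        # Enrich the normalized config with a derived value.
        config["size"] = config["stop"] - config["start"]
        return config

    def load(self) -> List[int]:
        return list(range(self.config["start"], self.config["stop"]))


provider = RangeProvider({"stop": 3})
print(provider.config)  # {'stop': 3, 'start': 0, 'size': 3}
print(provider.load())  # [0, 1, 2]
```

In this sketch, building the dataclass from `**raw` is what gives normalization its teeth: unknown keys and missing required fields both raise `TypeError`, and omitted optional fields pick up their defaults.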
- Parameters:
config (Dict[str, Any]) – Raw configuration dictionary, normalized using ConfigClass.
- config
The validated and optionally transformed configuration dictionary.
- Type:
Dict[str, Any]
- ConfigClass
A dataclass defining the structure of the provider’s configuration. Must be overridden by subclasses.
- Type:
Type[ProviderConfig]
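The requirement that subclasses override ConfigClass could be enforced at construction time, as in the sketch below. It assumes a `None` default on the base class and a check inside `__init__`; neither is confirmed by this documentation, and `UrlConfig`, `UrlProvider`, and `Forgetful` are hypothetical examples.

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Any, ClassVar, Dict, Optional, Type


class ProviderConfig:
    """Stand-in base class for provider configuration dataclasses."""


class Provider(ABC):
    # Assumption: None marks "ConfigClass not overridden yet".
    ConfigClass: ClassVar[Optional[Type[ProviderConfig]]] = None

    def __init__(self, config: Dict[str, Any]) -> None:
        config_cls = type(self).ConfigClass
        if config_cls is None or not issubclass(config_cls, ProviderConfig):
            raise TypeError(
                f"{type(self).__name__} must set ConfigClass to a ProviderConfig subclass"
            )
        self.config = asdict(config_cls(**config))

    @abstractmethod
    def load(self) -> Any:
        """Return the dataset described by self.config."""


@dataclass
class UrlConfig(ProviderConfig):
    url: str
    timeout: float = 10.0


class UrlProvider(Provider):
    ConfigClass = UrlConfig

    def load(self) -> Any:
        # Placeholder: a real provider would download and parse the URL.
        return self.config["url"]


class Forgetful(Provider):
    """Implements load() but never overrides ConfigClass."""

    def load(self) -> Any:
        return None


print(UrlProvider({"url": "https://example.com/data.csv"}).config)
# {'url': 'https://example.com/data.csv', 'timeout': 10.0}

try:
    Forgetful({})
except TypeError as exc:
    print(exc)  # Forgetful must set ConfigClass to a ProviderConfig subclass
```

Checking in `__init__` keeps the sketch short; a class-definition-time check (for example in `__init_subclass__`) would catch the mistake earlier but must allow for intermediate abstract subclasses.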
- abstract load()
Load and return the dataset according to the provider’s configuration.
This method must be implemented by all concrete providers.
- Returns:
The loaded dataset object. Typically a pd.DataFrame for single-table datasets, but can be any data type (e.g., dict, list, graph, array).
- Return type:
Any
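Because load() is declared abstract and typed as returning Any, a concrete provider must implement it but may return whatever structure suits the data. The sketch below uses a simplified stand-in base class (normalization omitted) and a hypothetical `EdgeListProvider` that returns a graph as an adjacency dict instead of a `pd.DataFrame`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Provider(ABC):
    """Simplified stand-in that keeps only the documented abstract load()."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config  # normalization is skipped in this sketch

    @abstractmethod
    def load(self) -> Any:
        """Return the dataset described by self.config."""


class EdgeListProvider(Provider):
    """Hypothetical provider that returns a graph as an adjacency dict."""

    def load(self) -> Dict[str, List[str]]:
        graph: Dict[str, List[str]] = {}
        for src, dst in self.config["edges"]:
            graph.setdefault(src, []).append(dst)
            graph.setdefault(dst, [])  # make sure sink nodes appear as keys
        return graph


class Incomplete(Provider):
    """Subclass that forgets to implement load()."""


print(EdgeListProvider({"edges": [("a", "b"), ("a", "c")]}).load())
# {'a': ['b', 'c'], 'b': [], 'c': []}

try:
    Incomplete({})
except TypeError as exc:
    print("cannot instantiate:", exc)
```

Python's `abc` machinery raises `TypeError` when instantiating a subclass that leaves `load()` unimplemented, so the omission surfaces at construction rather than when the data is first requested.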
- class dataset_hub._core.provider.provider.ProviderConfig
Bases: object
Base class for all provider configuration models.
Concrete provider configurations must inherit from this class. These dataclasses define the structure, defaults, and type hints for a provider’s configuration, and are used by the Provider class to validate and normalize incoming config dictionaries.
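One way a ProviderConfig dataclass can drive validation and normalization of an incoming dict is sketched below. The `normalize` helper, the `CsvConfig` fields, and the shallow type check against annotations are assumptions for illustration; this documentation does not specify how the Provider class performs these steps.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Type, get_type_hints


class ProviderConfig:
    """Stand-in base class; concrete configs are dataclasses inheriting from it."""


@dataclass
class CsvConfig(ProviderConfig):
    # Hypothetical config: one required field and two fields with defaults.
    path: str
    sep: str = ","
    header: bool = True


def normalize(config_cls: Type[ProviderConfig], raw: Dict[str, Any]) -> Dict[str, Any]:
    """Validate `raw` against `config_cls`; return a plain dict with defaults filled in."""
    known = {f.name for f in fields(config_cls)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")

    hints = get_type_hints(config_cls)
    for key, value in raw.items():
        expected = hints[key]
        # Shallow check: only plain classes such as str or bool are enforced;
        # typing generics such as List[str] are not classes and are skipped.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{key!r} should be {expected.__name__}, got {type(value).__name__}"
            )

    # A missing required field makes the dataclass constructor raise TypeError.
    return asdict(config_cls(**raw))


print(normalize(CsvConfig, {"path": "data.csv"}))
# {'path': 'data.csv', 'sep': ',', 'header': True}
```

The dataclass carries all three pieces of information the description mentions: field names give the structure, field defaults fill in omitted keys, and type hints are available at runtime through `get_type_hints`.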