dataset_hub._core.provider.DataFrameProvider
- class dataset_hub._core.provider.dataframe_provider.DataFrameProvider(config)
Bases:
Provider[DataFrame]
Provider that loads a dataset from a source (URL or file) and returns it as a pandas DataFrame.
Regardless of the underlying file format, the output is always returned as:
{"data": pandas.DataFrame}
Supported formats depend on the implementation of read_dataframe.
- Parameters:
config (Dict[str, Any]) – Provider configuration, matching the DataFrameProviderConfig schema.
- ConfigClass
alias of DataFrameProviderConfig
- load()
Fetch and load the dataset specified in the configuration.
- Returns:
The loaded pandas DataFrame.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the file cannot be read or the format is unsupported.
- read_dataframe(path_or_url, format, read_kwargs)
Universal function to read a DataFrame from various file formats.
- Parameters:
path_or_url (str) – Local file path or URL to the data.
format (str) – Data format ('csv', 'parquet', 'excel', 'json').
read_kwargs (dict, optional) – Additional parameters to pass to the corresponding pandas reader function.
- Returns:
Loaded DataFrame.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the specified format is not supported.
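The format dispatch described for read_dataframe can be illustrated with a minimal sketch. This is not the library's implementation: the function name read_dataframe_sketch and the _READERS table are hypothetical, and the mapping of each format name to a pandas reader is an assumption based on the four formats listed above.

```python
import io
from typing import Any, Dict, Optional

import pandas as pd

# Hypothetical dispatch table (an assumption, not the library's code):
# each supported format name maps to the pandas reader that handles it.
_READERS = {
    "csv": pd.read_csv,
    "parquet": pd.read_parquet,
    "excel": pd.read_excel,
    "json": pd.read_json,
}


def read_dataframe_sketch(
    path_or_url: Any,
    format: str,
    read_kwargs: Optional[Dict[str, Any]] = None,
) -> pd.DataFrame:
    """Read a DataFrame by dispatching on the format name."""
    reader = _READERS.get(format)
    if reader is None:
        raise ValueError(
            f"Unsupported format: {format!r}. Expected one of {sorted(_READERS)}."
        )
    # read_kwargs is optional; forward it unchanged to the pandas reader.
    return reader(path_or_url, **(read_kwargs or {}))


# Usage: an in-memory buffer stands in for a local path or URL.
csv_source = io.StringIO("id;name\n1;ada\n2;grace\n")
df = read_dataframe_sketch(csv_source, "csv", {"sep": ";"})
```

Keeping the readers in a table means that supporting another format is a one-line change, and the ValueError message can list the accepted names.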
- class dataset_hub._core.provider.dataframe_provider.DataFrameProviderConfig(source, read_kwargs=<factory>)
Bases:
ProviderConfig
Configuration schema for DataFrameProvider.
- Parameters:
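source (Dict[str, Any]) – Source configuration with type, url, and format.
read_kwargs (Dict[str, Any]) – Optional keyword arguments forwarded directly to the corresponding pandas reader.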
- source
Source configuration with type, url, and format.
- Type:
Dict[str, Any] | SourceConfig
- read_kwargs
Optional keyword arguments forwarded directly to the corresponding pandas reader.
- Type:
Dict[str, Any]
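The shape of this configuration can be sketched with a standard-library dataclass. This is an illustration only: ConfigSketch is a hypothetical stand-in, not the real DataFrameProviderConfig, and the example URL and values are made up. The `<factory>` default in the signature above suggests that read_kwargs gets a fresh default per instance, which the sketch models with default_factory.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ConfigSketch:
    """Hypothetical stand-in mirroring the two documented fields."""

    # Source description; the documented keys are type, url, and format.
    source: Dict[str, Any]
    # default_factory gives every instance its own empty dict.
    read_kwargs: Dict[str, Any] = field(default_factory=dict)


# Example values are assumptions chosen for illustration.
plain = ConfigSketch(
    source={"type": "url", "url": "https://example.com/data.csv", "format": "csv"}
)
tuned = ConfigSketch(
    source={"type": "url", "url": "https://example.com/data.csv", "format": "csv"},
    read_kwargs={"sep": ";"},
)
```

Because the default is built per instance, mutating plain.read_kwargs does not leak into other configurations.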