Household Electric Power Consumption
- dataset_hub.timeseries.datasets.get_household_power(verbose=None)
Load and return the Individual Household Electric Power Consumption dataset.
Measurements of electric power consumption in a single household with a one-minute sampling rate over a period of almost 4 years (December 2006 – November 2010). Each record contains several electrical and sub-metering measurements.
This dataset is designed for minute-level analysis of total household energy consumption, capturing overall usage patterns of a single home.
Original dataset: Individual Household Electric Power Consumption, available on the UCI Machine Learning Repository.
Columns:
- Date (str): Date of measurement in format dd/mm/yyyy
- Time (str): Time of measurement in format hh:mm:ss
- Global_active_power (float): Household global minute-averaged active power (kilowatt)
- Global_reactive_power (float): Household global minute-averaged reactive power (kilowatt)
- Voltage (float): Minute-averaged voltage (volt)
- Sub_metering_1 (float): Energy sub-metering No. 1 (kitchen: dishwasher, oven, microwave; watt-hour of active energy)
- Sub_metering_2 (float): Energy sub-metering No. 2 (laundry room: washing machine, tumble-drier, refrigerator, light; watt-hour)
- Sub_metering_3 (float): Energy sub-metering No. 3 (electric water-heater, air-conditioner; watt-hour)
- Global_intensity (float): Household global minute-averaged current intensity (ampere)
Notes
Missing values are present in approximately 1.25% of the rows.
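As a minimal sketch of how the share of incomplete rows could be measured, the snippet below uses a tiny synthetic frame (the values and the two-column layout are assumptions for illustration, not the real dataset). It also shows that pandas skips NaN by default when averaging, which is why the monthly means in the baseline can still be computed.

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame with one fully missing row (illustration only).
toy = pd.DataFrame(
    {
        "Global_active_power": [4.216, np.nan, 5.374, 5.388],
        "Voltage": [234.84, np.nan, 233.29, 233.74],
    }
)

# Share of rows with at least one missing measurement, in percent.
missing_share = toy.isna().any(axis=1).mean() * 100
print(f"Rows with missing values: {missing_share:.2f}%")

# Column means skip NaN by default (skipna=True).
mean_power = toy["Global_active_power"].mean()
print(f"Mean active power over non-missing rows: {mean_power:.3f} kW")
```

The same two expressions should work on the full DataFrame returned by the loader.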
Active energy not covered by sub-meterings 1–3 can be calculated as:
(Global_active_power * 1000 / 60 - Sub_metering_1 - Sub_metering_2 - Sub_metering_3) in watt-hour.
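The formula above converts minute-averaged kilowatts to watt-hours per minute (multiply by 1000, divide by 60) and subtracts the three sub-meters. A small sketch of it, using two rows copied from the sample output on this page; the helper name `unmetered_energy_wh` is hypothetical and not part of the library:

```python
import pandas as pd

# Two sample rows taken from the df.head() output shown on this page.
sample = pd.DataFrame(
    {
        "Global_active_power": [4.216, 5.360],  # kilowatt, minute-averaged
        "Sub_metering_1": [0.0, 0.0],           # watt-hour
        "Sub_metering_2": [1.0, 1.0],           # watt-hour
        "Sub_metering_3": [17.0, 16.0],         # watt-hour
    }
)


def unmetered_energy_wh(frame: pd.DataFrame) -> pd.Series:
    """Watt-hours per minute not captured by sub-meters 1-3 (hypothetical helper)."""
    total_wh = frame["Global_active_power"] * 1000 / 60
    sub_cols = ["Sub_metering_1", "Sub_metering_2", "Sub_metering_3"]
    return total_wh - frame[sub_cols].sum(axis=1)


other_wh = unmetered_energy_wh(sample)
print(other_wh.round(2).tolist())
```

For the first row this gives roughly 70.27 Wh in total minus 18 Wh sub-metered, so about 52.27 Wh consumed elsewhere in the house.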
- Parameters:
verbose (bool, optional) – If True, the function prints a link to the dataset documentation (for example, this page) in the log output after loading. Default is None, which uses the global Library Settings.
- Returns:
The household power consumption dataset with all features.
- Return type:
pandas.DataFrame
Quick Start:
from dataset_hub.timeseries import get_household_power
df = get_household_power()
Baseline
import pandas as pd
from sklearn.linear_model import LinearRegression
from dataset_hub.timeseries import get_household_power
# Load dataset
df = get_household_power()
df.head()
| | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 | datetime |
|---|---|---|---|---|---|---|---|---|
| 0 | 4.216 | 0.418 | 234.84 | 18.4 | 0.0 | 1.0 | 17.0 | 2006-12-16 17:24:00 |
| 1 | 5.360 | 0.436 | 233.63 | 23.0 | 0.0 | 1.0 | 16.0 | 2006-12-16 17:25:00 |
| 2 | 5.374 | 0.498 | 233.29 | 23.0 | 0.0 | 2.0 | 17.0 | 2006-12-16 17:26:00 |
| 3 | 5.388 | 0.502 | 233.74 | 23.0 | 0.0 | 1.0 | 17.0 | 2006-12-16 17:27:00 |
| 4 | 3.666 | 0.528 | 235.68 | 15.8 | 0.0 | 1.0 | 17.0 | 2006-12-16 17:28:00 |
Monthly data preparation
# Combine Date and Time into datetime
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
# Drop original 'Date' and 'Time' columns as they are no longer needed for aggregation
df = df.drop(columns=["Date", "Time"])
# Aggregate to monthly mean ("ME" = month end; this alias needs pandas >= 2.2, older versions use "M")
df_monthly = df.resample("ME", on="datetime").mean()
# Drop the last month (November 2010), which is incomplete
df_monthly = df_monthly.iloc[:-1]
print("Dropped last month (November 2010) because data is missing for days 27–30.")
Dropped last month (November 2010) because data is missing for days 27–30.
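One way to confirm that a month is incomplete before dropping it is to compare the number of distinct days observed with the number of days in that month. The sketch below does this on a synthetic minute-level index (all of October 2010 plus 1 to 26 November 2010); the frame `toy` and the helper columns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic minute-level data: all of October plus 1-26 November 2010.
idx = pd.date_range("2010-10-01 00:00", "2010-11-26 23:59", freq="min")
toy = pd.DataFrame({"datetime": idx, "Global_active_power": np.ones(len(idx))})

# Helper columns used only for the coverage check.
toy["month"] = toy["datetime"].dt.strftime("%Y-%m")
toy["day"] = toy["datetime"].dt.day
toy["days_in_month"] = toy["datetime"].dt.days_in_month

# Distinct days observed per month versus calendar days in that month.
coverage = toy.groupby("month").agg(
    days_seen=("day", "nunique"),
    days_expected=("days_in_month", "first"),
)
coverage["complete"] = coverage["days_seen"] == coverage["days_expected"]
print(coverage)
```

Months flagged as not complete are candidates for removal before fitting, since their mean is computed over fewer days than the others.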
X & y preparation
# Target series
y = df_monthly["Global_active_power"]
# Lag features (12 months)
X = pd.concat([y.shift(i) for i in range(1, 13)], axis=1)
X.columns = [f"lag_{i}" for i in range(1, 13)]
# Drop rows without a full 12-month lag history and align the target to the remaining rows
X = X.dropna()
y = y.loc[X.index]
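To make the shape of the lag matrix concrete, here is the same construction on a toy six-month series with 3 lags instead of 12. The values and the names `toy_y`, `toy_X` and `n_lags` are made up for illustration.

```python
import pandas as pd

# Toy monthly series (arbitrary values, illustration only).
toy_y = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    index=pd.period_range("2020-01", periods=6, freq="M"),
    name="y",
)

n_lags = 3  # the baseline on this page uses 12
toy_X = pd.concat([toy_y.shift(i) for i in range(1, n_lags + 1)], axis=1)
toy_X.columns = [f"lag_{i}" for i in range(1, n_lags + 1)]

# The first n_lags rows have no complete history, so they are dropped.
toy_X = toy_X.dropna()
toy_y_aligned = toy_y.loc[toy_X.index]
print(toy_X.join(toy_y_aligned))
```

Each remaining row holds the previous three months as features (lag_1 is the most recent) next to the value to be predicted, so the first usable row is April 2020 with features 12, 11, 10 and target 13.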
Forecasting & evaluation
# Train on all months except the last one (positional slicing made explicit with .iloc)
model = LinearRegression().fit(X.iloc[:-1], y.iloc[:-1])
# Forecast the held-out last month
y_pred = model.predict(X.iloc[-1:])[0]
# Compute percentage error
percentage_error = abs((y.iloc[-1] - y_pred) / y.iloc[-1]) * 100
# Note: This is NOT a classical MAPE or robust model evaluation.
# We are showing the percentage error on a single month forecast as an illustrative example only.
print(f"Percentage error for next month forecast: {percentage_error:.2f}%")
Percentage error for next month forecast: 1.22%
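Since a single-month error says little about model quality, a walk-forward (expanding window) evaluation over several months is one possible next step. The sketch below is not part of the baseline: it runs on a synthetic series with a yearly cycle, and the six-month test window (`n_test`) is an arbitrary assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic monthly series with a yearly cycle (illustration only, not the real data).
rng = np.random.default_rng(0)
t = np.arange(46)
series = pd.Series(
    1.0 + 0.3 * np.cos(2 * np.pi * t / 12) + rng.normal(0, 0.02, 46),
    index=pd.period_range("2007-01", periods=46, freq="M"),
)

# Same 12-lag feature construction as in the baseline.
lags = pd.concat([series.shift(i) for i in range(1, 13)], axis=1).dropna()
lags.columns = [f"lag_{i}" for i in range(1, 13)]
target = series.loc[lags.index]

n_test = 6  # assumption: evaluate on the last 6 months
errors = []
for split in range(len(lags) - n_test, len(lags)):
    # Fit on everything before the split, then predict the month at the split.
    model = LinearRegression().fit(lags.iloc[:split], target.iloc[:split])
    pred = model.predict(lags.iloc[[split]])[0]
    actual = target.iloc[split]
    errors.append(abs((actual - pred) / actual) * 100)

mape = float(np.mean(errors))
print(f"Walk-forward MAPE over {n_test} months: {mape:.2f}%")
```

Each forecast only sees data that precedes it, so the averaged error is closer to what the model would achieve in use than the single-month figure above.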