Household Electric Power Consumption

dataset_hub.timeseries.datasets.get_household_power(verbose=None)

Load and return the Individual Household Electric Power Consumption dataset.

Measurements of electric power consumption in a single household with a one-minute sampling rate over a period of almost 4 years (December 2006 – November 2010). Each record contains several electrical and sub-metering measurements.

This dataset is designed for minute-level analysis of total household energy consumption, capturing overall usage patterns of a single home.

Original dataset: available from the UCI Machine Learning Repository as "Individual Household Electric Power Consumption".

Columns:

Date (str): Date of measurement in format dd/mm/yyyy
Time (str): Time of measurement in format hh:mm:ss
Global_active_power (float): Household global minute-averaged active power (kilowatt)
Global_reactive_power (float): Household global minute-averaged reactive power (kilowatt)
Voltage (float): Minute-averaged voltage (volt)
Sub_metering_1 (float): Energy sub-metering No. 1 (kitchen: dishwasher, oven, microwave; watt-hour of active energy)
Sub_metering_2 (float): Energy sub-metering No. 2 (laundry room: washing machine, tumble-drier, refrigerator, light; watt-hour)
Sub_metering_3 (float): Energy sub-metering No. 3 (electric water-heater, air-conditioner; watt-hour)
Global_intensity (float): Household global minute-averaged current intensity (ampere)

Notes

Missing values are present in approximately 1.25% of the rows.
Active energy not covered by sub-meterings 1–3 can be calculated as: (Global_active_power*1000/60 - Sub_metering_1 - Sub_metering_2 - Sub_metering_3) in watt-hour.
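
The formula in the note above can be applied column-wise with pandas. The following is a minimal sketch on a toy frame whose values are illustrative only; the names `sample`, `other_wh`, and `missing_share` are introduced here and are not part of the library. It also shows that rows with missing measurements propagate as NaN rather than raising.

```python
import pandas as pd

# Toy rows with illustrative values; the real frame comes from get_household_power().
sample = pd.DataFrame(
    {
        "Global_active_power": [4.216, 5.360, None],  # kilowatt, minute-averaged
        "Sub_metering_1": [0.0, 0.0, None],  # watt-hour
        "Sub_metering_2": [1.0, 1.0, None],  # watt-hour
        "Sub_metering_3": [17.0, 16.0, None],  # watt-hour
    }
)

# Kilowatt averaged over one minute -> watt-hour: x1000 (W), /60 (minutes per hour).
total_wh = sample["Global_active_power"] * 1000 / 60

# Active energy not covered by the three sub-meters, in watt-hour.
sample["other_wh"] = (
    total_wh
    - sample["Sub_metering_1"]
    - sample["Sub_metering_2"]
    - sample["Sub_metering_3"]
)

# Share of rows with a missing power reading (1/3 in this toy frame).
missing_share = sample["Global_active_power"].isna().mean()
print(sample["other_wh"].round(2).tolist(), round(missing_share, 4))
```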

Parameters: verbose (bool, optional) – If True, the function logs a link to the dataset documentation (for example, this page) after loading. Defaults to None, which falls back to the global Library Settings.
Returns: The household power consumption dataset with all features.
Return type: pandas.DataFrame

Quick Start:

from dataset_hub.timeseries import get_household_power

df = get_household_power()

Baseline

import pandas as pd
from sklearn.linear_model import LinearRegression

from dataset_hub.timeseries import get_household_power

# Load dataset
df = get_household_power()
df.head()

	Global_active_power	Global_reactive_power	Voltage	Global_intensity	Sub_metering_2	Sub_metering_3	datetime
0	4.216	0.418	234.84	18.4	1.0	17.0	2006-12-16 17:24:00
1	5.360	0.436	233.63	23.0	1.0	16.0	2006-12-16 17:25:00
2	5.374	0.498	233.29	23.0	2.0	17.0	2006-12-16 17:26:00
3	5.388	0.502	233.74	23.0	1.0	17.0	2006-12-16 17:27:00
4	3.666	0.528	235.68	15.8	1.0	17.0	2006-12-16 17:28:00

Monthly data preparation

# Combine Date and Time into datetime
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)

# Drop original 'Date' and 'Time' columns as they are no longer needed for aggregation
df = df.drop(columns=["Date", "Time"])

# Aggregate to monthly mean ("ME" = month end; requires pandas >= 2.2, use "M" on older versions)
df_monthly = df.resample("ME", on="datetime").mean()

# Drop the last month (November 2010) because it is incomplete
df_monthly = df_monthly.iloc[:-1]
print("Dropped last month (November 2010) because data is missing for days 27–30.")

Dropped last month (November 2010) because data is missing for days 27–30.
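
The decision to drop November 2010 can also be derived from the data rather than hard-coded. The sketch below is one possible check, run on a synthetic minute-level frame (not the real dataset): it counts observed rows per month and compares them with the number of minutes the month should contain. The 0.99 coverage threshold and the names `toy`, `coverage`, and `incomplete` are assumptions made for illustration.

```python
import pandas as pd

# Synthetic minute-level data: all of October 2010, then November up to the 26th.
idx = pd.date_range("2010-10-01 00:00", "2010-11-26 21:02", freq="min")
toy = pd.DataFrame({"datetime": idx, "Global_active_power": 1.0})

# Rows observed in each calendar month ("MS" labels each bin by its month start).
observed = toy.resample("MS", on="datetime").size()

# Minutes each month should contain at a one-minute sampling rate.
expected = observed.index.days_in_month.to_numpy() * 24 * 60

# Fraction of the month that is actually covered by data.
coverage = observed / expected

# Months below the (assumed) 99% coverage threshold are treated as incomplete.
incomplete = coverage[coverage < 0.99].index
print(coverage.round(3))
print("Incomplete months:", [ts.strftime("%Y-%m") for ts in incomplete])
```

On the real data, rows with missing measurements would also need to be excluded from the count (for example with `count()` on a measurement column instead of `size()`), since a row can be present while its values are not.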

X & y preparation

# Target series
y = df_monthly["Global_active_power"]

# Lag features (12 months)
X = pd.concat([y.shift(i) for i in range(1, 13)], axis=1)
X.columns = [f"lag_{i}" for i in range(1, 13)]

# Drop missing
X = X.dropna()
y = y.loc[X.index]
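
To make the lag construction concrete, here is a small sketch on a five-point toy series with two lags instead of twelve. The series values and the names `y_toy`, `X_toy`, and `y_aligned` are made up for illustration. Each `shift(i)` moves the series down by `i` steps, so row t holds the values from t-1, t-2, and so on; the first `n_lags` rows have no full history and are dropped.

```python
import pandas as pd

# Toy target series (illustrative values only).
y_toy = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="target")
n_lags = 2

# Column lag_i holds the target value from i steps earlier.
X_toy = pd.concat(
    {f"lag_{i}": y_toy.shift(i) for i in range(1, n_lags + 1)}, axis=1
)

# The first n_lags rows contain NaN because no earlier values exist.
X_toy = X_toy.dropna()

# Keep only the target rows that still have a complete feature row.
y_aligned = y_toy.loc[X_toy.index]

print(X_toy.assign(target=y_aligned))
```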

Forecasting & evaluation

# Train on all except last month
model = LinearRegression().fit(X.iloc[:-1], y.iloc[:-1])

# Next-month forecast
y_pred = model.predict(X.iloc[-1:])[0]

# Compute percentage error
percentage_error = abs((y.iloc[-1] - y_pred) / y.iloc[-1]) * 100

# Note: This is NOT a classical MAPE or robust model evaluation.
# We are showing the percentage error on a single month forecast as an illustrative example only.
print(f"Percentage error for next month forecast: {percentage_error:.2f}%")

Percentage error for next month forecast: 1.22%
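
Because a single-month error says little about model quality, one possible extension is a rolling-origin evaluation: refit on an expanding window and average the one-step percentage errors into a MAPE. This is a different technique from the single-forecast check above and is not part of the library. The sketch runs on a synthetic monthly series with a yearly cycle, standing in for `df_monthly["Global_active_power"]`; the series, the choice of six test months, and all variable names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic monthly series with a yearly cycle plus small noise (not real data).
months = pd.date_range("2007-01-01", periods=46, freq="MS")
rng = np.random.default_rng(0)
values = (
    1.0
    + 0.3 * np.cos(2 * np.pi * months.month.to_numpy() / 12)
    + rng.normal(0, 0.02, len(months))
)
y_syn = pd.Series(values, index=months)

# Same 12-lag feature layout as in the baseline.
X_syn = pd.concat(
    {f"lag_{i}": y_syn.shift(i) for i in range(1, 13)}, axis=1
).dropna()
y_syn = y_syn.loc[X_syn.index]

# Rolling origin: for each of the last n_test months, train on everything before it.
n_test = 6
errors = []
for k in range(n_test, 0, -1):
    split = len(X_syn) - k
    model = LinearRegression().fit(X_syn.iloc[:split], y_syn.iloc[:split])
    pred = model.predict(X_syn.iloc[[split]])[0]
    actual = y_syn.iloc[split]
    errors.append(abs((actual - pred) / actual) * 100)

# Mean absolute percentage error over the n_test one-step forecasts.
mape = float(np.mean(errors))
print(f"MAPE over the last {n_test} one-step forecasts: {mape:.2f}%")
```

With only a few dozen monthly points and twelve features, the estimate remains noisy; the sketch is meant to show the evaluation loop, not to suggest a reliable score.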