Iris

dataset_hub.classification.datasets.get_iris(verbose=None)

Load and return the Iris dataset (classification).

A classic multiclass classification dataset containing measurements of iris flowers from three different species.

Original dataset: UCI Iris

Columns:

  • sepal_length (float): length of the sepal in cm

  • sepal_width (float): width of the sepal in cm

  • petal_length (float): length of the petal in cm

  • petal_width (float): width of the petal in cm

  • species 🚩 (str): target variable, species name (setosa, versicolor, virginica)

Parameters:

verbose (bool, optional) – If True, the function prints a link to the dataset documentation (for example, this page) in the log output after loading. Default is None, which uses the global Library Settings.

Returns:

The Iris dataset with all features including the target.

Return type:

pandas.DataFrame
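
The `verbose=None` default described under Parameters follows a common pattern: an explicit argument wins, and `None` defers to a global setting. The sketch below only illustrates that pattern. `GLOBAL_SETTINGS` and `resolve_verbose` are hypothetical names, and the library's actual settings object is not shown on this page.

```python
from typing import Optional

# Hypothetical stand-in for the library's global settings; the real
# settings object and its attribute names are assumptions here.
GLOBAL_SETTINGS = {"verbose": False}


def resolve_verbose(verbose: Optional[bool] = None) -> bool:
    """Return the explicit flag if given, otherwise the global default."""
    if verbose is None:
        return GLOBAL_SETTINGS["verbose"]
    return verbose


print(resolve_verbose())       # None: falls back to the global value
print(resolve_verbose(True))   # explicit argument overrides the global value
```

Using `None` rather than `False` as the default lets a caller distinguish "not specified" from "explicitly disabled".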

Quick Start:

from dataset_hub.classification import get_iris

df = get_iris()
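
After loading, it can be useful to confirm that the frame carries the columns documented above. The following is a minimal sketch: `missing_columns` is a hypothetical helper, not part of dataset_hub, and a plain list stands in for `df.columns` so the snippet runs without the library installed.

```python
# Column names as listed in the Columns section of this page.
EXPECTED_COLUMNS = [
    "sepal_length",
    "sepal_width",
    "petal_length",
    "petal_width",
    "species",
]


def missing_columns(columns):
    """Return the documented column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Stand-in for `df.columns`; in practice pass the columns of the loaded frame.
header = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
print(missing_columns(header))  # prints []
```

With the real data, `missing_columns(df.columns)` should return an empty list if the frame matches the documented schema.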

Baseline

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from dataset_hub.classification import get_iris

# Get iris dataset
df = get_iris()
df.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

# Separate target variable (y) and features (X)
y = df["species"]
X = df.drop("species", axis=1)

# Split data into train and test parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Create and train the model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", round(accuracy, 3))

Accuracy: 0.967
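
The arithmetic behind the split and the score can be worked through by hand. The sketch below assumes the data matches the original UCI Iris layout (150 rows, 50 per species), which this page does not state explicitly. Under that assumption, a stratified 20% test split holds 10 rows per species, and the reported 0.967 is consistent with a single misclassified test row.

```python
from collections import Counter

# Assumption: 150 rows, 50 per species, as in the original UCI Iris data.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50

test_size = 0.2
class_counts = Counter(labels)

# A stratified split preserves class proportions, so each species
# contributes the same fraction of its rows to the test set.
test_per_class = {
    name: round(count * test_size) for name, count in class_counts.items()
}
n_test = sum(test_per_class.values())

# Accuracy is correct predictions divided by total predictions;
# one error out of n_test reproduces the rounded score shown above.
n_correct = n_test - 1
accuracy = round(n_correct / n_test, 3)

print(test_per_class)    # {'setosa': 10, 'versicolor': 10, 'virginica': 10}
print(n_test, accuracy)  # 30 0.967
```

With only 30 test rows, each misclassification moves accuracy by about 3.3 percentage points, so small differences between models on this split should be read with caution.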