Iris
- dataset_hub.classification.datasets.get_iris(verbose=None)
Load and return the Iris dataset (classification).
A classic multiclass classification dataset containing measurements of iris flowers from three different species.
Original dataset: UCI Iris
Columns:
- sepal_length (float): length of the sepal in cm
- sepal_width (float): width of the sepal in cm
- petal_length (float): length of the petal in cm
- petal_width (float): width of the petal in cm
- species 🚩 (str): target variable, species name (setosa, versicolor, virginica)
- Parameters:
verbose (bool, optional) – If True, the function logs a link to the dataset documentation (for example, this page) after loading. Default is None, which defers to the global Library Settings.
- Returns:
The Iris dataset with all features including the target.
- Return type:
pandas.DataFrame
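The `verbose=None` default described under Parameters is a common tri-state pattern: an explicit `True` or `False` wins, and `None` falls back to a global setting. The sketch below illustrates that pattern only. `GLOBAL_SETTINGS` and `resolve_verbose` are hypothetical names invented for illustration; they are not part of dataset_hub, and the library's actual settings mechanism may differ.

```python
from typing import Optional

# Hypothetical stand-in for a library-wide settings object.
# This is NOT the real dataset_hub settings API.
GLOBAL_SETTINGS = {"verbose": False}


def resolve_verbose(verbose: Optional[bool] = None) -> bool:
    """Return the effective verbosity for a single call (illustrative helper)."""
    # None means "not specified", so defer to the global setting.
    if verbose is None:
        return GLOBAL_SETTINGS["verbose"]
    # An explicit True or False overrides the global setting.
    return bool(verbose)


print(resolve_verbose())      # follows the global setting
print(resolve_verbose(True))  # explicit override for this call
```

Using `None` rather than `False` as the default lets a caller distinguish "not specified" from "explicitly off", which is what makes a global default overridable per call.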
Quick Start:
from dataset_hub.classification import get_iris
df = get_iris()
Baseline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from dataset_hub.classification import get_iris
# Get iris dataset
df = get_iris()
df.head()
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
# Separate target variable (y) and features (X)
y = df["species"]
X = df.drop("species", axis=1)
# Split data into train and test parts
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
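To make the effect of `stratify=y` concrete, here is a minimal pure-Python sketch of a stratified split: each class is shuffled and split separately, so the test set keeps the same class proportions as the full data. This is an illustration of the idea, not scikit-learn's implementation, and it will not reproduce the same row indices as `train_test_split`. The helper `stratified_split` is a hypothetical name, and the 50-rows-per-species balance is assumed from the classic UCI Iris data.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then cut off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Assumed class balance: 50 rows per species (classic Iris data).
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
train_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in test_idx))
```

Under these assumptions each species contributes 10 rows to the test set and 40 to the training set, so no class is under-represented during evaluation.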
# Create and train the model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", round(accuracy, 3))
Accuracy: 0.967
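To read this number: accuracy is the fraction of test rows whose predicted label matches the true label. Assuming the dataset has the 150 rows of the classic UCI Iris data, `test_size=0.2` leaves 30 test rows, so 0.967 corresponds to 29 correct predictions out of 30. The sketch below reproduces that arithmetic with a hand-written `accuracy` helper (a hypothetical stand-in for `accuracy_score`) and made-up labels; which row the model actually misclassified is not known from this page.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels only, not the model's actual predictions:
# 30 test rows with a single versicolor row mistaken for virginica.
y_true = ["setosa"] * 10 + ["versicolor"] * 10 + ["virginica"] * 10
y_pred = list(y_true)
y_pred[10] = "virginica"

print("Accuracy:", round(accuracy(y_true, y_pred), 3))  # Accuracy: 0.967
```

With a test set this small, one prediction changes accuracy by about 0.033, so small differences between runs or models should be interpreted with care.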