Set up your development environment

Fork the dataset-hub repository

First, you need to create an account on GitHub (if you do not already have one) and fork the project repository by clicking on the ‘Fork’ button near the top of the page. This creates a copy of the code under your account on the GitHub user account. For more details on how to fork a repository see this guide.

The following steps explain how to set up a local clone of your forked git repository and how to locally install dataset-hub according to your operating system.

Set up a local clone of your fork

Clone your fork of the dataset-hub repo from your GitHub account to your local disk:

git clone https://github.com/YOUR_USERNAME/dataset-hub.git

and change into that directory:

cd dataset-hub

Set up a environment and install dependencies

We recommend using pip and virtual environments to manage your Python dependencies. First, create and activate a virtual environment. For example, using venv:

python -m venv venv

Activate the virtual environment:

source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Then, install the projects and required dependencies:

pip install -e . # Install dataset-hub project with dependencies

To check that everything is set up correctly, try to import dataset-hub in a Python file or notebook:

import dataset_hub