hfutils.operate.download

This module provides functions for downloading files and directories from Hugging Face repositories.

It includes utilities for downloading individual files, archives, and entire directories, with support for concurrent downloads, retries, and progress tracking.

The module interacts with the Hugging Face Hub API to fetch repository contents and download files, handling various repository types and revisions.

Key features:

  • Download individual files from Hugging Face repositories

  • Download and extract archive files

  • Download entire directories with pattern matching and ignore rules

  • Concurrent downloads with configurable worker count

  • Retry mechanism for failed downloads

  • Progress tracking with tqdm

  • Support for different repository types (dataset, model, space)

  • Token-based authentication for accessing private repositories

This module is particularly useful for managing and synchronizing local copies of Hugging Face repository contents, especially when dealing with large datasets or models.

download_file_to_file

hfutils.operate.download.download_file_to_file(local_file: str, repo_id: str, file_in_repo: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', hf_token: str | None = None)

Download a file from a Hugging Face repository and save it to a local file.

Parameters:
  • local_file (str) – The local file path to save the downloaded file.

  • repo_id (str) – The identifier of the repository.

  • file_in_repo (str) – The file path within the repository.

  • repo_type (RepoTypeTyping) – The type of the repository (‘dataset’, ‘model’, ‘space’).

  • revision (str) – The revision of the repository (e.g., branch, tag, commit hash).

  • hf_token (str, optional) – Hugging Face token for the API client. If not given, the HF_TOKEN environment variable is used.
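The fallback described for `hf_token` can be sketched as follows. This assumes that an explicitly passed token always takes precedence and that `None` means "not assigned". `resolve_hf_token` is a hypothetical helper for illustration, not part of hfutils.

```python
import os
from typing import Optional


def resolve_hf_token(hf_token: Optional[str] = None) -> Optional[str]:
    """Return hf_token if it was passed, else the HF_TOKEN environment variable."""
    if hf_token is not None:
        return hf_token
    # Fall back to the environment; returns None if HF_TOKEN is unset.
    return os.environ.get("HF_TOKEN")


# Usage with placeholder token values (not real credentials).
os.environ["HF_TOKEN"] = "hf_from_environment"
explicit = resolve_hf_token("hf_passed_explicitly")
fallback = resolve_hf_token()
```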

download_archive_as_directory

hfutils.operate.download.download_archive_as_directory(local_directory: str, repo_id: str, file_in_repo: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', password: str | None = None, hf_token: str | None = None)

Download an archive file from a Hugging Face repository and extract it to a local directory.

Parameters:
  • local_directory (str) – The local directory into which the downloaded archive is extracted.

  • repo_id (str) – The identifier of the repository.

  • file_in_repo (str) – The file path within the repository.

  • repo_type (RepoTypeTyping) – The type of the repository (‘dataset’, ‘model’, ‘space’).

  • revision (str) – The revision of the repository (e.g., branch, tag, commit hash).

  • password (str, optional) – The password for the archive file, if it is password-protected.

  • hf_token (str, optional) – Hugging Face token for the API client. If not given, the HF_TOKEN environment variable is used.
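Once an archive is available locally (for example, fetched with download_file_to_file), extracting it into a directory can be sketched as below. This is an illustration only. `extract_zip_to_directory` is a hypothetical helper that handles only .zip files, whereas this page does not list the archive formats hfutils actually supports. Python's standard `zipfile` can decrypt only legacy ZipCrypto-encrypted entries, so the password handling here does not cover AES-encrypted archives.

```python
import os
import tempfile
import zipfile
from typing import Optional


def extract_zip_to_directory(archive_path: str, local_directory: str,
                             password: Optional[str] = None) -> None:
    """Extract every member of a local .zip archive into local_directory."""
    os.makedirs(local_directory, exist_ok=True)
    # zipfile expects the password as bytes; None means no password is supplied.
    pwd = password.encode("utf-8") if password is not None else None
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(local_directory, pwd=pwd)


# Usage: build a small unencrypted archive locally, then extract it.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data/readme.txt", "hello")

target = os.path.join(workdir, "extracted")
extract_zip_to_directory(archive, target)
```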

download_directory_as_directory

hfutils.operate.download.download_directory_as_directory(local_directory: str, repo_id: str, dir_in_repo: str = '.', pattern: str = '**/*', repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', silent: bool = False, ignore_patterns: List[str] = <object object>, max_workers: int = 8, max_retries: int = 5, soft_mode_when_check: bool = False, hf_token: str | None = None)

Download all files in a directory from a Hugging Face repository to a local directory.

Parameters:
  • local_directory (str) – The local directory path to save the downloaded files.

  • repo_id (str) – The identifier of the repository.

  • dir_in_repo (str) – The directory path within the repository. Default is ‘.’, the repository root.

  • pattern (str) – Glob pattern used to select which files under dir_in_repo are downloaded. Default is ‘**/*’, which matches all files.

  • repo_type (RepoTypeTyping) – The type of the repository (‘dataset’, ‘model’, ‘space’).

  • revision (str) – The revision of the repository (e.g., branch, tag, commit hash).

  • silent (bool) – If True, suppress progress bar output.

  • ignore_patterns (List[str]) – List of file patterns to ignore.

  • max_workers (int) – Maximum number of concurrent download workers. Default is 8.

  • max_retries (int) – Maximum number of retries for a failed download. Default is 5.

  • soft_mode_when_check (bool) – When enabled, only the file size is compared when checking whether an existing local file matches the expected file. Default is False.

  • hf_token (str, optional) – Hugging Face token for the API client. If not given, the HF_TOKEN environment variable is used.
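How pattern, ignore_patterns and soft_mode_when_check might combine to decide which files still need downloading is sketched below. Everything here is hypothetical and not hfutils' code: `RemoteFile`, `glob_to_regex`, `is_up_to_date` and `plan_downloads` are illustrative names. The sketch makes several assumptions. It assumes a SHA-256 digest is known for every remote file. It assumes ignore patterns use the same glob rules as pattern. It assumes that `**/` matches zero or more directories while `*` stays within one path segment.

```python
import hashlib
import os
import re
import tempfile
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RemoteFile:
    path: str    # '/'-separated path relative to dir_in_repo
    size: int    # size in bytes
    sha256: str  # hex digest (assumed available for every file)


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob where '**/' spans directories and '*' stays in one segment."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?"); i += 3
        elif pattern.startswith("**", i):
            out.append(".*"); i += 2
        elif pattern[i] == "*":
            out.append("[^/]*"); i += 1
        elif pattern[i] == "?":
            out.append("[^/]"); i += 1
        else:
            out.append(re.escape(pattern[i])); i += 1
    return re.compile("".join(out) + r"\Z")


def is_up_to_date(local_path: str, remote: RemoteFile, soft_mode: bool) -> bool:
    if not os.path.isfile(local_path) or os.path.getsize(local_path) != remote.size:
        return False
    if soft_mode:
        return True  # soft mode: a matching size is considered enough
    digest = hashlib.sha256()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == remote.sha256


def plan_downloads(remote_files: Iterable[RemoteFile], local_directory: str,
                   pattern: str = "**/*", ignore_patterns: Iterable[str] = (),
                   soft_mode_when_check: bool = False) -> List[str]:
    """Return the remote paths that are selected, not ignored, and not yet up to date."""
    include = glob_to_regex(pattern)
    ignores = [glob_to_regex(p) for p in ignore_patterns]
    todo = []
    for rf in remote_files:
        if not include.match(rf.path) or any(ig.match(rf.path) for ig in ignores):
            continue
        local_path = os.path.join(local_directory, *rf.path.split("/"))
        if not is_up_to_date(local_path, rf, soft_mode_when_check):
            todo.append(rf.path)
    return todo


# Usage with made-up file contents.
def _sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


local_dir = tempfile.mkdtemp()
with open(os.path.join(local_dir, "a.txt"), "wb") as f:
    f.write(b"hello")  # identical to the remote file
with open(os.path.join(local_dir, "b.txt"), "wb") as f:
    f.write(b"wrong")  # same size as the remote file, different content

remote = [
    RemoteFile("a.txt", 5, _sha(b"hello")),
    RemoteFile("b.txt", 5, _sha(b"world")),
    RemoteFile("sub/c.bin", 3, _sha(b"abc")),
    RemoteFile("notes.md", 4, _sha(b"memo")),
]
soft = plan_downloads(remote, local_dir, ignore_patterns=["*.md"], soft_mode_when_check=True)
strict = plan_downloads(remote, local_dir, ignore_patterns=["*.md"])
```

The example shows the trade-off behind soft_mode_when_check. A size-only check avoids hashing large local files, but it cannot detect b.txt, which has the right size and the wrong content.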