hfutils.index.fetch

ArchiveStandaloneFileIncompleteDownload

class hfutils.index.fetch.ArchiveStandaloneFileIncompleteDownload

Exception raised when a standalone file in an archive is incompletely downloaded.

ArchiveStandaloneFileHashNotMatch

class hfutils.index.fetch.ArchiveStandaloneFileHashNotMatch

Exception raised when the hash of a standalone file in an archive does not match.
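Both exceptions indicate that the downloaded bytes cannot be trusted, so a caller would usually retry. The following is a minimal sketch of such a retry loop. `retry_on_bad_download` is a hypothetical helper, not part of hfutils, and the idea that `hf_tar_file_download` is the function raising these exceptions is an assumption that this page does not confirm.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_bad_download(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.0,
) -> T:
    """Call ``action`` until it succeeds or ``attempts`` calls have failed.

    Hypothetical helper (not part of hfutils). Only exceptions listed in
    ``retry_on`` trigger a retry; anything else propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            time.sleep(delay)
    raise AssertionError("unreachable")


# Intended use (assumes hfutils is installed; the repository and file names
# are placeholders, and the raising function is an assumption):
#
#   from hfutils.index.fetch import (
#       ArchiveStandaloneFileHashNotMatch,
#       ArchiveStandaloneFileIncompleteDownload,
#       hf_tar_file_download,
#   )
#   retry_on_bad_download(
#       lambda: hf_tar_file_download(
#           repo_id="user/repo", archive_in_repo="data.tar",
#           file_in_archive="a.txt", local_file="a.txt",
#       ),
#       retry_on=(ArchiveStandaloneFileIncompleteDownload,
#                 ArchiveStandaloneFileHashNotMatch),
#   )
```

Passing the exception classes in as an argument keeps the helper importable even where hfutils is not installed.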

hf_tar_list_files

hfutils.index.fetch.hf_tar_list_files(repo_id: str, archive_in_repo: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', idx_repo_id: str | None = None, idx_file_in_repo: str | None = None, idx_repo_type: Literal['dataset', 'model', 'space'] | None = None, idx_revision: str | None = None, hf_token: str | None = None)

List files inside a tar archive file in a Hugging Face repository.

Parameters:
  • repo_id (str) – The identifier of the repository.

  • archive_in_repo (str) – The path to the archive file in the repository.

  • repo_type (RepoTypeTyping, optional) – The type of the Hugging Face repository: 'dataset', 'model', or 'space'. Defaults to 'dataset'.

  • revision (str, optional) – The revision of the repository. Defaults to 'main'.

  • idx_repo_id (str, optional) – The identifier of the index repository.

  • idx_file_in_repo (str, optional) – The path to the index file in the index repository.

  • idx_repo_type (RepoTypeTyping, optional) – The type of the index repository: 'dataset', 'model', or 'space'. Defaults to None.

  • idx_revision (str, optional) – The revision of the index repository.

  • hf_token (str, optional) – The Hugging Face access token.

Returns:

The list of files inside the tar archive.

Return type:

List[str]
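Because the result is a plain list of paths, it can be post-processed with ordinary Python. The sketch below groups the listed members by file extension. `group_by_extension` is a hypothetical helper, not part of hfutils, and it assumes member paths use forward slashes.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_extension(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group archive member paths by lowercase file extension.

    Hypothetical helper (not part of hfutils). Members without an
    extension are collected under the empty string.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[PurePosixPath(path).suffix.lower()].append(path)
    return dict(groups)


# Intended use (assumes hfutils is installed; the repository and archive
# names are placeholders):
#
#   from hfutils.index.fetch import hf_tar_list_files
#   members = hf_tar_list_files(repo_id="user/repo", archive_in_repo="data.tar")
#   by_ext = group_by_extension(members)
```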

hf_tar_file_download

hfutils.index.fetch.hf_tar_file_download(repo_id: str, archive_in_repo: str, file_in_archive: str, local_file: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', idx_repo_id: str | None = None, idx_file_in_repo: str | None = None, idx_repo_type: Literal['dataset', 'model', 'space'] | None = None, idx_revision: str | None = None, proxies: Dict | None = None, user_agent: Dict | str | None = None, headers: Dict[str, str] | None = None, endpoint: str | None = None, hf_token: str | None = None)

Download a single file from inside a tar archive in a Hugging Face repository and save it to local_file.

Parameters:
  • repo_id (str) – The identifier of the repository.

  • archive_in_repo (str) – The path to the archive file in the repository.

  • file_in_archive (str) – The path to the file inside the archive.

  • local_file (str) – The path to save the downloaded file locally.

  • repo_type (RepoTypeTyping, optional) – The type of the Hugging Face repository: 'dataset', 'model', or 'space'. Defaults to 'dataset'.

  • revision (str, optional) – The revision of the repository. Defaults to 'main'.

  • idx_repo_id (str, optional) – The identifier of the index repository.

  • idx_file_in_repo (str, optional) – The path to the index file in the index repository.

  • idx_repo_type (RepoTypeTyping, optional) – The type of the index repository: 'dataset', 'model', or 'space'. Defaults to None.

  • idx_revision (str, optional) – The revision of the index repository.

  • proxies (Dict, optional) – The proxies to be used for the HTTP request.

  • user_agent (Union[Dict, str, None], optional) – The user agent for the HTTP request.

  • headers (Dict[str, str], optional) – The additional headers for the HTTP request.

  • endpoint (str, optional) – The Hugging Face API endpoint.

  • hf_token (str, optional) – The Hugging Face access token.
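A common pattern is to fetch several members of one archive into a local directory. The sketch below shows one way to do this with the keyword arguments documented above. `download_members` is a hypothetical helper, not part of hfutils. It takes the download function as an argument and creates parent directories itself, since this page does not say whether `hf_tar_file_download` does so.

```python
import os
from typing import Any, Callable, Dict, Iterable


def download_members(
    download: Callable[..., Any],
    repo_id: str,
    archive_in_repo: str,
    members: Iterable[str],
    out_dir: str,
    **kwargs: Any,
) -> Dict[str, str]:
    """Download several archive members into ``out_dir``.

    Hypothetical helper (not part of hfutils). ``download`` is expected to
    accept the keyword arguments of ``hf_tar_file_download``; extra
    ``kwargs`` (for example ``revision`` or ``hf_token``) are forwarded.
    Returns a mapping of member path -> local file path.
    """
    saved: Dict[str, str] = {}
    for member in members:
        # Mirror the member's path inside the archive below out_dir.
        local_file = os.path.join(out_dir, *member.split("/"))
        os.makedirs(os.path.dirname(local_file) or ".", exist_ok=True)
        download(
            repo_id=repo_id,
            archive_in_repo=archive_in_repo,
            file_in_archive=member,
            local_file=local_file,
            **kwargs,
        )
        saved[member] = local_file
    return saved


# Intended use (assumes hfutils is installed; names are placeholders):
#
#   from hfutils.index.fetch import hf_tar_file_download
#   download_members(hf_tar_file_download, "user/repo", "data.tar",
#                    ["images/1.png", "images/2.png"], "out")
```

The helper does not sanitize member names, so a path containing `..` could land outside `out_dir`. Validate names first when the archive is untrusted.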

hf_tar_get_index

hfutils.index.fetch.hf_tar_get_index(repo_id: str, archive_in_repo: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', idx_repo_id: str | None = None, idx_file_in_repo: str | None = None, idx_repo_type: Literal['dataset', 'model', 'space'] | None = None, idx_revision: str | None = None, hf_token: str | None = None)

Get the index of a tar archive file in a Hugging Face repository.

Parameters:
  • repo_id (str) – The identifier of the repository.

  • archive_in_repo (str) – The path to the archive file in the repository.

  • repo_type (RepoTypeTyping, optional) – The type of the Hugging Face repository: 'dataset', 'model', or 'space'. Defaults to 'dataset'.

  • revision (str, optional) – The revision of the repository. Defaults to 'main'.

  • idx_repo_id (str, optional) – The identifier of the index repository.

  • idx_file_in_repo (str, optional) – The path to the index file in the index repository.

  • idx_repo_type (RepoTypeTyping, optional) – The type of the index repository: 'dataset', 'model', or 'space'. Defaults to None.

  • idx_revision (str, optional) – The revision of the index repository.

  • hf_token (str, optional) – The Hugging Face access token.

Returns:

The index of the tar archive file.

Return type:

Dict
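The layout of the returned dictionary is not described here. The sketch below summarizes an index under one assumed layout: a top-level "files" mapping from member path to a dictionary that carries a "size" in bytes. Both the layout and the `summarize_index` helper are assumptions for illustration; inspect a real index before relying on them.

```python
from typing import Any, Dict, Mapping


def summarize_index(index: Mapping[str, Any]) -> Dict[str, int]:
    """Return the member count and total payload size described by an index.

    ASSUMPTION: the index has a "files" mapping of
    member path -> {"size": <bytes>, ...}. This layout is not documented
    on this page. Missing keys are treated as empty or zero.
    """
    files = index.get("files", {})
    total = sum(int(meta.get("size", 0)) for meta in files.values())
    return {"file_count": len(files), "total_bytes": total}


# Intended use (assumes hfutils is installed; names are placeholders):
#
#   from hfutils.index.fetch import hf_tar_get_index
#   index = hf_tar_get_index(repo_id="user/repo", archive_in_repo="data.tar")
#   print(summarize_index(index))
```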

hf_tar_file_exists

hfutils.index.fetch.hf_tar_file_exists(repo_id: str, archive_in_repo: str, file_in_archive: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', idx_repo_id: str | None = None, idx_file_in_repo: str | None = None, idx_repo_type: Literal['dataset', 'model', 'space'] | None = None, idx_revision: str | None = None, hf_token: str | None = None)

Check if a file exists inside a tar archive file in a Hugging Face repository.

Parameters:
  • repo_id (str) – The identifier of the repository.

  • archive_in_repo (str) – The path to the archive file in the repository.

  • file_in_archive (str) – The path to the file inside the archive.

  • repo_type (RepoTypeTyping, optional) – The type of the Hugging Face repository: 'dataset', 'model', or 'space'. Defaults to 'dataset'.

  • revision (str, optional) – The revision of the repository. Defaults to 'main'.

  • idx_repo_id (str, optional) – The identifier of the index repository.

  • idx_file_in_repo (str, optional) – The path to the index file in the index repository.

  • idx_repo_type (RepoTypeTyping, optional) – The type of the index repository: 'dataset', 'model', or 'space'. Defaults to None.

  • idx_revision (str, optional) – The revision of the index repository.

  • hf_token (str, optional) – The Hugging Face access token.

Returns:

True if the file exists, False otherwise.

Return type:

bool
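The boolean result makes it easy to guard a download with an existence check. The sketch below combines the two steps. `download_if_present` is a hypothetical helper, not part of hfutils, and it takes both functions as arguments so that it stays importable without the library.

```python
from typing import Any, Callable


def download_if_present(
    exists: Callable[..., bool],
    download: Callable[..., Any],
    repo_id: str,
    archive_in_repo: str,
    file_in_archive: str,
    local_file: str,
    **shared_kwargs: Any,
) -> bool:
    """Download a member only if the existence check reports it present.

    Hypothetical helper (not part of hfutils). ``shared_kwargs`` is passed
    to both callables, so it should contain only options that both accept
    (for example ``repo_type``, ``revision``, the ``idx_*`` options, or
    ``hf_token``). Returns True if a download was attempted, False if the
    member is absent.
    """
    if not exists(
        repo_id=repo_id,
        archive_in_repo=archive_in_repo,
        file_in_archive=file_in_archive,
        **shared_kwargs,
    ):
        return False
    download(
        repo_id=repo_id,
        archive_in_repo=archive_in_repo,
        file_in_archive=file_in_archive,
        local_file=local_file,
        **shared_kwargs,
    )
    return True


# Intended use (assumes hfutils is installed; names are placeholders):
#
#   from hfutils.index.fetch import hf_tar_file_download, hf_tar_file_exists
#   download_if_present(hf_tar_file_exists, hf_tar_file_download,
#                       "user/repo", "data.tar", "a.txt", "a.txt")
```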