hfutils.index.validate

hf_tar_item_validate

hfutils.index.validate.hf_tar_item_validate(file_item: RepoFile, size: int, hash_: str | None = None, hash_lfs: str | None = None)

Validate a file item in a tar archive.

This function checks if the file item matches the expected size and hash.

Parameters:
  • file_item (RepoFile) – The file item from the Hugging Face repository.

  • size (int) – The expected size of the file.

  • hash_ (str, optional) – The expected SHA-1 hash of the file.

  • hash_lfs (str, optional) – The expected SHA-256 hash of the file if stored in LFS.

Returns:

True if the file item is valid, False otherwise.

Return type:

bool
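The check described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: FileItem and LfsInfo are hypothetical stand-ins that mirror the size, blob_id and lfs attributes of huggingface_hub's RepoFile, and item_matches is an illustrative name. The sketch also assumes that the LFS size and SHA-256 take precedence whenever the file is stored in LFS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LfsInfo:
    """Hypothetical stand-in for the LFS metadata attached to a repo file."""
    size: int
    sha256: str


@dataclass
class FileItem:
    """Hypothetical stand-in for huggingface_hub's RepoFile."""
    size: int
    blob_id: str
    lfs: Optional[LfsInfo] = None


def item_matches(item: FileItem, size: int,
                 hash_: Optional[str] = None,
                 hash_lfs: Optional[str] = None) -> bool:
    """Return True when the item agrees with every expectation that was given."""
    # Assumption: for LFS-backed files the real size lives in the LFS metadata.
    actual_size = item.lfs.size if item.lfs is not None else item.size
    if actual_size != size:
        return False
    # Assumption: prefer the SHA-256 comparison when both sides provide it.
    if hash_lfs is not None and item.lfs is not None:
        return item.lfs.sha256 == hash_lfs
    # Otherwise fall back to the SHA-1 identifier, if one was expected.
    if hash_ is not None:
        return item.blob_id == hash_
    # Only the size was supplied, and it matched.
    return True
```

With these stand-ins, a size mismatch fails immediately and the hashes are never consulted, which keeps the cheap comparison first.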

hf_tar_validate

hfutils.index.validate.hf_tar_validate(repo_id: str, archive_in_repo: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', idx_repo_id: str | None = None, idx_file_in_repo: str | None = None, idx_repo_type: Literal['dataset', 'model', 'space'] | None = None, idx_revision: str | None = None, hf_token: str | None = None)

Validate a tar archive in a Hugging Face repository.

This function checks whether the tar archive in the Hugging Face repository matches the size and hash recorded in its index file.

Note

This function relies only on the Hugging Face API and the hash information stored in index files; no tar file is downloaded.

Parameters:
  • repo_id (str) – The ID of the Hugging Face repository.

  • archive_in_repo (str) – The path to the tar archive in the repository.

  • repo_type (RepoTypeTyping, optional) – The type of the Hugging Face repository, defaults to ‘dataset’.

  • revision (str, optional) – The revision of the repository, defaults to ‘main’.

  • idx_repo_id (Optional[str], optional) – The ID of the repository where the index file is stored.

  • idx_file_in_repo (Optional[str], optional) – The path to the index file in the repository.

  • idx_repo_type (Optional[RepoTypeTyping], optional) – The type of the repository where the index file is stored.

  • idx_revision (Optional[str], optional) – The revision of the repository where the index file is stored.

  • hf_token (Optional[str], optional) – The Hugging Face token for authentication, defaults to None.

Raises:
  • EntryNotFoundError – If the specified entry is not found in the repository.

  • IsADirectoryError – If the specified entry is a directory.

Returns:

True if the tar archive is valid, False otherwise.

Return type:

bool

Note

If this function returns False, the JSON index is out of date and needs to be regenerated.

This function and hfutils.index.make.hf_tar_create_index() can therefore be used together to gracefully refresh an indexed tar dataset.
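The refresh pattern described in this note might be sketched as follows. The helper name refresh_index_if_stale is hypothetical, and the two callables are passed in so the sketch runs without network access. In real use they would be hfutils.index.validate.hf_tar_validate and hfutils.index.make.hf_tar_create_index; that the latter accepts repo_id and archive_in_repo as keyword arguments is an assumption here.

```python
from typing import Callable


def refresh_index_if_stale(
    repo_id: str,
    archive_in_repo: str,
    validate: Callable[..., bool],
    create_index: Callable[..., object],
) -> bool:
    """Rebuild the index only when validation reports it as stale.

    Returns True if the index was regenerated, False if it was still valid.
    """
    # A True result means the index still matches the archive: nothing to do.
    if validate(repo_id=repo_id, archive_in_repo=archive_in_repo):
        return False
    # Assumption: the index builder takes the same two keyword arguments.
    create_index(repo_id=repo_id, archive_in_repo=archive_in_repo)
    return True
```

Calling refresh_index_if_stale(repo_id, archive_in_repo, hf_tar_validate, hf_tar_create_index) would then skip the rebuild whenever the existing index is still current.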