hfutils.index.validate
hf_tar_item_validate
- hfutils.index.validate.hf_tar_item_validate(file_item: RepoFile, size: int, hash_: str | None = None, hash_lfs: str | None = None)
Validate a file item in a tar archive.
This function checks if the file item matches the expected size and hash.
- Parameters:
file_item (RepoFile) – The file item from the Hugging Face repository.
size (int) – The expected size of the file.
hash_ (str, optional) – The expected SHA-1 hash of the file.
hash_lfs (str, optional) – The expected SHA-256 hash of the file, if it is stored in LFS.
- Returns:
True if the file item is valid, False otherwise.
- Return type:
bool
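The size-and-hash check described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the hfutils implementation: FileItem and LfsInfo are hypothetical stand-ins for huggingface_hub's RepoFile and its LFS metadata, item_matches is a hypothetical name, and the rules that hash_ is compared with the item's blob_id, hash_lfs with the LFS SHA-256, and that each hash is skipped when it is None, are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LfsInfo:
    """Hypothetical stand-in for the LFS metadata attached to a RepoFile."""
    size: int
    sha256: str


@dataclass
class FileItem:
    """Hypothetical stand-in for RepoFile (only the fields used here)."""
    size: int
    blob_id: str
    lfs: Optional[LfsInfo] = None


def item_matches(item: FileItem, size: int,
                 hash_: Optional[str] = None,
                 hash_lfs: Optional[str] = None) -> bool:
    """Sketch of a size/hash check; hashes are compared only when supplied."""
    # The size must always agree with the expected value.
    if item.size != size:
        return False
    # Assumption: the expected SHA-1 is compared with the item's blob_id.
    if hash_ is not None and item.blob_id != hash_:
        return False
    # Assumption: the expected SHA-256 requires LFS metadata with a matching digest.
    if hash_lfs is not None:
        if item.lfs is None or item.lfs.sha256 != hash_lfs:
            return False
    return True


# Usage: an LFS-backed archive of 1024 bytes.
tar = FileItem(size=1024, blob_id="abc", lfs=LfsInfo(size=1024, sha256="def"))
print(item_matches(tar, 1024, hash_lfs="def"))  # True
print(item_matches(tar, 2048))                  # False
```

Skipping a comparison when the expected value is None mirrors the optional hash_ and hash_lfs parameters in the signature, but whether hfutils behaves exactly this way is not stated in this section.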
hf_tar_validate
- hfutils.index.validate.hf_tar_validate(repo_id: str, archive_in_repo: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', idx_repo_id: str | None = None, idx_file_in_repo: str | None = None, idx_repo_type: Literal['dataset', 'model', 'space'] | None = None, idx_revision: str | None = None, hf_token: str | None = None)
Validate a tar archive in a Hugging Face repository.
This function checks whether the tar archive in the Hugging Face repository matches the size and hash recorded in its index file.
Note
This function relies only on the Hugging Face API and the hash information in the index file; the tar archive itself is never downloaded.
- Parameters:
repo_id (str) – The ID of the Hugging Face repository.
archive_in_repo (str) – The path to the tar archive in the repository.
repo_type (RepoTypeTyping, optional) – The type of the Hugging Face repository, defaults to ‘dataset’.
revision (str, optional) – The revision of the repository, defaults to ‘main’.
idx_repo_id (Optional[str], optional) – The ID of the repository where the index file is stored.
idx_file_in_repo (Optional[str], optional) – The path to the index file in the repository.
idx_repo_type (Optional[RepoTypeTyping], optional) – The type of the repository where the index file is stored.
idx_revision (Optional[str], optional) – The revision of the repository where the index file is stored.
hf_token (Optional[str], optional) – The Hugging Face token for authentication, defaults to None.
- Raises:
EntryNotFoundError – If the specified entry is not found in the repository.
IsADirectoryError – If the specified entry is a directory.
- Returns:
True if the tar archive is valid, False otherwise.
- Return type:
bool
Note
If this function returns False, the JSON index is out of date and needs to be regenerated. This function can therefore be used together with hfutils.index.make.hf_tar_create_index() to refresh an indexed tar dataset gracefully.
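The refresh pattern described in the note can be sketched as a small helper. This is an illustrative assumption rather than part of hfutils: refresh_if_stale is a hypothetical name, and the validate and rebuild steps are passed in as callables so the sketch does not depend on the exact signature of hf_tar_create_index(), which is not documented in this section.

```python
from typing import Callable


def refresh_if_stale(validate: Callable[[], bool],
                     rebuild: Callable[[], None]) -> bool:
    """Rebuild an index only when validation reports it as stale.

    Returns True if a rebuild was triggered, False if the index was current.
    """
    if validate():
        # The index still matches the archive, so nothing needs to change.
        return False
    # The index is out of date: regenerate it.
    rebuild()
    return True
```

In practice, validate could be something like functools.partial(hf_tar_validate, repo_id, archive_in_repo), using the signature documented above, and rebuild a zero-argument callable that invokes hfutils.index.make.hf_tar_create_index() with whatever arguments your dataset requires. Keeping both steps as callables lets the decision logic be exercised without network access.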