hfutils.index.make

tar_create_index

hfutils.index.make.tar_create_index(src_tar_file, dst_index_file: str | None = None, chunk_for_hash: int = 1048576, with_hash: bool = True, silent: bool = False)

Create an index file for a tar archive file.

Parameters:
  • src_tar_file (str) – The path to the source tar archive file.

  • dst_index_file (str, optional) – The path to save the index file, defaults to None.

  • chunk_for_hash (int, optional) – The chunk size in bytes used when hashing, defaults to 1 << 20 (1 MiB).

  • with_hash (bool, optional) – Whether to include file hashes in the index, defaults to True.

  • silent (bool, optional) – Whether to suppress progress bars and logging messages, defaults to False.

Returns:

The path to the created index file.

Return type:

str
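As a usage illustration, the sketch below builds a small tar archive with the standard library and then passes it to tar_create_index using only the parameters documented above. The helper names make_demo_tar and create_index, the file names, and the choice of a ".json" index path next to the archive are assumptions made for this example, not part of hfutils.

```python
import os
import tarfile


def make_demo_tar(root: str) -> str:
    """Write two small text files into root/demo.tar and return its path."""
    tar_path = os.path.join(root, "demo.tar")
    with tarfile.open(tar_path, "w") as tar:
        for name, text in [("a.txt", "hello"), ("b.txt", "world")]:
            file_path = os.path.join(root, name)
            with open(file_path, "w") as f:
                f.write(text)
            tar.add(file_path, arcname=name)
    return tar_path


def create_index(tar_path: str) -> str:
    """Call tar_create_index with the documented keyword arguments."""
    # Imported lazily so this sketch can be loaded without hfutils installed.
    from hfutils.index.make import tar_create_index

    return tar_create_index(
        tar_path,
        # Explicit destination chosen for this sketch (hypothetical path).
        dst_index_file=os.path.splitext(tar_path)[0] + ".json",
        with_hash=True,
        silent=True,
    )
```

With hfutils installed, calling create_index(make_demo_tar(some_directory)) should return the path of the index file that was written.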

hf_tar_create_index

hfutils.index.make.hf_tar_create_index(repo_id: str, archive_in_repo: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', idx_repo_id: str | None = None, idx_file_in_repo: str | None = None, idx_repo_type: Literal['dataset', 'model', 'space'] | None = None, idx_revision: str | None = None, chunk_for_hash: int = 1048576, with_hash: bool = True, skip_when_synced: bool = True, hf_token: str | None = None)

Create an index file for a tar archive file in a Hugging Face repository.

Parameters:
  • repo_id (str) – The identifier of the repository.

  • archive_in_repo (str) – The path to the tar archive file in the repository.

  • repo_type (RepoTypeTyping, optional) – The type of the Hugging Face repository, one of 'dataset', 'model' or 'space', defaults to 'dataset'.

  • revision (str, optional) – The revision of the repository, defaults to 'main'.

  • idx_repo_id (str, optional) – The identifier of the index repository, defaults to None.

  • idx_file_in_repo (str, optional) – The path to save the index file in the index repository, defaults to None.

  • idx_repo_type (RepoTypeTyping, optional) – The type of the index repository, one of 'dataset', 'model' or 'space', defaults to None.

  • idx_revision (str, optional) – The revision of the index repository, defaults to None.

  • chunk_for_hash (int, optional) – The chunk size in bytes used when hashing, defaults to 1 << 20 (1 MiB).

  • with_hash (bool, optional) – Whether to include file hashes in the index, defaults to True.

  • skip_when_synced (bool, optional) – Whether to skip creating the index when an up-to-date index already exists, defaults to True.

  • hf_token (str, optional) – The Hugging Face access token, defaults to None.
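A call that stores the index in a repository separate from the archive might look like the sketch below. The helper names index_kwargs and create_remote_index are hypothetical, as is reading the token from the HF_TOKEN environment variable; only the keyword names and defaults come from the signature above.

```python
import os
from typing import Any, Dict, Optional


def index_kwargs(repo_id: str, archive_in_repo: str,
                 idx_repo_id: Optional[str] = None) -> Dict[str, Any]:
    """Collect documented keyword arguments for hf_tar_create_index."""
    return {
        "repo_id": repo_id,
        "archive_in_repo": archive_in_repo,
        "repo_type": "dataset",
        "revision": "main",
        "idx_repo_id": idx_repo_id,
        "with_hash": True,
        "skip_when_synced": True,
        # Reading the token from HF_TOKEN is a choice made for this sketch.
        "hf_token": os.environ.get("HF_TOKEN"),
    }


def create_remote_index(**kwargs: Any) -> None:
    """Forward the collected arguments to hf_tar_create_index."""
    # Imported lazily; the call needs hfutils and network access.
    from hfutils.index.make import hf_tar_create_index

    hf_tar_create_index(**kwargs)
```

For example, create_remote_index(**index_kwargs("user/demo", "pack.tar", "user/demo-index")) would request an index for pack.tar, where both repository names are placeholders.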

tar_get_index_info

hfutils.index.make.tar_get_index_info(src_tar_file, chunk_for_hash: int = 1048576, with_hash: bool = True, silent: bool = False)

Get the index information of a tar archive file.

Note

The return value of this function is used directly as the content of the index JSON file.

Parameters:
  • src_tar_file (str) – The path to the source tar archive file.

  • chunk_for_hash (int, optional) – The chunk size in bytes used when hashing, defaults to 1 << 20 (1 MiB).

  • with_hash (bool, optional) – Whether to include file hashes in the index, defaults to True.

  • silent (bool, optional) – Whether to suppress progress bars and logging messages, defaults to False.

Returns:

The index information of the tar archive file.

Return type:

dict
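To give a feel for what index information about a tar archive can contain, here is a standard-library sketch that records the data offset and size of each member and, optionally, a SHA-256 digest computed in chunks of chunk_for_hash bytes. It is not the hfutils implementation, and the dictionary layout (a "files" mapping with "offset", "size" and "sha256" keys) is an assumption made for this example; the real index format may differ.

```python
import hashlib
import tarfile
from typing import Any, Dict


def sketch_index_info(tar_path: str, chunk_for_hash: int = 1 << 20,
                      with_hash: bool = True) -> Dict[str, Any]:
    """Map each regular file in an uncompressed tar to its offset and size."""
    files: Dict[str, Any] = {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            # offset_data is where the member's bytes start inside the archive.
            entry = {"offset": member.offset_data, "size": member.size}
            if with_hash:
                digest = hashlib.sha256()
                stream = tar.extractfile(member)
                # Read in fixed-size chunks so large members never have to
                # be held in memory all at once.
                while True:
                    chunk = stream.read(chunk_for_hash)
                    if not chunk:
                        break
                    digest.update(chunk)
                entry["sha256"] = digest.hexdigest()
            files[member.name] = entry
    return {"files": files}
```

With offsets and sizes recorded this way, a single member can be read back by seeking to its offset in the archive and reading size bytes, without scanning the archive from the start.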

hf_tar_create_from_directory

hfutils.index.make.hf_tar_create_from_directory(repo_id: str, archive_in_repo: str, local_directory: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', chunk_for_hash: int = 1048576, with_hash: bool = True, silent: bool = False, hf_token: str | None = None)

Create a tar archive file from a local directory and upload it to a Hugging Face repository.

Parameters:
  • repo_id (str) – The identifier of the repository.

  • archive_in_repo (str) – The path to save the tar archive file in the repository.

  • local_directory (str) – The path to the local directory to be archived.

  • repo_type (RepoTypeTyping, optional) – The type of the Hugging Face repository, one of 'dataset', 'model' or 'space', defaults to 'dataset'.

  • revision (str, optional) – The revision of the repository, defaults to 'main'.

  • chunk_for_hash (int, optional) – The chunk size in bytes used when hashing, defaults to 1 << 20 (1 MiB).

  • with_hash (bool, optional) – Whether to include file hashes in the index, defaults to True.

  • silent (bool, optional) – Whether to suppress progress bars and logging messages, defaults to False.

  • hf_token (str, optional) – The Hugging Face access token, defaults to None.
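The local half of this operation, packing a directory into a tar archive, can be approximated with the standard library as sketched below. This is an illustration only: pack_directory is a hypothetical helper, and hfutils may order or name archive members differently. The upload step is not shown.

```python
import os
import tarfile
from typing import List


def pack_directory(local_directory: str, tar_path: str) -> List[str]:
    """Pack every file under local_directory into an uncompressed tar.

    tar_path should live outside local_directory, otherwise the archive
    would be walked while it is being written.
    """
    names: List[str] = []
    with tarfile.open(tar_path, "w") as tar:
        for dirpath, dirnames, filenames in os.walk(local_directory):
            dirnames.sort()  # visit subdirectories in a deterministic order
            for filename in sorted(filenames):
                full = os.path.join(dirpath, filename)
                # Store paths relative to the directory, with "/" separators.
                rel = os.path.relpath(full, local_directory)
                arcname = rel.replace(os.sep, "/")
                tar.add(full, arcname=arcname)
                names.append(arcname)
    return names
```

The function returns the member names in the order they were added, which makes it easy to check what ended up in the archive before anything is uploaded.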