hfutils.operate.upload
This module provides functions for uploading files and directories to Hugging Face repositories.
The module uses the Hugging Face Hub API client for repository operations.
upload_file_to_file
- hfutils.operate.upload.upload_file_to_file(local_file, repo_id: str, file_in_repo: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', message: str | None = None, hf_token: str | None = None)
Upload a local file to a specified path in a Hugging Face repository.
- Parameters:
local_file (str) – The local file path to be uploaded.
repo_id (str) – The identifier of the repository.
file_in_repo (str) – The file path within the repository.
repo_type (RepoTypeTyping) – The type of the repository (‘dataset’, ‘model’, ‘space’).
revision (str) – The revision of the repository (e.g., branch, tag, commit hash).
message (Optional[str]) – The commit message for the upload.
hf_token (str, optional) – Hugging Face token for the API client. If not assigned, the HF_TOKEN environment variable is used.
- Raises:
Any exception raised by the Hugging Face Hub API client.
This function uses the Hugging Face Hub API client to upload a single file to a specified repository. It’s useful for adding or updating individual files in a repository.
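A minimal usage sketch for a single-file upload follows. The helper names build_upload_kwargs and upload_one are hypothetical and exist only for this illustration. The keyword names come from the signature above. Running upload_one assumes hfutils is installed, the network is reachable, and a valid token is available through HF_TOKEN.

```python
import os


def build_upload_kwargs(local_file: str, repo_id: str, file_in_repo: str) -> dict:
    """Collect keyword arguments named after the parameters documented above."""
    return {
        "local_file": local_file,
        "repo_id": repo_id,
        "file_in_repo": file_in_repo,
        "repo_type": "dataset",  # documented default
        "revision": "main",      # documented default
        "message": f"Upload {os.path.basename(local_file)}",
        # hf_token is left out so that the HF_TOKEN variable is used instead.
    }


def upload_one(local_file: str, repo_id: str, file_in_repo: str) -> None:
    """Upload one file; needs hfutils, network access, and a valid token."""
    # Imported lazily so the helper above stays usable without hfutils installed.
    from hfutils.operate.upload import upload_file_to_file

    upload_file_to_file(**build_upload_kwargs(local_file, repo_id, file_in_repo))
```

Keeping the argument construction separate from the network call makes it possible to inspect or log the arguments before anything is committed to the repository.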
upload_directory_as_archive
- hfutils.operate.upload.upload_directory_as_archive(local_directory, repo_id: str, archive_in_repo: str, pattern: str | None = None, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', message: str | None = None, silent: bool = False, group_method: str | int | None = None, max_size_per_pack: str | float | None = None, hf_token: str | None = None)
Upload a local directory as an archive file to a specified path in a Hugging Face repository.
- Parameters:
local_directory (str) – The local directory path to be uploaded.
repo_id (str) – The identifier of the repository.
archive_in_repo (str) – The archive file path within the repository.
pattern (Optional[str]) – A pattern to filter files in the local directory.
repo_type (RepoTypeTyping) – The type of the repository (‘dataset’, ‘model’, ‘space’).
revision (str) – The revision of the repository (e.g., branch, tag, commit hash).
message (Optional[str]) – The commit message for the upload.
silent (bool) – If True, suppress progress bar output.
group_method (Optional[Union[str, int]]) – Method for grouping files (None for default, int for segment count). Only applied when max_size_per_pack is assigned.
max_size_per_pack (Optional[Union[str, float]]) – Maximum total size for each group (can be a string like “1GB”). When assigned, this function will try to upload the directory as multiple archive files.
hf_token (str, optional) – Hugging Face token for the API client. If not assigned, the HF_TOKEN environment variable is used.
- Raises:
Any exception raised during archive creation or file upload.
This function compresses the specified local directory into an archive file and then uploads it to the Hugging Face repository. It’s useful for uploading entire directories as a single file, which can be more efficient for large directories.
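The compress-then-upload idea can be sketched with the standard library. The pack_directory helper below is a hypothetical stand-in that writes a zip file; it is not how hfutils builds its archives, and the archive format hfutils chooses is not shown here. The upload_as_archive wrapper is also hypothetical and simply calls the documented function, assuming hfutils is installed and a token is available.

```python
import os
import tempfile
import zipfile


def pack_directory(local_directory: str, archive_path: str) -> list:
    """Zip every file under local_directory, storing paths relative to it."""
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(local_directory):
            for name in files:
                full_path = os.path.join(root, name)
                # Use forward slashes so archive member names are portable.
                rel_path = os.path.relpath(full_path, local_directory).replace(os.sep, "/")
                zf.write(full_path, rel_path)
                names.append(rel_path)
    return sorted(names)


def upload_as_archive(local_directory: str, repo_id: str, archive_in_repo: str) -> None:
    """Call the documented function; needs hfutils, network access, and a token."""
    # Imported lazily so pack_directory can be tried without hfutils installed.
    from hfutils.operate.upload import upload_directory_as_archive

    upload_directory_as_archive(
        local_directory=local_directory,
        repo_id=repo_id,
        archive_in_repo=archive_in_repo,
        silent=True,               # suppress progress bar output
        max_size_per_pack="1GB",   # allow splitting into multiple archive files
    )
```

The stand-in only illustrates why a single archive can be convenient: many small files become one object to transfer and commit.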
upload_directory_as_directory
- hfutils.operate.upload.upload_directory_as_directory(local_directory, repo_id: str, path_in_repo: str, pattern: str | None = None, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', message: str | None = None, time_suffix: bool = True, clear: bool = False, ignore_patterns: List[str] = <object object>, hf_token: str | None = None, operation_chunk_size: int | None = None, upload_timespan: float = 5.0)
Upload a local directory and its files to a specified path in a Hugging Face repository.
- Parameters:
local_directory (str) – The local directory path to be uploaded.
repo_id (str) – The identifier of the repository.
path_in_repo (str) – The directory path within the repository.
pattern (Optional[str]) – A pattern to filter files in the local directory.
repo_type (RepoTypeTyping) – The type of the repository (‘dataset’, ‘model’, ‘space’).
revision (str) – The revision of the repository (e.g., branch, tag, commit hash).
message (Optional[str]) – The commit message for the upload.
time_suffix (bool) – If True, append a timestamp to the commit message.
clear (bool) – If True, remove files in the repository not present in the local directory.
ignore_patterns (List[str]) – List of file patterns to ignore.
hf_token (str, optional) – Hugging Face token for the API client. If not assigned, the HF_TOKEN environment variable is used.
operation_chunk_size (Optional[int]) – Chunk size of the operations. When this is set, the operations are split across multiple commits.
upload_timespan (float) – Minimum time interval between uploads when chunked uploading is enabled.
- Raises:
Any exception raised during the upload process.
This function uploads a local directory to a Hugging Face repository, maintaining its structure. It can handle large directories by chunking the upload process and provides options for clearing existing files and ignoring specific patterns.
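The effect of operation_chunk_size can be pictured as splitting one list of operations into fixed-size groups, one commit per group. The split_into_chunks helper below is a hypothetical illustration of that idea, not the library's internal code. The upload_in_chunks wrapper is likewise hypothetical; it calls the documented function with example values and assumes hfutils is installed and a token is available.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(operations: Sequence[T], chunk_size: int) -> List[List[T]]:
    """Split operations into consecutive groups of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        list(operations[i:i + chunk_size])
        for i in range(0, len(operations), chunk_size)
    ]


def upload_in_chunks(local_directory: str, repo_id: str, path_in_repo: str) -> None:
    """Call the documented function; needs hfutils, network access, and a token."""
    # Imported lazily so split_into_chunks can be tried without hfutils installed.
    from hfutils.operate.upload import upload_directory_as_directory

    upload_directory_as_directory(
        local_directory=local_directory,
        repo_id=repo_id,
        path_in_repo=path_in_repo,
        message="Sync local directory",
        clear=False,               # keep repository files that are absent locally
        operation_chunk_size=100,  # example value: up to 100 operations per commit
        upload_timespan=30.0,      # example value: seconds between commits
    )
```

With five operations and a chunk size of two, the helper yields three groups, so three commits would be created under this model.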
Note
When operation_chunk_size is set, multiple commits will be created. If any of them fails, the repository is rolled back to the commit it was at when the upload started, using the hfutils.repository.hf_hub_rollback() function.
Warning
When operation_chunk_size is set, multiple commits will be created. The Hugging Face repository API cannot guarantee atomicity across these commits, so this function is not thread-safe.
Note
The rate limit for commit creation on Hugging Face repositories is approximately 120 commits per hour, which is one commit every 30 seconds. If you have a large number of chunks to upload, set upload_timespan to at least 30.0 so that the upload is not rate-limited.
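The arithmetic behind the 30.0 figure, and a rough lower bound on how long a chunked upload takes, can be sketched as follows. Both function names are hypothetical. The duration estimate assumes one commit per chunk and a wait of exactly upload_timespan between consecutive commits, and it ignores transfer time.

```python
import math


def min_upload_timespan(commits_per_hour: float = 120.0) -> float:
    """Seconds between commits needed to stay within the given hourly limit."""
    return 3600.0 / commits_per_hour


def estimated_wait_seconds(n_operations: int, operation_chunk_size: int,
                           upload_timespan: float) -> float:
    """Lower bound on total waiting time between commits (assumption-based)."""
    n_commits = math.ceil(n_operations / operation_chunk_size)
    # The first commit starts immediately; each later one waits upload_timespan.
    return max(n_commits - 1, 0) * upload_timespan
```

For example, 1000 operations with operation_chunk_size=100 produce 10 commits under this model, so at upload_timespan=30.0 the waiting time alone is at least 270 seconds.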