hfutils.operate.upload
This module provides functions for uploading files and directories to Hugging Face repositories.
The module uses the Hugging Face Hub API client for repository operations and supports various upload strategies including single file uploads, directory-to-archive uploads, and direct directory uploads with optional chunking for large datasets.
- Example::
>>> # Upload a single file
>>> upload_file_to_file('local.txt', 'user/repo', 'remote.txt')
>>> # Upload directory as archive
>>> upload_directory_as_archive('data/', 'user/repo', 'data.zip')
>>> # Upload directory structure
>>> upload_directory_as_directory('data/', 'user/repo', 'dataset/')
upload_file_to_file
- hfutils.operate.upload.upload_file_to_file(local_file, repo_id: str, file_in_repo: str, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', message: str | None = None, hf_token: str | None = None)[source]
Upload a local file to a specified path in a Hugging Face repository.
This function uploads a single local file to a repository on Hugging Face Hub. It’s the most basic upload operation and is useful for adding or updating individual files in a repository.
- Parameters:
local_file (str) – The local file path to be uploaded.
repo_id (str) – The identifier of the repository in format ‘username/repo-name’.
file_in_repo (str) – The target file path within the repository.
repo_type (RepoTypeTyping) – The type of the repository (‘dataset’, ‘model’, ‘space’).
revision (str) – The revision of the repository (e.g., branch, tag, commit hash).
message (Optional[str]) – The commit message for the upload. If None, a default message is generated.
hf_token (str, optional) – Hugging Face token for the API client; the HF_TOKEN environment variable is used if not assigned.
- Raises:
Any exception raised by the Hugging Face Hub API client.
- Example::
>>> upload_file_to_file('model.pkl', 'user/my-model', 'pytorch_model.bin')
>>> upload_file_to_file('data.csv', 'user/my-dataset', 'train.csv',
...                     message='Add training data')
upload_directory_as_archive
- hfutils.operate.upload.upload_directory_as_archive(local_directory, repo_id: str, archive_in_repo: str, pattern: str | None = None, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', message: str | None = None, silent: bool = False, group_method: str | int | None = None, max_size_per_pack: str | float | None = None, hf_token: str | None = None)[source]
Upload a local directory as an archive file to a specified path in a Hugging Face repository.
This function compresses the specified local directory into an archive file and then uploads it to the Hugging Face repository. It’s useful for uploading entire directories as a single compressed file, which can be more efficient for large directories with many small files. The function supports splitting large archives into multiple parts if size limits are specified.
- Parameters:
local_directory (str) – The local directory path to be uploaded.
repo_id (str) – The identifier of the repository in format ‘username/repo-name’.
archive_in_repo (str) – The archive file path within the repository (determines compression format).
pattern (Optional[str]) – A pattern to filter files in the local directory (glob-style pattern).
repo_type (RepoTypeTyping) – The type of the repository (‘dataset’, ‘model’, ‘space’).
revision (str) – The revision of the repository (e.g., branch, tag, commit hash).
message (Optional[str]) – The commit message for the upload. If None, a default message is generated.
silent (bool) – If True, suppress progress bar output.
group_method (Optional[Union[str, int]]) – Method for grouping files (None for the default, an int for the number of segments). Only applied when max_size_per_pack is assigned.
max_size_per_pack (Optional[Union[str, float]]) – Maximum total size for each group (may be a string like '1GB'). When assigned, the directory is split and uploaded as multiple archive files.
hf_token (str, optional) – Hugging Face token for the API client; the HF_TOKEN environment variable is used if not assigned.
- Raises:
Any exception raised during archive creation or file upload.
- Example::
>>> # Upload directory as single archive
>>> upload_directory_as_archive('data/', 'user/repo', 'dataset.zip')
>>> # Upload with file filtering
>>> upload_directory_as_archive('images/', 'user/repo', 'images.tar.gz',
...                             pattern='*.jpg')
>>> # Upload with size limit (creates multiple parts)
>>> upload_directory_as_archive('large_data/', 'user/repo', 'data.zip',
...                             max_size_per_pack='1GB')
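To illustrate what max_size_per_pack implies, here is a minimal greedy sketch of grouping files so each archive stays under a size limit. This is not hfutils' actual grouping algorithm, and parse_size and group_files_by_size are hypothetical helpers written for this example.

```python
# Hypothetical sketch: greedily group (filename, size) pairs so each
# group stays under a byte limit, similar in spirit to max_size_per_pack.
# NOT hfutils' actual grouping algorithm.

def parse_size(size):
    """Parse a size like '1GB' or accept a plain number of bytes."""
    if isinstance(size, (int, float)):
        return float(size)
    units = {'KB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3, 'TB': 1024 ** 4}
    for suffix, factor in units.items():
        if size.upper().endswith(suffix):
            return float(size[:-len(suffix)]) * factor
    return float(size)

def group_files_by_size(files, max_size_per_pack):
    """Greedily pack (name, size) pairs into groups under the limit."""
    limit = parse_size(max_size_per_pack)
    groups, current, current_size = [], [], 0.0
    for name, size in files:
        if current and current_size + size > limit:
            groups.append(current)
            current, current_size = [], 0.0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups

files = [('a.bin', 600), ('b.bin', 500), ('c.bin', 300), ('d.bin', 900)]
print(group_files_by_size(files, 1000))
# [['a.bin'], ['b.bin', 'c.bin'], ['d.bin']]
```

In this sketch each group would become one archive part; the real function additionally honors group_method when deciding how to segment the files.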
upload_directory_as_directory
- hfutils.operate.upload.upload_directory_as_directory(local_directory, repo_id: str, path_in_repo: str, pattern: str | None = None, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', message: str | None = None, time_suffix: bool = True, clear: bool = False, hf_token: str | None = None, operation_chunk_size: int | None = None, upload_timespan: float = 5.0)[source]
Upload a local directory and its files to a specified path in a Hugging Face repository.
This function uploads a local directory to a Hugging Face repository while maintaining its directory structure. It provides advanced features like chunked uploads for large datasets, automatic cleanup of removed files, and progress tracking. This is the most comprehensive upload function for directory structures.
- Parameters:
local_directory (str) – The local directory path to be uploaded.
repo_id (str) – The identifier of the repository in format ‘username/repo-name’.
path_in_repo (str) – The target directory path within the repository.
pattern (Optional[str]) – A pattern to filter files in the local directory (glob-style pattern).
repo_type (RepoTypeTyping) – The type of the repository (‘dataset’, ‘model’, ‘space’).
revision (str) – The revision of the repository (e.g., branch, tag, commit hash).
message (Optional[str]) – The commit message for the upload. If None, a default message is generated.
time_suffix (bool) – If True, append a timestamp to the commit message.
clear (bool) – If True, remove files in the repository not present in the local directory.
hf_token (str, optional) – Hugging Face token for the API client; the HF_TOKEN environment variable is used if not assigned.
operation_chunk_size (Optional[int]) – Chunk size of the operations. When set, the operations are split across multiple commits.
upload_timespan (float) – Minimum time interval, in seconds, between commits when chunked uploading is enabled.
- Raises:
Any exception raised during the upload process.
Note
When operation_chunk_size is set, multiple commits will be created. If some commits fail, the repository is rolled back to the starting commit using the hfutils.repository.hf_hub_rollback() function.
Warning
When operation_chunk_size is set, multiple commits will be created, but Hugging Face's repository API cannot guarantee the atomicity of your data, so this function is not thread-safe.
Note
The rate limit for Hugging Face repository commit creation is approximately 120 commits per hour. If you have a large number of chunks to create, set upload_timespan to a value no less than 30.0 to ensure your upload is not rate-limited.
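The 30-second floor in the note above follows directly from the rate limit; a quick sanity check (a sketch for illustration, not part of hfutils' API):

```python
# Back-of-the-envelope check of the rate-limit note: at roughly
# 120 commits per hour, the minimum safe interval between commits is
# 3600 s / 120 = 30 s, hence the advice upload_timespan >= 30.0.

def safe_upload_timespan(commits_per_hour=120):
    """Minimum seconds between commits to stay under the rate limit."""
    return 3600 / commits_per_hour

print(safe_upload_timespan())  # 30.0
```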
- Example::
>>> # Basic directory upload
>>> upload_directory_as_directory('data/', 'user/repo', 'dataset/')
>>> # Upload with file filtering
>>> upload_directory_as_directory('models/', 'user/repo', 'checkpoints/',
...                               pattern='*.pth')
>>> # Chunked upload for large datasets
>>> upload_directory_as_directory('large_dataset/', 'user/repo', 'data/',
...                               operation_chunk_size=100, upload_timespan=30.0)
>>> # Upload with cleanup of removed files
>>> upload_directory_as_directory('updated_data/', 'user/repo', 'data/',
...                               clear=True)
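To see how operation_chunk_size splits work into commits, here is a minimal sketch. The chunk_operations helper is hypothetical, written for this example rather than taken from hfutils' internals.

```python
# Hypothetical sketch of how operation_chunk_size could split a list of
# file operations into per-commit batches. NOT hfutils' internal code.

def chunk_operations(operations, operation_chunk_size=None):
    """Split operations into batches of at most operation_chunk_size items.

    When operation_chunk_size is None, everything goes into one commit.
    """
    if operation_chunk_size is None:
        return [operations]
    return [operations[i:i + operation_chunk_size]
            for i in range(0, len(operations), operation_chunk_size)]

ops = [f'add file_{i}.bin' for i in range(250)]
batches = chunk_operations(ops, operation_chunk_size=100)
print([len(b) for b in batches])  # [100, 100, 50]
```

With 250 operations and a chunk size of 100, three commits would be created; pairing that with upload_timespan=30.0 keeps the commit rate under the limit discussed in the notes above.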