hfutils.repository.size
This module provides functionality for analyzing and managing Hugging Face repository files.
It includes classes and functions for representing repository files, creating file lists, and analyzing repository contents. The module is designed to work with the Hugging Face Hub API and provides utilities for handling file paths, sizes, and other metadata.
Key components:
- RepoFileItem: Represents a single file in a repository.
- RepoFileList: A sequence of RepoFileItems with additional metadata and utility methods.
- hf_hub_repo_analysis: Function to analyze repository contents based on given criteria.
This module is particularly useful for developers who work with Hugging Face repositories and need to analyze or manage file structures and metadata.
RepoFileItem
- class hfutils.repository.size.RepoFileItem(path: str, size: int, is_lfs: bool, lfs_sha256: str | None, blob_id: str)
Represents a file item in a Hugging Face repository.
This class encapsulates metadata about a single file, including its path, size, LFS status, and blob ID.
- Parameters:
path – The file path relative to the repository root.
size – The size of the file in bytes.
is_lfs – Whether the file is stored using Git LFS.
lfs_sha256 – The SHA256 hash of the LFS file, if applicable.
blob_id – The Git blob ID of the file.
- __repr__()
Return a string representation of the RepoFileItem.
- Returns:
A formatted string representation.
- classmethod from_repo_file(repo_file: RepoFile, subdir: str = '') -> RepoFileItem
Create a RepoFileItem from a RepoFile object.
- Parameters:
repo_file – The RepoFile object to convert.
subdir – The subdirectory to use as the base path (default: '').
- Returns:
A new RepoFileItem instance.
- property path_segments: Tuple[str, ...]
Get the path segments of the file.
- Returns:
A tuple of path segments.
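The fields and the path_segments property described above can be pictured with a small stand-in. This is a minimal sketch, not the library's implementation: FileItemSketch and relative_to_subdir are hypothetical names, and it assumes that segments are the '/'-separated parts of the path and that subdir is stripped as a leading prefix.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FileItemSketch:
    """Hypothetical stand-in mirroring the documented RepoFileItem fields."""
    path: str
    size: int
    is_lfs: bool
    lfs_sha256: Optional[str]
    blob_id: str

    @property
    def path_segments(self) -> Tuple[str, ...]:
        # Assumption: segments are the non-empty parts of a '/'-separated path.
        return tuple(seg for seg in self.path.split('/') if seg)


def relative_to_subdir(path: str, subdir: str = '') -> str:
    """Hypothetical helper: express a repository path relative to subdir."""
    # Assumption: an empty subdir means the repository root, so nothing changes.
    if not subdir:
        return path
    prefix = subdir.strip('/') + '/'
    return path[len(prefix):] if path.startswith(prefix) else path


item = FileItemSketch(
    path=relative_to_subdir('data/train/part-0.parquet', subdir='data'),
    size=1024,
    is_lfs=False,
    lfs_sha256=None,   # only meaningful for LFS files
    blob_id='0' * 40,  # placeholder value, not a real blob ID
)
print(item.path_segments)
```

Keeping the segments as a tuple makes them hashable, which is convenient when grouping files by directory.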
RepoFileList
- class hfutils.repository.size.RepoFileList(repo_id: str, items: List[RepoFileItem], repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', subdir: str | None = '')
Represents a list of RepoFileItems with additional metadata and utility methods.
This class provides a way to manage and analyze a collection of files from a Hugging Face repository, including information about the repository itself.
- Parameters:
repo_id – The ID of the repository.
items – A list of RepoFileItem objects.
repo_type – The type of the repository (default: 'dataset').
revision – The revision of the repository (default: 'main').
subdir – The subdirectory within the repository (default: '').
- __getitem__(index)
Get a RepoFileItem by index.
- Parameters:
index – The index of the item to retrieve.
- Returns:
The RepoFileItem at the specified index.
- __len__() -> int
Get the number of items in the list.
- Returns:
The number of RepoFileItems in the list.
- __repr__()
Return a string representation of the RepoFileList.
- Returns:
A formatted string representation of the file list.
- repr(max_items: int | None = 10)
Generate a custom string representation of the RepoFileList.
- Parameters:
max_items – The maximum number of items to include in the representation (default: 10).
- Returns:
A formatted string representation of the file list.
- property total_size: int
Get the total size of all files in the list.
- Returns:
The total size in bytes.
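The sequence behavior documented for RepoFileList (indexing, length, total size, and a truncated representation) can be illustrated as follows. This is a sketch under stated assumptions: Entry and FileListSketch are hypothetical names, and the output format of repr() here is invented for illustration and does not reproduce the library's actual formatting.

```python
from collections.abc import Sequence
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical, reduced file record: only path and size."""
    path: str
    size: int


class FileListSketch(Sequence):
    """Hypothetical stand-in for a file list that carries repository metadata."""

    def __init__(self, repo_id: str, items: List[Entry]):
        self.repo_id = repo_id
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)

    @property
    def total_size(self) -> int:
        # Sum of all file sizes, in bytes.
        return sum(entry.size for entry in self._items)

    def repr(self, max_items: Optional[int] = 10) -> str:
        # Assumption: max_items=None means "show every item".
        shown = self._items if max_items is None else self._items[:max_items]
        lines = [f'{self.repo_id}: {len(self)} files, {self.total_size} bytes']
        lines += [f'  {entry.path} ({entry.size} bytes)' for entry in shown]
        hidden = len(self) - len(shown)
        if hidden > 0:
            lines.append(f'  ... ({hidden} more)')
        return '\n'.join(lines)


files = FileListSketch('username/repo', [Entry('a.txt', 10), Entry('b/c.bin', 32)])
print(files.repr(max_items=1))
```

Deriving from collections.abc.Sequence means iteration, membership tests, and reversed() come for free once __getitem__ and __len__ are defined.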
hf_hub_repo_analysis
- hfutils.repository.size.hf_hub_repo_analysis(repo_id: str, pattern: str = '**/*', repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str = 'main', hf_token: str | None = None, silent: bool = False, subdir: str = '', sort_by: Literal['none', 'path', 'size'] = 'path', **kwargs) -> RepoFileList
Analyze the contents of a Hugging Face repository.
This function retrieves file information from a specified repository and creates a RepoFileList object containing detailed information about each file.
- Parameters:
repo_id – The ID of the repository to analyze.
pattern – The pattern used to select files in the repository (default: '**/*').
repo_type – The type of the repository (default: 'dataset').
revision – The revision of the repository to analyze (default: 'main').
hf_token – The Hugging Face API token (optional).
silent – Whether to suppress output (default: False).
subdir – The subdirectory within the repository to analyze (default: '').
sort_by – How to sort the file list ('none', 'path', or 'size') (default: 'path').
kwargs – Additional keyword arguments to pass to list_all_with_pattern.
- Returns:
A RepoFileList object containing the analysis results.
- Raises:
May raise exceptions related to API access or file operations.
- Usage:
>>> result = hf_hub_repo_analysis('username/repo', pattern='*.txt', repo_type='model')
>>> print(result)
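The three sort_by modes can be pictured on plain (path, size) pairs. This is only a sketch of the idea: sort_entries is a hypothetical helper, and the ascending order for 'size' is an assumption, since the reference above does not state the direction the library uses.

```python
from typing import List, Tuple


def sort_entries(entries: List[Tuple[str, int]], sort_by: str = 'path') -> List[Tuple[str, int]]:
    """Hypothetical helper showing what 'none', 'path' and 'size' could mean."""
    if sort_by == 'none':
        # Keep the order in which the entries were listed.
        return list(entries)
    if sort_by == 'path':
        return sorted(entries, key=lambda entry: entry[0])
    if sort_by == 'size':
        # Assumption: ascending by size; the real ordering may differ.
        return sorted(entries, key=lambda entry: entry[1])
    raise ValueError(f'Unknown sort_by value: {sort_by!r}')


entries = [('b.txt', 5), ('a.txt', 9)]
for mode in ('none', 'path', 'size'):
    print(mode, sort_entries(entries, sort_by=mode))
```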