hfutils.index.local_fetch

This module provides utility functions for working with tar archives and their associated index files. It includes functions for retrieving archive indexes, listing files, checking file existence, getting file information, and downloading files from archives.

The module relies on a JSON-based index file that contains metadata about the files within the archive, including their offsets, sizes, and optional SHA256 hashes.
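To make the role of the index concrete, here is a minimal sketch of how such an index could be built with the standard library. The layout (a top-level `files` mapping with `offset`, `size`, and `sha256` per entry) and the helper name `build_index` are assumptions for illustration, not the confirmed on-disk format used by hfutils.

```python
import hashlib
import io
import json
import tarfile


def build_index(tar_bytes: bytes) -> dict:
    """Build a hypothetical index: member name -> offset, size, sha256."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-regular members
            # offset_data is the position where the member's payload starts.
            start = member.offset_data
            payload = tar_bytes[start:start + member.size]
            files[member.name] = {
                "offset": start,
                "size": member.size,
                "sha256": hashlib.sha256(payload).hexdigest(),
            }
    # Assumed layout: a single "files" mapping keyed by member name.
    return {"files": files}


# Build a tiny in-memory archive to demonstrate.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    data = b"hello world"
    info = tarfile.TarInfo(name="path/to/file.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

index = build_index(buffer.getvalue())
print(json.dumps(index, indent=2))
```

With offsets and sizes recorded up front, a reader can seek straight to a member's payload instead of scanning the archive header by header.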

Functions in this module operate on local archive files and their corresponding index files, providing a convenient interface for inspecting archives and extracting individual files.

tar_get_index

hfutils.index.local_fetch.tar_get_index(archive_file: str, idx_file: str | None = None)

Retrieve the index data for a given tar archive file.

This function reads the JSON index file associated with the archive, which contains metadata about the files within the archive.

Parameters:
  • archive_file (str) – Path to the tar archive file.

  • idx_file (Optional[str]) – Optional path to the index file. If not provided, it will be inferred from the archive file name.

Returns:

The parsed JSON data from the index file.

Return type:

dict

Raises:
  • FileNotFoundError – If the index file is not found.

  • json.JSONDecodeError – If the index file is not valid JSON.

Example:
>>> index_data = tar_get_index('my_archive.tar')
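The lookup described above can be sketched as follows. The inference rule used here (swap the archive extension for `.json`) is an assumption, as are the helper names `infer_index_path` and `load_index`; the actual rule in hfutils may differ.

```python
import json
import os
import tempfile
from typing import Optional


def infer_index_path(archive_file: str) -> str:
    # Assumption: the index sits next to the archive, with ".json"
    # replacing the archive's extension.
    return os.path.splitext(archive_file)[0] + ".json"


def load_index(archive_file: str, idx_file: Optional[str] = None) -> dict:
    path = idx_file or infer_index_path(archive_file)
    # open() raises FileNotFoundError if the index is missing;
    # json.load() raises json.JSONDecodeError if it is malformed.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "my_archive.tar")
    # Write a placeholder index next to where the archive would live.
    with open(infer_index_path(archive), "w", encoding="utf-8") as f:
        json.dump({"files": {}}, f)
    loaded = load_index(archive)

    try:
        load_index(os.path.join(tmp, "missing.tar"))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

Note that only the index file is opened in this sketch; the archive itself does not need to exist for the index to be read.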

tar_list_files

hfutils.index.local_fetch.tar_list_files(archive_file: str, idx_file: str | None = None) → List[str]

List all files contained within the specified tar archive.

This function uses the archive’s index file to retrieve the list of files without actually reading the tar archive itself.

Parameters:
  • archive_file (str) – Path to the tar archive file.

  • idx_file (Optional[str]) – Optional path to the index file. If not provided, it will be inferred from the archive file name.

Returns:

A list of file names contained in the archive.

Return type:

List[str]

Example:
>>> files = tar_list_files('my_archive.tar')
>>> for file in files:
...     print(file)

tar_file_exists

hfutils.index.local_fetch.tar_file_exists(archive_file: str, file_in_archive: str, idx_file: str | None = None) → bool

Check if a specific file exists within the tar archive.

This function uses the archive’s index to check for file existence without reading the entire archive.

Parameters:
  • archive_file (str) – Path to the tar archive file.

  • file_in_archive (str) – The name of the file to check for in the archive.

  • idx_file (Optional[str]) – Optional path to the index file. If not provided, it will be inferred from the archive file name.

Returns:

True if the file exists in the archive, False otherwise.

Return type:

bool

Example:
>>> exists = tar_file_exists('my_archive.tar', 'path/to/file.txt')
>>> if exists:
...     print("File exists in the archive")
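Both the file listing and the existence check can be answered from the index alone. A minimal sketch, assuming the parsed index is a dict with a `files` mapping keyed by member name (the layout and the helper names are hypothetical):

```python
# Hypothetical parsed index; the "files" key and per-entry fields are assumptions.
index = {
    "files": {
        "path/to/file.txt": {"offset": 512, "size": 11},
        "images/cat.png": {"offset": 1536, "size": 2048},
    }
}


def list_files(index: dict) -> list:
    # The member names are simply the keys of the "files" mapping.
    return list(index["files"])


def file_exists(index: dict, name: str) -> bool:
    # A dictionary membership test; the tar archive is never opened.
    return name in index["files"]


print(list_files(index))
print(file_exists(index, "path/to/file.txt"))
print(file_exists(index, "does/not/exist.txt"))
```

This sketch compares names verbatim; it does not normalize paths such as `./path/to/file.txt`.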

tar_file_size

hfutils.index.local_fetch.tar_file_size(archive_file: str, file_in_archive: str, idx_file: str | None = None) → int

Get the size of a specific file within the tar archive.

This function returns the size of the specified file in bytes.

Parameters:
  • archive_file (str) – Path to the tar archive file.

  • file_in_archive (str) – The name of the file to get the size for.

  • idx_file (Optional[str]) – Optional path to the index file. If not provided, it will be inferred from the archive file name.

Returns:

The size of the file in bytes.

Return type:

int

Raises:

FileNotFoundError – If the specified file is not found in the archive.

Example:
>>> size = tar_file_size('my_archive.tar', 'path/to/file.txt')
>>> print(f"File size: {size} bytes")

tar_file_info

hfutils.index.local_fetch.tar_file_info(archive_file: str, file_in_archive: str, idx_file: str | None = None) → dict

Retrieve information about a specific file within the tar archive.

This function returns a dictionary containing metadata about the specified file, such as its size and offset within the archive.

Parameters:
  • archive_file (str) – Path to the tar archive file.

  • file_in_archive (str) – The name of the file to get information for.

  • idx_file (Optional[str]) – Optional path to the index file. If not provided, it will be inferred from the archive file name.

Returns:

A dictionary containing file metadata.

Return type:

dict

Raises:

FileNotFoundError – If the specified file is not found in the archive.

Example:
>>> info = tar_file_info('my_archive.tar', 'path/to/file.txt')
>>> print(f"File size: {info['size']} bytes")
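The metadata and size lookups, including the documented FileNotFoundError for a missing member, can be sketched as a dictionary lookup. The index layout and the helper names `file_info` and `file_size` are assumptions for illustration.

```python
# Hypothetical parsed index; the "files" key and per-entry fields are assumptions.
index = {
    "files": {
        "path/to/file.txt": {"offset": 512, "size": 11},
    }
}


def file_info(index: dict, name: str) -> dict:
    try:
        return index["files"][name]
    except KeyError:
        # Translate the missing key into the documented exception type.
        raise FileNotFoundError(f"{name!r} not found in archive index") from None


def file_size(index: dict, name: str) -> int:
    # The size is one field of the per-file metadata.
    return file_info(index, name)["size"]


print(file_size(index, "path/to/file.txt"))

try:
    file_info(index, "does/not/exist.txt")
    missing_raised = False
except FileNotFoundError:
    missing_raised = True
```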

tar_file_download

hfutils.index.local_fetch.tar_file_download(archive_file: str, file_in_archive: str, local_file: str, idx_file: str | None = None, chunk_size: int = 1048576)

Extract and download a specific file from the tar archive to a local file.

This function reads the specified file from the archive and writes it to a local file. It also performs integrity checks to ensure the downloaded file is complete and matches the expected hash (if provided in the index).

Parameters:
  • archive_file (str) – Path to the tar archive file.

  • file_in_archive (str) – The name of the file to extract from the archive.

  • local_file (str) – The path where the extracted file should be saved.

  • idx_file (Optional[str]) – Optional path to the index file. If not provided, it will be inferred from the archive file name.

  • chunk_size (int) – The size of the chunks to read and write, in bytes. Defaults to 1048576 (1 MiB).

Example:
>>> tar_file_download('my_archive.tar', 'path/to/file.txt', 'local_file.txt')
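The chunked copy and integrity checks described for this function can be sketched with the standard library. This is an illustration under stated assumptions, not the library's implementation: the index entry layout, the helper name `extract_member`, and the exception types raised on failure are all placeholders.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def extract_member(archive_file, entry, local_file, chunk_size=1 << 20):
    """Copy entry['size'] bytes starting at entry['offset'], then verify."""
    sha = hashlib.sha256()
    remaining = entry["size"]
    with open(archive_file, "rb") as src, open(local_file, "wb") as dst:
        src.seek(entry["offset"])  # jump straight to the member's payload
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                break  # archive ended before the expected size was read
            dst.write(chunk)
            sha.update(chunk)
            remaining -= len(chunk)
    if remaining != 0:
        # Placeholder exception type for an incomplete copy.
        raise OSError(f"incomplete copy: {remaining} bytes missing")
    expected = entry.get("sha256")
    if expected is not None and sha.hexdigest() != expected:
        # Placeholder exception type for a hash mismatch.
        raise ValueError("SHA256 mismatch")


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "my_archive.tar")
    data = b"hello world"
    with tarfile.open(archive, "w") as tar:
        info = tarfile.TarInfo(name="path/to/file.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    # Derive a hypothetical index entry from the archive itself.
    with tarfile.open(archive, "r:") as tar:
        member = tar.getmember("path/to/file.txt")
        entry = {
            "offset": member.offset_data,
            "size": member.size,
            "sha256": hashlib.sha256(data).hexdigest(),
        }

    local = os.path.join(tmp, "local_file.txt")
    extract_member(archive, entry, local, chunk_size=4)  # small chunks for the demo
    with open(local, "rb") as f:
        extracted = f.read()
```

Reading in bounded chunks keeps memory use independent of the member's size, and updating the hash as each chunk is written avoids a second pass over the output file.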