hfutils.utils.data

This module provides functionality for identifying data files based on their file extensions.

It includes a predefined set of known data file extensions and a function that checks whether a given filename corresponds to one of those formats. This is useful in data processing and file handling code that needs to distinguish data files from other types of files.
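As a rough illustration of the use case described above, the sketch below splits a mixed list of repository filenames into data files and everything else by looking up each file's extension in a set. `_EXAMPLE_DATA_EXTS` and `partition_files` are hypothetical names, and the extension set is a small illustrative subset, not the module's actual list.

```python
import os
from typing import Iterable, List, Tuple

# Hypothetical, deliberately small extension set; the real module ships its own list.
_EXAMPLE_DATA_EXTS = frozenset({'.csv', '.json', '.jsonl', '.parquet', '.tsv'})


def partition_files(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split filenames into (data_files, other_files) by their extension."""
    data_files: List[str] = []
    other_files: List[str] = []
    for name in filenames:
        # splitext yields the final suffix ('.CSV' for 'train.CSV');
        # lower() makes the set lookup case-insensitive.
        ext = os.path.splitext(name)[1].lower()
        (data_files if ext in _EXAMPLE_DATA_EXTS else other_files).append(name)
    return data_files, other_files


data, other = partition_files(['train.csv', 'README.md', 'eval/part-0.parquet', 'model.py'])
print(data)   # ['train.csv', 'eval/part-0.parquet']
print(other)  # ['README.md', 'model.py']
```

A `frozenset` keeps each membership test constant-time regardless of how many extensions the set holds.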

is_data_file

hfutils.utils.data.is_data_file(filename: str | PathLike) → bool

Determine if a given filename corresponds to a known data file format.

This function checks if the file extension of the provided filename matches any of the known data file extensions defined in the _DATA_EXTS set.
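A minimal sketch of such a check, assuming the extension is the final suffix as returned by `os.path.splitext` and that the set stores lowercase, dot-prefixed extensions. `_EXAMPLE_DATA_EXTS` and `is_data_file_sketch` are hypothetical stand-ins for `_DATA_EXTS` and `is_data_file`; the library's actual normalization steps may differ.

```python
import os
from pathlib import Path
from typing import Union

# Hypothetical stand-in for the module's private _DATA_EXTS set.
_EXAMPLE_DATA_EXTS = frozenset({'.csv', '.json', '.parquet'})


def is_data_file_sketch(filename: Union[str, os.PathLike]) -> bool:
    # os.fspath converts Path objects to str and returns str unchanged.
    path = os.fspath(filename)
    # splitext only inspects the last path component, so dots in
    # directory names (e.g. '/dir.csv/README') are ignored.
    _, ext = os.path.splitext(path)
    # Lowercase so 'DATA.JSON' and 'data.json' are treated the same.
    return ext.lower() in _EXAMPLE_DATA_EXTS


print(is_data_file_sketch('data.csv'))                  # True
print(is_data_file_sketch('script.py'))                 # False
print(is_data_file_sketch(Path('/path/to/DATA.JSON')))  # True
```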

Parameters:

filename (Union[str, os.PathLike]) – The name of the file to check. Can be a string or a path-like object.

Returns:

True if the file extension matches a known data file format, False otherwise.

Return type:

bool

Raises:

TypeError – If the provided filename is not a string or path-like object.
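One common source of such a `TypeError` is `os.fspath`, which accepts `str`, `bytes`, and objects implementing `__fspath__` (such as `pathlib.Path`) and raises `TypeError` for anything else. The sketch below assumes the filename is first passed through `os.fspath`; whether `is_data_file` does exactly this is an assumption, and `to_path_string_sketch` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Union


def to_path_string_sketch(filename: Union[str, os.PathLike]) -> Union[str, bytes]:
    # os.fspath returns str/bytes unchanged, calls __fspath__ on path-like
    # objects, and raises TypeError for any other type (int, None, ...).
    return os.fspath(filename)


print(to_path_string_sketch(Path('data.csv')))  # data.csv
for bad in (42, None):
    try:
        to_path_string_sketch(bad)
    except TypeError as err:
        print(f'{bad!r}: TypeError: {err}')
```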

Usage:
>>> from pathlib import Path
>>> from hfutils.utils.data import is_data_file
>>> is_data_file('data.csv')
True
>>> is_data_file('script.py')
False
>>> is_data_file(Path('/path/to/data.json'))
True

Note

The function is case-insensitive and works with both file names and full paths. It normalizes the filename and extracts only the extension for comparison.