hfutils.utils.session

This module provides functionality for creating and managing HTTP sessions with customizable retry logic, timeout settings, and randomly generated user-agent headers. It is designed to make web scraping and API consumption more robust by handling common HTTP errors and timeouts gracefully.

Main Features:

  • Automatic retries on specified HTTP response status codes.

  • Configurable request timeout.

  • A randomly chosen user-agent for each session, to mimic different browsers and operating systems.

  • Optional SSL verification.

TimeoutHTTPAdapter

class hfutils.utils.session.TimeoutHTTPAdapter(*args, **kwargs)

A custom HTTPAdapter that enforces a default timeout on all requests.

Parameters:
  • args – Variable length argument list for HTTPAdapter.

  • kwargs – Arbitrary keyword arguments. ‘timeout’ can be specified to set the default timeout applied to requests.

__init__(*args, **kwargs)

send(request, **kwargs)

Sends the Request object, applying the timeout setting.

Parameters:
  • request (requests.PreparedRequest) – The Request object to send.

  • kwargs – Keyword arguments that may contain ‘timeout’.

Returns:

The response to the request.
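
The behaviour described above can be illustrated with a minimal sketch. This is not the library's actual implementation: the class name DefaultTimeoutAdapter and the 15-second fallback are assumptions made for illustration, and the sketch only shows the general pattern of an HTTPAdapter subclass that fills in a timeout when the caller does not supply one.

```python
from requests.adapters import HTTPAdapter

# Assumed fallback value (seconds); chosen to match the documented
# default of get_requests_session, not taken from the library source.
DEFAULT_TIMEOUT = 15


class DefaultTimeoutAdapter(HTTPAdapter):
    """Hypothetical adapter that applies a default timeout to every request."""

    def __init__(self, *args, **kwargs):
        # Remove 'timeout' before delegating, because HTTPAdapter.__init__
        # does not accept it.
        self.timeout = kwargs.pop("timeout", DEFAULT_TIMEOUT)
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # requests.Session passes timeout=None when the caller gave none,
        # so a missing key and an explicit None are treated the same way.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)
```

An explicit per-request timeout (for example session.get(url, timeout=3)) still takes precedence in this sketch; the adapter's value is only a fallback.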

get_requests_session

hfutils.utils.session.get_requests_session(max_retries: int = 5, timeout: int = 15, verify: bool = True, headers: Dict[str, str] | None = None, session: Session | None = None) → Session

Creates a requests session with retry logic, timeout settings, and random user-agent headers.

Parameters:
  • max_retries (int) – Maximum number of retries on failed requests.

  • timeout (int) – Request timeout in seconds.

  • verify (bool) – Whether to verify SSL certificates.

  • headers (Optional[Dict[str, str]]) – Additional headers to include in the requests.

  • session (Optional[requests.Session]) – An existing requests.Session instance to use.

Returns:

A configured requests.Session object.

Return type:

requests.Session
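
As a rough illustration of how a session with retry logic can be assembled, the sketch below mounts a retrying adapter on a requests.Session. It is a simplified stand-in, not the library's code: the function name build_session, the list of retried status codes, and the backoff factor are all assumptions, and the sketch covers only the retry, verify, and headers parameters.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(max_retries=5, verify=True, headers=None, session=None):
    """Hypothetical helper returning a requests.Session with retry logic."""
    session = session or requests.Session()

    retry = Retry(
        total=max_retries,
        # Assumed set of status codes worth retrying; the real list
        # used by the library is not stated in this documentation.
        status_forcelist=[429, 500, 502, 503, 504],
        # Assumed backoff factor for the delay between attempts.
        backoff_factor=1,
    )
    adapter = HTTPAdapter(max_retries=retry)

    # Route both plain and TLS traffic through the retrying adapter.
    session.mount("http://", adapter)
    session.mount("https://", adapter)

    session.verify = verify
    session.headers.update(headers or {})
    return session
```

With the library itself, the corresponding call would follow the documented signature, for example get_requests_session(max_retries=3, timeout=10, headers={"Accept": "application/json"}).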

get_random_ua

hfutils.utils.session.get_random_ua()

Retrieves a random user agent string from the cached UserAgent rotator.

Returns:

A random user agent string.

Return type:

str