Utility Module¶
openaivec.util ¶
TextChunker dataclass ¶
Utility for splitting text into token‑bounded chunks.
Source code in src/openaivec/util.py
split ¶
Token‑aware sentence segmentation.
The text is first split by the given separators, then greedily packed
into chunks whose token counts do not exceed max_tokens.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
original | str | Original text to split. | required |
max_tokens | int | Maximum number of tokens allowed per chunk. | required |
sep | List[str] | List of separator patterns used to split the text. | required |
Returns:
Type | Description |
---|---|
List[str] | List of text chunks respecting the max_tokens limit. |
Source code in src/openaivec/util.py
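The greedy packing described above can be sketched as follows. This is a simplified stand-in, not the library's implementation: a whitespace word count stands in for a real tokenizer such as tiktoken, and the exact separator handling is an assumption.

```python
import re
from typing import List

def split(original: str, max_tokens: int, sep: List[str]) -> List[str]:
    # Split on any of the separator patterns, dropping empty fragments.
    pieces = [p.strip() for p in re.split("|".join(sep), original) if p.strip()]

    # Whitespace word count stands in for a real tokenizer such as tiktoken.
    def count_tokens(text: str) -> int:
        return len(text.split())

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Current chunk is full; start a new one with this piece.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For example, splitting on sentence boundaries with a four-token budget packs two short sentences into one chunk before starting the next.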
get_exponential_with_cutoff ¶
Sample an exponential random variable with an upper cutoff.
A value is repeatedly drawn from an exponential distribution with rate
1/scale until it is smaller than 3 * scale.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scale | float | Scale parameter of the exponential distribution. | required |
Returns:
Name | Type | Description |
---|---|---|
float | float | Sampled value bounded above by 3 * scale. |
Source code in src/openaivec/util.py
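The rejection-sampling behavior described above could look like the following minimal sketch, which simply redraws until the sample falls below the cutoff:

```python
import random

def get_exponential_with_cutoff(scale: float) -> float:
    # Rejection sampling: redraw until the value falls below 3 * scale.
    while True:
        value = random.expovariate(1.0 / scale)  # exponential with mean `scale`
        if value < 3 * scale:
            return value
```

Since P(X >= 3 * scale) is about 5% for an exponential with mean `scale`, the loop terminates after very few iterations on average.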
backoff ¶
backoff(
exception: type[Exception],
scale: int | None = None,
max_retries: int | None = None,
) -> Callable[..., V]
Decorator implementing exponential back‑off retry logic.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exception | type[Exception] | Exception type that triggers a retry. | required |
scale | int \| None | Initial scale parameter for the exponential jitter. This scale is used as the mean for the first delay's exponential distribution and doubles with each subsequent retry. | None |
max_retries | int \| None | Maximum number of retries. | None |
Returns:
Type | Description |
---|---|
Callable[..., V] | A decorated function that retries on the specified exception with exponential back‑off. |
Raises:
Type | Description |
---|---|
exception | Re‑raised when the maximum number of retries is exceeded. |
Source code in src/openaivec/util.py
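A decorator with this contract could be sketched as below. This is an illustrative sketch, not the library's source: it uses a uniform jitter whose mean doubles per retry, and the fallback delay of 1 second when scale is None is an assumption.

```python
from __future__ import annotations

import functools
import random
import time
from typing import Callable, TypeVar

V = TypeVar("V")

def backoff(exception: type[Exception], scale: int | None = None,
            max_retries: int | None = None) -> Callable:
    def decorator(func: Callable[..., V]) -> Callable[..., V]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> V:
            attempts = 0
            # Assumed default of 1 second when scale is None; the library may differ.
            delay = float(scale) if scale is not None else 1.0
            while True:
                try:
                    return func(*args, **kwargs)
                except exception:
                    attempts += 1
                    if max_retries is not None and attempts > max_retries:
                        raise  # re-raise once retries are exhausted
                    # Jittered sleep; the mean doubles with each retry.
                    time.sleep(random.uniform(0, delay))
                    delay *= 2
        return wrapper
    return decorator
```

Applied to a flaky function, the wrapper transparently retries until the call succeeds or the retry budget runs out.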
backoff_async ¶
backoff_async(
exception: type[Exception],
scale: int | None = None,
max_retries: int | None = None,
) -> Callable[..., Awaitable[V]]
Asynchronous version of the backoff decorator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exception | type[Exception] | Exception type that triggers a retry. | required |
scale | int \| None | Initial scale parameter for the exponential jitter. This scale is used as the mean for the first delay's exponential distribution and doubles with each subsequent retry. | None |
max_retries | int \| None | Maximum number of retries. | None |
Returns:
Type | Description |
---|---|
Callable[..., Awaitable[V]] | A decorated asynchronous function that retries on the specified exception with exponential back‑off. |
Raises:
Type | Description |
---|---|
exception | Re‑raised when the maximum number of retries is exceeded. |
Source code in src/openaivec/util.py
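The asynchronous variant follows the same shape but awaits the wrapped coroutine and sleeps with asyncio.sleep so the event loop is never blocked. As with the synchronous sketch above, the jitter scheme and the 1-second fallback for a None scale are assumptions, not the library's source.

```python
from __future__ import annotations

import asyncio
import functools
import random
from typing import Awaitable, Callable, TypeVar

V = TypeVar("V")

def backoff_async(exception: type[Exception], scale: int | None = None,
                  max_retries: int | None = None) -> Callable:
    def decorator(func: Callable[..., Awaitable[V]]) -> Callable[..., Awaitable[V]]:
        @functools.wraps(func)
        async def wrapper(*args, **kwargs) -> V:
            attempts = 0
            # Assumed default of 1 second when scale is None; the library may differ.
            delay = float(scale) if scale is not None else 1.0
            while True:
                try:
                    return await func(*args, **kwargs)
                except exception:
                    attempts += 1
                    if max_retries is not None and attempts > max_retries:
                        raise  # re-raise once retries are exhausted
                    # Non-blocking jittered sleep; the mean doubles each retry.
                    await asyncio.sleep(random.uniform(0, delay))
                    delay *= 2
        return wrapper
    return decorator
```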
map_async async ¶
map_async(
inputs: List[T],
f: Callable[[List[T]], Awaitable[List[U]]],
batch_size: int = 128,
) -> List[U]
Asynchronously map a function f over a list of inputs in batches.
This function divides the input list into smaller batches and applies the
asynchronous function f to each batch concurrently. It gathers the results
and returns them in the same order as the original inputs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inputs | List[T] | List of inputs to be processed. | required |
f | Callable[[List[T]], Awaitable[List[U]]] | Asynchronous function to apply. It takes a batch of inputs (List[T]) and must return a list of corresponding outputs (List[U]) of the same size. | required |
batch_size | int | Size of each batch for processing. | 128 |
Returns:
Type | Description |
---|---|
List[U] | List of outputs corresponding to the original inputs, in order. |
Source code in src/openaivec/util.py
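The batch-and-gather pattern described above can be sketched with asyncio.gather, which also preserves the order of its awaitables; this is a simplified stand-in for the library's implementation:

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

async def map_async(inputs: List[T],
                    f: Callable[[List[T]], Awaitable[List[U]]],
                    batch_size: int = 128) -> List[U]:
    # Slice the inputs into fixed-size batches.
    batches = [inputs[i:i + batch_size] for i in range(0, len(inputs), batch_size)]
    # Run f on all batches concurrently and wait for every result.
    results = await asyncio.gather(*(f(batch) for batch in batches))
    # Flatten the per-batch results, preserving the original input order.
    return [item for batch in results for item in batch]
```

Because asyncio.gather returns results in the order the awaitables were passed, flattening the batch results restores the original input order.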
map ¶
Map a function f over a list of inputs in batches.
This function divides the input list into smaller batches and applies the
function f to each batch. It gathers the results and returns them in the
same order as the original inputs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inputs | List[T] | List of inputs to be processed. | required |
f | Callable[[List[T]], List[U]] | Function to apply. It takes a batch of inputs (List[T]) and must return a list of corresponding outputs (List[U]) of the same size. | required |
batch_size | int | Size of each batch for processing. | 128 |
Returns:
Type | Description |
---|---|
List[U] | List of outputs corresponding to the original inputs, in order. |
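The synchronous version is a straightforward loop over batch slices. The sketch below is named map_in_batches to avoid shadowing Python's built-in map; the library's own function is simply named map within its module.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def map_in_batches(inputs: List[T], f: Callable[[List[T]], List[U]],
                   batch_size: int = 128) -> List[U]:
    results: List[U] = []
    for i in range(0, len(inputs), batch_size):
        # Each batch is processed in order, so outputs stay aligned with inputs.
        results.extend(f(inputs[i:i + batch_size]))
    return results
```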