Task Module

openaivec.task

Pre-configured task library for OpenAI API structured outputs.

This module provides a comprehensive collection of pre-configured tasks designed for various business and academic use cases. Tasks are organized into domain-specific submodules, each containing ready-to-use PreparedTask instances that work seamlessly with openaivec's batch processing capabilities.

Available Task Domains

Natural Language Processing (nlp)

Core NLP tasks for text analysis and processing:

  • Translation: Multi-language translation with 40+ language support
  • Sentiment Analysis: Emotion detection and sentiment scoring
  • Named Entity Recognition: Extract people, organizations, locations
  • Morphological Analysis: Part-of-speech tagging and lemmatization
  • Dependency Parsing: Syntactic structure analysis
  • Keyword Extraction: Important term identification

Customer Support (customer_support)

Specialized tasks for customer service operations:

  • Intent Analysis: Understand customer goals and requirements
  • Sentiment Analysis: Customer satisfaction and emotional state
  • Urgency Analysis: Priority assessment and response time recommendations
  • Inquiry Classification: Automatic categorization and routing
  • Inquiry Summary: Comprehensive issue summarization
  • Response Suggestion: AI-powered response drafting

Usage Patterns

Quick Start with Default Tasks
from openai import OpenAI
from openaivec.responses import BatchResponses
from openaivec.task import nlp, customer_support

client = OpenAI()

# Use pre-configured tasks
sentiment_analyzer = BatchResponses.of_task(
    client=client,
    model_name="gpt-4o-mini",
    task=nlp.SENTIMENT_ANALYSIS
)

intent_analyzer = BatchResponses.of_task(
    client=client, 
    model_name="gpt-4o-mini",
    task=customer_support.INTENT_ANALYSIS
)
Customized Task Configuration
from openaivec.task.customer_support import urgency_analysis

# Create customized urgency analysis
custom_urgency = urgency_analysis(
    business_context="SaaS platform support",
    urgency_levels={
        "critical": "Service outages, security breaches",
        "high": "Login issues, payment failures", 
        "medium": "Feature bugs, billing questions",
        "low": "Feature requests, general feedback"
    }
)

analyzer = BatchResponses.of_task(
    client=client,
    model_name="gpt-4o-mini", 
    task=custom_urgency
)
Pandas Integration
import pandas as pd
from openaivec import pandas_ext
from openaivec.task import nlp, customer_support

df = pd.DataFrame({"text": ["I love this!", "This is terrible."]})

# Apply tasks directly to DataFrame columns
df["sentiment"] = df["text"].ai.task(nlp.SENTIMENT_ANALYSIS)
df["intent"] = df["text"].ai.task(customer_support.INTENT_ANALYSIS)

# Extract structured results
results_df = df.ai.extract("sentiment")
Spark Integration
from openaivec.spark import ResponsesUDFBuilder

# Register UDF for large-scale processing
spark.udf.register(
    "analyze_sentiment",
    ResponsesUDFBuilder.of_openai(
        api_key=api_key,
        model_name="gpt-4o-mini"
    ).build_from_task(task=nlp.SENTIMENT_ANALYSIS)
)

# Use in Spark SQL
df = spark.sql("""
    SELECT text, analyze_sentiment(text) as sentiment 
    FROM customer_feedback
""")

Task Architecture

PreparedTask Structure

All tasks are built using the PreparedTask dataclass:

@dataclass(frozen=True)
class PreparedTask:
    instructions: str           # Detailed prompt for the LLM
    response_format: Type[T]    # Pydantic model for structured output
    temperature: float = 0.0    # Sampling temperature
    top_p: float = 1.0         # Nucleus sampling parameter
Response Format Standards
  • Literal Types: Categorical fields use typing.Literal for type safety
  • Multilingual: Non-categorical fields respond in input language
  • Validation: Pydantic models ensure data integrity
  • Spark Compatible: All types map correctly to Spark schemas
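The Literal-type convention above can be illustrated with a minimal standard-library sketch (the real tasks enforce this through Pydantic models; `Sentiment` and `validate_sentiment` here are illustrative names, not part of the library):

```python
from typing import Literal, get_args

# Hypothetical categorical field, mirroring how tasks constrain
# labels with typing.Literal for type safety.
Sentiment = Literal["positive", "negative", "neutral"]

def validate_sentiment(value: str) -> str:
    """Reject any label outside the Literal's allowed values."""
    allowed = get_args(Sentiment)
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value

label = validate_sentiment("positive")  # accepted; "happy" would raise
```

Because the allowed values are fixed in the type itself, both validators and Spark schema mapping can enumerate them without extra configuration.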
Design Principles
  1. Consistency: Uniform API across all task domains
  2. Configurability: Customizable parameters for different use cases
  3. Type Safety: Strong typing with Pydantic validation
  4. Scalability: Optimized for batch processing and large datasets
  5. Extensibility: Easy to add new domains and tasks

Adding New Task Domains

To add a new domain (e.g., finance, healthcare, legal):

  1. Create Domain Module: src/openaivec/task/new_domain/
  2. Implement Tasks: Following existing patterns with Pydantic models
  3. Add Multilingual Support: Include language-aware instructions
  4. Export Functions: Both configurable functions and constants
  5. Update Documentation: Add to this module docstring
Example New Domain Structure
src/openaivec/task/finance/
├── __init__.py              # Export all functions and constants
├── risk_assessment.py       # Credit risk, market risk analysis
├── document_analysis.py     # Financial document processing
└── compliance_check.py      # Regulatory compliance verification
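A new domain module might look like the sketch below. It mirrors the configurable-function-plus-constant pattern used by `customer_support.urgency_analysis`; all finance-specific names (`risk_assessment`, `RiskResponse`, `RISK_ASSESSMENT`) are hypothetical, and a minimal stand-in for `PreparedTask` is defined locally so the sketch runs on its own:

```python
# Hypothetical sketch of src/openaivec/task/finance/risk_assessment.py.
from dataclasses import dataclass
from typing import Type

@dataclass(frozen=True)
class PreparedTask:  # stand-in for openaivec.task.model.PreparedTask
    instructions: str
    response_format: Type
    temperature: float = 0.0
    top_p: float = 1.0

class RiskResponse:  # stand-in for a Pydantic BaseModel
    risk_level: str
    rationale: str

def risk_assessment(business_context: str = "general lending") -> PreparedTask:
    """Configurable factory, following the existing domain patterns."""
    return PreparedTask(
        instructions=(
            f"Assess credit risk for the following text ({business_context}). "
            "Respond in the input language; keep risk_level in English."
        ),
        response_format=RiskResponse,
    )

# Default constant exported alongside the factory, per step 4 above.
RISK_ASSESSMENT = risk_assessment()
```

Exporting both the factory and a default constant lets users pick between quick prototyping (`RISK_ASSESSMENT`) and production customization (`risk_assessment(...)`).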

Performance Considerations

  • Batch Processing: Use BatchResponses for multiple inputs
  • Deduplication: Automatic duplicate removal reduces API costs
  • Caching: Results are cached based on input content
  • Async Support: AsyncBatchResponses for concurrent processing
  • Token Optimization: Vectorized system messages for efficiency
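The deduplication point can be illustrated with a plain-Python sketch (this is the general idea, not the openaivec implementation): identical inputs are resolved once, so repeated rows cost a single model call.

```python
from typing import Callable, Dict, List

def dedup_map(inputs: List[str], fn: Callable[[str], str]) -> List[str]:
    """Apply fn to each input, calling it only once per unique value."""
    cache: Dict[str, str] = {}
    for text in inputs:
        if text not in cache:  # call fn only for unseen inputs
            cache[text] = fn(text)
    return [cache[text] for text in inputs]

calls = 0

def fake_model(text: str) -> str:
    """Stand-in for an API call; counts how often it is invoked."""
    global calls
    calls += 1
    return text.upper()

results = dedup_map(["hi", "hi", "bye", "hi"], fake_model)
# Four rows in, but only two underlying calls.
```

On real workloads with many repeated values, this is where most of the API-cost savings come from.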

Best Practices

  1. Choose Appropriate Models:
     • gpt-4o-mini: Fast, cost-effective for most tasks
     • gpt-4o: Higher accuracy for complex analysis

  2. Customize When Needed:
     • Use default tasks for quick prototyping
     • Configure custom tasks for production use

  3. Handle Multilingual Input:
     • Tasks automatically detect and respond in input language
     • Categorical fields remain in English for system compatibility

  4. Monitor Performance:
     • Use batch sizes appropriate for your use case
     • Monitor token usage for cost optimization
See individual task modules for detailed documentation and examples.

Classes

PreparedTask dataclass

A data class representing a complete task configuration for OpenAI API calls.

This class encapsulates all the necessary parameters for executing a task, including the instructions to be sent to the model, the expected response format using Pydantic models, and sampling parameters for controlling the model's output behavior.

Attributes:

Name Type Description
instructions str

The prompt or instructions to send to the OpenAI model. This should contain clear, specific directions for the task.

response_format Type[T]

A Pydantic model class that defines the expected structure of the response. Must inherit from BaseModel.

temperature float

Controls randomness in the model's output. Range: 0.0 to 1.0. Lower values make output more deterministic. Defaults to 0.0.

top_p float

Controls diversity via nucleus sampling. Only tokens comprising the top_p probability mass are considered. Range: 0.0 to 1.0. Defaults to 1.0.

Example

Creating a custom task:

from pydantic import BaseModel

class TranslationResponse(BaseModel):
    translated_text: str
    source_language: str
    target_language: str

custom_task = PreparedTask(
    instructions="Translate the following text to French:",
    response_format=TranslationResponse,
    temperature=0.1,
    top_p=0.9
)
Note

This class is frozen (immutable) to ensure task configurations cannot be accidentally modified after creation.
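The frozen behavior can be seen with a minimal stand-in (`FrozenTask` below is illustrative, not the real PreparedTask):

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class FrozenTask:  # stand-in mirroring PreparedTask's frozen=True
    instructions: str
    temperature: float = 0.0

task = FrozenTask(instructions="Translate to French:")
try:
    task.temperature = 0.5  # any mutation raises FrozenInstanceError
except FrozenInstanceError:
    pass  # configuration stays exactly as constructed
```

To vary a parameter, construct a new instance (or use `dataclasses.replace`) rather than mutating an existing one.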

Source code in src/openaivec/task/model.py
@dataclass(frozen=True)
class PreparedTask:
    """A data class representing a complete task configuration for OpenAI API calls.

    This class encapsulates all the necessary parameters for executing a task,
    including the instructions to be sent to the model, the expected response
    format using Pydantic models, and sampling parameters for controlling
    the model's output behavior.

    Attributes:
        instructions (str): The prompt or instructions to send to the OpenAI model.
            This should contain clear, specific directions for the task.
        response_format (Type[T]): A Pydantic model class that defines the expected
            structure of the response. Must inherit from BaseModel.
        temperature (float): Controls randomness in the model's output.
            Range: 0.0 to 1.0. Lower values make output more deterministic.
            Defaults to 0.0.
        top_p (float): Controls diversity via nucleus sampling. Only tokens
            comprising the top_p probability mass are considered.
            Range: 0.0 to 1.0. Defaults to 1.0.

    Example:
        Creating a custom task:

        ```python
        from pydantic import BaseModel

        class TranslationResponse(BaseModel):
            translated_text: str
            source_language: str
            target_language: str

        custom_task = PreparedTask(
            instructions="Translate the following text to French:",
            response_format=TranslationResponse,
            temperature=0.1,
            top_p=0.9
        )
        ```

    Note:
        This class is frozen (immutable) to ensure task configurations
        cannot be accidentally modified after creation.
    """
    instructions: str
    response_format: Type[T]
    temperature: float = 0.0
    top_p: float = 1.0