Task Module

openaivec.task

Pre-configured task library for OpenAI API structured outputs.

This module provides a comprehensive collection of pre-configured tasks designed for various business and academic use cases. Tasks are organized into domain-specific submodules, each containing ready-to-use PreparedTask instances that work seamlessly with openaivec's batch processing capabilities.

Available Task Domains

Natural Language Processing (nlp)

Core NLP tasks for text analysis and processing:

  • Translation: Multi-language translation with 40+ language support
  • Sentiment Analysis: Emotion detection and sentiment scoring
  • Named Entity Recognition: Extract people, organizations, locations
  • Morphological Analysis: Part-of-speech tagging and lemmatization
  • Dependency Parsing: Syntactic structure analysis
  • Keyword Extraction: Important term identification

Customer Support (customer_support)

Specialized tasks for customer service operations:

  • Intent Analysis: Understand customer goals and requirements
  • Sentiment Analysis: Customer satisfaction and emotional state
  • Urgency Analysis: Priority assessment and response time recommendations
  • Inquiry Classification: Automatic categorization and routing
  • Inquiry Summary: Comprehensive issue summarization
  • Response Suggestion: AI-powered response drafting

Usage Patterns

Quick Start with Default Tasks
from openai import OpenAI
from openaivec.responses import BatchResponses
from openaivec.task import nlp, customer_support

client = OpenAI()

# Use pre-configured tasks
sentiment_analyzer = BatchResponses.of_task(
    client=client,
    model_name="gpt-4o-mini",
    task=nlp.SENTIMENT_ANALYSIS
)

intent_analyzer = BatchResponses.of_task(
    client=client, 
    model_name="gpt-4o-mini",
    task=customer_support.INTENT_ANALYSIS
)
Customized Task Configuration
from openaivec.task.customer_support import urgency_analysis

# Create customized urgency analysis
custom_urgency = urgency_analysis(
    business_context="SaaS platform support",
    urgency_levels={
        "critical": "Service outages, security breaches",
        "high": "Login issues, payment failures", 
        "medium": "Feature bugs, billing questions",
        "low": "Feature requests, general feedback"
    }
)

analyzer = BatchResponses.of_task(
    client=client,
    model_name="gpt-4o-mini", 
    task=custom_urgency
)
Pandas Integration
import pandas as pd
from openaivec import pandas_ext
from openaivec.task import nlp, customer_support

df = pd.DataFrame({"text": ["I love this!", "This is terrible."]})

# Apply tasks directly to DataFrame columns
df["sentiment"] = df["text"].ai.task(nlp.SENTIMENT_ANALYSIS)
df["intent"] = df["text"].ai.task(customer_support.INTENT_ANALYSIS)

# Extract structured results
results_df = df.ai.extract("sentiment")
Spark Integration
from openaivec.spark import ResponsesUDFBuilder

# Register UDF for large-scale processing
spark.udf.register(
    "analyze_sentiment",
    ResponsesUDFBuilder.of_openai(
        api_key=api_key,
        model_name="gpt-4o-mini"
    ).build_from_task(task=nlp.SENTIMENT_ANALYSIS)
)

# Use in Spark SQL
df = spark.sql("""
    SELECT text, analyze_sentiment(text) as sentiment 
    FROM customer_feedback
""")

Task Architecture

PreparedTask Structure

All tasks are built using the PreparedTask dataclass:

@dataclass(frozen=True)
class PreparedTask:
    instructions: str           # Detailed prompt for the LLM
    response_format: Type[T]    # Pydantic model for structured output
    temperature: float = 0.0    # Sampling temperature
    top_p: float = 1.0         # Nucleus sampling parameter
Response Format Standards
  • Literal Types: Categorical fields use typing.Literal for type safety
  • Multilingual: Non-categorical fields respond in input language
  • Validation: Pydantic models ensure data integrity
  • Spark Compatible: All types map correctly to Spark schemas
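The Literal-type convention above can be illustrated with a minimal standard-library sketch (the real tasks enforce this through Pydantic models; `Sentiment` and `validate_sentiment` here are illustrative names, not part of the library):

```python
from typing import Literal, get_args

# Hypothetical categorical field, mirroring how tasks constrain
# labels with typing.Literal for type safety.
Sentiment = Literal["positive", "negative", "neutral"]

def validate_sentiment(value: str) -> str:
    """Reject any label outside the Literal's allowed values."""
    allowed = get_args(Sentiment)
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value

label = validate_sentiment("positive")  # accepted; "happy" would raise
```

Because the allowed values are fixed in the type itself, both validators and Spark schema mapping can enumerate them without extra configuration.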
Design Principles
  1. Consistency: Uniform API across all task domains
  2. Configurability: Customizable parameters for different use cases
  3. Type Safety: Strong typing with Pydantic validation
  4. Scalability: Optimized for batch processing and large datasets
  5. Extensibility: Easy to add new domains and tasks

Adding New Task Domains

To add a new domain (e.g., finance, healthcare, legal):

  1. Create Domain Module: src/openaivec/task/new_domain/
  2. Implement Tasks: Following existing patterns with Pydantic models
  3. Add Multilingual Support: Include language-aware instructions
  4. Export Functions: Both configurable functions and constants
  5. Update Documentation: Add to this module docstring
Example New Domain Structure
src/openaivec/task/finance/
├── __init__.py              # Export all functions and constants
├── risk_assessment.py       # Credit risk, market risk analysis
├── document_analysis.py     # Financial document processing
└── compliance_check.py      # Regulatory compliance verification
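A new domain module might look like the sketch below. It mirrors the configurable-function-plus-constant pattern used by `customer_support.urgency_analysis`; all finance-specific names (`risk_assessment`, `RiskResponse`, `RISK_ASSESSMENT`) are hypothetical, and a minimal stand-in for `PreparedTask` is defined locally so the sketch runs on its own:

```python
# Hypothetical sketch of src/openaivec/task/finance/risk_assessment.py.
from dataclasses import dataclass
from typing import Type

@dataclass(frozen=True)
class PreparedTask:  # stand-in for openaivec.task.model.PreparedTask
    instructions: str
    response_format: Type
    temperature: float = 0.0
    top_p: float = 1.0

class RiskResponse:  # stand-in for a Pydantic BaseModel
    risk_level: str
    rationale: str

def risk_assessment(business_context: str = "general lending") -> PreparedTask:
    """Configurable factory, following the existing domain patterns."""
    return PreparedTask(
        instructions=(
            f"Assess credit risk for the following text ({business_context}). "
            "Respond in the input language; keep risk_level in English."
        ),
        response_format=RiskResponse,
    )

# Default constant exported alongside the factory, per step 4 above.
RISK_ASSESSMENT = risk_assessment()
```

Exporting both the factory and a default constant lets users pick between quick prototyping (`RISK_ASSESSMENT`) and production customization (`risk_assessment(...)`).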

Performance Considerations

  • Batch Processing: Use BatchResponses for multiple inputs
  • Deduplication: Automatic duplicate removal reduces API costs
  • Caching: Results are cached based on input content
  • Async Support: AsyncBatchResponses for concurrent processing
  • Token Optimization: Vectorized system messages for efficiency
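The deduplication point can be illustrated with a plain-Python sketch (this is the general idea, not the openaivec implementation): identical inputs are resolved once, so repeated rows cost a single model call.

```python
from typing import Callable, Dict, List

def dedup_map(inputs: List[str], fn: Callable[[str], str]) -> List[str]:
    """Apply fn to each input, calling it only once per unique value."""
    cache: Dict[str, str] = {}
    for text in inputs:
        if text not in cache:  # call fn only for unseen inputs
            cache[text] = fn(text)
    return [cache[text] for text in inputs]

calls = 0

def fake_model(text: str) -> str:
    """Stand-in for an API call; counts how often it is invoked."""
    global calls
    calls += 1
    return text.upper()

results = dedup_map(["hi", "hi", "bye", "hi"], fake_model)
# Four rows in, but only two underlying calls.
```

On real workloads with many repeated values, this is where most of the API-cost savings come from.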

Best Practices

  1. Choose Appropriate Models:
     • gpt-4o-mini: Fast, cost-effective for most tasks
     • gpt-4o: Higher accuracy for complex analysis

  2. Customize When Needed:
     • Use default tasks for quick prototyping
     • Configure custom tasks for production use

  3. Handle Multilingual Input:
     • Tasks automatically detect and respond in input language
     • Categorical fields remain in English for system compatibility

  4. Monitor Performance:
     • Use batch sizes appropriate for your use case
     • Monitor token usage for cost optimization
See individual task modules for detailed documentation and examples.

Classes

PreparedTask dataclass

A data class representing a complete task configuration for OpenAI API calls.

This class encapsulates all the necessary parameters for executing a task, including the instructions to be sent to the model, the expected response format using Pydantic models, and sampling parameters for controlling the model's output behavior.

Attributes:

Name Type Description
instructions str

The prompt or instructions to send to the OpenAI model. This should contain clear, specific directions for the task.

response_format Type[T]

A Pydantic model class that defines the expected structure of the response. Must inherit from BaseModel.

temperature float

Controls randomness in the model's output. Range: 0.0 to 1.0. Lower values make output more deterministic. Defaults to 0.0.

top_p float

Controls diversity via nucleus sampling. Only tokens comprising the top_p probability mass are considered. Range: 0.0 to 1.0. Defaults to 1.0.

Example

Creating a custom task:

from pydantic import BaseModel

class TranslationResponse(BaseModel):
    translated_text: str
    source_language: str
    target_language: str

custom_task = PreparedTask(
    instructions="Translate the following text to French:",
    response_format=TranslationResponse,
    temperature=0.1,
    top_p=0.9
)
Note

This class is frozen (immutable) to ensure task configurations cannot be accidentally modified after creation.
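The frozen behavior can be seen with a minimal stand-in (`FrozenTask` below is illustrative, not the real PreparedTask):

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class FrozenTask:  # stand-in mirroring PreparedTask's frozen=True
    instructions: str
    temperature: float = 0.0

task = FrozenTask(instructions="Translate to French:")
try:
    task.temperature = 0.5  # any mutation raises FrozenInstanceError
except FrozenInstanceError:
    pass  # configuration stays exactly as constructed
```

To vary a parameter, construct a new instance (or use `dataclasses.replace`) rather than mutating an existing one.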

Source code in src/openaivec/task/model.py
@dataclass(frozen=True)
class PreparedTask:
    """A data class representing a complete task configuration for OpenAI API calls.

    This class encapsulates all the necessary parameters for executing a task,
    including the instructions to be sent to the model, the expected response
    format using Pydantic models, and sampling parameters for controlling
    the model's output behavior.

    Attributes:
        instructions (str): The prompt or instructions to send to the OpenAI model.
            This should contain clear, specific directions for the task.
        response_format (Type[T]): A Pydantic model class that defines the expected
            structure of the response. Must inherit from BaseModel.
        temperature (float): Controls randomness in the model's output.
            Range: 0.0 to 1.0. Lower values make output more deterministic.
            Defaults to 0.0.
        top_p (float): Controls diversity via nucleus sampling. Only tokens
            comprising the top_p probability mass are considered.
            Range: 0.0 to 1.0. Defaults to 1.0.

    Example:
        Creating a custom task:

        ```python
        from pydantic import BaseModel

        class TranslationResponse(BaseModel):
            translated_text: str
            source_language: str
            target_language: str

        custom_task = PreparedTask(
            instructions="Translate the following text to French:",
            response_format=TranslationResponse,
            temperature=0.1,
            top_p=0.9
        )
        ```

    Note:
        This class is frozen (immutable) to ensure task configurations
        cannot be accidentally modified after creation.
    """
    instructions: str
    response_format: Type[T]
    temperature: float = 0.0
    top_p: float = 1.0