Basics of pandas_ext¶
pandas_ext is a module that extends the functionality of the popular pandas library in Python with OpenAI's API.
# Import necessary libraries
from typing import List
import pandas as pd
from pydantic import BaseModel
We define a list of English fruit names and create a pandas DataFrame from it.
# Define a list of entities to translate
fruits: List[str] = ["apple", "banana", "orange", "grape", "kiwi", "mango", "peach", "pear", "pineapple", "strawberry"]
fruits_df = pd.DataFrame({"name": fruits})
fruits_df
 | name |
---|---|
0 | apple |
1 | banana |
2 | orange |
3 | grape |
4 | kiwi |
5 | mango |
6 | peach |
7 | pear |
8 | pineapple |
9 | strawberry |
import openaivec.pandas_ext¶
This example demonstrates how to integrate the openaivec.pandas_ext module with pandas for text translation tasks. Follow the examples below for single-language and multi-language translations.
If the environment variable OPENAI_API_KEY is set, pandas_ext automatically uses an openai.OpenAI client.
If the environment variables AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_VERSION are set, pandas_ext automatically uses an openai.AzureOpenAI client.
To use a specific instance of openai.OpenAI, set the client with pandas_ext.use.
import openai
from openaivec import pandas_ext
# Set OpenAI Client
pandas_ext.use(openai.OpenAI())
# Set models for responses and embeddings
pandas_ext.responses_model("gpt-4.1-nano")
pandas_ext.embeddings_model("text-embedding-3-small")
The model name 'gpt-4.1-nano' is not supported by tiktoken. Instead, using the 'o200k_base' encoding.
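The environment-variable rules described above can be pictured with a small helper. This is only a sketch of the decision logic, not the code inside pandas_ext: `detect_client_kind` is a hypothetical name, and the precedence when both sets of variables are present is an assumption, since this page does not state it.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the text above.
OPENAI_VARS = ("OPENAI_API_KEY",)
AZURE_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
)


def detect_client_kind(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return "openai", "azure", or None depending on which variables are set.

    Assumption: if both sets are present, OpenAI is preferred. The real
    precedence used by pandas_ext is not documented on this page.
    """
    env = os.environ if env is None else env
    if all(env.get(name) for name in OPENAI_VARS):
        return "openai"
    # All three Azure variables must be non-empty to select Azure.
    if all(env.get(name) for name in AZURE_VARS):
        return "azure"
    return None


kind = detect_client_kind({"OPENAI_API_KEY": "sk-placeholder"})
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the helper easy to check without touching the real environment.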
Process the columns with OpenAI¶
Once pandas_ext is imported, a Series can be processed through the pd.Series.ai.responses accessor method.
# Translate name to French and add as a new column
s: pd.Series = fruits_df.name.ai.responses("translate to French")
s
0     pomme
1    banane
2    orange
3    raisin
4      kiwi
5    mangue
6     pêche
7     poire
8    ananas
9    fraise
Name: name, dtype: object
Embeddings are also available through the pd.Series.ai.embeddings method.
e: pd.Series = fruits_df.name.ai.embeddings()
e
0    [0.01764064, -0.016817328, -0.041843545, 0.019...
1    [0.013411593, -0.020545648, -0.033350088, -0.0...
2    [-0.025922043, -0.0055465647, -0.006110964, 0....
3    [-0.038692072, 0.009548252, -0.020608373, -0.0...
4    [-0.0057398607, -0.021460608, -0.026025245, 0....
5    [0.055455774, -0.008839109, -0.019977605, -0.0...
6    [0.030673496, -0.041959558, -0.013912023, 0.03...
7    [0.023664422, -0.022354774, -0.008752595, 0.03...
8    [0.020983547, -0.060567692, -0.002925918, 0.02...
9    [0.020106195, -0.014350146, -0.040745355, -0.0...
Name: name, dtype: object
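A Series of embedding vectors like the one above is typically compared by cosine similarity. The following is a minimal sketch of that step, assuming each element is a list or array of equal length. The three-dimensional vectors are invented for illustration and are not real model output, and `cosine_similarity_matrix` is a hypothetical helper, not part of pandas_ext.

```python
import numpy as np
import pandas as pd

# Toy vectors, invented for illustration only (real embeddings are much longer).
toy = pd.Series(
    [
        np.array([1.0, 0.0, 0.0]),
        np.array([0.8, 0.6, 0.0]),
        np.array([0.0, 0.0, 1.0]),
    ],
    index=["a", "b", "c"],
    name="embedding",
)


def cosine_similarity_matrix(vectors: pd.Series) -> pd.DataFrame:
    """Return pairwise cosine similarities between the vectors in a Series."""
    # Stack the per-row vectors into a 2-D matrix (works for lists and arrays).
    matrix = np.stack(vectors.to_list())
    # Normalise each row to unit length so the dot product equals the cosine.
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return pd.DataFrame(unit @ unit.T, index=vectors.index, columns=vectors.index)


similarity = cosine_similarity_matrix(toy)
```

The result is a square DataFrame labelled by the original index, so `similarity.loc["a", "b"]` reads off the similarity between two rows.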
Count tokens with pd.Series.ai.count_tokens
num_tokens: pd.Series = fruits_df.name.ai.count_tokens()
num_tokens
0    1
1    1
2    1
3    2
4    2
5    2
6    2
7    1
8    2
9    3
Name: num_tokens, dtype: int64
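Conceptually, counting tokens means mapping an encoder over the Series and taking the length of each result. Here is a minimal sketch of that idea, assuming a stand-in whitespace encoder (`toy_encode`, hypothetical) so that the snippet runs offline. It is not how pandas_ext implements the accessor; a real tokenizer's encode function could be passed in its place.

```python
from typing import Callable, List

import pandas as pd


def toy_encode(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one token id per whitespace-separated word."""
    return [len(word) for word in text.split()]


def count_tokens(texts: pd.Series, encode: Callable[[str], List[int]]) -> pd.Series:
    """Apply `encode` to every element and return the number of tokens produced."""
    return texts.map(lambda text: len(encode(text))).rename("num_tokens")


phrases = pd.Series(["apple", "red apple", "a big red apple"], name="name")
counts = count_tokens(phrases, toy_encode)
```

Injecting the encoder as a parameter keeps the counting logic independent of any particular tokenizer.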
Structured Output with pandas_ext¶
Structured output is also available in pd.Series.ai.responses: pass a Pydantic model as response_format.
# Define a structured output model for translations (Example: using Pydantic for structured output)
class Translation(BaseModel):
    en: str  # English
    fr: str  # French
    ja: str  # Japanese
    es: str  # Spanish
    de: str  # German
    it: str  # Italian
    pt: str  # Portuguese
    ru: str  # Russian

translations: pd.Series = fruits_df.name.ai.responses(
    instructions="translate to multiple languages",
    response_format=Translation
)
translations
0    en='Apple' fr='Pomme' ja='リンゴ' es='Manzana' de...
1    en='Banana' fr='Banane' ja='バナナ' es='Banana' d...
2    en='Orange' fr='Orange' ja='オレンジ' es='Naranja'...
3    en='Grape' fr='Raisin' ja='ブドウ' es='Uva' de='T...
4    en='Kiwi' fr='Kiwi' ja='キウイ' es='Kiwi' de='Kiw...
5    en='Mango' fr='Mangue' ja='マンゴー' es='Mango' de...
6    en='Peach' fr='Pêche' ja='モモ' es='Durazno' de=...
7    en='Pear' fr='Poire' ja='梨' es='Pera' de='Birn...
8    en='Pineapple' fr='Ananas' ja='パイナップル' es='Piñ...
9    en='Strawberry' fr='Fraise' ja='イチゴ' es='Fresa...
Name: name, dtype: object
Each value in this pd.Series is an instance of pydantic.BaseModel.
The pd.Series.ai.extract method expands these elements into a pd.DataFrame, with one column per model field.
translations.ai.extract()
 | name_en | name_fr | name_ja | name_es | name_de | name_it | name_pt | name_ru |
---|---|---|---|---|---|---|---|---|
0 | Apple | Pomme | リンゴ | Manzana | Apfel | Mela | Maçã | Яблоко |
1 | Banana | Banane | バナナ | Banana | Banane | Banana | Banana | Банан |
2 | Orange | Orange | オレンジ | Naranja | Orange | Arancia | Laranja | Апельсин |
3 | Grape | Raisin | ブドウ | Uva | Traube | Uva | Uva | Виноград |
4 | Kiwi | Kiwi | キウイ | Kiwi | Kiwi | Kiwi | Kiwi | Киви |
5 | Mango | Mangue | マンゴー | Mango | Mango | Mango | Manga | Манго |
6 | Peach | Pêche | モモ | Durazno | Pfirsich | Pesca | Pêssego | Персик |
7 | Pear | Poire | 梨 | Pera | Birne | Pera | Pêra | Груша |
8 | Pineapple | Ananas | パイナップル | Piña | Ananas | Ananas | Abacaxi | Ананас |
9 | Strawberry | Fraise | イチゴ | Fresa | Erdbeere | Fragola | Morango | Клубника |
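The expansion shown in the table above can be approximated with plain pandas and Pydantic. This is a sketch under stated assumptions rather than the library's implementation: `Pair` and `expand_models` are hypothetical names, the sample values are copied from the table, and `model_dump` assumes Pydantic v2.

```python
import pandas as pd
from pydantic import BaseModel


class Pair(BaseModel):
    """Hypothetical two-field model, a cut-down stand-in for Translation."""

    en: str
    fr: str


models = pd.Series(
    [Pair(en="Apple", fr="Pomme"), Pair(en="Pear", fr="Poire")],
    name="name",
)


def expand_models(series: pd.Series) -> pd.DataFrame:
    """Turn a Series of Pydantic models into one DataFrame column per field."""
    # model_dump() (Pydantic v2) converts each model into a plain dict.
    rows = [item.model_dump() for item in series]
    # Prefix columns with the Series name, mirroring the name_en, name_fr, ... output above.
    return pd.DataFrame(rows, index=series.index).add_prefix(f"{series.name}_")


expanded = expand_models(models)
```

Keeping the original index on the expanded frame means it can be joined back to the source DataFrame without realignment.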
Example of Data Enrichment of fruit table¶
These interfaces integrate seamlessly with pd.DataFrame APIs.
Let's enrich your data with the power of LLMs!
fruits_df.pipe(
    # Assign new columns
    lambda df: df.assign(
        # Assign the color column using an OpenAI model
        color=lambda df: df.name.ai.responses("Return the color of given fruit"),
        # Assign the embedding column using an OpenAI model
        embedding=lambda df: df.name.ai.embeddings(),
        # Assign the multilingual translation column using an OpenAI model
        translation=lambda df: df.name.ai.responses(
            instructions="translate to multiple languages",
            response_format=Translation  # Use the structured output model with pydantic.BaseModel
        )
    )
    # Extract the translation column from the structured output
    .ai.extract(column="translation")
)
 | name | color | embedding | translation_en | translation_fr | translation_ja | translation_es | translation_de | translation_it | translation_pt | translation_ru |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | apple | Red | [0.01764064, -0.016817328, -0.041843545, 0.019... | Apple | Pomme | リンゴ | Manzana | Apfel | Mela | Maçã | Яблоко |
1 | banana | Yellow | [0.013411593, -0.020545648, -0.033350088, -0.0... | Banana | Banane | バナナ | Banana | Banane | Banana | Banana | Банан |
2 | orange | Orange | [-0.025922043, -0.0055465647, -0.006110964, 0.... | Orange | Orange | オレンジ | Naranja | Orange | Arancia | Laranja | Апельсин |
3 | grape | Purple | [-0.038692072, 0.009548252, -0.020608373, -0.0... | Grape | Raisin | ブドウ | Uva | Traube | Uva | Uva | Виноград |
4 | kiwi | Brown/Green (inside) and Brown (outside) | [-0.0057398607, -0.021460608, -0.026025245, 0.... | Kiwi | Kiwi | キウイ | Kiwi | Kiwi | Kiwi | Kiwi | Киви |
5 | mango | Yellow/Orange | [0.055455774, -0.008839109, -0.019977605, -0.0... | Mango | Mangue | マンゴー | Mango | Mango | Mango | Manga | Манго |
6 | peach | Yellow/Orange | [0.030673496, -0.041959558, -0.013912023, 0.03... | Peach | Pêche | モモ | Durazno | Pfirsich | Pesca | Pêssego | Персик |
7 | pear | Green/Yellow | [0.023664422, -0.022354774, -0.008752595, 0.03... | Pear | Poire | 梨 | Pera | Birne | Pera | Pêra | Груша |
8 | pineapple | Brown/Green (outside) and Yellow (inside) | [0.020983547, -0.060567692, -0.002925918, 0.02... | Pineapple | Ananas | パイナップル | Piña | Ananas | Ananas | Abacaxi | Ананас |
9 | strawberry | Red | [0.020106195, -0.014350146, -0.040745355, -0.0... | Strawberry | Fraise | イチゴ | Fresa | Erdbeere | Fragola | Morango | Клубника |
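The `pipe` and `assign` pattern used above works with any function that maps a Series to a Series, which makes it possible to try the pipeline shape offline before spending API calls. A minimal sketch follows, assuming a hypothetical lookup table (`COLORS`) and helper (`fake_color`) in place of the model call.

```python
import pandas as pd

# Hypothetical lookup standing in for a model call, so the snippet runs offline.
COLORS = {"apple": "Red", "banana": "Yellow"}


def fake_color(names: pd.Series) -> pd.Series:
    """Stand-in for an LLM-backed column: map each name through a fixed table."""
    return names.map(COLORS)


df = pd.DataFrame({"name": ["apple", "banana"]})

enriched = df.pipe(
    # Each lambda passed to assign receives the DataFrame and returns a new column.
    lambda d: d.assign(
        color=lambda x: fake_color(x["name"]),
        name_length=lambda x: x["name"].str.len(),
    )
)
```

Because `assign` returns a new DataFrame, `df` itself is left unchanged and the enrichment steps can be rearranged or swapped freely.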