Generate FAQ for README

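Let's get started with openaivec. This notebook reads the project README, asks the model to split it into sections with questions and answers, flattens the result into an FAQ table, and then translates that table into Japanese.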

In [1]:

import json
from openai import OpenAI
from openaivec import pandas_ext
from pydantic import BaseModel, Field
import pandas as pd

pandas_ext.use(OpenAI())

pandas_ext.responses_model("gpt-4.1")

The model name 'gpt-4.1' is not supported by tiktoken. Instead, using the 'o200k_base' encoding.

In [2]:

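# Load the repository README into a one-row DataFrame; the file is closed after reading.
with open("../../README.md", encoding="utf-8") as f:
    docs_df: pd.DataFrame = pd.DataFrame({"title": ["readme"], "body": [f.read()]})
docs_df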

Out[2]:

	title	body
0	readme	# What is this?\n\nopenaivec is a Python l...

In [3]:

class Question(BaseModel):
    question: str = Field(description="The question to ask the model.")
    answer: str = Field(description="The answer to the question.")

class Section(BaseModel):
    title: str = Field(description="The title of the section.")
    content: str = Field(description="The content of the section.")
    questions: list[Question] = Field(description="List of questions and answers related to the section.")

class Document(BaseModel):
    sections: list[Section] = Field(description="List of sections in the document.")
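
The nested models above describe the shape the model's reply must take. As a minimal sketch of what that schema enforces, the snippet below validates a hand-written payload against the same three classes. It assumes Pydantic v2 (for `model_validate_json`), and the payload is a hypothetical example rather than real model output.

```python
from pydantic import BaseModel, Field, ValidationError


class Question(BaseModel):
    question: str = Field(description="The question to ask the model.")
    answer: str = Field(description="The answer to the question.")


class Section(BaseModel):
    title: str
    content: str
    questions: list[Question]


class Document(BaseModel):
    sections: list[Section]


# Hypothetical payload, shaped like the JSON a structured response would contain.
payload = """
{"sections": [{"title": "Installation",
               "content": "Install with pip.",
               "questions": [{"question": "How do I install openaivec?",
                              "answer": "Run pip install openaivec."}]}]}
"""

doc = Document.model_validate_json(payload)
print(doc.sections[0].questions[0].answer)

# A payload that omits a required field is rejected instead of silently accepted.
try:
    Document.model_validate_json('{"sections": [{"title": "No content"}]}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Because every field is required, a reply that drops `content` or `questions` fails validation early, before it reaches the DataFrame.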

In [4]:

sections_df = docs_df.pipe(
    lambda df: df
    .assign(
        section=lambda df: df["body"].ai.responses(
            instructions="""
            Generate a list of FAQs for each section of the document.
            Break the document down into as many detailed sections as possible,
            regardless of its markdown structure.
            """,
            response_format=Document
        )
        .map(lambda x: x.sections)
    )
    .drop(columns=["body"])
    .explode("section")
    .ai.extract("section")
)

sections_df

Out[4]:

title	section_title	section_content	section_questions
readme	Introduction to openaivec	openaivec is a Python library for efficient te...	[{'question': 'What is openaivec?', 'answer': ...
readme	Generative Mutation for Tabular Data	openaivec allows you to mutate columns in Pand...	[{'question': 'How do I use openaivec with a P...
readme	Overview and Features	openaivec provides a vectorized interface for ...	[{'question': 'What are the main features of o...
readme	Requirements and Installation	openaivec requires Python 3.10 or higher. Inst...	[{'question': 'What are the requirements for o...
readme	Basic Usage	You can use openaivec synchronously by initial...	[{'question': 'How do I use openaivec synchron...
readme	Using with Pandas DataFrame	openaivec.pandas_ext extends pandas.Series wit...	[{'question': 'How do I enable openaivec for P...
readme	Using with Apache Spark UDF	You can create UDFs for Apache Spark using UDF...	[{'question': 'How do I create a Spark UDF wit...
readme	Building Prompts with FewShotPromptBuilder	FewShotPromptBuilder helps you build few-shot ...	[{'question': 'What is FewShotPromptBuilder?',...
readme	Improving Prompts with OpenAI	FewShotPromptBuilder's improve method uses Ope...	[{'question': 'How do I improve a prompt?', 'a...
readme	Using with Microsoft Fabric	Instructions for integrating openaivec with Mi...	[{'question': 'What is Microsoft Fabric?', 'an...
readme	Contributing	Guidelines for contributing to the project, in...	[{'question': 'How can I contribute to openaiv...
readme	Community	Join the Discord community for developers at h...	[{'question': 'Is there a community for openai...
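
The column names in the table above (`section_title`, `section_content`, `section_questions`) suggest that the extract step turns each nested object into columns prefixed with the source column name. The sketch below reproduces that shape with plain pandas only. It is an approximation for illustration, not openaivec's implementation, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the `section` column before extraction:
# one document row holding a list of section dicts.
df = pd.DataFrame(
    {
        "title": ["readme"],
        "section": [
            [
                {"title": "Installation", "content": "Run pip install openaivec."},
                {"title": "Community", "content": "Join the Discord community."},
            ]
        ],
    }
)

# One row per list element; the other columns are repeated.
exploded = df.explode("section").reset_index(drop=True)

# Turn each dict into columns and prefix them with the source column name.
fields = pd.json_normalize(exploded["section"].tolist()).add_prefix("section_")

# Put the repeated columns and the new prefixed columns side by side.
flat = pd.concat([exploded.drop(columns=["section"]), fields], axis=1)
print(flat.columns.tolist())
```

Resetting the index after `explode` matters here: without it, every exploded row keeps the index of its parent row, and the column-wise `concat` would misalign.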

In [5]:

questions_df: pd.DataFrame = sections_df.pipe(
    lambda df: df
    .drop(columns=["section_content"])
    .explode("section_questions")
    .ai.extract("section_questions")
    .reset_index(drop=True)
)

In [6]:

from IPython.display import Markdown, display

display(Markdown(questions_df.to_markdown()))

	title	section_title	section_questions_question	section_questions_answer
0	readme	Introduction to openaivec	What is openaivec?	A Python library for efficient text processing using the OpenAI API, with integration for Pandas and Apache Spark.
1	readme	Introduction to openaivec	What can I do with openaivec?	You can generate embeddings, text responses, and perform other language model tasks directly within your data processing workflows.
2	readme	Introduction to openaivec	Where can I find the full API reference?	The full API reference is available at https://openaivec.anareg.design/.
3	readme	Generative Mutation for Tabular Data	How do I use openaivec with a Pandas Series?	You can call the ai.responses method on a Series with a natural language instruction, e.g., animals.ai.responses('Translate the animal names to Chinese.').
4	readme	Generative Mutation for Tabular Data	What kind of results can I expect?	The results are generated by the OpenAI model, such as translating animal names to Chinese or identifying if a word is related to Python language.
5	readme	Generative Mutation for Tabular Data	Can I use openaivec to process multiple columns at once?	Yes, you can use DataFrame.assign with multiple lambda functions to process several columns simultaneously.
6	readme	Overview and Features	What are the main features of openaivec?	Vectorized API requests, Pandas DataFrame integration, Apache Spark UDF builder, and compatibility with multiple OpenAI clients including Azure OpenAI.
7	readme	Overview and Features	How does vectorization help?	It allows processing multiple inputs in a single API call, reducing latency and simplifying code.
8	readme	Requirements and Installation	What are the requirements for openaivec?	Python 3.10 or higher.
9	readme	Requirements and Installation	How do I install openaivec?	Run pip install openaivec.
10	readme	Requirements and Installation	How do I uninstall openaivec?	Run pip uninstall openaivec.
11	readme	Basic Usage	How do I use openaivec synchronously?	Initialize a BatchResponses client with your OpenAI client and parameters, then call parse on your input list.
12	readme	Basic Usage	Where can I find a complete example?	See examples/basic_usage.ipynb in the repository.
13	readme	Using with Pandas DataFrame	How do I enable openaivec for Pandas?	Import pandas_ext and call pandas_ext.use(OpenAI()).
14	readme	Using with Pandas DataFrame	How do I set models for responses and embeddings?	Use pandas_ext.responses_model and pandas_ext.embeddings_model to set the desired models.
15	readme	Using with Pandas DataFrame	How do I use ai.responses in a DataFrame?	Use df.assign with a lambda function that calls df.column.ai.responses with your instruction.
16	readme	Using with Apache Spark UDF	How do I create a Spark UDF with openaivec?	Use UDFBuilder.of_azureopenai with your API credentials, then register UDFs with spark.udf.register.
17	readme	Using with Apache Spark UDF	What are some example use cases for Spark UDFs?	Extracting flavor or product type from product names in a DataFrame.
18	readme	Using with Apache Spark UDF	How do I use the UDFs in SQL queries?	Register the UDFs and call them in your SELECT statements as needed.
19	readme	Building Prompts with FewShotPromptBuilder	What is FewShotPromptBuilder?	A class that helps you build few-shot learning prompts with a simple interface.
20	readme	Building Prompts with FewShotPromptBuilder	How do I use FewShotPromptBuilder?	Specify a purpose, cautions, and examples, then call build to get the prompt in XML format.
21	readme	Building Prompts with FewShotPromptBuilder	Why use few-shot learning?	Providing examples in prompts can significantly improve LLM performance.
22	readme	Improving Prompts with OpenAI	How do I improve a prompt?	Call the improve method on FewShotPromptBuilder with your OpenAI client, model name, and max_iter.
23	readme	Improving Prompts with OpenAI	What does the improve method do?	It refines the prompt by removing contradictions, ambiguities, and redundancies, iterating up to max_iter times.
24	readme	Using with Microsoft Fabric	What is Microsoft Fabric?	A unified, cloud-based analytics platform integrating data engineering, warehousing, and business intelligence.
25	readme	Using with Microsoft Fabric	How do I add openaivec to Microsoft Fabric?	Create an environment, add openaivec from PyPI to the custom library, and use it in your notebook.
26	readme	Using with Microsoft Fabric	How do I use openaivec in a Fabric notebook?	Import openaivec.spark.UDFBuilder and use it as you would in a regular Python environment.
27	readme	Contributing	How can I contribute to openaivec?	Fork the repository, create a branch, add tests if needed, ensure tests pass, and make sure your code lints.
28	readme	Contributing	How do I install development dependencies?	Run uv sync --all-extras --dev.
29	readme	Contributing	How do I reformat the code?	Run uv run ruff check . --fix.
30	readme	Community	Is there a community for openaivec?	Yes, you can join the Discord community for developers.
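
The table above is convenient for inspection, but an FAQ meant for a README usually reads better grouped by section. The following is one possible way to render it, assuming the column names shown above. The helper `faq_to_markdown` and the heading and Q/A layout are hypothetical choices, not part of openaivec.

```python
import pandas as pd


def faq_to_markdown(df: pd.DataFrame) -> str:
    """Render question/answer rows as Markdown, one heading per section."""
    lines: list[str] = []
    # sort=False keeps sections in the order they first appear.
    for section, group in df.groupby("section_title", sort=False):
        lines.append(f"## {section}")
        lines.append("")
        for row in group.itertuples(index=False):
            lines.append(f"**Q: {row.section_questions_question}**")
            lines.append("")
            lines.append(f"A: {row.section_questions_answer}")
            lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# A few rows copied from the table above, used as sample input.
sample = pd.DataFrame(
    {
        "section_title": [
            "Requirements and Installation",
            "Requirements and Installation",
            "Community",
        ],
        "section_questions_question": [
            "How do I install openaivec?",
            "How do I uninstall openaivec?",
            "Is there a community for openaivec?",
        ],
        "section_questions_answer": [
            "Run pip install openaivec.",
            "Run pip uninstall openaivec.",
            "Yes, you can join the Discord community for developers.",
        ],
    }
)

md = faq_to_markdown(sample)
print(md)
```

In the notebook, `faq_to_markdown(questions_df)` could be passed to `display(Markdown(...))` in place of `to_markdown()`.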

In [7]:

ja_questions_df: pd.DataFrame = questions_df.pipe(
    lambda df: df
    .ai.responses(
        instructions="""
        Translate the given JSON into Japanese, keeping the same schema.
        Return only the JSON, without any additional text.
        """
    )
    .map(json.loads)
    .ai.extract()
)
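
The cell above passes each reply straight to `json.loads`, which raises an error if a reply is wrapped in a Markdown code fence despite the instruction. A slightly more tolerant parser could look like the sketch below. The helper `parse_json_reply` is hypothetical and uses only the standard library. Whether a given model adds fences is an assumption to verify against real replies.

```python
import json

# Three backticks, built programmatically so this example stays easy to embed.
FENCE = "`" * 3


def parse_json_reply(text: str) -> dict:
    """Parse a JSON reply, tolerating an optional surrounding code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag.
        cleaned = cleaned[cleaned.find("\n") + 1:]
        # Drop the closing fence if present.
        if cleaned.rstrip().endswith(FENCE):
            cleaned = cleaned.rstrip()[: -len(FENCE)]
    return json.loads(cleaned)


plain = parse_json_reply('{"question": "openaivecとは何ですか？"}')
fenced = parse_json_reply(FENCE + 'json\n{"question": "openaivecとは何ですか？"}\n' + FENCE)
print(plain == fenced)
```

It could be swapped in for `json.loads` in the `.map(...)` call. Replies that are not valid JSON still raise `json.JSONDecodeError`.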

In [8]:

display(Markdown(ja_questions_df.to_markdown()))

	record_title	record_section_title	record_section_questions_question	record_section_questions_answer
0	readme	openaivecの紹介	openaivecとは何ですか？	OpenAI APIを利用した効率的なテキスト処理のためのPythonライブラリで、PandasやApache Sparkとの統合が可能です。
1	readme	openaivecの紹介	openaivecで何ができますか？	埋め込み生成、テキスト応答、その他の言語モデルタスクをデータ処理ワークフロー内で直接実行できます。
2	readme	openaivecの紹介	APIリファレンスはどこで見られますか？	完全なAPIリファレンスは https://openaivec.anareg.design/ でご覧いただけます。
3	readme	表データの生成的変換	PandasのSeriesでopenaivecを使うには？	Seriesに対してai.responsesメソッドを自然言語の指示とともに呼び出します。例：animals.ai.responses('動物名を中国語に翻訳してください。')
4	readme	表データの生成的変換	どのような結果が得られますか？	OpenAIモデルによって生成された結果が得られます。例えば、動物名の中国語翻訳や、単語がPython言語に関連しているかの判定などです。
5	readme	表データの生成的変換	複数のカラムを同時に処理できますか？	はい、DataFrame.assignと複数のlambda関数を使って、複数カラムを同時に処理できます。
6	readme	概要と特徴	openaivecの主な特徴は？	ベクトル化APIリクエスト、Pandas DataFrame統合、Apache Spark UDFビルダー、Azure OpenAIを含む複数のOpenAIクライアントとの互換性があります。
7	readme	概要と特徴	ベクトル化はどのように役立ちますか？	複数の入力を1回のAPIコールで処理でき、レイテンシを削減しコードを簡素化します。
8	readme	要件とインストール	openaivecの要件は？	Python 3.10以上が必要です。
9	readme	要件とインストール	openaivecのインストール方法は？	pip install openaivec を実行してください。
10	readme	要件とインストール	openaivecのアンインストール方法は？	pip uninstall openaivec を実行してください。
11	readme	基本的な使い方	openaivecを同期的に使うには？	OpenAIクライアントとパラメータでBatchResponsesクライアントを初期化し、入力リストに対してparseを呼び出します。
12	readme	基本的な使い方	完全な例はどこで見られますか？	リポジトリ内のexamples/basic_usage.ipynbをご覧ください。
13	readme	Pandas DataFrameでの利用	Pandasでopenaivecを有効にするには？	pandas_extをインポートし、pandas_ext.use(OpenAI())を呼び出します。
14	readme	Pandas DataFrameでの利用	応答や埋め込みのモデルを設定するには？	pandas_ext.responses_modelとpandas_ext.embeddings_modelで希望のモデルを設定します。
15	readme	Pandas DataFrameでの利用	DataFrameでai.responsesを使うには？	df.assignで、df.column.ai.responsesを呼び出すlambda関数を使います。
16	readme	Apache Spark UDFでの利用	openaivecでSpark UDFを作成するには？	UDFBuilder.of_azureopenaiにAPI認証情報を渡し、spark.udf.registerでUDFを登録します。
17	readme	Apache Spark UDFでの利用	Spark UDFの利用例は？	DataFrame内の製品名からフレーバーや商品タイプを抽出するなどです。
18	readme	Apache Spark UDFでの利用	SQLクエリでUDFを使うには？	UDFを登録し、SELECT文で必要に応じて呼び出します。
19	readme	FewShotPromptBuilderによるプロンプト構築	FewShotPromptBuilderとは？	シンプルなインターフェースでfew-shot学習用プロンプトを構築できるクラスです。
20	readme	FewShotPromptBuilderによるプロンプト構築	FewShotPromptBuilderの使い方は？	目的、注意事項、例を指定し、buildを呼び出すとXML形式のプロンプトが得られます。
21	readme	FewShotPromptBuilderによるプロンプト構築	few-shot学習を使う理由は？	プロンプトに例を与えることで、LLMの性能が大きく向上するためです。
22	readme	OpenAIによるプロンプト改善	プロンプトを改善するには？	FewShotPromptBuilderのimproveメソッドをOpenAIクライアント、モデル名、max_iterとともに呼び出します。
23	readme	OpenAIによるプロンプト改善	improveメソッドは何をしますか？	矛盾、曖昧さ、冗長性を取り除き、max_iter回まで繰り返してプロンプトを洗練します。
24	readme	Microsoft Fabricでの利用	Microsoft Fabricとは？	データエンジニアリング、ウェアハウジング、BIを統合したクラウドベースの分析プラットフォームです。
25	readme	Microsoft Fabricでの利用	Microsoft Fabricにopenaivecを追加するには？	環境を作成し、PyPIからopenaivecをカスタムライブラリに追加し、ノートブックで利用します。
26	readme	Microsoft Fabricでの利用	Fabricノートブックでopenaivecを使うには？	openaivec.spark.UDFBuilderをインポートし、通常のPython環境と同様に利用します。
27	readme	コントリビューション	openaivecに貢献するには？	リポジトリをフォークし、ブランチを作成、必要ならテストを追加、テストが通ることとコードのリンティングを確認してください。
28	readme	コントリビューション	開発用依存関係のインストール方法は？	uv sync --all-extras --dev を実行してください。
29	readme	コントリビューション	コードの再フォーマット方法は？	uv run ruff check . --fix を実行してください。
30	readme	コミュニティ	openaivecのコミュニティはありますか？	はい、開発者向けのDiscordコミュニティがあります。