Generate FAQ for README
Let's get started with openaivec! This notebook generates an FAQ from the project README and then translates it into Japanese.
In [1]:
import json
from openai import OpenAI
from openaivec import pandas_ext
from pydantic import BaseModel, Field
import pandas as pd
pandas_ext.use(OpenAI())
pandas_ext.responses_model("gpt-4.1")
The model name 'gpt-4.1' is not supported by tiktoken. Instead, using the 'o200k_base' encoding.
In [2]:
docs_df: pd.DataFrame = pd.DataFrame(
    {"title": "readme", "body": [open("../../README.md").read()]}
)
docs_df
Out[2]:
| | title | body |
---|---|---|
0 | readme | # What is this?\n\n**openaivec** is a Python l... |
In [3]:
class Question(BaseModel):
    question: str = Field(description="The question to ask the model.")
    answer: str = Field(description="The answer to the question.")


class Section(BaseModel):
    title: str = Field(description="The title of the section.")
    content: str = Field(description="The content of the section.")
    questions: list[Question] = Field(description="List of questions and answers related to the section.")


class Document(BaseModel):
    sections: list[Section] = Field(description="List of sections in the document.")
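The three models nest: a `Document` holds `Section` objects, and each `Section` holds `Question` objects. The sketch below repeats the definitions so it runs on its own and checks that nesting against a small hand-written payload. It assumes Pydantic v2 (`model_validate`, `model_dump`, `model_json_schema`); the payload is hypothetical and is not real model output.

```python
from pydantic import BaseModel, Field


class Question(BaseModel):
    question: str = Field(description="The question to ask the model.")
    answer: str = Field(description="The answer to the question.")


class Section(BaseModel):
    title: str = Field(description="The title of the section.")
    content: str = Field(description="The content of the section.")
    questions: list[Question] = Field(description="List of questions and answers related to the section.")


class Document(BaseModel):
    sections: list[Section] = Field(description="List of sections in the document.")


# Hypothetical payload with the same shape as the schema above.
payload = {
    "sections": [
        {
            "title": "Installation",
            "content": "How to install the package.",
            "questions": [
                {"question": "How do I install it?", "answer": "Run pip install openaivec."}
            ],
        }
    ]
}

# Validation converts the nested dicts into typed model instances.
doc = Document.model_validate(payload)
first_question = doc.sections[0].questions[0]

# model_dump goes the other way and returns plain nested dicts and lists.
as_dict = doc.model_dump()

# The JSON schema is what a structured-output request would be built from.
schema = Document.model_json_schema()
```

Validating a payload by hand like this is a cheap way to confirm the schema before spending API calls on it.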
In [4]:
sections_df = docs_df.pipe(
    lambda df: df
    .assign(
        # Ask the model for a structured Document per row, then keep its sections.
        section=lambda df: df["body"].ai.responses(
            instructions="""
            Generate a list of FAQ for each section of the document.
            Break down the document into as many detailed sections as possible,
            regardless of markdown format.
            """,
            response_format=Document
        )
        .map(lambda x: x.sections)
    )
    .drop(columns=["body"])
    # One row per section; the original row index is repeated.
    .explode("section")
    # Expand each section into section_* columns (see the output below).
    .ai.extract("section")
)
sections_df
Out[4]:
| | title | section_title | section_content | section_questions |
---|---|---|---|---|
0 | readme | Introduction to openaivec | openaivec is a Python library for efficient te... | [{'question': 'What is openaivec?', 'answer': ... |
0 | readme | Generative Mutation for Tabular Data | openaivec allows you to mutate columns in Pand... | [{'question': 'How do I use openaivec with a P... |
0 | readme | Overview and Features | openaivec provides a vectorized interface for ... | [{'question': 'What are the main features of o... |
0 | readme | Requirements and Installation | openaivec requires Python 3.10 or higher. Inst... | [{'question': 'What are the requirements for o... |
0 | readme | Basic Usage | You can use openaivec synchronously by initial... | [{'question': 'How do I use openaivec synchron... |
0 | readme | Using with Pandas DataFrame | openaivec.pandas_ext extends pandas.Series wit... | [{'question': 'How do I enable openaivec for P... |
0 | readme | Using with Apache Spark UDF | You can create UDFs for Apache Spark using UDF... | [{'question': 'How do I create a Spark UDF wit... |
0 | readme | Building Prompts with FewShotPromptBuilder | FewShotPromptBuilder helps you build few-shot ... | [{'question': 'What is FewShotPromptBuilder?',... |
0 | readme | Improving Prompts with OpenAI | FewShotPromptBuilder's improve method uses Ope... | [{'question': 'How do I improve a prompt?', 'a... |
0 | readme | Using with Microsoft Fabric | Instructions for integrating openaivec with Mi... | [{'question': 'What is Microsoft Fabric?', 'an... |
0 | readme | Contributing | Guidelines for contributing to the project, in... | [{'question': 'How can I contribute to openaiv... |
0 | readme | Community | Join the Discord community for developers at h... | [{'question': 'Is there a community for openai... |
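Every row in the output above carries index 0 because `explode` repeats the parent row's index for each list element. The sketch below reproduces the explode-then-expand pattern in plain pandas on hypothetical data. `pd.json_normalize` with `add_prefix` stands in for `.ai.extract` here; that is an assumption based on the `section_*` column names in the output, not a description of how openaivec implements it.

```python
import pandas as pd

# Hypothetical data shaped like the `section` column before `.explode`.
df = pd.DataFrame(
    {
        "title": ["readme"],
        "section": [
            [
                {"title": "Introduction", "content": "What the library is."},
                {"title": "Installation", "content": "How to install it."},
            ]
        ],
    }
)

# explode emits one row per list element and keeps the parent row's index.
exploded = df.explode("section")
repeated_index = exploded.index.tolist()

# Reset the index first so the column-wise concat below aligns row by row.
exploded = exploded.reset_index(drop=True)

# Expand each dict into its own columns, prefixed with the source column name.
expanded = pd.json_normalize(exploded["section"].tolist()).add_prefix("section_")
flat = pd.concat([exploded.drop(columns=["section"]), expanded], axis=1)
```

The repeated index is harmless for display but gets in the way of label-based alignment, which is why resetting it before combining frames is the safer habit.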
In [5]:
questions_df: pd.DataFrame = sections_df.pipe(
    lambda df: df
    .drop(columns=["section_content"])
    .explode("section_questions")
    .ai.extract("section_questions")
    .reset_index(drop=True)
)
In [6]:
from IPython.display import Markdown, display
display(Markdown(questions_df.to_markdown()))
| | title | section_title | section_questions_question | section_questions_answer |
---|---|---|---|---|
0 | readme | Introduction to openaivec | What is openaivec? | A Python library for efficient text processing using the OpenAI API, with integration for Pandas and Apache Spark. |
1 | readme | Introduction to openaivec | What can I do with openaivec? | You can generate embeddings, text responses, and perform other language model tasks directly within your data processing workflows. |
2 | readme | Introduction to openaivec | Where can I find the full API reference? | The full API reference is available at https://openaivec.anareg.design/. |
3 | readme | Generative Mutation for Tabular Data | How do I use openaivec with a Pandas Series? | You can call the ai.responses method on a Series with a natural language instruction, e.g., animals.ai.responses('Translate the animal names to Chinese.'). |
4 | readme | Generative Mutation for Tabular Data | What kind of results can I expect? | The results are generated by the OpenAI model, such as translating animal names to Chinese or identifying if a word is related to Python language. |
5 | readme | Generative Mutation for Tabular Data | Can I use openaivec to process multiple columns at once? | Yes, you can use DataFrame.assign with multiple lambda functions to process several columns simultaneously. |
6 | readme | Overview and Features | What are the main features of openaivec? | Vectorized API requests, Pandas DataFrame integration, Apache Spark UDF builder, and compatibility with multiple OpenAI clients including Azure OpenAI. |
7 | readme | Overview and Features | How does vectorization help? | It allows processing multiple inputs in a single API call, reducing latency and simplifying code. |
8 | readme | Requirements and Installation | What are the requirements for openaivec? | Python 3.10 or higher. |
9 | readme | Requirements and Installation | How do I install openaivec? | Run pip install openaivec. |
10 | readme | Requirements and Installation | How do I uninstall openaivec? | Run pip uninstall openaivec. |
11 | readme | Basic Usage | How do I use openaivec synchronously? | Initialize a BatchResponses client with your OpenAI client and parameters, then call parse on your input list. |
12 | readme | Basic Usage | Where can I find a complete example? | See examples/basic_usage.ipynb in the repository. |
13 | readme | Using with Pandas DataFrame | How do I enable openaivec for Pandas? | Import pandas_ext and call pandas_ext.use(OpenAI()). |
14 | readme | Using with Pandas DataFrame | How do I set models for responses and embeddings? | Use pandas_ext.responses_model and pandas_ext.embeddings_model to set the desired models. |
15 | readme | Using with Pandas DataFrame | How do I use ai.responses in a DataFrame? | Use df.assign with a lambda function that calls df.column.ai.responses with your instruction. |
16 | readme | Using with Apache Spark UDF | How do I create a Spark UDF with openaivec? | Use UDFBuilder.of_azureopenai with your API credentials, then register UDFs with spark.udf.register. |
17 | readme | Using with Apache Spark UDF | What are some example use cases for Spark UDFs? | Extracting flavor or product type from product names in a DataFrame. |
18 | readme | Using with Apache Spark UDF | How do I use the UDFs in SQL queries? | Register the UDFs and call them in your SELECT statements as needed. |
19 | readme | Building Prompts with FewShotPromptBuilder | What is FewShotPromptBuilder? | A class that helps you build few-shot learning prompts with a simple interface. |
20 | readme | Building Prompts with FewShotPromptBuilder | How do I use FewShotPromptBuilder? | Specify a purpose, cautions, and examples, then call build to get the prompt in XML format. |
21 | readme | Building Prompts with FewShotPromptBuilder | Why use few-shot learning? | Providing examples in prompts can significantly improve LLM performance. |
22 | readme | Improving Prompts with OpenAI | How do I improve a prompt? | Call the improve method on FewShotPromptBuilder with your OpenAI client, model name, and max_iter. |
23 | readme | Improving Prompts with OpenAI | What does the improve method do? | It refines the prompt by removing contradictions, ambiguities, and redundancies, iterating up to max_iter times. |
24 | readme | Using with Microsoft Fabric | What is Microsoft Fabric? | A unified, cloud-based analytics platform integrating data engineering, warehousing, and business intelligence. |
25 | readme | Using with Microsoft Fabric | How do I add openaivec to Microsoft Fabric? | Create an environment, add openaivec from PyPI to the custom library, and use it in your notebook. |
26 | readme | Using with Microsoft Fabric | How do I use openaivec in a Fabric notebook? | Import openaivec.spark.UDFBuilder and use it as you would in a regular Python environment. |
27 | readme | Contributing | How can I contribute to openaivec? | Fork the repository, create a branch, add tests if needed, ensure tests pass, and make sure your code lints. |
28 | readme | Contributing | How do I install development dependencies? | Run uv sync --all-extras --dev. |
29 | readme | Contributing | How do I reformat the code? | Run uv run ruff check . --fix. |
30 | readme | Community | Is there a community for openaivec? | Yes, you can join the Discord community for developers. |
In [7]:
ja_questions_df: pd.DataFrame = questions_df.pipe(
    lambda df: df
    .ai.responses(
        instructions="""
        Translate given json into japanese with same schema.
        Just return the json without any additional text.
        """
    )
    .map(json.loads)
    .ai.extract()
)
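The cell above passes each raw response straight to `json.loads`, which raises `json.JSONDecodeError` if the model adds any text around the JSON. That is why the prompt asks for the JSON alone. A more tolerant parser could look like the sketch below; `parse_json_or_none` is a hypothetical helper, not part of openaivec, and the sample strings are made up.

```python
import json
from typing import Any, Optional


def parse_json_or_none(text: str) -> Optional[dict[str, Any]]:
    """Return the decoded JSON object, or None if text is not a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Only accept objects; a bare string or number is not a translated record.
    return value if isinstance(value, dict) else None


# Hypothetical responses: one clean JSON string, one wrapped in extra text.
responses = [
    '{"title": "readme", "section_title": "コミュニティ"}',
    'Here is the translation: {"title": "readme"}',
]
parsed = [parse_json_or_none(r) for r in responses]
```

Rows that come back as `None` can then be filtered out or retried instead of aborting the whole pipeline.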
In [8]:
display(Markdown(ja_questions_df.to_markdown()))
| | record_title | record_section_title | record_section_questions_question | record_section_questions_answer |
---|---|---|---|---|
0 | readme | openaivecの紹介 | openaivecとは何ですか? | OpenAI APIを利用した効率的なテキスト処理のためのPythonライブラリで、PandasやApache Sparkとの統合が可能です。 |
1 | readme | openaivecの紹介 | openaivecで何ができますか? | 埋め込み生成、テキスト応答、その他の言語モデルタスクをデータ処理ワークフロー内で直接実行できます。 |
2 | readme | openaivecの紹介 | APIリファレンスはどこで見られますか? | 完全なAPIリファレンスは https://openaivec.anareg.design/ でご覧いただけます。 |
3 | readme | 表データの生成的変換 | PandasのSeriesでopenaivecを使うには? | Seriesに対してai.responsesメソッドを自然言語の指示とともに呼び出します。例:animals.ai.responses('動物名を中国語に翻訳してください。') |
4 | readme | 表データの生成的変換 | どのような結果が得られますか? | OpenAIモデルによって生成された結果が得られます。例えば、動物名の中国語翻訳や、単語がPython言語に関連しているかの判定などです。 |
5 | readme | 表データの生成的変換 | 複数のカラムを同時に処理できますか? | はい、DataFrame.assignと複数のlambda関数を使って、複数カラムを同時に処理できます。 |
6 | readme | 概要と特徴 | openaivecの主な特徴は? | ベクトル化APIリクエスト、Pandas DataFrame統合、Apache Spark UDFビルダー、Azure OpenAIを含む複数のOpenAIクライアントとの互換性があります。 |
7 | readme | 概要と特徴 | ベクトル化はどのように役立ちますか? | 複数の入力を1回のAPIコールで処理でき、レイテンシを削減しコードを簡素化します。 |
8 | readme | 要件とインストール | openaivecの要件は? | Python 3.10以上が必要です。 |
9 | readme | 要件とインストール | openaivecのインストール方法は? | pip install openaivec を実行してください。 |
10 | readme | 要件とインストール | openaivecのアンインストール方法は? | pip uninstall openaivec を実行してください。 |
11 | readme | 基本的な使い方 | openaivecを同期的に使うには? | OpenAIクライアントとパラメータでBatchResponsesクライアントを初期化し、入力リストに対してparseを呼び出します。 |
12 | readme | 基本的な使い方 | 完全な例はどこで見られますか? | リポジトリ内のexamples/basic_usage.ipynbをご覧ください。 |
13 | readme | Pandas DataFrameでの利用 | Pandasでopenaivecを有効にするには? | pandas_extをインポートし、pandas_ext.use(OpenAI())を呼び出します。 |
14 | readme | Pandas DataFrameでの利用 | 応答や埋め込みのモデルを設定するには? | pandas_ext.responses_modelとpandas_ext.embeddings_modelで希望のモデルを設定します。 |
15 | readme | Pandas DataFrameでの利用 | DataFrameでai.responsesを使うには? | df.assignで、df.column.ai.responsesを呼び出すlambda関数を使います。 |
16 | readme | Apache Spark UDFでの利用 | openaivecでSpark UDFを作成するには? | UDFBuilder.of_azureopenaiにAPI認証情報を渡し、spark.udf.registerでUDFを登録します。 |
17 | readme | Apache Spark UDFでの利用 | Spark UDFの利用例は? | DataFrame内の製品名からフレーバーや商品タイプを抽出するなどです。 |
18 | readme | Apache Spark UDFでの利用 | SQLクエリでUDFを使うには? | UDFを登録し、SELECT文で必要に応じて呼び出します。 |
19 | readme | FewShotPromptBuilderによるプロンプト構築 | FewShotPromptBuilderとは? | シンプルなインターフェースでfew-shot学習用プロンプトを構築できるクラスです。 |
20 | readme | FewShotPromptBuilderによるプロンプト構築 | FewShotPromptBuilderの使い方は? | 目的、注意事項、例を指定し、buildを呼び出すとXML形式のプロンプトが得られます。 |
21 | readme | FewShotPromptBuilderによるプロンプト構築 | few-shot学習を使う理由は? | プロンプトに例を与えることで、LLMの性能が大きく向上するためです。 |
22 | readme | OpenAIによるプロンプト改善 | プロンプトを改善するには? | FewShotPromptBuilderのimproveメソッドをOpenAIクライアント、モデル名、max_iterとともに呼び出します。 |
23 | readme | OpenAIによるプロンプト改善 | improveメソッドは何をしますか? | 矛盾、曖昧さ、冗長性を取り除き、max_iter回まで繰り返してプロンプトを洗練します。 |
24 | readme | Microsoft Fabricでの利用 | Microsoft Fabricとは? | データエンジニアリング、ウェアハウジング、BIを統合したクラウドベースの分析プラットフォームです。 |
25 | readme | Microsoft Fabricでの利用 | Microsoft Fabricにopenaivecを追加するには? | 環境を作成し、PyPIからopenaivecをカスタムライブラリに追加し、ノートブックで利用します。 |
26 | readme | Microsoft Fabricでの利用 | Fabricノートブックでopenaivecを使うには? | openaivec.spark.UDFBuilderをインポートし、通常のPython環境と同様に利用します。 |
27 | readme | コントリビューション | openaivecに貢献するには? | リポジトリをフォークし、ブランチを作成、必要ならテストを追加、テストが通ることとコードのリンティングを確認してください。 |
28 | readme | コントリビューション | 開発用依存関係のインストール方法は? | uv sync --all-extras --dev を実行してください。 |
29 | readme | コントリビューション | コードの再フォーマット方法は? | uv run ruff check . --fix を実行してください。 |
30 | readme | コミュニティ | openaivecのコミュニティはありますか? | はい、開発者向けのDiscordコミュニティがあります。 |