Compare commits: mike_quest...shyamal-te

49 commits

| SHA1 |
|---|
| 5b6577845b |
| 027c9233fe |
| 1b211d2a87 |
| 4a117bc0d2 |
| c9ec5d79f3 |
| d538ffc014 |
| 32596ae85b |
| c951d099b9 |
| 89de43faf4 |
| 139af20193 |
| e8e5c39a1c |
| 71cc3f4a9b |
| 563f8db035 |
| 02295444f7 |
| f4a149bf20 |
| bd26141162 |
| f601888f18 |
| 8c3b8196e7 |
| 1ad5b85a46 |
| 6d31729d05 |
| c5f64fa27a |
| 853264fc55 |
| 5704646be0 |
| c76bbfcf8d |
| e26b8dd3c8 |
| bff53e71ed |
| 5da9f28eb8 |
| e1144afc97 |
| 1ca1c2946b |
| b40d006c7a |
| 4195d712cc |
| 627fdd8c84 |
| edb02b8bc9 |
| d987d2329a |
| 65d0925e59 |
| 2dc64b1567 |
| efa71942c6 |
| 04f7e93c62 |
| bd061ad2d4 |
| 350b9a7333 |
| 6eae26d5cc |
| 7dfdb9aa05 |
| b77e9b34e8 |
| cfe03038bd |
| 8b2d68f031 |
| 2515ddc7b8 |
| 25d641d21f |
| ac406fc126 |
| 22f25af04f |
README.md (59 changed lines)

@@ -1,8 +1,8 @@
|
||||
# OpenAI Cookbook
|
||||
|
||||
This repository shares example code and example prompts for accomplishing common tasks with the [OpenAI API](https://openai.com/api/).
|
||||
This repository shares example code and example prompts for accomplishing common tasks with the [OpenAI API].
|
||||
|
||||
To try these examples yourself, you’ll need an OpenAI account. [Create a free account to get started.](https://beta.openai.com/signup)
|
||||
To try these examples yourself, you’ll need an OpenAI account. [Create a free account to get started.][API Signup]
|
||||
|
||||
Most code examples are written in Python, though the concepts can be applied in any language.
|
||||
|
||||
@@ -10,13 +10,13 @@ In the same way that a cookbook's recipes don't span all possible meals or techn
|
||||
|
||||
## Related resources
|
||||
|
||||
Beyond the code examples here, you can also learn about the [OpenAI API](https://openai.com/api/) from the following resources:
|
||||
Beyond the code examples here, you can also learn about the [OpenAI API] from the following resources:
|
||||
|
||||
* Try out GPT-3 in the [OpenAI Playground](https://beta.openai.com/playground)
|
||||
* Read about the API in the [OpenAI Documentation](https://beta.openai.com/docs/introduction)
|
||||
* Discuss the API in the [OpenAI Community Forum](https://community.openai.com/top?period=monthly)
|
||||
* Look for help in the [OpenAI Help Center](https://help.openai.com/en/)
|
||||
* See example prompts in the [OpenAI Examples](https://beta.openai.com/examples)
|
||||
* Try out GPT-3 in the [OpenAI Playground]
|
||||
* Read about the API in the [OpenAI Documentation]
|
||||
* Discuss the API in the [OpenAI Community Forum]
|
||||
* Look for help in the [OpenAI Help Center]
|
||||
* See example prompts in the [OpenAI Examples]
|
||||
|
||||
## Examples, organized by capability
|
||||
|
||||
@@ -87,7 +87,7 @@ Beyond the code examples here, you can also learn about the [OpenAI API](https:/
|
||||
|
||||
## How large language models work
|
||||
|
||||
[Large language models](https://openai.com/blog/better-language-models/) are functions that map text to text. Given an input string of text, a large language model tries to predict the text that will come next.
|
||||
[Large language models][Large language models Blog Post] are functions that map text to text. Given an input string of text, a large language model tries to predict the text that will come next.
|
||||
|
||||
The magic of large language models is that by being trained to minimize this prediction error over vast quantities of text, the models end up learning concepts useful for these predictions. For example, they learn concepts like:
|
||||
|
||||
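To make the text-in, text-out behavior concrete, here is a minimal sketch using the OpenAI Python library (the prompt and model choice are only illustrative):

```python
import openai  # OpenAI Python library used throughout the examples in this repo

# Give the model some input text; it returns its best guess at the text that comes next.
response = openai.Completion.create(
    model="text-davinci-002",    # illustrative model choice
    prompt="Once upon a time,",  # the input text
    max_tokens=20,               # how much continuation to generate
)

print(response["choices"][0]["text"])  # the predicted continuation
```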
@@ -102,7 +102,7 @@ The magic of large language models is that by being trained to minimize this pre
|
||||
|
||||
None of these capabilities are explicitly programmed in - they all emerge as a result of training.
|
||||
|
||||
GPT-3's capabilities now power [hundreds of different software products](https://openai.com/blog/gpt-3-apps/), including productivity apps, education apps, games, and more.
|
||||
GPT-3's capabilities now power [hundreds of different software products][GPT3 Apps Blog Post], including productivity apps, education apps, games, and more.
|
||||
|
||||
## How to control a large language model
|
||||
|
||||
@@ -182,7 +182,7 @@ Output:
|
||||
|
||||
### Fine-tuned prompt example
|
||||
|
||||
With enough training examples, you can [fine-tune](https://beta.openai.com/docs/guides/fine-tuning) a custom model. In this case, instructions become unnecessary, as the model can learn the task from the training data provided. However, it can be helpful to include separator sequences (e.g., `->` or `###` or any string that doesn't commonly appear in your inputs) to tell the model when the prompt has ended and the output should begin. Without separator sequences, there is a risk that the model continues elaborating on the input text rather than starting on the answer you want to see.
|
||||
With enough training examples, you can [fine-tune][Fine Tuning Docs] a custom model. In this case, instructions become unnecessary, as the model can learn the task from the training data provided. However, it can be helpful to include separator sequences (e.g., `->` or `###` or any string that doesn't commonly appear in your inputs) to tell the model when the prompt has ended and the output should begin. Without separator sequences, there is a risk that the model continues elaborating on the input text rather than starting on the answer you want to see.
|
||||
|
||||
Example fine-tuned prompt (for a model that has been custom trained on similar prompt-completion pairs):
|
||||
|
||||
@@ -203,7 +203,7 @@ Output:
|
||||
|
||||
### More prompt advice
|
||||
|
||||
For more prompt examples, visit [OpenAI Examples](https://beta.openai.com/examples).
|
||||
For more prompt examples, visit [OpenAI Examples][OpenAI Examples].
|
||||
|
||||
In general, the input prompt is the best lever for improving model outputs. You can try tricks like:
|
||||
|
||||
@@ -360,7 +360,7 @@ People & titles: William Baxter (Assistant Attorney General), Charles L. Brown (
|
||||
|
||||
### 3. Edit text
|
||||
|
||||
In addition to the [completion API endpoint](https://beta.openai.com/docs/api-reference/completions), OpenAI now offers an [edit API endpoint](https://beta.openai.com/docs/api-reference/edits) ([blog post](https://openai.com/blog/gpt-3-edit-insert/)). In contrast to completions, which only take a single text input, edits take two text inputs: the instruction and the text to be modified.
|
||||
In addition to the [completion API endpoint][Completion API Docs], OpenAI now offers an [edit API endpoint][Edit API Docs] ([blog post][GPT3 Edit Blog Post]). In contrast to completions, which only take a single text input, edits take two text inputs: the instruction and the text to be modified.
|
||||
|
||||
An example edit prompt:
|
||||
|
||||
@@ -435,7 +435,7 @@ When it comes to translation, large language models particularly shine at combin
|
||||
|
||||
### 4. Compare text
|
||||
|
||||
The [OpenAI API embeddings endpoint](https://beta.openai.com/docs/guides/embeddings) can be used to measure similarity between pieces of text ([blog post](https://openai.com/blog/introducing-text-and-code-embeddings/)). By leveraging GPT-3's understanding of text, these embeddings [achieved state-of-the-art results](https://arxiv.org/abs/2201.10005) on benchmarks in both unsupervised learning and transfer learning settings.
|
||||
The [OpenAI API embeddings endpoint][Embeddings Docs] can be used to measure similarity between pieces of text ([blog post][Embeddings Blog Post]). By leveraging GPT-3's understanding of text, these embeddings [achieved state-of-the-art results](https://arxiv.org/abs/2201.10005) on benchmarks in both unsupervised learning and transfer learning settings.
|
||||
|
||||
Embeddings can be used for semantic search, recommendations, cluster analysis, near-duplicate detection, and more.
|
||||
|
||||
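As a rough sketch of how the embeddings endpoint can be used to compare two pieces of text, assuming the `text-similarity-davinci-001` engine and a hand-rolled cosine similarity (the linked notebooks show the full workflows):

```python
import numpy as np
import openai

def get_embedding(text: str, engine: str = "text-similarity-davinci-001") -> np.ndarray:
    # replace newlines, which can negatively affect performance
    text = text.replace("\n", " ")
    return np.array(openai.Embedding.create(input=[text], engine=engine)["data"][0]["embedding"])

# embed two pieces of text and compare them with cosine similarity
a = get_embedding("The canned dog food was good quality.")
b = get_embedding("My dog loved this food.")
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)  # values closer to 1 indicate more similar meaning
```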
@@ -476,11 +476,11 @@ In the following notebook, we provide an example method for customizing your emb
|
||||
|
||||
## Code Capabilities
|
||||
|
||||
Large language models aren't only great at text - they can be great at code too. OpenAI's specialized code model is called [Codex](https://openai.com/blog/openai-codex/).
|
||||
Large language models aren't only great at text - they can be great at code too. OpenAI's specialized code model is called [Codex].
|
||||
|
||||
Codex powers [more than 70 products](https://openai.com/blog/codex-apps/), including:
|
||||
Codex powers [more than 70 products][Codex Apps Blog Post], including:
|
||||
|
||||
* [GitHub Copilot](https://copilot.github.com/) (autocompletes code in VS Code and other IDEs)
|
||||
* [GitHub Copilot] (autocompletes code in VS Code and other IDEs)
|
||||
* [Pygma](https://pygma.app/) (turns Figma designs into code)
|
||||
* [Replit](https://replit.com/) (has an 'Explain code' button and other features)
|
||||
* [Warp](https://www.warp.dev/) (a smart terminal with AI command search)
|
||||
@@ -618,9 +618,9 @@ As you can see, `code-davinci-edit-001` was able to successfully reduce the func
|
||||
|
||||
The OpenAI API also features code search embeddings, which can measure the relevance of a section of code to a text query, or the similarity between two sections of code.
|
||||
|
||||
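A minimal sketch of measuring the relevance of a code snippet to a text query with these embeddings (the engine names below are assumptions about the first-generation code search models; check the embeddings documentation for the current names):

```python
import numpy as np
import openai

def embed(text: str, engine: str) -> np.ndarray:
    return np.array(openai.Embedding.create(input=[text], engine=engine)["data"][0]["embedding"])

# the code goes through the code-side model, the natural-language query through the text-side model
# (engine names are assumptions; see the embeddings docs)
code_vec = embed("def add(a, b):\n    return a + b", engine="code-search-babbage-code-001")
query_vec = embed("function that sums two numbers", engine="code-search-babbage-text-001")

relevance = float(np.dot(code_vec, query_vec) / (np.linalg.norm(code_vec) * np.linalg.norm(query_vec)))
print(relevance)  # higher scores mean the code is more relevant to the query
```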
OpenAI code search embeddings significantly improved the state-of-the-art on the [CodeSearchNet](https://github.com/github/CodeSearchNet) evaluation suite, scoring 93.5% versus the previous record of 77.4%.
|
||||
OpenAI code search embeddings significantly improved the state-of-the-art on the [CodeSearchNet] evaluation suite, scoring 93.5% versus the previous record of 77.4%.
|
||||
|
||||
Read more about OpenAI's code embeddings in the [blog post announcement](https://openai.com/blog/introducing-text-and-code-embeddings/) or [documentation](https://beta.openai.com/docs/guides/embeddings).
|
||||
Read more about OpenAI's code embeddings in the [blog post announcement][Embeddings Blog Post] or [documentation][Embeddings Docs].
|
||||
|
||||
Code embeddings can be useful for use cases such as:
|
||||
|
||||
@@ -630,3 +630,24 @@ Code embeddings can be useful for use cases such as:
|
||||
An example of code search is shown in [Code_search.ipynb](examples/Code_search.ipynb).
|
||||
|
||||
We haven't written an example of code clustering, but the idea is the same as the text clustering in [Clustering.ipynb](examples/Clustering.ipynb).
|
||||
|
||||
|
||||
[OpenAI API]: https://openai.com/api/
|
||||
[Embeddings Docs]: https://beta.openai.com/docs/guides/embeddings
|
||||
[Edit API Docs]: https://beta.openai.com/docs/api-reference/edits
|
||||
[Completion API Docs]: https://beta.openai.com/docs/api-reference/completions
|
||||
[Fine Tuning Docs]: https://beta.openai.com/docs/guides/fine-tuning
|
||||
[CodeSearchNet]: https://github.com/github/CodeSearchNet
|
||||
[Embeddings Blog Post]: https://openai.com/blog/introducing-text-and-code-embeddings/
|
||||
[Codex Apps Blog Post]: https://openai.com/blog/codex-apps/
|
||||
[GPT3 Edit Blog Post]: https://openai.com/blog/gpt-3-edit-insert/
|
||||
[Large language models Blog Post]: https://openai.com/blog/better-language-models/
|
||||
[GitHub Copilot]: https://copilot.github.com/
|
||||
[Codex]: https://openai.com/blog/openai-codex/
|
||||
[API Signup]: https://beta.openai.com/signup
|
||||
[GPT3 Apps Blog Post]: https://openai.com/blog/gpt-3-apps/
|
||||
[OpenAI Playground]: https://beta.openai.com/playground
|
||||
[OpenAI Documentation]: https://beta.openai.com/docs/introduction
|
||||
[OpenAI Community Forum]: https://community.openai.com/top?period=monthly
|
||||
[OpenAI Help Center]: https://help.openai.com/en/
|
||||
[OpenAI Examples]: https://beta.openai.com/examples
|
||||
|
examples/Classification_using_embeddings.ipynb (136 changed lines)

@@ -28,8 +28,11 @@
|
||||
"source": [
|
||||
"import openai\n",
|
||||
"\n",
|
||||
"embedding = openai.Embedding.create(input=\"Sample document text goes here\", engine=\"text-similarity-davinci-001\")['data'][0]['embedding']\n",
|
||||
"len(embedding)"
|
||||
"embedding = openai.Embedding.create(\n",
|
||||
" input=\"Sample document text goes here\",\n",
|
||||
" engine=\"text-similarity-davinci-001\"\n",
|
||||
")[\"data\"][0][\"embedding\"]\n",
|
||||
"len(embedding)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -49,21 +49,23 @@
|
||||
"import openai\n",
|
||||
"from tenacity import retry, wait_random_exponential, stop_after_attempt\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))\n",
|
||||
"def get_embedding(text: str, engine=\"text-similarity-davinci-001\") -> List[float]:\n",
|
||||
"def get_embedding(text: str, engine=\"text-similarity-davinci-001\") -> list[float]:\n",
|
||||
"\n",
|
||||
" # replace newlines, which can negatively affect performance.\n",
|
||||
" text = text.replace(\"\\n\", \" \")\n",
|
||||
"\n",
|
||||
" return openai.Embedding.create(input=[text], engine=engine)[\"data\"][0][\"embedding\"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"embedding = get_embedding(\"Sample query text goes here\", engine=\"text-search-ada-query-001\")\n",
|
||||
"print(len(embedding))"
|
||||
"print(len(embedding))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 53,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -76,16 +76,14 @@
|
||||
],
|
||||
"source": [
|
||||
"embedding = get_embedding(\"Sample document text goes here\", engine=\"text-search-ada-doc-001\")\n",
|
||||
"print(len(embedding))"
|
||||
"print(len(embedding))\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"interpreter": {
|
||||
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.7.3 64-bit ('base': conda)",
|
||||
"display_name": "Python 3.9.9 ('openai')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
@@ -98,9 +98,14 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.3"
|
||||
"version": "3.9.9"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
|
examples/How_to_handle_rate_limits.ipynb (604 changed lines, new file)

@@ -0,0 +1,604 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# How to handle rate limits\n",
|
||||
"\n",
|
||||
"When you call the OpenAI API repeatedly, you may encounter error messages that say `429: 'Too Many Requests'` or `RateLimitError`. These error messages come from exceeding the API's rate limits.\n",
|
||||
"\n",
|
||||
"Rate limits are a common practice for APIs, and they're put in place for a few different reasons.\n",
|
||||
"\n",
|
||||
"- First, they help protect against abuse or misuse of the API. For example, a malicious actor could flood the API with requests in an attempt to overload it or cause disruptions in service. By setting rate limits, OpenAI can prevent this kind of activity.\n",
|
||||
"- Second, rate limits help ensure that everyone has fair access to the API. If one person or organization makes an excessive number of requests, it could bog down the API for everyone else. By throttling the number of requests that a single user can make, OpenAI ensures that everyone has an opportunity to use the API without experiencing slowdowns.\n",
|
||||
"- Lastly, rate limits can help OpenAI manage the aggregate load on its infrastructure. If requests to the API increase dramatically, it could tax the servers and cause performance issues. By setting rate limits, OpenAI can help maintain a smooth and consistent experience for all users.\n",
|
||||
"\n",
|
||||
"Although hitting rate limits can be frustrating, rate limits exist to protect the reliable operation of the API for its users.\n",
|
||||
"\n",
|
||||
"In this guide, we'll share some tips for avoiding and handling rate limit errors."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Default rate limits\n",
|
||||
"\n",
|
||||
"As of Sep 2022, the default rate limits are:\n",
|
||||
"\n",
|
||||
"<table>\n",
|
||||
"<thead>\n",
|
||||
" <tr>\n",
|
||||
" <th></th>\n",
|
||||
" <th>Text Completion & Embedding endpoints</th>\n",
|
||||
" <th>Code & Edit endpoints</th>\n",
|
||||
" </tr>\n",
|
||||
"</thead>\n",
|
||||
"<tbody>\n",
|
||||
" <tr>\n",
|
||||
" <td>Free trial users</td>\n",
|
||||
" <td>\n",
|
||||
" <ul>\n",
|
||||
" <li>20 requests / minute</li>\n",
|
||||
" <li>150,000 tokens / minute</li>\n",
|
||||
" </ul>\n",
|
||||
" </td>\n",
|
||||
" <td>\n",
|
||||
" <ul>\n",
|
||||
" <li>20 requests / minute</li>\n",
|
||||
" <li>150,000 tokens / minute</li>\n",
|
||||
" </ul>\n",
|
||||
" </td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Pay-as-you-go users (in your first 48 hours)</td>\n",
|
||||
" <td>\n",
|
||||
" <ul>\n",
|
||||
" <li>60 requests / minute</li>\n",
|
||||
" <li>250,000 davinci tokens / minute (and proportionally more for smaller models)</li>\n",
|
||||
" </ul>\n",
|
||||
" </td>\n",
|
||||
" <td>\n",
|
||||
" <ul>\n",
|
||||
" <li>20 requests / minute</li>\n",
|
||||
" <li>150,000 tokens / minute</li>\n",
|
||||
" </ul>\n",
|
||||
" </td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Pay-as-you-go users (after your first 48 hours)</td>\n",
|
||||
" <td>\n",
|
||||
" <ul>\n",
|
||||
" <li>3,000 requests / minute</li>\n",
|
||||
" <li>250,000 davinci tokens / minute (and proportionally more for smaller models)</li>\n",
|
||||
" </ul>\n",
|
||||
" </td>\n",
|
||||
" <td>\n",
|
||||
" <ul>\n",
|
||||
" <li>20 requests / minute</li>\n",
|
||||
" <li>150,000 tokens / minute</li>\n",
|
||||
" </ul>\n",
|
||||
" </td>\n",
|
||||
" </tr>\n",
|
||||
"</tbody>\n",
|
||||
"</table>\n",
|
||||
"\n",
|
||||
"For reference, 1,000 tokens is roughly a page of text.\n",
|
||||
"\n",
|
||||
"### Other rate limit resources\n",
|
||||
"\n",
|
||||
"Read more about OpenAI's rate limits in the [OpenAI Help Center](https://help.openai.com/en/):\n",
|
||||
"\n",
|
||||
"- [Is API usage subject to any rate limits?](https://help.openai.com/en/articles/5955598-is-api-usage-subject-to-any-rate-limits)\n",
|
||||
"- [How can I solve 429: 'Too Many Requests' errors?](https://help.openai.com/en/articles/5955604-how-can-i-solve-429-too-many-requests-errors)\n",
|
||||
"\n",
|
||||
"### Requesting a rate limit increase\n",
|
||||
"\n",
|
||||
"If you'd like your organization's rate limit increased, please feel free to reach out to <support@openai.com> with the following information:\n",
|
||||
"\n",
|
||||
"- The model(s) you need increased limits on\n",
|
||||
"- The estimated rate of requests\n",
|
||||
"- The reason for the increase"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example rate limit error\n",
|
||||
"\n",
|
||||
"A rate limit error will occur when API requests are sent too quickly. If using the OpenAI Python library, they will look something like:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"RateLimitError: Rate limit reached for default-codex in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min. Contact support@openai.com if you continue to have issues or if you’d like to request an increase.\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Below is example code for triggering a rate limit error."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import openai # for making OpenAI API requests\n",
|
||||
"\n",
|
||||
"# request a bunch of completions in a loop\n",
|
||||
"for _ in range(100):\n",
|
||||
" openai.Completion.create(\n",
|
||||
" model=\"code-cushman-001\",\n",
|
||||
" prompt=\"def magic_function():\\n\\t\",\n",
|
||||
" max_tokens=10,\n",
|
||||
" )\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## How to avoid rate limit errors\n",
|
||||
"\n",
|
||||
"### Retrying with exponential backoff\n",
|
||||
"\n",
|
||||
"One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff. Retrying with exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated. This continues until the request is successful or until a maximum number of retries is reached.\n",
|
||||
"\n",
|
||||
"This approach has many benefits:\n",
|
||||
"\n",
|
||||
"- Automatic retries means you can recover from rate limit errors without crashes or missing data\n",
|
||||
"- Exponential backoff means that your first retries can be tried quickly, while still benefiting from longer delays if your first few retries fail\n",
|
||||
"- Adding random jitter to the delay helps retries from all hitting at the same time\n",
|
||||
"\n",
|
||||
"Note that unsuccessful requests contribute to your per-minute limit, so continuously resending a request won’t work.\n",
|
||||
"\n",
|
||||
"Below are a few example solutions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Example #1: Using the Tenacity library\n",
|
||||
"\n",
|
||||
"[Tenacity](https://tenacity.readthedocs.io/en/latest/) is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.\n",
|
||||
"\n",
|
||||
"To add exponential backoff to your requests, you can use the `tenacity.retry` [decorator](https://peps.python.org/pep-0318/). The following example uses the `tenacity.wait_random_exponential` function to add random exponential backoff to a request.\n",
|
||||
"\n",
|
||||
"Note that the Tenacity library is a third-party tool, and OpenAI makes no guarantees about its reliability or security."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<OpenAIObject text_completion id=cmpl-5oowO391reUW8RGVfFyzBM1uBs4A5 at 0x10d8cae00> JSON: {\n",
|
||||
" \"choices\": [\n",
|
||||
" {\n",
|
||||
" \"finish_reason\": \"length\",\n",
|
||||
" \"index\": 0,\n",
|
||||
" \"logprobs\": null,\n",
|
||||
" \"text\": \" a little girl dreamed of becoming a model.\\n\\nNowadays, that dream\"\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" \"created\": 1662793900,\n",
|
||||
" \"id\": \"cmpl-5oowO391reUW8RGVfFyzBM1uBs4A5\",\n",
|
||||
" \"model\": \"text-davinci-002\",\n",
|
||||
" \"object\": \"text_completion\",\n",
|
||||
" \"usage\": {\n",
|
||||
" \"completion_tokens\": 16,\n",
|
||||
" \"prompt_tokens\": 5,\n",
|
||||
" \"total_tokens\": 21\n",
|
||||
" }\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import openai # for OpenAI API calls\n",
|
||||
"from tenacity import (\n",
|
||||
" retry,\n",
|
||||
" stop_after_attempt,\n",
|
||||
" wait_random_exponential,\n",
|
||||
") # for exponential backoff\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))\n",
|
||||
"def completion_with_backoff(**kwargs):\n",
|
||||
" return openai.Completion.create(**kwargs)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"completion_with_backoff(model=\"text-davinci-002\", prompt=\"Once upon a time,\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Example #2: Using the backoff library\n",
|
||||
"\n",
|
||||
"Another library that provides function decorators for backoff and retry is [backoff](https://pypi.org/project/backoff/).\n",
|
||||
"\n",
|
||||
"Like Tenacity, the backoff library is a third-party tool, and OpenAI makes no guarantees about its reliability or security."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<OpenAIObject text_completion id=cmpl-5oowPhIdUvshEsF1rBhhwE9KFfI3M at 0x111043680> JSON: {\n",
|
||||
" \"choices\": [\n",
|
||||
" {\n",
|
||||
" \"finish_reason\": \"length\",\n",
|
||||
" \"index\": 0,\n",
|
||||
" \"logprobs\": null,\n",
|
||||
" \"text\": \" two children lived in a poor country village. In the winter, the temperature would\"\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" \"created\": 1662793901,\n",
|
||||
" \"id\": \"cmpl-5oowPhIdUvshEsF1rBhhwE9KFfI3M\",\n",
|
||||
" \"model\": \"text-davinci-002\",\n",
|
||||
" \"object\": \"text_completion\",\n",
|
||||
" \"usage\": {\n",
|
||||
" \"completion_tokens\": 16,\n",
|
||||
" \"prompt_tokens\": 5,\n",
|
||||
" \"total_tokens\": 21\n",
|
||||
" }\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import backoff # for exponential backoff\n",
|
||||
"import openai # for OpenAI API calls\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"@backoff.on_exception(backoff.expo, openai.error.RateLimitError)\n",
|
||||
"def completions_with_backoff(**kwargs):\n",
|
||||
" return openai.Completion.create(**kwargs)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"completions_with_backoff(model=\"text-davinci-002\", prompt=\"Once upon a time,\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Example 3: Manual backoff implementation\n",
|
||||
"\n",
|
||||
"If you don't want to use third-party libraries, you can implement your own backoff logic."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<OpenAIObject text_completion id=cmpl-5oowRsCXv3AkUgVJyyo3TQrVq7hIT at 0x111024220> JSON: {\n",
|
||||
" \"choices\": [\n",
|
||||
" {\n",
|
||||
" \"finish_reason\": \"length\",\n",
|
||||
" \"index\": 0,\n",
|
||||
" \"logprobs\": null,\n",
|
||||
" \"text\": \" a man decided to greatly improve his karma by turning his life around.\\n\\n\"\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" \"created\": 1662793903,\n",
|
||||
" \"id\": \"cmpl-5oowRsCXv3AkUgVJyyo3TQrVq7hIT\",\n",
|
||||
" \"model\": \"text-davinci-002\",\n",
|
||||
" \"object\": \"text_completion\",\n",
|
||||
" \"usage\": {\n",
|
||||
" \"completion_tokens\": 16,\n",
|
||||
" \"prompt_tokens\": 5,\n",
|
||||
" \"total_tokens\": 21\n",
|
||||
" }\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# imports\n",
|
||||
"import random\n",
|
||||
"import time\n",
|
||||
"\n",
|
||||
"import openai\n",
|
||||
"\n",
|
||||
"# define a retry decorator\n",
|
||||
"def retry_with_exponential_backoff(\n",
|
||||
" func,\n",
|
||||
" initial_delay: float = 1,\n",
|
||||
" exponential_base: float = 2,\n",
|
||||
" jitter: bool = True,\n",
|
||||
" max_retries: int = 10,\n",
|
||||
" errors: tuple = (openai.error.RateLimitError,),\n",
|
||||
"):\n",
|
||||
" \"\"\"Retry a function with exponential backoff.\"\"\"\n",
|
||||
"\n",
|
||||
" def wrapper(*args, **kwargs):\n",
|
||||
" # Initialize variables\n",
|
||||
" num_retries = 0\n",
|
||||
" delay = initial_delay\n",
|
||||
"\n",
|
||||
" # Loop until a successful response or max_retries is hit or an exception is raised\n",
|
||||
" while True:\n",
|
||||
" try:\n",
|
||||
" return func(*args, **kwargs)\n",
|
||||
"\n",
|
||||
" # Retry on specified errors\n",
|
||||
" except errors as e:\n",
|
||||
" # Increment retries\n",
|
||||
" num_retries += 1\n",
|
||||
"\n",
|
||||
" # Check if max retries has been reached\n",
|
||||
" if num_retries > max_retries:\n",
|
||||
" raise Exception(\n",
|
||||
" f\"Maximum number of retries ({max_retries}) exceeded.\"\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Increment the delay\n",
|
||||
" delay *= exponential_base * (1 + jitter * random.random())\n",
|
||||
"\n",
|
||||
" # Sleep for the delay\n",
|
||||
" time.sleep(delay)\n",
|
||||
"\n",
|
||||
" # Raise exceptions for any errors not specified\n",
|
||||
" except Exception as e:\n",
|
||||
" raise e\n",
|
||||
"\n",
|
||||
" return wrapper\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"@retry_with_exponential_backoff\n",
|
||||
"def completions_with_backoff(**kwargs):\n",
|
||||
" return openai.Completion.create(**kwargs)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"completions_with_backoff(model=\"text-davinci-002\", prompt=\"Once upon a time,\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## How to maximize throughput of batch processing given rate limits\n",
|
||||
"\n",
|
||||
"If you're processing real-time requests from users, backoff and retry is a great strategy to minimize latency while avoiding rate limit errors.\n",
|
||||
"\n",
|
||||
"However, if you're processing large volumes of batch data, where throughput matters more than latency, there are a few other things you can do in addition to backoff and retry.\n",
|
||||
"\n",
|
||||
"### Proactively adding delay between requests\n",
|
||||
"\n",
|
||||
"If you are constantly hitting the rate limit, then backing off, then hitting the rate limit again, then backing off again, it's possible that a good fraction of your request budget will be 'wasted' on requests that need to be retried. This limits your processing throughput, given a fixed rate limit.\n",
|
||||
"\n",
|
||||
"Here, one potential solution is to calculate your rate limit and add a delay equal to its reciprocal (e.g., if your rate limit 20 requests per minute, add a delay of 3 seconds to each request). This can help you operate near the rate limit ceiling without hitting it and incurring wasted requests.\n",
|
||||
"\n",
|
||||
"#### Example of adding delay to a request"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<OpenAIObject text_completion id=cmpl-5oowVVZnAzdCPtUJ0rifeamtLcZRp at 0x11b2c7680> JSON: {\n",
|
||||
" \"choices\": [\n",
|
||||
" {\n",
|
||||
" \"finish_reason\": \"length\",\n",
|
||||
" \"index\": 0,\n",
|
||||
" \"logprobs\": null,\n",
|
||||
" \"text\": \" there was an idyllic little farm that sat by a babbling brook\"\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" \"created\": 1662793907,\n",
|
||||
" \"id\": \"cmpl-5oowVVZnAzdCPtUJ0rifeamtLcZRp\",\n",
|
||||
" \"model\": \"text-davinci-002\",\n",
|
||||
" \"object\": \"text_completion\",\n",
|
||||
" \"usage\": {\n",
|
||||
" \"completion_tokens\": 16,\n",
|
||||
" \"prompt_tokens\": 5,\n",
|
||||
" \"total_tokens\": 21\n",
|
||||
" }\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# imports\n",
|
||||
"import time\n",
|
||||
"import openai\n",
|
||||
"\n",
|
||||
"# Define a function that adds a delay to a Completion API call\n",
|
||||
"def delayed_completion(delay_in_seconds: float = 1, **kwargs):\n",
|
||||
" \"\"\"Delay a completion by a specified amount of time.\"\"\"\n",
|
||||
"\n",
|
||||
" # Sleep for the delay\n",
|
||||
" time.sleep(delay_in_seconds)\n",
|
||||
"\n",
|
||||
" # Call the Completion API and return the result\n",
|
||||
" return openai.Completion.create(**kwargs)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Calculate the delay based on your rate limit\n",
|
||||
"rate_limit_per_minute = 20\n",
|
||||
"delay = 60.0 / rate_limit_per_minute\n",
|
||||
"\n",
|
||||
"delayed_completion(\n",
|
||||
" delay_in_seconds=delay,\n",
|
||||
" model=\"text-davinci-002\",\n",
|
||||
" prompt=\"Once upon a time,\"\n",
|
||||
")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"\n",
|
||||
"### Batching requests\n",
|
||||
"\n",
|
||||
"The OpenAI API has separate limits for requests per minute and tokens per minute.\n",
|
||||
"\n",
|
||||
"If you're hitting the limit on requests per minute, but have headroom on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This will allow you to process more tokens per minute, especially with the smaller models.\n",
|
||||
"\n",
|
||||
"Sending in a batch of prompts works exactly the same as a normal API call, except that pass in a list of strings to `prompt` parameter instead of a single string.\n",
|
||||
"\n",
|
||||
"**Warning:** the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the `index` field.\n",
|
||||
"\n",
|
||||
"#### Example without batching"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Once upon a time, before there were grandiloquent tales of the massacre at Fort Mims, there were stories of\n",
|
||||
"Once upon a time, a full-sized search and rescue was created. However, CIDIs are the addition of requiring\n",
|
||||
"Once upon a time, Schubert was hot with the films. “Schubert sings of honey, flowers,\n",
|
||||
"Once upon a time, you could watch these films on your VCR, sometimes years after their initial theatrical release, and there\n",
|
||||
"Once upon a time, there was a forest. In that forest, the forest animals ruled. The forest animals had their homes\n",
|
||||
"Once upon a time, there were two programs that complained about false positive scans. Peacock and Midnight Manager alike, only\n",
|
||||
"Once upon a time, a long, long time ago, tragedy struck. it was the darkest of nights, and there was\n",
|
||||
"Once upon a time, when Adam was a perfect little gentleman, he was presented at Court as a guarantee of good character.\n",
|
||||
"Once upon a time, Adam and Eve made a mistake. They ate the fruit from the tree of immortality and split the consequences\n",
|
||||
"Once upon a time, there was a set of programming fundamental principles known as the \"X model.\" This is a set of\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import openai # for making OpenAI API requests\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"num_stories = 10\n",
|
||||
"prompt = \"Once upon a time,\"\n",
|
||||
"\n",
|
||||
"# serial example, with one story completion per request\n",
|
||||
"for _ in range(num_stories):\n",
|
||||
" response = openai.Completion.create(\n",
|
||||
" model=\"curie\",\n",
|
||||
" prompt=prompt,\n",
|
||||
" max_tokens=20,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # print story\n",
|
||||
" print(prompt + response.choices[0].text)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Example with batching"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Once upon a time, there were two sisters, Eliza Pickering and Ariana 'Ari' Lucas. When these lovely\n",
|
||||
"Once upon a time, Keene was stung by a worm — actually, probably a python — snaking through his leg\n",
|
||||
"Once upon a time, there was a professor of physics during the depression. It was difficult, during this time, to get\n",
|
||||
"Once upon a time, before you got sick, you told stories to all and sundry, and your listeners believed in you\n",
|
||||
"Once upon a time, there was one very old nice donkey. He was incredibly smart, in a very old, kind of\n",
|
||||
"Once upon a time, the property of a common lodging house was a common cup for all the inhabitants. Betimes a constant\n",
|
||||
"Once upon a time, in an unspecified country, there was a witch who had an illegal product. It was highly effective,\n",
|
||||
"Once upon a time, a long time ago, I turned 13, my beautiful dog Duncan swept me up into his jaws like\n",
|
||||
"Once upon a time, as a thoroughly reformed creature from an army of Nazis, he took On Judgement Day myself and his\n",
|
||||
"Once upon a time, Capcom made a game for the Atari VCS called Missile Command. While it was innovative at the time\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import openai # for making OpenAI API requests\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"num_stories = 10\n",
|
||||
"prompts = [\"Once upon a time,\"] * num_stories\n",
|
||||
"\n",
|
||||
"# batched example, with 10 stories completions per request\n",
|
||||
"response = openai.Completion.create(\n",
|
||||
" model=\"curie\",\n",
|
||||
" prompt=prompts,\n",
|
||||
" max_tokens=20,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# match completions to prompts by index\n",
|
||||
"stories = [\"\"] * len(prompts)\n",
|
||||
"for choice in response.choices:\n",
|
||||
" stories[choice.index] = prompts[choice.index] + choice.text\n",
|
||||
"\n",
|
||||
"# print stories\n",
|
||||
"for story in stories:\n",
|
||||
" print(story)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.9.9 ('openai')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.9"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
examples/How_to_stream_completions.ipynb (369 changed lines, new file)

@@ -0,0 +1,369 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# How to stream completions\n",
|
||||
"\n",
|
||||
"By default, when you send a prompt to the OpenAI Completions endpoint, it computes the entire completion and sends it back in a single response.\n",
|
||||
"\n",
|
||||
"If you're generating very long completions from a davinci-level model, waiting for the response can take many seconds. As of Aug 2022, responses from `text-davinci-002` typically take something like ~1 second plus ~2 seconds per 100 completion tokens.\n",
|
||||
"\n",
|
||||
"If you want to get the response faster, you can 'stream' the completion as it's being generated. This allows you to start printing or otherwise processing the beginning of the completion before the entire completion is finished.\n",
|
||||
"\n",
|
||||
"To stream completions, set `stream=True` when calling the Completions endpoint. This will return an object that streams back text as [data-only server-sent events](https://app.mode.com/openai/reports/4fce5ba22b5b/runs/f518a0be4495).\n",
|
||||
"\n",
|
||||
"Note that using `stream=True` in a production application makes it more difficult to moderate the content of the completions, which has implications for [approved usage](https://beta.openai.com/docs/usage-guidelines).\n",
|
||||
"\n",
|
||||
"Below is a Python code example of how to receive streaming completions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# imports\n",
|
||||
"import openai # for OpenAI API calls\n",
|
||||
"import time # for measuring time savings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## A typical completion request\n",
|
||||
"\n",
|
||||
"With a typical Completions API call, the text is first computed and then returned all at once."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Full response received 7.32 seconds after request\n",
|
||||
"Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Example of an OpenAI Completion request\n",
|
||||
"# https://beta.openai.com/docs/api-reference/completions/create\n",
|
||||
"\n",
|
||||
"# record the time before the request is sent\n",
|
||||
"start_time = time.time()\n",
|
||||
"\n",
|
||||
"# send a Completion request to count to 100\n",
|
||||
"response = openai.Completion.create(\n",
|
||||
" model='text-davinci-002',\n",
|
||||
" prompt='1,2,3,',\n",
|
||||
" max_tokens=193,\n",
|
||||
" temperature=0,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# calculate the time it took to receive the response\n",
|
||||
"response_time = time.time() - start_time\n",
|
||||
"\n",
|
||||
"# extract the text from the response\n",
|
||||
"completion_text = response['choices'][0]['text']\n",
|
||||
"\n",
|
||||
"# print the time delay and text received\n",
|
||||
"print(f\"Full response received {response_time:.2f} seconds after request\")\n",
|
||||
"print(f\"Full text received: {completion_text}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## A streaming completion request\n",
|
||||
"\n",
|
||||
"With a streaming Completions API call, the text is sent back via a series of events. In Python, you can iterate over these events with a `for` loop."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Text received: 4 (0.16 seconds after request)\n",
|
||||
"Text received: , (0.19 seconds after request)\n",
|
||||
"Text received: 5 (0.21 seconds after request)\n",
|
||||
"Text received: , (0.24 seconds after request)\n",
|
||||
"Text received: 6 (0.27 seconds after request)\n",
|
||||
"Text received: , (0.29 seconds after request)\n",
|
||||
"Text received: 7 (0.32 seconds after request)\n",
|
||||
"Text received: , (0.35 seconds after request)\n",
|
||||
"Text received: 8 (0.37 seconds after request)\n",
|
||||
"Text received: , (0.40 seconds after request)\n",
|
||||
"Text received: 9 (0.43 seconds after request)\n",
|
||||
"Text received: , (0.46 seconds after request)\n",
|
||||
"Text received: 10 (0.48 seconds after request)\n",
|
||||
"Text received: , (0.51 seconds after request)\n",
|
||||
"Text received: 11 (0.54 seconds after request)\n",
|
||||
"Text received: , (0.56 seconds after request)\n",
|
||||
"Text received: 12 (0.59 seconds after request)\n",
|
||||
"Text received: , (0.62 seconds after request)\n",
|
||||
"Text received: 13 (0.64 seconds after request)\n",
|
||||
"Text received: , (0.67 seconds after request)\n",
|
||||
"Text received: 14 (0.70 seconds after request)\n",
|
||||
"Text received: , (0.72 seconds after request)\n",
|
||||
"Text received: 15 (0.75 seconds after request)\n",
|
||||
"Text received: , (0.78 seconds after request)\n",
|
||||
"Text received: 16 (0.84 seconds after request)\n",
|
||||
"Text received: , (0.84 seconds after request)\n",
|
||||
"Text received: 17 (0.86 seconds after request)\n",
|
||||
"Text received: , (0.89 seconds after request)\n",
|
||||
"Text received: 18 (0.91 seconds after request)\n",
|
||||
"Text received: , (0.94 seconds after request)\n",
|
||||
"Text received: 19 (1.41 seconds after request)\n",
|
||||
"Text received: , (1.41 seconds after request)\n",
|
||||
"Text received: 20 (1.41 seconds after request)\n",
|
||||
"Text received: , (1.41 seconds after request)\n",
|
||||
"Text received: 21 (1.41 seconds after request)\n",
|
||||
"Text received: , (1.41 seconds after request)\n",
|
||||
"Text received: 22 (1.41 seconds after request)\n",
|
||||
"Text received: , (1.41 seconds after request)\n",
|
||||
"Text received: 23 (1.41 seconds after request)\n",
|
||||
"Text received: , (1.41 seconds after request)\n",
|
||||
"Text received: 24 (1.46 seconds after request)\n",
|
||||
"Text received: , (1.46 seconds after request)\n",
|
||||
"Text received: 25 (1.46 seconds after request)\n",
|
||||
"Text received: , (1.55 seconds after request)\n",
|
||||
"Text received: 26 (1.61 seconds after request)\n",
|
||||
"Text received: , (1.65 seconds after request)\n",
|
||||
"Text received: 27 (1.66 seconds after request)\n",
|
||||
"Text received: , (1.70 seconds after request)\n",
|
||||
"Text received: 28 (1.72 seconds after request)\n",
|
||||
"Text received: , (1.75 seconds after request)\n",
|
||||
"Text received: 29 (1.78 seconds after request)\n",
|
||||
"Text received: , (2.05 seconds after request)\n",
|
||||
"Text received: 30 (2.08 seconds after request)\n",
|
||||
"Text received: , (2.13 seconds after request)\n",
|
||||
"Text received: 31 (2.16 seconds after request)\n",
|
||||
"Text received: , (2.20 seconds after request)\n",
|
||||
"Text received: 32 (2.26 seconds after request)\n",
|
||||
"Text received: , (2.28 seconds after request)\n",
|
||||
"Text received: 33 (2.31 seconds after request)\n",
|
||||
"Text received: , (2.35 seconds after request)\n",
|
||||
"Text received: 34 (2.38 seconds after request)\n",
|
||||
"Text received: , (2.54 seconds after request)\n",
|
||||
"Text received: 35 (2.55 seconds after request)\n",
|
||||
"Text received: , (2.59 seconds after request)\n",
|
||||
"Text received: 36 (2.61 seconds after request)\n",
|
||||
"Text received: , (2.64 seconds after request)\n",
|
||||
"Text received: 37 (2.67 seconds after request)\n",
|
||||
"Text received: , (2.71 seconds after request)\n",
|
||||
"Text received: 38 (2.86 seconds after request)\n",
|
||||
"Text received: , (2.89 seconds after request)\n",
|
||||
"Text received: 39 (2.92 seconds after request)\n",
|
||||
"Text received: , (2.95 seconds after request)\n",
|
||||
"Text received: 40 (2.99 seconds after request)\n",
|
||||
"Text received: , (3.01 seconds after request)\n",
|
||||
"Text received: 41 (3.04 seconds after request)\n",
|
||||
"Text received: , (3.08 seconds after request)\n",
|
||||
"Text received: 42 (3.15 seconds after request)\n",
|
||||
"Text received: , (3.33 seconds after request)\n",
|
||||
"Text received: 43 (3.36 seconds after request)\n",
|
||||
"Text received: , (3.43 seconds after request)\n",
|
||||
"Text received: 44 (3.47 seconds after request)\n",
|
||||
"Text received: , (3.50 seconds after request)\n",
|
||||
"Text received: 45 (3.53 seconds after request)\n",
|
||||
"Text received: , (3.56 seconds after request)\n",
|
||||
"Text received: 46 (3.59 seconds after request)\n",
|
||||
"Text received: , (3.63 seconds after request)\n",
|
||||
"Text received: 47 (3.65 seconds after request)\n",
|
||||
"Text received: , (3.68 seconds after request)\n",
|
||||
"Text received: 48 (3.71 seconds after request)\n",
|
||||
"Text received: , (3.77 seconds after request)\n",
|
||||
"Text received: 49 (3.77 seconds after request)\n",
|
||||
"Text received: , (3.79 seconds after request)\n",
|
||||
"Text received: 50 (3.82 seconds after request)\n",
|
||||
"Text received: , (3.85 seconds after request)\n",
|
||||
"Text received: 51 (3.89 seconds after request)\n",
|
||||
"Text received: , (3.91 seconds after request)\n",
|
||||
"Text received: 52 (3.93 seconds after request)\n",
|
||||
"Text received: , (3.96 seconds after request)\n",
|
||||
"Text received: 53 (3.98 seconds after request)\n",
|
||||
"Text received: , (4.04 seconds after request)\n",
|
||||
"Text received: 54 (4.05 seconds after request)\n",
|
||||
"Text received: , (4.07 seconds after request)\n",
|
||||
"Text received: 55 (4.10 seconds after request)\n",
|
||||
"Text received: , (4.13 seconds after request)\n",
|
||||
"Text received: 56 (4.19 seconds after request)\n",
|
||||
"Text received: , (4.20 seconds after request)\n",
|
||||
"Text received: 57 (4.20 seconds after request)\n",
|
||||
"Text received: , (4.23 seconds after request)\n",
|
||||
"Text received: 58 (4.26 seconds after request)\n",
|
||||
"Text received: , (4.30 seconds after request)\n",
|
||||
"Text received: 59 (4.31 seconds after request)\n",
|
||||
"Text received: , (4.59 seconds after request)\n",
|
||||
"Text received: 60 (4.61 seconds after request)\n",
|
||||
"Text received: , (4.64 seconds after request)\n",
|
||||
"Text received: 61 (4.67 seconds after request)\n",
|
||||
"Text received: , (4.72 seconds after request)\n",
|
||||
"Text received: 62 (4.73 seconds after request)\n",
|
||||
"Text received: , (4.76 seconds after request)\n",
|
||||
"Text received: 63 (4.80 seconds after request)\n",
|
||||
"Text received: , (4.83 seconds after request)\n",
|
||||
"Text received: 64 (4.86 seconds after request)\n",
|
||||
"Text received: , (4.89 seconds after request)\n",
|
||||
"Text received: 65 (4.92 seconds after request)\n",
|
||||
"Text received: , (4.94 seconds after request)\n",
|
||||
"Text received: 66 (4.97 seconds after request)\n",
|
||||
"Text received: , (5.00 seconds after request)\n",
|
||||
"Text received: 67 (5.03 seconds after request)\n",
|
||||
"Text received: , (5.06 seconds after request)\n",
|
||||
"Text received: 68 (5.09 seconds after request)\n",
|
||||
"Text received: , (5.14 seconds after request)\n",
|
||||
"Text received: 69 (5.16 seconds after request)\n",
|
||||
"Text received: , (5.19 seconds after request)\n",
|
||||
"Text received: 70 (5.22 seconds after request)\n",
|
||||
"Text received: , (5.28 seconds after request)\n",
|
||||
"Text received: 71 (5.30 seconds after request)\n",
|
||||
"Text received: , (5.33 seconds after request)\n",
|
||||
"Text received: 72 (5.36 seconds after request)\n",
|
||||
"Text received: , (5.38 seconds after request)\n",
|
||||
"Text received: 73 (5.41 seconds after request)\n",
|
||||
"Text received: , (5.44 seconds after request)\n",
|
||||
"Text received: 74 (5.48 seconds after request)\n",
|
||||
"Text received: , (5.51 seconds after request)\n",
|
||||
"Text received: 75 (5.53 seconds after request)\n",
|
||||
"Text received: , (5.56 seconds after request)\n",
|
||||
"Text received: 76 (5.60 seconds after request)\n",
|
||||
"Text received: , (5.62 seconds after request)\n",
|
||||
"Text received: 77 (5.65 seconds after request)\n",
|
||||
"Text received: , (5.68 seconds after request)\n",
|
||||
"Text received: 78 (5.71 seconds after request)\n",
|
||||
"Text received: , (5.77 seconds after request)\n",
|
||||
"Text received: 79 (5.77 seconds after request)\n",
|
||||
"Text received: , (5.79 seconds after request)\n",
|
||||
"Text received: 80 (5.82 seconds after request)\n",
|
||||
"Text received: , (5.85 seconds after request)\n",
|
||||
"Text received: 81 (5.88 seconds after request)\n",
|
||||
"Text received: , (5.92 seconds after request)\n",
|
||||
"Text received: 82 (5.93 seconds after request)\n",
|
||||
"Text received: , (5.97 seconds after request)\n",
|
||||
"Text received: 83 (5.98 seconds after request)\n",
|
||||
"Text received: , (6.01 seconds after request)\n",
|
||||
"Text received: 84 (6.04 seconds after request)\n",
|
||||
"Text received: , (6.07 seconds after request)\n",
|
||||
"Text received: 85 (6.09 seconds after request)\n",
|
||||
"Text received: , (6.11 seconds after request)\n",
|
||||
"Text received: 86 (6.14 seconds after request)\n",
|
||||
"Text received: , (6.17 seconds after request)\n",
|
||||
"Text received: 87 (6.19 seconds after request)\n",
|
||||
"Text received: , (6.22 seconds after request)\n",
|
||||
"Text received: 88 (6.24 seconds after request)\n",
|
||||
"Text received: , (6.27 seconds after request)\n",
|
||||
"Text received: 89 (6.30 seconds after request)\n",
|
||||
"Text received: , (6.31 seconds after request)\n",
|
||||
"Text received: 90 (6.35 seconds after request)\n",
|
||||
"Text received: , (6.36 seconds after request)\n",
|
||||
"Text received: 91 (6.40 seconds after request)\n",
|
||||
"Text received: , (6.44 seconds after request)\n",
|
||||
"Text received: 92 (6.46 seconds after request)\n",
|
||||
"Text received: , (6.49 seconds after request)\n",
|
||||
"Text received: 93 (6.51 seconds after request)\n",
|
||||
"Text received: , (6.54 seconds after request)\n",
|
||||
"Text received: 94 (6.56 seconds after request)\n",
|
||||
"Text received: , (6.59 seconds after request)\n",
|
||||
"Text received: 95 (6.62 seconds after request)\n",
|
||||
"Text received: , (6.64 seconds after request)\n",
|
||||
"Text received: 96 (6.68 seconds after request)\n",
|
||||
"Text received: , (6.68 seconds after request)\n",
|
||||
"Text received: 97 (6.70 seconds after request)\n",
|
||||
"Text received: , (6.73 seconds after request)\n",
|
||||
"Text received: 98 (6.75 seconds after request)\n",
|
||||
"Text received: , (6.78 seconds after request)\n",
|
||||
"Text received: 99 (6.90 seconds after request)\n",
|
||||
"Text received: , (6.92 seconds after request)\n",
|
||||
"Text received: 100 (7.25 seconds after request)\n",
|
||||
"Full response received 7.25 seconds after request\n",
|
||||
"Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Example of an OpenAI Completion request, using the stream=True option\n",
|
||||
"# https://beta.openai.com/docs/api-reference/completions/create\n",
|
||||
"\n",
|
||||
"# record the time before the request is sent\n",
|
||||
"start_time = time.time()\n",
|
||||
"\n",
|
||||
"# send a Completion request to count to 100\n",
|
||||
"response = openai.Completion.create(\n",
|
||||
" model='text-davinci-002',\n",
|
||||
" prompt='1,2,3,',\n",
|
||||
" max_tokens=193,\n",
|
||||
" temperature=0,\n",
|
||||
" stream=True, # this time, we set stream=True\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# create variables to collect the stream of events\n",
|
||||
"collected_events = []\n",
|
||||
"completion_text = ''\n",
|
||||
"# iterate through the stream of events\n",
|
||||
"for event in response:\n",
|
||||
" event_time = time.time() - start_time # calculate the time delay of the event\n",
|
||||
" collected_events.append(event) # save the event response\n",
|
||||
" event_text = event['choices'][0]['text'] # extract the text\n",
|
||||
" completion_text += event_text # append the text\n",
|
||||
" print(f\"Text received: {event_text} ({event_time:.2f} seconds after request)\") # print the delay and text\n",
|
||||
"\n",
|
||||
"# print the time delay and text received\n",
|
||||
"print(f\"Full response received {event_time:.2f} seconds after request\")\n",
|
||||
"print(f\"Full text received: {completion_text}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Time comparison\n",
|
||||
"\n",
|
||||
"In the example above, both requests took about 7 seconds to fully complete.\n",
|
||||
"\n",
|
||||
"However, with the streaming request, you would have received the first token after 0.16 seconds, and subsequent tokens after about ~0.035 seconds each."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.9.9 ('openai')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.9"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
@ -45,57 +45,48 @@
|
||||
" <th>Text</th>\n",
|
||||
" <th>combined</th>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>Id</th>\n",
|
||||
" <th></th>\n",
|
||||
" <th></th>\n",
|
||||
" <th></th>\n",
|
||||
" <th></th>\n",
|
||||
" <th></th>\n",
|
||||
" <th></th>\n",
|
||||
" <th></th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>1303862400</td>\n",
|
||||
" <td>B001E4KFG0</td>\n",
|
||||
" <td>A3SGXH7AUHU8GW</td>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>1351123200</td>\n",
|
||||
" <td>B003XPF9BO</td>\n",
|
||||
" <td>A3R7JR3FMEBXQB</td>\n",
|
||||
" <td>5</td>\n",
|
||||
" <td>Good Quality Dog Food</td>\n",
|
||||
" <td>I have bought several of the Vitality canned d...</td>\n",
|
||||
" <td>Title: Good Quality Dog Food; Content: I have ...</td>\n",
|
||||
" <td>where does one start...and stop... with a tre...</td>\n",
|
||||
" <td>Wanted to save some to bring to my Chicago fam...</td>\n",
|
||||
" <td>Title: where does one start...and stop... wit...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>1346976000</td>\n",
|
||||
" <td>B00813GRG4</td>\n",
|
||||
" <td>A1D87F6ZCVE5NK</td>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>1351123200</td>\n",
|
||||
" <td>B003JK537S</td>\n",
|
||||
" <td>A3JBPC3WFUT5ZP</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>Not as Advertised</td>\n",
|
||||
" <td>Product arrived labeled as Jumbo Salted Peanut...</td>\n",
|
||||
" <td>Title: Not as Advertised; Content: Product arr...</td>\n",
|
||||
" <td>Arrived in pieces</td>\n",
|
||||
" <td>Not pleased at all. When I opened the box, mos...</td>\n",
|
||||
" <td>Title: Arrived in pieces; Content: Not pleased...</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" Time ProductId UserId Score Summary \\\n",
|
||||
"Id \n",
|
||||
"1 1303862400 B001E4KFG0 A3SGXH7AUHU8GW 5 Good Quality Dog Food \n",
|
||||
"2 1346976000 B00813GRG4 A1D87F6ZCVE5NK 1 Not as Advertised \n",
|
||||
" Time ProductId UserId Score \\\n",
|
||||
"0 1351123200 B003XPF9BO A3R7JR3FMEBXQB 5 \n",
|
||||
"1 1351123200 B003JK537S A3JBPC3WFUT5ZP 1 \n",
|
||||
"\n",
|
||||
" Summary \\\n",
|
||||
"0 where does one start...and stop... with a tre... \n",
|
||||
"1 Arrived in pieces \n",
|
||||
"\n",
|
||||
" Text \\\n",
|
||||
"Id \n",
|
||||
"1 I have bought several of the Vitality canned d... \n",
|
||||
"2 Product arrived labeled as Jumbo Salted Peanut... \n",
|
||||
"0 Wanted to save some to bring to my Chicago fam... \n",
|
||||
"1 Not pleased at all. When I opened the box, mos... \n",
|
||||
"\n",
|
||||
" combined \n",
|
||||
"Id \n",
|
||||
"1 Title: Good Quality Dog Food; Content: I have ... \n",
|
||||
"2 Title: Not as Advertised; Content: Product arr... "
|
||||
"0 Title: where does one start...and stop... wit... \n",
|
||||
"1 Title: Arrived in pieces; Content: Not pleased... "
|
||||
]
|
||||
},
|
||||
"execution_count": 1,
|
||||
@ -106,7 +97,8 @@
|
||||
"source": [
|
||||
"import pandas as pd\n",
|
||||
"\n",
|
||||
"df = pd.read_csv('input/Reviews.csv', index_col=0)\n",
|
||||
"input_datapath = 'data/fine_food_reviews_1k.csv' # to save space, we provide a pre-filtered dataset\n",
|
||||
"df = pd.read_csv(input_datapath, index_col=0)\n",
|
||||
"df = df[['Time', 'ProductId', 'UserId', 'Score', 'Summary', 'Text']]\n",
|
||||
"df = df.dropna()\n",
|
||||
"df['combined'] = \"Title: \" + df.Summary.str.strip() + \"; Content: \" + df.Text.str.strip()\n",
|
||||
@ -161,16 +153,14 @@
|
||||
"# This will take just under 10 minutes\n",
|
||||
"df['babbage_similarity'] = df.combined.apply(lambda x: get_embedding(x, engine='text-similarity-babbage-001'))\n",
|
||||
"df['babbage_search'] = df.combined.apply(lambda x: get_embedding(x, engine='text-search-babbage-doc-001'))\n",
|
||||
"df.to_csv('output/embedded_1k_reviews.csv')"
|
||||
"df.to_csv('data/fine_food_reviews_with_embeddings_1k.csv')"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"interpreter": {
|
||||
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.7.3 64-bit ('base': conda)",
|
||||
"display_name": "Python 3.9.9 ('openai')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
@ -185,7 +175,12 @@
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.9"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
|
@ -9,12 +9,12 @@
|
||||
"\n",
|
||||
"Many use cases require GPT-3 to respond to user questions with insightful answers. For example, a customer support chatbot may need to provide answers to common questions. The GPT models have picked up a lot of general knowledge in training, but we often need to ingest and use a large library of more specific information.\n",
|
||||
"\n",
|
||||
"In this notebook we will demonstrate a method for enabling GPT-3 able to answer questions using a library of text as a reference, by using document embeddings and retrieval. We'll be using a dataset of Wikipedia articles about the 2020 Summer Olympic Games. Please see [this notebook](examples/fine-tuned_qa/olympics-1-collect-data.ipynb) to follow the data gathering process."
|
||||
"In this notebook we will demonstrate a method for enabling GPT-3 able to answer questions using a library of text as a reference, by using document embeddings and retrieval. We'll be using a dataset of Wikipedia articles about the 2020 Summer Olympic Games. Please see [this notebook](fine-tuned_qa/olympics-1-collect-data.ipynb) to follow the data gathering process."
|
||||
]
|
||||
},
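As a compressed illustration of the retrieval approach described in the cell above, the sketch below embeds a question and a couple of document sections, ranks the sections by cosine similarity, and stuffs the best match into the prompt. The document texts, engine names, and prompt wording here are illustrative assumptions; the notebook itself builds this up step by step with token budgeting and the full Wikipedia dataset.

```python
import openai
from openai.embeddings_utils import cosine_similarity, get_embedding

# Illustrative document sections (the notebook uses the 2020 Olympics Wikipedia dataset)
documents = {
    "men's high jump": "Gianmarco Tamberi and Mutaz Essa Barshim shared the gold medal after both cleared 2.37 m.",
    "women's long jump": "The women's long jump event took place on 1 and 3 August 2021 at the Japan National Stadium.",
}
question = "Who won the 2020 Summer Olympics men's high jump?"

# Embed the documents and the query, then pick the section most similar to the question
doc_embeddings = {k: get_embedding(v, engine="text-search-curie-doc-001") for k, v in documents.items()}
query_embedding = get_embedding(question, engine="text-search-curie-query-001")
best_section = max(documents, key=lambda k: cosine_similarity(doc_embeddings[k], query_embedding))

# Answer using only the retrieved context, falling back to "I don't know"
prompt = (
    "Answer the question as truthfully as possible using the provided context, "
    "and if the answer is not contained within it, say \"I don't know.\"\n\n"
    f"Context:\n{documents[best_section]}\n\nQ: {question}\nA:"
)
completion = openai.Completion.create(prompt=prompt, model="text-davinci-002", temperature=0, max_tokens=50)
print(completion["choices"][0]["text"].strip())
```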
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 1,
|
||||
"id": "9e3839a6-9146-4f60-b74b-19abbc24278d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -38,7 +38,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 2,
|
||||
"id": "a167516c-7c19-4bda-afa5-031aa0ae13bb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -48,7 +48,7 @@
|
||||
"\"The 2020 Summer Olympics men's high jump was won by Mariusz Przybylski of Poland.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -83,7 +83,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 3,
|
||||
"id": "a5451371-17fe-4ef3-aa02-affcf4edb0e0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -93,7 +93,7 @@
|
||||
"\"Sorry, I don't know.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -125,7 +125,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 4,
|
||||
"id": "fceaf665-2602-4788-bc44-9eb256a6f955",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -135,7 +135,7 @@
|
||||
"\"Gianmarco Tamberi and Mutaz Essa Barshim won the 2020 Summer Olympics men's high jump.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -200,7 +200,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 5,
|
||||
"id": "cc9c8d69-e234-48b4-87e3-935970e1523a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -245,33 +245,34 @@
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>Volleyball at the 2020 Summer Olympics – Women's tournament</th>\n",
|
||||
" <th>Format</th>\n",
|
||||
" <td>The preliminary round was a competition betwee...</td>\n",
|
||||
" <td>132</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>Rugby sevens at the 2020 Summer Olympics – Women's qualification</th>\n",
|
||||
" <th>South America</th>\n",
|
||||
" <td>Sudamérica Rugby held a tournament on 1–2 June...</td>\n",
|
||||
" <th>Nordic combined at the 2016 Winter Youth Olympics</th>\n",
|
||||
" <th>Summary</th>\n",
|
||||
" <td>Nordic combined at the 2016 Winter Youth Olymp...</td>\n",
|
||||
" <td>56</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th rowspan=\"2\" valign=\"top\">Canoeing at the 2020 Summer Olympics – Men's slalom C-1</th>\n",
|
||||
" <th>Competition format</th>\n",
|
||||
" <td>Slalom canoeing uses a three-round format, wit...</td>\n",
|
||||
" <td>136</td>\n",
|
||||
" <th>Morocco at the 2020 Summer Olympics</th>\n",
|
||||
" <th>Judo</th>\n",
|
||||
" <td>Morocco qualified two female judoka for each o...</td>\n",
|
||||
" <td>106</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>Qualification</th>\n",
|
||||
" <td>A National Olympic Committee (NOC) entered onl...</td>\n",
|
||||
" <td>171</td>\n",
|
||||
" <th>Guinea-Bissau at the 2020 Summer Olympics</th>\n",
|
||||
" <th>Wrestling</th>\n",
|
||||
" <td>Guinea-Bissau qualified two wrestlers for each...</td>\n",
|
||||
" <td>69</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>Tunisia at the 2020 Summer Olympics</th>\n",
|
||||
" <th>Athletics</th>\n",
|
||||
" <td>Tunisian athletes further achieved the entry s...</td>\n",
|
||||
" <td>48</td>\n",
|
||||
" <th>Rome bid for the 2020 Summer Olympics</th>\n",
|
||||
" <th>History</th>\n",
|
||||
" <td>The Italian National Olympic Committee (CONI) ...</td>\n",
|
||||
" <td>738</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>Italy at the 2020 Summer Olympics</th>\n",
|
||||
" <th>Slalom</th>\n",
|
||||
" <td>Italian canoeists qualified one boat for each ...</td>\n",
|
||||
" <td>76</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
@ -280,22 +281,22 @@
|
||||
"text/plain": [
|
||||
" content \\\n",
|
||||
"title heading \n",
|
||||
"Volleyball at the 2020 Summer Olympics – Women'... Format The preliminary round was a competition betwee... \n",
|
||||
"Rugby sevens at the 2020 Summer Olympics – Wome... South America Sudamérica Rugby held a tournament on 1–2 June... \n",
|
||||
"Canoeing at the 2020 Summer Olympics – Men's sl... Competition format Slalom canoeing uses a three-round format, wit... \n",
|
||||
" Qualification A National Olympic Committee (NOC) entered onl... \n",
|
||||
"Tunisia at the 2020 Summer Olympics Athletics Tunisian athletes further achieved the entry s... \n",
|
||||
"Nordic combined at the 2016 Winter Youth Olympics Summary Nordic combined at the 2016 Winter Youth Olymp... \n",
|
||||
"Morocco at the 2020 Summer Olympics Judo Morocco qualified two female judoka for each o... \n",
|
||||
"Guinea-Bissau at the 2020 Summer Olympics Wrestling Guinea-Bissau qualified two wrestlers for each... \n",
|
||||
"Rome bid for the 2020 Summer Olympics History The Italian National Olympic Committee (CONI) ... \n",
|
||||
"Italy at the 2020 Summer Olympics Slalom Italian canoeists qualified one boat for each ... \n",
|
||||
"\n",
|
||||
" tokens \n",
|
||||
"title heading \n",
|
||||
"Volleyball at the 2020 Summer Olympics – Women'... Format 132 \n",
|
||||
"Rugby sevens at the 2020 Summer Olympics – Wome... South America 56 \n",
|
||||
"Canoeing at the 2020 Summer Olympics – Men's sl... Competition format 136 \n",
|
||||
" Qualification 171 \n",
|
||||
"Tunisia at the 2020 Summer Olympics Athletics 48 "
|
||||
"Nordic combined at the 2016 Winter Youth Olympics Summary 56 \n",
|
||||
"Morocco at the 2020 Summer Olympics Judo 106 \n",
|
||||
"Guinea-Bissau at the 2020 Summer Olympics Wrestling 69 \n",
|
||||
"Rome bid for the 2020 Summer Olympics History 738 \n",
|
||||
"Italy at the 2020 Summer Olympics Slalom 76 "
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -324,7 +325,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 6,
|
||||
"id": "4b874907-5109-4eef-ad9a-add4367925a3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -337,7 +338,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 7,
|
||||
"id": "ba475f30-ef7f-431c-b60d-d5970b62ad09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -368,7 +369,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 8,
|
||||
"id": "737266aa-cbe7-4691-87c1-fce8a31632f1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -398,7 +399,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 9,
|
||||
"id": "ab50bfca-cb02-41c6-b338-4400abe1d86e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -412,7 +413,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 10,
|
||||
"id": "b9a8c713-c8a9-47dc-85a4-871ee1395566",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -446,7 +447,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 11,
|
||||
"id": "dcd680e9-f194-4180-b14f-fc357498eb92",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -476,7 +477,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 12,
|
||||
"id": "e3a27d73-f47f-480d-b336-079414f749cb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -495,7 +496,7 @@
|
||||
" (\"Athletics at the 2020 Summer Olympics – Women's long jump\", 'Summary'))]"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -506,7 +507,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": 13,
|
||||
"id": "729c2ce7-8540-4ab2-bb3a-76c4dfcb689c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -525,7 +526,7 @@
|
||||
" (\"Athletics at the 2020 Summer Olympics – Women's pole vault\", 'Summary'))]"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -554,7 +555,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 14,
|
||||
"id": "b763ace2-1946-48e0-8ff1-91ba335d47a0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -564,7 +565,7 @@
|
||||
"'Context separator contains 3 tokens'"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -581,7 +582,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"execution_count": 15,
|
||||
"id": "0c5c0509-eeb9-4552-a5d4-6ace04ef73dd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -618,7 +619,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"execution_count": 16,
|
||||
"id": "f614045a-3917-4b28-9643-7e0c299ec1a7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -670,7 +671,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"execution_count": 17,
|
||||
"id": "b0edfec7-9243-4573-92e0-253d31c771ad",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -685,7 +686,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"execution_count": 18,
|
||||
"id": "9c1c9a69-848e-4099-a90d-c8da36c153d5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -715,7 +716,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"execution_count": 19,
|
||||
"id": "c233e449-bf33-4c9e-b095-6a4dd278c8fd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -735,7 +736,7 @@
|
||||
"'Gianmarco Tamberi and Mutaz Essa Barshim emerged as joint winners of the event following a tie between both of them as they cleared 2.37m.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 24,
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -760,7 +761,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"execution_count": 20,
|
||||
"id": "1127867b-2884-44bb-9439-0e8ae171c835",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -785,7 +786,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"execution_count": 21,
|
||||
"id": "720d9e0b-b189-4101-91ee-babf736199e6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -810,7 +811,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"execution_count": 22,
|
||||
"id": "4e8e51cc-e4eb-4557-9e09-2929d4df5b7f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -837,7 +838,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"execution_count": 23,
|
||||
"id": "37c83519-e3c6-4c44-8b4a-98cbb3a5f5ba",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -870,7 +871,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"execution_count": 24,
|
||||
"id": "26a1a9ef-e1ee-4f80-a1b1-6164ccfa5bac",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -897,7 +898,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"execution_count": 25,
|
||||
"id": "9fba8a63-eb81-4661-ae17-59bb5e2933d6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -928,7 +929,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"execution_count": 26,
|
||||
"id": "2d4c693b-cdb9-4f4c-bd1b-f77b29097a1f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -961,7 +962,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "Python 3.9.9 ('openai')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@ -976,6 +977,11 @@
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.9"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@ -13,14 +13,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Babbage similarity embedding performance on 1k Amazon reviews: mse=0.38, mae=0.39\n"
|
||||
"Babbage similarity embedding performance on 1k Amazon reviews: mse=0.39, mae=0.38\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@ -32,8 +32,9 @@
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"from sklearn.metrics import mean_squared_error, mean_absolute_error\n",
|
||||
"\n",
|
||||
"df = pd.read_csv('output/embedded_1k_reviews.csv')\n",
|
||||
"df['babbage_similarity'] = df.babbage_similarity.apply(eval).apply(np.array)\n",
|
||||
"datafile_path = \"https://cdn.openai.com/API/examples/data/fine_food_reviews_with_embeddings_1k.csv\" # for your convenience, we precomputed the embeddings\n",
|
||||
"df = pd.read_csv(datafile_path)\n",
|
||||
"df[\"babbage_similarity\"] = df.babbage_similarity.apply(eval).apply(np.array)\n",
|
||||
"\n",
|
||||
"X_train, X_test, y_train, y_test = train_test_split(list(df.babbage_similarity.values), df.Score, test_size=0.2, random_state=42)\n",
|
||||
"\n",
|
||||
@ -41,30 +42,31 @@
|
||||
"rfr.fit(X_train, y_train)\n",
|
||||
"preds = rfr.predict(X_test)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"mse = mean_squared_error(y_test, preds)\n",
|
||||
"mae = mean_absolute_error(y_test, preds)\n",
|
||||
"\n",
|
||||
"print(f\"Babbage similarity embedding performance on 1k Amazon reviews: mse={mse:.2f}, mae={mae:.2f}\")"
|
||||
"print(f\"Babbage similarity embedding performance on 1k Amazon reviews: mse={mse:.2f}, mae={mae:.2f}\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Dummy mean prediction performance on Amazon reviews: mse=1.77, mae=1.04\n"
|
||||
"Dummy mean prediction performance on Amazon reviews: mse=1.81, mae=1.08\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"bmse = mean_squared_error(y_test, np.repeat(y_test.mean(), len(y_test)))\n",
|
||||
"bmae = mean_absolute_error(y_test, np.repeat(y_test.mean(), len(y_test)))\n",
|
||||
"print(f\"Dummy mean prediction performance on Amazon reviews: mse={bmse:.2f}, mae={bmae:.2f}\")"
|
||||
"print(\n",
|
||||
" f\"Dummy mean prediction performance on Amazon reviews: mse={bmse:.2f}, mae={bmae:.2f}\"\n",
|
||||
")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -83,11 +85,9 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"interpreter": {
|
||||
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.7.3 64-bit ('base': conda)",
|
||||
"display_name": "Python 3.9.9 ('openai')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
@ -100,9 +100,14 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.3"
|
||||
"version": "3.9.9"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
|
@ -18,9 +18,9 @@
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"df = pd.read_csv('output/embedded_1k_reviews.csv')\n",
|
||||
"df['babbage_search'] = df.babbage_search.apply(eval).apply(np.array)"
|
||||
"datafile_path = \"https://cdn.openai.com/API/examples/data/fine_food_reviews_with_embeddings_1k.csv\" # for your convenience, we precomputed the embeddings\n",
|
||||
"df = pd.read_csv(datafile_path)\n",
|
||||
"df[\"babbage_search\"] = df.babbage_search.apply(eval).apply(np.array)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -39,11 +39,11 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Fantastic Instant Refried beans: Fantastic Instant Refried Beans have been a staple for my family now for nearly 20 years. All 7 of us love it and my grown kids are passing on the tradition.\n",
|
||||
"\n",
|
||||
"Jamaican Blue beans: Excellent coffee bean for roasting. Our family just purchased another 5 pounds for more roasting. Plenty of flavor and mild on acidity when roasted to a dark brown bean and befor\n",
|
||||
"\n",
|
||||
"Good Buy: I liked the beans. They were vacuum sealed, plump and moist. Would recommend them for any use. I personally split and stuck them in some vodka to make vanilla extract. Yum!\n",
|
||||
"\n",
|
||||
"Fantastic Instant Refried beans: Fantastic Instant Refried Beans have been a staple for my family now for nearly 20 years. All 7 of us love it and my grown kids are passing on the tradition.\n",
|
||||
"Delicious!: I enjoy this white beans seasoning, it gives a rich flavor to the beans I just love it, my mother in law didn't know about this Zatarain's brand and now she is traying different seasoning\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
@ -53,16 +53,26 @@
|
||||
"\n",
|
||||
"# search through the reviews for a specific product\n",
|
||||
"def search_reviews(df, product_description, n=3, pprint=True):\n",
|
||||
" embedding = get_embedding(product_description, engine='text-search-babbage-query-001')\n",
|
||||
" df['similarities'] = df.babbage_search.apply(lambda x: cosine_similarity(x, embedding))\n",
|
||||
" embedding = get_embedding(\n",
|
||||
" product_description,\n",
|
||||
" engine=\"text-search-babbage-query-001\"\n",
|
||||
" )\n",
|
||||
" df[\"similarities\"] = df.babbage_search.apply(lambda x: cosine_similarity(x, embedding))\n",
|
||||
"\n",
|
||||
" res = df.sort_values('similarities', ascending=False).head(n).combined.str.replace('Title: ','').str.replace('; Content:', ': ')\n",
|
||||
" res = (\n",
|
||||
" df.sort_values(\"similarities\", ascending=False)\n",
|
||||
" .head(n)\n",
|
||||
" .combined.str.replace(\"Title: \", \"\")\n",
|
||||
" .str.replace(\"; Content:\", \": \")\n",
|
||||
" )\n",
|
||||
" if pprint:\n",
|
||||
" for r in res:\n",
|
||||
" print(r[:200])\n",
|
||||
" print()\n",
|
||||
" return res\n",
|
||||
"res = search_reviews(df, 'delicious beans', n=3)\n"
|
||||
"\n",
|
||||
"\n",
|
||||
"res = search_reviews(df, \"delicious beans\", n=3)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -74,17 +84,17 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Rustichella ROCKS!: Anything this company makes is worthwhile eating! My favorite is their Trenne.<br />Their whole wheat pasta is the best I have ever had.\n",
|
||||
"\n",
|
||||
"sooo good: tastes so good. Worth the money. My boyfriend hates wheat pasta and LOVES this. cooks fast tastes great.I love this brand and started buying more of their pastas. Bulk is best.\n",
|
||||
"\n",
|
||||
"Wonderful: Came quickly. Was plentiful and delicious and cheaper than in the store. You will enjoy it if you like thick pasta.\n",
|
||||
"Tasty and Quick Pasta: Barilla Whole Grain Fusilli with Vegetable Marinara is tasty and has an excellent chunky vegetable marinara. I just wish there was more of it. If you aren't starving or on a \n",
|
||||
"\n",
|
||||
"Rustichella ROCKS!: Anything this company makes is worthwhile eating! My favorite is their Trenne.<br />Their whole wheat pasta is the best I have ever had.\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"res = search_reviews(df, 'whole wheat pasta', n=3)"
|
||||
"res = search_reviews(df, \"whole wheat pasta\", n=3)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -109,7 +119,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"res = search_reviews(df, 'bad delivery', n=1)"
|
||||
"res = search_reviews(df, \"bad delivery\", n=1)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -134,7 +144,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"res = search_reviews(df, 'spoilt', n=1)"
|
||||
"res = search_reviews(df, \"spoilt\", n=1)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -148,22 +158,20 @@
|
||||
"text": [
|
||||
"Good food: The only dry food my queen cat will eat. Helps prevent hair balls. Good packaging. Arrives promptly. Recommended by a friend who sells pet food.\n",
|
||||
"\n",
|
||||
"A great deal on Greenies: Paid only $22 with free shipping for 96 teenies compared to about $35 at the pet store. How can you go wrong with a deal like that? The dog begs for his daily Greenie. Got \n",
|
||||
"Good product: I like that this is a better product for my pets but really for the price of it I couldn't afford to buy this all the time. My cat isn't very picky usually and she ate this, we usually \n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"res = search_reviews(df, 'pet food', n=2)"
|
||||
"res = search_reviews(df, \"pet food\", n=2)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"interpreter": {
|
||||
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.7.3 64-bit ('base': conda)",
|
||||
"display_name": "Python 3.9.9 ('openai')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
@ -176,9 +184,14 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.3"
|
||||
"version": "3.9.9"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
|
@ -39,7 +39,7 @@
|
||||
"import numpy as np\n",
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"\n",
|
||||
"df = pd.read_csv('output/embedded_babbage_similarity_50k.csv', index_col=0)\n",
|
||||
"df = pd.read_csv('output/embedded_babbage_similarity_50k.csv', index_col=0) # note that you will need to generate this file to run the code below\n",
|
||||
"df['babbage_similarity'] = df.babbage_similarity.apply(eval).apply(np.array)\n",
|
||||
"X_train, X_test, y_train, y_test = train_test_split(df, df.Score, test_size = 0.2, random_state=42)\n",
|
||||
"\n",
|
||||
|
196
examples/azure/embeddings.ipynb
Normal file
@ -0,0 +1,196 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure embeddings example\n",
|
||||
"In this example we'll try to go over all operations for embeddings that can be done using the Azure endpoints. \\\n",
|
||||
"This example focuses on finetuning but touches on the majority of operations that are also available using the API. This example is meant to be a quick way of showing simple operations and is not meant as a tutorial."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import openai\n",
|
||||
"from openai import cli"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"In the following section the endpoint and key need to be set up of the next sections to work. \\\n",
|
||||
"Please go to https://portal.azure.com, find your resource and then under \"Resource Management\" -> \"Keys and Endpoints\" look for the \"Endpoint\" value and one of the Keys. They will act as api_base and api_key in the code below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"openai.api_key = '' # Please add your api key here\n",
|
||||
"openai.api_base = '' # Please add your endpoint here\n",
|
||||
"\n",
|
||||
"openai.api_type = 'azure'\n",
|
||||
"openai.api_version = '2022-03-01-preview' # this may change in the future"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deployments\n",
|
||||
"In this section we are going to create a deployment using the finetune model that we just adapted and then used the deployment to create a simple completion operation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deployments: Create Manually\n",
|
||||
"Let's create a deployment using the text-similarity-curie-001 engine. You can create a new deployment by going to your Resource in your portal under \"Resource Management\" -> \"Deployments\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### (Optional) Deployments: Create Programatically\n",
|
||||
"We can also create a deployment using code:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model = \"text-similarity-curie-001\"\n",
|
||||
"\n",
|
||||
"# Now let's create the deployment\n",
|
||||
"print(f'Creating a new deployment with model: {model}')\n",
|
||||
"result = openai.Deployment.create(model=model, scale_settings={\"scale_type\":\"standard\"})\n",
|
||||
"deployment_id = result[\"id\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### (Optional) Deployments: Retrieving\n",
|
||||
"Now let's check the status of the newly created deployment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f'Checking for deployment status.')\n",
|
||||
"resp = openai.Deployment.retrieve(id=deployment_id)\n",
|
||||
"status = resp[\"status\"]\n",
|
||||
"print(f'Deployment {deployment_id} is with status: {status}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deployments: Listing\n",
|
||||
"Now because creating a new deployment takes a long time, let's look in the subscription for an already finished deployment that succeeded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print('While deployment running, selecting a completed one.')\n",
|
||||
"deployment_id = None\n",
|
||||
"result = openai.Deployment.list()\n",
|
||||
"for deployment in result.data:\n",
|
||||
" if deployment[\"status\"] == \"succeeded\":\n",
|
||||
" deployment_id = deployment[\"id\"]\n",
|
||||
" break\n",
|
||||
"\n",
|
||||
"if not deployment_id:\n",
|
||||
" print('No deployment with status: succeeded found.')\n",
|
||||
"else:\n",
|
||||
" print(f'Found a successful deployment with id: {deployment_id}.')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Embeddings\n",
|
||||
"Now let's send a sample embedding to the deployment."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = openai.Embedding.create(deployment_id=deployment_id,\n",
|
||||
" input=\"The food was delicious and the waiter...\")\n",
|
||||
" \n",
|
||||
"print(embeddings)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### (Optional) Deployments: Delete\n",
|
||||
"Finally let's delete the deployment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f'Deleting deployment: {deployment_id}')\n",
|
||||
"openai.Deployment.delete(sid=deployment_id)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.9.9 ('openai')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.9"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
475
examples/azure/finetuning.ipynb
Normal file
@ -0,0 +1,475 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure Fine tuning example\n",
|
||||
"In this example we'll try to go over all operations that can be done using the Azure endpoints and their differences with the openAi endpoints (if any).<br>\n",
|
||||
"This example focuses on finetuning but touches on the majority of operations that are also available using the API. This example is meant to be a quick way of showing simple operations and is not meant as a finetune model adaptation tutorial.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import openai\n",
|
||||
"from openai import cli"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"In the following section the endpoint and key need to be set up of the next sections to work.<br> Please go to https://portal.azure.com, find your resource and then under \"Resource Management\" -> \"Keys and Endpoints\" look for the \"Endpoint\" value and one of the Keys. They will act as api_base and api_key in the code below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"openai.api_key = '' # Please add your api key here\n",
|
||||
"openai.api_base = '' # Please add your endpoint here\n",
|
||||
"\n",
|
||||
"openai.api_type = 'azure'\n",
|
||||
"openai.api_version = '2022-03-01-preview' # this may change in the future"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Microsoft Active Directory Authentication\n",
|
||||
"Instead of key based authentication, you can use Active Directory to authenticate using credential tokens. Uncomment the next code section to use credential based authentication:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\"\"\"\n",
|
||||
"from azure.identity import DefaultAzureCredential\n",
|
||||
"\n",
|
||||
"default_credential = DefaultAzureCredential()\n",
|
||||
"token = default_credential.get_token(\"https://cognitiveservices.azure.com\")\n",
|
||||
"\n",
|
||||
"openai.api_type = 'azure_ad'\n",
|
||||
"openai.api_key = token.token\n",
|
||||
"openai.api_version = '2022-03-01-preview' # this may change in the future\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"openai.api_base = '' # Please add your endpoint here\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Files\n",
|
||||
"In the next section we will focus on the files operations: importing, listing, retrieving, deleting. For this we need to create 2 temporary files with some sample data. For the sake of simplicity, we will use the same data for training and validation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import shutil\n",
|
||||
"import json\n",
|
||||
"\n",
|
||||
"training_file_name = 'training.jsonl'\n",
|
||||
"validation_file_name = 'validation.jsonl'\n",
|
||||
"\n",
|
||||
"sample_data = [{\"prompt\": \"When I go to the store, I want an\", \"completion\": \"apple\"},\n",
|
||||
" {\"prompt\": \"When I go to work, I want a\", \"completion\": \"coffe\"},\n",
|
||||
" {\"prompt\": \"When I go home, I want a\", \"completion\": \"soda\"}]\n",
|
||||
"\n",
|
||||
"print(f'Generating the training file: {training_file_name}')\n",
|
||||
"with open(training_file_name, 'w') as training_file:\n",
|
||||
" for entry in sample_data:\n",
|
||||
" json.dump(entry, training_file)\n",
|
||||
" training_file.write('\\n')\n",
|
||||
"\n",
|
||||
"print(f'Copying the training file to the validation file')\n",
|
||||
"shutil.copy(training_file_name, validation_file_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Files: Listing\n",
|
||||
"List all of the uploaded files and check for the ones that are named \"training.jsonl\" or \"validation.jsonl\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print('Checking for existing uploaded files.')\n",
|
||||
"results = []\n",
|
||||
"files = openai.File.list().data\n",
|
||||
"print(f'Found {len(files)} total uploaded files in the subscription.')\n",
|
||||
"for item in files:\n",
|
||||
" if item[\"filename\"] in [training_file_name, validation_file_name]:\n",
|
||||
" results.append(item[\"id\"])\n",
|
||||
"print(f'Found {len(results)} already uploaded files that match our names.')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Files: Deleting\n",
|
||||
"Let's now delete those found files (if any) since we're going to be re-uploading them next."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f'Deleting already uploaded files.')\n",
|
||||
"for id in results:\n",
|
||||
" openai.File.delete(sid = id)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Files: Importing & Retrieving\n",
|
||||
"Now, let's import our two files ('training.jsonl' and 'validation.jsonl') and keep those IDs since we're going to use them later for finetuning.<br>\n",
|
||||
"For this operation we are going to use the cli wrapper which does a bit more checks before uploading and also gives us progress. In addition, after uploading we're going to check the status our import until it has succeeded (or failed if something goes wrong)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import time\n",
|
||||
"\n",
|
||||
"def check_status(training_id, validation_id):\n",
|
||||
" train_status = openai.File.retrieve(training_id)[\"status\"]\n",
|
||||
" valid_status = openai.File.retrieve(validation_id)[\"status\"]\n",
|
||||
" print(f'Status (training_file | validation_file): {train_status} | {valid_status}')\n",
|
||||
" return (train_status, valid_status)\n",
|
||||
"\n",
|
||||
"#importing our two files\n",
|
||||
"training_id = cli.FineTune._get_or_upload(training_file_name, True)\n",
|
||||
"validation_id = cli.FineTune._get_or_upload(validation_file_name, True)\n",
|
||||
"\n",
|
||||
"#checking the status of the imports\n",
|
||||
"(train_status, valid_status) = check_status(training_id, validation_id)\n",
|
||||
"\n",
|
||||
"while train_status not in [\"succeeded\", \"failed\"] or valid_status not in [\"succeeded\", \"failed\"]:\n",
|
||||
" time.sleep(1)\n",
|
||||
" (train_status, valid_status) = check_status(training_id, validation_id)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Files: Downloading\n",
|
||||
"Now let's download one of the files, the training file for example, to check that everything was in order during importing and all bits are there."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f'Downloading training file: {training_id}')\n",
|
||||
"result = openai.File.download(training_id)\n",
|
||||
"print(result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Finetune\n",
|
||||
"In this section we are going to use the two training and validation files that we imported in the previous section, to train a finetune model."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Finetune: Adapt\n",
|
||||
"First let's create the finetune adaptation job."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"create_args = {\n",
|
||||
" \"training_file\": training_id,\n",
|
||||
" \"validation_file\": validation_id,\n",
|
||||
" \"model\": \"curie\",\n",
|
||||
" \"compute_classification_metrics\": True,\n",
|
||||
" \"classification_n_classes\": 3\n",
|
||||
"}\n",
|
||||
"resp = openai.FineTune.create(**create_args)\n",
|
||||
"job_id = resp[\"id\"]\n",
|
||||
"status = resp[\"status\"]\n",
|
||||
"\n",
|
||||
"print(f'Fine-tunning model with jobID: {job_id}.')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Finetune: Streaming\n",
|
||||
"While the job runs, we can subscribe to the streaming events to check the progress of the operation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import signal\n",
|
||||
"import datetime\n",
|
||||
"\n",
|
||||
"def signal_handler(sig, frame):\n",
|
||||
" status = openai.FineTune.retrieve(job_id).status\n",
|
||||
" print(f\"Stream interrupted. Job is still {status}.\")\n",
|
||||
" return\n",
|
||||
"\n",
|
||||
"print('Streaming events for the fine-tuning job: {job_id}')\n",
|
||||
"signal.signal(signal.SIGINT, signal_handler)\n",
|
||||
"\n",
|
||||
"events = openai.FineTune.stream_events(job_id)\n",
|
||||
"try:\n",
|
||||
" for event in events:\n",
|
||||
" print(f'{datetime.datetime.fromtimestamp(event[\"created_at\"])} {event[\"message\"]}')\n",
|
||||
"\n",
|
||||
"except Exception:\n",
|
||||
" print(\"Stream interrupted (client disconnected).\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Finetune: Listing and Retrieving\n",
|
||||
"Now let's check that our operation was successful and in addition we can look at all of the finetuning operations using a list operation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"status = openai.FineTune.retrieve(id=job_id)[\"status\"]\n",
|
||||
"if status not in [\"succeeded\", \"failed\"]:\n",
|
||||
" print(f'Job not in terminal status: {status}. Waiting.')\n",
|
||||
" while status not in [\"succeeded\", \"failed\"]:\n",
|
||||
" time.sleep(2)\n",
|
||||
" status = openai.FineTune.retrieve(id=job_id)[\"status\"]\n",
|
||||
" print(f'Status: {status}')\n",
|
||||
"else:\n",
|
||||
" print(f'Finetune job {job_id} finished with status: {status}')\n",
|
||||
"\n",
|
||||
"print('Checking other finetune jobs in the subscription.')\n",
|
||||
"result = openai.FineTune.list()\n",
|
||||
"print(f'Found {len(result)} finetune jobs.')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Finetune: Deleting\n",
|
||||
"Finally we can delete our finetune job.<br>\n",
|
||||
"WARNING: Please skip this step if you want to continue with the next section as the finetune model is needed. (The delete code is commented out by default)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# openai.FineTune.delete(sid=job_id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deployments\n",
|
||||
"In this section we are going to create a deployment using the finetune model that we just adapted and then used the deployment to create a simple completion operation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deployments: Create\n",
|
||||
"Let's create a deployment using the fine-tune model."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Fist let's get the model of the previous job:\n",
|
||||
"result = openai.FineTune.retrieve(id=job_id)\n",
|
||||
"if result[\"status\"] == 'succeeded':\n",
|
||||
" model = result[\"fine_tuned_model\"]\n",
|
||||
"\n",
|
||||
"# Now let's create the deployment\n",
|
||||
"print(f'Creating a new deployment with model: {model}')\n",
|
||||
"result = openai.Deployment.create(model=model, scale_settings={\"scale_type\":\"standard\"})\n",
|
||||
"deployment_id = result[\"id\"]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deployments: Retrieving\n",
|
||||
"Now let's check the status of the newly created deployment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f'Checking for deployment status.')\n",
|
||||
"resp = openai.Deployment.retrieve(id=deployment_id)\n",
|
||||
"status = resp[\"status\"]\n",
|
||||
"print(f'Deployment {deployment_id} is with status: {status}')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deployments: Listing\n",
|
||||
"Now because creating a new deployment takes a long time, let's look in the subscription for an already finished deployment that succeeded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print('While deployment running, selecting a completed one.')\n",
|
||||
"deployment_id = None\n",
|
||||
"result = openai.Deployment.list()\n",
|
||||
"for deployment in result.data:\n",
|
||||
" if deployment[\"status\"] == \"succeeded\":\n",
|
||||
" deployment_id = deployment[\"id\"]\n",
|
||||
" break\n",
|
||||
"\n",
|
||||
"if not deployment_id:\n",
|
||||
" print('No deployment with status: succeeded found.')\n",
|
||||
"else:\n",
|
||||
" print(f'Found a successful deployment with id: {deployment_id}.')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Completions\n",
|
||||
"Now let's send a sample completion to the deployment."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print('Sending a test completion job')\n",
|
||||
"start_phrase = 'When I go to the store, I want a'\n",
|
||||
"response = openai.Completion.create(deployment_id=deployment_id, prompt=start_phrase, max_tokens=4)\n",
|
||||
"text = response['choices'][0]['text'].replace('\\n', '').replace(' .', '.').strip()\n",
|
||||
"print(f'\"{start_phrase} {text}\"')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deployments: Delete\n",
|
||||
"Finally let's delete the deployment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f'Deleting deployment: {deployment_id}')\n",
|
||||
"openai.Deployment.delete(sid=deployment_id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Thank you"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.9.9 64-bit ('3.9.9')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.9"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "cb9817b186a29e4e9713184d901f26c1ee05ad25243d878baff7f31bb1fef480"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
1001
examples/data/fine_food_reviews_1k.csv
Normal file
2001
examples/data/snli_1.0_train_2k.csv
Normal file
@ -1,150 +0,0 @@
|
||||
import argparse
|
||||
|
||||
import openai
|
||||
|
||||
|
||||
def create_context(
|
||||
question, search_file_id, max_len=1800, search_model="ada", max_rerank=10
|
||||
):
|
||||
"""
|
||||
Create a context for a question by finding the most similar context from the search file.
|
||||
:param question: The question
|
||||
:param search_file_id: The file id of the search file
|
||||
:param max_len: The maximum length of the returned context (in tokens)
|
||||
:param search_model: The search model to use
|
||||
:param max_rerank: The maximum number of reranking
|
||||
:return: The context
|
||||
"""
|
||||
results = openai.Engine(search_model).search(
|
||||
search_model=search_model,
|
||||
query=question,
|
||||
max_rerank=max_rerank,
|
||||
file=search_file_id,
|
||||
return_metadata=True,
|
||||
)
|
||||
returns = []
|
||||
cur_len = 0
|
||||
for result in results["data"]:
|
||||
cur_len += int(result["metadata"]) + 4
|
||||
if cur_len > max_len:
|
||||
break
|
||||
returns.append(result["text"])
|
||||
return "\n\n###\n\n".join(returns)
|
||||
|
||||
|
||||
def answer_question(
|
||||
search_file_id="<SEARCH_FILE_ID>",
|
||||
fine_tuned_qa_model="<FT_QA_MODEL_ID>",
|
||||
question="Which country won the European Football championship in 2021?",
|
||||
max_len=1800,
|
||||
search_model="ada",
|
||||
max_rerank=10,
|
||||
debug=False,
|
||||
stop_sequence=["\n", "."],
|
||||
max_tokens=100,
|
||||
):
|
||||
"""
|
||||
Answer a question based on the most similar context from the search file, using your fine-tuned model.
|
||||
:param question: The question
|
||||
:param fine_tuned_qa_model: The fine tuned QA model
|
||||
:param search_file_id: The file id of the search file
|
||||
:param max_len: The maximum length of the returned context (in tokens)
|
||||
:param search_model: The search model to use
|
||||
:param max_rerank: The maximum number of reranking
|
||||
:param debug: Whether to output debug information
|
||||
:param stop_sequence: The stop sequence for Q&A model
|
||||
:param max_tokens: The maximum number of tokens to return
|
||||
:return: The answer
|
||||
"""
|
||||
context = create_context(
|
||||
question,
|
||||
search_file_id,
|
||||
max_len=max_len,
|
||||
search_model=search_model,
|
||||
max_rerank=max_rerank,
|
||||
)
|
||||
if debug:
|
||||
print("Context:\n" + context)
|
||||
print("\n\n")
|
||||
try:
|
||||
# fine-tuned models requires model parameter, whereas other models require engine parameter
|
||||
model_param = (
|
||||
{"model": fine_tuned_qa_model}
|
||||
if ":" in fine_tuned_qa_model
|
||||
and fine_tuned_qa_model.split(":")[1].startswith("ft")
|
||||
else {"engine": fine_tuned_qa_model}
|
||||
)
|
||||
response = openai.Completion.create(
|
||||
prompt=f"Answer the question based on the context below\n\nText: {context}\n\n---\n\nQuestion: {question}\nAnswer:",
|
||||
temperature=0,
|
||||
max_tokens=max_tokens,
|
||||
top_p=1,
|
||||
frequency_penalty=0,
|
||||
presence_penalty=0,
|
||||
stop=stop_sequence,
|
||||
**model_param,
|
||||
)
|
||||
return response["choices"][0]["text"]
|
||||
except Exception as e:
|
||||
print(e)
|
||||
return ""
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Rudimentary functionality of the answers endpoint with a fine-tuned Q&A model.",
|
||||
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
|
||||
)
|
||||
parser.add_argument(
|
||||
"--search_file_id", help="Search file id", required=True, type=str
|
||||
)
|
||||
parser.add_argument(
|
||||
"--fine_tuned_qa_model", help="Fine-tuned QA model id", required=True, type=str
|
||||
)
|
||||
parser.add_argument(
|
||||
"--question", help="Question to answer", required=True, type=str
|
||||
)
|
||||
parser.add_argument(
|
||||
"--max_len",
|
||||
help="Maximum length of the returned context (in tokens)",
|
||||
default=1800,
|
||||
type=int,
|
||||
)
|
||||
parser.add_argument(
|
||||
"--search_model", help="Search model to use", default="ada", type=str
|
||||
)
|
||||
parser.add_argument(
|
||||
"--max_rerank",
|
||||
help="Maximum number of reranking for the search",
|
||||
default=10,
|
||||
type=int,
|
||||
)
|
||||
parser.add_argument(
|
||||
"--debug", help="Print debug information (context used)", action="store_true"
|
||||
)
|
||||
parser.add_argument(
|
||||
"--stop_sequence",
|
||||
help="Stop sequences for the Q&A model",
|
||||
default=["\n", "."],
|
||||
nargs="+",
|
||||
type=str,
|
||||
)
|
||||
parser.add_argument(
|
||||
"--max_tokens",
|
||||
help="Maximum number of tokens to return",
|
||||
default=100,
|
||||
type=int,
|
||||
)
|
||||
args = parser.parse_args()
|
||||
response = answer_question(
|
||||
search_file_id=args.search_file_id,
|
||||
fine_tuned_qa_model=args.fine_tuned_qa_model,
|
||||
question=args.question,
|
||||
max_len=args.max_len,
|
||||
search_model=args.search_model,
|
||||
max_rerank=args.max_rerank,
|
||||
debug=args.debug,
|
||||
stop_sequence=args.stop_sequence,
|
||||
max_tokens=args.max_tokens,
|
||||
)
|
||||
print(f"Answer:{response}")
|
@ -1,637 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 3. Train a fine-tuning model specialized for Q&A\n",
|
||||
"This notebook will utilize the dataset of context, question and answer pairs to additionally create adversarial questions and context pairs, where the question was not generated on that context. In those cases the model will be prompted to answer \"No sufficient context for answering the question\". We will also train a discriminator model, which predicts whether the question can be answered based on the context or not.\n",
|
||||
"\n",
|
||||
"We will add hard adversarial examples as well, which will be based either on semantically similar sections, or neighbouring sections, originating from the same article."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>title</th>\n",
|
||||
" <th>heading</th>\n",
|
||||
" <th>content</th>\n",
|
||||
" <th>tokens</th>\n",
|
||||
" <th>context</th>\n",
|
||||
" <th>questions</th>\n",
|
||||
" <th>answers</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>2020 Summer Olympics</td>\n",
|
||||
" <td>Summary</td>\n",
|
||||
" <td>The 2020 Summer Olympics (Japanese: 2020年夏季オリン...</td>\n",
|
||||
" <td>713</td>\n",
|
||||
" <td>2020 Summer Olympics\\nSummary\\n\\nThe 2020 Summ...</td>\n",
|
||||
" <td>1. What is the 2020 Summer Olympics?\\n2. When ...</td>\n",
|
||||
" <td>1. The 2020 Summer Olympics is an internationa...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>2020 Summer Olympics</td>\n",
|
||||
" <td>Host city selection</td>\n",
|
||||
" <td>The International Olympic Committee (IOC) vote...</td>\n",
|
||||
" <td>126</td>\n",
|
||||
" <td>2020 Summer Olympics\\nHost city selection\\n\\nT...</td>\n",
|
||||
" <td>1. \\n2. \\n3. \\n4.</td>\n",
|
||||
" <td>1. What is the International Olympic Committee...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>2020 Summer Olympics</td>\n",
|
||||
" <td>Impact of the COVID-19 pandemic</td>\n",
|
||||
" <td>In January 2020, concerns were raised about th...</td>\n",
|
||||
" <td>369</td>\n",
|
||||
" <td>2020 Summer Olympics\\nImpact of the COVID-19 p...</td>\n",
|
||||
" <td>1. What was the COVID-19 pandemic?\\n2. How did...</td>\n",
|
||||
" <td>1. The COVID-19 pandemic was a pandemic that o...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>2020 Summer Olympics</td>\n",
|
||||
" <td>Qualifying event cancellation and postponement</td>\n",
|
||||
" <td>Concerns about the pandemic began to affect qu...</td>\n",
|
||||
" <td>298</td>\n",
|
||||
" <td>2020 Summer Olympics\\nQualifying event cancell...</td>\n",
|
||||
" <td>1. What was the original location of the Asia ...</td>\n",
|
||||
" <td>1. The original location of the Asia & Oceania...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>2020 Summer Olympics</td>\n",
|
||||
" <td>Effect on doping tests</td>\n",
|
||||
" <td>Mandatory doping tests were being severely res...</td>\n",
|
||||
" <td>163</td>\n",
|
||||
" <td>2020 Summer Olympics\\nEffect on doping tests\\n...</td>\n",
|
||||
" <td>1. What was the COVID-19 pandemic?\\n2. What di...</td>\n",
|
||||
" <td>1. The COVID-19 pandemic was a pandemic that o...</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" title heading \\\n",
|
||||
"0 2020 Summer Olympics Summary \n",
|
||||
"1 2020 Summer Olympics Host city selection \n",
|
||||
"2 2020 Summer Olympics Impact of the COVID-19 pandemic \n",
|
||||
"3 2020 Summer Olympics Qualifying event cancellation and postponement \n",
|
||||
"4 2020 Summer Olympics Effect on doping tests \n",
|
||||
"\n",
|
||||
" content tokens \\\n",
|
||||
"0 The 2020 Summer Olympics (Japanese: 2020年夏季オリン... 713 \n",
|
||||
"1 The International Olympic Committee (IOC) vote... 126 \n",
|
||||
"2 In January 2020, concerns were raised about th... 369 \n",
|
||||
"3 Concerns about the pandemic began to affect qu... 298 \n",
|
||||
"4 Mandatory doping tests were being severely res... 163 \n",
|
||||
"\n",
|
||||
" context \\\n",
|
||||
"0 2020 Summer Olympics\\nSummary\\n\\nThe 2020 Summ... \n",
|
||||
"1 2020 Summer Olympics\\nHost city selection\\n\\nT... \n",
|
||||
"2 2020 Summer Olympics\\nImpact of the COVID-19 p... \n",
|
||||
"3 2020 Summer Olympics\\nQualifying event cancell... \n",
|
||||
"4 2020 Summer Olympics\\nEffect on doping tests\\n... \n",
|
||||
"\n",
|
||||
" questions \\\n",
|
||||
"0 1. What is the 2020 Summer Olympics?\\n2. When ... \n",
|
||||
"1 1. \\n2. \\n3. \\n4. \n",
|
||||
"2 1. What was the COVID-19 pandemic?\\n2. How did... \n",
|
||||
"3 1. What was the original location of the Asia ... \n",
|
||||
"4 1. What was the COVID-19 pandemic?\\n2. What di... \n",
|
||||
"\n",
|
||||
" answers \n",
|
||||
"0 1. The 2020 Summer Olympics is an internationa... \n",
|
||||
"1 1. What is the International Olympic Committee... \n",
|
||||
"2 1. The COVID-19 pandemic was a pandemic that o... \n",
|
||||
"3 1. The original location of the Asia & Oceania... \n",
|
||||
"4 1. The COVID-19 pandemic was a pandemic that o... "
|
||||
]
|
||||
},
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import openai\n",
|
||||
"import pandas as pd\n",
|
||||
"df = pd.read_csv('olympics-data/olympics_qa.csv')\n",
|
||||
"olympics_search_fileid = \"file-c3shd8wqF3vSCKaukW4Jr1TT\"\n",
|
||||
"df.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Split the sections into a training and testing set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"(3014, 754)"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from sklearn.model_selection import train_test_split\n",
|
||||
"train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)\n",
|
||||
"len(train_df), len(test_df)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"we check that he separator we intend to use isn't present within the contexts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"0"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"df.context.str.contains('->').sum()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3.1 Create the fine-tuning datasets for Q&A and discriminator models\n",
|
||||
"The fine-tuning dataset is created in the following way. For every corresponding question, answer and context pair we create:\n",
|
||||
"- Positive example: correct question, answer, context pair\n",
|
||||
"- Negative examples:\n",
|
||||
" - random negative example, where the random context is paired with the question \n",
|
||||
" - two hard negative examples\n",
|
||||
" - one originating from the same wikipedia article\n",
|
||||
" - another, which is most similar to the correct context\n",
|
||||
"\n",
|
||||
"This process is noisy, as sometimes the question might be answerable given a different context, but on average we hope this won't affect the peformance too much.\n",
|
||||
"\n",
|
||||
"We apply the same process of dataset creation for both the discriminator, and the Q&A answering model. We apply the process separately for the training and testing set, to ensure that the examples from the traing set don't feature within the test set."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import random\n",
|
||||
"\n",
|
||||
"def get_random_similar_contexts(question, context, file_id=olympics_search_fileid, search_model='ada', max_rerank=10):\n",
|
||||
" \"\"\"\n",
|
||||
" Find similar contexts to the given context using the search file\n",
|
||||
" \"\"\"\n",
|
||||
" try:\n",
|
||||
" results = openai.Engine(search_model).search(\n",
|
||||
" search_model=search_model, \n",
|
||||
" query=question, \n",
|
||||
" max_rerank=max_rerank,\n",
|
||||
" file=file_id\n",
|
||||
" )\n",
|
||||
" candidates = []\n",
|
||||
" for result in results['data'][:3]:\n",
|
||||
" if result['text'] == context:\n",
|
||||
" continue\n",
|
||||
" candidates.append(result['text'])\n",
|
||||
" random_candidate = random.choice(candidates)\n",
|
||||
" return random_candidate\n",
|
||||
" except Exception as e:\n",
|
||||
" print(e)\n",
|
||||
" return \"\"\n",
|
||||
"\n",
|
||||
"def create_fine_tuning_dataset(df, discriminator=False, n_negative=1, add_related=False):\n",
|
||||
" \"\"\"\n",
|
||||
" Create a dataset for fine tuning the OpenAI model; either for a discriminator model, \n",
|
||||
" or a model specializing in Q&A, where it says if no relevant context is found.\n",
|
||||
"\n",
|
||||
" Parameters\n",
|
||||
" ----------\n",
|
||||
" df: pd.DataFrame\n",
|
||||
" The dataframe containing the question, answer and context pairs\n",
|
||||
" discriminator: bool\n",
|
||||
" Whether to create a dataset for the discriminator\n",
|
||||
" n_negative: int\n",
|
||||
" The number of random negative samples to add (using a random context)\n",
|
||||
" add_related: bool\n",
|
||||
" Whether to add the related contexts to the correct context. These are hard negative examples\n",
|
||||
"\n",
|
||||
" Returns\n",
|
||||
" -------\n",
|
||||
" pd.DataFrame\n",
|
||||
" The dataframe containing the prompts and completions, ready for fine-tuning\n",
|
||||
" \"\"\"\n",
|
||||
" rows = []\n",
|
||||
" for i, row in df.iterrows():\n",
|
||||
" for q, a in zip((\"1.\" + row.questions).split('\\n'), (\"1.\" + row.answers).split('\\n')):\n",
|
||||
" if len(q) >10 and len(a) >10:\n",
|
||||
" if discriminator:\n",
|
||||
" rows.append({\"prompt\":f\"{row.context}\\nQuestion: {q[2:].strip()}\\n Related:\", \"completion\":f\" yes\"})\n",
|
||||
" else:\n",
|
||||
" rows.append({\"prompt\":f\"{row.context}\\nQuestion: {q[2:].strip()}\\nAnswer:\", \"completion\":f\" {a[2:].strip()}\"})\n",
|
||||
"\n",
|
||||
" for i, row in df.iterrows():\n",
|
||||
" for q in (\"1.\" + row.questions).split('\\n'):\n",
|
||||
" if len(q) >10:\n",
|
||||
" for j in range(n_negative + (2 if add_related else 0)):\n",
|
||||
" random_context = \"\"\n",
|
||||
" if j == 0 and add_related:\n",
|
||||
" # add the related contexts based on originating from the same wikipedia page\n",
|
||||
" subset = df[(df.title == row.title) & (df.context != row.context)]\n",
|
||||
" \n",
|
||||
" if len(subset) < 1:\n",
|
||||
" continue\n",
|
||||
" random_context = subset.sample(1).iloc[0].context\n",
|
||||
" if j == 1 and add_related:\n",
|
||||
" # add the related contexts based on the most similar contexts according to the search\n",
|
||||
" random_context = get_random_similar_contexts(q[2:].strip(), row.context, search_model='ada', max_rerank=10)\n",
|
||||
" else:\n",
|
||||
" while True:\n",
|
||||
" # add random context, which isn't the correct context\n",
|
||||
" random_context = df.sample(1).iloc[0].context\n",
|
||||
" if random_context != row.context:\n",
|
||||
" break\n",
|
||||
" if discriminator:\n",
|
||||
" rows.append({\"prompt\":f\"{random_context}\\nQuestion: {q[2:].strip()}\\n Related:\", \"completion\":f\" no\"})\n",
|
||||
" else:\n",
|
||||
" rows.append({\"prompt\":f\"{random_context}\\nQuestion: {q[2:].strip()}\\nAnswer:\", \"completion\":f\" No appropriate context found to answer the question.\"})\n",
|
||||
"\n",
|
||||
" return pd.DataFrame(rows) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We apply the same process of dataset creation for both the discriminator, and the Q&A answering model. We apply the process separately for the training and testing set, to ensure that the examples from the traing set don't feature within the test set."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": []
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for name, is_disc in [('discriminator', True), ('qa', False)]:\n",
|
||||
" for train_test, dt in [('train', train_df), ('test', test_df)]:\n",
|
||||
" ft = create_fine_tuning_dataset(dt, discriminator=is_disc, n_negative=1, add_related=True)\n",
|
||||
" ft.to_json(f'{name}_{train_test}.jsonl', orient='records', lines=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We formatted the data according to the recommendations from the fine-tuning tool, which is available using\n",
|
||||
"> openai tools fine_tunes.prepare_data -f qa_train.jsonl\n",
|
||||
"\n",
|
||||
"We highly recommend that you use this tool, which suggests improvements in your data formatting for fine-tuning.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3.2 Submit the datasets for fine-tuning"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": []
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!openai api fine_tunes.create -t \"olympics-data/discriminator_train.jsonl\" -v \"olympics-data/discriminator_test.jsonl\" --batch_size 16 --compute_classification_metrics --classification_positive_class \" yes\" --model ada"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": []
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!openai api fine_tunes.create -t \"olympics-data/qa_train.jsonl\" -v \"olympics-data/qa_test.jsonl\" --batch_size 16"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3.3 Using the fine-tuned models\n",
|
||||
"\n",
|
||||
"We will now use the fine-tuned discriminator and the fine-tuned Q&A model. By requesting logprobs, we can see how certain the discriminator is in a `yes` vs `no` answer."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[<OpenAIObject at 0x7fe812e602b0> JSON: {\n",
|
||||
" \" no\": -10.819577,\n",
|
||||
" \" yes\": -2.045765e-05\n",
|
||||
" }]"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"ft_discriminator = \"curie:ft-openai-internal-2021-08-23-23-58-57\"\n",
|
||||
"ft_qa = \"curie:ft-openai-internal-2021-08-23-17-54-10\"\n",
|
||||
"\n",
|
||||
"def apply_ft_discriminator(context, question, discriminator_model):\n",
|
||||
" \"\"\"\n",
|
||||
" Apply the fine tuned discriminator to a question, to assess whether it can be answered from the context.\n",
|
||||
" \"\"\"\n",
|
||||
" prompt = f\"{context}\\nQuestion: {question}\\n Related:\"\n",
|
||||
" result = openai.Completion.create(model=discriminator_model, prompt=prompt, max_tokens=1, temperature=0, top_p=1, n=1, logprobs=2)\n",
|
||||
" return result['choices'][0]['logprobs']['top_logprobs']\n",
|
||||
"\n",
|
||||
"apply_ft_discriminator('The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957.', \n",
|
||||
" 'What was the first human-made object in space?', ft_discriminator)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can see that the model can generalize well to different contexts and questions. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def apply_ft_qa_answer(context, question, answering_model):\n",
|
||||
" \"\"\"\n",
|
||||
" Apply the fine tuned discriminator to a question\n",
|
||||
" \"\"\"\n",
|
||||
" prompt = f\"{context}\\nQuestion: {question}\\nAnswer:\"\n",
|
||||
" result = openai.Completion.create(model=answering_model, prompt=prompt, max_tokens=30, temperature=0, top_p=1, n=1, stop=['.','\\n'])\n",
|
||||
" return result['choices'][0]['text']\n",
|
||||
"\n",
|
||||
"apply_ft_qa_answer('The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957.', \n",
|
||||
" 'What was the first human-made object in space?', ft_qa)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can see that the model can answer the question, when the context is appropriate."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The Soviet Union was the first country to successfully launch a satellite into space'"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"apply_ft_qa_answer('The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957.',\n",
|
||||
" 'What is impressive about the Soviet Union?', ft_qa)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' No appropriate context found to answer the question'"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"apply_ft_qa_answer('The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 1957.',\n",
|
||||
" 'How many cars were produced in the Soviet Union in 1970?', ft_qa)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can see that the model knows when to answer the question, and when to say that insufficient context is present to answer the question."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also combine a discriminator and a base model, or a fine-tuned Q&A model. Discriminator can essentially serve as a decision whether the question can be answered given the context or not."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Weather could cause a sport event to have no crowd'"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def answer_question_conditionally(answering_model, discriminator_model, context, question, discriminator_logprob_yes_modifier=0):\n",
|
||||
" logprobs = apply_ft_discriminator(context, question, discriminator_model)\n",
|
||||
" yes_logprob = logprobs[' yes'] if ' yes' in logprobs else -100\n",
|
||||
" no_logprob = logprobs[' no'] if ' no' in logprobs else -100\n",
|
||||
" if yes_logprob + discriminator_logprob_yes_modifier < no_logprob:\n",
|
||||
" return \" No appropriate context found to answer the question based on the discriminator.\"\n",
|
||||
" return apply_ft_qa_answer(context, question, answering_model)\n",
|
||||
"answer_question_conditionally(ft_qa, ft_discriminator, \n",
|
||||
" \"Crowdless games are a rare although not unheard-of occurrence in sports. \\\n",
|
||||
" When they do occur, it is usually the result of events beyond the control \\\n",
|
||||
" of the teams or fans, such as weather-related concerns, public health concerns, \\\n",
|
||||
" or wider civil disturbances unrelated to the game. For instance, \\\n",
|
||||
" the COVID-19 pandemic caused many sports leagues around the world \\\n",
|
||||
" to be played behind closed doors.\",\n",
|
||||
" \"Could weather cause a sport event to have no crowd?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The above function illustrates how to potentially combine a discriminator and a fine-tuned Q&A model. This gives a more fine-grained control over how certain we want the model to be before it answers the question.\n",
|
||||
"\n",
|
||||
"We'll now take a look on how answers endpoint works - combining search to retrieve the relevant context from a knowledge base, and then using the fine-tuned Q&A model to answer the question."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3.4 Answering the question based on a knowledge base\n",
|
||||
"Finally we can use a logic similar to the [/answers](https://beta.openai.com/docs/api-reference/answers) endpoint, where we first search for the relevant context, and then ask a Q&A model to answer the question given that context. If you'd like to see the implementation details, check out the [`answers_with_ft.py`](answers_with_ft.py) file."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" Canada won the Women's football tournament at the 2020 Olympic games\""
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from answers_with_ft import answer_question\n",
|
||||
"answer_question(olympics_search_fileid, ft_qa, \"Which country won the Women's football tournament at the 2020 Olympic games?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"interpreter": {
|
||||
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.7.3 64-bit ('base': conda)",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.3"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
BIN
images/chain_of_thought_fig1.png
Normal file
After Width: | Height: | Size: 799 KiB |
BIN
images/chain_of_thought_fig11.png
Normal file
After Width: | Height: | Size: 301 KiB |
BIN
images/chain_of_thought_fig3.png
Normal file
After Width: | Height: | Size: 355 KiB |
BIN
images/chain_of_thought_fig5.png
Normal file
After Width: | Height: | Size: 192 KiB |
BIN
images/faithful-reasoning_fig1.png
Normal file
After Width: | Height: | Size: 332 KiB |
BIN
images/faithful-reasoning_fig2.png
Normal file
After Width: | Height: | Size: 160 KiB |
BIN
images/faithful-reasoning_fig3.png
Normal file
After Width: | Height: | Size: 101 KiB |
BIN
images/faithful-reasoning_fig4.png
Normal file
After Width: | Height: | Size: 298 KiB |
BIN
images/faithful-reasoning_fig5.png
Normal file
After Width: | Height: | Size: 133 KiB |
BIN
images/faithful-reasoning_fig7.png
Normal file
After Width: | Height: | Size: 108 KiB |
BIN
images/faithful-reasoning_tab2.png
Normal file
After Width: | Height: | Size: 92 KiB |
BIN
images/faithful-reasoning_tab5.png
Normal file
After Width: | Height: | Size: 81 KiB |
BIN
images/least-to-most_fig1.png
Normal file
After Width: | Height: | Size: 303 KiB |
BIN
images/least-to-most_tab11.png
Normal file
After Width: | Height: | Size: 64 KiB |
BIN
images/least-to-most_tab4.png
Normal file
After Width: | Height: | Size: 56 KiB |
BIN
images/least-to-most_tab9.png
Normal file
After Width: | Height: | Size: 55 KiB |
BIN
images/lm_cascades_fig1.png
Normal file
After Width: | Height: | Size: 32 KiB |
BIN
images/lm_cascades_fig3.png
Normal file
After Width: | Height: | Size: 34 KiB |
BIN
images/lm_cascades_fig4.png
Normal file
After Width: | Height: | Size: 34 KiB |
BIN
images/lm_cascades_fig5.png
Normal file
After Width: | Height: | Size: 57 KiB |
BIN
images/lm_cascades_fig6.png
Normal file
After Width: | Height: | Size: 35 KiB |
BIN
images/maieutic_fig2.png
Normal file
After Width: | Height: | Size: 126 KiB |
BIN
images/maieutic_fig6.png
Normal file
After Width: | Height: | Size: 198 KiB |
BIN
images/maieutic_tab1.png
Normal file
After Width: | Height: | Size: 175 KiB |
BIN
images/selection-inference_fig1.png
Normal file
After Width: | Height: | Size: 855 KiB |
BIN
images/selection-inference_fig4.png
Normal file
After Width: | Height: | Size: 487 KiB |
BIN
images/self-consistency_fig1.png
Normal file
After Width: | Height: | Size: 440 KiB |
BIN
images/self-consistency_fig3.png
Normal file
After Width: | Height: | Size: 692 KiB |
BIN
images/star_fig1.png
Normal file
After Width: | Height: | Size: 340 KiB |
BIN
images/star_tab1.png
Normal file
After Width: | Height: | Size: 273 KiB |
BIN
images/verifiers_fig3.png
Normal file
After Width: | Height: | Size: 216 KiB |
BIN
images/verifiers_fig5.png
Normal file
After Width: | Height: | Size: 281 KiB |
BIN
images/zero-shot_reasoners_fig1.png
Normal file
After Width: | Height: | Size: 1.8 MiB |
BIN
images/zero-shot_reasoners_fig2.png
Normal file
After Width: | Height: | Size: 362 KiB |
BIN
images/zero-shot_reasoners_tab1.png
Normal file
After Width: | Height: | Size: 323 KiB |
BIN
images/zero-shot_reasoners_tab5.png
Normal file
After Width: | Height: | Size: 305 KiB |
574
techniques_to_improve_reliability.md
Normal file
@ -0,0 +1,574 @@
|
||||
# Techniques to improve reliability
|
||||
|
||||
When GPT-3 fails on a task, what should you do?
|
||||
|
||||
- Search for a better prompt that elicits more reliable answers?
|
||||
- Invest in thousands of examples to fine-tune a custom model?
|
||||
- Assume the model is incapable of the task, and move on?
|
||||
|
||||
There is no simple answer - it depends. However, if your task involves logical reasoning or complexity, consider trying the techniques in this article to build more reliable, high-performing prompts.
|
||||
|
||||
## Why GPT-3 fails on complex tasks
|
||||
|
||||
If you were asked to multiply 13 by 17, would the answer pop immediately into your mind? For most of us, probably not. Yet, that doesn't mean humans are incapable of two-digit multiplication. With a few seconds, and some pen and paper, it's not too taxing to work out that 13 x 17 = 130 + 70 + 21 = 221.
|
||||
|
||||
Similarly, if you give GPT-3 a task that's too complex to do in the time it takes to calculate its next token, it may confabulate an incorrect guess. Yet, akin to humans, that doesn't necessarily mean the model is incapable of the task. With some time and space to reason things out, the model still may be able to answer reliably.
|
||||
|
||||
As an example, if you ask `text-davinci-002` the following math problem about juggling balls, it answers incorrectly:
|
||||
|
||||
```text-davinci-002
|
||||
Q: A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?
|
||||
A:
|
||||
```
|
||||
|
||||
```text-davinci-002
|
||||
There are 8 blue golf balls.
|
||||
```
|
||||
|
||||
Does this mean that GPT-3 cannot do simple math problems? No; in fact, it turns out that by prompting the model with `Let's think step by step`, the model solves the problem reliably:
|
||||
|
||||
```text-davinci-002
|
||||
Q: A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?
|
||||
A: Let's think step by step.
|
||||
```
|
||||
|
||||
```text-davinci-002
|
||||
There are 16 balls in total.
|
||||
Half of the balls are golf balls.
|
||||
That means that there are 8 golf balls.
|
||||
Half of the golf balls are blue.
|
||||
That means that there are 4 blue golf balls.
|
||||
```
|
||||
|
||||
Of course, it's hard to tell from only a single example whether this `Let's think step by step` trick actually works in general or just got lucky on this particular problem. But it really does work. On a benchmark of word math problems, the `Let's think step by step` trick raised GPT-3's solve rate massively, from a worthless 18% to a decent 79%!
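If you want to reproduce this kind of call yourself, here's a minimal sketch using the `openai` Python library (the 0.x-style `Completion` API used elsewhere in this repository); depending on your library version you may need to pass `engine` instead of `model`:

```python
import openai

# The juggling question from above, with the zero-shot chain-of-thought suffix appended
prompt = (
    "Q: A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls "
    "are blue. How many blue golf balls are there?\n"
    "A: Let's think step by step."
)

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    temperature=0,   # deterministic output for easier comparison
    max_tokens=128,  # leave room for the reasoning steps before the final answer
)
print(response["choices"][0]["text"])
```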
|
||||
|
||||
## Model capabilities depend on context
|
||||
|
||||
When learning to work with GPT-3, one common conceptual mistake is to believe that its capabilities are fixed across all contexts. E.g., if GPT-3 gets a simple logic question wrong, then it must be incapable of simple logic.
|
||||
|
||||
But as the `Let's think step by step` example illustrates, apparent failures of GPT-3 can sometimes be remedied with a better prompt that helps the model steer itself toward the correct output.
|
||||
|
||||
## How to improve reliability on complex tasks
|
||||
|
||||
The rest of this article shares techniques for improving reliability of large language models on complex tasks. Although some of the techniques are specific to certain types of problems, many of them are built upon general principles that can be applied to a wide range of tasks, e.g.:
|
||||
|
||||
- Give clearer instructions
|
||||
- Split complex tasks into simpler subtasks
|
||||
- Structure the instruction to keep the model on task
|
||||
- Prompt the model to explain before answering
|
||||
- Ask for justifications of many possible answers, and then synthesize
|
||||
- Generate many outputs, and then use the model to pick the best one
|
||||
- Fine-tune custom models to maximize performance
|
||||
|
||||
## Split complex tasks into simpler tasks
|
||||
|
||||
One way to give a model more time and space to think is to break tasks into simpler pieces.
|
||||
|
||||
As an example, consider a task where we ask the model a multiple-choice question about some text - in this case, a game of Clue. When asked directly, `text-davinci-002` isn't able to put clues 3 & 5 together, and answers incorrectly:
|
||||
|
||||
```text-davinci-002
|
||||
Use the following clues to answer the following multiple-choice question.
|
||||
|
||||
Clues:
|
||||
1. Miss Scarlett was the only person in the lounge.
|
||||
2. The person with the pipe was in the kitchen.
|
||||
3. Colonel Mustard was the only person in the observatory.
|
||||
4. Professor Plum was not in the library nor the billiard room.
|
||||
5. The person with the candlestick was in the observatory.
|
||||
|
||||
Question: Was Colonel Mustard in the observatory with the candlestick?
|
||||
(a) Yes; Colonel Mustard was in the observatory with the candlestick
|
||||
(b) No; Colonel Mustard was not in the observatory with the candlestick
|
||||
(c) Unknown; there is not enough information to determine whether Colonel Mustard was in the observatory with the candlestick
|
||||
|
||||
Solution:
|
||||
```
|
||||
|
||||
```text-davinci-002
|
||||
(c) Unknown; there is not enough information to determine whether Colonel Mustard was in the observatory with the candlestick
|
||||
```
|
||||
|
||||
Although clues 3 and 5 establish that Colonel Mustard was the only person in the observatory and that the person in the observatory had the candlestick, the model fails to combine them into a correct answer of (a) Yes.
|
||||
|
||||
However, instead of asking for the answer directly, we can split the task into three pieces:
|
||||
|
||||
- First, go through the clues one by one and consider whether the clue is potentially relevant
|
||||
- Second, combine the relevant clues to reason out the answer to the question
|
||||
- Third, write the final answer: either (a), (b), or (c)
|
||||
|
||||
```text-davinci-002
|
||||
Use the following clues to answer the following multiple-choice question, using the following procedure:
|
||||
(1) First, go through the clues one by one and and consider whether the clue is potentially relevant
|
||||
(2) Second, combine the relevant clues to reason out the answer to the question
|
||||
(3) Third, map the answer to one of the multiple choice answers: either (a), (b), or (c)
|
||||
|
||||
Clues:
|
||||
1. Miss Scarlett was the only person in the lounge.
|
||||
2. The person with the pipe was in the kitchen.
|
||||
3. Colonel Mustard was the only person in the observatory.
|
||||
4. Professor Plum was not in the library nor the billiard room.
|
||||
5. The person with the candlestick was in the observatory.
|
||||
|
||||
Question: Was Colonel Mustard in the observatory with the candlestick?
|
||||
(a) Yes; Colonel Mustard was in the observatory with the candlestick
|
||||
(b) No; Colonel Mustard was not in the observatory with the candlestick
|
||||
(c) Unknown; there is not enough information to determine whether Colonel Mustard was in the observatory with the candlestick
|
||||
|
||||
Solution:
|
||||
(1) First, go through the clues one by one and and consider whether the clue is potentially relevant:
|
||||
```
|
||||
|
||||
```text-davinci-002
|
||||
Clue 1: Miss Scarlett was the only person in the lounge. This clue is not relevant.
|
||||
Clue 2: The person with the pipe was in the kitchen. This clue is not relevant.
|
||||
Clue 3: Colonel Mustard was the only person in the observatory. This clue is relevant.
|
||||
Clue 4: Professor Plum was not in the library nor the billiard room. This clue is not relevant.
|
||||
Clue 5: The person with the candlestick was in the observatory. This clue is relevant.
|
||||
|
||||
(2) Second, combine the relevant clues to reason out the answer to the question:
|
||||
Clue 3 and Clue 5 both say that the person with the candlestick was in the observatory. Colonel Mustard was the only person in the observatory, so Colonel Mustard was the person with the candlestick.
|
||||
|
||||
(3) Third, write the final answer:
|
||||
(a) Yes; Colonel Mustard was in the observatory with the candlestick
|
||||
```
|
||||
|
||||
By giving the model more time and space to think, and guiding it along a reasoning plan, it's able to figure out the correct answer of (a) Yes.
|
||||
|
||||
Another benefit of splitting complex instructions into smaller subtasks is that it can help keep the model focused on each subtask.
|
||||
|
||||
For example, if we ask `text-davinci-002` to summarize a text in its original language, the model can lapse back into English:
|
||||
|
||||
```text-davinci-002
|
||||
Summarize the text using the original language of the text. The summary should be one sentence long.
|
||||
|
||||
Text:
|
||||
"""
|
||||
La estadística (la forma femenina del término alemán Statistik, derivado a su vez del italiano statista, "hombre de Estado") es una ciencia que estudia la variabilidad, colección, organización, análisis, interpretación, y presentación de los datos, así como el proceso aleatorio que los genera siguiendo las leyes de la probabilidad. La estadística es una ciencia formal deductiva, con un conocimiento propio, dinámico y en continuo desarrollo obtenido a través del método científico formal. En ocasiones, las ciencias fácticas necesitan utilizar técnicas estadísticas durante su proceso de investigación factual, con el fin de obtener nuevos conocimientos basados en la experimentación y en la observación. En estos casos, la aplicación de la estadística permite el análisis de datos provenientes de una muestra representativa, que busca explicar las correlaciones y dependencias de un fenómeno físico o natural, de ocurrencia en forma aleatoria o condicional.
|
||||
"""
|
||||
|
||||
Summary:
|
||||
```
|
||||
|
||||
```text-davinci-002
|
||||
The text explains that statistics is a science that studies the variability, collection, organization, analysis, interpretation, and presentation of data, as well as the random process that generates them following the laws of probability.
|
||||
```
|
||||
|
||||
However, if we first ask the model to identify the language of the text, and then summarize the text, it becomes more reliable:
|
||||
|
||||
```text-davinci-002
|
||||
First, identify the language of the text. Second, summarize the text using the original language of the text. The summary should be one sentence long.
|
||||
|
||||
Text:
|
||||
"""
|
||||
La estadística (la forma femenina del término alemán Statistik, derivado a su vez del italiano statista, "hombre de Estado") es una ciencia que estudia la variabilidad, colección, organización, análisis, interpretación, y presentación de los datos, así como el proceso aleatorio que los genera siguiendo las leyes de la probabilidad. La estadística es una ciencia formal deductiva, con un conocimiento propio, dinámico y en continuo desarrollo obtenido a través del método científico formal. En ocasiones, las ciencias fácticas necesitan utilizar técnicas estadísticas durante su proceso de investigación factual, con el fin de obtener nuevos conocimientos basados en la experimentación y en la observación. En estos casos, la aplicación de la estadística permite el análisis de datos provenientes de una muestra representativa, que busca explicar las correlaciones y dependencias de un fenómeno físico o natural, de ocurrencia en forma aleatoria o condicional.
|
||||
"""
|
||||
|
||||
Language:
|
||||
```
|
||||
|
||||
```text-davinci-002
|
||||
Spanish
|
||||
|
||||
La estadística es una ciencia que estudia la variabilidad, colección, organización, análisis, interpretación, y presentación de los datos, así como el proceso aleatorio que los genera siguiendo las leyes de la probabilidad.
|
||||
```
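The same decomposition can also be scripted as two separate API calls. Below is a minimal sketch (the helper name and prompt wording are our own, not a fixed recipe) that first identifies the language and then summarizes in that language:

```python
import openai

def summarize_in_original_language(text: str) -> str:
    """Hypothetical helper: identify the language first, then summarize in that language."""
    language = openai.Completion.create(
        model="text-davinci-002",
        prompt=f'What language is the following text written in? Answer with the language name only.\n\nText:\n"""\n{text}\n"""\n\nLanguage:',
        temperature=0,
        max_tokens=10,
    )["choices"][0]["text"].strip()

    summary = openai.Completion.create(
        model="text-davinci-002",
        prompt=f'Summarize the following text in {language}. The summary should be one sentence long.\n\nText:\n"""\n{text}\n"""\n\nSummary:',
        temperature=0,
        max_tokens=100,
    )["choices"][0]["text"].strip()
    return summary
```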
|
||||
|
||||
## Prompt the model to explain before answering
|
||||
|
||||
Another powerful technique for improving the reliability of answers is to prompt the model to gradually reason out the answer rather than jumping immediately to the final answer. By 'thinking aloud' the model can be far more likely to arrive at the correct answer.
|
||||
|
||||
### Zero-shot
|
||||
|
||||
#### Method
|
||||
|
||||
Published by [Takeshi Kojima et al. in 2022](https://arxiv.org/abs/2205.11916), the easiest way to prompt a model to reason out the answer is to simply prepend answers with `Let's think step by step.` Figure 2 illustrates an example:
|
||||
|
||||
Source: [*Large Language Models are Zero-Shot Reasoners* by Takeshi Kojima et al. (2022).](https://arxiv.org/abs/2205.11916)
|
||||
|
||||
#### Results
|
||||
|
||||
Applying this simple trick to the MultiArith math dataset, the authors found `Let's think step by step` quadrupled the accuracy, from 18% to 79%!
|
||||
|
||||
Source: [*Large Language Models are Zero-Shot Reasoners* by Takeshi Kojima et al. (2022).](https://arxiv.org/abs/2205.11916)
|
||||
|
||||
#### Implications
|
||||
|
||||
Although the `Let's think step by step` trick works well on math problems, it's not effective on all tasks. The authors found that it was most helpful for multi-step arithmetic problems, symbolic reasoning problems, strategy problems, and other reasoning problems. It didn't help with simple math problems or common sense questions, and presumably wouldn't help with many other non-reasoning tasks either.
|
||||
|
||||
Source: [*Large Language Models are Zero-Shot Reasoners* by Takeshi Kojima et al. (2022).](https://arxiv.org/abs/2205.11916)
|
||||
|
||||
To learn more, read the [full paper](https://arxiv.org/abs/2205.11916).
|
||||
|
||||
If you apply this technique to your own tasks, don't be afraid to experiment with customizing the instruction. `Let's think step by step` is rather generic, so you may find better performance with instructions that hew to a stricter format customized to your use case. For example, you can try more structured variants like `First, think step by step about why X might be true. Second, think step by step about why Y might be true. Third, think step by step about whether X or Y makes more sense.`. And you can even give the model an example format to help keep it on track, e.g.:
|
||||
|
||||
```text-davinci-002
|
||||
Using the IRS guidance below, answer the following questions using this format:
|
||||
(1) For each criterion, determine whether it is met by the vehicle purchase
|
||||
- {Criterion} Let's think step by step. {explanation} {yes or no, or if the question does not apply then N/A}.
|
||||
(2) After considering each criterion in turn, phrase the final answer as "Because of {reasons}, the answer is likely {yes or no}."
|
||||
|
||||
IRS guidance:
|
||||
"""
|
||||
You may be eligible for a federal tax credit under Section 30D if you purchased a car or truck that meets the following criteria:
|
||||
- Does the vehicle have at least four wheels?
|
||||
- Does the vehicle weigh less than 14,000 pounds?
|
||||
- Does the vehicle draw energy from a battery with at least 4 kilowatt hours that may be recharged from an external source?
|
||||
- Was the vehicle purchased in a year before 2022?
|
||||
- If so, has the manufacturer sold less than 200,000 qualifying vehicles? (Tesla and GM have sold more than 200,000 qualifying vehicles.)
|
||||
- Was the vehicle purchased in a year after 2022?
|
||||
- If so, is the vehicle present in the following list of North American-assembled vehicles? (The only electric vehicles assembled in North America are the Audi Q5, BMW 330e, BMW X5, Chevrolet Bolt EUV, Chevrolet Bolt EV, Chrysler Pacifica PHEV, Ford Escape PHEV, Ford F Series, Ford Mustang MACH E, Ford Transit Van, GMC Hummer Pickup, GMC Hummer SUV, Jeep Grand Cherokee PHEV, Jeep Wrangler PHEV, Lincoln Aviator PHEV, Lincoln Corsair Plug-in, Lucid Air, Nissan Leaf, Rivian EDV, Rivian R1S, Rivian R1T, Tesla Model 3, Tesla Model S, Tesla Model X, Tesla Model Y, Volvo S60, BMW 330e, Bolt EV, Cadillac Lyriq, Mercedes EQS SUV, and Nissan Leaf.)
|
||||
"""
|
||||
|
||||
Question: Can I claim a federal tax credit for my Toyota Prius Prime bought in 2021?
|
||||
|
||||
Solution:
|
||||
|
||||
(1) For each criterion, determine whether it is met by the vehicle purchase
|
||||
- Does the vehicle have at least four wheels? Let's think step by step.
|
||||
```
|
||||
|
||||
```text-davinci-002
|
||||
The Toyota Prius Prime has four wheels, so the answer is yes.
|
||||
- Does the vehicle weigh less than 14,000 pounds? Let's think step by step. The Toyota Prius Prime weighs less than 14,000 pounds, so the answer is yes.
|
||||
- Does the vehicle draw energy from a battery with at least 4 kilowatt hours that may be recharged from an external source? Let's think step by step. The Toyota Prius Prime has a battery with at least 4 kilowatt hours that may be recharged from an external source, so the answer is yes.
|
||||
- Was the vehicle purchased in a year before 2022? Let's think step by step. The Toyota Prius Prime was purchased in 2021, which is before 2022, so the answer is yes.
|
||||
- Was the vehicle purchased in a year after 2022? N/A
|
||||
- If so, is the vehicle present in the following list of North American-assembled vehicles? N/A
|
||||
|
||||
(2) After considering each criterion in turn, phrase the final answer as "Because of {reasons}, the answer is likely {yes or no}."
|
||||
|
||||
Because the Toyota Prius Prime meets all of the criteria for a federal tax credit, the answer is likely yes.
|
||||
```
|
||||
|
||||
### Few-shot examples
|
||||
|
||||
#### Method
|
||||
|
||||
Prompting the model to reason out its answers can be done in many ways. One way is to demonstrate with a few examples ('few-shot'), as studied by [Jason Wei and Denny Zhou et al. from Google](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html). Here's an example few-shot chain-of-thought prompt:
|
||||
|
||||
Source: [*Chain of Thought Prompting Elicits Reasoning in Large Language Models* Jason Wei and Denny Zhou et al. (2022)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html)
|
||||
|
||||
More demonstrations of reasoning chains written by human labelers:
|
||||
|
||||
Source: [*Chain of Thought Prompting Elicits Reasoning in Large Language Models* Jason Wei and Denny Zhou et al. (2022)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html)
|
||||
|
||||
[(Note that it has been called into question whether pears actually float)](https://twitter.com/Meaningness/status/1561062170074370048?s=20&t=mpHt8f3RRboztXxdhLFnWQ)
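As a rough sketch (not the paper's exact prompt), a few-shot chain-of-thought request might be constructed like this; the worked example is adapted from the chain-of-thought paper, and the stop sequence is our own choice:

```python
import openai

# One worked example demonstrating the desired reasoning format, followed by the new question
few_shot_cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?
A:"""

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=few_shot_cot_prompt,
    temperature=0,
    max_tokens=128,
    stop=["\n\n"],  # stop once the answer to the new question is complete
)
print(response["choices"][0]["text"])
```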
|
||||
|
||||
#### Results
|
||||
|
||||
Testing on grade school math problems, the authors found that chain of thought prompting tripled the solve rate, from 18% to 57%.
|
||||
|
||||
Source: [*Chain of Thought Prompting Elicits Reasoning in Large Language Models* Jason Wei and Denny Zhou et al. (2022)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html)
|
||||
|
||||
In addition to math problems, chain of thought prompting also lifted performance on questions related to sports understanding, coin flip tracking, and last letter concatenation. In most cases, not many examples were needed to saturate the performance gains (fewer than 8 or so).
|
||||
|
||||
Source: [*Chain of Thought Prompting Elicits Reasoning in Large Language Models* Jason Wei and Denny Zhou et al. (2022)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html)
|
||||
|
||||
To learn more, read the [full paper](https://arxiv.org/abs/2201.11903).
|
||||
|
||||
#### Implications
|
||||
|
||||
One advantage of the few-shot example-based approach relative to the `Let's think step by step` technique is that you can more easily specify the format, length, and style of reasoning that you want the model to perform before landing on its final answer. This can be particularly helpful in cases where the model isn't initially reasoning in the right way or at the right depth.
|
||||
|
||||
### Fine-tuned
|
||||
|
||||
#### Method
|
||||
|
||||
In general, to eke out maximum performance on a task, you'll need to fine-tune a custom model. However, fine-tuning a model using explanations may take thousands of example explanations, which are costly to write.
|
||||
|
||||
In 2022, Eric Zelikman and Yuhuai Wu et al. published a clever procedure for using a few-shot prompt to generate a dataset of explanations that could be used to fine-tune a model. The idea is to use a few-shot prompt to generate candidate explanations, and only keep the explanations that produce the correct answer. Then, to get additional explanations for some of the incorrect answers, retry the few-shot prompt but with correct answers given as part of the question. The authors called their procedure STaR (Self-taught Reasoner):
|
||||
|
||||
Source: [*STaR: Bootstrapping Reasoning With Reasoning* by Eric Zelikman and Yuhuai Wu et al. (2022)](https://arxiv.org/abs/2203.14465)
|
||||
|
||||
With this technique, you can combine the benefits of fine-tuning with the benefits of chain-of-thought prompting without needing to write thousands of example explanations.
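To make the procedure concrete, here is a heavily simplified sketch of the STaR-style filtering loop. Everything here is an assumption for illustration: `FEW_SHOT_EXAMPLES`, the tiny stand-in `dataset`, and the string-matching check for correctness.

```python
import openai

FEW_SHOT_EXAMPLES = "..."  # a handful of worked questions with reasoning and answers (omitted)
dataset = [("What is 13 * 17?", "221")]  # stand-in for a real (question, answer) dataset

def generate_explanation(question: str, hint: str = "") -> str:
    """Few-shot prompt asking for reasoning followed by an answer.
    When `hint` contains the correct answer, this is the 'rationalization' retry."""
    prompt = f"{FEW_SHOT_EXAMPLES}\n\nQ: {question}\n{hint}A: Let's think step by step."
    return openai.Completion.create(
        model="text-davinci-002", prompt=prompt, temperature=0.7, max_tokens=256
    )["choices"][0]["text"]

fine_tuning_examples = []
for question, correct_answer in dataset:
    explanation = generate_explanation(question)
    if correct_answer not in explanation:
        # the explanation led to a wrong answer: retry with the correct answer given as a hint
        explanation = generate_explanation(question, hint=f"(The correct answer is {correct_answer}.)\n")
    if correct_answer in explanation:
        # keep only explanations that end in the correct answer
        fine_tuning_examples.append({"prompt": f"Q: {question}\nA:", "completion": f" {explanation.strip()}"})
```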
|
||||
|
||||
#### Results
|
||||
|
||||
When the authors applied this technique to a Common Sense Q&A dataset, they found that STaR outperformed both chain-of-thought prompting alone (73% > 37%) and fine-tuning alone (73% > 60%):
|
||||
|
||||
Source: [*STaR: Bootstrapping Reasoning With Reasoning* by Eric Zelikman and Yuhuai Wu et al. (2022)](https://arxiv.org/abs/2203.14465)
|
||||
|
||||
To learn more, read the [full paper](https://arxiv.org/abs/2203.14465).
|
||||
|
||||
#### Implications
|
||||
|
||||
Using a few-shot prompt to extend or modify a fine-tuning dataset is an idea that can be generalized beyond explanation writing. For example, if you have large quantities of unstructured text that you want to train on, you may find opportunities to use a prompt to extract a structured dataset from your unstructured text, and then fine-tune a custom model on that structured dataset.
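For example, a minimal sketch of that idea might look like the following; the corpus, prompt wording, and output format are all placeholders of our own:

```python
import json
import openai

documents = ["...unstructured text #1...", "...unstructured text #2..."]  # your own corpus

with open("extracted_dataset.jsonl", "w") as f:
    for doc in documents:
        # use a prompt to pull one structured question/answer pair out of the raw text
        extraction = openai.Completion.create(
            model="text-davinci-002",
            prompt=f'Text:\n"""\n{doc}\n"""\n\nWrite one question that is answered by the text, then its answer.\nQuestion:',
            temperature=0,
            max_tokens=150,
        )["choices"][0]["text"]
        question, _, answer = extraction.partition("\nAnswer:")
        # each extracted pair becomes a prompt/completion line for fine-tuning
        f.write(json.dumps({"prompt": f"{question.strip()}\n\n###\n\n", "completion": f" {answer.strip()}"}) + "\n")
```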
|
||||
|
||||
## Extensions to chain-of-thought prompting
|
||||
|
||||
A number of extensions of chain-of-thought prompting have been published as well.
|
||||
|
||||
### Selection-inference prompting
|
||||
|
||||
#### Method
|
||||
|
||||
Published by Antonia Creswell et al., one extension of the chain-of-thought technique is to split the single prompt for generating explanations and answers into smaller parts. First, a prompt selects a relevant subset of facts from the text ('selection prompt'). Then, a second prompt infers a conclusion from the selected facts ('inference prompt'). These prompts are then alternated in a loop to generate multiple steps of reasoning and eventually land on a final answer. The authors illustrate the idea in the following figure:
|
||||
|
||||
Source: [*Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2205.09712)
|
||||
|
||||
#### Results
|
||||
|
||||
When applied to a 7B-parameter model, the authors found that selection-inference prompting substantially improved performance relative to chain-of-thought prompting on the bAbi and Proof Writer benchmark tasks (both of which require longer sequences of reasoning steps). The best performance they achieved combined both selection-inference prompting with fine-tuning.
|
||||
|
||||
Source: [*Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2205.09712)
|
||||
|
||||
#### Implications
|
||||
|
||||
Although the gains on these benchmarks were large, these benchmarks were specifically chosen because they required longer sequences of reasoning. On problems that don't require reasoning with many steps, the gains are likely smaller.
|
||||
|
||||
The results highlight a couple of general lessons for working with large language models. One, splitting up complex tasks into smaller tasks is a great way to improve reliability and performance; the more atomic the task, the less room there is for the model to err. Two, getting maximum performance often means combining fine-tuning with whatever approach you've chosen.
|
||||
|
||||
To learn more, read the [full paper](https://arxiv.org/abs/2205.09712).
|
||||
|
||||
### Faithful reasoning architecture
|
||||
|
||||
A few months after publishing the selection-inference prompting technique, the authors extended the technique in a follow-up paper, with ideas for:
|
||||
|
||||
- figuring out when the selection-inference cycle should stop or continue
|
||||
- adding a value function to help search over multiple reasoning paths
|
||||
- reducing hallucination of fake facts by fine-tuning a model to reason about sentence labels (e.g., sen1) rather than writing out the sentences themselves
|
||||
|
||||
#### Method
|
||||
|
||||
In the original selection-inference technique, specialized 'selection' and 'inference' prompts are alternated to select facts and make inferences from those facts, combining to generate a sequence of reasoning steps.
|
||||
|
||||
The authors extend this technique with two additional components.
|
||||
|
||||
First, the authors add a 'halter' model that, after each inference step, is asked whether the inferences thus far are sufficient to answer the question. If yes, then the model generates a final answer.
|
||||
|
||||
The halter model brings a couple of advantages:
|
||||
|
||||
- it can tell the selection-inference process to stop or keep going, as necessary.
|
||||
- if the process never halts, you'll get no answer, which is often preferable to a hallucinated guess
|
||||
|
||||
Source: [*Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
|
||||
|
||||
Second, the authors add a value function, which is used to assess the quality of reasoning steps and search over multiple reasoning trajectories. This echoes a common theme for increasing reliability; instead of generating a single answer from the model, generate a set of answers and then use some type of value function / discriminator / verifier model to pick the best one.
|
||||
|
||||
Source: [*Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
|
||||
|
||||
In addition to these two extensions, the authors also use a trick to reduce hallucination of fake facts. Rather than asking the model to write out factual sentences, they fine-tune a model to work with sentence labels (e.g., sen1) instead. This helps prevent the model from hallucinating fake facts not mentioned in the prompt context.
|
||||
|
||||
Source: [*Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
|
||||
|
||||
#### Results
|
||||
|
||||
The authors evaluated their technique on two benchmarks: the ProofWriter task (not shown) and [EntailmentBankQA](https://allenai.org/data/entailmentbank) (shown). The technique increased accuracy substantially, especially on harder reasoning problems.
|
||||
|
||||
Source: [*Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
|
||||
|
||||
In addition, their sentence label manipulation trick essentially eliminated hallucination!
|
||||
|
||||
Source: [*Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
|
||||
|
||||
#### Implications
|
||||
|
||||
This paper illustrates a number of helpful lessons for improving the reliability of large language models:
|
||||
|
||||
- Split complex tasks into smaller, more reliable subtasks
|
||||
- Generate your answer in a step-by-step fashion, evaluating it along the way
|
||||
- Generate many possible answers and use another model or function to pick the ones that look best
|
||||
- Reduce hallucination by constraining what the model can say (e.g., by using sentence labels instead of sentences)
|
||||
- Maximize performance of models by fine-tuning them on specialized tasks
|
||||
|
||||
To learn more, read the [full paper](https://arxiv.org/abs/2208.14271).
|
||||
|
||||
### Least-to-most prompting
|
||||
|
||||
In addition to doing poorly on long reasoning chains (where selection-inference shines), chain-of-thought prompting can especially struggle when the examples are short but the task is long.
|
||||
|
||||
#### Method
|
||||
|
||||
Least-to-most prompting is another technique that splits up reasoning tasks into smaller, more reliable subtasks. The idea is to elicit a subtask from the model by prompting it with something like `To solve {question}, we need to first solve: "`. Then, with that subtask in hand, the model can generate a solution. The solution is appended to the original question and the process is repeated until a final answer is produced.

Source: [*Least-to-most Prompting Enables Complex Reasoning in Large Language Models* by Denny Zhou et al. (2022)](https://arxiv.org/abs/2205.10625)

#### Results

When applied to benchmarks involving long reasoning chains, using `code-davinci-002` (which is optimized for code but can still understand text), the authors measured gains as large as a jump from 16% to 99.7%!

Source: [*Least-to-most Prompting Enables Complex Reasoning in Large Language Models* by Denny Zhou et al. (2022)](https://arxiv.org/abs/2205.10625)

#### Implications

Although the above gains from least-to-most prompting are impressive, they are measured on a very narrow set of tasks that require long reasoning chains.

Still, they illustrate a common theme: increase reliability by (a) breaking complex tasks into smaller subtasks and (b) giving the model more time and space to work out the answer.

To learn more, read the [full paper](https://arxiv.org/abs/2205.10625).

## Related ideas

### Maieutic prompting

#### Method

In contrast to the previous techniques, which try to maximize the likelihood of correct answers, another approach is to use GPT-3 to generate a tree of possible explanations (both correct *and incorrect*), and then analyze their relationships to guess at which set is correct. This technique was coined maieutic prompting by [Jaehun Jung et al. in May 2022](https://arxiv.org/abs/2205.11822) (maieutic means relating to the Socratic method of asking questions to elicit ideas).

The method is complicated and works as follows (a minimal sketch of the tree-building step appears after the list):

- First, build a maieutic tree, where each node is a statement that could be true or false:
    - Start with a multiple-choice question or true/false statement (e.g. `War cannot have a tie`)
    - For each possible answer to the question, use the model to generate a corresponding explanation (with a prompt like `War cannot have a tie? True, because`)
    - Then, prompt the model with the question and the generated explanation, and ask it to produce the answer. If reversing the explanation (with a prefix like `It is wrong to say that {explanation}`) reverses the answer, then the explanation is considered 'logically integral.'
    - If an explanation is not logically integral, then repeat the above process recursively, with each explanation turned into a True or False question, and generate more explanations for each new question.
    - After all of the recursive explaining is done, you end up with a tree of explanations, where each leaf on the tree has the property that reversing the explanation reverses the model's answer.
- Second, convert the tree into a graph of relations:
    - For each node in the tree, calculate the model's relative belief in it (inferred from the probability of getting an answer of `True` given the explanation)
    - For each pair of nodes in the tree, use the model to identify whether they are entailed (implied) or contradicted
- Third, find the most consistent set of beliefs and take those to be true:
    - Specifically, using the strength of belief in each node and the logical relationships between them, formulate the problem as a weighted maximum satisfiability problem (MAX-SAT)
    - Use a solver to find the most self-consistent set of beliefs, and take those as true
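
To make the tree-building step more concrete, here is a toy sketch of generating an explanation for a True/False statement and checking whether it is 'logically integral' (negating the explanation should flip the answer). It covers only the first step; belief weighting and the MAX-SAT stage are omitted, and the prompts and model name are assumptions.

```python
import openai  # pre-1.0 interface of the openai Python package

def complete(prompt: str, max_tokens: int = 64) -> str:
    response = openai.Completion.create(
        model="text-davinci-002",  # assumed model
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0,
    )
    return response["choices"][0]["text"].strip()

def answer_given(statement: str, explanation: str) -> str:
    """Ask for True/False on a statement in light of an explanation."""
    text = complete(
        f"Explanation: {explanation}\nStatement: {statement}\nTrue or False?",
        max_tokens=2,
    )
    return "True" if text.lower().startswith("t") else "False"

def is_logically_integral(statement: str, answer: str):
    """Generate an explanation for `answer`, then test whether negating it flips the answer."""
    explanation = complete(f"{statement}? {answer}, because")
    original = answer_given(statement, explanation)
    negated = answer_given(statement, f"It is wrong to say that {explanation}")
    return explanation, original != negated

explanation, integral = is_logically_integral("War cannot have a tie", "True")
print(explanation, integral)
# If not integral, the paper recurses: the explanation itself becomes a new
# True/False statement to explain, growing the maieutic tree.
```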

Source: [*Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations* by Jaehun Jung et al. (2022)](https://arxiv.org/abs/2205.11822)

#### Results

Source: [*Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations* by Jaehun Jung et al. (2022)](https://arxiv.org/abs/2205.11822)

#### Implications

Beyond its complexity, one limitation of this method is that it appears to apply only to questions that can be posed as multiple-choice.

To learn more, read the [full paper](https://arxiv.org/abs/2205.11822).

## Extensions

### Self-consistency

#### Method

For tasks with a discrete set of answers, one simple way to improve reliability is to sample multiple explanations & answers from the model (using a positive temperature) and then pick the final answer that appears most often.
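
A minimal sketch of that voting scheme might look like the following; the prompt format, answer-extraction rule, and model name are assumptions.

```python
import re
from collections import Counter

import openai  # pre-1.0 interface of the openai Python package

def self_consistent_answer(question: str, n: int = 10) -> str:
    """Sample n chain-of-thought answers at a positive temperature and majority-vote."""
    response = openai.Completion.create(
        model="text-davinci-002",  # assumed model
        prompt=f"Q: {question}\nA: Let's think step by step.",
        max_tokens=256,
        temperature=0.7,  # positive temperature so the samples differ
        n=n,
    )
    answers = []
    for choice in response["choices"]:
        # Toy extraction rule: take the last number in the completion as the answer
        numbers = re.findall(r"-?\d+(?:\.\d+)?", choice["text"])
        if numbers:
            answers.append(numbers[-1])
    # Pick the most common final answer across samples
    return Counter(answers).most_common(1)[0][0] if answers else ""
```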

Source: [*Self-Consistency Improves Chain of Thought Reasoning in Language Models* by Xuezhi Wang et al. (2022)](https://arxiv.org/abs/2203.11171)

#### Results

This technique lifted accuracies by anywhere from 1 to 24 percentage points on a suite of math and reasoning benchmarks. (The results plotted in the paper are from Google's LaMDA model; using Google's larger PaLM model, the baselines were higher but the gains were a bit smaller.)

Source: [*Self-Consistency Improves Chain of Thought Reasoning in Language Models* by Xuezhi Wang et al. (2022)](https://arxiv.org/abs/2203.11171)

#### Implications

Although this technique is simple to implement, it can be costly. Generating a set of 10 answers will increase your costs by 10x.

Also, as with many of these techniques, it applies only to tasks with a limited set of answers. For open-ended tasks where each answer is unique (such as writing a poem), it's not obvious what it would mean to pick the most common answer.

Lastly, this technique ought to be most beneficial when there are multiple paths or phrasings to reach an answer; if there's only one path, then the technique may not help at all. An extreme example: if the task were to generate a single-token answer, then taking the most common token from 100 generations would be no different from taking the token with the highest logprobs (which you can get with a single generation at temperature=0).
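
For instance, a single deterministic call with logprobs enabled already exposes the most likely token, so voting over 100 samples would add cost without adding information (the model name and prompt below are illustrative):

```python
import openai  # pre-1.0 interface of the openai Python package

response = openai.Completion.create(
    model="text-davinci-002",   # assumed model
    prompt="The capital of France is",
    max_tokens=1,
    temperature=0,  # deterministic: returns the highest-probability token
    logprobs=5,     # also return the top 5 candidate tokens with their log-probabilities
)
print(response["choices"][0]["text"])
print(response["choices"][0]["logprobs"]["top_logprobs"][0])  # dict of token -> logprob
```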

### Verifiers

Another key technique for improving task performance is to train a verifier or discriminator model to evaluate the outputs of the main generative model. If the discriminator rejects the output, then you can resample the generative model until you get an acceptable output. In many cases, it's easier to judge an answer than it is to create an answer, which helps explain the power of this method.

#### Method

In 2021, OpenAI researchers applied this technique to grade school math problems, using the following procedure (a rough sketch of the data-labeling step appears after the list):

- First, they fine-tuned a model on questions and solutions
- For each problem in the training set, they generated 100 solutions
- Each of those 100 solutions was automatically labeled as either correct or incorrect, based on whether the final answer was correct
- Using those solutions, with some labeled correct and some labeled incorrect, they fine-tuned a verifier model to classify whether a question and candidate solution was correct or incorrect
- Finally, at test time, the generative model creates 100 solutions to each problem, and the one with the highest score according to the verifier model is picked as the final answer
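
As a rough sketch of the data-labeling step only (the paper then fine-tunes a separate verifier on data like this), the snippet below samples many candidate solutions per training problem and labels each one by whether its final answer matches the reference answer. The answer-extraction rule, model name, and output format are assumptions.

```python
import json
import re

import openai  # pre-1.0 interface of the openai Python package

def final_number(text: str) -> str:
    """Toy rule: treat the last number in the solution as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else ""

def label_solutions(problems, n_samples: int = 100, out_path: str = "verifier_data.jsonl"):
    """Generate candidate solutions and label them correct/incorrect for verifier training."""
    with open(out_path, "w") as f:
        for problem in problems:  # each problem: {"question": ..., "answer": ...}
            response = openai.Completion.create(
                model="text-davinci-002",  # assumed model (the paper first fine-tunes GPT-3 on solutions)
                prompt=f"Q: {problem['question']}\nA: Let's think step by step.",
                max_tokens=256,
                temperature=0.7,
                n=n_samples,
            )
            for choice in response["choices"]:
                solution = choice["text"]
                record = {
                    "question": problem["question"],
                    "solution": solution,
                    # Label: does the sampled solution's final answer match the reference?
                    "correct": final_number(solution) == str(problem["answer"]),
                }
                f.write(json.dumps(record) + "\n")
```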

Source: [*Training Verifiers to Solve Math Word Problems* by Karl Cobbe et al. (2021)](https://arxiv.org/abs/2110.14168)

#### Results

With a 175B GPT-3 model and 8,000 training examples, this technique substantially lifted grade school math accuracy from ~33% to ~55%.

Source: [*Training Verifiers to Solve Math Word Problems* by Karl Cobbe et al. (2021)](https://arxiv.org/abs/2110.14168)

#### Implications

Similar to the self-consistency technique, this method can get expensive, as generating, say, 100 solutions per task will increase your costs by roughly 100x.

## Theories of reliability

Although the techniques above vary in their approach, they all share the goal of improving reliability on complex tasks. Mainly they do this by:

- decomposing unreliable operations into smaller, more reliable operations (e.g., selection-inference prompting)
- using multiple steps or multiple relationships to make the system's reliability greater than any individual component (e.g., maieutic prompting)

### Probabilistic graphical models

This paradigm of trying to build a reliable system out of less reliable components is reminiscent of probabilistic programming, and many of the analysis techniques of that field can be applied to this one.

In the paper *Language Model Cascades*, David Dohan et al. interpret the above techniques in the paradigm of probabilistic graphical models:

#### Chain of thought prompting

Source: [*Language Model Cascades* by David Dohan et al. (2022)](https://arxiv.org/abs/2207.10342)

#### Fine-tuned chain of thought prompting / Self-taught reasoner

Source: [*Language Model Cascades* by David Dohan et al. (2022)](https://arxiv.org/abs/2207.10342)

#### Selection-inference prompting

Source: [*Language Model Cascades* by David Dohan et al. (2022)](https://arxiv.org/abs/2207.10342)

#### Verifiers

Source: [*Language Model Cascades* by David Dohan et al. (2022)](https://arxiv.org/abs/2207.10342)

#### Implications

Although formulating these techniques as probabilistic graphical models may not be immediately useful for solving any particular problem, the framework may be helpful in selecting, combining, and discovering new techniques.

## Closing thoughts

Research into large language models is very active and evolving rapidly. Not only do researchers continue to improve the models, they also continue to improve our understanding of how best to employ them. To underscore the pace of these developments, note that all of the papers shared above were published within the past 12 months (as I write in Sep 2022).

In the future, expect better models and better techniques to be published. Even if the specific techniques here are eclipsed by future best practices, the general principles behind them will likely remain a key part of any expert user's toolkit.

## Bibliography

| Lesson | Paper | Date |
|---|---|---|
| Break complex tasks into simpler subtasks (and consider exposing the intermediate outputs to users) | [AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts](https://arxiv.org/abs/2110.01691) | 2021 Oct |
| You can improve output by generating many candidates, and then picking the one that looks best | [Training Verifiers to Solve Math Word Problems](https://arxiv.org/abs/2110.14168) | 2021 Oct |
| On reasoning tasks, models do better when they reason step-by-step before answering | [Chain of Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903) | 2022 Jan |
| You can improve step-by-step reasoning by generating many explanation-answer outputs, and picking the most popular answer | [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) | 2022 Mar |
| If you want to fine-tune a step-by-step reasoner, you can do it with multiple-choice question & answer data alone | [STaR: Bootstrapping Reasoning With Reasoning](https://arxiv.org/abs/2203.14465) | 2022 Mar |
| The step-by-step reasoning method works great even with zero examples | [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916) | 2022 May |
| You can do better than step-by-step reasoning by alternating a ‘selection’ prompt and an ‘inference’ prompt | [Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning](https://arxiv.org/abs/2205.09712) | 2022 May |
| On long reasoning problems, you can improve step-by-step reasoning by splitting the problem into pieces to solve incrementally | [Least-to-most Prompting Enables Complex Reasoning in Large Language Models](https://arxiv.org/abs/2205.10625) | 2022 May |
| You can have the model analyze both good and bogus explanations to figure out which set of explanations is most consistent | [Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations](https://arxiv.org/abs/2205.11822) | 2022 May |
| You can think about these techniques in terms of probabilistic programming, where systems comprise unreliable components | [Language Model Cascades](https://arxiv.org/abs/2207.10342) | 2022 Jul |
| You can eliminate hallucination with sentence label manipulation, and you can reduce wrong answers with a 'halter' prompt | [Faithful Reasoning Using Large Language Models](https://arxiv.org/abs/2208.14271) | 2022 Aug |
@ -34,7 +34,7 @@ We've written guides and code examples for transitioning from the deprecated API
[Guide: How to transition off the Classifications endpoint](https://help.openai.com/en/articles/6272941-classifications-transition-guide)

* Option 1: transition to fine-tuning **(recommended)**
  * Example code: [Classification.ipynb](../examples/Classification.ipynb)
  * Example code: [Fine-tuned_classification.ipynb](../examples/Fine-tuned_classification.ipynb)
* Option 2: transition to embeddings
  * Example code: [Semantic_text_search_using_embeddings.ipynb](../examples/Semantic_text_search_using_embeddings.ipynb)
* Option 3: reimplement Classifications endpoint functionality