85 Commits

Author SHA1 Message Date
b7cb21689a Initial commit of vector database example with new embeddings 2023-01-05 01:54:46 -08:00
838f000935 Merge pull request #42 from openai/ted/update-embedding-model
updates embedding examples with new embedding model
2022-12-15 10:00:26 -08:00
fd181ec78f updates embedding examples with new embedding model 2022-12-15 09:58:37 -08:00
7de3d50816 Merge pull request #36 from achandmsft/patch-1
Added .default scope to URI to get token using DefaultAzureCredential
2022-12-12 11:24:05 -08:00
aabbdbe28e Merge pull request #40 from pitmonticone/main
Fix a few typos
2022-12-12 11:22:42 -08:00
0009da639d Fix a few typos 2022-12-10 01:18:05 +01:00
5e66437686 Merge pull request #39 from ggdupont/fix/typo_search_example
Fixing typo in parameters usage
2022-12-06 12:39:06 -08:00
6b6e6323e4 Fixing typo in parameters usage 2022-12-06 17:09:44 +01:00
2072d1a1fd Merge pull request #38 from openai/dev/atty/text-davinci-003
Update README for text-davinci-003
2022-11-28 17:34:06 -08:00
e811878082 Update README for text-davinci-003 2022-11-28 16:35:53 -08:00
3c334e70dd Added .default scope to URI to get token using DefaultAzureCredential
Fixing issue encountered when using this in some contexts (like virtual environments in notebooks).
2022-11-18 09:28:51 -08:00
e3395df981 Merge pull request #35 from openai/ted/unit-test-example
adds unit test example
2022-11-15 13:29:08 -08:00
e00797e3e5 adds unit test example 2022-11-15 13:24:11 -08:00
4fd730e78f Merge pull request #34 from termosa/patch-1
Update olympics-1-collect-data.ipynb
2022-11-07 08:59:13 -08:00
1a8111e0ef Update olympics-1-collect-data.ipynb
Fix typo with duplicated "the the"
2022-11-06 16:54:54 +02:00
12ea77eb1b Merge pull request #33 from openai/ted/update-DALL-E-API-example
updates DALL-E API example
2022-11-04 18:37:09 -07:00
1f62a62102 updates DALL-E API example 2022-11-04 18:32:32 -07:00
06ac519c8b Merge pull request #32 from openai/ted/fix_broken_link
fixes two broken links to embedding guide
2022-11-03 11:22:49 -07:00
d932a36398 fixes two broken links to embedding guide 2022-11-03 11:18:13 -07:00
459afa7d9b Merge pull request #30 from viethoangtranduong/patch-1
Nit: Change text in CLF cookbook
2022-10-28 15:30:30 -07:00
0d4989245d Nit: Change text in CLF cookbook
Minor edit to clarify content in the code base
2022-10-29 01:56:06 +07:00
fe60d7f2af Merge pull request #26 from colin-jarvis/main
Adding transaction classification notebooks
2022-10-26 17:50:49 -07:00
c621b46924 Merge branch 'main' of https://github.com/colin-jarvis/openai-cookbook 2022-10-26 17:13:23 +01:00
0ad407b75a Removed helpers 2022-10-26 17:13:15 +01:00
6b536c981a Delete helpers.py 2022-10-26 17:09:09 +01:00
d968557408 Merge branch 'main' of https://github.com/colin-jarvis/openai-cookbook 2022-10-26 16:30:46 +01:00
209c1a12e8 Resolved PR comments from Boris 2022-10-26 16:30:38 +01:00
3ad2df91d8 Merge pull request #29 from openai/ted/fix-broken-qa-link
fixes broken link to QA notebook
2022-10-24 16:33:03 -07:00
e383e243c2 fixes broken link to QA notebook 2022-10-24 16:31:33 -07:00
5ce51d7b4d Merge pull request #28 from openai/ted/restore_qa_notebooks
updates warning formatting to HTML to improve display on GitHub
2022-10-24 14:04:45 -07:00
75aceae6b8 updates warning formatting to HTML to improve display on GitHub 2022-10-24 14:03:11 -07:00
0528302f6d Merge pull request #27 from openai/ted/restore_qa_notebooks
Ted/restore qa notebooks
2022-10-24 13:48:15 -07:00
e3d7091d70 adds warning to QA example code 2022-10-24 13:46:57 -07:00
37e0136ce0 Revert "removes old Q&A example now that a better alternative exists"
This reverts commit 02295444f7.
2022-10-24 13:28:47 -07:00
381070fa4e Merge branch 'openai:main' into main 2022-10-20 23:45:06 +01:00
401f7c7ef0 Added write-up to Clustering for transaction classification notebook 2022-10-20 23:42:19 +01:00
b01900d5d9 Initial commit of transaction classification notebooks 2022-10-20 23:31:42 +01:00
027c9233fe Merge pull request #24 from openai/ted/tiny-fix-to-streaming-example
small updates to streaming example
2022-10-12 15:54:52 -07:00
1b211d2a87 small updates to streaming example 2022-10-12 15:54:19 -07:00
4a117bc0d2 Merge pull request #22 from openai/ted/fix-broken-classification-link
fixes broken classification link
2022-09-30 23:13:54 -07:00
c9ec5d79f3 fixes broken classification link 2022-09-30 23:12:43 -07:00
32596ae85b Removed line from notebook 2022-09-29 18:30:29 +01:00
d538ffc014 Merge pull request #21 from colin-jarvis/main
Added Generations and Variations endpoint examples for DALL-E API
2022-09-29 10:29:19 -07:00
c951d099b9 Removed images from folder 2022-09-29 08:34:34 +01:00
89de43faf4 Fixed Edits endpoint by applying new masking approach 2022-09-29 08:33:24 +01:00
139af20193 Added Generations and Variations endpoint examples for DALL-E API. Edits endpoint example is included but is WIP, getting an error currently 2022-09-28 21:15:02 +01:00
e8e5c39a1c Merge pull request #19 from sorinsuciu-msft/change-scale-type
Change scale_type in the create deployment examples
2022-09-19 10:28:42 -07:00
71cc3f4a9b Change scale_type in the create deployment examples 2022-09-19 14:42:29 +02:00
563f8db035 Merge pull request #18 from openai/ted/removes_old_qa_example
removes old Q&A example now that a better alternative exists
2022-09-15 10:47:41 -07:00
02295444f7 removes old Q&A example now that a better alternative exists 2022-09-15 10:46:27 -07:00
f4a149bf20 Merge pull request #17 from openai/ted/updates-techniques-guide
small update to one example prompt
2022-09-12 21:21:47 -07:00
bd26141162 small update to one example prompt 2022-09-12 21:20:48 -07:00
f601888f18 Merge pull request #16 from openai/ted/updates-techniques-guide
fixes markdown formatting
2022-09-12 19:50:13 -07:00
8c3b8196e7 fixes markdown formatting 2022-09-12 19:49:42 -07:00
1ad5b85a46 Merge pull request #15 from openai/ted/adds-how-to-handle-rate-limits
adds guide for how to handle rate limits
2022-09-12 18:49:33 -07:00
6d31729d05 Merge pull request #14 from openai/ted/updates-techniques-guide
fixes typos and adds more examples
2022-09-12 18:49:12 -07:00
c5f64fa27a fixes typos and adds more examples 2022-09-12 18:48:19 -07:00
853264fc55 adds guide for how to handle rate limits 2022-09-12 16:55:35 -07:00
5704646be0 Merge pull request #12 from openai/ted/adds-techniques-to-improve-reliability
adds guide on techniques to improve reliability
2022-09-02 12:20:10 -07:00
c76bbfcf8d Merge pull request #11 from openai/ted/adds-streaming-example
adds streaming example
2022-09-02 12:19:50 -07:00
e26b8dd3c8 adds guide on techniques to improve reliability 2022-09-02 12:19:11 -07:00
bff53e71ed adds streaming example 2022-09-02 12:15:34 -07:00
5da9f28eb8 Merge pull request #8 from shyamal-anadkat/patch-1
organizes links in README
2022-08-02 17:52:20 -07:00
e1144afc97 organizes links in README 2022-08-02 17:43:49 -07:00
1ca1c2946b Merge pull request #6 from openai/ted/update-examples
Ted/update examples
2022-07-20 17:45:39 -07:00
b40d006c7a update search example 2022-07-20 17:42:02 -07:00
4195d712cc updates azure examples 2022-07-20 17:39:56 -07:00
627fdd8c84 Merge pull request #5 from openai/ted-at-openai/openai-cookbook-updates
Ted at openai/openai cookbook updates
2022-07-12 16:23:33 -07:00
edb02b8bc9 migrates Azure examples from openai-python library 2022-07-12 15:26:37 -07:00
d987d2329a adds prefiltered data and precomputed embeddings to Customizing_embeddings.ipynb 2022-07-12 15:26:19 -07:00
65d0925e59 adds precomputing embeddings, changes engine name to match, fixes typo 2022-07-12 15:26:05 -07:00
2dc64b1567 adds data download from CDN with precomputed embeddings 2022-07-12 15:25:56 -07:00
efa71942c6 moves dbpedia embeddings to data folder 2022-07-12 15:25:43 -07:00
04f7e93c62 adds warning about needing to compute file with embeddings 2022-07-12 15:25:35 -07:00
bd061ad2d4 adds data download from CDN with precomputed embeddings 2022-07-12 15:25:23 -07:00
350b9a7333 adds data download from CDN with precomputed embeddings 2022-07-12 15:25:10 -07:00
6eae26d5cc adds link to full dataset from AG 2022-07-12 15:24:54 -07:00
7dfdb9aa05 lint 2022-07-12 15:24:43 -07:00
b77e9b34e8 clarifies name of Classification example using embeddings 2022-07-12 15:24:27 -07:00
cfe03038bd lint 2022-07-12 15:23:26 -07:00
8b2d68f031 adds precomputed embeddings 2022-07-12 15:22:46 -07:00
2515ddc7b8 adds data for Obtain_dataset.ipynb 2022-07-12 15:22:29 -07:00
25d641d21f renames context_embeddings to document_embeddings 2022-07-12 15:21:44 -07:00
ac406fc126 Update Question_answering_using_embeddings.ipynb 2022-07-11 20:24:33 +03:00
22f25af04f Merge pull request #4 from openai/mike_question_answering_using_embeddings
Question answering using embeddings
2022-06-20 13:16:48 -07:00
76 changed files with 30529 additions and 25380 deletions

.gitignore (5 changes)

@ -127,3 +127,8 @@ dmypy.json
# Pyre type checker
.pyre/
# Data
*transactions*.jsonl
/examples/data/transactions*
*.DS_Store

README.md

@ -1,8 +1,8 @@
# OpenAI Cookbook
This repository shares example code and example prompts for accomplishing common tasks with the [OpenAI API](https://openai.com/api/).
This repository shares example code and example prompts for accomplishing common tasks with the [OpenAI API].
To try these examples yourself, you'll need an OpenAI account. [Create a free account to get started.](https://beta.openai.com/signup)
To try these examples yourself, you'll need an OpenAI account. [Create a free account to get started.][API Signup]
Most code examples are written in Python, though the concepts can be applied in any language.
@ -10,13 +10,13 @@ In the same way that a cookbook's recipes don't span all possible meals or techn
## Related resources
Beyond the code examples here, you can also learn about the [OpenAI API](https://openai.com/api/) from the following resources:
Beyond the code examples here, you can also learn about the [OpenAI API] from the following resources:
* Try out GPT-3 in the [OpenAI Playground](https://beta.openai.com/playground)
* Read about the API in the [OpenAI Documentation](https://beta.openai.com/docs/introduction)
* Discuss the API in the [OpenAI Community Forum](https://community.openai.com/top?period=monthly)
* Look for help in the [OpenAI Help Center](https://help.openai.com/en/)
* See example prompts in the [OpenAI Examples](https://beta.openai.com/examples)
* Try out GPT-3 in the [OpenAI Playground]
* Read about the API in the [OpenAI Documentation]
* Discuss the API in the [OpenAI Community Forum]
* Look for help in the [OpenAI Help Center]
* See example prompts in the [OpenAI Examples]
## Examples, organized by capability
@ -87,7 +87,7 @@ Beyond the code examples here, you can also learn about the [OpenAI API](https:/
## How large language models work
[Large language models](https://openai.com/blog/better-language-models/) are functions that map text to text. Given an input string of text, a large language model tries to predict the text that will come next.
[Large language models][Large language models Blog Post] are functions that map text to text. Given an input string of text, a large language model tries to predict the text that will come next.
The magic of large language models is that by being trained to minimize this prediction error over vast quantities of text, the models end up learning concepts useful for these predictions. For example, they learn concepts like:
@ -102,7 +102,7 @@ The magic of large language models is that by being trained to minimize this pre
None of these capabilities are explicitly programmed in - they all emerge as a result of training.
GPT-3's capabilities now power [hundreds of different software products](https://openai.com/blog/gpt-3-apps/), including productivity apps, education apps, games, and more.
GPT-3's capabilities now power [hundreds of different software products][GPT3 Apps Blog Post], including productivity apps, education apps, games, and more.
## How to control a large language model
@ -120,7 +120,7 @@ An example of each is shown below.
### Instruction prompts
Instruction-following models (e.g., `text-davinci-002` or any model beginning with `text-`) are specially designed to follow instructions. Write your instruction at the top of the prompt (or at the bottom, or both), and the model will do its best to follow the instruction and then stop. Instructions can be detailed, so don't be afraid to write a paragraph explicitly detailing the output you want.
Instruction-following models (e.g., `text-davinci-003` or any model beginning with `text-`) are specially designed to follow instructions. Write your instruction at the top of the prompt (or at the bottom, or both), and the model will do its best to follow the instruction and then stop. Instructions can be detailed, so don't be afraid to write a paragraph explicitly detailing the output you want.
Example instruction prompt:
@ -182,7 +182,7 @@ Output:
### Fine-tuned prompt example
With enough training examples, you can [fine-tune](https://beta.openai.com/docs/guides/fine-tuning) a custom model. In this case, instructions become unnecessary, as the model can learn the task from the training data provided. However, it can be helpful to include separator sequences (e.g., `->` or `###` or any string that doesn't commonly appear in your inputs) to tell the model when the prompt has ended and the output should begin. Without separator sequences, there is a risk that the model continues elaborating on the input text rather than starting on the answer you want to see.
With enough training examples, you can [fine-tune][Fine Tuning Docs] a custom model. In this case, instructions become unnecessary, as the model can learn the task from the training data provided. However, it can be helpful to include separator sequences (e.g., `->` or `###` or any string that doesn't commonly appear in your inputs) to tell the model when the prompt has ended and the output should begin. Without separator sequences, there is a risk that the model continues elaborating on the input text rather than starting on the answer you want to see.
Example fine-tuned prompt (for a model that has been custom trained on similar prompt-completion pairs):
@ -203,7 +203,7 @@ Output:
### More prompt advice
For more prompt examples, visit [OpenAI Examples](https://beta.openai.com/examples).
For more prompt examples, visit [OpenAI Examples][OpenAI Examples].
In general, the input prompt is the best lever for improving model outputs. You can try tricks like:
@ -253,7 +253,7 @@ In general, writing can work with any style of prompt. Experiment to see what wo
| | Advantages | Disadvantages |
| ---------------------------------------------------------- | ----------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| Instruction-following models<br>(e.g., `text-davinci-002`) | Easiest to use | Less creative; less diverse; harder to control tone, length, etc. |
| Instruction-following models<br>(e.g., `text-davinci-003`) | Easiest to use | Less creative; less diverse; harder to control tone, length, etc. |
| Base models<br>(e.g., `davinci`) | More creative | More expensive (as including example demonstrations in the prompt will cost tokens) |
| Fine-tuned models | Can train off of many examples; cheaper than including examples in the prompt | Hard to gather training data; training makes iteration slower and more expensive |
@ -301,7 +301,7 @@ Output:
One
```
If the text you wish to ask about is longer than the token limit (~4,000 tokens for `text-davinci-002` and ~2,000 tokens for earlier models), we recommend splitting the text into smaller pieces, ranking them by relevance, and then asking your question using only the most-relevant-looking pieces.
If the text you wish to ask about is longer than the token limit (~4,000 tokens for `text-davinci-003` and ~2,000 tokens for earlier models), we recommend splitting the text into smaller pieces, ranking them by relevance, and then asking your question using only the most-relevant-looking pieces.
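As a rough sketch of that split-rank-ask flow (an illustration, not the cookbook's Q&A notebook), the pieces might be assembled as follows; `rank_by_relevance` is a hypothetical helper that orders chunks by similarity to the question, e.g. using the embeddings-based search described under "4. Compare text" below:

```python
import openai  # legacy (pre-1.0) openai-python client, as used elsewhere in this repo


def split_into_chunks(text: str, max_chars: int = 4000) -> list[str]:
    # Crude character-based splitting; a real implementation would count tokens instead.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def answer_from_long_text(question: str, long_text: str, rank_by_relevance) -> str:
    chunks = split_into_chunks(long_text)
    # rank_by_relevance(question, chunks) is assumed to return chunks ordered most-relevant first.
    best_chunks = rank_by_relevance(question, chunks)[:3]
    prompt = "\n\n".join(best_chunks) + f"\n\nQuestion: {question}\nAnswer:"
    response = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=200, temperature=0
    )
    return response["choices"][0]["text"].strip()
```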
#### Summarization
@ -360,7 +360,7 @@ People & titles: William Baxter (Assistant Attorney General), Charles L. Brown (
### 3. Edit text
In addition to the [completion API endpoint](https://beta.openai.com/docs/api-reference/completions), OpenAI now offers an [edit API endpoint](https://beta.openai.com/docs/api-reference/edits) ([blog post](https://openai.com/blog/gpt-3-edit-insert/)). In contrast to completions, which only take a single text input, edits take two text inputs: the instruction and the text to be modified.
In addition to the [completion API endpoint][Completion API Docs], OpenAI now offers an [edit API endpoint][Edit API Docs] ([blog post][GPT3 Edit Blog Post]). In contrast to completions, which only take a single text input, edits take two text inputs: the instruction and the text to be modified.
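As a rough sketch (not part of the README diff itself), an edit call with the legacy openai-python client passes those two inputs like this; the instruction and input strings below are illustrative assumptions:

```python
import openai  # legacy (pre-1.0) openai-python client

# The edit endpoint takes two text inputs: an instruction and the text to modify.
# The strings below are made-up examples, not taken from the README.
result = openai.Edit.create(
    model="text-davinci-edit-001",
    instruction="Fix the spelling mistakes",
    input="The quick brown fox jumpt over the lazy dog.",
)
print(result["choices"][0]["text"])
```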
An example edit prompt:
@ -435,7 +435,7 @@ When it comes to translation, large language models particularly shine at combin
### 4. Compare text
The [OpenAI API embeddings endpoint](https://beta.openai.com/docs/guides/embeddings) can be used to measure similarity between pieces of text ([blog post](https://openai.com/blog/introducing-text-and-code-embeddings/)). By leveraging GPT-3's understanding of text, these embeddings [achieved state-of-the-art results](https://arxiv.org/abs/2201.10005) on benchmarks in both unsupervised learning and transfer learning settings.
The [OpenAI API embeddings endpoint][Embeddings Docs] can be used to measure similarity between pieces of text ([blog post][Embeddings Blog Post]). By leveraging GPT-3's understanding of text, these embeddings [achieved state-of-the-art results](https://arxiv.org/abs/2201.10005) on benchmarks in both unsupervised learning and transfer learning settings.
Embeddings can be used for semantic search, recommendations, cluster analysis, near-duplicate detection, and more.
@ -446,11 +446,11 @@ Embeddings can be used for search either by themselves or as a feature in a larg
The simplest way to use embeddings for search is as follows:
* Before the search (precompute):
* Split your text corpus into chunks smaller than the token limit (e.g., ~2,000 tokens)
* Embed each chunk using a 'doc' model (e.g., `text-search-curie-doc-001`)
* Split your text corpus into chunks smaller than the token limit (e.g., <8,000 tokens)
* Embed each chunk
* Store those embeddings in your own database or in a vector search provider like [Pinecone](https://www.pinecone.io) or [Weaviate](https://weaviate.io)
* At the time of the search (live compute):
* Embed the search query using the correponding 'query' model (e.g. `text-search-curie-query-001`)
* Embed the search query
* Find the closest embeddings in your database
* Return the top results, ranked by cosine similarity
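The steps above can be sketched as a minimal in-memory example (an illustration, not code from the cookbook), using the legacy openai-python client and the `text-embedding-ada-002` model referenced in this diff; a production system would keep the precomputed embeddings in a database or vector store such as those listed above:

```python
import numpy as np
import openai  # legacy (pre-1.0) openai-python client

EMBEDDING_MODEL = "text-embedding-ada-002"


def embed(text: str) -> np.ndarray:
    # One embedding per input string; newlines are replaced, as recommended elsewhere in the cookbook.
    response = openai.Embedding.create(input=[text.replace("\n", " ")], engine=EMBEDDING_MODEL)
    return np.array(response["data"][0]["embedding"])


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Before the search (precompute): embed each chunk of the corpus and store the vectors.
corpus = ["First document chunk...", "Second document chunk...", "Third document chunk..."]
corpus_embeddings = [embed(chunk) for chunk in corpus]


# At search time (live compute): embed the query and rank chunks by cosine similarity.
def search(query: str, top_k: int = 3) -> list[tuple[float, str]]:
    query_embedding = embed(query)
    scored = [(cosine_similarity(query_embedding, emb), chunk) for emb, chunk in zip(corpus_embeddings, corpus)]
    return sorted(scored, reverse=True)[:top_k]


print(search("example query"))
```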
@ -460,7 +460,7 @@ In more advanced search systems, the cosine similarity of embeddings can be
#### Recommendations
Recommendations are quite similar to search, except that instead of a free-form text query, the inputs are items in a set. And instead of using pairs of doc-query models, you can use a single symmetric similarity model (e.g., `text-similarity-curie-001`).
Recommendations are quite similar to search, except that instead of a free-form text query, the inputs are items in a set.
An example of how to use embeddings for recommendations is shown in [Recommendation_using_embeddings.ipynb](examples/Recommendation_using_embeddings.ipynb).
@ -470,23 +470,23 @@ Similar to search, these cosine similarity scores can either be used on their ow
Although OpenAI's embedding model weights cannot be fine-tuned, you can still use training data to customize embeddings to your application.
In the following notebook, we provide an example method for customizing your embeddings using training data. The idea of the method is to train a custom matrix to multiply embedding vectors by in order to get new customized embeddings. With good training data, this custom matrix will highlight the features relevant to your training labels and suppress the rest. You can equivalently consider the matrix mulitplication as (a) a modification of the embeddings or (b) a modification of the distance function used to measure the distances between embeddings.
In the following notebook, we provide an example method for customizing your embeddings using training data. The idea of the method is to train a custom matrix to multiply embedding vectors by in order to get new customized embeddings. With good training data, this custom matrix will highlight the features relevant to your training labels and suppress the rest. You can equivalently consider the matrix multiplication as (a) a modification of the embeddings or (b) a modification of the distance function used to measure the distances between embeddings.
* [Customizing_embeddings.ipynb](examples/Customizing_embeddings.ipynb)
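As a loose sketch of that idea (not the notebook's actual training procedure), multiplying embeddings by a custom matrix and measuring similarity in the transformed space might look like this; the matrix here is random purely to keep the example self-contained, whereas the notebook learns it from labeled training data:

```python
import numpy as np

rng = np.random.default_rng(0)

embedding_dim = 1536  # dimension of text-embedding-ada-002 vectors
custom_dim = 256      # dimension of the customized embeddings (illustrative choice)

# In the notebook this matrix is trained from labeled pairs; here it is random for illustration only.
custom_matrix = rng.normal(size=(embedding_dim, custom_dim))


def customize(embedding: np.ndarray) -> np.ndarray:
    # View (a): produce a new, customized embedding.
    return embedding @ custom_matrix


def customized_cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # View (b): equivalently, a modified similarity function on the original embeddings.
    ca, cb = customize(a), customize(b)
    return float(np.dot(ca, cb) / (np.linalg.norm(ca) * np.linalg.norm(cb)))


a = rng.normal(size=embedding_dim)  # stand-ins for real embeddings
b = rng.normal(size=embedding_dim)
print(customized_cosine_similarity(a, b))
```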
## Code Capabilities
Large language models aren't only great at text - they can be great at code too. OpenAI's specialized code model is called [Codex](https://openai.com/blog/openai-codex/).
Large language models aren't only great at text - they can be great at code too. OpenAI's specialized code model is called [Codex].
Codex powers [more than 70 products](https://openai.com/blog/codex-apps/), including:
Codex powers [more than 70 products][Codex Apps Blog Post], including:
* [GitHub Copilot](https://copilot.github.com/) (autocompletes code in VS Code and other IDEs)
* [GitHub Copilot] (autocompletes code in VS Code and other IDEs)
* [Pygma](https://pygma.app/) (turns Figma designs into code)
* [Replit](https://replit.com/) (has an 'Explain code' button and other features)
* [Warp](https://www.warp.dev/) (a smart terminal with AI command search)
* [Machinet](https://machinet.net/) (writes Java unit test templates)
Note that unlike instruction-following text models (e.g., `text-davinci-002`), Codex is *not* trained to follow instructions. As a result, designing good prompts can take more care.
Note that unlike instruction-following text models (e.g., `text-davinci-003`), Codex is *not* trained to follow instructions. As a result, designing good prompts can take more care.
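As a purely illustrative sketch of that kind of careful prompting (the prompt text is an assumption, not an example from the README), a Codex request can be framed as code plus comments for the model to continue:

```python
import openai  # legacy (pre-1.0) openai-python client

# Codex is not instruction-tuned, so the prompt is written as code plus comments
# for the model to continue. The function described here is a made-up example.
prompt = '''# Python 3
# A function that returns the n-th Fibonacci number, computed iteratively.
def fibonacci(n):'''

response = openai.Completion.create(
    model="code-davinci-002",
    prompt=prompt,
    max_tokens=128,
    temperature=0,
    stop=["\n\n"],  # stop at a blank line so the model does not start a new function
)
print(prompt + response["choices"][0]["text"])
```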
### 1. Write code
@ -523,7 +523,7 @@ Code explanation can be applied to many use cases:
* Generating in-code documentation (e.g., Python docstrings, git commit messages)
* Generating out-of-code documentation (e.g., man pages)
* In an interactive code exploration tool
* Communicating program results back to users via a natural langauge interface
* Communicating program results back to users via a natural language interface
An example prompt for explaining code with `code-davinci-002`:
@ -618,9 +618,9 @@ As you can see, `code-davinci-edit-001` was able to successfully reduce the func
The OpenAI API also features code search embeddings, which can measure the relevance of a section of code to a text query, or the similarity between two sections of code.
OpenAI code search embeddings significantly improved the state-of-the-art on the [CodeSearchNet](https://github.com/github/CodeSearchNet) evaluation suite, scoring 93.5% versus the previous record of 77.4%.
OpenAI code search embeddings significantly improved the state-of-the-art on the [CodeSearchNet] evaluation suite, scoring 93.5% versus the previous record of 77.4%.
Read more about OpenAI's code embeddings in the [blog post announcement](https://openai.com/blog/introducing-text-and-code-embeddings/) or [documentation](https://beta.openai.com/docs/guides/embeddings).
Read more about OpenAI's code embeddings in the [blog post announcement][Embeddings Blog Post] or [documentation][Embeddings Docs].
Code embeddings can be useful for use cases such as:
@ -630,3 +630,24 @@ Code embeddings can be useful for use cases such as:
An example of code search is shown in [Code_search.ipynb](examples/Code_search.ipynb).
We haven't written an example of code clustering, but the idea is the same as the text clustering in [Clustering.ipynb](examples/Clustering.ipynb).
[OpenAI API]: https://openai.com/api/
[Embeddings Docs]: https://beta.openai.com/docs/guides/embeddings
[Edit API Docs]: https://beta.openai.com/docs/api-reference/edits
[Completion API Docs]: https://beta.openai.com/docs/api-reference/completions
[Fine Tuning Docs]: https://beta.openai.com/docs/guides/fine-tuning
[CodeSearchNet]: https://github.com/github/CodeSearchNet
[Embeddings Blog Post]: https://openai.com/blog/introducing-text-and-code-embeddings/
[Codex Apps Blog Post]: https://openai.com/blog/codex-apps/
[GPT3 Edit Blog Post]: https://openai.com/blog/gpt-3-edit-insert/
[Large language models Blog Post]: https://openai.com/blog/better-language-models/
[GitHub Copilot]: https://copilot.github.com/
[Codex]: https://openai.com/blog/openai-codex/
[API Signup]: https://beta.openai.com/signup
[GPT3 Apps Blog Post]: https://openai.com/blog/gpt-3-apps/
[OpenAI Playground]: https://beta.openai.com/playground
[OpenAI Documentation]: https://beta.openai.com/docs/introduction
[OpenAI Community Forum]: https://community.openai.com/top?period=monthly
[OpenAI Help Center]: https://help.openai.com/en/
[OpenAI Examples]: https://beta.openai.com/examples

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

examples/Code_search.ipynb

@ -1,12 +1,13 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Code search\n",
"\n",
"We index our own openai-python code repository, and show how it can be searched. We implement a simple version of file parsing and extracting of functions from python files."
"We index our own [openai-python code repository](https://github.com/openai/openai-python), and show how it can be searched. We implement a simple version of file parsing and extracting of functions from python files."
]
},
{
@ -18,8 +19,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of py files: 40\n",
"Total number of functions extracted: 64\n"
"Total number of py files: 51\n",
"Total number of functions extracted: 97\n"
]
}
],
@ -63,18 +64,24 @@
"\n",
"# get user root directory\n",
"root_dir = os.path.expanduser(\"~\")\n",
"# note: for this code to work, the openai-python repo must be downloaded and placed in your root directory\n",
"\n",
"# path to code repository directory\n",
"code_root = root_dir + \"/openai-python\"\n",
"\n",
"code_files = [y for x in os.walk(code_root) for y in glob(os.path.join(x[0], '*.py'))]\n",
"print(\"Total number of py files:\", len(code_files))\n",
"\n",
"if len(code_files) == 0:\n",
" print(\"Double check that you have downloaded the openai-python repo and set the code_root variable correctly.\")\n",
"\n",
"all_funcs = []\n",
"for code_file in code_files:\n",
" funcs = list(get_functions(code_file))\n",
" for func in funcs:\n",
" all_funcs.append(func)\n",
"\n",
"print(\"Total number of functions extracted:\", len(all_funcs))\n"
"print(\"Total number of functions extracted:\", len(all_funcs))"
]
},
{
@ -119,38 +126,38 @@
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>def semantic_search(engine, query, documents):...</td>\n",
" <td>semantic_search</td>\n",
" <td>/examples/semanticsearch/semanticsearch.py</td>\n",
" <td>[-0.038976121693849564, -0.0031428150832653046...</td>\n",
" <td>def _console_log_level():\\n if openai.log i...</td>\n",
" <td>_console_log_level</td>\n",
" <td>/openai/util.py</td>\n",
" <td>[0.03389773145318031, -0.004390408284962177, 0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>def main():\\n parser = argparse.ArgumentPar...</td>\n",
" <td>main</td>\n",
" <td>/examples/semanticsearch/semanticsearch.py</td>\n",
" <td>[-0.024289356544613838, -0.017748363316059113,...</td>\n",
" <td>def log_debug(message, **params):\\n msg = l...</td>\n",
" <td>log_debug</td>\n",
" <td>/openai/util.py</td>\n",
" <td>[-0.004034275189042091, 0.004895383026450872, ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>def get_candidates(\\n prompt: str,\\n sto...</td>\n",
" <td>get_candidates</td>\n",
" <td>/examples/codex/backtranslation.py</td>\n",
" <td>[-0.04161201789975166, -0.0169310811907053, 0....</td>\n",
" <td>def log_info(message, **params):\\n msg = lo...</td>\n",
" <td>log_info</td>\n",
" <td>/openai/util.py</td>\n",
" <td>[0.004882764536887407, 0.0033515947870910168, ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>def rindex(lst: List, value: str) -&gt; int:\\n ...</td>\n",
" <td>rindex</td>\n",
" <td>/examples/codex/backtranslation.py</td>\n",
" <td>[-0.027255680412054062, -0.007931121625006199,...</td>\n",
" <td>def log_warn(message, **params):\\n msg = lo...</td>\n",
" <td>log_warn</td>\n",
" <td>/openai/util.py</td>\n",
" <td>[0.002535992069169879, -0.010829543694853783, ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>def eval_candidate(\\n candidate_answer: str...</td>\n",
" <td>eval_candidate</td>\n",
" <td>/examples/codex/backtranslation.py</td>\n",
" <td>[-0.00999179296195507, -0.01640152558684349, 0...</td>\n",
" <td>def logfmt(props):\\n def fmt(key, val):\\n ...</td>\n",
" <td>logfmt</td>\n",
" <td>/openai/util.py</td>\n",
" <td>[0.016732551157474518, 0.017367802560329437, 0...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
@ -158,25 +165,18 @@
],
"text/plain": [
" code function_name \\\n",
"0 def semantic_search(engine, query, documents):... semantic_search \n",
"1 def main():\\n parser = argparse.ArgumentPar... main \n",
"2 def get_candidates(\\n prompt: str,\\n sto... get_candidates \n",
"3 def rindex(lst: List, value: str) -> int:\\n ... rindex \n",
"4 def eval_candidate(\\n candidate_answer: str... eval_candidate \n",
"0 def _console_log_level():\\n if openai.log i... _console_log_level \n",
"1 def log_debug(message, **params):\\n msg = l... log_debug \n",
"2 def log_info(message, **params):\\n msg = lo... log_info \n",
"3 def log_warn(message, **params):\\n msg = lo... log_warn \n",
"4 def logfmt(props):\\n def fmt(key, val):\\n ... logfmt \n",
"\n",
" filepath \\\n",
"0 /examples/semanticsearch/semanticsearch.py \n",
"1 /examples/semanticsearch/semanticsearch.py \n",
"2 /examples/codex/backtranslation.py \n",
"3 /examples/codex/backtranslation.py \n",
"4 /examples/codex/backtranslation.py \n",
"\n",
" code_embedding \n",
"0 [-0.038976121693849564, -0.0031428150832653046... \n",
"1 [-0.024289356544613838, -0.017748363316059113,... \n",
"2 [-0.04161201789975166, -0.0169310811907053, 0.... \n",
"3 [-0.027255680412054062, -0.007931121625006199,... \n",
"4 [-0.00999179296195507, -0.01640152558684349, 0... "
" filepath code_embedding \n",
"0 /openai/util.py [0.03389773145318031, -0.004390408284962177, 0... \n",
"1 /openai/util.py [-0.004034275189042091, 0.004895383026450872, ... \n",
"2 /openai/util.py [0.004882764536887407, 0.0033515947870910168, ... \n",
"3 /openai/util.py [0.002535992069169879, -0.010829543694853783, ... \n",
"4 /openai/util.py [0.016732551157474518, 0.017367802560329437, 0... "
]
},
"execution_count": 2,
@ -188,12 +188,109 @@
"from openai.embeddings_utils import get_embedding\n",
"\n",
"df = pd.DataFrame(all_funcs)\n",
"df['code_embedding'] = df['code'].apply(lambda x: get_embedding(x, engine='code-search-babbage-code-001'))\n",
"df['code_embedding'] = df['code'].apply(lambda x: get_embedding(x, engine='text-embedding-ada-002'))\n",
"df['filepath'] = df['filepath'].apply(lambda x: x.replace(code_root, \"\"))\n",
"df.to_csv(\"output/code_search_openai-python.csv\", index=False)\n",
"df.to_csv(\"data/code_search_openai-python.csv\", index=False)\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/openai/tests/test_endpoints.py:test_completions score=0.826\n",
"def test_completions():\n",
" result = openai.Completion.create(prompt=\"This was a test\", n=5, engine=\"ada\")\n",
" assert len(result.choices) == 5\n",
"\n",
"\n",
"----------------------------------------------------------------------\n",
"/openai/tests/test_endpoints.py:test_completions_model score=0.811\n",
"def test_completions_model():\n",
" result = openai.Completion.create(prompt=\"This was a test\", n=5, model=\"ada\")\n",
" assert len(result.choices) == 5\n",
" assert result.model.startswith(\"ada\")\n",
"\n",
"\n",
"----------------------------------------------------------------------\n",
"/openai/tests/test_endpoints.py:test_completions_multiple_prompts score=0.808\n",
"def test_completions_multiple_prompts():\n",
" result = openai.Completion.create(\n",
" prompt=[\"This was a test\", \"This was another test\"], n=5, engine=\"ada\"\n",
" )\n",
" assert len(result.choices) == 10\n",
"\n",
"\n",
"----------------------------------------------------------------------\n"
]
}
],
"source": [
"from openai.embeddings_utils import cosine_similarity\n",
"\n",
"def search_functions(df, code_query, n=3, pprint=True, n_lines=7):\n",
" embedding = get_embedding(code_query, engine='text-embedding-ada-002')\n",
" df['similarities'] = df.code_embedding.apply(lambda x: cosine_similarity(x, embedding))\n",
"\n",
" res = df.sort_values('similarities', ascending=False).head(n)\n",
" if pprint:\n",
" for r in res.iterrows():\n",
" print(r[1].filepath+\":\"+r[1].function_name + \" score=\" + str(round(r[1].similarities, 3)))\n",
" print(\"\\n\".join(r[1].code.split(\"\\n\")[:n_lines]))\n",
" print('-'*70)\n",
" return res\n",
"\n",
"res = search_functions(df, 'Completions API tests', n=3)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/openai/validators.py:format_inferrer_validator score=0.751\n",
"def format_inferrer_validator(df):\n",
" \"\"\"\n",
" This validator will infer the likely fine-tuning format of the data, and display it to the user if it is classification.\n",
" It will also suggest to use ada and explain train/validation split benefits.\n",
" \"\"\"\n",
" ft_type = infer_task_type(df)\n",
" immediate_msg = None\n",
"----------------------------------------------------------------------\n",
"/openai/validators.py:get_validators score=0.748\n",
"def get_validators():\n",
" return [\n",
" num_examples_validator,\n",
" lambda x: necessary_column_validator(x, \"prompt\"),\n",
" lambda x: necessary_column_validator(x, \"completion\"),\n",
" additional_column_validator,\n",
" non_empty_field_validator,\n",
"----------------------------------------------------------------------\n",
"/openai/validators.py:infer_task_type score=0.738\n",
"def infer_task_type(df):\n",
" \"\"\"\n",
" Infer the likely fine-tuning task type from the data\n",
" \"\"\"\n",
" CLASSIFICATION_THRESHOLD = 3 # min_average instances of each class\n",
" if sum(df.prompt.str.len()) == 0:\n",
" return \"open-ended generation\"\n",
"----------------------------------------------------------------------\n"
]
}
],
"source": [
"res = search_functions(df, 'fine-tuning input data validation logic', n=3)"
]
},
{
"cell_type": "code",
"execution_count": 5,
@ -203,48 +300,35 @@
"name": "stdout",
"output_type": "stream",
"text": [
"/openai/tests/test_endpoints.py:test_completions_multiple_prompts score=0.681\n",
"def test_completions_multiple_prompts():\n",
" result = openai.Completion.create(\n",
" prompt=[\"This was a test\", \"This was another test\"], n=5, engine=\"ada\"\n",
" )\n",
" assert len(result.choices) == 10\n",
"\n",
"/openai/validators.py:get_common_xfix score=0.793\n",
"def get_common_xfix(series, xfix=\"suffix\"):\n",
" \"\"\"\n",
" Finds the longest common suffix or prefix of all the values in a series\n",
" \"\"\"\n",
" common_xfix = \"\"\n",
" while True:\n",
" common_xfixes = (\n",
" series.str[-(len(common_xfix) + 1) :]\n",
" if xfix == \"suffix\"\n",
" else series.str[: len(common_xfix) + 1]\n",
"----------------------------------------------------------------------\n",
"/openai/tests/test_endpoints.py:test_completions score=0.675\n",
"def test_completions():\n",
" result = openai.Completion.create(prompt=\"This was a test\", n=5, engine=\"ada\")\n",
" assert len(result.choices) == 5\n",
"/openai/validators.py:common_completion_suffix_validator score=0.778\n",
"def common_completion_suffix_validator(df):\n",
" \"\"\"\n",
" This validator will suggest to add a common suffix to the completion if one doesn't already exist in case of classification or conditional generation.\n",
" \"\"\"\n",
" error_msg = None\n",
" immediate_msg = None\n",
" optional_msg = None\n",
" optional_fn = None\n",
"\n",
"\n",
"----------------------------------------------------------------------\n",
"/openai/tests/test_api_requestor.py:test_requestor_sets_request_id score=0.635\n",
"def test_requestor_sets_request_id(mocker: MockerFixture) -> None:\n",
" # Fake out 'requests' and confirm that the X-Request-Id header is set.\n",
"\n",
" got_headers = {}\n",
"\n",
" def fake_request(self, *args, **kwargs):\n",
" nonlocal got_headers\n",
" ft_type = infer_task_type(df)\n",
"----------------------------------------------------------------------\n"
]
}
],
"source": [
"from openai.embeddings_utils import cosine_similarity\n",
"\n",
"def search_functions(df, code_query, n=3, pprint=True, n_lines=7):\n",
" embedding = get_embedding(code_query, engine='code-search-babbage-text-001')\n",
" df['similarities'] = df.code_embedding.apply(lambda x: cosine_similarity(x, embedding))\n",
"\n",
" res = df.sort_values('similarities', ascending=False).head(n)\n",
" if pprint:\n",
" for r in res.iterrows():\n",
" print(r[1].filepath+\":\"+r[1].function_name + \" score=\" + str(round(r[1].similarities, 3)))\n",
" print(\"\\n\".join(r[1].code.split(\"\\n\")[:n_lines]))\n",
" print('-'*70)\n",
" return res\n",
"res = search_functions(df, 'Completions API tests', n=3)\n"
"res = search_functions(df, 'find common suffix', n=2, n_lines=10)"
]
},
{
@ -256,90 +340,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"/openai/validators.py:format_inferrer_validator score=0.655\n",
"def format_inferrer_validator(df):\n",
" \"\"\"\n",
" This validator will infer the likely fine-tuning format of the data, and display it to the user if it is classification.\n",
" It will also suggest to use ada and explain train/validation split benefits.\n",
" \"\"\"\n",
" ft_type = infer_task_type(df)\n",
" immediate_msg = None\n",
"----------------------------------------------------------------------\n",
"/openai/validators.py:long_examples_validator score=0.649\n",
"def long_examples_validator(df):\n",
" \"\"\"\n",
" This validator will suggest to the user to remove examples that are too long.\n",
" \"\"\"\n",
" immediate_msg = None\n",
" optional_msg = None\n",
" optional_fn = None\n",
"----------------------------------------------------------------------\n",
"/openai/validators.py:non_empty_completion_validator score=0.646\n",
"def non_empty_completion_validator(df):\n",
" \"\"\"\n",
" This validator will ensure that no completion is empty.\n",
" \"\"\"\n",
" necessary_msg = None\n",
" necessary_fn = None\n",
" immediate_msg = None\n",
"----------------------------------------------------------------------\n"
]
}
],
"source": [
"res = search_functions(df, 'fine-tuning input data validation logic', n=3)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/openai/validators.py:common_completion_suffix_validator score=0.665\n",
"def common_completion_suffix_validator(df):\n",
" \"\"\"\n",
" This validator will suggest to add a common suffix to the completion if one doesn't already exist in case of classification or conditional generation.\n",
" \"\"\"\n",
" error_msg = None\n",
" immediate_msg = None\n",
" optional_msg = None\n",
" optional_fn = None\n",
"\n",
" ft_type = infer_task_type(df)\n",
"----------------------------------------------------------------------\n",
"/openai/validators.py:get_outfnames score=0.66\n",
"def get_outfnames(fname, split):\n",
" suffixes = [\"_train\", \"_valid\"] if split else [\"\"]\n",
" i = 0\n",
" while True:\n",
" index_suffix = f\" ({i})\" if i > 0 else \"\"\n",
" candidate_fnames = [\n",
" fname.split(\".\")[0] + \"_prepared\" + suffix + index_suffix + \".jsonl\"\n",
" for suffix in suffixes\n",
" ]\n",
" if not any(os.path.isfile(f) for f in candidate_fnames):\n",
"----------------------------------------------------------------------\n"
]
}
],
"source": [
"res = search_functions(df, 'find common suffix', n=2, n_lines=10)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/openai/cli.py:tools_register score=0.651\n",
"/openai/cli.py:tools_register score=0.773\n",
"def tools_register(parser):\n",
" subparsers = parser.add_subparsers(\n",
" title=\"Tools\", help=\"Convenience client side tools\"\n",
@ -374,8 +375,9 @@
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
},
"kernelspec": {
"display_name": "Python 3.7.3 64-bit ('base': conda)",
"name": "python3"
"display_name": "openai-cookbook",
"language": "python",
"name": "openai-cookbook"
},
"language_info": {
"codemirror_mode": {
@ -387,7 +389,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.9.6"
},
"orig_nbformat": 4
},

File diff suppressed because it is too large

examples/Get_embeddings.ipynb

@ -17,7 +17,7 @@
{
"data": {
"text/plain": [
"12288"
"1536"
]
},
"execution_count": 1,
@ -28,8 +28,11 @@
"source": [
"import openai\n",
"\n",
"embedding = openai.Embedding.create(input=\"Sample document text goes here\", engine=\"text-similarity-davinci-001\")['data'][0]['embedding']\n",
"len(embedding)"
"embedding = openai.Embedding.create(\n",
" input=\"Your text goes here\",\n",
" engine=\"text-embedding-ada-002\"\n",
")[\"data\"][0][\"embedding\"]\n",
"len(embedding)\n"
]
},
{
@ -41,7 +44,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"1024\n"
"1536\n"
]
}
],
@ -49,43 +52,25 @@
"import openai\n",
"from tenacity import retry, wait_random_exponential, stop_after_attempt\n",
"\n",
"\n",
"@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))\n",
"def get_embedding(text: str, engine=\"text-similarity-davinci-001\") -> List[float]:\n",
"def get_embedding(text: str, engine=\"text-embedding-ada-002\") -> list[float]:\n",
"\n",
" # replace newlines, which can negatively affect performance.\n",
" text = text.replace(\"\\n\", \" \")\n",
"\n",
" return openai.Embedding.create(input=[text], engine=engine)[\"data\"][0][\"embedding\"]\n",
"\n",
"embedding = get_embedding(\"Sample query text goes here\", engine=\"text-search-ada-query-001\")\n",
"print(len(embedding))"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1024\n"
]
}
],
"source": [
"embedding = get_embedding(\"Sample document text goes here\", engine=\"text-search-ada-doc-001\")\n",
"print(len(embedding))"
"\n",
"embedding = get_embedding(\"Your text goes here\", engine=\"text-embedding-ada-002\")\n",
"print(len(embedding))\n"
]
}
],
"metadata": {
"interpreter": {
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
},
"kernelspec": {
"display_name": "Python 3.7.3 64-bit ('base': conda)",
"display_name": "Python 3.9.9 ('openai')",
"language": "python",
"name": "python3"
},
"language_info": {
@ -98,9 +83,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.9.9"
},
"orig_nbformat": 4
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,
"nbformat_minor": 2

examples/How_to_handle_rate_limits.ipynb

@ -0,0 +1,604 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to handle rate limits\n",
"\n",
"When you call the OpenAI API repeatedly, you may encounter error messages that say `429: 'Too Many Requests'` or `RateLimitError`. These error messages come from exceeding the API's rate limits.\n",
"\n",
"Rate limits are a common practice for APIs, and they're put in place for a few different reasons.\n",
"\n",
"- First, they help protect against abuse or misuse of the API. For example, a malicious actor could flood the API with requests in an attempt to overload it or cause disruptions in service. By setting rate limits, OpenAI can prevent this kind of activity.\n",
"- Second, rate limits help ensure that everyone has fair access to the API. If one person or organization makes an excessive number of requests, it could bog down the API for everyone else. By throttling the number of requests that a single user can make, OpenAI ensures that everyone has an opportunity to use the API without experiencing slowdowns.\n",
"- Lastly, rate limits can help OpenAI manage the aggregate load on its infrastructure. If requests to the API increase dramatically, it could tax the servers and cause performance issues. By setting rate limits, OpenAI can help maintain a smooth and consistent experience for all users.\n",
"\n",
"Although hitting rate limits can be frustrating, rate limits exist to protect the reliable operation of the API for its users.\n",
"\n",
"In this guide, we'll share some tips for avoiding and handling rate limit errors."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Default rate limits\n",
"\n",
"As of Sep 2022, the default rate limits are:\n",
"\n",
"<table>\n",
"<thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>Text Completion &amp; Embedding endpoints</th>\n",
" <th>Code &amp; Edit endpoints</th>\n",
" </tr>\n",
"</thead>\n",
"<tbody>\n",
" <tr>\n",
" <td>Free trial users</td>\n",
" <td>\n",
" <ul>\n",
" <li>20 requests / minute</li>\n",
" <li>150,000 tokens / minute</li>\n",
" </ul>\n",
" </td>\n",
" <td>\n",
" <ul>\n",
" <li>20 requests / minute</li>\n",
" <li>150,000 tokens / minute</li>\n",
" </ul>\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td>Pay-as-you-go users (in your first 48 hours)</td>\n",
" <td>\n",
" <ul>\n",
" <li>60 requests / minute</li>\n",
" <li>250,000 davinci tokens / minute (and proportionally more for smaller models)</li>\n",
" </ul>\n",
" </td>\n",
" <td>\n",
" <ul>\n",
" <li>20 requests / minute</li>\n",
" <li>150,000 tokens / minute</li>\n",
" </ul>\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td>Pay-as-you-go users (after your first 48 hours)</td>\n",
" <td>\n",
" <ul>\n",
" <li>3,000 requests / minute</li>\n",
" <li>250,000 davinci tokens / minute (and proportionally more for smaller models)</li>\n",
" </ul>\n",
" </td>\n",
" <td>\n",
" <ul>\n",
" <li>20 requests / minute</li>\n",
" <li>150,000 tokens / minute</li>\n",
" </ul>\n",
" </td>\n",
" </tr>\n",
"</tbody>\n",
"</table>\n",
"\n",
"For reference, 1,000 tokens is roughly a page of text.\n",
"\n",
"### Other rate limit resources\n",
"\n",
"Read more about OpenAI's rate limits in the [OpenAI Help Center](https://help.openai.com/en/):\n",
"\n",
"- [Is API usage subject to any rate limits?](https://help.openai.com/en/articles/5955598-is-api-usage-subject-to-any-rate-limits)\n",
"- [How can I solve 429: 'Too Many Requests' errors?](https://help.openai.com/en/articles/5955604-how-can-i-solve-429-too-many-requests-errors)\n",
"\n",
"### Requesting a rate limit increase\n",
"\n",
"If you'd like your organization's rate limit increased, please feel free to reach out to <support@openai.com> with the following information:\n",
"\n",
"- The model(s) you need increased limits on\n",
"- The estimated rate of requests\n",
"- The reason for the increase"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example rate limit error\n",
"\n",
"A rate limit error will occur when API requests are sent too quickly. If using the OpenAI Python library, they will look something like:\n",
"\n",
"```\n",
"RateLimitError: Rate limit reached for default-codex in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min. Contact support@openai.com if you continue to have issues or if youd like to request an increase.\n",
"```\n",
"\n",
"Below is example code for triggering a rate limit error."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import openai # for making OpenAI API requests\n",
"\n",
"# request a bunch of completions in a loop\n",
"for _ in range(100):\n",
" openai.Completion.create(\n",
" model=\"code-cushman-001\",\n",
" prompt=\"def magic_function():\\n\\t\",\n",
" max_tokens=10,\n",
" )\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to avoid rate limit errors\n",
"\n",
"### Retrying with exponential backoff\n",
"\n",
"One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff. Retrying with exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated. This continues until the request is successful or until a maximum number of retries is reached.\n",
"\n",
"This approach has many benefits:\n",
"\n",
"- Automatic retries means you can recover from rate limit errors without crashes or missing data\n",
"- Exponential backoff means that your first retries can be tried quickly, while still benefiting from longer delays if your first few retries fail\n",
"- Adding random jitter to the delay helps retries from all hitting at the same time\n",
"\n",
"Note that unsuccessful requests contribute to your per-minute limit, so continuously resending a request wont work.\n",
"\n",
"Below are a few example solutions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Example #1: Using the Tenacity library\n",
"\n",
"[Tenacity](https://tenacity.readthedocs.io/en/latest/) is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.\n",
"\n",
"To add exponential backoff to your requests, you can use the `tenacity.retry` [decorator](https://peps.python.org/pep-0318/). The following example uses the `tenacity.wait_random_exponential` function to add random exponential backoff to a request.\n",
"\n",
"Note that the Tenacity library is a third-party tool, and OpenAI makes no guarantees about its reliability or security."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<OpenAIObject text_completion id=cmpl-5oowO391reUW8RGVfFyzBM1uBs4A5 at 0x10d8cae00> JSON: {\n",
" \"choices\": [\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 0,\n",
" \"logprobs\": null,\n",
" \"text\": \" a little girl dreamed of becoming a model.\\n\\nNowadays, that dream\"\n",
" }\n",
" ],\n",
" \"created\": 1662793900,\n",
" \"id\": \"cmpl-5oowO391reUW8RGVfFyzBM1uBs4A5\",\n",
" \"model\": \"text-davinci-002\",\n",
" \"object\": \"text_completion\",\n",
" \"usage\": {\n",
" \"completion_tokens\": 16,\n",
" \"prompt_tokens\": 5,\n",
" \"total_tokens\": 21\n",
" }\n",
"}"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import openai # for OpenAI API calls\n",
"from tenacity import (\n",
" retry,\n",
" stop_after_attempt,\n",
" wait_random_exponential,\n",
") # for exponential backoff\n",
"\n",
"\n",
"@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))\n",
"def completion_with_backoff(**kwargs):\n",
" return openai.Completion.create(**kwargs)\n",
"\n",
"\n",
"completion_with_backoff(model=\"text-davinci-002\", prompt=\"Once upon a time,\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Example #2: Using the backoff library\n",
"\n",
"Another library that provides function decorators for backoff and retry is [backoff](https://pypi.org/project/backoff/).\n",
"\n",
"Like Tenacity, the backoff library is a third-party tool, and OpenAI makes no guarantees about its reliability or security."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<OpenAIObject text_completion id=cmpl-5oowPhIdUvshEsF1rBhhwE9KFfI3M at 0x111043680> JSON: {\n",
" \"choices\": [\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 0,\n",
" \"logprobs\": null,\n",
" \"text\": \" two children lived in a poor country village. In the winter, the temperature would\"\n",
" }\n",
" ],\n",
" \"created\": 1662793901,\n",
" \"id\": \"cmpl-5oowPhIdUvshEsF1rBhhwE9KFfI3M\",\n",
" \"model\": \"text-davinci-002\",\n",
" \"object\": \"text_completion\",\n",
" \"usage\": {\n",
" \"completion_tokens\": 16,\n",
" \"prompt_tokens\": 5,\n",
" \"total_tokens\": 21\n",
" }\n",
"}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import backoff # for exponential backoff\n",
"import openai # for OpenAI API calls\n",
"\n",
"\n",
"@backoff.on_exception(backoff.expo, openai.error.RateLimitError)\n",
"def completions_with_backoff(**kwargs):\n",
" return openai.Completion.create(**kwargs)\n",
"\n",
"\n",
"completions_with_backoff(model=\"text-davinci-002\", prompt=\"Once upon a time,\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Example 3: Manual backoff implementation\n",
"\n",
"If you don't want to use third-party libraries, you can implement your own backoff logic."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<OpenAIObject text_completion id=cmpl-5oowRsCXv3AkUgVJyyo3TQrVq7hIT at 0x111024220> JSON: {\n",
" \"choices\": [\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 0,\n",
" \"logprobs\": null,\n",
" \"text\": \" a man decided to greatly improve his karma by turning his life around.\\n\\n\"\n",
" }\n",
" ],\n",
" \"created\": 1662793903,\n",
" \"id\": \"cmpl-5oowRsCXv3AkUgVJyyo3TQrVq7hIT\",\n",
" \"model\": \"text-davinci-002\",\n",
" \"object\": \"text_completion\",\n",
" \"usage\": {\n",
" \"completion_tokens\": 16,\n",
" \"prompt_tokens\": 5,\n",
" \"total_tokens\": 21\n",
" }\n",
"}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# imports\n",
"import random\n",
"import time\n",
"\n",
"import openai\n",
"\n",
"# define a retry decorator\n",
"def retry_with_exponential_backoff(\n",
" func,\n",
" initial_delay: float = 1,\n",
" exponential_base: float = 2,\n",
" jitter: bool = True,\n",
" max_retries: int = 10,\n",
" errors: tuple = (openai.error.RateLimitError,),\n",
"):\n",
" \"\"\"Retry a function with exponential backoff.\"\"\"\n",
"\n",
" def wrapper(*args, **kwargs):\n",
" # Initialize variables\n",
" num_retries = 0\n",
" delay = initial_delay\n",
"\n",
" # Loop until a successful response or max_retries is hit or an exception is raised\n",
" while True:\n",
" try:\n",
" return func(*args, **kwargs)\n",
"\n",
" # Retry on specified errors\n",
" except errors as e:\n",
" # Increment retries\n",
" num_retries += 1\n",
"\n",
" # Check if max retries has been reached\n",
" if num_retries > max_retries:\n",
" raise Exception(\n",
" f\"Maximum number of retries ({max_retries}) exceeded.\"\n",
" )\n",
"\n",
" # Increment the delay\n",
" delay *= exponential_base * (1 + jitter * random.random())\n",
"\n",
" # Sleep for the delay\n",
" time.sleep(delay)\n",
"\n",
" # Raise exceptions for any errors not specified\n",
" except Exception as e:\n",
" raise e\n",
"\n",
" return wrapper\n",
"\n",
"\n",
"@retry_with_exponential_backoff\n",
"def completions_with_backoff(**kwargs):\n",
" return openai.Completion.create(**kwargs)\n",
"\n",
"\n",
"completions_with_backoff(model=\"text-davinci-002\", prompt=\"Once upon a time,\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to maximize throughput of batch processing given rate limits\n",
"\n",
"If you're processing real-time requests from users, backoff and retry is a great strategy to minimize latency while avoiding rate limit errors.\n",
"\n",
"However, if you're processing large volumes of batch data, where throughput matters more than latency, there are a few other things you can do in addition to backoff and retry.\n",
"\n",
"### Proactively adding delay between requests\n",
"\n",
"If you are constantly hitting the rate limit, then backing off, then hitting the rate limit again, then backing off again, it's possible that a good fraction of your request budget will be 'wasted' on requests that need to be retried. This limits your processing throughput, given a fixed rate limit.\n",
"\n",
"Here, one potential solution is to calculate your rate limit and add a delay equal to its reciprocal (e.g., if your rate limit 20 requests per minute, add a delay of 3 seconds to each request). This can help you operate near the rate limit ceiling without hitting it and incurring wasted requests.\n",
"\n",
"#### Example of adding delay to a request"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<OpenAIObject text_completion id=cmpl-5oowVVZnAzdCPtUJ0rifeamtLcZRp at 0x11b2c7680> JSON: {\n",
" \"choices\": [\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 0,\n",
" \"logprobs\": null,\n",
" \"text\": \" there was an idyllic little farm that sat by a babbling brook\"\n",
" }\n",
" ],\n",
" \"created\": 1662793907,\n",
" \"id\": \"cmpl-5oowVVZnAzdCPtUJ0rifeamtLcZRp\",\n",
" \"model\": \"text-davinci-002\",\n",
" \"object\": \"text_completion\",\n",
" \"usage\": {\n",
" \"completion_tokens\": 16,\n",
" \"prompt_tokens\": 5,\n",
" \"total_tokens\": 21\n",
" }\n",
"}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# imports\n",
"import time\n",
"import openai\n",
"\n",
"# Define a function that adds a delay to a Completion API call\n",
"def delayed_completion(delay_in_seconds: float = 1, **kwargs):\n",
" \"\"\"Delay a completion by a specified amount of time.\"\"\"\n",
"\n",
" # Sleep for the delay\n",
" time.sleep(delay_in_seconds)\n",
"\n",
" # Call the Completion API and return the result\n",
" return openai.Completion.create(**kwargs)\n",
"\n",
"\n",
"# Calculate the delay based on your rate limit\n",
"rate_limit_per_minute = 20\n",
"delay = 60.0 / rate_limit_per_minute\n",
"\n",
"delayed_completion(\n",
" delay_in_seconds=delay,\n",
" model=\"text-davinci-002\",\n",
" prompt=\"Once upon a time,\"\n",
")\n"
]
},
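{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal illustrative sketch, the cell below applies the same proactive delay across a whole list of prompts using the `delayed_completion` helper defined above; the `prompts` list and the rate limit value are placeholder assumptions, not recommendations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# illustrative sketch: pace a series of requests at roughly the rate limit\n",
"rate_limit_per_minute = 20  # assumed rate limit, for illustration only\n",
"delay = 60.0 / rate_limit_per_minute\n",
"\n",
"prompts = [\"Once upon a time,\", \"In a galaxy far, far away,\"]  # illustrative prompts\n",
"\n",
"responses = []\n",
"for p in prompts:\n",
"    # each call sleeps for `delay` seconds before hitting the API\n",
"    responses.append(\n",
"        delayed_completion(delay_in_seconds=delay, model=\"text-davinci-002\", prompt=p)\n",
"    )\n",
"\n",
"print(f\"Collected {len(responses)} responses\")\n"
]
},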
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"### Batching requests\n",
"\n",
"The OpenAI API has separate limits for requests per minute and tokens per minute.\n",
"\n",
"If you're hitting the limit on requests per minute, but have headroom on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This will allow you to process more tokens per minute, especially with the smaller models.\n",
"\n",
"Sending in a batch of prompts works exactly the same as a normal API call, except that pass in a list of strings to `prompt` parameter instead of a single string.\n",
"\n",
"**Warning:** the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the `index` field.\n",
"\n",
"#### Example without batching"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Once upon a time, before there were grandiloquent tales of the massacre at Fort Mims, there were stories of\n",
"Once upon a time, a full-sized search and rescue was created. However, CIDIs are the addition of requiring\n",
"Once upon a time, Schubert was hot with the films. “Schubert sings of honey, flowers,\n",
"Once upon a time, you could watch these films on your VCR, sometimes years after their initial theatrical release, and there\n",
"Once upon a time, there was a forest. In that forest, the forest animals ruled. The forest animals had their homes\n",
"Once upon a time, there were two programs that complained about false positive scans. Peacock and Midnight Manager alike, only\n",
"Once upon a time, a long, long time ago, tragedy struck. it was the darkest of nights, and there was\n",
"Once upon a time, when Adam was a perfect little gentleman, he was presented at Court as a guarantee of good character.\n",
"Once upon a time, Adam and Eve made a mistake. They ate the fruit from the tree of immortality and split the consequences\n",
"Once upon a time, there was a set of programming fundamental principles known as the \"X model.\" This is a set of\n"
]
}
],
"source": [
"import openai # for making OpenAI API requests\n",
"\n",
"\n",
"num_stories = 10\n",
"prompt = \"Once upon a time,\"\n",
"\n",
"# serial example, with one story completion per request\n",
"for _ in range(num_stories):\n",
" response = openai.Completion.create(\n",
" model=\"curie\",\n",
" prompt=prompt,\n",
" max_tokens=20,\n",
" )\n",
"\n",
" # print story\n",
" print(prompt + response.choices[0].text)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Example with batching"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Once upon a time, there were two sisters, Eliza Pickering and Ariana 'Ari' Lucas. When these lovely\n",
"Once upon a time, Keene was stung by a worm — actually, probably a python — snaking through his leg\n",
"Once upon a time, there was a professor of physics during the depression. It was difficult, during this time, to get\n",
"Once upon a time, before you got sick, you told stories to all and sundry, and your listeners believed in you\n",
"Once upon a time, there was one very old nice donkey. He was incredibly smart, in a very old, kind of\n",
"Once upon a time, the property of a common lodging house was a common cup for all the inhabitants. Betimes a constant\n",
"Once upon a time, in an unspecified country, there was a witch who had an illegal product. It was highly effective,\n",
"Once upon a time, a long time ago, I turned 13, my beautiful dog Duncan swept me up into his jaws like\n",
"Once upon a time, as a thoroughly reformed creature from an army of Nazis, he took On Judgement Day myself and his\n",
"Once upon a time, Capcom made a game for the Atari VCS called Missile Command. While it was innovative at the time\n"
]
}
],
"source": [
"import openai # for making OpenAI API requests\n",
"\n",
"\n",
"num_stories = 10\n",
"prompts = [\"Once upon a time,\"] * num_stories\n",
"\n",
"# batched example, with 10 stories completions per request\n",
"response = openai.Completion.create(\n",
" model=\"curie\",\n",
" prompt=prompts,\n",
" max_tokens=20,\n",
")\n",
"\n",
"# match completions to prompts by index\n",
"stories = [\"\"] * len(prompts)\n",
"for choice in response.choices:\n",
" stories[choice.index] = prompts[choice.index] + choice.text\n",
"\n",
"# print stories\n",
"for story in stories:\n",
" print(story)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.9 ('openai')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -0,0 +1,369 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to stream completions\n",
"\n",
"By default, when you send a prompt to the OpenAI Completions endpoint, it computes the entire completion and sends it back in a single response.\n",
"\n",
"If you're generating very long completions from a davinci-level model, waiting for the response can take many seconds. As of Aug 2022, responses from `text-davinci-002` typically take something like ~1 second plus ~2 seconds per 100 completion tokens.\n",
"\n",
"If you want to get the response faster, you can 'stream' the completion as it's being generated. This allows you to start printing or otherwise processing the beginning of the completion before the entire completion is finished.\n",
"\n",
"To stream completions, set `stream=True` when calling the Completions endpoint. This will return an object that streams back text as [data-only server-sent events](https://app.mode.com/openai/reports/4fce5ba22b5b/runs/f518a0be4495).\n",
"\n",
"Note that using `stream=True` in a production application makes it more difficult to moderate the content of the completions, which has implications for [approved usage](https://beta.openai.com/docs/usage-guidelines).\n",
"\n",
"Below is a Python code example of how to receive streaming completions."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"import openai # for OpenAI API calls\n",
"import time # for measuring time savings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A typical completion request\n",
"\n",
"With a typical Completions API call, the text is first computed and then returned all at once."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Full response received 7.32 seconds after request\n",
"Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100\n"
]
}
],
"source": [
"# Example of an OpenAI Completion request\n",
"# https://beta.openai.com/docs/api-reference/completions/create\n",
"\n",
"# record the time before the request is sent\n",
"start_time = time.time()\n",
"\n",
"# send a Completion request to count to 100\n",
"response = openai.Completion.create(\n",
" model='text-davinci-002',\n",
" prompt='1,2,3,',\n",
" max_tokens=193,\n",
" temperature=0,\n",
")\n",
"\n",
"# calculate the time it took to receive the response\n",
"response_time = time.time() - start_time\n",
"\n",
"# extract the text from the response\n",
"completion_text = response['choices'][0]['text']\n",
"\n",
"# print the time delay and text received\n",
"print(f\"Full response received {response_time:.2f} seconds after request\")\n",
"print(f\"Full text received: {completion_text}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A streaming completion request\n",
"\n",
"With a streaming Completions API call, the text is sent back via a series of events. In Python, you can iterate over these events with a `for` loop."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Text received: 4 (0.16 seconds after request)\n",
"Text received: , (0.19 seconds after request)\n",
"Text received: 5 (0.21 seconds after request)\n",
"Text received: , (0.24 seconds after request)\n",
"Text received: 6 (0.27 seconds after request)\n",
"Text received: , (0.29 seconds after request)\n",
"Text received: 7 (0.32 seconds after request)\n",
"Text received: , (0.35 seconds after request)\n",
"Text received: 8 (0.37 seconds after request)\n",
"Text received: , (0.40 seconds after request)\n",
"Text received: 9 (0.43 seconds after request)\n",
"Text received: , (0.46 seconds after request)\n",
"Text received: 10 (0.48 seconds after request)\n",
"Text received: , (0.51 seconds after request)\n",
"Text received: 11 (0.54 seconds after request)\n",
"Text received: , (0.56 seconds after request)\n",
"Text received: 12 (0.59 seconds after request)\n",
"Text received: , (0.62 seconds after request)\n",
"Text received: 13 (0.64 seconds after request)\n",
"Text received: , (0.67 seconds after request)\n",
"Text received: 14 (0.70 seconds after request)\n",
"Text received: , (0.72 seconds after request)\n",
"Text received: 15 (0.75 seconds after request)\n",
"Text received: , (0.78 seconds after request)\n",
"Text received: 16 (0.84 seconds after request)\n",
"Text received: , (0.84 seconds after request)\n",
"Text received: 17 (0.86 seconds after request)\n",
"Text received: , (0.89 seconds after request)\n",
"Text received: 18 (0.91 seconds after request)\n",
"Text received: , (0.94 seconds after request)\n",
"Text received: 19 (1.41 seconds after request)\n",
"Text received: , (1.41 seconds after request)\n",
"Text received: 20 (1.41 seconds after request)\n",
"Text received: , (1.41 seconds after request)\n",
"Text received: 21 (1.41 seconds after request)\n",
"Text received: , (1.41 seconds after request)\n",
"Text received: 22 (1.41 seconds after request)\n",
"Text received: , (1.41 seconds after request)\n",
"Text received: 23 (1.41 seconds after request)\n",
"Text received: , (1.41 seconds after request)\n",
"Text received: 24 (1.46 seconds after request)\n",
"Text received: , (1.46 seconds after request)\n",
"Text received: 25 (1.46 seconds after request)\n",
"Text received: , (1.55 seconds after request)\n",
"Text received: 26 (1.61 seconds after request)\n",
"Text received: , (1.65 seconds after request)\n",
"Text received: 27 (1.66 seconds after request)\n",
"Text received: , (1.70 seconds after request)\n",
"Text received: 28 (1.72 seconds after request)\n",
"Text received: , (1.75 seconds after request)\n",
"Text received: 29 (1.78 seconds after request)\n",
"Text received: , (2.05 seconds after request)\n",
"Text received: 30 (2.08 seconds after request)\n",
"Text received: , (2.13 seconds after request)\n",
"Text received: 31 (2.16 seconds after request)\n",
"Text received: , (2.20 seconds after request)\n",
"Text received: 32 (2.26 seconds after request)\n",
"Text received: , (2.28 seconds after request)\n",
"Text received: 33 (2.31 seconds after request)\n",
"Text received: , (2.35 seconds after request)\n",
"Text received: 34 (2.38 seconds after request)\n",
"Text received: , (2.54 seconds after request)\n",
"Text received: 35 (2.55 seconds after request)\n",
"Text received: , (2.59 seconds after request)\n",
"Text received: 36 (2.61 seconds after request)\n",
"Text received: , (2.64 seconds after request)\n",
"Text received: 37 (2.67 seconds after request)\n",
"Text received: , (2.71 seconds after request)\n",
"Text received: 38 (2.86 seconds after request)\n",
"Text received: , (2.89 seconds after request)\n",
"Text received: 39 (2.92 seconds after request)\n",
"Text received: , (2.95 seconds after request)\n",
"Text received: 40 (2.99 seconds after request)\n",
"Text received: , (3.01 seconds after request)\n",
"Text received: 41 (3.04 seconds after request)\n",
"Text received: , (3.08 seconds after request)\n",
"Text received: 42 (3.15 seconds after request)\n",
"Text received: , (3.33 seconds after request)\n",
"Text received: 43 (3.36 seconds after request)\n",
"Text received: , (3.43 seconds after request)\n",
"Text received: 44 (3.47 seconds after request)\n",
"Text received: , (3.50 seconds after request)\n",
"Text received: 45 (3.53 seconds after request)\n",
"Text received: , (3.56 seconds after request)\n",
"Text received: 46 (3.59 seconds after request)\n",
"Text received: , (3.63 seconds after request)\n",
"Text received: 47 (3.65 seconds after request)\n",
"Text received: , (3.68 seconds after request)\n",
"Text received: 48 (3.71 seconds after request)\n",
"Text received: , (3.77 seconds after request)\n",
"Text received: 49 (3.77 seconds after request)\n",
"Text received: , (3.79 seconds after request)\n",
"Text received: 50 (3.82 seconds after request)\n",
"Text received: , (3.85 seconds after request)\n",
"Text received: 51 (3.89 seconds after request)\n",
"Text received: , (3.91 seconds after request)\n",
"Text received: 52 (3.93 seconds after request)\n",
"Text received: , (3.96 seconds after request)\n",
"Text received: 53 (3.98 seconds after request)\n",
"Text received: , (4.04 seconds after request)\n",
"Text received: 54 (4.05 seconds after request)\n",
"Text received: , (4.07 seconds after request)\n",
"Text received: 55 (4.10 seconds after request)\n",
"Text received: , (4.13 seconds after request)\n",
"Text received: 56 (4.19 seconds after request)\n",
"Text received: , (4.20 seconds after request)\n",
"Text received: 57 (4.20 seconds after request)\n",
"Text received: , (4.23 seconds after request)\n",
"Text received: 58 (4.26 seconds after request)\n",
"Text received: , (4.30 seconds after request)\n",
"Text received: 59 (4.31 seconds after request)\n",
"Text received: , (4.59 seconds after request)\n",
"Text received: 60 (4.61 seconds after request)\n",
"Text received: , (4.64 seconds after request)\n",
"Text received: 61 (4.67 seconds after request)\n",
"Text received: , (4.72 seconds after request)\n",
"Text received: 62 (4.73 seconds after request)\n",
"Text received: , (4.76 seconds after request)\n",
"Text received: 63 (4.80 seconds after request)\n",
"Text received: , (4.83 seconds after request)\n",
"Text received: 64 (4.86 seconds after request)\n",
"Text received: , (4.89 seconds after request)\n",
"Text received: 65 (4.92 seconds after request)\n",
"Text received: , (4.94 seconds after request)\n",
"Text received: 66 (4.97 seconds after request)\n",
"Text received: , (5.00 seconds after request)\n",
"Text received: 67 (5.03 seconds after request)\n",
"Text received: , (5.06 seconds after request)\n",
"Text received: 68 (5.09 seconds after request)\n",
"Text received: , (5.14 seconds after request)\n",
"Text received: 69 (5.16 seconds after request)\n",
"Text received: , (5.19 seconds after request)\n",
"Text received: 70 (5.22 seconds after request)\n",
"Text received: , (5.28 seconds after request)\n",
"Text received: 71 (5.30 seconds after request)\n",
"Text received: , (5.33 seconds after request)\n",
"Text received: 72 (5.36 seconds after request)\n",
"Text received: , (5.38 seconds after request)\n",
"Text received: 73 (5.41 seconds after request)\n",
"Text received: , (5.44 seconds after request)\n",
"Text received: 74 (5.48 seconds after request)\n",
"Text received: , (5.51 seconds after request)\n",
"Text received: 75 (5.53 seconds after request)\n",
"Text received: , (5.56 seconds after request)\n",
"Text received: 76 (5.60 seconds after request)\n",
"Text received: , (5.62 seconds after request)\n",
"Text received: 77 (5.65 seconds after request)\n",
"Text received: , (5.68 seconds after request)\n",
"Text received: 78 (5.71 seconds after request)\n",
"Text received: , (5.77 seconds after request)\n",
"Text received: 79 (5.77 seconds after request)\n",
"Text received: , (5.79 seconds after request)\n",
"Text received: 80 (5.82 seconds after request)\n",
"Text received: , (5.85 seconds after request)\n",
"Text received: 81 (5.88 seconds after request)\n",
"Text received: , (5.92 seconds after request)\n",
"Text received: 82 (5.93 seconds after request)\n",
"Text received: , (5.97 seconds after request)\n",
"Text received: 83 (5.98 seconds after request)\n",
"Text received: , (6.01 seconds after request)\n",
"Text received: 84 (6.04 seconds after request)\n",
"Text received: , (6.07 seconds after request)\n",
"Text received: 85 (6.09 seconds after request)\n",
"Text received: , (6.11 seconds after request)\n",
"Text received: 86 (6.14 seconds after request)\n",
"Text received: , (6.17 seconds after request)\n",
"Text received: 87 (6.19 seconds after request)\n",
"Text received: , (6.22 seconds after request)\n",
"Text received: 88 (6.24 seconds after request)\n",
"Text received: , (6.27 seconds after request)\n",
"Text received: 89 (6.30 seconds after request)\n",
"Text received: , (6.31 seconds after request)\n",
"Text received: 90 (6.35 seconds after request)\n",
"Text received: , (6.36 seconds after request)\n",
"Text received: 91 (6.40 seconds after request)\n",
"Text received: , (6.44 seconds after request)\n",
"Text received: 92 (6.46 seconds after request)\n",
"Text received: , (6.49 seconds after request)\n",
"Text received: 93 (6.51 seconds after request)\n",
"Text received: , (6.54 seconds after request)\n",
"Text received: 94 (6.56 seconds after request)\n",
"Text received: , (6.59 seconds after request)\n",
"Text received: 95 (6.62 seconds after request)\n",
"Text received: , (6.64 seconds after request)\n",
"Text received: 96 (6.68 seconds after request)\n",
"Text received: , (6.68 seconds after request)\n",
"Text received: 97 (6.70 seconds after request)\n",
"Text received: , (6.73 seconds after request)\n",
"Text received: 98 (6.75 seconds after request)\n",
"Text received: , (6.78 seconds after request)\n",
"Text received: 99 (6.90 seconds after request)\n",
"Text received: , (6.92 seconds after request)\n",
"Text received: 100 (7.25 seconds after request)\n",
"Full response received 7.25 seconds after request\n",
"Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100\n"
]
}
],
"source": [
"# Example of an OpenAI Completion request, using the stream=True option\n",
"# https://beta.openai.com/docs/api-reference/completions/create\n",
"\n",
"# record the time before the request is sent\n",
"start_time = time.time()\n",
"\n",
"# send a Completion request to count to 100\n",
"response = openai.Completion.create(\n",
" model='text-davinci-002',\n",
" prompt='1,2,3,',\n",
" max_tokens=193,\n",
" temperature=0,\n",
" stream=True, # this time, we set stream=True\n",
")\n",
"\n",
"# create variables to collect the stream of events\n",
"collected_events = []\n",
"completion_text = ''\n",
"# iterate through the stream of events\n",
"for event in response:\n",
" event_time = time.time() - start_time # calculate the time delay of the event\n",
" collected_events.append(event) # save the event response\n",
" event_text = event['choices'][0]['text'] # extract the text\n",
" completion_text += event_text # append the text\n",
" print(f\"Text received: {event_text} ({event_time:.2f} seconds after request)\") # print the delay and text\n",
"\n",
"# print the time delay and text received\n",
"print(f\"Full response received {event_time:.2f} seconds after request\")\n",
"print(f\"Full text received: {completion_text}\")"
]
},
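{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal illustrative sketch, the cell below re-sends the same streaming request while recording a timestamp for each event, then reports the time to the first token and the average gap between subsequent tokens. The variable names (`token_times`, `time_to_first_token`) are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# illustrative sketch: measure time-to-first-token and the average inter-token gap\n",
"start_time = time.time()\n",
"\n",
"response = openai.Completion.create(\n",
"    model='text-davinci-002',\n",
"    prompt='1,2,3,',\n",
"    max_tokens=193,\n",
"    temperature=0,\n",
"    stream=True,\n",
")\n",
"\n",
"# record the arrival time of each streamed event\n",
"token_times = []\n",
"for event in response:  # each event corresponds to one streamed token\n",
"    token_times.append(time.time() - start_time)\n",
"\n",
"time_to_first_token = token_times[0]\n",
"average_gap = (token_times[-1] - token_times[0]) / max(len(token_times) - 1, 1)\n",
"print(f\"Time to first token: {time_to_first_token:.2f} seconds\")\n",
"print(f\"Average gap between subsequent tokens: {average_gap:.3f} seconds\")\n"
]
},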
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Time comparison\n",
"\n",
"In the example above, both requests took about 7 seconds to fully complete.\n",
"\n",
"However, with the streaming request, you would have received the first token after 0.16 seconds, and subsequent tokens after about ~0.035 seconds each."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.9 ('openai')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large

View File

@ -11,6 +11,14 @@
"We will combine the review summary and review text into a single combined text. The model will encode this combined text and it will output a single vector embedding."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"To run this notebook, you will need to install: pandas, openai, transformers, plotly, matplotlib, scikit-learn, torch (transformer dep), torchvision, and scipy."
]
},
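{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal (unpinned) install from within the notebook might look like the cell below; adjust it to your own environment as needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# illustrative install command; versions are intentionally left unpinned\n",
"!pip install pandas openai transformers plotly matplotlib scikit-learn torch torchvision scipy"
]
},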
{
"cell_type": "code",
"execution_count": 1,
@ -45,57 +53,48 @@
" <th>Text</th>\n",
" <th>combined</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1303862400</td>\n",
" <td>B001E4KFG0</td>\n",
" <td>A3SGXH7AUHU8GW</td>\n",
" <th>0</th>\n",
" <td>1351123200</td>\n",
" <td>B003XPF9BO</td>\n",
" <td>A3R7JR3FMEBXQB</td>\n",
" <td>5</td>\n",
" <td>Good Quality Dog Food</td>\n",
" <td>I have bought several of the Vitality canned d...</td>\n",
" <td>Title: Good Quality Dog Food; Content: I have ...</td>\n",
" <td>where does one start...and stop... with a tre...</td>\n",
" <td>Wanted to save some to bring to my Chicago fam...</td>\n",
" <td>Title: where does one start...and stop... wit...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1346976000</td>\n",
" <td>B00813GRG4</td>\n",
" <td>A1D87F6ZCVE5NK</td>\n",
" <th>1</th>\n",
" <td>1351123200</td>\n",
" <td>B003JK537S</td>\n",
" <td>A3JBPC3WFUT5ZP</td>\n",
" <td>1</td>\n",
" <td>Not as Advertised</td>\n",
" <td>Product arrived labeled as Jumbo Salted Peanut...</td>\n",
" <td>Title: Not as Advertised; Content: Product arr...</td>\n",
" <td>Arrived in pieces</td>\n",
" <td>Not pleased at all. When I opened the box, mos...</td>\n",
" <td>Title: Arrived in pieces; Content: Not pleased...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Time ProductId UserId Score Summary \\\n",
"Id \n",
"1 1303862400 B001E4KFG0 A3SGXH7AUHU8GW 5 Good Quality Dog Food \n",
"2 1346976000 B00813GRG4 A1D87F6ZCVE5NK 1 Not as Advertised \n",
" Time ProductId UserId Score \\\n",
"0 1351123200 B003XPF9BO A3R7JR3FMEBXQB 5 \n",
"1 1351123200 B003JK537S A3JBPC3WFUT5ZP 1 \n",
"\n",
" Summary \\\n",
"0 where does one start...and stop... with a tre... \n",
"1 Arrived in pieces \n",
"\n",
" Text \\\n",
"Id \n",
"1 I have bought several of the Vitality canned d... \n",
"2 Product arrived labeled as Jumbo Salted Peanut... \n",
"0 Wanted to save some to bring to my Chicago fam... \n",
"1 Not pleased at all. When I opened the box, mos... \n",
"\n",
" combined \n",
"Id \n",
"1 Title: Good Quality Dog Food; Content: I have ... \n",
"2 Title: Not as Advertised; Content: Product arr... "
"0 Title: where does one start...and stop... wit... \n",
"1 Title: Arrived in pieces; Content: Not pleased... "
]
},
"execution_count": 1,
@ -106,7 +105,8 @@
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv('input/Reviews.csv', index_col=0)\n",
"input_datapath = 'data/fine_food_reviews_1k.csv' # to save space, we provide a pre-filtered dataset\n",
"df = pd.read_csv(input_datapath, index_col=0)\n",
"df = df[['Time', 'ProductId', 'UserId', 'Score', 'Summary', 'Text']]\n",
"df = df.dropna()\n",
"df['combined'] = \"Title: \" + df.Summary.str.strip() + \"; Content: \" + df.Text.str.strip()\n",
@ -139,7 +139,7 @@
"\n",
"# remove reviews that are too long\n",
"df['n_tokens'] = df.combined.apply(lambda x: len(tokenizer.encode(x)))\n",
"df = df[df.n_tokens<2000].tail(1_000)\n",
"df = df[df.n_tokens<8000].tail(1_000)\n",
"len(df)"
]
},
@ -156,22 +156,22 @@
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"from openai.embeddings_utils import get_embedding\n",
"# Ensure you have your API key set in your environment per the README: https://github.com/openai/openai-python#usage\n",
"\n",
"# This will take just under 10 minutes\n",
"df['babbage_similarity'] = df.combined.apply(lambda x: get_embedding(x, engine='text-similarity-babbage-001'))\n",
"df['babbage_search'] = df.combined.apply(lambda x: get_embedding(x, engine='text-search-babbage-doc-001'))\n",
"df.to_csv('output/embedded_1k_reviews.csv')"
"# This will take just between 5 and 10 minutes\n",
"df['ada_similarity'] = df.combined.apply(lambda x: get_embedding(x, engine='text-embedding-ada-002'))\n",
"df['ada_search'] = df.combined.apply(lambda x: get_embedding(x, engine='text-embedding-ada-002'))\n",
"df.to_csv('data/fine_food_reviews_with_embeddings_1k.csv')"
]
}
],
"metadata": {
"interpreter": {
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
},
"kernelspec": {
"display_name": "Python 3.7.3 64-bit ('base': conda)",
"name": "python3"
"display_name": "openai-cookbook",
"language": "python",
"name": "openai-cookbook"
},
"language_info": {
"codemirror_mode": {
@ -183,9 +183,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
"version": "3.9.6"
},
"orig_nbformat": 4
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 2

View File

@ -9,12 +9,12 @@
"\n",
"Many use cases require GPT-3 to respond to user questions with insightful answers. For example, a customer support chatbot may need to provide answers to common questions. The GPT models have picked up a lot of general knowledge in training, but we often need to ingest and use a large library of more specific information.\n",
"\n",
"In this notebook we will demonstrate a method for enabling GPT-3 able to answer questions using a library of text as a reference, by using document embeddings and retrieval. We'll be using a dataset of Wikipedia articles about the 2020 Summer Olympic Games. Please see [this notebook](examples/fine-tuned_qa/olympics-1-collect-data.ipynb) to follow the data gathering process."
"In this notebook we will demonstrate a method for enabling GPT-3 able to answer questions using a library of text as a reference, by using document embeddings and retrieval. We'll be using a dataset of Wikipedia articles about the 2020 Summer Olympic Games. Please see [this notebook](fine-tuned_qa/olympics-1-collect-data.ipynb) to follow the data gathering process."
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 1,
"id": "9e3839a6-9146-4f60-b74b-19abbc24278d",
"metadata": {},
"outputs": [],
@ -38,7 +38,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 2,
"id": "a167516c-7c19-4bda-afa5-031aa0ae13bb",
"metadata": {},
"outputs": [
@ -48,7 +48,7 @@
"\"The 2020 Summer Olympics men's high jump was won by Mariusz Przybylski of Poland.\""
]
},
"execution_count": 5,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
@ -83,7 +83,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 3,
"id": "a5451371-17fe-4ef3-aa02-affcf4edb0e0",
"metadata": {},
"outputs": [
@ -93,7 +93,7 @@
"\"Sorry, I don't know.\""
]
},
"execution_count": 6,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@ -125,7 +125,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 4,
"id": "fceaf665-2602-4788-bc44-9eb256a6f955",
"metadata": {},
"outputs": [
@ -135,7 +135,7 @@
"\"Gianmarco Tamberi and Mutaz Essa Barshim won the 2020 Summer Olympics men's high jump.\""
]
},
"execution_count": 7,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@ -195,12 +195,12 @@
"\n",
"We plan to use document embeddings to fetch the most relevant part of parts of our document library and insert them into the prompt that we provide to GPT-3. We therefore need to break up the document library into \"sections\" of context, which can be searched and retrieved separately. \n",
"\n",
"Sections should be large enough to contain enough information to answer a question; but small enough to fit one or several into the GPT-3 prompt. We find that approximately a paragraph of text is usually a good length, but you should experiment for your particular use case. In this example, Wikipedia articles are already grouped into semantically related headers, so we will use these to define our sections. This preprocessing has already been done in [this notebook](examples/fine-tuned_qa/olympics-1-collect-data.ipynb), so we will load the results and use them."
"Sections should be large enough to contain enough information to answer a question; but small enough to fit one or several into the GPT-3 prompt. We find that approximately a paragraph of text is usually a good length, but you should experiment for your particular use case. In this example, Wikipedia articles are already grouped into semantically related headers, so we will use these to define our sections. This preprocessing has already been done in [this notebook](fine-tuned_qa/olympics-1-collect-data.ipynb), so we will load the results and use them."
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 5,
"id": "cc9c8d69-e234-48b4-87e3-935970e1523a",
"metadata": {},
"outputs": [
@ -245,33 +245,34 @@
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Volleyball at the 2020 Summer Olympics Women's tournament</th>\n",
" <th>Format</th>\n",
" <td>The preliminary round was a competition betwee...</td>\n",
" <td>132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Rugby sevens at the 2020 Summer Olympics Women's qualification</th>\n",
" <th>South America</th>\n",
" <td>Sudamérica Rugby held a tournament on 12 June...</td>\n",
" <th>Nordic combined at the 2016 Winter Youth Olympics</th>\n",
" <th>Summary</th>\n",
" <td>Nordic combined at the 2016 Winter Youth Olymp...</td>\n",
" <td>56</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">Canoeing at the 2020 Summer Olympics Men's slalom C-1</th>\n",
" <th>Competition format</th>\n",
" <td>Slalom canoeing uses a three-round format, wit...</td>\n",
" <td>136</td>\n",
" <th>Morocco at the 2020 Summer Olympics</th>\n",
" <th>Judo</th>\n",
" <td>Morocco qualified two female judoka for each o...</td>\n",
" <td>106</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Qualification</th>\n",
" <td>A National Olympic Committee (NOC) entered onl...</td>\n",
" <td>171</td>\n",
" <th>Guinea-Bissau at the 2020 Summer Olympics</th>\n",
" <th>Wrestling</th>\n",
" <td>Guinea-Bissau qualified two wrestlers for each...</td>\n",
" <td>69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tunisia at the 2020 Summer Olympics</th>\n",
" <th>Athletics</th>\n",
" <td>Tunisian athletes further achieved the entry s...</td>\n",
" <td>48</td>\n",
" <th>Rome bid for the 2020 Summer Olympics</th>\n",
" <th>History</th>\n",
" <td>The Italian National Olympic Committee (CONI) ...</td>\n",
" <td>738</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Italy at the 2020 Summer Olympics</th>\n",
" <th>Slalom</th>\n",
" <td>Italian canoeists qualified one boat for each ...</td>\n",
" <td>76</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
@ -280,22 +281,22 @@
"text/plain": [
" content \\\n",
"title heading \n",
"Volleyball at the 2020 Summer Olympics Women'... Format The preliminary round was a competition betwee... \n",
"Rugby sevens at the 2020 Summer Olympics Wome... South America Sudamérica Rugby held a tournament on 12 June... \n",
"Canoeing at the 2020 Summer Olympics Men's sl... Competition format Slalom canoeing uses a three-round format, wit... \n",
" Qualification A National Olympic Committee (NOC) entered onl... \n",
"Tunisia at the 2020 Summer Olympics Athletics Tunisian athletes further achieved the entry s... \n",
"Nordic combined at the 2016 Winter Youth Olympics Summary Nordic combined at the 2016 Winter Youth Olymp... \n",
"Morocco at the 2020 Summer Olympics Judo Morocco qualified two female judoka for each o... \n",
"Guinea-Bissau at the 2020 Summer Olympics Wrestling Guinea-Bissau qualified two wrestlers for each... \n",
"Rome bid for the 2020 Summer Olympics History The Italian National Olympic Committee (CONI) ... \n",
"Italy at the 2020 Summer Olympics Slalom Italian canoeists qualified one boat for each ... \n",
"\n",
" tokens \n",
"title heading \n",
"Volleyball at the 2020 Summer Olympics Women'... Format 132 \n",
"Rugby sevens at the 2020 Summer Olympics Wome... South America 56 \n",
"Canoeing at the 2020 Summer Olympics Men's sl... Competition format 136 \n",
" Qualification 171 \n",
"Tunisia at the 2020 Summer Olympics Athletics 48 "
"Nordic combined at the 2016 Winter Youth Olympics Summary 56 \n",
"Morocco at the 2020 Summer Olympics Judo 106 \n",
"Guinea-Bissau at the 2020 Summer Olympics Wrestling 69 \n",
"Rome bid for the 2020 Summer Olympics History 738 \n",
"Italy at the 2020 Summer Olympics Slalom 76 "
]
},
"execution_count": 8,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@ -315,16 +316,16 @@
"id": "a17b88b9-7ea2-491e-9727-12617c74a77d",
"metadata": {},
"source": [
"We preprocess the document sections by creating an embedding vector for each section. An embedding is a vector of numbers that helps us understand how semantically similar or different the texts are. The closer two embeddings are to each other, the more similar are their contents. See the [documentation on OpenAI embeddings](https://beta.api.openai.org/docs/guides/embeddings/) for more information.\n",
"We preprocess the document sections by creating an embedding vector for each section. An embedding is a vector of numbers that helps us understand how semantically similar or different the texts are. The closer two embeddings are to each other, the more similar are their contents. See the [documentation on OpenAI embeddings](https://beta.openai.com/docs/guides/embeddings) for more information.\n",
"\n",
"This indexing stage can be executed offline and only runs once to precompute the indexes for the dataset so that each piece of content can be retrieved later. Since this is a small example, we will store and search the embeddings locally. If you have a larger dataset, consider using a vector search engine like [Pinecone](https://www.pinecone.io/) or [Weaviate](https://github.com/semi-technologies/weaviate) to power the search.\n",
"\n",
"For the purposes of this tutorial we chose to use Curie embeddings, which are 4096-dimensional embeddings at a very good price and performance point. Since we will be using these embeddings for retrieval, well use the \"search\" embeddings (see the [documentation](https://beta.api.openai.org/docs/guides/embeddings/))."
"For the purposes of this tutorial we chose to use Curie embeddings, which are 4096-dimensional embeddings at a very good price and performance point. Since we will be using these embeddings for retrieval, well use the \"search\" embeddings (see the [documentation](https://beta.openai.com/docs/guides/embeddings))."
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 6,
"id": "4b874907-5109-4eef-ad9a-add4367925a3",
"metadata": {},
"outputs": [],
@ -337,7 +338,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 7,
"id": "ba475f30-ef7f-431c-b60d-d5970b62ad09",
"metadata": {},
"outputs": [],
@ -368,7 +369,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 8,
"id": "737266aa-cbe7-4691-87c1-fce8a31632f1",
"metadata": {},
"outputs": [],
@ -398,7 +399,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 9,
"id": "ab50bfca-cb02-41c6-b338-4400abe1d86e",
"metadata": {},
"outputs": [],
@ -412,7 +413,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 10,
"id": "b9a8c713-c8a9-47dc-85a4-871ee1395566",
"metadata": {},
"outputs": [
@ -446,7 +447,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 11,
"id": "dcd680e9-f194-4180-b14f-fc357498eb92",
"metadata": {},
"outputs": [],
@ -476,7 +477,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 12,
"id": "e3a27d73-f47f-480d-b336-079414f749cb",
"metadata": {},
"outputs": [
@ -495,7 +496,7 @@
" (\"Athletics at the 2020 Summer Olympics Women's long jump\", 'Summary'))]"
]
},
"execution_count": 15,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@ -506,7 +507,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 13,
"id": "729c2ce7-8540-4ab2-bb3a-76c4dfcb689c",
"metadata": {},
"outputs": [
@ -525,7 +526,7 @@
" (\"Athletics at the 2020 Summer Olympics Women's pole vault\", 'Summary'))]"
]
},
"execution_count": 16,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@ -554,7 +555,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 14,
"id": "b763ace2-1946-48e0-8ff1-91ba335d47a0",
"metadata": {},
"outputs": [
@ -564,7 +565,7 @@
"'Context separator contains 3 tokens'"
]
},
"execution_count": 18,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
@ -581,7 +582,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 15,
"id": "0c5c0509-eeb9-4552-a5d4-6ace04ef73dd",
"metadata": {},
"outputs": [],
@ -618,7 +619,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 16,
"id": "f614045a-3917-4b28-9643-7e0c299ec1a7",
"metadata": {},
"outputs": [
@ -670,7 +671,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 17,
"id": "b0edfec7-9243-4573-92e0-253d31c771ad",
"metadata": {},
"outputs": [],
@ -685,7 +686,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 18,
"id": "9c1c9a69-848e-4099-a90d-c8da36c153d5",
"metadata": {},
"outputs": [],
@ -715,7 +716,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 19,
"id": "c233e449-bf33-4c9e-b095-6a4dd278c8fd",
"metadata": {},
"outputs": [
@ -735,7 +736,7 @@
"'Gianmarco Tamberi and Mutaz Essa Barshim emerged as joint winners of the event following a tie between both of them as they cleared 2.37m.'"
]
},
"execution_count": 24,
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
@ -760,7 +761,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 20,
"id": "1127867b-2884-44bb-9439-0e8ae171c835",
"metadata": {},
"outputs": [
@ -785,7 +786,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 21,
"id": "720d9e0b-b189-4101-91ee-babf736199e6",
"metadata": {},
"outputs": [
@ -810,7 +811,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 22,
"id": "4e8e51cc-e4eb-4557-9e09-2929d4df5b7f",
"metadata": {},
"outputs": [
@ -837,7 +838,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 23,
"id": "37c83519-e3c6-4c44-8b4a-98cbb3a5f5ba",
"metadata": {},
"outputs": [
@ -870,7 +871,7 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 24,
"id": "26a1a9ef-e1ee-4f80-a1b1-6164ccfa5bac",
"metadata": {},
"outputs": [
@ -897,7 +898,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 25,
"id": "9fba8a63-eb81-4661-ae17-59bb5e2933d6",
"metadata": {},
"outputs": [
@ -928,7 +929,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 26,
"id": "2d4c693b-cdb9-4f4c-bd1b-f77b29097a1f",
"metadata": {},
"outputs": [
@ -961,7 +962,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3.9.9 ('openai')",
"language": "python",
"name": "python3"
},
@ -976,6 +977,11 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
},
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,

File diff suppressed because it is too large

View File

@ -13,14 +13,14 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Babbage similarity embedding performance on 1k Amazon reviews: mse=0.38, mae=0.39\n"
"Ada similarity embedding performance on 1k Amazon reviews: mse=0.60, mae=0.51\n"
]
}
],
@ -32,46 +32,51 @@
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import mean_squared_error, mean_absolute_error\n",
"\n",
"df = pd.read_csv('output/embedded_1k_reviews.csv')\n",
"df['babbage_similarity'] = df.babbage_similarity.apply(eval).apply(np.array)\n",
"# If you have not run the \"Obtain_dataset.ipynb\" notebook, you can download the datafile from here: https://cdn.openai.com/API/examples/data/fine_food_reviews_with_embeddings_1k.csv\n",
"datafile_path = \"./data/fine_food_reviews_with_embeddings_1k.csv\"\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(list(df.babbage_similarity.values), df.Score, test_size = 0.2, random_state=42)\n",
"df = pd.read_csv(datafile_path)\n",
"df[\"ada_similarity\"] = df.ada_similarity.apply(eval).apply(np.array)\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(list(df.ada_similarity.values), df.Score, test_size=0.2, random_state=42)\n",
"\n",
"rfr = RandomForestRegressor(n_estimators=100)\n",
"rfr.fit(X_train, y_train)\n",
"preds = rfr.predict(X_test)\n",
"\n",
"\n",
"mse = mean_squared_error(y_test, preds)\n",
"mae = mean_absolute_error(y_test, preds)\n",
"\n",
"print(f\"Babbage similarity embedding performance on 1k Amazon reviews: mse={mse:.2f}, mae={mae:.2f}\")"
"print(f\"Ada similarity embedding performance on 1k Amazon reviews: mse={mse:.2f}, mae={mae:.2f}\")\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dummy mean prediction performance on Amazon reviews: mse=1.77, mae=1.04\n"
"Dummy mean prediction performance on Amazon reviews: mse=1.73, mae=1.03\n"
]
}
],
"source": [
"bmse = mean_squared_error(y_test, np.repeat(y_test.mean(), len(y_test)))\n",
"bmae = mean_absolute_error(y_test, np.repeat(y_test.mean(), len(y_test)))\n",
"print(f\"Dummy mean prediction performance on Amazon reviews: mse={bmse:.2f}, mae={bmae:.2f}\")"
"print(\n",
" f\"Dummy mean prediction performance on Amazon reviews: mse={bmse:.2f}, mae={bmae:.2f}\"\n",
")\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the embeddings are able to predict the scores with an average error of 0.39 per score prediction. This is roughly equivalent to predicting 2 out of 3 reviews perfectly, and 1 out of three reviews by a one star error."
"We can see that the embeddings are able to predict the scores with an average error of 0.60 per score prediction. This is roughly equivalent to predicting 1 out of 3 reviews perfectly, and 1 out of two reviews by a one star error."
]
},
{
@ -83,12 +88,10 @@
}
],
"metadata": {
"interpreter": {
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
},
"kernelspec": {
"display_name": "Python 3.7.3 64-bit ('base': conda)",
"name": "python3"
"display_name": "openai-cookbook",
"language": "python",
"name": "openai-cookbook"
},
"language_info": {
"codemirror_mode": {
@ -100,9 +103,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.9.6"
},
"orig_nbformat": 4
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,
"nbformat_minor": 2

View File

@ -18,9 +18,11 @@
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"# If you have not run the \"Obtain_dataset.ipynb\" notebook, you can download the datafile from here: https://cdn.openai.com/API/examples/data/fine_food_reviews_with_embeddings_1k.csv\n",
"datafile_path = \"./data/fine_food_reviews_with_embeddings_1k.csv\"\n",
"\n",
"df = pd.read_csv('output/embedded_1k_reviews.csv')\n",
"df['babbage_search'] = df.babbage_search.apply(eval).apply(np.array)"
"df = pd.read_csv(datafile_path)\n",
"df[\"ada_search\"] = df.ada_search.apply(eval).apply(np.array)\n"
]
},
{
@ -39,11 +41,11 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Jamaican Blue beans: Excellent coffee bean for roasting. Our family just purchased another 5 pounds for more roasting. Plenty of flavor and mild on acidity when roasted to a dark brown bean and befor\n",
"\n",
"Good Buy: I liked the beans. They were vacuum sealed, plump and moist. Would recommend them for any use. I personally split and stuck them in some vodka to make vanilla extract. Yum!\n",
"\n",
"Fantastic Instant Refried beans: Fantastic Instant Refried Beans have been a staple for my family now for nearly 20 years. All 7 of us love it and my grown kids are passing on the tradition.\n",
"Jamaican Blue beans: Excellent coffee bean for roasting. Our family just purchased another 5 pounds for more roasting. Plenty of flavor and mild on acidity when roasted to a dark brown bean and befor\n",
"\n",
"Delicious!: I enjoy this white beans seasoning, it gives a rich flavor to the beans I just love it, my mother in law didn't know about this Zatarain's brand and now she is traying different seasoning\n",
"\n"
]
}
@ -53,16 +55,26 @@
"\n",
"# search through the reviews for a specific product\n",
"def search_reviews(df, product_description, n=3, pprint=True):\n",
" embedding = get_embedding(product_description, engine='text-search-babbage-query-001')\n",
" df['similarities'] = df.babbage_search.apply(lambda x: cosine_similarity(x, embedding))\n",
" embedding = get_embedding(\n",
" product_description,\n",
" engine=\"text-embedding-ada-002\"\n",
" )\n",
" df[\"similarities\"] = df.ada_search.apply(lambda x: cosine_similarity(x, embedding))\n",
"\n",
" res = df.sort_values('similarities', ascending=False).head(n).combined.str.replace('Title: ','').str.replace('; Content:', ': ')\n",
" res = (\n",
" df.sort_values(\"similarities\", ascending=False)\n",
" .head(n)\n",
" .combined.str.replace(\"Title: \", \"\")\n",
" .str.replace(\"; Content:\", \": \")\n",
" )\n",
" if pprint:\n",
" for r in res:\n",
" print(r[:200])\n",
" print()\n",
" return res\n",
"res = search_reviews(df, 'delicious beans', n=3)\n"
"\n",
"\n",
"res = search_reviews(df, \"delicious beans\", n=3)\n"
]
},
{
@ -74,17 +86,17 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Rustichella ROCKS!: Anything this company makes is worthwhile eating! My favorite is their Trenne.<br />Their whole wheat pasta is the best I have ever had.\n",
"Tasty and Quick Pasta: Barilla Whole Grain Fusilli with Vegetable Marinara is tasty and has an excellent chunky vegetable marinara. I just wish there was more of it. If you aren't starving or on a \n",
"\n",
"sooo good: tastes so good. Worth the money. My boyfriend hates wheat pasta and LOVES this. cooks fast tastes great.I love this brand and started buying more of their pastas. Bulk is best.\n",
"\n",
"Wonderful: Came quickly. Was plentiful and delicious and cheaper than in the store. You will enjoy it if you like thick pasta.\n",
"Handy: Love the idea of ready in a minute pasta and for that alone this product gets praise. The pasta is whole grain so that's a big plus and it actually comes out al dente. The vegetable marinara\n",
"\n"
]
}
],
"source": [
"res = search_reviews(df, 'whole wheat pasta', n=3)"
"res = search_reviews(df, \"whole wheat pasta\", n=3)"
]
},
{
@ -109,7 +121,7 @@
}
],
"source": [
"res = search_reviews(df, 'bad delivery', n=1)"
"res = search_reviews(df, \"bad delivery\", n=1)"
]
},
{
@ -134,7 +146,7 @@
}
],
"source": [
"res = search_reviews(df, 'spoilt', n=1)"
"res = search_reviews(df, \"spoilt\", n=1)"
]
},
{
@ -148,23 +160,21 @@
"text": [
"Good food: The only dry food my queen cat will eat. Helps prevent hair balls. Good packaging. Arrives promptly. Recommended by a friend who sells pet food.\n",
"\n",
"A great deal on Greenies: Paid only $22 with free shipping for 96 teenies compared to about $35 at the pet store. How can you go wrong with a deal like that? The dog begs for his daily Greenie. Got \n",
"The cats like it: My 7 cats like this food but it is a little yucky for the human. Pieces of mackerel swimming in a dark broth. It is billed as a \"complete\" food and contains carrots, peas and pasta.\n",
"\n"
]
}
],
"source": [
"res = search_reviews(df, 'pet food', n=2)"
"res = search_reviews(df, \"pet food\", n=2)"
]
}
],
"metadata": {
"interpreter": {
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
},
"kernelspec": {
"display_name": "Python 3.7.3 64-bit ('base': conda)",
"name": "python3"
"display_name": "openai-cookbook",
"language": "python",
"name": "openai-cookbook"
},
"language_info": {
"codemirror_mode": {
@ -176,9 +186,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.9.6"
},
"orig_nbformat": 4
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 2

View File

@ -0,0 +1,452 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unit test writing using a multi-step prompt\n",
"\n",
"Complex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a single prompt, a multi-step prompt generates text from GPT-3 and then feeds that text back into subsequent prompts. This can help in cases where you want GPT-3 to explain its reasoning before answering, or brainstorm a plan before executing it.\n",
"\n",
"In this notebook, we use a 3-step prompt to write unit tests in Python using the following steps:\n",
"\n",
"1. Given a Python function, we first prompt GPT-3 to explain what the function is doing.\n",
"2. Second, we prompt GPT-3 to plan a set of unit tests for the function.\n",
" - If the plan is too short, we ask GPT-3 to elaborate with more ideas for unit tests.\n",
"3. Finally, we prompt GPT-3 to write the unit tests.\n",
"\n",
"The code example illustrates a few optional embellishments on the chained, multi-step prompt:\n",
"\n",
"- Conditional branching (e.g., only asking for elaboration if the first plan is too short)\n",
"- Different models for different steps (e.g., `text-davinci-002` for the text planning steps and `code-davinci-002` for the code writing step)\n",
"- A check that re-runs the function if the output is unsatisfactory (e.g., if the output code cannot be parsed by Python's `ast` module)\n",
"- Streaming output so that you can start reading the output before it's fully generated (useful for long, multi-step outputs)\n",
"\n",
"The full 3-step prompt looks like this (using as an example `pytest` for the unit test framework and `is_palindrome` as the function):\n",
"\n",
" # How to write great unit tests with pytest\n",
"\n",
" In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.\n",
" ```python\n",
" def is_palindrome(s):\n",
" return s == s[::-1]\n",
" ```\n",
"\n",
" Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n",
" - First,{GENERATED IN STEP 1}\n",
" \n",
" A good unit test suite should aim to:\n",
" - Test the function's behavior for a wide range of possible inputs\n",
" - Test edge cases that the author may not have foreseen\n",
" - Take advantage of the features of `pytest` to make the tests easy to write and maintain\n",
" - Be easy to read and understand, with clean code and descriptive names\n",
" - Be deterministic, so that the tests always pass or fail in the same way\n",
"\n",
" `pytest` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n",
"\n",
" For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n",
" -{GENERATED IN STEP 2}\n",
"\n",
" [OPTIONALLY APPENDED]In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n",
" -{GENERATED IN STEP 2B}\n",
"\n",
" Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n",
" ```python\n",
" import pytest # used for our unit tests\n",
"\n",
" def is_palindrome(s):\n",
" return s == s[::-1]\n",
"\n",
" #Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\n",
" {GENERATED IN STEP 3}"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# imports needed to run the code in this notebook\n",
"import ast # used for detecting whether generated Python code is valid\n",
"import openai # used for calling the OpenAI API\n",
"\n",
"# example of a function that uses a multi-step prompt to write unit tests\n",
"def unit_test_from_function(\n",
" function_to_test: str, # Python function to test, as a string\n",
" unit_test_package: str = \"pytest\", # unit testing package; use the name as it appears in the import statement\n",
" approx_min_cases_to_cover: int = 7, # minimum number of test case categories to cover (approximate)\n",
" print_text: bool = False, # optionally prints text; helpful for understanding the function & debugging\n",
" text_model: str = \"text-davinci-002\", # model used to generate text plans in steps 1, 2, and 2b\n",
" code_model: str = \"code-davinci-002\", # if you don't have access to code models, you can use text models here instead\n",
" max_tokens: int = 1000, # can set this high, as generations should be stopped earlier by stop sequences\n",
" temperature: float = 0.4, # temperature = 0 can sometimes get stuck in repetitive loops, so we use 0.4\n",
" reruns_if_fail: int = 1, # if the output code cannot be parsed, this will re-run the function up to N times\n",
") -> str:\n",
" \"\"\"Outputs a unit test for a given Python function, using a 3-step GPT-3 prompt.\"\"\"\n",
"\n",
" # Step 1: Generate an explanation of the function\n",
"\n",
" # create a markdown-formatted prompt that asks GPT-3 to complete an explanation of the function, formatted as a bullet list\n",
" prompt_to_explain_the_function = f\"\"\"# How to write great unit tests with {unit_test_package}\n",
"\n",
"In this advanced tutorial for experts, we'll use Python 3.9 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.\n",
"```python\n",
"{function_to_test}\n",
"```\n",
"\n",
"Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n",
"- First,\"\"\"\n",
" if print_text:\n",
" text_color_prefix = \"\\033[30m\" # black; if you read against a dark background \\033[97m is white\n",
" print(text_color_prefix + prompt_to_explain_the_function, end=\"\") # end='' prevents a newline from being printed\n",
"\n",
" # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n",
" explanation_response = openai.Completion.create(\n",
" model=text_model,\n",
" prompt=prompt_to_explain_the_function,\n",
" stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" stream=True,\n",
" )\n",
" explanation_completion = \"\"\n",
" if print_text:\n",
" completion_color_prefix = \"\\033[92m\" # green\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in explanation_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" explanation_completion += event_text\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
"\n",
" # Step 2: Generate a plan to write a unit test\n",
"\n",
" # create a markdown-formatted prompt that asks GPT-3 to complete a plan for writing unit tests, formatted as a bullet list\n",
" prompt_to_explain_a_plan = f\"\"\"\n",
" \n",
"A good unit test suite should aim to:\n",
"- Test the function's behavior for a wide range of possible inputs\n",
"- Test edge cases that the author may not have foreseen\n",
"- Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain\n",
"- Be easy to read and understand, with clean code and descriptive names\n",
"- Be deterministic, so that the tests always pass or fail in the same way\n",
"\n",
"`{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n",
"\n",
"For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n",
"-\"\"\"\n",
" if print_text:\n",
" print(text_color_prefix + prompt_to_explain_a_plan, end=\"\")\n",
"\n",
" # append this planning prompt to the results from step 1\n",
" prior_text = prompt_to_explain_the_function + explanation_completion\n",
" full_plan_prompt = prior_text + prompt_to_explain_a_plan\n",
"\n",
" # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n",
" plan_response = openai.Completion.create(\n",
" model=text_model,\n",
" prompt=full_plan_prompt,\n",
" stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" stream=True,\n",
" )\n",
" plan_completion = \"\"\n",
" if print_text:\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in plan_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" plan_completion += event_text\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
"\n",
" # Step 2b: If the plan is short, ask GPT-3 to elaborate further\n",
" # this counts top-level bullets (e.g., categories), but not sub-bullets (e.g., test cases)\n",
" elaboration_needed = plan_completion.count(\"\\n-\") +1 < approx_min_cases_to_cover # adds 1 because the first bullet is not counted\n",
" if elaboration_needed:\n",
" prompt_to_elaborate_on_the_plan = f\"\"\"\n",
"\n",
"In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n",
"-\"\"\"\n",
" if print_text:\n",
" print(text_color_prefix + prompt_to_elaborate_on_the_plan, end=\"\")\n",
"\n",
" # append this elaboration prompt to the results from step 2\n",
" prior_text = full_plan_prompt + plan_completion\n",
" full_elaboration_prompt = prior_text + prompt_to_elaborate_on_the_plan\n",
"\n",
" # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n",
" elaboration_response = openai.Completion.create(\n",
" model=text_model,\n",
" prompt=full_elaboration_prompt,\n",
" stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" stream=True,\n",
" )\n",
" elaboration_completion = \"\"\n",
" if print_text:\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in elaboration_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" elaboration_completion += event_text\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
"\n",
" # Step 3: Generate the unit test\n",
"\n",
" # create a markdown-formatted prompt that asks GPT-3 to complete a unit test\n",
" starter_comment = \"\"\n",
" if unit_test_package == \"pytest\":\n",
" starter_comment = \"Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\"\n",
" prompt_to_generate_the_unit_test = f\"\"\"\n",
"\n",
"Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n",
"```python\n",
"import {unit_test_package} # used for our unit tests\n",
"\n",
"{function_to_test}\n",
"\n",
"#{starter_comment}\"\"\"\n",
" if print_text:\n",
" print(text_color_prefix + prompt_to_generate_the_unit_test, end=\"\")\n",
"\n",
" # append this unit test prompt to the results from step 3\n",
" if elaboration_needed:\n",
" prior_text = full_elaboration_prompt + elaboration_completion\n",
" else:\n",
" prior_text = full_plan_prompt + plan_completion\n",
" full_unit_test_prompt = prior_text + prompt_to_generate_the_unit_test\n",
"\n",
" # send the prompt to the API, using ``` as a stop sequence to stop at the end of the code block\n",
" unit_test_response = openai.Completion.create(\n",
" model=code_model,\n",
" prompt=full_unit_test_prompt,\n",
" stop=\"```\",\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" stream=True\n",
" )\n",
" unit_test_completion = \"\"\n",
" if print_text:\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in unit_test_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" unit_test_completion += event_text\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
"\n",
" # check the output for errors\n",
" code_start_index = prompt_to_generate_the_unit_test.find(\"```python\\n\") + len(\"```python\\n\")\n",
" code_output = prompt_to_generate_the_unit_test[code_start_index:] + unit_test_completion\n",
" try:\n",
" ast.parse(code_output)\n",
" except SyntaxError as e:\n",
" print(f\"Syntax error in generated code: {e}\")\n",
" if reruns_if_fail > 0:\n",
" print(\"Rerunning...\")\n",
" return unit_test_from_function(\n",
" function_to_test=function_to_test,\n",
" unit_test_package=unit_test_package,\n",
" approx_min_cases_to_cover=approx_min_cases_to_cover,\n",
" print_text=print_text,\n",
" text_model=text_model,\n",
" code_model=code_model,\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" reruns_if_fail=reruns_if_fail-1, # decrement rerun counter when calling again\n",
" )\n",
"\n",
" # return the unit test as a string\n",
" return unit_test_completion\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[30m# How to write great unit tests with pytest\n",
"\n",
"In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.\n",
"```python\n",
"def is_palindrome(s):\n",
" return s == s[::-1]\n",
"```\n",
"\n",
"Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n",
"- First,\u001b[92m we have a function definition. This is where we give the function a name, `is_palindrome`, and specify the arguments that the function accepts. In this case, the function accepts a single string argument, `s`.\n",
"- Next, we have a return statement. This is where we specify the value that the function returns. In this case, the function returns `s == s[::-1]`.\n",
"- Finally, we have a function call. This is where we actually call the function with a specific set of arguments. In this case, we're calling the function with the string `\"racecar\"`.\u001b[30m\n",
" \n",
"A good unit test suite should aim to:\n",
"- Test the function's behavior for a wide range of possible inputs\n",
"- Test edge cases that the author may not have foreseen\n",
"- Take advantage of the features of `pytest` to make the tests easy to write and maintain\n",
"- Be easy to read and understand, with clean code and descriptive names\n",
"- Be deterministic, so that the tests always pass or fail in the same way\n",
"\n",
"`pytest` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n",
"\n",
"For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n",
"-\u001b[92m The input is a palindrome\n",
" - `\"racecar\"`\n",
" - `\"madam\"`\n",
" - `\"anna\"`\n",
"- The input is not a palindrome\n",
" - `\"python\"`\n",
" - `\"test\"`\n",
" - `\"1234\"`\n",
"- The input is an empty string\n",
" - `\"\"`\n",
"- The input is `None`\n",
"- The input is not a string\n",
" - `1`\n",
" - `1.0`\n",
" - `True`\n",
" - `False`\n",
" - `[]`\n",
" - `{}`\u001b[30m\n",
"\n",
"In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n",
"-\u001b[92m The input is a palindrome with spaces\n",
" - `\"race car\"`\n",
" - `\" madam \"`\n",
" - `\" anna \"`\n",
"- The input is not a palindrome with spaces\n",
" - `\" python \"`\n",
" - `\" test \"`\n",
" - `\" 1234 \"`\n",
"- The input is a palindrome with punctuation\n",
" - `\"racecar!\"`\n",
" - `\"Madam, I'm Adam.\"`\n",
" - `\"Anna's\"`\n",
"- The input is not a palindrome with punctuation\n",
" - `\"python!\"`\n",
" - `\"test.\"`\n",
" - `\"1234!\"`\n",
"- The input is a palindrome with mixed case\n",
" - `\"Racecar\"`\n",
" - `\"Madam\"`\n",
" - `\"Anna\"`\n",
"- The input is not a palindrome with mixed case\n",
" - `\"Python\"`\n",
" - `\"Test\"`\n",
" - `\"1234\"`\u001b[30m\n",
"\n",
"Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n",
"```python\n",
"import pytest # used for our unit tests\n",
"\n",
"def is_palindrome(s):\n",
" return s == s[::-1]\n",
"\n",
"#Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\u001b[92m.\n",
"#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.\n",
"#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.\n",
"#The generated test function will be named test_is_palindrome_<name> where <name> is the name of the test case.\n",
"#The generated test function will be given the arguments specified in the list of arguments for the test case.\n",
"#The generated test function will be given the fixture specified in the decorator, in this case the function itself.\n",
"#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.\n",
"@pytest.mark.parametrize(\n",
" \"name,args,expected\",\n",
" [\n",
" # Test the function's behavior for a wide range of possible inputs\n",
" (\"palindrome\", [\"racecar\"], True),\n",
" (\"palindrome\", [\"madam\"], True),\n",
" (\"palindrome\", [\"anna\"], True),\n",
" (\"non-palindrome\", [\"python\"], False),\n",
" (\"non-palindrome\", [\"test\"], False),\n",
" (\"non-palindrome\", [\"1234\"], False),\n",
" (\"empty string\", [\"\"], True),\n",
" (\"None\", [None], False),\n",
" (\"non-string\", [1], False),\n",
" (\"non-string\", [1.0], False),\n",
" (\"non-string\", [True], False),\n",
" (\"non-string\", [False], False),\n",
" (\"non-string\", [[]], False),\n",
" (\"non-string\", [{}], False),\n",
" # Test edge cases that the author may not have foreseen\n",
" (\"palindrome with spaces\", [\"race car\"], True),\n",
" (\"palindrome with spaces\", [\" madam \"], True),\n",
" (\"palindrome with spaces\", [\" anna \"], True),\n",
" (\"non-palindrome with spaces\", [\" python \"], False),\n",
" (\"non-palindrome with spaces\", [\" test \"], False),\n",
" (\"non-palindrome with spaces\", [\" 1234 \"], False),\n",
" (\"palindrome with punctuation\", [\"racecar!\"], True),\n",
" (\"palindrome with punctuation\", [\"Madam, I'm Adam.\"], True),\n",
" (\"palindrome with punctuation\", [\"Anna's\"], True),\n",
" (\"non-palindrome with punctuation\", [\"python!\"], False),\n",
" (\"non-palindrome with punctuation\", [\"test.\"], False),\n",
" (\"non-palindrome with punctuation\", [\"1234!\"], False),\n",
" (\"palindrome with mixed case\", [\"Racecar\"], True),\n",
" (\"palindrome with mixed case\", [\"Madam\"], True),\n",
" (\"palindrome with mixed case\", [\"Anna\"], True),\n",
" (\"non-palindrome with mixed case\", [\"Python\"], False),\n",
" (\"non-palindrome with mixed case\", [\"Test\"], False),\n",
" (\"non-palindrome with mixed case\", [\"1234\"], False),\n",
" ],\n",
")\n",
"def test_is_palindrome(is_palindrome, args, expected):\n",
" assert is_palindrome(*args) == expected\n"
]
},
{
"data": {
"text/plain": [
"'.\\n#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.\\n#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.\\n#The generated test function will be named test_is_palindrome_<name> where <name> is the name of the test case.\\n#The generated test function will be given the arguments specified in the list of arguments for the test case.\\n#The generated test function will be given the fixture specified in the decorator, in this case the function itself.\\n#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.\\n@pytest.mark.parametrize(\\n \"name,args,expected\",\\n [\\n # Test the function\\'s behavior for a wide range of possible inputs\\n (\"palindrome\", [\"racecar\"], True),\\n (\"palindrome\", [\"madam\"], True),\\n (\"palindrome\", [\"anna\"], True),\\n (\"non-palindrome\", [\"python\"], False),\\n (\"non-palindrome\", [\"test\"], False),\\n (\"non-palindrome\", [\"1234\"], False),\\n (\"empty string\", [\"\"], True),\\n (\"None\", [None], False),\\n (\"non-string\", [1], False),\\n (\"non-string\", [1.0], False),\\n (\"non-string\", [True], False),\\n (\"non-string\", [False], False),\\n (\"non-string\", [[]], False),\\n (\"non-string\", [{}], False),\\n # Test edge cases that the author may not have foreseen\\n (\"palindrome with spaces\", [\"race car\"], True),\\n (\"palindrome with spaces\", [\" madam \"], True),\\n (\"palindrome with spaces\", [\" anna \"], True),\\n (\"non-palindrome with spaces\", [\" python \"], False),\\n (\"non-palindrome with spaces\", [\" test \"], False),\\n (\"non-palindrome with spaces\", [\" 1234 \"], False),\\n (\"palindrome with punctuation\", [\"racecar!\"], True),\\n (\"palindrome with punctuation\", [\"Madam, I\\'m Adam.\"], True),\\n (\"palindrome with punctuation\", [\"Anna\\'s\"], True),\\n (\"non-palindrome with punctuation\", [\"python!\"], False),\\n (\"non-palindrome with punctuation\", [\"test.\"], False),\\n (\"non-palindrome with punctuation\", [\"1234!\"], False),\\n (\"palindrome with mixed case\", [\"Racecar\"], True),\\n (\"palindrome with mixed case\", [\"Madam\"], True),\\n (\"palindrome with mixed case\", [\"Anna\"], True),\\n (\"non-palindrome with mixed case\", [\"Python\"], False),\\n (\"non-palindrome with mixed case\", [\"Test\"], False),\\n (\"non-palindrome with mixed case\", [\"1234\"], False),\\n ],\\n)\\ndef test_is_palindrome(is_palindrome, args, expected):\\n assert is_palindrome(*args) == expected\\n'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"example_function = \"\"\"def is_palindrome(s):\n",
" return s == s[::-1]\"\"\"\n",
"\n",
"unit_test_from_function(example_function, print_text=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.9 ('openai')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
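
An aside that is not part of the notebook above: `unit_test_from_function` returns only the model's completion, so to actually run the generated suite you need to reassemble the full test file (the import, the function under test, and the starter comment all live in the prompt) and hand it to `pytest`. A minimal sketch, assuming `unit_test_from_function` and `example_function` from the cells above are in scope and the default `pytest` settings were used:

```python
# Rebuild the code header exactly as the step-3 prompt constructs it; the
# returned completion continues directly after the starter comment.
header = f"""import pytest # used for our unit tests

{example_function}

#Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator"""

generated_tests = unit_test_from_function(example_function)

# Write the reassembled file to disk so pytest can collect it.
with open("test_is_palindrome.py", "w") as f:
    f.write(header + generated_tests + "\n")

# Then, from a shell:
#   pytest test_is_palindrome.py
```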

View File

@ -39,7 +39,7 @@
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"df = pd.read_csv('output/embedded_babbage_similarity_50k.csv', index_col=0)\n",
"df = pd.read_csv('output/embedded_babbage_similarity_50k.csv', index_col=0) # note that you will need to generate this file to run the code below\n",
"df['babbage_similarity'] = df.babbage_similarity.apply(eval).apply(np.array)\n",
"X_train, X_test, y_train, y_test = train_test_split(df, df.Score, test_size = 0.2, random_state=42)\n",
"\n",

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,196 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure embeddings example\n",
"In this example we'll try to go over all operations for embeddings that can be done using the Azure endpoints. \\\n",
"This example focuses on finetuning but touches on the majority of operations that are also available using the API. This example is meant to be a quick way of showing simple operations and is not meant as a tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"from openai import cli"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"In the following section the endpoint and key need to be set up of the next sections to work. \\\n",
"Please go to https://portal.azure.com, find your resource and then under \"Resource Management\" -> \"Keys and Endpoints\" look for the \"Endpoint\" value and one of the Keys. They will act as api_base and api_key in the code below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"openai.api_key = '' # Please add your api key here\n",
"openai.api_base = '' # Please add your endpoint here\n",
"\n",
"openai.api_type = 'azure'\n",
"openai.api_version = '2022-03-01-preview' # this may change in the future"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deployments\n",
"In this section we are going to create a deployment using the finetune model that we just adapted and then used the deployment to create a simple completion operation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deployments: Create Manually\n",
"Let's create a deployment using the text-similarity-curie-001 engine. You can create a new deployment by going to your Resource in your portal under \"Resource Management\" -> \"Deployments\"."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### (Optional) Deployments: Create Programatically\n",
"We can also create a deployment using code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = \"text-similarity-curie-001\"\n",
"\n",
"# Now let's create the deployment\n",
"print(f'Creating a new deployment with model: {model}')\n",
"result = openai.Deployment.create(model=model, scale_settings={\"scale_type\":\"standard\"})\n",
"deployment_id = result[\"id\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### (Optional) Deployments: Retrieving\n",
"Now let's check the status of the newly created deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f'Checking for deployment status.')\n",
"resp = openai.Deployment.retrieve(id=deployment_id)\n",
"status = resp[\"status\"]\n",
"print(f'Deployment {deployment_id} is with status: {status}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deployments: Listing\n",
"Now because creating a new deployment takes a long time, let's look in the subscription for an already finished deployment that succeeded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('While deployment running, selecting a completed one.')\n",
"deployment_id = None\n",
"result = openai.Deployment.list()\n",
"for deployment in result.data:\n",
" if deployment[\"status\"] == \"succeeded\":\n",
" deployment_id = deployment[\"id\"]\n",
" break\n",
"\n",
"if not deployment_id:\n",
" print('No deployment with status: succeeded found.')\n",
"else:\n",
" print(f'Found a successful deployment with id: {deployment_id}.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Embeddings\n",
"Now let's send a sample embedding to the deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embeddings = openai.Embedding.create(deployment_id=deployment_id,\n",
" input=\"The food was delicious and the waiter...\")\n",
" \n",
"print(embeddings)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### (Optional) Deployments: Delete\n",
"Finally let's delete the deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f'Deleting deployment: {deployment_id}')\n",
"openai.Deployment.delete(sid=deployment_id)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.9 ('openai')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
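
A small follow-on sketch that is not part of the notebook above: the embeddings returned by the deployment are plain vectors, so they can be compared with cosine similarity using numpy. This assumes the `deployment_id` selected above and the response layout used by this version of the `openai` library (`response["data"][0]["embedding"]`):

```python
import numpy as np
import openai

def embed(text: str, deployment_id: str) -> np.ndarray:
    # Request an embedding from the Azure deployment selected above.
    response = openai.Embedding.create(deployment_id=deployment_id, input=text)
    return np.array(response["data"][0]["embedding"])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the vectors divided by the product of their norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = embed("The food was delicious and the waiter...", deployment_id)
v2 = embed("The meal was tasty and the service was great.", deployment_id)
print(f"Cosine similarity: {cosine_similarity(v1, v2):.3f}")
```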

View File

@ -0,0 +1,475 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure Fine tuning example\n",
"In this example we'll try to go over all operations that can be done using the Azure endpoints and their differences with the openAi endpoints (if any).<br>\n",
"This example focuses on finetuning but touches on the majority of operations that are also available using the API. This example is meant to be a quick way of showing simple operations and is not meant as a finetune model adaptation tutorial.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"from openai import cli"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"In the following section the endpoint and key need to be set up of the next sections to work.<br> Please go to https://portal.azure.com, find your resource and then under \"Resource Management\" -> \"Keys and Endpoints\" look for the \"Endpoint\" value and one of the Keys. They will act as api_base and api_key in the code below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"openai.api_key = '' # Please add your api key here\n",
"openai.api_base = '' # Please add your endpoint here\n",
"\n",
"openai.api_type = 'azure'\n",
"openai.api_version = '2022-03-01-preview' # this may change in the future"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Microsoft Active Directory Authentication\n",
"Instead of key based authentication, you can use Active Directory to authenticate using credential tokens. Uncomment the next code section to use credential based authentication:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"from azure.identity import DefaultAzureCredential\n",
"\n",
"default_credential = DefaultAzureCredential()\n",
"token = default_credential.get_token(\"https://cognitiveservices.azure.com/.default\")\n",
"\n",
"openai.api_type = 'azure_ad'\n",
"openai.api_key = token.token\n",
"openai.api_version = '2022-03-01-preview' # this may change in the future\n",
"\n",
"\n",
"openai.api_base = '' # Please add your endpoint here\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Files\n",
"In the next section we will focus on the files operations: importing, listing, retrieving, deleting. For this we need to create 2 temporary files with some sample data. For the sake of simplicity, we will use the same data for training and validation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"import json\n",
"\n",
"training_file_name = 'training.jsonl'\n",
"validation_file_name = 'validation.jsonl'\n",
"\n",
"sample_data = [{\"prompt\": \"When I go to the store, I want an\", \"completion\": \"apple\"},\n",
" {\"prompt\": \"When I go to work, I want a\", \"completion\": \"coffe\"},\n",
" {\"prompt\": \"When I go home, I want a\", \"completion\": \"soda\"}]\n",
"\n",
"print(f'Generating the training file: {training_file_name}')\n",
"with open(training_file_name, 'w') as training_file:\n",
" for entry in sample_data:\n",
" json.dump(entry, training_file)\n",
" training_file.write('\\n')\n",
"\n",
"print(f'Copying the training file to the validation file')\n",
"shutil.copy(training_file_name, validation_file_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Files: Listing\n",
"List all of the uploaded files and check for the ones that are named \"training.jsonl\" or \"validation.jsonl\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('Checking for existing uploaded files.')\n",
"results = []\n",
"files = openai.File.list().data\n",
"print(f'Found {len(files)} total uploaded files in the subscription.')\n",
"for item in files:\n",
" if item[\"filename\"] in [training_file_name, validation_file_name]:\n",
" results.append(item[\"id\"])\n",
"print(f'Found {len(results)} already uploaded files that match our names.')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Files: Deleting\n",
"Let's now delete those found files (if any) since we're going to be re-uploading them next."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f'Deleting already uploaded files.')\n",
"for id in results:\n",
" openai.File.delete(sid = id)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Files: Importing & Retrieving\n",
"Now, let's import our two files ('training.jsonl' and 'validation.jsonl') and keep those IDs since we're going to use them later for finetuning.<br>\n",
"For this operation we are going to use the cli wrapper which does a bit more checks before uploading and also gives us progress. In addition, after uploading we're going to check the status our import until it has succeeded (or failed if something goes wrong)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"def check_status(training_id, validation_id):\n",
" train_status = openai.File.retrieve(training_id)[\"status\"]\n",
" valid_status = openai.File.retrieve(validation_id)[\"status\"]\n",
" print(f'Status (training_file | validation_file): {train_status} | {valid_status}')\n",
" return (train_status, valid_status)\n",
"\n",
"#importing our two files\n",
"training_id = cli.FineTune._get_or_upload(training_file_name, True)\n",
"validation_id = cli.FineTune._get_or_upload(validation_file_name, True)\n",
"\n",
"#checking the status of the imports\n",
"(train_status, valid_status) = check_status(training_id, validation_id)\n",
"\n",
"while train_status not in [\"succeeded\", \"failed\"] or valid_status not in [\"succeeded\", \"failed\"]:\n",
" time.sleep(1)\n",
" (train_status, valid_status) = check_status(training_id, validation_id)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Files: Downloading\n",
"Now let's download one of the files, the training file for example, to check that everything was in order during importing and all bits are there."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f'Downloading training file: {training_id}')\n",
"result = openai.File.download(training_id)\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Finetune\n",
"In this section we are going to use the two training and validation files that we imported in the previous section, to train a finetune model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finetune: Adapt\n",
"First let's create the finetune adaptation job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"create_args = {\n",
" \"training_file\": training_id,\n",
" \"validation_file\": validation_id,\n",
" \"model\": \"curie\",\n",
" \"compute_classification_metrics\": True,\n",
" \"classification_n_classes\": 3\n",
"}\n",
"resp = openai.FineTune.create(**create_args)\n",
"job_id = resp[\"id\"]\n",
"status = resp[\"status\"]\n",
"\n",
"print(f'Fine-tunning model with jobID: {job_id}.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finetune: Streaming\n",
"While the job runs, we can subscribe to the streaming events to check the progress of the operation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import signal\n",
"import datetime\n",
"\n",
"def signal_handler(sig, frame):\n",
" status = openai.FineTune.retrieve(job_id).status\n",
" print(f\"Stream interrupted. Job is still {status}.\")\n",
" return\n",
"\n",
"print('Streaming events for the fine-tuning job: {job_id}')\n",
"signal.signal(signal.SIGINT, signal_handler)\n",
"\n",
"events = openai.FineTune.stream_events(job_id)\n",
"try:\n",
" for event in events:\n",
" print(f'{datetime.datetime.fromtimestamp(event[\"created_at\"])} {event[\"message\"]}')\n",
"\n",
"except Exception:\n",
" print(\"Stream interrupted (client disconnected).\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finetune: Listing and Retrieving\n",
"Now let's check that our operation was successful and in addition we can look at all of the finetuning operations using a list operation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"status = openai.FineTune.retrieve(id=job_id)[\"status\"]\n",
"if status not in [\"succeeded\", \"failed\"]:\n",
" print(f'Job not in terminal status: {status}. Waiting.')\n",
" while status not in [\"succeeded\", \"failed\"]:\n",
" time.sleep(2)\n",
" status = openai.FineTune.retrieve(id=job_id)[\"status\"]\n",
" print(f'Status: {status}')\n",
"else:\n",
" print(f'Finetune job {job_id} finished with status: {status}')\n",
"\n",
"print('Checking other finetune jobs in the subscription.')\n",
"result = openai.FineTune.list()\n",
"print(f'Found {len(result)} finetune jobs.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finetune: Deleting\n",
"Finally we can delete our finetune job.<br>\n",
"WARNING: Please skip this step if you want to continue with the next section as the finetune model is needed. (The delete code is commented out by default)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# openai.FineTune.delete(sid=job_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deployments\n",
"In this section we are going to create a deployment using the finetune model that we just adapted and then used the deployment to create a simple completion operation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deployments: Create\n",
"Let's create a deployment using the fine-tune model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Fist let's get the model of the previous job:\n",
"result = openai.FineTune.retrieve(id=job_id)\n",
"if result[\"status\"] == 'succeeded':\n",
" model = result[\"fine_tuned_model\"]\n",
"\n",
"# Now let's create the deployment\n",
"print(f'Creating a new deployment with model: {model}')\n",
"result = openai.Deployment.create(model=model, scale_settings={\"scale_type\":\"standard\"})\n",
"deployment_id = result[\"id\"]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deployments: Retrieving\n",
"Now let's check the status of the newly created deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f'Checking for deployment status.')\n",
"resp = openai.Deployment.retrieve(id=deployment_id)\n",
"status = resp[\"status\"]\n",
"print(f'Deployment {deployment_id} is with status: {status}')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deployments: Listing\n",
"Now because creating a new deployment takes a long time, let's look in the subscription for an already finished deployment that succeeded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('While deployment running, selecting a completed one.')\n",
"deployment_id = None\n",
"result = openai.Deployment.list()\n",
"for deployment in result.data:\n",
" if deployment[\"status\"] == \"succeeded\":\n",
" deployment_id = deployment[\"id\"]\n",
" break\n",
"\n",
"if not deployment_id:\n",
" print('No deployment with status: succeeded found.')\n",
"else:\n",
" print(f'Found a successful deployment with id: {deployment_id}.')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Completions\n",
"Now let's send a sample completion to the deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('Sending a test completion job')\n",
"start_phrase = 'When I go to the store, I want a'\n",
"response = openai.Completion.create(deployment_id=deployment_id, prompt=start_phrase, max_tokens=4)\n",
"text = response['choices'][0]['text'].replace('\\n', '').replace(' .', '.').strip()\n",
"print(f'\"{start_phrase} {text}\"')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deployments: Delete\n",
"Finally let's delete the deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f'Deleting deployment: {deployment_id}')\n",
"openai.Deployment.delete(sid=deployment_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thank you"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.9 64-bit ('3.9.9')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "cb9817b186a29e4e9713184d901f26c1ee05ad25243d878baff7f31bb1fef480"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
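
One practical note that is not part of the notebook above: a malformed line in the JSONL files will make the import (or the subsequent fine-tune) fail, so it can be worth validating the files before uploading. A minimal sketch against the `training.jsonl` generated above:

```python
import json

def validate_jsonl(path: str) -> list:
    """Return a list of problems: lines that are not valid JSON objects or lack prompt/completion keys."""
    problems = []
    with open(path) as f:
        for line_number, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append(f"line {line_number}: invalid JSON ({e})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_number}: expected a JSON object")
                continue
            missing = {"prompt", "completion"} - record.keys()
            if missing:
                problems.append(f"line {line_number}: missing keys {sorted(missing)}")
    return problems

for issue in validate_jsonl("training.jsonl"):
    print(issue)
```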

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,362 @@
Date,Supplier,Description,Transaction value (£)
21/04/2016,M & J Ballantyne Ltd,George IV Bridge Work,35098
26/04/2016,Private Sale,Literary & Archival Items,30000
30/04/2016,City Of Edinburgh Council,Non Domestic Rates ,40800
09/05/2016,Computacenter Uk,Kelvin Hall,72835
09/05/2016,John Graham Construction Ltd,Causewayside Refurbishment,64361
09/05/2016,A McGillivray,Causewayside Refurbishment,53690
16/05/2016,John Graham Construction Ltd,Causewayside Refurbishment,365344
23/05/2016,Computacenter Uk,Kelvin Hall,26506
23/05/2016,ECG Facilities Service,Facilities Management Charge,32777
23/05/2016,ECG Facilities Service,Facilities Management Charge,32777
30/05/2016,ALDL,ALDL Charges,32317
10/06/2016,Wavetek Ltd,Kelvin Hall,87589
10/06/2016,John Graham Construction Ltd,Causewayside Refurbishment,381803
28/06/2016,ECG Facilities Service,Facilities Management Charge,32832
30/06/2016,Glasgow City Council,Kelvin Hall,1700000
11/07/2016,Wavetek Ltd,Kelvin Hall,65692
11/07/2016,John Graham Construction Ltd,Causewayside Refurbishment,139845
15/07/2016,Sotheby'S,Literary & Archival Items,28500
18/07/2016,Christies,Literary & Archival Items,33800
25/07/2016,A McGillivray,Causewayside Refurbishment,30113
31/07/2016,ALDL,ALDL Charges,32317
08/08/2016,ECG Facilities Service,Facilities Management Charge,32795
15/08/2016,Creative Video Productions Ltd,Kelvin Hall,26866
15/08/2016,John Graham Construction Ltd,Causewayside Refurbishment,196807
24/08/2016,ECG Facilities Service,Facilities Management Charge,32795
05/09/2016,John Graham Construction Ltd,Causewayside Refurbishment,36359
12/09/2016,Flexiform,Kelvin Hall,42623
12/09/2016,City Of Edinburgh Council,Non Domestic Rates ,144330
12/09/2016,City Of Edinburgh Council,Non Domestic Rates ,49827
12/09/2016,John Graham Construction Ltd,Causewayside Refurbishment,228689
19/09/2016,Jisc Services Ltd Subscription Account,Literary & Archival Items,42629
26/09/2016,Senator International,Kelvin Hall,35706
26/09/2016,ECG Facilities Service,Facilities Management Charge,32795
26/09/2016,John Graham Construction Ltd,Causewayside Refurbishment,28378
30/09/2016,A McGillivray,Causewayside Refurbishment,44392
10/10/2016,Cengage Learning (Emea )Ltd,Literary & Archival Items,86604
10/10/2016,John Graham Construction Ltd,Causewayside Refurbishment,303999
24/10/2016,ECG Facilities Service,Facilities Management Charge,32795
24/10/2016,ALDL,ALDL Charges,32317
31/10/2016,John Graham Construction Ltd,Causewayside Refurbishment,74245
07/11/2016,CBRE,Kelvin Hall,83736
14/11/2016,University Of Glasgow,Kelvin Hall,188682
14/11/2016,John Graham Construction Ltd,Causewayside Refurbishment,362326
08/12/2016,Sothebys,Literary & Archival Items,166000
08/12/2016,Private Sale,Literary & Archival Items,87500
08/12/2016,ECG Facilities Service,Facilities Management Charge,32795
12/12/2016,John Graham Construction Ltd,Causewayside Refurbishment,385310
30/12/2016,ECG Facilities Service,Facilities Management Charge,32795
30/12/2016,John Graham Construction Ltd,Causewayside Refurbishment,253618
30/12/2016,John Graham Construction Ltd,Causewayside Refurbishment,45127
23/01/2017,ALDL,ALDL Charges,27730
07/02/2017,ECG Facilities Service,Facilities Management Charge,32795
07/02/2017,John Graham Construction Ltd,Causewayside Refurbishment,52404
13/02/2017,John Graham Construction Ltd,Causewayside Refurbishment,272390
27/02/2017,Cengage Learning (Emea )Ltd,Literary & Archival Items,43302
27/02/2017,ECG Facilities Service,Facilities Management Charge,32795
06/03/2017,Private Sale,Literary & Archival Items,72500
06/03/2017,John Graham Construction Ltd,Causewayside Refurbishment,31781
06/03/2017,John Graham Construction Ltd,Causewayside Refurbishment,198048
27/03/2017,ECG Facilities Service,Facilities Management Charge,32795
31/03/2017,NLS Foundation,Grant Payment,177500
31/03/2017,Private Sale,Literary & Archival Items,3422500
31/03/2017,Nicholson Bros(Electrical Contractors) Ltd,Causewayside Refurbishment,33666
31/03/2017,John Graham Construction Ltd,Causewayside Refurbishment,222090
31/03/2017,John Graham Construction Ltd,Causewayside Refurbishment,63971
31/03/2017,XMA Scotland Ltd,IT equipment,33450
31/03/2017,XMA Scotland Ltd,IT equipment,84524
24/04/2017,Cengage Learning (Emea )Ltd,Literary & Archival Items,43302
24/04/2017,Scottish Historic Buildings Trust,Lawnmarket Work,50057
24/04/2017,Insight Direct (UK) Ltd,IT equipment,56768
30/04/2017,Morris & Spottiswood Ltd,George IV Bridge Work,63716
08/05/2017,Anglian Water Business,Water,26832
15/05/2017,John Graham Construction Ltd,Causewayside Refurbishment,245381
22/05/2017,ECG Facilities Service,Facilities Management Charge,33386
22/05/2017,ALDL,Legal Deposit Services,27067
29/05/2017,ECG Facilities Service,Facilities Management Charge,33386
29/05/2017,John Graham Construction Ltd,Causewayside Refurbishment,74806
29/05/2017,Morris & Spottiswood Ltd,George IV Bridge Work,56448
31/05/2017,John Graham Construction Ltd,Causewayside Refurbishment,164691
26/06/2017,ECG Facilities Service,Facilities Management Charge,33386
26/06/2017,British Library,Legal Deposit Services,50056
24/07/2017,John Graham Construction Ltd,Causewayside Refurbishment,27926
24/07/2017,John Graham Construction Ltd,Causewayside Refurbishment,212690
24/07/2017,ALDL,Legal Deposit Services,27067
24/07/2017,AM Phillip,Vehicle Purchase,26604
16/08/2017,ECG Facilities Service,Facilities Management Charge,33386
16/08/2017,John Graham Construction Ltd,Causewayside Refurbishment,59021
16/08/2017,John Graham Construction Ltd,Causewayside Refurbishment,136379
16/08/2017,Ex Libris,IT equipment,76610
23/08/2017,Culture And Sport Glasgow,Kelvin Hall,60503
23/08/2017,XMA Scotland Ltd,Kelvin Hall,31830
23/08/2017,ECG Facilities Service,Facilities Management Charge,33386
31/08/2017,John Graham Construction Ltd,Causewayside Refurbishment,36313
31/08/2017,Insight Direct (UK) Ltd,Causewayside Refurbishment,68222
31/08/2017,Mark Finn Laboratory,George IV Bridge Work,53884
11/09/2017,John Graham Construction Ltd,Causewayside Refurbishment,189483
15/09/2017,City Of Edinburgh Council,Non Domestic Rates ,57662
15/09/2017,City Of Edinburgh Council,Non Domestic Rates ,142680
09/10/2017,Frost And Sullivan Ltd,Literary & Archival Items,28125
09/10/2017,JISC Services Ltd ,Literary & Archival Items,43481
23/10/2017,John Graham Construction Ltd,Causewayside Refurbishment,151659
23/10/2017,City Building LLP,Causewayside Refurbishment,53147
30/10/2017,ECG Facilities Service,Facilities Management Charge,35758
30/10/2017,ECG Facilities Service,Facilities Management Charge,35758
06/11/2017,John Graham Construction Ltd,Causewayside Refurbishment,134208
06/11/2017,ALDL,Legal Deposit Services,27067
27/11/2017,Maggs Bros Ltd,Literary & Archival Items,26500
30/11/2017,Glasgow City Council,Kelvin Hall,42345
11/12/2017,ECG Facilities Service,Facilities Management Charge,35758
11/12/2017,John Graham Construction Ltd,Causewayside Refurbishment,159275
08/01/2018,ECG Facilities Service,Facilities Management Charge,35758
15/01/2018,Proquest Information And Learn,Literary & Archival Items,42199
15/01/2018,John Graham Construction Ltd,Causewayside Refurbishment,123244
29/01/2018,ECG Facilities Service,Facilities Management Charge,35758
05/02/2018,John Graham Construction Ltd,Causewayside Refurbishment,102659
27/02/2018,ALDL,Legal Deposit Services,27067
07/03/2018,John Graham Construction Ltd,Causewayside Refurbishment,89559
14/03/2018,Bernard Quaritch Ltd,Literary & Archival Items,372500
14/03/2018,ECG Facilities Service,Facilities Management Charge,35758
21/03/2018,Site Sealants Ltd,Causewayside Refurbishment,27747
30/03/2018,Private Sale,Literary & Archival Items,100000
30/03/2018,ECG Facilities Service,Facilities Management Charge,35758
30/04/2018,ECG FACILITIES SERVICE,Causewayside IT Work,25634.7
30/04/2018,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
14/05/2018,GLASGOW CITY COUNCIL,Kelvin Hall,90946
11/06/2018,ALDL,ALDL Charges,27067
11/06/2018,JOHN GRAHAM CONSTRUCTION LTD,Causewayisde Refurbishment,127753.31
22/06/2018,BONHAMS - LONDON,Literary & Archival Items,25025
22/06/2018,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
22/06/2018,EX LIBRIS,IT equipment,39000
30/06/2018,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
16/07/2018,EX LIBRIS,IT equipment,80057.83
18/07/2018,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
18/07/2018,Sotheby's,Literary & Archival Items,41600
31/08/2018,AUTOMATED DOCUMENT SERVICES,IT equipment,84480
31/08/2018,XMA SCOTLAND LTD,IT equipment,313000
13/09/2018,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
13/09/2018,CITY OF EDINBURGH COUNCIL,Non Domestic Rates,59303.2
13/09/2018,CITY OF EDINBURGH COUNCIL,Non Domestic Rates,146740
20/09/2018,FROST AND SULLIVAN LTD,Literary & Archival Items,28125
20/09/2018,SJS Property Services,George IV Bridge Work,44684.2
20/09/2018,CENGAGE LEARNING (EMEA )LTD,Literary & Archival Items,64791
30/09/2018,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
30/09/2018,SJS Property Services,George IV Bridge Work,51635.35
24/10/2018,XMA SCOTLAND LTD,IT equipment,35313.48
24/10/2018,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
21/11/2018,EX LIBRIS,IT equipment,39000
21/11/2018,EX LIBRIS,IT equipment,53327.09
26/11/2018,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
26/11/2018,SJS Property Services,George IV Bridge Work,66818.25
11/12/2018,CALEDONIAN LIFT SERVICES LTD,Causewayside Work,47944.8
31/12/2018,SOFTCAT,IT equipment,37064.3
14/01/2019,m-hance,IT Work,33164.4
14/01/2019,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
24/01/2019,ARTHUR MCKAY BUILDING SERVICES,Causewayside Work,100235.17
31/01/2019,ECG FACILITIES SERVICE,Causewayside Work,32517.45
31/01/2019,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
31/01/2019,CENGAGE LEARNING (EMEA )LTD,Literary & Archival Items,66443
14/02/2019,Private Sale,Literary & Archival Items,50000
27/02/2019,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
31/03/2019,ECG FACILITIES SERVICE,Facilities Management Charge,35757.91
31/03/2019,ECG FACILITIES SERVICE,George IV Bridge Work,37320.15
31/03/2019,HP INC UK LTD,IT equipment,40746
31/03/2019,INSIGHT DIRECT (UK) LTD,IT equipment,56223.35
23/04/2019,EX LIBRIS,"IT equipment
",129584.58
30/04/2019,ECG FACILITIES SERVICE,Facilities Management Charge,36907.14
30/04/2019,COMPUTACENTER UK,"IT equipment
",139571.14
13/05/2019,GLASGOW LIFE,Kelvin Hall Service Charge,120335
04/06/2019,ECG FACILITIES SERVICE,Facilities Management Charge,36907.14
24/06/2019,Private Sale,Literary & Archival Items,34400
25/06/2019,ECG FACILITIES SERVICE,Facilities Management Charge,36907.14
31/07/2019,ECG FACILITIES SERVICE,Facilities Management Charge,36907.14
26/08/2019,MICROBOX GmbH,Digital equipment,65881.58
27/08/2019,ECG FACILITIES SERVICE,Facilities Management Charge,36907.14
27/08/2019,FROST AND SULLIVAN LTD,Literary & Archival Items,28687.5
18/09/2019,CITY OF EDINBURGH COUNCIL,Annual Property Rates 2019/20 for three buildings,221467.2
25/09/2019,LOTHIAN HEATING SERVICES LTD,Payment 1 - GB Boiler replacement ,57114.18
25/09/2019,ECG FACILITIES SERVICE,Facilities Management Charge,34021.61
25/09/2019,EDF Energy,Electricity,33122.06
18/09/2019,INSTITUTE OF CONSERVATION,Bursary Recruitment and Professional Services costs for intern,26805.2
10/10/2019,ECG FACILITIES SERVICE,"CB Bolier Replacement (1),USP Batteries,Gutter Works & Cleaning of pigeon fouling",112794
23/10/2019,ECG FACILITIES SERVICE,"CB Bolier Replacement (2),Facilities Management Charge October 19, intumescent strips & unblocking toilets",103462.39
23/10/2019,Private Sale,Purchase of Manuscripts,45000
04/10/2019,ECG FACILITIES SERVICE,Facilities Management Charge September 19,44288.57
10/10/2019,GLASGOW LIFE,Service Charges Kelvin Hall,39100.16
15/10/2019,EDF ENERGY,Electricity,26805.74
04/10/2019,JISC SERVICES LTD SUBSCRIPTION ACCOUNT,Annual Subscription,25731
23/10/2019,ALDL,Oct19-Dec19 charge from Agency for Legal Deposit Libraries,25155.6
27/11/2019,ECG FACILITIES SERVICE,"Paymnet for 31 invoices including Facilities Managemenr Charge Nov 19, Lift Repairs, replacement refrigerant gas detection system & data cabling and install of WIFI devices",104526.09
05/11/2019,LOTHIAN HEATING SERVICES LTD,GB Bolier Replacement - application 2,45728.9
27/11/2019,GLASGOW LIFE,Service Charges Kelvin Hall 01/07/19-30/09/19,41541.47
19/11/2019,EDF ENERGY,Electricity Oct 2019 3 buildings,26660.9
10/12/2019,PRIVATE SALE,Collection of papers of an individual,125000
06/12/2019,PROQUEST,Purchase of 9 subscriptions 01/11/19-31/10/20,61638
18/12/2019,ECG,"Payment of 19 separate invoice including for service of chiller, re-route return pipes, data cabling and install of WifI devices, sprinkler work",44556.15
22/01/2020,ECG,"Payment of 28 separate invoices including for supply and fit aluminium screen, upgrade boilerhouse electrical panels,CCTV components, pump casting & lift repairs",89297.94
09/01/2020,ECG,Payment of 18 separate invoices including for December facilities services and boiler replacement CB,78585.73
14/01/2020,LM Information Delivery UK LTD,Payment of 18 separate invoice for Online/Print subscriptions Jan 20-Dec 20,27822.54
14/01/2020,EDF,Electricity,25172.34
14/01/2020,ALDL,Jan20-Mar 20 charge from Agency for Legal Deposit Libraries,25155.6
06/02/2020,XMA Scotland,Scality Ring Maintenance,68464.62
06/02/2020,Trustmarque,Miscrosoft Software Licenses,38069.66
11/02/2020,Studio MB,Concept Design Semi-Permanent Exhibtion,27000
11/02/2020,EDF,Electricity,25484.03
06/03/2020,British Library,Governance and Management Costs,27766.6
10/03/2020,Proquest,Subscriptions,50309.81
10/03/2020,ECG,Two months maintance contracts,80041.02
17/03/2020,BSI,Subscription,30951.6
17/03/2020,Glasgow Life,Kelvin Hall Service Charges,55857.04
17/03/2020,Private Collection,Collection of literary papers,60000
20/03/2020,EDF,Electricity,25829.65
20/03/2020,ECG,This payment covers 16 invoices including upgrade to boiler control panel & remedial works following 5 year test,32025.98
06/04/2020,Gardiner and Theobald,GB Feasibility Study,49508
06/04/2020,ECG,This payment covers 8 invocies including monthly facilities management fees & site inspection fees,51822.68
23/04/2020,OCLC UK,Cataloging and Metadata subscription,26251.2
23/04/2020,John Graham,Stonework Retention Payment,25104.56
23/04/2020,EDF,Electricity,25025.89
23/04/2020,Studio MB,Exhibition design,63000
23/04/2020,ECG,"This payment covers 5 invocies including monthly facilities management fees, software and hardware maintenance & Lighting Upgrades",65200.11
14/05/2020,GARDINER AND THEOBALD LLP,GB Feasibility Study,26291.48
14/05/2020,HP INC UK LTD,IT equipment purchase,30640.32
14/05/2020,XMA SCOTLAND LTD,Purchase of IT equipment and renewal of maintenance agreement. This payment covers 2 invoices,139167.6
14/05/2020,CENGAGE LEARNING EMEA LTD,Annual hosting fee,28800
21/05/2020,ECG FACILITIES SERVICE,CB Boiler replacement plus monthly maintenance fee. This payment covers 2 invoices,47899.83
29/05/2020,EDF ENERGY,Electricity for April in Causewayside and George IV Bridge buildings. This payment covers 2 invoices.,30175.09
29/05/2020,SOFTCAT,Software Licence,42866.5
09/06/2020,Ex Libris,Annual subsriptions. This payment covers 2 invoices.,189036.11
09/06/2020,Glasgow Life,Service Charges,49509.2
09/06/2020,XMA Scotland Ltd,IT equipment,25371.84
18/06/2020,JISC SERVICES LTD SUBSCRIPTION ACCOUNT,Annual subscription,25896
25/06/2020,ECG FACILITIES SERVICE,Facility Management fees,49000
25/06/2020,GARDINER AND THEOBALD LLP,GB Feasibility Study,26291.48
25/06/2020,THE LEARNING POOL,E-Learning Resources,25344
07/07/2020,Agency for the Legal Deposit Libraries,Agency services,26007.95
07/07/2020,Lyon and Turnball,Various collection items,54094
09/07/2020,XMA Scotland Ltd,Computer equipment,33327
14/07/2020,EDF Energy,Utilities,25768.85
23/07/2020,Computer Centre UK Ltd,Computer equipment,27750.79
23/07/2020,ECG Facility Services,Facility Management fees,49000
23/07/2020,GARDINER AND THEOBALD LLP,GB Feasibility Study,26291.48
13/08/2020,EDF Energy,Utilities. This transaction is made up of 3 invoices.,26688.27
13/08/2020,Frost & Sullivan Ltd,Annual subscription,34425
27/08/2020,Agency for Legal Deposit Libaries,Agency services,26007.95
27/08/2020,ECG Facilities Services,Facility Management fees,49000
27/08/2020,Gardiner and Theobald LLP,GB Feasibility Study,26291.48
17/09/2020,EDF Energy,This payment covers 3 invoices for utility services,34283.03
17/09/2020,JISC Services Ltd,Subscription,26179.72
17/09/2020,XMA Scotland Ltd,IT equipment,26533.92
24/09/2020,ECG Facilities Services,Facility Management fees,55450.58
24/09/2020,Glasgow Life,Service charges,25211.17
08/10/2020,EDF Energy,This payment covers 5 invoices for utility services,27625.53
08/10/2020,ALDL,Agency services,26007.95
08/10/2020,Institute of Conservation,This payment covers 2 invoices for student bursary costs,31654
08/10/2020,Studio MB,Exhibition build works,36000
22/10/2020,ECG Facilities,This payment covers 11 invoices for facility Management fees,55672.9
22/10/2020,Glasgow City Council,Capital works,34802.4
19/11/2020,DTEK DIGITAL SOLUTIONS LTD,Computer equipment,39348
19/11/2020,ECG FACILITIES SERVICE,This payment covers multiple invoices for facility Management fees,31888.51
19/11/2020,GLASGOW LIFE,Builidng service charges,47690.16
26/11/2020,ECG FACILITIES SERVICE,This payment covers multiple invoices for facility Management fees,55299.92
26/11/2020,LEE BOYD LIMITED,This payment covers 7 invoices for project management fees,26440.98
03/12/2020,PROQUEST INFORMATION AND LEARN,This payment covers multiple invoices for collection items,50232.54
10/12/2020,STUDIO MB,This payment covers 2 invoices for exhibition services and equipment,55902
17/12/2020,ECG FACILITIES SERVICE,Facility Management Fees,49000
17/12/2020,LEE BOYD LIMITED,This payment covers multiple invoices for project management fees,28922.8
07/01/2021,ECG FACILITIES SERVICE,This payment covers multiple invoices for facility management fees,39150.26
14/01/2021,EDF ENERGY,This payment covers multiple invoices for electricity,28711.17
14/01/2021,ALDL,Legal deposit services,26007.95
14/01/2021,EXCHANGE COMMUNICATIONS INSTALLATIONS LTD,Telecom services,31878
21/01/2021,ECG FACILITIES SERVICE,This payment covers multiple invoices for facility management fees,28797.1
28/01/2021,ECG FACILITIES SERVICE,This payment covers multiple invoices for facility management fees,54875.74
04/02/2021,PROQUEST INFORMATION AND LEARN,One invoice for collection items,40000
18/02/2021,ECG FACILITIES SERVICE,This payment covers multiple invoices for facility management fees,54931.68
25/02/2021,ECG FACILITIES SERVICE,This payment covers multiple invoices for facility management fees,51283.39
25/02/2021,HP INC UK LTD,IT Equipment,37868.04
10/03/2021,BSI,BSOL Modular Subscription,30510
16/03/2021,PHOENIX SOFTWARE LTD,IT Hardware plus 5 year licence,74432.04
16/03/2021,ECG FACILITIES SERVICE,This payment covers multiple invoices for facility management fees,134758.64
23/03/2021,ECG FACILITIES SERVICE,Maintenance Contract - March,49000
23/03/2021,ICAM ARCHIVE SYSTEMS,Camera System - phase 1,39120
25/03/2021,ECG FACILITIES SERVICE,This payment covers multiple invoices for facility management fees,108450.85
31/03/2021,GLASGOW LIFE,Oct 20 to Dec 20 service charge - Kelvin Hall,54840.53
31/03/2021,ECG FACILITIES SERVICE,Replacement Humidifer units,76751
31/03/2021,ECG FACILITIES SERVICE,Cooling and Humidifer system upgrade,26943.84
31/03/2021,ECG FACILITIES SERVICE,Installation of CCTV,29404.62
29/04/2021,ECG FACILITIES SERVICE,This payment covers April 21 Maintenance Contract and the installation of battery rack and batteries plus smaller maintenance invoices,71604.07
29/04/2021,GLASGOW LIFE,Jan 21 to Mar 21 service charge - Kelvin Hall,46657.33
20/05/2021,ECG FACILITIES SERVICE,Routine inspection and maintenance of all NLS properties,52584.2
27/05/2021,XMA SCOTLAND LTD,2 invoices one for the replacement of obsolete hardware and the other for a new laptop,28587.59
13/05/2021,ALDL,"Claiming, receipting and onward distribution of legal deposit on behalf of NLS",26376.68
27/05/2021,LYON AND TURNBULL,Purchase of a manuscript,26000
27/05/2021,ARNOLD CLARK,Purchase of an electric van,25949.5
28/06/2021,XMA Scotland Ltd,Purchase of IT hardware for cloud and maintenance of hardware,72061.92
08/07/2021,EX LIBRIS,Subscription April to Oct 21 cloud based library services,95045.31
08/07/2021,ECG FACILITIES SERVICE,Maintenance contract - June 21 period,52459.25
08/07/2021,XMA SCOTLAND LTD,IT hardware equipment,37620.86
22/07/2021,ALDL,Quarterly invoice legal deposit materials - July to Sept 21,26400.68
12/08/2021,ECG FACILITIES SERVICE,Maintenance contract - July 21 period,52459.25
27/08/2021,ECG FACILITIES SERVICE,Maintenance contract - August 21 period,52459.25
27/08/2021,ECG FACILITIES SERVICE,Water penetration works - part 2,28350
27/08/2021,ECG FACILITIES SERVICE,Water penetration works - part 3,28350
22/09/2021,GLASGOW LIFE,Kelvin Hall Service Charge - April to June 21,35420.45
29/09/2021,ECG FACILITIES SERVICE,Maintenance contract - all properties,52459.25
29/09/2021,FROST AND SULLIVAN LTD,Annual Subscription - Sept 21 to Oct 22,35147.09
21/10/2021,ECG FACILITIES SERVICE,Maintenance contract - October,52459.25
31/10/2021,SOFTCAT,It purchases for server,42282.72
14/10/2021,ALDL,"Claiming, receipting and onward distribution for quarter Oct to Dec 21",26400.68
04/11/2021,Web of Science JISC SHEDL subs ,Subscription 2021 to 2021 SHEDL,28361.78
11/11/2021,M and J Kelman Ltd,Literary and personal papers of James Kelman,40000
11/11/2021,John Graham Constrution Ltd,External fabric repairs - Causeway Side building,75262.75
11/11/2021,Robert Harland,Correspondance and Literary papers - Thomas Carlyle,94000
11/11/2021,Jisc Services Ltd,IT Subscription and router service charge,25896
25/11/2021,ECG Facilities,Maintenance Contract - November,52459.25
25/11/2021,Ex Libris,IT Subscription ,81729.02
31/12/2021,ECG FACILITIES SERVICE,Electrical and mechanical works,28071.17
16/12/2021,JAMES BRECK LTD,Re-slating of roof LB,28572.28
23/12/2021,CENGAGE LEARNING EMEA LTD,Subscription - Historical Archive,32460
31/12/2021,GLASGOW LIFE,Quarterly service charge KH,45541.34
31/12/2021,ECG FACILITIES SERVICE,Maintenance Contract - December,52459.25
16/12/2021,ECG FACILITIES SERVICE,"Electrical, mechanical and building works",82227.96
27/01/2022,ECG FACILITIES SERVICE,January maintenance contract,52459.25
31/01/2022,ALDL,1st January to 31st March 22 - receipting and onward distribution of UK legal deposit materials on behalf of National Library of Scotland,26388.68
03/02/2022,ECG FACILITIES SERVICE,"Monthly maintenance contract, drainage jetting and cctv remedials, patio roofing wash",62411.69
10/02/2022,JAMES BRECK LTD,Roof uplifting and re-slating,31890.41
10/02/2022,LEE BOYD LIMITED,Various invoices smoke extract system and rateable value review,30552
17/02/2022,LEE BOYD LIMITED,"Various invoices for CB smoke extract system, project work - FM maintenance framework, sprinkler system",57766.9
24/02/2022,ECG FACILITIES SERVICE,"Carry out tanking works, supply and fit mini drive unit, balustrade repairs",27723.16
24/02/2022,ADAM MATTHEW DIGITAL LTD,Resource - slavery abolution and social justice,37080
10/03/2022,ECG FACILITIES SERVICE,Maintenance contract - March,52459.25
10/03/2022,XMA SCOTLAND LTD,It equipment,61885.56
17/03/2022,EDF ENERGY,Electricity bill for various sites,57220.55
17/03/2022,ECG FACILITIES SERVICE,Maintenance contract - Feb plus various smaller invoices for maintenance jobs,71653.47
17/03/2022,XMA010,IT equipment,77208.77
17/03/2022,OXFORD UNIVERSITY PRESS,Annual subscription,28576.89
24/03/2022,ECG FACILITIES SERVICE,Various small maintenance jobs around library sites,34055.73
24/03/2022,GLASGOW LIFE,Kelvin Hall quarterly service charge,41637.96
24/03/2022,LEE BOYD LIMITED,Sprinkler system project and lift refurb George IV,55234
24/03/2022,BSI,Annual subscription,31425
31/03/2022,ECG FACILITIES SERVICE,Various small maintenance jobs around library sites,28760.32
31/03/2022,XMA SCOTLAND LTD,It equipment,47461.25
31/03/2022,JAMES BRECK LTD,Roof uplift and reslating,28230.64
31/03/2022,LEE BOYD LIMITED,Various small maintenance jobs around library sites,26396.1
31/03/2022,UNIVERSITY OF DUNDEE,Salary costs for SCURL Scottish Universities press project,39726.44
30/04/2022,JISC Services Ltd,Managed router service charge annual subscription 01/04/22 to 31/03/23,25896
30/04/2022,EX Libris,Subscription Alma and Primo 01/04/22 to 31/10/22,114420.65
11/05/2022,KENNYS BOOKSHOP&ART GALLERIES,Purchase of Smillie Archive,30000
12/05/2022,ECG FACILITIES SERVICE,Inspection and Maintenance of all Library properties,55711.72
19/05/2022,CAE TECHNOLOGY SERVICES LIMITED,Subscription renewal,25041.31
19/05/2022,GLASGOW LIFE,Kelvin Hall service charge Jan to Mar 22,59084.95
31/05/2022,ECG FACILITIES SERVICE,Fit pre-purchased humidifiers,29710.8
31/05/2022,ECG FACILITIES SERVICE,Routine inspection and maintenance May 22,55711.72
31/05/2022,ALDL,Legal deposit materials April to July 22,27013.18
09/06/2022,LEE BOYD LIMITED,Architectural Works,93690
16/06/2022,CITY OF EDINBURGH COUNCIL,Rates for 33 Salisbury Place,136240
16/06/2022,CITY OF EDINBURGH COUNCIL,Rates 57 George IV Bridge,41920
23/06/2022,ECG FACILITIES SERVICE,Maintenance contract - June 22,55711.72
21/07/2022,ALDL,"Claiming,receipting and onward distribution of UK legal deposit materials July to Sept 22",27013.16
21/07/2022,RICK GEKOSKI,Papers 1970's to 2019 Alisdair Gray,125000
28/07/2022,SONYA LEONARD,Literary and personal papers of Tom Leonard 1961 to 2018,40000

File diff suppressed because it is too large.

@ -0,0 +1,102 @@
Date,Supplier,Description,Transaction value (£),Classification
15/08/2016,Creative Video Productions Ltd,Kelvin Hall,26866,Other
29/05/2017,John Graham Construction Ltd,Causewayside Refurbishment,74806,Building Improvement
29/05/2017,Morris & Spottiswood Ltd,George IV Bridge Work,56448,Building Improvement
31/05/2017,John Graham Construction Ltd,Causewayside Refurbishment,164691,Building Improvement
24/07/2017,John Graham Construction Ltd,Causewayside Refurbishment,27926,Building Improvement
24/07/2017,John Graham Construction Ltd,Causewayside Refurbishment,212690,Building Improvement
16/08/2017,John Graham Construction Ltd,Causewayside Refurbishment,59021,Building Improvement
16/08/2017,John Graham Construction Ltd,Causewayside Refurbishment,136379,Building Improvement
23/08/2017,Culture And Sport Glasgow,Kelvin Hall,60503,Building Improvement
23/08/2017,XMA Scotland Ltd,Kelvin Hall,31830,Building Improvement
31/08/2017,John Graham Construction Ltd,Causewayside Refurbishment,36313,Building Improvement
31/08/2017,Insight Direct (UK) Ltd,Causewayside Refurbishment,68222,Building Improvement
31/08/2017,Mark Finn Laboratory,George IV Bridge Work,53884,Building Improvement
11/09/2017,John Graham Construction Ltd,Causewayside Refurbishment,189483,Building Improvement
23/10/2017,John Graham Construction Ltd,Causewayside Refurbishment,151659,Building Improvement
23/10/2017,City Building LLP,Causewayside Refurbishment,53147,Building Improvement
07/02/2017,John Graham Construction Ltd,Causewayside Refurbishment,52404,Building Improvement
13/02/2017,John Graham Construction Ltd,Causewayside Refurbishment,272390,Building Improvement
06/03/2017,John Graham Construction Ltd,Causewayside Refurbishment,31781,Building Improvement
06/03/2017,John Graham Construction Ltd,Causewayside Refurbishment,198048,Building Improvement
31/03/2017,Nicholson Bros(Electrical Contractors) Ltd,Causewayside Refurbishment,33666,Building Improvement
31/03/2017,John Graham Construction Ltd,Causewayside Refurbishment,222090,Building Improvement
31/03/2017,John Graham Construction Ltd,Causewayside Refurbishment,63971,Building Improvement
24/04/2017,Scottish Historic Buildings Trust,Lawnmarket Work,50057,Building Improvement
30/04/2017,Morris & Spottiswood Ltd,George IV Bridge Work,63716,Building Improvement
15/05/2017,John Graham Construction Ltd,Causewayside Refurbishment,245381,Building Improvement
12/09/2016,Flexiform,Kelvin Hall,42623,Building Improvement
12/09/2016,John Graham Construction Ltd,Causewayside Refurbishment,228689,Building Improvement
26/09/2016,Senator International,Kelvin Hall,35706,Building Improvement
26/09/2016,John Graham Construction Ltd,Causewayside Refurbishment,28378,Building Improvement
30/09/2016,A McGillivray,Causewayside Refurbishment,44392,Building Improvement
10/10/2016,John Graham Construction Ltd,Causewayside Refurbishment,303999,Building Improvement
31/10/2016,John Graham Construction Ltd,Causewayside Refurbishment,74245,Building Improvement
07/11/2016,CBRE,Kelvin Hall,83736,Building Improvement
14/11/2016,University Of Glasgow,Kelvin Hall,188682,Building Improvement
14/11/2016,John Graham Construction Ltd,Causewayside Refurbishment,362326,Building Improvement
12/12/2016,John Graham Construction Ltd,Causewayside Refurbishment,385310,Building Improvement
30/12/2016,John Graham Construction Ltd,Causewayside Refurbishment,253618,Building Improvement
30/12/2016,John Graham Construction Ltd,Causewayside Refurbishment,45127,Building Improvement
21/04/2016,M & J Ballantyne Ltd,George IV Bridge Work,35098,Building Improvement
09/05/2016,John Graham Construction Ltd,Causewayside Refurbishment,64361,Building Improvement
09/05/2016,A McGillivray,Causewayside Refurbishment,53690,Building Improvement
16/05/2016,John Graham Construction Ltd,Causewayside Refurbishment,365344,Building Improvement
10/06/2016,Wavetek Ltd,Kelvin Hall,87589,Building Improvement
10/06/2016,John Graham Construction Ltd,Causewayside Refurbishment,381803,Building Improvement
30/06/2016,Glasgow City Council,Kelvin Hall,1700000,Building Improvement
11/07/2016,Wavetek Ltd,Kelvin Hall,65692,Building Improvement
11/07/2016,John Graham Construction Ltd,Causewayside Refurbishment,139845,Building Improvement
25/07/2016,A McGillivray,Causewayside Refurbishment,30113,Building Improvement
15/08/2016,John Graham Construction Ltd,Causewayside Refurbishment,196807,Building Improvement
06/11/2017,John Graham Construction Ltd,Causewayside Refurbishment,134208,Building Improvement
31/03/2017,NLS Foundation,Grant Payment,177500,Other
09/10/2017,Frost And Sullivan Ltd,Literary & Archival Items,28125,Literature & Archive
09/10/2017,JISC Services Ltd ,Literary & Archival Items,43481,Literature & Archive
27/02/2017,Cengage Learning (Emea )Ltd,Literary & Archival Items,43302,Literature & Archive
06/03/2017,Private Sale,Literary & Archival Items,72500,Literature & Archive
31/03/2017,Private Sale,Literary & Archival Items,3422500,Literature & Archive
24/04/2017,Cengage Learning (Emea )Ltd,Literary & Archival Items,43302,Literature & Archive
22/05/2017,ALDL,Legal Deposit Services,27067,Literature & Archive
19/09/2016,Jisc Services Ltd Subscription Account,Literary & Archival Items,42629,Literature & Archive
10/10/2016,Cengage Learning (Emea )Ltd,Literary & Archival Items,86604,Literature & Archive
24/10/2016,ALDL,ALDL Charges,32317,Literature & Archive
26/04/2016,Private Sale,Literary & Archival Items,30000,Literature & Archive
30/05/2016,ALDL,ALDL Charges,32317,Literature & Archive
15/07/2016,Sotheby'S,Literary & Archival Items,28500,Literature & Archive
18/07/2016,Christies,Literary & Archival Items,33800,Literature & Archive
31/07/2016,ALDL,ALDL Charges,32317,Literature & Archive
08/12/2016,Sothebys,Literary & Archival Items,166000,Literature & Archive
08/12/2016,Private Sale,Literary & Archival Items,87500,Literature & Archive
26/06/2017,ECG Facilities Service,Facilities Management Charge,33386,Utility Bills
26/06/2017,British Library,Legal Deposit Services,50056,Other
24/07/2017,ALDL,Legal Deposit Services,27067,Other
16/08/2017,ECG Facilities Service,Facilities Management Charge,33386,Utility Bills
23/08/2017,ECG Facilities Service,Facilities Management Charge,33386,Utility Bills
07/02/2017,ECG Facilities Service,Facilities Management Charge,32795,Utility Bills
27/02/2017,ECG Facilities Service,Facilities Management Charge,32795,Utility Bills
27/03/2017,ECG Facilities Service,Facilities Management Charge,32795,Utility Bills
22/05/2017,ECG Facilities Service,Facilities Management Charge,33386,Utility Bills
26/09/2016,ECG Facilities Service,Facilities Management Charge,32795,Utility Bills
24/10/2016,ECG Facilities Service,Facilities Management Charge,32795,Utility Bills
08/12/2016,ECG Facilities Service,Facilities Management Charge,32795,Utility Bills
30/12/2016,ECG Facilities Service,Facilities Management Charge,32795,Utility Bills
23/05/2016,ECG Facilities Service,Facilities Management Charge,32777,Utility Bills
23/05/2016,ECG Facilities Service,Facilities Management Charge,32777,Utility Bills
28/06/2016,ECG Facilities Service,Facilities Management Charge,32832,Utility Bills
08/08/2016,ECG Facilities Service,Facilities Management Charge,32795,Utility Bills
24/08/2016,ECG Facilities Service,Facilities Management Charge,32795,Utility Bills
30/10/2017,ECG Facilities Service,Facilities Management Charge,35758,Utility Bills
16/08/2017,Ex Libris,IT equipment,76610,Software/IT
31/03/2017,XMA Scotland Ltd,IT equipment,33450,Software/IT
31/03/2017,XMA Scotland Ltd,IT equipment,84524,Software/IT
24/04/2017,Insight Direct (UK) Ltd,IT equipment,56768,Software/IT
09/05/2016,Computacenter Uk,Kelvin Hall,72835,Software/IT
23/05/2016,Computacenter Uk,Kelvin Hall,26506,Software/IT
15/09/2017,City Of Edinburgh Council,Non Domestic Rates ,57662,Utility Bills
15/09/2017,City Of Edinburgh Council,Non Domestic Rates ,142680,Utility Bills
08/05/2017,Anglian Water Business,Water,26832,Utility Bills
30/04/2016,City Of Edinburgh Council,Non Domestic Rates ,40800,Utility Bills
12/09/2016,City Of Edinburgh Council,Non Domestic Rates ,144330,Utility Bills
12/09/2016,City Of Edinburgh Council,Non Domestic Rates ,49827,Utility Bills
24/07/2017,AM Phillip,Vehicle Purchase,26604,Other
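For orientation, a minimal sketch of loading a labelled file like the one above with pandas; the filename below is illustrative and assumes the CSV has been saved locally:

```python
import pandas as pd

# Illustrative filename: adjust to wherever the labelled CSV has been saved.
df = pd.read_csv("labelled_transactions.csv")

# Basic sanity checks on the labelled examples.
print(df.shape)
print(df["Classification"].value_counts())
```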

File diff suppressed because it is too large.

@ -1,3 +1,9 @@
"""
Note: To answer questions based on text documents, we recommend the procedure in
[Question Answering using Embeddings](https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb).
Some of the code below may rely on [deprecated API endpoints](https://github.com/openai/openai-cookbook/tree/main/transition_guides_for_deprecated_API_endpoints).
"""
import argparse
import openai


@ -1,5 +1,12 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style=\"color:orange; font-weight:bold\">Note: To answer questions based on text documents, we recommend the procedure in <a href=\"https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb\">Question Answering using Embeddings</a>. Some of the code below may rely on <a href=\"https://github.com/openai/openai-cookbook/tree/main/transition_guides_for_deprecated_API_endpoints\">deprecated API endpoints</a>.</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -164,7 +171,7 @@
" discard_categories: Set[str] = discard_categories,\n",
") -> str:\n",
" \"\"\"\n",
" Extract the sections of a Wikipedia page, discarding the the references and other low information sections\n",
" Extract the sections of a Wikipedia page, discarding the references and other low information sections\n",
" \"\"\"\n",
" if len(wiki_text) == 0:\n",
" return []\n",
@ -487,11 +494,9 @@
}
],
"metadata": {
"interpreter": {
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
},
"kernelspec": {
"display_name": "Python 3.7.3 64-bit ('base': conda)",
"display_name": "Python 3.9.9 64-bit ('3.9.9')",
"language": "python",
"name": "python3"
},
"language_info": {
@ -504,9 +509,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.9.9"
},
"orig_nbformat": 4
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "cb9817b186a29e4e9713184d901f26c1ee05ad25243d878baff7f31bb1fef480"
}
}
},
"nbformat": 4,
"nbformat_minor": 2


@ -1,5 +1,12 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style=\"color:orange; font-weight:bold\">Note: To answer questions based on text documents, we recommend the procedure in <a href=\"https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb\">Question Answering using Embeddings</a>. Some of the code below may rely on <a href=\"https://github.com/openai/openai-cookbook/tree/main/transition_guides_for_deprecated_API_endpoints\">deprecated API endpoints</a>.</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -142,7 +149,7 @@
"\n",
"Note: We have used temperature=0, but it may be beneficial to experiment with a higher temperature to get a higher diversity of questions.\n",
"\n",
"<span style=\"color:orange\">**WARNING: This step will last a long time, and consume a lot of tokens, as it calls davinci-instruct for every section to generate a number of questions.**</span>"
"<span style=\"color:orange; font-weight:bold\">WARNING: This step will last a long time, and consume a lot of tokens, as it calls davinci-instruct for every section to generate a number of questions.</span>"
]
},
{
@ -298,8 +305,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.5 Search file\n",
"We create a search file ([API reference](https://beta.openai.com/docs/api-reference/files/list)), which can be used to retrieve the relevant context when a question is asked.\n"
"## 2.5 Search file (DEPRECATED)\n",
"We create a search file ([API reference](https://beta.openai.com/docs/api-reference/files/list)), which can be used to retrieve the relevant context when a question is asked.\n",
"\n",
"<span style=\"color:orange; font-weight:bold\">DEPRECATED: The /search endpoint is deprecated in favour of using embeddings. Embeddings are cheaper, faster and can support a better search experience. See <a href=\"https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb\">Question Answering Guide</a> for a search implementation using the embeddings</span>\n"
]
},
{
@ -725,11 +734,9 @@
}
],
"metadata": {
"interpreter": {
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
},
"kernelspec": {
"display_name": "Python 3.7.3 64-bit ('base': conda)",
"display_name": "Python 3.9.9 64-bit ('3.9.9')",
"language": "python",
"name": "python3"
},
"language_info": {
@ -742,9 +749,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.9.9"
},
"orig_nbformat": 4
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "cb9817b186a29e4e9713184d901f26c1ee05ad25243d878baff7f31bb1fef480"
}
}
},
"nbformat": 4,
"nbformat_minor": 2


@ -1,5 +1,12 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style=\"color:orange; font-weight:bold\">Note: To answer questions based on text documents, we recommend the procedure in <a href=\"https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb\">Question Answering using Embeddings</a>. Some of the code below may rely on <a href=\"https://github.com/openai/openai-cookbook/tree/main/transition_guides_for_deprecated_API_endpoints\">deprecated API endpoints</a>.</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -611,11 +618,9 @@
}
],
"metadata": {
"interpreter": {
"hash": "be4b5d5b73a21c599de40d6deb1129796d12dc1cc33a738f7bac13269cfcafe8"
},
"kernelspec": {
"display_name": "Python 3.7.3 64-bit ('base': conda)",
"display_name": "Python 3.9.9 64-bit ('3.9.9')",
"language": "python",
"name": "python3"
},
"language_info": {
@ -628,9 +633,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.9.9"
},
"orig_nbformat": 4
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "cb9817b186a29e4e9713184d901f26c1ee05ad25243d878baff7f31bb1fef480"
}
}
},
"nbformat": 4,
"nbformat_minor": 2

File diff suppressed because it is too large.

@ -0,0 +1,20 @@
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.14.0
    restart: on-failure:0
    ports:
    - "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: "./data"
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      ENABLE_MODULES: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
      CLUSTER_HOSTNAME: 'node1'
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-msmarco-distilroberta-base-v2
    environment:
      ENABLE_CUDA: 0 # set to 1 to enable
      # NVIDIA_VISIBLE_DEVICES: all # enable if running with CUDA
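Once the stack above is running (for example via `docker-compose up -d`), a minimal connectivity check from Python might look like the following sketch; it assumes the v3 `weaviate-client` package is installed:

```python
import weaviate

# Connect to the local Weaviate instance exposed on port 8080 by the compose file above.
client = weaviate.Client("http://localhost:8080")

# is_ready() returns True once the instance is up and accepting requests.
print(client.is_ready())
```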

BIN images/lm_cascades_fig1.png (new file, 32 KiB)
BIN images/lm_cascades_fig3.png (new file, 34 KiB)
BIN images/lm_cascades_fig4.png (new file, 34 KiB)
BIN images/lm_cascades_fig5.png (new file, 57 KiB)
BIN images/lm_cascades_fig6.png (new file, 35 KiB)
BIN images/maieutic_fig2.png (new file, 126 KiB)
BIN images/maieutic_fig6.png (new file, 198 KiB)
BIN images/maieutic_tab1.png (new file, 175 KiB)

BIN images/star_fig1.png (new file, 340 KiB)
BIN images/star_tab1.png (new file, 273 KiB)
BIN images/verifiers_fig3.png (new file, 216 KiB)
BIN images/verifiers_fig5.png (new file, 281 KiB)

View File

@@ -0,0 +1,574 @@
# Techniques to improve reliability
When GPT-3 fails on a task, what should you do?
- Search for a better prompt that elicits more reliable answers?
- Invest in thousands of examples to fine-tune a custom model?
- Assume the model is incapable of the task, and move on?
There is no simple answer - it depends. However, if your task involves logical reasoning or complexity, consider trying the techniques in this article to build more reliable, high-performing prompts.
## Why GPT-3 fails on complex tasks
If you were asked to multiply 13 by 17, would the answer pop immediately into your mind? For most of us, probably not. Yet, that doesn't mean humans are incapable of two-digit multiplication. With a few seconds, and some pen and paper, it's not too taxing to work out that 13 x 17 = 130 + 70 + 21 = 221.
Similarly, if you give GPT-3 a task that's too complex to do in the time it takes to calculate its next token, it may confabulate an incorrect guess. Yet, akin to humans, that doesn't necessarily mean the model is incapable of the task. With some time and space to reason things out, the model still may be able to answer reliably.
As an example, if you ask `text-davinci-002` the following math problem about juggling balls, it answers incorrectly:
```text-davinci-002
Q: A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?
A:
```
```text-davinci-002
There are 8 blue golf balls.
```
Does this mean that GPT-3 cannot do simple math problems? No; in fact, it turns out that by prompting the model with `Let's think step by step`, the model solves the problem reliably:
```text-davinci-002
Q: A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?
A: Let's think step by step.
```
```text-davinci-002
There are 16 balls in total.
Half of the balls are golf balls.
That means that there are 8 golf balls.
Half of the golf balls are blue.
That means that there are 4 blue golf balls.
```
Of course, it's hard to tell from only a single example whether this `Let's think step by step` trick actually works in general or just got lucky on this particular problem. But it really does work. On a benchmark of word math problems, the `Let's think step by step` trick raised GPT-3's solve rate massively, from a worthless 18% to a decent 79%!
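As a minimal sketch of how you might run this with the openai Python library as it existed at the time (the `Completion` API; parameters are illustrative):
```python
import openai  # the library reads OPENAI_API_KEY from the environment

prompt = (
    "Q: A juggler has 16 balls. Half of the balls are golf balls "
    "and half of the golf balls are blue. How many blue golf balls are there?\n"
    "A: Let's think step by step."
)

# Appending "Let's think step by step." nudges the model to reason before answering
response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    temperature=0,
    max_tokens=256,
)
print(response["choices"][0]["text"])
```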
## Model capabilities depend on context
When learning to work with GPT-3, one common conceptual mistake is to believe that its capabilities are fixed across all contexts. E.g., if GPT-3 gets a simple logic question wrong, then it must be incapable of simple logic.
But as the `Let's think step by step` example illustrates, apparent failures of GPT-3 can sometimes be remedied with a better prompt that helps the model steer itself toward the correct output.
## How to improve reliability on complex tasks
The rest of this article shares techniques for improving reliability of large language models on complex tasks. Although some of the techniques are specific to certain types of problems, many of them are built upon general principles that can be applied to a wide range of tasks, e.g.:
- Give clearer instructions
- Split complex tasks into simpler subtasks
- Structure the instruction to keep the model on task
- Prompt the model to explain before answering
- Ask for justifications of many possible answers, and then synthesize
- Generate many outputs, and then use the model to pick the best one
- Fine-tune custom models to maximize performance
## Split complex tasks into simpler tasks
One way to give a model more time and space to think is to break tasks into simpler pieces.
As an example, consider a task where we ask the model a multiple-choice question about some text - in this case, a game of Clue. When asked directly, `text-davinci-002` isn't able to put clues 3 & 5 together, and answers incorrectly:
```text-davinci-002
Use the following clues to answer the following multiple-choice question.
Clues:
1. Miss Scarlett was the only person in the lounge.
2. The person with the pipe was in the kitchen.
3. Colonel Mustard was the only person in the observatory.
4. Professor Plum was not in the library nor the billiard room.
5. The person with the candlestick was in the observatory.
Question: Was Colonel Mustard in the observatory with the candlestick?
(a) Yes; Colonel Mustard was in the observatory with the candlestick
(b) No; Colonel Mustard was not in the observatory with the candlestick
(c) Unknown; there is not enough information to determine whether Colonel Mustard was in the observatory with the candlestick
Solution:
```
```text-davinci-002
(c) Unknown; there is not enough information to determine whether Colonel Mustard was in the observatory with the candlestick
```
Although clues 3 and 5 establish that Colonel Mustard was the only person in the observatory and that the person in the observatory had the candlestick, the model fails to combine them into a correct answer of (a) Yes.
However, instead of asking for the answer directly, we can split the task into three pieces:
- First, go through the clues one by one and consider whether the clue is potentially relevant
- Second, combine the relevant clues to reason out the answer to the question
- Third, write the final answer: either (a), (b), or (c)
```text-davinci-002
Use the following clues to answer the following multiple-choice question, using the following procedure:
(1) First, go through the clues one by one and consider whether the clue is potentially relevant
(2) Second, combine the relevant clues to reason out the answer to the question
(3) Third, map the answer to one of the multiple choice answers: either (a), (b), or (c)
Clues:
1. Miss Scarlett was the only person in the lounge.
2. The person with the pipe was in the kitchen.
3. Colonel Mustard was the only person in the observatory.
4. Professor Plum was not in the library nor the billiard room.
5. The person with the candlestick was in the observatory.
Question: Was Colonel Mustard in the observatory with the candlestick?
(a) Yes; Colonel Mustard was in the observatory with the candlestick
(b) No; Colonel Mustard was not in the observatory with the candlestick
(c) Unknown; there is not enough information to determine whether Colonel Mustard was in the observatory with the candlestick
Solution:
(1) First, go through the clues one by one and consider whether the clue is potentially relevant:
```
```text-davinci-002
Clue 1: Miss Scarlett was the only person in the lounge. This clue is not relevant.
Clue 2: The person with the pipe was in the kitchen. This clue is not relevant.
Clue 3: Colonel Mustard was the only person in the observatory. This clue is relevant.
Clue 4: Professor Plum was not in the library nor the billiard room. This clue is not relevant.
Clue 5: The person with the candlestick was in the observatory. This clue is relevant.
(2) Second, combine the relevant clues to reason out the answer to the question:
Clue 3 and Clue 5 both say that the person with the candlestick was in the observatory. Colonel Mustard was the only person in the observatory, so Colonel Mustard was the person with the candlestick.
(3) Third, write the final answer:
(a) Yes; Colonel Mustard was in the observatory with the candlestick
```
By giving the model more time and space to think, and guiding it along a reasoning plan, it's able to figure out the correct answer of (a) Yes.
Another benefit of splitting complex instructions into smaller subtasks is that it can help keep the model focused on each subtask.
For example, if we ask `text-davinci-002` to summarize a text in its original language, the model can lapse back into English:
```text-davinci-002
Summarize the text using the original language of the text. The summary should be one sentence long.
Text:
"""
La estadística (la forma femenina del término alemán Statistik, derivado a su vez del italiano statista, "hombre de Estado") es una ciencia que estudia la variabilidad, colección, organización, análisis, interpretación, y presentación de los datos, así como el proceso aleatorio que los genera siguiendo las leyes de la probabilidad. La estadística es una ciencia formal deductiva, con un conocimiento propio, dinámico y en continuo desarrollo obtenido a través del método científico formal. En ocasiones, las ciencias fácticas necesitan utilizar técnicas estadísticas durante su proceso de investigación factual, con el fin de obtener nuevos conocimientos basados en la experimentación y en la observación. En estos casos, la aplicación de la estadística permite el análisis de datos provenientes de una muestra representativa, que busca explicar las correlaciones y dependencias de un fenómeno físico o natural, de ocurrencia en forma aleatoria o condicional.
"""
Summary:
```
```text-davinci-002
The text explains that statistics is a science that studies the variability, collection, organization, analysis, interpretation, and presentation of data, as well as the random process that generates them following the laws of probability.
```
However, if we first ask the model to identify the language of the text, and then summarize the text, it becomes more reliable:
```text-davinci-002
First, identify the language of the text. Second, summarize the text using the original language of the text. The summary should be one sentence long.
Text:
"""
La estadística (la forma femenina del término alemán Statistik, derivado a su vez del italiano statista, "hombre de Estado") es una ciencia que estudia la variabilidad, colección, organización, análisis, interpretación, y presentación de los datos, así como el proceso aleatorio que los genera siguiendo las leyes de la probabilidad. La estadística es una ciencia formal deductiva, con un conocimiento propio, dinámico y en continuo desarrollo obtenido a través del método científico formal. En ocasiones, las ciencias fácticas necesitan utilizar técnicas estadísticas durante su proceso de investigación factual, con el fin de obtener nuevos conocimientos basados en la experimentación y en la observación. En estos casos, la aplicación de la estadística permite el análisis de datos provenientes de una muestra representativa, que busca explicar las correlaciones y dependencias de un fenómeno físico o natural, de ocurrencia en forma aleatoria o condicional.
"""
Language:
```
```text-davinci-002
Spanish
La estadística es una ciencia que estudia la variabilidad, colección, organización, análisis, interpretación, y presentación de los datos, así como el proceso aleatorio que los genera siguiendo las leyes de la probabilidad.
```
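Translated into code, the split becomes two calls, with the detected language fed into the second prompt (a minimal sketch; the prompt wording and parameters are illustrative):
```python
import openai

def summarize_in_original_language(text: str) -> str:
    # Step 1: identify the language of the text
    language = openai.Completion.create(
        model="text-davinci-002",
        prompt=f'Identify the language of the following text. Reply with the language name only.\n\nText:\n"""\n{text}\n"""\n\nLanguage:',
        temperature=0,
        max_tokens=8,
    )["choices"][0]["text"].strip()

    # Step 2: summarize, explicitly naming the detected language
    summary = openai.Completion.create(
        model="text-davinci-002",
        prompt=f'Summarize the following text in {language}, in one sentence.\n\nText:\n"""\n{text}\n"""\n\nSummary:',
        temperature=0,
        max_tokens=200,
    )["choices"][0]["text"].strip()
    return summary
```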
## Prompt the model to explain before answering
Another powerful technique for improving the reliability of answers is to prompt the model to gradually reason out the answer rather than jumping immediately to the final answer. By 'thinking aloud' the model can be far more likely to arrive at the correct answer.
### Zero-shot
#### Method
Published by [Takeshi Kojima et al. in 2022](https://arxiv.org/abs/2205.11916), the easiest way to prompt a model to reason out the answer is to simply prepend answers with `Let's think step by step.` Figure 2 illustrates an example:
[![zero-shot reasoning example](images/zero-shot_reasoners_fig2.png)
<br>Source: *Large Language Models are Zero-Shot Reasoners* by Takeshi Kojima et al. (2022).](https://arxiv.org/abs/2205.11916)
#### Results
Applying this simple trick to the MultiArith math dataset, the authors found `Let's think step by step` quadrupled the accuracy, from 18% to 79%!
[![zero-shot reasoning example](images/zero-shot_reasoners_tab5.png)
<br>Source: *Large Language Models are Zero-Shot Reasoners* by Takeshi Kojima et al. (2022).](https://arxiv.org/abs/2205.11916)
#### Implications
Although the `Let's think step by step` trick works well on math problems, it's not effective on all tasks. The authors found that it was most helpful for multi-step arithmetic problems, symbolic reasoning problems, strategy problems, and other reasoning problems. It didn't help with simple math problems or common sense questions, and presumably wouldn't help with many other non-reasoning tasks either.
[![zero-shot reasoning example](images/zero-shot_reasoners_tab1.png)
<br>Source: *Large Language Models are Zero-Shot Reasoners* by Takeshi Kojima et al. (2022).](https://arxiv.org/abs/2205.11916)
To learn more, read the [full paper](https://arxiv.org/abs/2205.11916).
If you apply this technique to your own tasks, don't be afraid to experiment with customizing the instruction. `Let's think step by step` is rather generic, so you may find better performance with instructions that hew to a stricter format customized to your use case. For example, you can try more structured variants like `First, think step by step about why X might be true. Second, think step by step about why Y might be true. Third, think step by step about whether X or Y makes more sense.` You can even give the model an example format to help keep it on track, e.g.:
```text-davinci-002
Using the IRS guidance below, answer the following questions using this format:
(1) For each criterion, determine whether it is met by the vehicle purchase
- {Criterion} Let's think step by step. {explanation} {yes or no, or if the question does not apply then N/A}.
(2) After considering each criterion in turn, phrase the final answer as "Because of {reasons}, the answer is likely {yes or no}."
IRS guidance:
"""
You may be eligible for a federal tax credit under Section 30D if you purchased a car or truck that meets the following criteria:
- Does the vehicle have at least four wheels?
- Does the vehicle weigh less than 14,000 pounds?
- Does the vehicle draw energy from a battery with at least 4 kilowatt hours that may be recharged from an external source?
- Was the vehicle purchased in a year before 2022?
- If so, has the manufacturer sold less than 200,000 qualifying vehicles? (Tesla and GM have sold more than 200,000 qualifying vehicles.)
- Was the vehicle purchased in a year after 2022?
- If so, is the vehicle present in the following list of North American-assembled vehicles? (The only electric vehicles assembled in North America are the Audi Q5, BMW 330e, BMW X5, Chevrolet Bolt EUV, Chevrolet Bolt EV, Chrysler Pacifica PHEV, Ford Escape PHEV, Ford F Series, Ford Mustang MACH E, Ford Transit Van, GMC Hummer Pickup, GMC Hummer SUV, Jeep Grand Cherokee PHEV, Jeep Wrangler PHEV, Lincoln Aviator PHEV, Lincoln Corsair Plug-in, Lucid Air, Nissan Leaf, Rivian EDV, Rivian R1S, Rivian R1T, Tesla Model 3, Tesla Model S, Tesla Model X, Tesla Model Y, Volvo S60, BMW 330e, Bolt EV, Cadillac Lyriq, Mercedes EQS SUV, and Nissan Leaf.)
"""
Question: Can I claim a federal tax credit for my Toyota Prius Prime bought in 2021?
Solution:
(1) For each criterion, determine whether it is met by the vehicle purchase
- Does the vehicle have at least four wheels? Let's think step by step.
```
```text-davinci-002
The Toyota Prius Prime has four wheels, so the answer is yes.
- Does the vehicle weigh less than 14,000 pounds? Let's think step by step. The Toyota Prius Prime weighs less than 14,000 pounds, so the answer is yes.
- Does the vehicle draw energy from a battery with at least 4 kilowatt hours that may be recharged from an external source? Let's think step by step. The Toyota Prius Prime has a battery with at least 4 kilowatt hours that may be recharged from an external source, so the answer is yes.
- Was the vehicle purchased in a year before 2022? Let's think step by step. The Toyota Prius Prime was purchased in 2021, which is before 2022, so the answer is yes.
- Was the vehicle purchased in a year after 2022? N/A
- If so, is the vehicle present in the following list of North American-assembled vehicles? N/A
(2) After considering each criterion in turn, phrase the final answer as "Because of {reasons}, the answer is likely {yes or no}."
Because the Toyota Prius Prime meets all of the criteria for a federal tax credit, the answer is likely yes.
```
### Few-shot examples
#### Method
Prompting the model to reason out its answers can be done in many ways. One way is to demonstrate with a few examples ('few-shot'), as studied by [Jason Wei and Denny Zhou et al. from Google](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html). Here's an example few-shot chain-of-thought prompt:
[![chain of thought example](images/chain_of_thought_fig1.png)
<br>Source: *Chain of Thought Prompting Elicits Reasoning in Large Language Models* Jason Wei and Denny Zhou et al. (2022)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html)
More demonstrations of reasoning chains written by human labelers:
[![chain of thought example](images/chain_of_thought_fig3.png)
<br>Source: *Chain of Thought Prompting Elicits Reasoning in Large Language Models* Jason Wei and Denny Zhou et al. (2022)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html)
[(Note that it has been called into question whether pears actually float)](https://twitter.com/Meaningness/status/1561062170074370048?s=20&t=mpHt8f3RRboztXxdhLFnWQ)
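A sketch of how such a prompt can be assembled in code; the exemplar below follows the paper's tennis-ball example, and the formatting choices are illustrative:
```python
import openai

# Hand-written exemplars demonstrating the reasoning style you want the model to imitate
exemplars = [
    {
        "question": (
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        "reasoning": (
            "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
            "5 + 6 = 11."
        ),
        "answer": "11",
    },
]

def few_shot_cot_prompt(question: str) -> str:
    # Each exemplar shows a question, a written-out reasoning chain, and the final answer
    parts = [
        f"Q: {ex['question']}\nA: {ex['reasoning']} The answer is {ex['answer']}."
        for ex in exemplars
    ]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=few_shot_cot_prompt(
        "A juggler has 16 balls. Half of the balls are golf balls and half of the "
        "golf balls are blue. How many blue golf balls are there?"
    ),
    temperature=0,
    max_tokens=256,
)
print(response["choices"][0]["text"])
```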
#### Results
Testing on grade school math problems, the authors found that chain of thought prompting tripled the solve rate, from 18% to 57%.
[![chain of thought example](images/chain_of_thought_fig5.png)
<br>Source: *Chain of Thought Prompting Elicits Reasoning in Large Language Models* Jason Wei and Denny Zhou et al. (2022)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html)
In addition to math problems, chain of thought prompting also lifted performance on questions related to sports understanding, coin flip tracking, and last letter concatenation. In most cases, not many examples were needed to saturate the performance gains (less than 8 or so).
[![chain of thought example](images/chain_of_thought_fig11.png)
<br>Source: *Chain of Thought Prompting Elicits Reasoning in Large Language Models* Jason Wei and Denny Zhou et al. (2022)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html)
To learn more, read the [full paper](https://arxiv.org/abs/2201.11903).
#### Implications
One advantage of the few-shot example-based approach relative to the `Let's think step by step` technique is that you can more easily specify the format, length, and style of reasoning that you want the model to perform before landing on its final answer. This can be particularly helpful in cases where the model isn't initially reasoning in the right way or to the right depth.
### Fine-tuned
#### Method
In general, to eke out maximum performance on a task, you'll need to fine-tune a custom model. However, fine-tuning a model using explanations may take thousands of example explanations, which are costly to write.
In 2022, Eric Zelikman and Yuhuai Wu et al. published a clever procedure for using a few-shot prompt to generate a dataset of explanations that could be used to fine-tune a model. The idea is to use a few-shot prompt to generate candidate explanations, and only keep the explanations that produce the correct answer. Then, to get additional explanations for some of the incorrect answers, retry the few-shot prompt but with correct answers given as part of the question. The authors called their procedure STaR (Self-taught Reasoner):
[![STaR procedure](images/star_fig1.png)
<br>Source: *STaR: Bootstrapping Reasoning With Reasoning* by Eric Zelikman and Yuhuai Wu et al. (2022)](https://arxiv.org/abs/2203.14465)
With this technique, you can combine the benefits of fine-tuning with the benefits of chain-of-thought prompting without needing to write thousands of example explanations.
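A schematic of that filtering loop (not the authors' code; `generate_explanation` is a hypothetical helper that wraps a few-shot prompt and returns an explanation plus a final answer):
```python
def build_star_dataset(problems, generate_explanation, n_samples=8):
    """Keep only explanations whose final answer matches the known correct answer.

    problems: list of dicts with "question" and "gold_answer" keys.
    generate_explanation(question, hint=None): samples one (explanation, answer)
        pair from the model; when hint is given, the correct answer is included
        in the prompt (the paper's 'rationalization' step).
    """
    dataset = []
    for problem in problems:
        question, gold = problem["question"], problem["gold_answer"]
        record = None
        for _ in range(n_samples):
            explanation, answer = generate_explanation(question)
            if answer == gold:
                record = {"question": question, "explanation": explanation, "answer": answer}
                break
        if record is None:
            # Retry with the correct answer given as part of the question
            explanation, answer = generate_explanation(question, hint=gold)
            if answer == gold:
                record = {"question": question, "explanation": explanation, "answer": answer}
        if record is not None:
            dataset.append(record)
    return dataset
```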
#### Results
When the authors applied this technique to a Common Sense Q&A dataset, they found that STaR outperformed both chain-of-thought prompting alone (73% > 37%) and fine-tuning alone (73% > 60%):
[![STaR results](images/star_tab1.png)
<br>Source: *STaR: Bootstrapping Reasoning With Reasoning* by Eric Zelikman and Yuhuai Wu et al. (2022)](https://arxiv.org/abs/2203.14465)
To learn more, read the [full paper](https://arxiv.org/abs/2203.14465).
#### Implications
Using a few-shot prompt to extend or modify a fine-tuning dataset is an idea that can be generalized beyond explanation writing. For example, if you have large quantities of unstructured text that you want to train on, you may find opportunities to use a prompt to extract a structured dataset from your unstructured text, and then fine-tune a custom model on that structured dataset.
## Extensions to chain-of-thought prompting
A number of extensions of chain-of-thought prompting have been published as well.
### Selection-inference prompting
#### Method
Published by Antonia Creswell et al., one extension of the chain-of-thought technique is to split the single prompt for generating explanations and answers into smaller parts. First, a prompt selects a relevant subset of facts from the text ('selection prompt'). Then, a second prompt infers a conclusion from the selected facts ('inference prompt'). These prompts are then alternated in a loop to generate multiple steps of reasoning and eventually land on a final answer. The authors illustrate the idea in the following figure:
[![Selection-inference prompting](images/selection-inference_fig1.png)
<br>Source: *Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2205.09712)
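A schematic of that alternation (the prompt templates here are paraphrases of the idea, not the prompts used in the paper):
```python
import openai

def complete(prompt: str) -> str:
    return openai.Completion.create(
        model="text-davinci-002", prompt=prompt, temperature=0, max_tokens=128
    )["choices"][0]["text"].strip()

def selection_inference(facts: str, question: str, n_steps: int = 3) -> list:
    reasoning = []
    for _ in range(n_steps):
        so_far = "\n".join(reasoning)
        # Selection prompt: pick the facts relevant to the next inference
        selection = complete(
            f"Facts:\n{facts}\n\nQuestion: {question}\n\n"
            f"Reasoning so far:\n{so_far}\n\n"
            "Select the facts needed for the next step of reasoning:"
        )
        # Inference prompt: draw one new conclusion from the selected facts
        inference = complete(
            f"Facts:\n{selection}\n\nWhat single new fact follows from these facts?"
        )
        reasoning.append(inference)
    return reasoning
```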
#### Results
When applied to a 7B-parameter model, the authors found that selection-inference prompting substantially improved performance relative to chain-of-thought prompting on the bAbI and ProofWriter benchmark tasks (both of which require longer sequences of reasoning steps). The best performance they achieved combined selection-inference prompting with fine-tuning.
[![Selection-inference prompting](images/selection-inference_fig4.png)
<br>Source: *Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2205.09712)
#### Implications
Although the gains on these benchmarks were large, these benchmarks were specifically chosen because they required longer sequences of reasoning. On problems that don't require reasoning with many steps, the gains are likely smaller.
The results highlight a couple of general lessons for working with large language models. One, splitting up complex tasks into smaller tasks is a great way to improve reliability and performance; the more atomic the task, the less room there is for the model to err. Two, getting maximum performance often means combining fine-tuning with whatever approach you've chosen.
To learn more, read the [full paper](https://arxiv.org/abs/2205.09712).
### Faithful reasoning architecture
A few months after publishing the selection-inference prompting technique, the authors extended the technique in a follow-up paper, with ideas for:
- figuring out when the selection-inference cycle should stop or continue
- adding a value function to help search over multiple reasoning paths
- reducing hallucination of fake facts by fine-tuning a model to reason about sentence labels (e.g., sen1) rather than writing out the sentences themselves
#### Method
In the original selection-inference technique, specialized 'selection' and 'inference' prompts are alternated to select facts and make inferences from those facts, combining to generate a sequence of reasoning steps.
The authors extend this technique with two additional components.
First, the authors add a 'halter' model that, after each inference step, is asked whether the inferences thus far are sufficient to answer the question. If yes, then the model generates a final answer.
The halter model brings a couple of advantages:
- it can tell the selection-inference process to stop or keep going, as necessary.
- if the process never halts, you'll get no answer, which is often preferable to a hallucinated guess
[![Faithful reasoning](images/faithful-reasoning_fig3.png)
<br>Source: *Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
[![Faithful reasoning](images/faithful-reasoning_fig5.png)
<br>Source: *Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
Second, the authors add a value function, which is used to assess the quality of reasoning steps and search over multiple reasoning trajectories. This echoes a common theme for increasing reliability; instead of generating a single answer from the model, generate a set of answers and then use some type of value function / discriminator / verifier model to pick the best one.
[![Faithful reasoning](images/faithful-reasoning_fig7.png)
<br>Source: *Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
In addition to these two extensions, the authors also use a trick to reduce hallucination of fake facts. Rather than asking the model to write out factual sentences, they fine-tune a model to work with sentence labels (e.g., sen1) instead. This helps prevent the model from hallucinating fake facts not mentioned in the prompt context.
[![Faithful reasoning](images/faithful-reasoning_fig4.png)
<br>Source: *Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
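As a rough illustration of the halter idea, here a plain prompt stands in for the authors' fine-tuned halter model (`complete` is the helper sketched in the selection-inference section above):
```python
def answer_with_halter(facts: str, question: str, max_steps: int = 5):
    reasoning = []
    for _ in range(max_steps):
        so_far = "\n".join(reasoning)
        inference = complete(
            f"Facts:\n{facts}\n\nQuestion: {question}\n\n"
            f"Reasoning so far:\n{so_far}\n\n"
            "State one new fact that follows from the above:"
        )
        reasoning.append(inference)
        # Halter check: is the reasoning accumulated so far enough to answer?
        verdict = complete(
            f"Question: {question}\n\nReasoning:\n" + "\n".join(reasoning) +
            "\n\nIs this reasoning sufficient to answer the question? Answer yes or no:"
        )
        if verdict.lower().startswith("yes"):
            return complete(
                f"Question: {question}\n\nReasoning:\n" + "\n".join(reasoning) + "\n\nAnswer:"
            )
    return None  # never halted: return no answer rather than a hallucinated guess
```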
#### Results
The authors evaluated their technique on two benchmarks: the ProofWriter task (not shown) and [EntailmentBankQA](https://allenai.org/data/entailmentbank) (shown). The technique increased accuracy substantially, especially on harder reasoning problems.
[![Faithful reasoning](images/faithful-reasoning_tab2.png)
<br>Source: *Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
In addition, their sentence label manipulation trick essentially eliminated hallucination!
[![Faithful reasoning](images/faithful-reasoning_tab5.png)
<br>Source: *Faithful Reasoning Using Large Language Models* by Antonia Creswell et al. (2022)](https://arxiv.org/abs/2208.14271)
#### Implications
This paper illustrates a number of helpful lessons for improving the reliability of large language models:
- Split complex tasks into smaller, more reliable subtasks
- Generate your answer in a step-by-step fashion, evaluating it along the way
- Generate many possible answers and use another model or function to pick the ones that look best
- Reduce hallucination by constraining what the model can say (e.g., by using sentence labels instead of sentences)
- Maximize performance of models by fine-tuning them on specialized tasks
To learn more, read the [full paper](https://arxiv.org/abs/2208.14271).
### Least-to-most prompting
In addition to doing poorly on long reasoning chains (where selection-inference shines), chain-of-thought prompting can especially struggle when the examples are short but the task is long.
#### Method
Least-to-most prompting is another technique that splits up reasoning tasks into smaller, more reliable subtasks. The idea is to elicit a subtask from the model by prompting it with something like `To solve {question}, we need to first solve: "`. Then, with that subtask in hand, the model can generate a solution. The solution is appended to the original question and the process is repeated until a final answer is produced.
[![Least-to-most prompting](images/least-to-most_fig1.png)
<br>Source: *Least-to-most Prompting Enables Complex Reasoning in Large Language Models* by Denny Zhou et al. (2022)](https://arxiv.org/abs/2205.10625)
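A sketch of that loop; the decomposition prompt is a paraphrase of the description above, and the stopping rule is deliberately simplistic:
```python
import openai

def complete(prompt: str) -> str:
    return openai.Completion.create(
        model="text-davinci-002", prompt=prompt, temperature=0, max_tokens=256
    )["choices"][0]["text"].strip()

def least_to_most(question: str, max_subproblems: int = 5) -> str:
    solved = ""  # accumulated subquestions and their solutions
    for _ in range(max_subproblems):
        # Elicit the next subproblem from the model
        subquestion = complete(
            f'{solved}\nTo solve "{question}", we need to first solve: "'
        ).split('"')[0].strip()
        if not subquestion:
            break
        # Solve the subproblem and append it to the running context
        answer = complete(f"{solved}\nQ: {subquestion}\nA:")
        solved += f"\nQ: {subquestion}\nA: {answer}"
    # Finally, answer the original question given the solved subproblems
    return complete(f"{solved}\nQ: {question}\nA:")
```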
#### Results
When applied to benchmarks involving long reasoning chains using `code-davinci-002` (which is optimized for code but can still understand text), the authors measured gains as large as 16% -> 99.7%!
[
![Least-to-most prompting results on last-letter-concatenation task](images/least-to-most_tab4.png)
![Least-to-most prompting results on SCAN](images/least-to-most_tab9.png)
![Least-to-most prompting results on DROP numerical reasoning](images/least-to-most_tab11.png)
<br>Source: *Least-to-most Prompting Enables Complex Reasoning in Large Language Models* by Denny Zhou et al. (2022)](https://arxiv.org/abs/2205.10625)
#### Implications
Although the above gains from least-to-most prompting are impressive, they are measured on a very narrow set of tasks that require long reasoning chains.
Still, they illustrate a common theme: increase reliability by (a) breaking complex tasks into smaller subtasks and (b) giving the model more time and space to work out the answer.
To learn more, read the [full paper](https://arxiv.org/abs/2205.10625).
## Related ideas
### Maieutic prompting
#### Method
In contrast to the previous techniques, which try to maximize the likelihood of correct answers, another approach is to use GPT-3 to generate a tree of possible explanations (both correct *and incorrect*), and then analyze their relationships to guess at which set is correct. This technique was coined maieutic prompting by [Jaehun Jung et al. in May 2022](https://arxiv.org/abs/2205.11822) (maieutic means relating to the Socratic method of asking questions to elicit ideas).
The method is complicated, and works as follows:
- First, build a maieutic tree, where each node is a statement that could be true or false:
- Start with a multiple-choice question or true/false statement (e.g. `War cannot have a tie`)
- For each possible answer to the question, use the model to generate a corresponding explanation (with a prompt like `War cannot have a tie? True, because`)
- Then, prompt the model with the question and the generated explanation, and ask it to produce the answer. If reversing the explanation (with a prefix like `It is wrong to say that {explanation}`) reverses the answer, then the explanation is considered 'logically integral.'
- If an explanation is not logically integral, then repeat the above process recursively, with each explanation turned into a True or False question, and generate more explanations for each new question.
- After all of the recursive explaining is done, you end up with a tree of explanations, where each leaf on the tree has the property that reversing the explanation reverses the model's answer.
- Second, convert the tree into a graph of relations:
- For each node in the tree, calculate the model's relative belief in that node (inferred from the probability of getting an answer of `True` given an explanation)
- For each pair of nodes in the tree, use the model to identify whether they are entailed (implied) or contradicted
- Third, find the most consistent set of beliefs and take those to be true:
- Specifically, using the strength of belief in each node and the logical relationships between them, formulate the problem as a weighted maximum satisfiability problem (MAX-SAT)
- Use a solver to find the most self-consistent set of beliefs, and take those as true
[
![Maieutic prompting](images/maieutic_fig2.png)
![Maieutic prompting](images/maieutic_fig6.png)
<br>Source: *Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations* by Jaehun Jung et al. (2022)](https://arxiv.org/abs/2205.11822)
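A rough sketch of the 'logically integral' check from the first step (the prompt wording is a paraphrase of the description above, not the paper's templates):
```python
import openai

def believes_true(prompt: str) -> bool:
    # Return the model's one-word True/False judgment for a prompt
    text = openai.Completion.create(
        model="text-davinci-002", prompt=prompt, temperature=0, max_tokens=2
    )["choices"][0]["text"].strip()
    return text.lower().startswith("true")

def is_logically_integral(statement: str, explanation: str) -> bool:
    # Answer the question with the explanation as given...
    direct = believes_true(f"{explanation}\nTrue or false: {statement}\nAnswer:")
    # ...and again with the explanation negated
    negated = believes_true(
        f"It is wrong to say that {explanation}\nTrue or false: {statement}\nAnswer:"
    )
    # The explanation is logically integral if negating it flips the answer
    return direct != negated
```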
#### Results
[![Maieutic prompting results](images/maieutic_tab1.png)
<br>Source: *Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations* by Jaehun Jung et al. (2022)](https://arxiv.org/abs/2205.11822)
#### Implications
Beyond the complexity, one limitation of this method is that it appears to only apply to questions that can be posed as multiple-choice.
To learn more, read the [full paper](https://arxiv.org/abs/2205.11822).
## Extensions
### Self-consistency
#### Method
For tasks with a discrete set of answers, one simple way to improve reliability is to sample multiple explanations & answers from the model (using a positive temperature) and then pick the final answer that appears most often.
[![Self-consistency method](images/self-consistency_fig1.png)
<br>Source: *Self-Consistency Improves Chain of Thought Reasoning in Language Models* by Xuezhi Wang et al. (2022)](https://arxiv.org/abs/2203.11171)
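A minimal sketch: sample several reasoning paths at a positive temperature and keep the most common final answer (the answer-extraction step here is deliberately naive):
```python
import openai
from collections import Counter

def self_consistent_answer(question: str, n: int = 10) -> str:
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=f"Q: {question}\nA: Let's think step by step.",
        temperature=0.7,   # positive temperature so the samples differ
        max_tokens=256,
        n=n,               # sample n independent reasoning paths
    )
    # Naive extraction: treat the last line of each sample as its final answer
    finals = [
        (choice["text"].strip().splitlines() or [""])[-1]
        for choice in response["choices"]
    ]
    return Counter(finals).most_common(1)[0][0]
```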
#### Results
This technique lifted accuracies by anywhere from 1 to 24 percentage points on a suite of math and reasoning benchmarks. (Plotted below are results from Google's LaMDA model; using Google's larger PaLM model, the baselines were higher but the gains were a bit smaller.)
[![Self-consistency results](images/self-consistency_fig3.png)
<br>Source: *Self-Consistency Improves Chain of Thought Reasoning in Language Models* by Xuezhi Wang et al. (2022)](https://arxiv.org/abs/2203.11171)
#### Implications
Although this technique is simple to implement, it can be costly. Generating a set of 10 answers will increase your costs by 10x.
Also, as with many of these techniques, it applies only to tasks with a limited set of answers. For open-ended tasks where each answer is unique (such as writing a poem), it's not obvious what it would mean to pick the most common answer.
Lastly, this technique ought to be most beneficial when there are multiple paths or phrasings to reach an answer; if there's only one path, then the technique may not help at all. An extreme example: If the task was to generate a single token answer, then taking the most common token from 100 generations would be no different than taking the token with the highest logprobs (which you can get with a single generation at temperature=0).
### Verifiers
Another key technique for improving task performance is to train a verifier or discriminator model to evaluate the outputs of the main generative model. If the discriminator rejects the output, then you can resample the generative model until you get an acceptable output. In many cases, it's easier to judge an answer than it is to create an answer, which helps explain the power of this method.
#### Method
In 2021, OpenAI researchers applied this technique to grade school math problems, using the following procedure:
- First, they fine-tuned a model on questions and solutions
- For each problem in the training set, they generated 100 solutions
- Each of those 100 solutions was automatically labeled as either correct or incorrect, based on whether the final answer was correct
- Using those solutions, with some labeled correct and some labeled incorrect, they fine-tuned a verifier model to classify whether a question and candidate solution was correct or incorrect
- Finally, at test time, the generative model creates 100 solutions to each problem, and the one with the highest score according to the verifier model is picked as the final answer
[![Verifier method](images/verifiers_fig3.png)
<br>Source: *Training Verifiers to Solve Math Word Problems* by Karl Cobbe et al. (2021)](https://arxiv.org/abs/2110.14168)
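A schematic of the generate-then-rank step at test time; `score_with_verifier` stands in for the separately trained verifier model:
```python
import openai

def best_of_n(question: str, score_with_verifier, n: int = 100) -> str:
    # Generate n candidate solutions (in the paper, from a model fine-tuned on solutions)
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=f"Q: {question}\nA:",
        temperature=0.7,
        max_tokens=256,
        n=n,
    )
    candidates = [choice["text"].strip() for choice in response["choices"]]
    # Rank the candidates with the verifier and keep the highest-scoring one
    return max(candidates, key=lambda solution: score_with_verifier(question, solution))
```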
#### Results
With a 175B GPT-3 model and 8,000 training examples, this technique substantially lifted grade school math accuracy from ~33% to ~55%.
[![Verifier results](images/verifiers_fig5.png)
<br>Source: *Training Verifiers to Solve Math Word Problems* by Karl Cobbe et al. (2021)](https://arxiv.org/abs/2110.14168)
#### Implications
Similar to the self-consistency technique, this method can get expensive, as generating, say, 100 solutions per task will increase your costs by roughly 100x.
## Theories of reliability
Although the techniques above vary in their approach, they all share the goal of improving reliability on complex tasks. Mainly they do this by:
- decomposing unreliable operations into smaller, more reliable operations (e.g., selection-inference prompting)
- using multiple steps or multiple relationships to make the system's reliability greater than any individual component (e.g., maieutic prompting)
### Probabilistic graphical models
This paradigm of trying to build a reliable system out of less reliable components is reminiscent of probabilistic programming, and many of the analysis techniques of that field can be applied to this one.
In the paper *Language Model Cascades*, David Dohan et al. interpret the above techniques in the paradigm of probabilistic graphical models:
#### Chain of thought prompting
[![graphical model of chain of thought prompting](images/lm_cascades_fig1.png)
<br>Source: *Language Model Cascades* by David Dohan et al. (2022)](https://arxiv.org/abs/2207.10342)
#### Fine-tuned chain of thought prompting / Self-taught reasoner
[![graphical model of fine-tuned chain of thought prompting](images/lm_cascades_fig3.png)
<br>Source: *Language Model Cascades* by David Dohan et al. (2022)](https://arxiv.org/abs/2207.10342)
#### Selection-inference prompting
[![graphical model of selection-inference prompting](images/lm_cascades_fig4.png)
<br>Source: *Language Model Cascades* by David Dohan et al. (2022)](https://arxiv.org/abs/2207.10342)
#### Verifiers
[![graphical model of verifiers](images/lm_cascades_fig5.png)
<br>Source: *Language Model Cascades* by David Dohan et al. (2022)](https://arxiv.org/abs/2207.10342)
#### Implications
Although formulating these techniques as probabilistic graphical models may not be immediately useful for solving any particular problem, the framework may be helpful in selecting, combining, and discovering new techniques.
## Closing thoughts
Research into large language models is very active and evolving rapidly. Not only do researchers continue to improve the models, they also continue to improve our understanding of how best to employ the models. To underscore the pace of these developments, note that all of the papers shared above were published within the past 12 months (as I write in Sep 2022).
In the future, expect better models and better techniques to be published. Even if the specific techniques here are eclipsed by future best practices, the general principles behind them will likely remain a key part of any expert user's toolkit.
## Bibliography
| Lesson | Paper | Date |
|--------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|----------|
| Break complex tasks into simpler subtasks (and consider exposing the intermediate outputs to users) | [AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts](https://arxiv.org/abs/2110.01691) | 2021 Oct |
| You can improve output by generating many candidates, and then picking the one that looks best | [Training Verifiers to Solve Math Word Problems](https://arxiv.org/abs/2110.14168) | 2021 Oct |
| On reasoning tasks, models do better when they reason step-by-step before answering | [Chain of Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903) | 2022 Jan |
| You can improve step-by-step reasoning by generating many explanation-answer outputs, and picking the most popular answer | [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) | 2022 Mar |
| If you want to fine-tune a step-by-step reasoner, you can do it with multiple-choice question & answer data alone | [STaR: Bootstrapping Reasoning With Reasoning](https://arxiv.org/abs/2203.14465) | 2022 Mar |
| The step-by-step reasoning method works great even with zero examples | [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916) | 2022 May |
| You can do better than step-by-step reasoning by alternating a selection prompt and an inference prompt | [Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning](https://arxiv.org/abs/2205.09712) | 2022 May |
| On long reasoning problems, you can improve step-by-step reasoning by splitting the problem into pieces to solve incrementally | [Least-to-most Prompting Enables Complex Reasoning in Large Language Models](https://arxiv.org/abs/2205.10625) | 2022 May |
| You can have the model analyze both good and bogus explanations to figure out which set of explanations are most consistent | [Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations](https://arxiv.org/abs/2205.11822) | 2022 May |
| You can think about these techniques in terms of probabilistic programming, where systems comprise unreliable components | [Language Model Cascades](https://arxiv.org/abs/2207.10342) | 2022 Jul |
| You can eliminate hallucination with sentence label manipulation, and you can reduce wrong answers with a 'halter' prompt | [Faithful Reasoning Using Large Language Models](https://arxiv.org/abs/2208.14271) | 2022 Aug |

View File

@@ -34,7 +34,7 @@ We've written guides and code examples for transitioning from the deprecated API
[Guide: How to transition off the Classifications endpoint](https://help.openai.com/en/articles/6272941-classifications-transition-guide)
* Option 1: transition to fine-tuning **(recommended)**
* Example code: [Classification.ipynb](../examples/Classification.ipynb)
* Example code: [Fine-tuned_classification.ipynb](../examples/Fine-tuned_classification.ipynb)
* Option 2: transition to embeddings
* Example code: [Semantic_text_search_using_embeddings.ipynb](../examples/Semantic_text_search_using_embeddings.ipynb)
* Option 3: reimplement Classifications endpoint functionality

View File

@@ -133,7 +133,7 @@ def classifications(
{{ an optional instruction }}
Text: example 1 text
Category: example 2 label
Category: example 1 label
---
Text: example 1 text
Category: example 2 label

View File

@@ -35,7 +35,7 @@ def get_score(context, query, log_probs, text_offsets) -> float:
def search(query, documents, engine):
prompts = [construct_context(query, doc) for doc in [""] + docs]
prompts = [construct_context(query, doc) for doc in [""] + documents]
resps = openai.Completion.create(
model=engine,