James B. Pollack 2d0b62225c Update encoder.py to work on Windows
This fixes https://github.com/openai/gpt-2/issues/26

```
  File "C:\Users\James Pollack\Desktop\gpt-2\src\encoder.py", line 112, in get_encoder
    bpe_data = f.read()
  File "C:\Anaconda\envs\gpt-2\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 66951: character maps to <undefined>
```
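
The traceback shows Python falling back to Windows' default cp1252 codec when get_encoder reads the BPE vocabulary files. A minimal sketch of the kind of change that resolves this, where load_bpe_files is a hypothetical stand-in for the file-reading part of get_encoder: pass an explicit encoding so the platform default is never used.

```
import json
import os

def load_bpe_files(model_name):
    """Read the vocabulary files with an explicit encoding so Windows
    does not fall back to its default cp1252 codec."""
    path = os.path.join('models', model_name)
    with open(os.path.join(path, 'encoder.json'), encoding='utf-8') as f:
        encoder = json.load(f)
    with open(os.path.join(path, 'vocab.bpe'), encoding='utf-8') as f:
        bpe_data = f.read()
    return encoder, bpe_data
```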

gpt-2

Code and samples from the paper "Language Models are Unsupervised Multitask Learners".

For now, we have only released a smaller (117M parameter) version of GPT-2.

See more details in our blog post.

Installation

Download the model data (needs gsutil):

sh download_model.sh 117M
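
The script fetches the model files into models/117M. A quick sanity check after downloading, assuming the layout that src/encoder.py expects (encoder.json and vocab.bpe alongside the checkpoint):

```
import os

# Hypothetical sanity check: confirm the files the code expects
# are present after running download_model.sh.
model_dir = os.path.join('models', '117M')
for name in ('encoder.json', 'vocab.bpe', 'hparams.json'):
    path = os.path.join(model_dir, name)
    print(path, 'OK' if os.path.exists(path) else 'MISSING')
```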

Install python packages:

pip3 install -r requirements.txt

Unconditional sample generation

WARNING: Samples are unfiltered and may contain offensive content.

To generate unconditional samples from the small model:

python3 src/generate_unconditional_samples.py | tee samples

There are various flags for controlling the samples:

python3 src/generate_unconditional_samples.py --top_k 40 --temperature 0.7 | tee samples
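
--top_k truncates sampling to the k most likely tokens at each step, and --temperature rescales the logits before the softmax (lower values make output more conservative). A minimal numpy sketch of that sampling step, as a standalone illustration rather than the repo's TensorFlow implementation:

```
import numpy as np

def sample_token(logits, temperature=0.7, top_k=40):
    """Sample one token id from raw logits with temperature and top-k truncation."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k > 0:
        # Discard everything below the k-th largest logit.
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Example: a toy vocabulary of 5 tokens.
print(sample_token([2.0, 1.0, 0.5, -1.0, -3.0], temperature=0.7, top_k=2))
```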

While we have not yet released GPT-2 itself, you can see some unconditional samples from it (with default settings of temperature 1 and no truncation) in gpt2-samples.txt.

Conditional sample generation

To give the model custom prompts, you can use:

python3 src/interactive_conditional_samples.py
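
Under the hood, the prompt is encoded into BPE token ids and generation is conditioned on them. A minimal sketch of that encoding step, assuming the repo layout above (encoder.get_encoder, models/117M) and that you run it from the repo root:

```
import sys
sys.path.insert(0, 'src')  # make encoder.py importable from the repo root
import encoder

# Encode a prompt into BPE token ids, as the interactive script does
# before conditioning generation on them.
enc = encoder.get_encoder('117M')
context_tokens = enc.encode("The quick brown fox")
print(context_tokens)               # a short list of integer token ids
print(enc.decode(context_tokens))   # round-trips back to the original text
```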

Future work

We may release code for evaluating the models on various benchmarks.

We are still considering release of the larger models.
