8 Commits

Author SHA1 Message Date
Christopher Hesse
41a6793dc6 Update README.md 2019-07-26 17:02:46 -07:00
Albert Wu
c0859d7523 Fix TODO in sample.sample_sequences - Avoid 'leaving last token calculation to while loop' (#119)
* do initial run on full context

* decrement while loop iterations

* add context to output

* remove first param

* removing first param: change shape invariant
2019-05-30 21:49:18 -07:00
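
For orientation, a minimal framework-free sketch of the sampling flow this commit moves to (the helper names here are made up; the real change is the TensorFlow graph code shown in the sample.py diff below). The initial run covers the full context and already yields one sampled token, so the while loop only needs length - 1 further iterations, and the context is now part of the returned output.

```python
import random

VOCAB_SIZE = 50          # toy vocabulary; the real model uses ~50k BPE tokens

def fake_step(tokens, past=None):
    """Stand-in for one transformer forward pass: returns next-token scores
    plus a fake attention cache ("presents") for the tokens just processed."""
    logits = [random.random() for _ in range(VOCAB_SIZE)]
    presents = list(tokens)
    return logits, presents

def sample_sketch(context_tokens, length):
    # Initial run on the *full* context (previously the last context token was
    # left to the while loop); it also produces the first sampled token.
    logits, past = fake_step(context_tokens)
    output = list(context_tokens)             # context is included in the output
    prev = [logits.index(max(logits))]
    output += prev
    for _ in range(length - 1):               # loop iterations were decremented by one
        logits, presents = fake_step(prev, past=past)
        past = past + presents                # grow the cached state
        prev = [logits.index(max(logits))]
        output += prev
    return output

print(sample_sketch([7, 3, 19], length=5))    # 3 context tokens followed by 5 sampled tokens
```
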
Memo Akten
e5c5054474 allow models to be in a separate folder via models_dir argument (#129)
* models_dir argument to allow models in a separate folder

* default value for models_dir to be same as before

* allow environment variables and user home in models_dir
2019-05-16 09:42:58 -07:00
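
Assuming the sampling scripts expose their keyword arguments as command-line flags via Fire in the usual way (the exact flag spelling below is an assumption, not something shown in this diff), the new argument would be used like:

    python3 src/interactive_conditional_samples.py --model_name 345M --models_dir '$HOME/gpt2-models'

Quoting the value leaves the environment variable and any ~ for the script itself to expand, which is what the third bullet point adds.
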
Jeff Wu
dd75299dfe remove samples 2019-05-03 15:43:08 -07:00
Jeff Wu
b5ef71a922 reference dataset 2019-05-03 15:26:08 -07:00
Jeff Wu
0503b1b249 updates for 345M model 2019-05-02 20:39:33 -07:00
Jeff Wu
d14501aade Update CONTRIBUTORS.md 2019-03-18 14:27:10 -07:00
Jeff Wu
86378284e1 fix for windows (thanks to chrothenbach) 2019-03-07 11:26:58 -08:00
17 changed files with 42 additions and 136987 deletions

.gitignore

@@ -1,2 +1,3 @@
 __pycache__
+.mypy_cache/
 models/

@@ -6,7 +6,7 @@
 * **[Margaret Mitchell et al](https://arxiv.org/abs/1810.03993)**
-  Our [usage](./readme#usage) writeup was loosely inspired by the paper
+  Our [usage](./README.md#usage) writeup was loosely inspired by the paper
   [Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993)
   and related conversations with some of the authors.

@@ -28,6 +28,7 @@ pip3 install -r requirements.txt
 Download the model data
 ```
 python3 download_model.py 117M
+python3 download_model.py 345M
 ```
 ## Docker Installation

@@ -6,3 +6,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M

@@ -15,3 +15,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M

@@ -1,24 +1,26 @@
+**Status:** Archive (code is provided as-is, no updates expected)
 # gpt-2
-Code and samples from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
+Code from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
-For now, we have only released a smaller (117M parameter) version of GPT-2.
+We have currently released small (117M parameter) and medium (345M parameter) versions of GPT-2. While we have not released the larger models, we have [released a dataset](https://github.com/openai/gpt-2-output-dataset) for researchers to study their behaviors.
 See more details in our [blog post](https://blog.openai.com/better-language-models/).
 ## Usage
-This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2-117M. While GPT-2-117M is less proficient than GPT-2-1.5B, it is useful for a wide range of research and applications which could also apply to larger models.
+This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.
 ### Some caveats
-- GPT-2-117M robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2-117M for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
-- The dataset our GPT-2-117M was trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2-117M is likely to be biased and inaccurate as well.
+- GPT-2 models' robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
+- The dataset our GPT-2 models were trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
+- To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.
 ### Work with us
-Please [let us know](mailto:languagequestions@openai.com) if you're doing interesting research with or working on applications of GPT-2-117M! We're especially interested in hearing from and potentially working with those who are studying
+Please [let us know](mailto:languagequestions@openai.com) if you're doing interesting research with or working on applications of GPT-2! We're especially interested in hearing from and potentially working with those who are studying
 - Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
 - The extent of problematic content (e.g. bias) being baked into the models and effective mitigations
@@ -30,15 +32,6 @@ See [DEVELOPERS.md](./DEVELOPERS.md)
 See [CONTRIBUTORS.md](./CONTRIBUTORS.md)
-## GPT-2 samples
-| WARNING: Samples are unfiltered and may contain offensive content. |
-| --- |
-While we have not yet released GPT-2 itself, you can see some samples from it in the `gpt-2-samples` folder.
-We show unconditional samples with default settings (temperature 1 and no truncation), with temperature 0.7, and with truncation with top_k 40.
-We show conditional samples, with contexts drawn from `WebText`'s test set, with default settings (temperature 1 and no truncation), with temperature 0.7, and with truncation with top_k 40.
 ## Citation
 Please use the following bibtex entry:

@@ -12,6 +12,7 @@ model = sys.argv[1]
 subdir = os.path.join('models', model)
 if not os.path.exists(subdir):
     os.makedirs(subdir)
+subdir = subdir.replace('\\','/') # needed for Windows
 for filename in ['checkpoint','encoder.json','hparams.json','model.ckpt.data-00000-of-00001', 'model.ckpt.index', 'model.ckpt.meta', 'vocab.bpe']:
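
A small illustration of why the added line helps: os.path.join produces backslash-separated paths on Windows, and the script reuses subdir both as a local directory and (presumably) when building each file's download URL, where backslashes would not work. The base URL below is a placeholder, not the script's actual endpoint.

```python
import os

subdir = os.path.join('models', '117M')   # 'models\\117M' on Windows, 'models/117M' elsewhere
subdir = subdir.replace('\\', '/')        # normalize separators so the string is URL-safe
url = "https://example.invalid/gpt-2/" + subdir + "/encoder.json"   # placeholder base URL
print(subdir)
print(url)
```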

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

@@ -105,10 +105,10 @@ class Encoder:
         text = bytearray([self.byte_decoder[c] for c in text]).decode('utf-8', errors=self.errors)
         return text
-def get_encoder(model_name):
-    with open(os.path.join('models', model_name, 'encoder.json'), 'r') as f:
+def get_encoder(model_name, models_dir):
+    with open(os.path.join(models_dir, model_name, 'encoder.json'), 'r') as f:
         encoder = json.load(f)
-    with open(os.path.join('models', model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
+    with open(os.path.join(models_dir, model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
         bpe_data = f.read()
     bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split('\n')[1:-1]]
     return Encoder(
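
With the new signature every caller passes the models directory explicitly; passing 'models' reproduces the old hard-coded behaviour. A minimal usage sketch, assuming the model files were already fetched with download_model.py into a local models/ folder and that encoder.py is importable (it lives in the repository's src/ directory):

```python
import encoder

# 'models' is the old default location; any parent folder that contains the
# <model_name> subfolder now works as the second argument.
enc = encoder.get_encoder('117M', 'models')

tokens = enc.encode("Hello world")
print(tokens)               # BPE token ids
print(enc.decode(tokens))   # round-trips back to the original text
```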

@@ -16,6 +16,7 @@ def sample_model(
     length=None,
     temperature=1,
     top_k=0,
+    models_dir='models',
 ):
     """
     Run the sample_model
@@ -35,10 +36,13 @@ def sample_model(
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
+    :models_dir : path to parent folder containing model subfolders
+     (i.e. contains the <model_name> folder)
     """
-    enc = encoder.get_encoder(model_name)
+    models_dir = os.path.expanduser(os.path.expandvars(models_dir))
+    enc = encoder.get_encoder(model_name, models_dir)
     hparams = model.default_hparams()
-    with open(os.path.join('models', model_name, 'hparams.json')) as f:
+    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
         hparams.override_from_dict(json.load(f))
     if length is None:
@@ -58,7 +62,7 @@ def sample_model(
         )[:, 1:]
         saver = tf.train.Saver()
-        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
+        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
         saver.restore(sess, ckpt)
         generated = 0
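
The added os.path.expanduser(os.path.expandvars(models_dir)) line is what lets values such as ~/gpt2-models or $GPT2_HOME/models work. A quick standalone illustration (the paths and environment variable are made up):

```python
import os

os.environ['GPT2_HOME'] = '/data/gpt2'     # hypothetical environment variable
for raw in ('$GPT2_HOME/models', '~/gpt2-models', 'models'):
    resolved = os.path.expanduser(os.path.expandvars(raw))
    print(raw, '->', resolved)             # plain relative paths pass through unchanged
```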

@@ -16,6 +16,7 @@ def interact_model(
     length=None,
     temperature=1,
     top_k=0,
+    models_dir='models',
 ):
     """
     Interactively run the model
@@ -34,14 +35,17 @@ def interact_model(
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
+    :models_dir : path to parent folder containing model subfolders
+     (i.e. contains the <model_name> folder)
     """
+    models_dir = os.path.expanduser(os.path.expandvars(models_dir))
     if batch_size is None:
         batch_size = 1
     assert nsamples % batch_size == 0
-    enc = encoder.get_encoder(model_name)
+    enc = encoder.get_encoder(model_name, models_dir)
     hparams = model.default_hparams()
-    with open(os.path.join('models', model_name, 'hparams.json')) as f:
+    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
         hparams.override_from_dict(json.load(f))
     if length is None:
@@ -61,7 +65,7 @@ def interact_model(
         )
         saver = tf.train.Saver()
-        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
+        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
         saver.restore(sess, ckpt)
         while True:

@@ -41,36 +41,33 @@ def sample_sequence(*, hparams, length, start_token=None, batch_size=None, conte
         }
     with tf.name_scope('sample_sequence'):
-        # Don't feed the last context token -- leave that to the loop below
-        # TODO: Would be slightly faster if we called step on the entire context,
-        # rather than leaving the last token transformer calculation to the while loop.
-        context_output = step(hparams, context[:, :-1])
         def body(past, prev, output):
-            next_outputs = step(hparams, prev[:, tf.newaxis], past=past)
+            next_outputs = step(hparams, prev, past=past)
             logits = next_outputs['logits'][:, -1, :] / tf.to_float(temperature)
             logits = top_k_logits(logits, k=top_k)
             samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)
             return [
-                tf.concat([past, next_outputs['presents']], axis=-2),
-                tf.squeeze(samples, axis=[1]),
-                tf.concat([output, samples], axis=1),
+                next_outputs['presents'] if past is None else tf.concat([past, next_outputs['presents']], axis=-2),
+                samples,
+                tf.concat([output, samples], axis=1)
             ]
+        past, prev, output = body(None, context, context)
         def cond(*args):
             return True
         _, _, tokens = tf.while_loop(
             cond=cond, body=body,
-            maximum_iterations=length,
+            maximum_iterations=length - 1,
             loop_vars=[
-                context_output['presents'],
-                context[:, -1],
-                context,
+                past,
+                prev,
+                output
             ],
             shape_invariants=[
                 tf.TensorShape(model.past_shape(hparams=hparams, batch_size=batch_size)),
-                tf.TensorShape([batch_size]),
+                tf.TensorShape([batch_size, None]),
                 tf.TensorShape([batch_size, None]),
             ],
             back_prop=False,