Compare commits

8 commits on branch new_readme (author: christophe)

Commits (SHA1):
41a6793dc6
c0859d7523
e5c5054474
dd75299dfe
b5ef71a922
0503b1b249
d14501aade
86378284e1

.gitignore (vendored): 1 change

@@ -1,2 +1,3 @@
 __pycache__
+.mypy_cache/
 models/

@@ -6,7 +6,7 @@
 
 * **[Margaret Mitchell et al](https://arxiv.org/abs/1810.03993)**
 
-Our [usage](./readme#usage) writeup was loosely inspired by the paper
+Our [usage](./README.md#usage) writeup was loosely inspired by the paper
 [Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993)
 and related conversations with some of the authors.
 

@@ -28,6 +28,7 @@ pip3 install -r requirements.txt
 Download the model data
 ```
 python3 download_model.py 117M
+python3 download_model.py 345M
 ```
 
 ## Docker Installation

@@ -6,3 +6,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M

@@ -15,3 +15,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M

README.md: 23 changes

@@ -1,24 +1,26 @@
+**Status:** Archive (code is provided as-is, no updates expected)
+
 # gpt-2
 
-Code and samples from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
+Code from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
 
-For now, we have only released a smaller (117M parameter) version of GPT-2.
+We have currently released small (117M parameter) and medium (345M parameter) versions of GPT-2. While we have not released the larger models, we have [released a dataset](https://github.com/openai/gpt-2-output-dataset) for researchers to study their behaviors.
 
 See more details in our [blog post](https://blog.openai.com/better-language-models/).
 
 ## Usage
 
-This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2-117M. While GPT-2-117M is less proficient than GPT-2-1.5B, it is useful for a wide range of research and applications which could also apply to larger models.
+This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.
 
 ### Some caveats
 
-- GPT-2-117M robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2-117M for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
+- GPT-2 models' robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
-- The dataset our GPT-2-117M was trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2-117M is likely to be biased and inaccurate as well.
+- The dataset our GPT-2 models were trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
 - To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.
 
 ### Work with us
 
-Please [let us know](mailto:languagequestions@openai.com) if you’re doing interesting research with or working on applications of GPT-2-117M! We’re especially interested in hearing from and potentially working with those who are studying
+Please [let us know](mailto:languagequestions@openai.com) if you’re doing interesting research with or working on applications of GPT-2! We’re especially interested in hearing from and potentially working with those who are studying
 - Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
 - The extent of problematic content (e.g. bias) being baked into the models and effective mitigations

@@ -30,15 +32,6 @@ See [DEVELOPERS.md](./DEVELOPERS.md)
 
 See [CONTRIBUTORS.md](./CONTRIBUTORS.md)
 
-## GPT-2 samples
-
-| WARNING: Samples are unfiltered and may contain offensive content. |
-| --- |
-
-While we have not yet released GPT-2 itself, you can see some samples from it in the `gpt-2-samples` folder.
-We show unconditional samples with default settings (temperature 1 and no truncation), with temperature 0.7, and with truncation with top_k 40.
-We show conditional samples, with contexts drawn from `WebText`'s test set, with default settings (temperature 1 and no truncation), with temperature 0.7, and with truncation with top_k 40.
-
 ## Citation
 
 Please use the following bibtex entry:

@@ -12,6 +12,7 @@ model = sys.argv[1]
 subdir = os.path.join('models', model)
 if not os.path.exists(subdir):
     os.makedirs(subdir)
+subdir = subdir.replace('\\','/') # needed for Windows
 
 for filename in ['checkpoint','encoder.json','hparams.json','model.ckpt.data-00000-of-00001', 'model.ckpt.index', 'model.ckpt.meta', 'vocab.bpe']:
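
The added `replace('\\','/')` matters because `os.path.join` uses backslashes on Windows, and the script's "needed for Windows" comment suggests `subdir` is later reused somewhere a backslash would break, most likely when building the per-file download URL. A minimal sketch of the failure mode it guards against (the URL below is illustrative, not the script's real endpoint):

```
import os

model = '117M'
subdir = os.path.join('models', model)   # 'models\\117M' on Windows
subdir = subdir.replace('\\', '/')       # normalize to 'models/117M' everywhere

# Illustrative only: embedding the normalized path in a URL now works
# on every platform, since URLs never contain backslashes.
url = 'https://example.com/' + subdir + '/checkpoint'
print(url)
```
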
(4 file diffs suppressed because they are too large; 2 file diffs suppressed because one or more lines are too long)

@@ -105,10 +105,10 @@ class Encoder:
         text = bytearray([self.byte_decoder[c] for c in text]).decode('utf-8', errors=self.errors)
         return text
 
-def get_encoder(model_name):
+def get_encoder(model_name, models_dir):
-    with open(os.path.join('models', model_name, 'encoder.json'), 'r') as f:
+    with open(os.path.join(models_dir, model_name, 'encoder.json'), 'r') as f:
         encoder = json.load(f)
-    with open(os.path.join('models', model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
+    with open(os.path.join(models_dir, model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
         bpe_data = f.read()
     bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split('\n')[1:-1]]
     return Encoder(
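
Since `get_encoder` now takes the models directory explicitly, callers that relied on the hard-coded `'models'` folder must pass it themselves. A minimal usage sketch under that assumption (the directory path is illustrative):

```
import encoder

# The second argument is the parent folder that contains the
# <model_name> subfolder holding encoder.json and vocab.bpe.
enc = encoder.get_encoder('117M', '/data/gpt-2/models')

tokens = enc.encode("Hello, world!")
print(tokens)               # list of BPE token ids
print(enc.decode(tokens))   # round-trips to "Hello, world!"
```
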
@@ -16,6 +16,7 @@ def sample_model(
     length=None,
     temperature=1,
     top_k=0,
+    models_dir='models',
 ):
     """
     Run the sample_model

@@ -35,10 +36,13 @@ def sample_model(
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
+    :models_dir : path to parent folder containing model subfolders
+     (i.e. contains the <model_name> folder)
     """
-    enc = encoder.get_encoder(model_name)
+    models_dir = os.path.expanduser(os.path.expandvars(models_dir))
+    enc = encoder.get_encoder(model_name, models_dir)
     hparams = model.default_hparams()
-    with open(os.path.join('models', model_name, 'hparams.json')) as f:
+    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
         hparams.override_from_dict(json.load(f))
 
     if length is None:
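
Note the nesting order in the added line: `os.path.expandvars` runs first, then `os.path.expanduser`, so both `$HOME`-style variables and a leading `~` resolve before the path is used. A quick sketch of what each input becomes:

```
import os

for raw in ['~/gpt-2/models', '$HOME/gpt-2/models', 'models']:
    resolved = os.path.expanduser(os.path.expandvars(raw))
    print(raw, '->', resolved)

# '~/...' and '$HOME/...' both resolve to an absolute path under the
# user's home directory; a bare relative path like 'models' is unchanged.
```
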
@@ -58,7 +62,7 @@ def sample_model(
         )[:, 1:]
 
         saver = tf.train.Saver()
-        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
+        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
         saver.restore(sess, ckpt)
 
         generated = 0

@@ -16,6 +16,7 @@ def interact_model(
     length=None,
     temperature=1,
     top_k=0,
+    models_dir='models',
 ):
     """
     Interactively run the model

@@ -34,14 +35,17 @@ def interact_model(
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
+    :models_dir : path to parent folder containing model subfolders
+     (i.e. contains the <model_name> folder)
     """
+    models_dir = os.path.expanduser(os.path.expandvars(models_dir))
     if batch_size is None:
         batch_size = 1
     assert nsamples % batch_size == 0
 
-    enc = encoder.get_encoder(model_name)
+    enc = encoder.get_encoder(model_name, models_dir)
     hparams = model.default_hparams()
-    with open(os.path.join('models', model_name, 'hparams.json')) as f:
+    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
         hparams.override_from_dict(json.load(f))
 
     if length is None:
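
With the same `models_dir` parameter threaded through `interact_model`, both sample scripts can share one checkpoint store outside the repo. A hedged usage sketch, assuming the function is imported directly from the script module (the module and directory names below are assumptions for illustration):

```
from interactive_conditional_samples import interact_model  # assumed module name

# Point the script at a shared, out-of-tree model store.
interact_model(
    model_name='345M',
    models_dir='/mnt/storage/gpt-2-models',  # must contain a 345M/ subfolder
    top_k=40,
)
```
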
@@ -61,7 +65,7 @@ def interact_model(
         )
 
         saver = tf.train.Saver()
-        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
+        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
         saver.restore(sess, ckpt)
 
         while True:

@@ -41,36 +41,33 @@ def sample_sequence(*, hparams, length, start_token=None, batch_size=None, conte
         }
 
     with tf.name_scope('sample_sequence'):
-        # Don't feed the last context token -- leave that to the loop below
-        # TODO: Would be slightly faster if we called step on the entire context,
-        # rather than leaving the last token transformer calculation to the while loop.
-        context_output = step(hparams, context[:, :-1])
-
         def body(past, prev, output):
-            next_outputs = step(hparams, prev[:, tf.newaxis], past=past)
+            next_outputs = step(hparams, prev, past=past)
             logits = next_outputs['logits'][:, -1, :] / tf.to_float(temperature)
             logits = top_k_logits(logits, k=top_k)
             samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)
             return [
-                tf.concat([past, next_outputs['presents']], axis=-2),
+                next_outputs['presents'] if past is None else tf.concat([past, next_outputs['presents']], axis=-2),
-                tf.squeeze(samples, axis=[1]),
+                samples,
-                tf.concat([output, samples], axis=1),
+                tf.concat([output, samples], axis=1)
             ]
 
+        past, prev, output = body(None, context, context)
+
         def cond(*args):
             return True
 
         _, _, tokens = tf.while_loop(
             cond=cond, body=body,
-            maximum_iterations=length,
+            maximum_iterations=length - 1,
             loop_vars=[
-                context_output['presents'],
+                past,
-                context[:, -1],
+                prev,
-                context,
+                output
             ],
             shape_invariants=[
                 tf.TensorShape(model.past_shape(hparams=hparams, batch_size=batch_size)),
-                tf.TensorShape([batch_size]),
+                tf.TensorShape([batch_size, None]),
                 tf.TensorShape([batch_size, None]),
             ],
             back_prop=False,
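
The net effect of this hunk: the old code primed the attention cache with a special pre-loop call on `context[:, :-1]` and then fed single tokens into the loop (hence the `tf.newaxis`/`tf.squeeze` pair), while the new code calls `body(None, context, context)` once on the full context and lets the loop run only `length - 1` further steps, with `prev` kept rank-2 (shape `[batch_size, None]`) throughout. A framework-free sketch of the new control flow, where `step` and `sample` are stand-ins for the transformer call and the multinomial draw:

```
# Sketch only: step(prev, past) -> (logits, presents); sample(logits) -> token.
def sample_sequence_sketch(step, sample, context, length):
    def body(past, prev, output):
        logits, presents = step(prev, past=past)
        # On the first call past is None and step consumed the whole
        # context, so `presents` alone is the full key/value cache.
        past = presents if past is None else past + presents
        nxt = sample(logits)              # one new token per batch row
        return past, [nxt], output + [nxt]

    # Prime the cache on the full context; this replaces the old
    # special-cased `step(hparams, context[:, :-1])` before the loop.
    past, prev, output = body(None, context, list(context))

    # The priming call already sampled one token, hence length - 1 here,
    # mirroring the `maximum_iterations=length - 1` change above.
    for _ in range(length - 1):
        past, prev, output = body(past, prev, output)
    return output
```
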