8 Commits

Author SHA1 Message Date
Christopher Hesse
41a6793dc6 Update README.md 2019-07-26 17:02:46 -07:00
Albert Wu
c0859d7523 Fix TODO in sample.sample_sequences - Avoid 'leaving last token calculation to while loop' (#119)
* do initial run on full context

* decrement while loop iterations

* add context to output

* remove first param

* removing first param: change shape invariant
2019-05-30 21:49:18 -07:00
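
For orientation, a minimal framework-free sketch of the sampling flow this commit moves to (the helper names here are made up; the real change is the TensorFlow graph code shown in the sample.py diff below). The initial run covers the full context and already yields one sampled token, so the while loop only needs length - 1 further iterations, and the context is now part of the returned output.

```python
import random

VOCAB_SIZE = 50          # toy vocabulary; the real model uses ~50k BPE tokens

def fake_step(tokens, past=None):
    """Stand-in for one transformer forward pass: returns next-token scores
    plus a fake attention cache ("presents") for the tokens just processed."""
    logits = [random.random() for _ in range(VOCAB_SIZE)]
    presents = list(tokens)
    return logits, presents

def sample_sketch(context_tokens, length):
    # Initial run on the *full* context (previously the last context token was
    # left to the while loop); it also produces the first sampled token.
    logits, past = fake_step(context_tokens)
    output = list(context_tokens)             # context is included in the output
    prev = [logits.index(max(logits))]
    output += prev
    for _ in range(length - 1):               # loop iterations were decremented by one
        logits, presents = fake_step(prev, past=past)
        past = past + presents                # grow the cached state
        prev = [logits.index(max(logits))]
        output += prev
    return output

print(sample_sketch([7, 3, 19], length=5))    # 3 context tokens followed by 5 sampled tokens
```
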
Memo Akten
e5c5054474 allow models to be in a separate folder via models_dir argument (#129)
* models_dir argument to allow models in a separate folder

* default value for models_dir to be same as before

* allow environment variables and user home in models_dir
2019-05-16 09:42:58 -07:00
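
Assuming the sampling scripts expose their keyword arguments as command-line flags via Fire in the usual way (the exact flag spelling below is an assumption, not something shown in this diff), the new argument would be used like:

    python3 src/interactive_conditional_samples.py --model_name 345M --models_dir '$HOME/gpt2-models'

Quoting the value leaves the environment variable and any ~ for the script itself to expand, which is what the third bullet point adds.
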
Jeff Wu
dd75299dfe remove samples 2019-05-03 15:43:08 -07:00
Jeff Wu
b5ef71a922 reference dataset 2019-05-03 15:26:08 -07:00
Jeff Wu
0503b1b249 updates for 345M model 2019-05-02 20:39:33 -07:00
Jeff Wu
d14501aade Update CONTRIBUTORS.md 2019-03-18 14:27:10 -07:00
Jeff Wu
86378284e1 fix for windows (thanks to chrothenbach) 2019-03-07 11:26:58 -08:00
17 changed files with 42 additions and 136987 deletions

.gitignore

@@ -1,2 +1,3 @@
 __pycache__
+.mypy_cache/
 models/

@@ -6,7 +6,7 @@
 * **[Margaret Mitchell et al](https://arxiv.org/abs/1810.03993)**
-  Our [usage](./readme#usage) writeup was loosely inspired by the paper
+  Our [usage](./README.md#usage) writeup was loosely inspired by the paper
   [Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993)
   and related conversations with some of the authors.

@@ -28,6 +28,7 @@ pip3 install -r requirements.txt
 Download the model data
 ```
 python3 download_model.py 117M
+python3 download_model.py 345M
 ```
 ## Docker Installation

@@ -6,3 +6,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M

@@ -15,3 +15,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M

@@ -1,24 +1,26 @@
+**Status:** Archive (code is provided as-is, no updates expected)
 # gpt-2
-Code and samples from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
+Code from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
-For now, we have only released a smaller (117M parameter) version of GPT-2.
+We have currently released small (117M parameter) and medium (345M parameter) versions of GPT-2. While we have not released the larger models, we have [released a dataset](https://github.com/openai/gpt-2-output-dataset) for researchers to study their behaviors.
 See more details in our [blog post](https://blog.openai.com/better-language-models/).
 ## Usage
-This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2-117M. While GPT-2-117M is less proficient than GPT-2-1.5B, it is useful for a wide range of research and applications which could also apply to larger models.
+This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.
 ### Some caveats
-- GPT-2-117M robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2-117M for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
-- The dataset our GPT-2-117M was trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2-117M is likely to be biased and inaccurate as well.
+- GPT-2 models' robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
+- The dataset our GPT-2 models were trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
+- To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.
 ### Work with us
-Please [let us know](mailto:languagequestions@openai.com) if you're doing interesting research with or working on applications of GPT-2-117M! We're especially interested in hearing from and potentially working with those who are studying
+Please [let us know](mailto:languagequestions@openai.com) if you're doing interesting research with or working on applications of GPT-2! We're especially interested in hearing from and potentially working with those who are studying
 - Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
 - The extent of problematic content (e.g. bias) being baked into the models and effective mitigations
@@ -30,15 +32,6 @@ See [DEVELOPERS.md](./DEVELOPERS.md)
 See [CONTRIBUTORS.md](./CONTRIBUTORS.md)
-## GPT-2 samples
-| WARNING: Samples are unfiltered and may contain offensive content. |
-| --- |
-While we have not yet released GPT-2 itself, you can see some samples from it in the `gpt-2-samples` folder.
-We show unconditional samples with default settings (temperature 1 and no truncation), with temperature 0.7, and with truncation with top_k 40.
-We show conditional samples, with contexts drawn from `WebText`'s test set, with default settings (temperature 1 and no truncation), with temperature 0.7, and with truncation with top_k 40.
 ## Citation
 Please use the following bibtex entry:

@@ -12,6 +12,7 @@ model = sys.argv[1]
 subdir = os.path.join('models', model)
 if not os.path.exists(subdir):
     os.makedirs(subdir)
+subdir = subdir.replace('\\','/') # needed for Windows
 for filename in ['checkpoint','encoder.json','hparams.json','model.ckpt.data-00000-of-00001', 'model.ckpt.index', 'model.ckpt.meta', 'vocab.bpe']:
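
A small illustration of why the added line helps: os.path.join produces backslash-separated paths on Windows, and the script reuses subdir both as a local directory and (presumably) when building each file's download URL, where backslashes would not work. The base URL below is a placeholder, not the script's actual endpoint.

```python
import os

subdir = os.path.join('models', '117M')   # 'models\\117M' on Windows, 'models/117M' elsewhere
subdir = subdir.replace('\\', '/')        # normalize separators so the string is URL-safe
url = "https://example.invalid/gpt-2/" + subdir + "/encoder.json"   # placeholder base URL
print(subdir)
print(url)
```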

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

@@ -105,10 +105,10 @@ class Encoder:
         text = bytearray([self.byte_decoder[c] for c in text]).decode('utf-8', errors=self.errors)
         return text
-def get_encoder(model_name):
-    with open(os.path.join('models', model_name, 'encoder.json'), 'r') as f:
+def get_encoder(model_name, models_dir):
+    with open(os.path.join(models_dir, model_name, 'encoder.json'), 'r') as f:
         encoder = json.load(f)
-    with open(os.path.join('models', model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
+    with open(os.path.join(models_dir, model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
         bpe_data = f.read()
     bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split('\n')[1:-1]]
     return Encoder(
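
With the new signature every caller passes the models directory explicitly; passing 'models' reproduces the old hard-coded behaviour. A minimal usage sketch, assuming the model files were already fetched with download_model.py into a local models/ folder and that encoder.py is importable (it lives in the repository's src/ directory):

```python
import encoder

# 'models' is the old default location; any parent folder that contains the
# <model_name> subfolder now works as the second argument.
enc = encoder.get_encoder('117M', 'models')

tokens = enc.encode("Hello world")
print(tokens)               # BPE token ids
print(enc.decode(tokens))   # round-trips back to the original text
```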

@@ -16,6 +16,7 @@ def sample_model(
     length=None,
     temperature=1,
     top_k=0,
+    models_dir='models',
 ):
     """
     Run the sample_model
@@ -35,10 +36,13 @@ def sample_model(
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
+    :models_dir : path to parent folder containing model subfolders
+     (i.e. contains the <model_name> folder)
     """
-    enc = encoder.get_encoder(model_name)
+    models_dir = os.path.expanduser(os.path.expandvars(models_dir))
+    enc = encoder.get_encoder(model_name, models_dir)
     hparams = model.default_hparams()
-    with open(os.path.join('models', model_name, 'hparams.json')) as f:
+    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
         hparams.override_from_dict(json.load(f))
     if length is None:
@@ -58,7 +62,7 @@ def sample_model(
         )[:, 1:]
         saver = tf.train.Saver()
-        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
+        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
         saver.restore(sess, ckpt)
         generated = 0
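
The added os.path.expanduser(os.path.expandvars(models_dir)) line is what lets values such as ~/gpt2-models or $GPT2_HOME/models work. A quick standalone illustration (the paths and environment variable are made up):

```python
import os

os.environ['GPT2_HOME'] = '/data/gpt2'     # hypothetical environment variable
for raw in ('$GPT2_HOME/models', '~/gpt2-models', 'models'):
    resolved = os.path.expanduser(os.path.expandvars(raw))
    print(raw, '->', resolved)             # plain relative paths pass through unchanged
```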

@@ -16,6 +16,7 @@ def interact_model(
     length=None,
     temperature=1,
     top_k=0,
+    models_dir='models',
 ):
     """
     Interactively run the model
@@ -34,14 +35,17 @@ def interact_model(
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
+    :models_dir : path to parent folder containing model subfolders
+     (i.e. contains the <model_name> folder)
     """
+    models_dir = os.path.expanduser(os.path.expandvars(models_dir))
     if batch_size is None:
         batch_size = 1
     assert nsamples % batch_size == 0
-    enc = encoder.get_encoder(model_name)
+    enc = encoder.get_encoder(model_name, models_dir)
     hparams = model.default_hparams()
-    with open(os.path.join('models', model_name, 'hparams.json')) as f:
+    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
         hparams.override_from_dict(json.load(f))
     if length is None:
@@ -61,7 +65,7 @@ def interact_model(
         )
         saver = tf.train.Saver()
-        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
+        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
         saver.restore(sess, ckpt)
         while True:

@@ -41,36 +41,33 @@ def sample_sequence(*, hparams, length, start_token=None, batch_size=None, conte
         }
     with tf.name_scope('sample_sequence'):
-        # Don't feed the last context token -- leave that to the loop below
-        # TODO: Would be slightly faster if we called step on the entire context,
-        # rather than leaving the last token transformer calculation to the while loop.
-        context_output = step(hparams, context[:, :-1])
         def body(past, prev, output):
-            next_outputs = step(hparams, prev[:, tf.newaxis], past=past)
+            next_outputs = step(hparams, prev, past=past)
             logits = next_outputs['logits'][:, -1, :] / tf.to_float(temperature)
             logits = top_k_logits(logits, k=top_k)
             samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)
             return [
-                tf.concat([past, next_outputs['presents']], axis=-2),
-                tf.squeeze(samples, axis=[1]),
-                tf.concat([output, samples], axis=1),
+                next_outputs['presents'] if past is None else tf.concat([past, next_outputs['presents']], axis=-2),
+                samples,
+                tf.concat([output, samples], axis=1)
             ]
+        past, prev, output = body(None, context, context)
         def cond(*args):
             return True
         _, _, tokens = tf.while_loop(
             cond=cond, body=body,
-            maximum_iterations=length,
+            maximum_iterations=length - 1,
             loop_vars=[
-                context_output['presents'],
-                context[:, -1],
-                context,
+                past,
+                prev,
+                output
             ],
             shape_invariants=[
                 tf.TensorShape(model.past_shape(hparams=hparams, batch_size=batch_size)),
-                tf.TensorShape([batch_size]),
+                tf.TensorShape([batch_size, None]),
                 tf.TensorShape([batch_size, None]),
             ],
             back_prop=False,