push 774M model

2019-08-20 08:50:19 -07:00
parent cb415376c3
commit f35fa1d920
7 changed files with 20 additions and 15 deletions
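
Note for callers: this commit changes the default `model_name` from `117M` to `124M`, and the README footnote explains that `117M` and `345M` were miscounted labels for the small and medium models. Scripts that still pass the old labels may need updating. A minimal sketch of a compatibility shim follows, assuming you want to accept both spellings. The names `LEGACY_MODEL_NAMES` and `resolve_model_name` are hypothetical and are not part of this commit.

```python
# Hypothetical helper, NOT part of this commit: translate the old
# (miscounted) model labels to the corrected ones used after this change.
LEGACY_MODEL_NAMES = {
    "117M": "124M",  # small
    "345M": "355M",  # medium
}


def resolve_model_name(name):
    """Return the corrected label for a legacy name; other names pass through unchanged."""
    return LEGACY_MODEL_NAMES.get(name, name)
```

A caller could wrap its argument, for example `model_name=resolve_model_name(user_supplied_name)`, before handing it to `sample_model` or `interact_model`.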
--- a/DEVELOPERS.md
+++ b/DEVELOPERS.md
@@ -27,8 +27,9 @@ pip3 install -r requirements.txt

 Download the model data
 ```
-python3 download_model.py 117M
-python3 download_model.py 345M
+python3 download_model.py 124M
+python3 download_model.py 355M
+python3 download_model.py 774M
 ```

 ## Docker Installation
--- a/Dockerfile.cpu
+++ b/Dockerfile.cpu
@@ -5,5 +5,6 @@ RUN mkdir /gpt-2
 WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
-RUN python3 download_model.py 117M
-RUN python3 download_model.py 345M
+RUN python3 download_model.py 124M
+RUN python3 download_model.py 355M
+RUN python3 download_model.py 774M
--- a/Dockerfile.gpu
+++ b/Dockerfile.gpu
@@ -14,5 +14,6 @@ RUN mkdir /gpt-2
 WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
-RUN python3 download_model.py 117M
-RUN python3 download_model.py 345M
+RUN python3 download_model.py 124M
+RUN python3 download_model.py 355M
+RUN python3 download_model.py 774M
--- a/README.md
+++ b/README.md
@@ -4,9 +4,11 @@

 Code from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).

-We have currently released small (117M parameter) and medium (345M parameter) versions of GPT-2.  While we have not released the larger models, we have [released a dataset](https://github.com/openai/gpt-2-output-dataset) for researchers to study their behaviors.
+We have currently released small (124M parameter), medium (355M parameter), and large (774M parameter) versions of GPT-2<sup>*</sup>, with only the full model as of yet unreleased.  We have also [released a dataset](https://github.com/openai/gpt-2-output-dataset) for researchers to study their behaviors.

-See more details in our [blog post](https://blog.openai.com/better-language-models/).
+You can read about GPT-2 and release decisions in our [original blog post](https://blog.openai.com/better-language-models/) and [6 month follow-up post](https://openai.com/blog/gpt-2-6-month-follow-up/).
+
+<sup>*</sup> *Note that our original parameter counts were wrong due to an error (in our previous blog posts and paper).  Thus you may have seen small referred to as 117M and medium referred to as 345M.*

 ## Usage

--- a/download_model.py
+++ b/download_model.py
@@ -4,7 +4,7 @@ import requests
 from tqdm import tqdm

 if len(sys.argv) != 2:
-    print('You must enter the model name as a parameter, e.g.: download_model.py 117M')
+    print('You must enter the model name as a parameter, e.g.: download_model.py 124M')
     sys.exit(1)

 model = sys.argv[1]
--- a/src/generate_unconditional_samples.py
+++ b/src/generate_unconditional_samples.py
@@ -9,7 +9,7 @@ import tensorflow as tf
 import model, sample, encoder

 def sample_model(
-    model_name='117M',
+    model_name='124M',
     seed=None,
     nsamples=0,
     batch_size=1,
@@ -20,7 +20,7 @@ def sample_model(
 ):
     """
     Run the sample_model
-    :model_name=117M : String, which model to use
+    :model_name=124M : String, which model to use
     :seed=None : Integer seed for random number generators, fix seed to
      reproduce results
     :nsamples=0 : Number of samples to return, if 0, continues to
--- a/src/interactive_conditional_samples.py
+++ b/src/interactive_conditional_samples.py
@@ -9,7 +9,7 @@ import tensorflow as tf
 import model, sample, encoder

 def interact_model(
-    model_name='117M',
+    model_name='124M',
     seed=None,
     nsamples=1,
     batch_size=1,
@@ -20,7 +20,7 @@ def interact_model(
 ):
     """
     Interactively run the model
-    :model_name=117M : String, which model to use
+    :model_name=124M : String, which model to use
     :seed=None : Integer seed for random number generators, fix seed to reproduce
      results
     :nsamples=1 : Number of samples to return total