GPT Unicorn has drawn a unicorn

  • It has become common knowledge that GPT-4 (and also 3.5) have problems producing deterministic outputs (even at T=0). So what we're seeing here is just the effect of random sampling, not any actual change to the model itself. If you scroll down, you'll see other close attempts by the exact same model that could already be counted as a win, depending on who you ask.

    Edit: This comment section is a super fascinating case study in the inherent flaws of human cognition, especially when it comes to seeing patterns in random noise. The fact that some people believe the model really must have changed in the past few days is amazing, because if you've kept up with the GPT architecture and the way OpenAI does things (especially on the API), it is incredibly obvious that nothing has happened. But people who want to believe that something has happened will inevitably start to see something.

  • I'm confused as to why this would see any improvement over time. Looking at the code, it's hitting the gpt-3.5-turbo API by default. Maybe I'm misremembering, but I thought I'd seen statements from people working at OpenAI claiming that the API models are static and that we'd be informed of any changes to the underlying model. Is the model actually receiving updates?

    Edit: Looking at previous days, too, it doesn't exactly seem to be improving. I think we just got a lucky sample.

  • I was curious about how they were getting GPT to generate images, the prompting is so simple[1] that GPT still blows my mind:

      async fetchImage(context) {
        const messages = context || [
          { role: 'system', content: `You are a helpful assistant that generates SVG drawings. You respond only with SVG. You do not respond with text.` },
          { role: 'user', content: `Draw a unicorn in SVG format. Dimensions: 500x500. Respond ONLY with a single SVG string. Do not respond with conversation or codeblocks.` }
        ]
    
        const response = await this.api.generateCompletion(messages)
    
        if (!isSvg(response.content)) {
          console.error('Generated image is not valid SVG:', response.content)
          messages.push({ role: 'assistant', content: response.content })
          messages.push({ role: 'user', content: `The generated image is not valid SVG. Please try again. Only respond with SVG code. No text.` })
          return this.fetchImage(messages)
        }
    
        console.log('Generated image:', response.content)
        return response
      }
    
    
    [1] https://github.com/adamkdean/gpt-unicorn/blob/master/src/lib...

  • If you get non-deterministic output from a black box, and you know both that there's an indeterminate amount of constant random noise in the output AND that you have no insight into the actual changes being made inside the black box...

    There's nothing to infer with any greater accuracy than the unknown amount of noise in each sample. So, all inference is of unknown accuracy. Also known as useless.

  • Several months ago, when ChatGPT was first released, I asked it to draw a diagram of the contents of a cell (nucleus, membrane, amino acids, etc.) in SVG (similar to the experiment shown here). At that time, it generated an OK SVG drawing, with very basic shapes (squares, triangles, circles, etc.) representing the elements of a cell. I did this test to show my biologist mom the "power" of this new AI technology.

    Fast forward a couple of months, I tried it again, and it was "blocked": it kept telling me "as an AI I cannot draw", even after emphasizing the "generate SVG code" part. For me, it was another example of OpenAI "borking" the capabilities of ChatGPT.

  • What is strange to me about this is that with a remarkably similar prompt ("Draw a unicorn in SVG format") I'm able to get this from the Bing image creator (which is powered by DALL-E), and I would think GPT-4 would have this capability. Perhaps I'm being naive:

    https://th.bing.com/th/id/OIG.GqpaRZ.NXCN6uxKj7X1u?pid=ImgGn

  • I would argue that `image-2023-04-25` looks comparably close to a unicorn. The art style does vary a lot, though.

  • I’ve asked ChatGPT to create models using Blender’s Python API. It typically generates working code. The API has been updated since 2021, so it’s expected that some things might break.

    It mostly just arranges simple spheres and cylinders. You can have it label parts which are usually correct.

    It struggles with the orientation of things, so if you ask it to model a plane, the engine nacelles are oriented the wrong way but correctly positioned.

  • There's no progress from something that is definitely not a unicorn towards a unicorn; the images seem randomly bad.

    I think it would be a lot of fun to give it the previous unicorn SVG attempt and ask it to make it more like a unicorn.
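
    A minimal sketch of that refinement loop, assuming the same kind of generateCompletion wrapper and isSvg check as in the snippet quoted earlier in the thread (the prompt wording here is my own, not the project's):

      // Hypothetical sketch: instead of drawing from scratch, feed yesterday's
      // SVG back and ask the model to nudge it towards a unicorn.
      async function refineUnicorn(api, previousSvg, rounds = 3) {
        let svg = previousSvg
        for (let i = 0; i < rounds; i++) {
          const messages = [
            { role: 'system', content: 'You are a helpful assistant that generates SVG drawings. You respond only with SVG.' },
            { role: 'user', content: `Here is an SVG drawing that was meant to be a unicorn:\n${svg}\nRedraw it so it looks more like a unicorn. Respond ONLY with the full revised SVG.` }
          ]
          const response = await api.generateCompletion(messages)
          if (isSvg(response.content)) svg = response.content // keep only valid revisions
        }
        return svg
      }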

  • Is it just me or doesn't image-2023-04-25 look like a unicorn as well?

  • An interesting modification would be to have it reflect on its own output each day and build up a list of advice for future attempts, fed back in the next day (a rough sketch of this loop follows below).

    That would give it some “learning”, and I’d be curious:

    1. Would it converge to a consistent shape at all, or just bounce around random shapes day to day?

    2. Would it produce unicorns more often than 1/118 times?

    The hardest part would be getting it to interpret its SVG output without seeing it rendered. A multimodal model that gets a rendered image would probably be much better, but maybe not!
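
    A rough sketch of what that daily reflect-then-advise loop could look like, again assuming a generateCompletion-style wrapper as in the snippet earlier in the thread (the advice store and prompt wording are hypothetical):

      // Hypothetical sketch: draw, then ask the model to critique its own SVG
      // and append one piece of advice that is fed back into the next day's prompt.
      async function dailyAttempt(api, advice) {
        const drawing = await api.generateCompletion([
          { role: 'system', content: 'You generate SVG drawings. Respond only with SVG.' },
          { role: 'user', content: `Draw a unicorn in SVG format. Dimensions: 500x500. Advice from previous attempts:\n- ${advice.join('\n- ')}` }
        ])
        const critique = await api.generateCompletion([
          { role: 'user', content: `Here is an SVG that was meant to depict a unicorn:\n${drawing.content}\nWithout seeing it rendered, give ONE short piece of advice for drawing a better unicorn next time.` }
        ])
        advice.push(critique.content) // running list carried over to tomorrow
        return { svg: drawing.content, advice }
      }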

  • Not directly related to the post, but still feels somewhat relevant.

    Back in March I used somewhat more elaborate multi-step prompts for GPT-3.5 to generate amusing pictures and published a gallery [1]. However, I eventually reached a point where changing prompts did not consistently improve the final results. At the end of the day, the quality of the images is only as good as the training dataset, and GPT is a black box.

    For something different, to test whether it is possible to "compress" visual content specifically for GPT, I ran another experiment. SVG is a verbose format, so generating a detailed image takes time and also becomes expensive. I translated a subset of SVG elements into Forth words [2], which have a nice synergy with GPT tokens; this allowed me to progressively render pictures and produce smaller outputs without sacrificing much quality (see the sketch after the links below).

    Finally, I trained my own GPT-2-like model on the QuickDraw dataset [3]. It's not surprising that a sequence transformer can be trained to produce coherent brush strokes and recognizable images as long as there is a way to translate graphical content into a sequence of tokens. That said, I found myself with more questions than I started with, and I'm trying other ideas now.

    [1] https://drawmeasheep.net/pages/about.html

    [2] https://drawmeasheep.net/pages/gpt-forth.html

    [3] https://drawmeasheep.net/pages/nn-training.html
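
    The kind of SVG-to-words translation being described might look something like this; the word vocabulary and element subset here are invented purely for illustration and are not the ones used on the linked pages:

      // Hypothetical sketch: flatten a small subset of SVG elements into compact
      // postfix "words", e.g. <circle cx="250" cy="250" r="40" fill="#fff"/>
      // becomes "#fff 250 250 40 circle".
      function svgToWords(elements) {
        return elements.map(el => {
          switch (el.tag) {
            case 'circle':
              return `${el.fill} ${el.cx} ${el.cy} ${el.r} circle`
            case 'rect':
              return `${el.fill} ${el.x} ${el.y} ${el.width} ${el.height} rect`
            case 'line':
              return `${el.x1} ${el.y1} ${el.x2} ${el.y2} line`
            default:
              return '' // unsupported elements are simply dropped in this sketch
          }
        }).filter(Boolean).join('  ')
      }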

  • I made some drawings with GPT-4 Code Interpreter and Pillow. I think with its sub-image and image composition features you could make some detailed drawings if you were clever about it.

    [1] https://metastable.org/draw.html

  • The prompt is "Draw a unicorn in SVG format. Dimensions: 500x500. Respond ONLY with a single SVG string. Do not respond with conversation or codeblocks."

    https://github.com/adamkdean/gpt-unicorn/blob/master/src/lib...

    I just asked this and got the following result:

      <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="500" height="500" viewBox="0 0 500 500">
      <path fill="#FDD7E4" d="M244.898 309.347c-.355-.421-5.388-6.485-5.388-10.305v-40.614l-3.073-12.759c-.708-2.684-2.527-4.924-5.034-6.38l-24.773-14.334c-8.152-4.72-15.709-10.72-17.905-18.788l-12.3-35.006c-.845-2.696-3.577-4.61-6.55-4.61h-48.235c-2.973 0-5.705 1.914-6.55 4.61l-12.3 35.005c-2.196 8.068-9.753 14.067-17.905 18.787L92.998 235.968c-2.507 1.456-4.325 3.696-5.034 6.38l-3.073 12.759v40.615c0 3.82-5.033 9.884-5.388 10.305-9.168 10.836-14.727 24.606-14.727 39.49v42.812C39.373 426.93 69.062 456.62 105.34 456.62h17.215c1.256 21.698 20.249 38.969 42.405 38.969s41.15-17.271 42.405-38.969h17.48c1.256 21.698 20.25 38.969 42.406 38.969s41.15-17.271 42.404-38.969H394.66c36.278 0 65.968-29.69 65.969-65.969v-42.812c0-14.884-5.559-28.654-14.727-39.491zm-189.9-4.337a77.079 77.079 0 0 0-4.225-.302c-19.322 0-35.012 15.689-35.012 35.012V375.99h74.25v-66.968c0-19.323-15.69-35.012-35.013-35.012zm208.12 0c-19.323 0-35.012 15.689-35.012 35.012v66.968h74.25v-66.968c0-19.323-15.69-35.012-35.012-35.012zm-118.63-66.968c10.286 0 18.632 8.345 18.632 18.632v48.424h-37.264v-48.424c0-10.287 8.345-18.632 18.632-18.632zm-27.944-18.633c-19.323 0-35.012 15.689-35.012 35.012V341.98H351.6V275.27c0-19.323-15.69-35.012-35.012-35.012zm-35.013-17.911c-10.287 0-18.632-8.346-18.632-18.633s8.345-18.632 18.632-18.632 18.632 8.346 18.632 18.632-8.345 18.633-18.632 18.633z"/></svg>
    
    I am not going to fault a language model for not getting that right! This is fundamentally not a language task. It demands an image model.

  • This reminds me of the artists from the 1500s who would do paintings of exotic animals for courts and other rich people without ever seeing the actual animal; they would draw them from descriptions.

  • "We queried GPT-4 three times, at roughly equal time intervals over the span of a month while the system was being refined, with the prompt “Draw a unicorn in TikZ”. We can see a clear evolution in the sophistication of GPT-4’s drawings."

    Given how random GPT seems to be at things it's not designed to do, the original research is really peculiar. Could it be that they queried GPT on three separate occasions some n times each and picked the best result?

  • Does it look like a unicorn to you? It looks more like a cat with spaghetti on its head...

    image-2023-04-25 looks more like a unicorn to me (although more a cow-based unicorn than a horse-based unicorn).

    Which leads me to a genuine question: how closely do the images resemble a unicorn? I mean, how can one track resemblance, and where does one draw the line and say GPT has drawn a unicorn?

  • I gave the SVG unicorn back to GPT-4 and asked it what it is.

    It recognized a head, eyes, body and legs. But it didn't recognize the unicorn.

    https://chat.openai.com/share/5409b417-b883-429f-893e-abe3d6...

  • What do you think of the current wave of agents spawning off, given that reasoning still needs a lot of grounding?

    With the probabilistic models at hand, adding reasoning and planning in some sense falls back to traditional software and system design. Do you see any plausible breakthroughs in this area with agent frameworks?

  • Could be sheer luck from the randomness of the model; if it had come on the 10th draw, you might not have this article.

  • That's actually a pretty epic unicorn. I like how its tail looks like a lightning bolt. Should be a logo idk

  • How do they get GPT-4 to produce valid SVGs so well?

    I experimented with SVG generation and it would often produce junk that wasn't even valid as an SVG, and even when it did produce a valid SVG, it was often a couple of blobs that it would describe as if they were the Mona Lisa while just being a couple of elements.

  • I can see so many NFTs on this website.

  • That's uh a pretty loose interpretation of a unicorn...

  • If you presented that image to a random 100 people and asked them what it is, would a substantial number of them say “a unicorn”?

  • The prompt must be particularly bad. I managed to get a nice-looking unicorn on the first (and every subsequent) attempt.

  • The methodology is all wrong, as others have pointed out. However, image-2023-04-26 is pretty interesting. It has some value.

  • To me, "image-2023-07-22" looks like a Picasso-esque rendering of Charlie Brown (apologies to Charles Schulz).

  • Has anyone set up a CafePress site that automatically slaps each day's image on a T-shirt?

    The most recent one would be good.

  • Ah, ask it again a few times today and then again tomorrow, the day after, and the day after that.

  • People keep forgetting that asking GPT to draw is like asking a human to imagine a 6D tesseract.

  • I have a sneaking suspicion there is an if-then situation here - always the edge cases :).

  • One thing people do/did with search engines was observe their ranking stability over time on common queries like "cats" or "dogs". If there was any change, one could meaningfully investigate what had changed.

    Though this seems a bit more like a neuropsychological evaluation, asking a black box (the AI) questions.

  • Relevant XKCD: https://xkcd.com/904/

  • This is why, as a product manager, you should always test 20 hypotheses per month. At a p-value threshold of 0.05, this basically guarantees a successful product feature test every month!
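
    For anyone checking the arithmetic behind the joke, a quick sketch assuming 20 independent tests and no real effects:

      // Probability of at least one false positive across 20 independent tests
      // at alpha = 0.05, assuming none of the hypotheses are actually true:
      const pAtLeastOneHit = 1 - Math.pow(1 - 0.05, 20) // ≈ 0.64, so not quite a guarantee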

  • Given that some of the early ones are anti-smiles, a playful takeaway could be that it saw the insurmountability of the challenge at first!

  • The output varies wildly... I'd check again after a few days/weeks and see how it varies. It could very well be a fluke.

  • Coincidentally, I saw this headline after waking from a dream where a unicorn bashed through the window of my childhood kitchen, and I had to fight it off with a water gun.

    Melatonin, not even once.

  • Value it at $1Bn to make it a Unicorn Unicorn.

  • Room for improvement. Edited for negativity.

  • I'm sad @ArtDecider has gone dark.

  • Let's see how the unicorn looks tomorrow.

  • Unicow

  • For a second I thought this was about a GPT based product hitting a certain valuation, but I'm much more entertained by the actual content. This is great.