Hacker News Clone

Show HN: I built a GPT-4 bot which builds software incrementally

by killerstorm on 3/26/2023, 4:54 PM with 4 comments

The ability to synthesize a relatively short snippet of code was already demonstrated. But I thought it would be interesting to test whether GPT-4 can replace a programmer completely.

To do that, the AI needs to plan its actions and work on code incrementally, one piece at a time.

The challenge is the context size: the entire code base + plan does not fit into the context.

My approach: Add only relevant parts of the code base to the context.

Specifically, the generation engine implements two distinct phases: planning and coding.

In the planning phase, GPT-4 receives a tree of tasks and a summary of code base (list of files and their descriptions). It replies with updated tasks (i.e. it is able to create sub-tasks as needed), the task it wants to work on in the next step and a list of relevant code fragments for that task.
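
To make the planning input more concrete, here is a minimal sketch of how a task tree and a code base summary could be rendered into a single prompt body. It is not taken from my script: the `Task` class, its field names, `render_tasks`, `build_planning_prompt`, and the `TASKS:`/`FILES:` headings are all made up for illustration, and the YAML-like output is hand-rolled rather than produced by a YAML library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """One node of the task tree. Field names are illustrative assumptions."""
    id: str
    description: str
    status: str = "TODO"  # DONE, PARTIAL, or TODO
    subtasks: list[Task] = field(default_factory=list)


def render_tasks(task: Task, indent: int = 0) -> list[str]:
    """Render a task and its subtasks as indented, YAML-like lines."""
    pad = " " * indent
    lines = [
        f"{pad}- id: {task.id}",
        f"{pad}  status: {task.status}",
        f"{pad}  description: {task.description}",
    ]
    if task.subtasks:
        lines.append(f"{pad}  subtasks:")
        for sub in task.subtasks:
            lines.extend(render_tasks(sub, indent + 4))
    return lines


def build_planning_prompt(root: Task, file_summaries: dict[str, str]) -> str:
    """Combine the task tree with one-line file descriptions (no file bodies)."""
    files = [f"- {path}: {desc}" for path, desc in sorted(file_summaries.items())]
    return "\n".join(["TASKS:", *render_tasks(root), "", "FILES:", *files])


root = Task("1", "Write a reddit clone", subtasks=[Task("1.1", "Define Post model")])
prompt = build_planning_prompt(root, {"src/models/Post.kt": "Post data class"})
print(prompt)
```

The point is that only file descriptions go into the planning context, not file contents, which is what keeps the planning prompt small.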

In the coding phase, it receives the task description (as a tree, in YAML) and relevant code fragments. It replies with newly generated or updated files, code fragments, and a status: Was the task done? Do we need to break it into subtasks?
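
The alternation between the two phases can be sketched as a simple loop. This is a rough illustration under several assumptions, not the actual script: `PlanResult`, `CodeResult`, `Workspace`, and `run` are hypothetical names, the model calls are replaced by stub functions, and whole files stand in for code fragments.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PlanResult:
    """Hypothetical shape of a planning-phase reply."""
    next_task: str | None    # task to work on next, None when nothing is left
    fragment_ids: list[str]  # fragments the model asked to see


@dataclass
class CodeResult:
    """Hypothetical shape of a coding-phase reply."""
    files: dict[str, str]    # path -> new or updated content
    status: str              # COMPLETE, PARTIAL, or REDO


@dataclass
class Workspace:
    files: dict[str, str] = field(default_factory=dict)
    done: list[str] = field(default_factory=list)


def run(plan: Callable[[Workspace], PlanResult],
        code: Callable[[str, dict[str, str]], CodeResult],
        ws: Workspace, max_steps: int = 20) -> Workspace:
    """Alternate planning and coding until the planner has no task left."""
    for _ in range(max_steps):
        planned = plan(ws)
        if planned.next_task is None:
            break
        # Only the requested fragments are put into the coding context.
        context = {p: ws.files[p] for p in planned.fragment_ids if p in ws.files}
        result = code(planned.next_task, context)
        ws.files.update(result.files)
        # PARTIAL and REDO leave the task open for the next planning step.
        if result.status == "COMPLETE":
            ws.done.append(planned.next_task)
    return ws


# Stubs standing in for the two GPT-4 calls.
def fake_plan(ws: Workspace) -> PlanResult:
    todo = [t for t in ["post-model", "routes"] if t not in ws.done]
    return PlanResult(todo[0] if todo else None, list(ws.files))


def fake_code(task: str, context: dict[str, str]) -> CodeResult:
    return CodeResult({f"{task}.kt": f"// {task}, saw {len(context)} file(s)"}, "COMPLETE")


final = run(fake_plan, fake_code, Workspace())
print(final.done)
```

In this sketch a PARTIAL or REDO status just means the planner sees the task again on the next step; the subtask updates described above are left out to keep it short.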

In both cases the bot can also show its "observations" before the output, as I believe this helps with planning and code generation.

Results: Currently I have only tested extremely basic scenarios. It needs a lot of work to be usable in practice. But I'd say it seems to work more-or-less as expected.

Example 1: "Write a reddit-like backend in Kotlin, using Ktor. Start by planning and creating subtasks."

This was the entire task which the bot received, no other data.

Results:

Link to output: https://gist.github.com/killerstorm/dd6e26dc80064b7fc731d583f8d740c1#file-ktor_reddit-txt-L9

In short, it formulated reasonable-sounding subtasks and started generating code, e.g. it made a Post model. The run was aborted at that step due to a GPT-4 API failure; the API is not reliable yet.

Example 2: "Write a reddit clone in TypeScript. Start by planning and creating subtasks."

Link to output: https://gist.github.com/killerstorm/e3c50bea3ca3463c8b2d947dcfd80b84

You can see more work here, but I expect that it's less interesting.

Challenges: I'd say it can work pretty well in file-at-once mode. Making _fragments_ of the file is more challenging because it's not a well-defined concept. FWIW GPT-4 largely ignored what I wrote about file fragments and made entire files at once, which was the right decision.

I will post a link to the script in a comment on this post.

by cloudking on 3/26/2023, 5:32 PM
> In the planning phase, GPT-4 receives a tree of tasks and a summary of code base (list of files and their descriptions). It replies with updated tasks (i.e. it is able to create sub-tasks as needed), the task it wants to work on in the next step and a list of relevant code fragments for that task.
Are you passing the original task in each prompt? If not I think that it's going to lose context of what it's trying to build overall.
How are you deciding what are relevant code snippets to send?

by killerstorm on 3/26/2023, 5:05 PM

I tried three formats:

  1. All-YAML
  2. All-XML
  3. Custom parser for code fragments + YAML

I first tried it on GPT-3.5 (gpt-3.5-turbo, aka ChatGPT) and it was really struggling with formatting - sometimes it got it right, sometimes wrong.

For GPT-4 I used a custom listing-style representation and it kind of just worked. I later re-tried with YAML and XML and it seems to work quite well too.

Here's the prompt for the code-generation part of the "custom" variant:

    You're a code construction AI which creates code iteratively.
    You're given a list of existing code fragments and a task to work on.
    Files are normally broken into multiple fragments to reduce context size.

    The response should be in the following format:

    // Observations on the task and the code base, if any
    // A plan to implement the task

    // Code fragments to be added to the code base. Use the following markers to delimit code fragments. 
    // (Normally a fragment would be a function, class, or a list of related lines)
    
    /// BEGIN_FILE <path>: <description>
    
    /// BEGIN <marker>: <description>
    <code>
    /// END <marker>
    
    /// END_FILE <path>


    CODE_GENERATION_STATUS: <status> # COMPLETE, PARTIAL, REDO
    description: <description> # updated description if the status is PARTIAL or REDO
    subtasks: # updated subtasks if the status is PARTIAL or REDO
        - id: <id>
          status: <status> # DONE, PARTIAL, TODO
          description: <description>

Here's the script for the "custom" variant: https://gist.github.com/killerstorm/2296b282c818ffcfe4ceb729...
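
For a quick idea of what parsing the delimiter format from the prompt involves, here is a minimal sketch. It is written for illustration and is not the code from the gist: `parse_response` and the regular expressions are my own assumptions, it only handles well-formed replies, and it assumes paths and markers contain no colon.

```python
import re

# Patterns for the markers described in the prompt.
FILE_RE = re.compile(r"^/// BEGIN_FILE (?P<path>[^:]+): ?(?P<desc>.*)$")
FRAG_RE = re.compile(r"^/// BEGIN (?P<marker>[^:]+): ?(?P<desc>.*)$")
STATUS_RE = re.compile(r"^CODE_GENERATION_STATUS:\s*(?P<status>\w+)")


def parse_response(text):
    """Return (fragments, status); fragments maps (path, marker) -> code."""
    fragments = {}
    status = None
    path = None
    marker = None
    buf = []
    for raw in text.splitlines():
        line = raw.strip()
        if m := FILE_RE.match(line):
            path = m["path"].strip()
        elif line.startswith("/// END_FILE"):
            path = None
        elif m := FRAG_RE.match(line):
            marker, buf = m["marker"].strip(), []
        elif line.startswith("/// END") and marker is not None:
            fragments[(path, marker)] = "\n".join(buf)
            marker = None
        elif marker is not None:
            buf.append(raw)  # keep the original indentation of code lines
        elif m := STATUS_RE.match(line):
            status = m["status"]
    return fragments, status


sample = "\n".join([
    "// Plan: add a Post model",
    "/// BEGIN_FILE src/Post.kt: Post model",
    "/// BEGIN post_class: data class for posts",
    "data class Post(val id: Int, val title: String)",
    "/// END post_class",
    "/// END_FILE src/Post.kt",
    "CODE_GENERATION_STATUS: COMPLETE",
])
fragments, status = parse_response(sample)
print(status, list(fragments))
```

Lines outside any fragment (the observations and the plan) are simply skipped, and the YAML part after the status line would still need a separate YAML parser.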

Note that you really need GPT-4 to reproduce the results. It doesn't really work with GPT-3.5, although you can see some activity.