Show HN: I built a GPT-4 bot which builds software incrementally
The ability to synthesize a relatively short snippet of code has already been demonstrated. But I thought it would be interesting to test whether GPT-4 can replace a programmer completely.
To do that, the AI needs to plan its actions and work on code incrementally, one piece at a time.
The challenge is the context size: the entire code base + plan does not fit into the context.
My approach: Add only relevant parts of the code base to the context.
Specifically, the generation engine implements two distinct phases: planning and coding.
In the planning phase, GPT-4 receives a tree of tasks and a summary of code base (list of files and their descriptions). It replies with updated tasks (i.e. it is able to create sub-tasks as needed), the task it wants to work on in the next step and a list of relevant code fragments for that task.
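A minimal sketch of how the planning-phase input described above could be assembled. This is not the actual script: the `Task` class, the `TASKS:`/`FILES:` headings and the function names are assumptions made up for illustration, and the output is only YAML-style text (descriptions are not escaped).

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task node; field names are illustrative only."""
    id: str
    description: str
    status: str = "TODO"  # DONE, PARTIAL, TODO
    subtasks: list["Task"] = field(default_factory=list)


def render_task(task: Task, indent: int = 0) -> list[str]:
    """Render a task and its subtasks as YAML-style lines."""
    pad = " " * indent
    lines = [
        f"{pad}- id: {task.id}",
        f"{pad}  status: {task.status}",
        f"{pad}  description: {task.description}",
    ]
    if task.subtasks:
        lines.append(f"{pad}  subtasks:")
        for sub in task.subtasks:
            lines.extend(render_task(sub, indent + 4))
    return lines


def build_planning_input(root: Task, file_summaries: dict[str, str]) -> str:
    """Combine the task tree with a one-line-per-file code base summary."""
    parts = ["TASKS:"]
    parts.extend(render_task(root))
    parts.append("FILES:")
    for path, description in sorted(file_summaries.items()):
        parts.append(f"- {path}: {description}")
    return "\n".join(parts)
```

The point of the summary is that the planner never sees file contents, only paths and descriptions, which keeps the planning context small regardless of code base size.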
In the coding phase, it receives the task description (as a tree, in YAML) and relevant code fragments. It replies with new generated or updated files, code fragments, and status: Was the task done? Do we need to break it into subtasks?
In both cases the bot can also show its "observations" before the output, as I believe this helps with both planning and code generation.
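The alternation between the two phases can be sketched roughly as below. This is an illustration under assumptions, not the actual script: `plan` and `code` stand in for functions that each wrap one GPT-4 request, the dictionary keys (`tasks`, `next_task`, `relevant`, `files`, `status`) are hypothetical, and context is selected per file rather than per fragment.

```python
from typing import Callable

# Hypothetical signatures: each callable wraps one model request.
Planner = Callable[[dict, dict[str, str]], dict]
Coder = Callable[[dict, dict[str, str]], dict]


def run_bot(root_task: dict, plan: Planner, code: Coder,
            max_steps: int = 10) -> dict[str, str]:
    files: dict[str, str] = {}      # path -> full file content
    summaries: dict[str, str] = {}  # path -> one-line description
    tasks = root_task
    for _ in range(max_steps):
        # Planning phase: sees only the task tree and the file summaries.
        planned = plan(tasks, summaries)
        tasks = planned["tasks"]
        current = planned["next_task"]
        if current is None:  # nothing left to work on
            break
        # Only the files the planner named go into the coding context.
        relevant = {p: files[p] for p in planned["relevant"] if p in files}
        # Coding phase: sees the chosen task and the relevant code only.
        result = code(current, relevant)
        for path, (description, content) in result["files"].items():
            files[path] = content
            summaries[path] = description
        current["status"] = (
            "DONE" if result["status"] == "COMPLETE" else "PARTIAL"
        )
    return files
```

Passing the model calls in as plain callables makes the loop easy to exercise with fake planners and coders before spending API calls on it.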
Results: Currently I have only tested extremely basic scenarios. It needs a lot of work to be usable in practice. But I'd say it seems to work more-or-less as expected.
Example 1: "Write a reddit-like backend in Kotlin, using Ktor. Start by planning and creating subtasks."
This was the entire task the bot received, with no other data.
Results:
Link to output: https://gist.github.com/killerstorm/dd6e26dc80064b7fc731d583f8d740c1#file-ktor_reddit-txt-L9
In short, it formulated reasonable-sounding subtasks and started generating code, e.g. it made a Post model. The run was aborted at that step due to a GPT-4 API failure; the API is not reliable yet.
Example 2: "Write a reddit clone in TypeScript. Start by planning and creating subtasks."
Link to output: https://gist.github.com/killerstorm/e3c50bea3ca3463c8b2d947dcfd80b84
You can see more work here, but I expect that it's less interesting.
Challenges: I'd say it can work pretty well in file-at-once mode. Making _fragments_ of the file is more challenging because it's not a well-defined concept. FWIW GPT-4 largely ignored what I wrote about file fragments and made entire files at once, which was the right decision.
I will post a link to the script in a comment on this post.
> In the planning phase, GPT-4 receives a tree of tasks and a summary of code base (list of files and their descriptions). It replies with updated tasks (i.e. it is able to create sub-tasks as needed), the task it wants to work on in the next step and a list of relevant code fragments for that task.
Are you passing the original task in each prompt? If not I think that it's going to lose context of what it's trying to build overall.
How are you deciding what are relevant code snippets to send?
I tried three formats:
1. All-YAML
2. All-XML
3. Custom parsing for code fragments + YAML
I first tried it on GPT-3.5 (gpt-3.5-turbo, aka ChatGPT) and it was really struggling with formatting: sometimes it got it right, sometimes wrong.
For GPT-4 I used a custom listing-style representation and it kind of just worked. I later retried with YAML and XML, and they seem to work quite well too.
Here's the script for the "custom" variant: https://gist.github.com/killerstorm/2296b282c818ffcfe4ceb729...
Here's the prompt for the code-generation part of the "custom" variant:
    You're a code construction AI which creates code iteratively.
    You're given a list of existing code fragments and a task to work on.
    Files are normally broken into multiple fragments to reduce context size.
    The response should be in the following format:
    // Observations on the task and the code base, if any
    // A plan to implement the task
    // Code fragments to be added to the code base. Use the following markers to delimit code fragments.
    // (Normally a fragment would be a function, class, or a list of related lines)
    /// BEGIN_FILE <path>: <description>
    /// BEGIN <marker>: <description>
    <code>
    /// END <marker>
    /// END_FILE <path>
    CODE_GENERATION_STATUS: <status> # COMPLETE, PARTIAL, REDO
    description: <description> # updated description if the status is PARTIAL or REDO
    subtasks: # updated subtasks if the status is PARTIAL or REDO
     - id: <id>
       status: <status> # DONE, PARTIAL, TODO
       description: <description>
Note that you really need GPT-4 to reproduce the results; it doesn't really work in GPT-3.5, although you can see some activity.
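For readers who want to experiment with the response format in the prompt above, here is a rough sketch of a parser for it. It is not taken from the linked script: the function name, the regular expressions and the choice to merge all fragments of a file into one string are assumptions, and it ignores the YAML part after the status line.

```python
import re
from typing import Optional

# Patterns for the markers named in the prompt; the regexes are a guess
# at a reasonable reading of the format, not the script's own.
BEGIN_FILE = re.compile(r"^/// BEGIN_FILE (?P<path>[^:]+): ?(?P<description>.*)$")
END_FILE = re.compile(r"^/// END_FILE\b")
FRAGMENT_MARKER = re.compile(r"^/// (BEGIN|END) ")
STATUS = re.compile(r"^CODE_GENERATION_STATUS:\s*(\w+)")


def parse_response(text: str) -> tuple[dict[str, str], Optional[str]]:
    """Return ({path: code}, status) extracted from a model response."""
    files: dict[str, list[str]] = {}
    status: Optional[str] = None
    current: Optional[str] = None  # path of the file being read, if any
    for line in text.splitlines():
        match = BEGIN_FILE.match(line)
        if match:
            current = match.group("path").strip()
            files[current] = []
            continue
        if END_FILE.match(line):
            current = None
            continue
        match = STATUS.match(line)
        if match and current is None:
            status = match.group(1)
            continue
        # Inside a file: keep code lines, drop fragment BEGIN/END markers.
        if current is not None and not FRAGMENT_MARKER.match(line):
            files[current].append(line)
    return {path: "\n".join(body) for path, body in files.items()}, status
```

Anything outside a `BEGIN_FILE`/`END_FILE` pair (the observations and the plan) is simply skipped, so the model can write free-form text before the code without breaking the parse.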