Check out our open-source tools to give Claude Code GTM superpowers
All open-source tools

Auto Prompt Creator

Anneal loop that graduates prompts at 92%+ accuracy on cheap models.

Open source · MIT · Production-tested inside LeadGrow's live pipeline
Quick install
$ git clone https://github.com/MitchellkellerLG/auto-prompt-creator.git
What you unlock

A prompt optimization system that makes Haiku produce outputs indistinguishable from Opus on a defined task. Takes a baseline prompt, scores it against expert ground truth, and mutates it through a phased anneal loop — bootstrap, generalize, polish — with mutation diversity constraints that prevent overfitting. Graduates to a portable library entry when it clears 92% on held-out validation data.

Outcome: Built at LeadGrow, used in live pipelines, shipped open source so you don't have to build it yourself.
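The shape of the loop is easy to picture. A minimal Python sketch, assuming hypothetical `mutate()` and `score()` helpers; the real implementation lives in the repo:

```python
# Minimal sketch of the phased anneal loop (hypothetical API, not the
# repo's actual code). score() returns accuracy in [0, 1] against a
# labeled set; mutate() proposes several prompt variants per iteration.

PHASES = ["bootstrap", "generalize", "polish"]

def anneal(prompt, train, val, mutate, score, threshold=0.92, max_iter=20):
    best, best_val = prompt, score(prompt, val)
    for i in range(max_iter):
        phase = PHASES[min(i * len(PHASES) // max_iter, len(PHASES) - 1)]
        # Mutation diversity: score a batch of variants each iteration
        # so the loop doesn't lock onto one overfit phrasing.
        for candidate in mutate(best, phase, train):
            cand_val = score(candidate, val)
            if cand_val > best_val:
                best, best_val = candidate, cand_val
        if best_val >= threshold:
            break  # clears the bar: ready to graduate to the library
    return best, best_val
```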
How it works
See it in action

3 things you can do right now.

No UI. No clicking. Just commands that execute.

Run the anneal loop on a task
$ prompt-anneal run \
    --task "classify-pain-points" \
    --train data/train.csv \
    --model haiku
✓ Iteration 12 — val 94.3% → PASS. Graduated → library/classify-pain-points.md
Outcome: Haiku now performs at Opus level on pain-point classification. 10x cheaper, same accuracy.
Check accuracy on held-out data
$ prompt-anneal score \
    --prompt library/classify-pain-points.md \
    --holdout data/holdout.csv
✓ Holdout accuracy: 92.8% → 46/50 test cases pass
Outcome: 92.8% on data the loop never saw. Generalization, not memorization.
Inspect a graduated prompt
$ prompt-anneal inspect \
    --task "classify-pain-points" \
    --show-metadata
✓ library/classify-pain-points.md → Accuracy: 94.3% val / 92.8% holdout → Target model: claude-haiku-4-5
Outcome: Every prompt carries accuracy metadata. You know what you're deploying before you deploy it.
What's included

Every capability, ready to script.

Phased mutation loop (bootstrap, generalize, polish) prevents example memorization
Weighted rubric scoring against train/val/holdout splits (see the sketch after this list)
Auto halt on threshold, plateau, overfitting, or token budget
Subtractive check at iteration 8 proves generalization, not lookup
Graduated prompts land in `library/` with full accuracy metadata
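The weighted rubric scoring works roughly like this. A sketch with an illustrative rubric shape; the actual rubric format is defined per task in the repo:

```python
# Sketch of weighted rubric scoring (illustrative rubric shape).
# Each rubric dimension is a (weight, check) pair; a prompt's score is
# the weighted pass rate across every test case and every dimension.

def rubric_score(outputs, expected, rubric):
    total = sum(w for w, _ in rubric) * len(outputs)
    earned = sum(
        w
        for out, exp in zip(outputs, expected)
        for w, check in rubric
        if check(out, exp)
    )
    return earned / total

# Example: exact label match weighted 3x over calibrated confidence.
rubric = [
    (3, lambda out, exp: out["label"] == exp["label"]),
    (1, lambda out, exp: out["confidence"] >= 0.8),
]
```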
The workflow

End to end. Zero manual steps.

This is how LeadGrow cuts Opus prompts down to Haiku — same accuracy, 10x cheaper.

01

Define the task.

Name the task, pick the target model, and provide 20 ground-truth examples. Example quality sets the ceiling — garbage in, garbage out.

$ prompt-anneal init --task "icp-fit-score" \
    --model haiku --examples data/examples.csv
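The examples file is plain CSV. One plausible shape for an ICP-fit task; the columns here are illustrative, not a required schema:

```
input,expected
"Series B fintech, 120 employees, hiring SDRs",0.91
"Two-person design agency, no sales team",0.12
"Mid-market SaaS, VP Sales in seat, US-based",0.84
```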
02

Score the baseline.

Run your current Opus prompt against the training set. This is the benchmark every mutation must beat. Most prompts start at 60–75% — that's the gap to close.

$ prompt-anneal baseline \
    --task "icp-fit-score" --train data/train.csv
03

Run the anneal loop.

The loop generates mutations, scores them, keeps winners, and iterates through bootstrap → generalize → polish phases. Auto-halts when the accuracy threshold clears, or on plateau, overfitting, or token budget (sketched below).

$ prompt-anneal run --task "icp-fit-score" \
    --threshold 92 --max-iterations 20
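The four halt conditions, as a sketch. Thresholds are illustrative and the names hypothetical:

```python
# Sketch of the auto-halt checks (illustrative thresholds, hypothetical names).
# history is a list of {"train": float, "val": float} dicts, one per iteration.

def should_halt(history, tokens_used, threshold=0.92, budget=500_000):
    val = [h["val"] for h in history]
    if val[-1] >= threshold:
        return "threshold cleared"
    if len(val) > 5 and max(val[-5:]) <= max(val[:-5]):
        return "plateau: no gain in 5 iterations"
    if history[-1]["train"] - history[-1]["val"] > 0.10:
        return "overfitting: train/val gap too wide"
    if tokens_used > budget:
        return "token budget exhausted"
    return None  # keep annealing
```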
04

Graduate to the library.

Cleared 92% on holdout data? The prompt exports to the library with full accuracy metadata. Any Claude Code skill can now load it by name.

$ prompt-anneal graduate \
    --task "icp-fit-score" --library ./library/
Real scenarios

What teams actually use it for.

Not theoretical. These are the pipelines running at LeadGrow and client stacks today.

01 Porting an Opus prompt down to Haiku for a 10x cost cut
02 Tuning classification, extraction, or enrichment prompts for production pipelines
03 Building a reusable prompt library with measured accuracy per task
Included skills

Pre-built automations. Ready to run.

These aren't demos. They're the Claude Code skills we run inside LeadGrow, shipped open source so you don't have to build them yourself.

prompt-annealer

Prompt Annealer

Takes a task name and example set. Runs the full anneal loop. Returns a graduated prompt cleared at 92%+ on held-out data. Hands off to library-publisher when done.

Triggers: "optimize prompt for [task]"
accuracy-checker

Accuracy Checker

Scores any prompt against a validation set. Reports pass rate per rubric dimension. Runs before graduation to confirm generalization, not memorization.

Triggers: Before promoting any prompt to production
library-publisher

Library Publisher

Takes a graduated prompt and accuracy metadata. Writes the library entry. Updates the catalog. The prompt is now importable by name in any downstream skill.

Triggers: After accuracy-checker clears the threshold
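"Importable by name" can be as simple as reading the file out of `library/`. A hypothetical helper, not the skill's actual code:

```python
from pathlib import Path

def load_prompt(task: str, library: str = "library") -> str:
    """Hypothetical helper: fetch a graduated prompt by task name."""
    return (Path(library) / f"{task}.md").read_text()

system_prompt = load_prompt("classify-pain-points")
```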
Custom build

Want us to build yours?

Custom CLIs, MCP servers, and Claude Code skills wired to your exact GTM stack.

We build these for clients. Then we ship them open.

Book a Strategy Call →