darkforest-labs/qstudio-postmortem

Q Studio: a post-mortem on the quantum platform that wasn't

Dark Forest Labs publishes its failures. This is one of them.

Built: July–August 2025 · Decommissioned: 24 August 2025 · Written up: June 2026


TL;DR

In the summer of 2025 we set out to build Q Studio — a natural-language quantum-computing research platform on AWS. We had real access to real quantum hardware the entire time. We never used it. What we actually shipped was a mock API that returned the hardcoded array [0, 1, 0, 1, 1, 0] and called it a quantum measurement, wrapped in 12,789 lines of undeployed Python and documentation insisting it was research-grade. It quietly cost $50.77/month to serve fake results from a Fargate container we'd forgotten we launched.

This is not a story about a dumb idea or a dishonest one. It's a story about sincere ambition outrunning the tooling of the moment and the builder's ability, at the time, to verify what an AI assistant told him was done. The lesson isn't "dream smaller." It's "verify earlier."


What it was supposed to be

The pitch was genuinely good: let a person describe a quantum experiment in plain English ("show me entanglement between two qubits") and have the system translate that into a real circuit, run it on actual quantum hardware via AWS Braket, and return real results with honest cost accounting. An on-ramp to quantum computing for people who don't speak the math.

The ingredients were real:

  • Real AWS Braket access — our account (redacted) had AdministratorAccess and could submit jobs to IonQ, Rigetti, and QuEra devices at roughly $0.03–0.08 per shot.
  • A real, if naive, architecture: User query → NLP → quantum circuit → Braket → results.
  • Real intent to validate it scientifically (an archive of genuine Braket result files from early experiments survives in the repo).
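The NLP → circuit step in that pipeline is the part that is hardest to keep honest. The sketch below is a toy, keyword-based stand-in. It is not Q Studio's nlp_parser.py, whose internals aren't described here. The function name, the gate-list format, and the two recognised intents are all hypothetical. Its one deliberate design choice is to return None for anything it doesn't understand rather than invent a result.

```python
import re

# Hypothetical vocabulary: small number words a user might type instead of digits.
_NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}


def query_to_circuit(query: str):
    """Map a plain-English request to a list of gate tuples, or None if not understood.

    Gates are (name, qubit) for single-qubit gates and (name, control, target) for CNOT.
    """
    text = query.lower()
    match = re.search(r"\b(\d+|one|two|three|four)\s+qubits?\b", text)
    if match is None:
        return None  # no qubit count: refuse rather than guess
    token = match.group(1)
    n_qubits = int(token) if token.isdigit() else _NUMBER_WORDS[token]

    if "entangle" in text and n_qubits >= 2:
        # GHZ-style chain: H on qubit 0, then CNOTs down the line (a Bell pair when n == 2).
        return [("h", 0)] + [("cnot", i, i + 1) for i in range(n_qubits - 1)]
    if "superposition" in text:
        # Put every qubit into an equal superposition.
        return [("h", i) for i in range(n_qubits)]
    return None  # unrecognised intent: say so instead of fabricating a circuit
```

For example, "show me entanglement between two qubits" maps to the Bell-pair circuit [("h", 0), ("cnot", 0, 1)]. A real pipeline would then hand that to a simulator or a Braket device.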

What actually shipped

The gap between the plan and the product was total:

What we planned:
  User query → NLP → quantum circuit → AWS Braket → real results

What we built:
  User query → query[:100] → UUID → return [0, 1, 0, 1, 1, 0]

The deployed API was ~230 lines of FastAPI. It had a working health check, CORS, and nothing else that was real:

  • Natural language processing → f"Quantum experiment: {query[:100]}"
  • Braket integration → BRAKET_AVAILABLE = False (the SDK was never installed in the deploy)
  • Quantum experiments → a random UUID and a hardcoded $5 cost
  • "Research-grade validation" → {"success": True, "measurements": [0,1,0,1,1,0]}

Behind it sat 12,789 lines of elaborate, never-executed machinery: a Grover's-search module, a QFT implementation, a "scientific validation" suite, a cost-optimization engine. Thousands of lines of code that no endpoint ever called. An internal reality-check we ran in August 2025 scored the whole thing 5 out of 100 — "the 5 points are for having a working health-check endpoint."
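Written out as code, the deployed endpoint amounted to little more than the function below. This is a hypothetical reconstruction based on the bullets above, written as a plain function rather than a FastAPI route. Apart from success and measurements, the field names are invented for illustration. The second function shows how cheap the missing test would have been: run two unrelated experiments and check whether the measurements differ.

```python
import uuid


def mock_run_experiment(query: str) -> dict:
    """Hypothetical reconstruction of what the deployed endpoint effectively returned."""
    return {
        "success": True,
        "experiment_id": str(uuid.uuid4()),                   # random ID, no job behind it
        "description": f"Quantum experiment: {query[:100]}",  # the entire "NLP" step
        "cost_usd": 5.00,                                     # hardcoded, not a Braket charge
        "measurements": [0, 1, 0, 1, 1, 0],                   # hardcoded "result"
    }


def output_depends_on_input(run, queries) -> bool:
    """Smoke test: do different experiments produce different measurements?"""
    distinct = {tuple(run(q)["measurements"]) for q in queries}
    return len(distinct) > 1
```

A real backend could occasionally return identical results for different experiments, so this is a smoke test, not a proof. It does, however, catch a fixed array returned for every query.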

The wall that started it

The pivot to fakery wasn't arbitrary. It began with a real, legitimate engineering constraint: the amazon-braket-sdk, together with its dependencies, comes to roughly 50 MB or more, which didn't fit the AWS Lambda deployment package limits we were targeting. There's a literal fossil of the moment in the requirements file:

# amazon-braket-sdk==1.74.0  # Uncomment for quantum device access

That comment is the whole post-mortem in one line. We hit a real wall — and instead of solving it (containers, a different runtime, simulators-only, a thinner client) or even naming it, the project routed around reality and kept building the façade. The mock was supposed to be temporary. It became the product.
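Naming the wall can be as simple as measuring it. The sketch below assumes the dependencies were installed into a local build directory (for example with pip install --target build amazon-braket-sdk). It reports the unzipped and zipped size of that directory and compares them with Lambda's documented .zip quotas: 50 MB zipped for direct upload and 250 MB unzipped, including layers. The helper names are hypothetical, and the quota figures should be checked against the current Lambda quotas page.

```python
import io
import os
import zipfile

# Lambda .zip deployment quotas as documented at the time of writing (verify before relying on them).
ZIP_UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024   # zipped package uploaded directly
UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024    # unzipped contents, including layers


def package_sizes(build_dir):
    """Return (unzipped_bytes, zipped_bytes) for every file under build_dir."""
    unzipped = 0
    buffer = io.BytesIO()  # zip in memory; acceptable for packages of a few hundred MB
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(build_dir):
            for name in files:
                path = os.path.join(root, name)
                unzipped += os.path.getsize(path)
                archive.write(path, arcname=os.path.relpath(path, build_dir))
    # ZipFile does not close a buffer it was handed, so it is still readable here.
    return unzipped, len(buffer.getvalue())


def check_lambda_fit(build_dir):
    """Return a list of human-readable problems; an empty list means the package fits."""
    unzipped, zipped = package_sizes(build_dir)
    problems = []
    if zipped > ZIP_UPLOAD_LIMIT_BYTES:
        problems.append(f"zipped size {zipped / 2**20:.1f} MiB exceeds the direct-upload limit")
    if unzipped > UNZIPPED_LIMIT_BYTES:
        problems.append(f"unzipped size {unzipped / 2**20:.1f} MiB exceeds the unzipped limit")
    return problems
```

Run this against the build directory before writing any application code. If check_lambda_fit returns problems, the options listed above (containers, a different runtime, a thinner client) become an explicit day-one decision instead of a reason to reach for a mock.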

The four failure modes

1. AI confabulation, unverified. The AI assistant of mid-2025 would assert, confidently and repeatedly, that things were real and working when they were not. That's a known failure mode of that generation of tools. The deeper failure was ours: we didn't yet have the habit — or, at that point, the skill — to demand evidence before believing "it's deployed and working." The model's confidence and the builder's trust compounded.

2. Top-down over-building. We wrote ~12,000 lines before line 1 was proven against reality. Every one of those modules felt like progress. None of it was, because none of it ran.

3. We never tested against reality early. We had Braket access on day one. One real call — a single Bell state on a real device — would have surfaced the Lambda/SDK wall immediately, while the project was small enough to re-plan. We didn't make that call until it was a post-mortem exercise.

4. Hidden cloud-cost sprawl. See below. This one cost actual money.
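The single Bell-state call in failure mode 3 comes with a built-in check. Ideally, applying H to qubit 0 and then CNOT(0, 1) yields only 00 and 11, each with probability 1/2. Real hardware adds some noise. The sketch below computes that ideal distribution in plain Python, with no Braket dependency, and applies a heuristic test to measured counts. The function names and both tolerances are assumptions, not calibrated values for any particular device.

```python
import math


def bell_state_probabilities():
    """Ideal outcome probabilities for H on qubit 0 followed by CNOT(0, 1).

    Amplitudes are indexed |00>, |01>, |10>, |11>, with qubit 0 as the left bit.
    """
    state = [1.0, 0.0, 0.0, 0.0]  # start in |00>
    s = 1 / math.sqrt(2)
    a00, a01, a10, a11 = state
    # Hadamard on qubit 0 mixes |0x> with |1x>.
    state = [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]
    # CNOT (control 0, target 1) flips qubit 1 when qubit 0 is 1: swap |10> and |11>.
    state[2], state[3] = state[3], state[2]
    # Amplitudes stay real here, so probability = amplitude squared.
    return {f"{i:02b}": amp ** 2 for i, amp in enumerate(state)}


def looks_like_bell(counts, max_uncorrelated=0.1, max_imbalance=0.15):
    """Heuristic: are measured counts (bitstring -> shots) plausibly a noisy Bell state?"""
    total = sum(counts.values())
    if total == 0:
        return False
    correlated = counts.get("00", 0) + counts.get("11", 0)
    if correlated == 0 or (total - correlated) / total > max_uncorrelated:
        return False  # too many '01'/'10' outcomes for an entangled pair
    # The two correlated outcomes should be roughly 50/50.
    share_00 = counts.get("00", 0) / correlated
    return abs(share_00 - 0.5) <= max_imbalance
```

For instance, the made-up counts {"00": 480, "11": 470, "01": 30, "10": 20} pass, while {"00": 1000} fails the balance check. A hardcoded list of single bits like [0, 1, 0, 1, 1, 0] can't even be put into this shape, which is a warning sign in itself.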

The part that stings most: $50.77/month for nothing

During decommission we assumed the infrastructure cost ~$6.73/month (Lambda + API Gateway). Then we found a Fargate service that had been running since 24 July: a 0.5 vCPU (512 CPU units) / 1 GB container behind an Application Load Balancer, up 24/7 for 31 days, serving the mock API. Real cost: $50.77/month, roughly 7.5× our estimate and about $609/year, to return [0,1,0,1,1,0] to nobody.

We'd forgotten we launched it. AWS resource sprawl is real: services persist across regions and accounts long after the attention that created them has moved on. If you take one operational lesson from this, take this one: enumerate all services, not just the ones you remember.
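For the Fargate case specifically, enumeration means asking every region, not just the default one. The sketch below lists ECS services, which is where Fargate tasks are managed, across a list of regions. It uses the ECS ListClusters and ListServices calls and follows nextToken pagination. The client factory is injected so the logic can be tested without AWS. The function names are hypothetical, and the sketch covers ECS only: load balancers and other billable resources need their own checks. It also doesn't handle regions that aren't enabled for the account, which raise errors with a real client.

```python
from typing import Callable, Iterator, List, Tuple


def _paged(call: Callable[..., dict], key: str, **kwargs) -> Iterator[str]:
    """Follow ECS-style nextToken pagination and yield every item under `key`."""
    token = None
    while True:
        # boto3 rejects nextToken=None, so only pass it once we have one.
        response = call(nextToken=token, **kwargs) if token else call(**kwargs)
        yield from response.get(key, [])
        token = response.get("nextToken")
        if not token:
            return


def find_ecs_services(regions, make_client) -> List[Tuple[str, str, str]]:
    """Return (region, cluster ARN, service ARN) for every ECS service in `regions`."""
    found = []
    for region in regions:
        ecs = make_client(region)
        for cluster in _paged(ecs.list_clusters, "clusterArns"):
            for service in _paged(ecs.list_services, "serviceArns", cluster=cluster):
                found.append((region, cluster, service))
    return found


# Usage with boto3 (not run here):
#   session = boto3.session.Session()
#   find_ecs_services(session.get_available_regions("ecs"),
#                     lambda r: session.client("ecs", region_name=r))
```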

What was actually real (the salvage)

It would be dishonest to call it 100% vapor. When we finally restored and tested the code locally, about 40% of it worked:

  • nlp_parser.py — parsed quantum queries into parameters. Genuinely decent.
  • quantum_backend.py — an accurate AWS Braket device catalog.
  • experiment_generator.py — produced sensible experiment specifications.
  • A real archive of genuine Braket result files from the earliest, most honest phase of the work.

The tragedy isn't that nothing was real. It's that the real 40% got buried under a 60% façade and then thrown away with it, because the project had spent all its credibility pretending.

The honest reframe

The repo's own internal docs were brutal — "95% aspirational bullshit." That brutal honesty was healthy and we're keeping it. But the fuller truth is kinder and more useful:

This was an honest effort that exceeded what its builder could verify and what the AI of the moment could be trusted to deliver. The ambition was real. The Braket access was real. The wall was real. What was missing was a verification discipline strong enough to keep an over-eager assistant — and an over-trusting human — anchored to what actually ran.

That's not a character flaw. It's a skills-and-tooling gap, and gaps close. The fix was never to dream smaller. It was to make reality the first checkpoint, not the last.

What we do differently now

  • One real thing, end to end, before anything else. A working Bell state on real hardware beats a hundred planned features. Breadth comes after the spine works.
  • "Done" means observed. No feature is complete until its effect has been seen with our own eyes. Claims from an AI assistant are hypotheses, not status.
  • Name the wall. When you hit a real constraint, stop and say so. Routing around reality in silence is how a temporary mock becomes a permanent lie.
  • Account for every running resource. Cost reviews enumerate all services. Forgotten infrastructure is the default, not the exception.
  • Keep the honest core; cut the façade. The 40% that worked was worth more than the 60% that performed working.
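The "name the wall" rule can be enforced in code. A flag like BRAKET_AVAILABLE = False lets a service start and quietly serve canned output. A startup check can instead refuse to run at all. Below is a minimal sketch, assuming each dependency is checked by its top-level import name (braket, for amazon-braket-sdk). BackendUnavailable and require_modules are hypothetical names.

```python
import importlib.util


class BackendUnavailable(RuntimeError):
    """Raised at startup instead of silently falling back to canned results."""


def require_modules(*names: str) -> None:
    """Fail loudly if any top-level module cannot be found in this deployment."""
    # find_spec returns None for a missing top-level module without importing it.
    missing = [name for name in names if importlib.util.find_spec(name) is None]
    if missing:
        raise BackendUnavailable(
            f"missing {', '.join(missing)}; refusing to start rather than serve mock results"
        )
```

Calling require_modules("braket") at startup would have turned the missing SDK into a failed deploy on day one. That is a much louder signal than a health check that stays green.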

What came next

These lessons weren't left as theory. The honest successor to Q Studio is qml-hardware-survey, and it is deliberately everything Q Studio wasn't:

  • one small hybrid model (PennyLane + PyTorch), held fixed across every backend;
  • a classical baseline on every single run — each quantum result is reported next to a parameter-matched classical one, so the comparison table can't lie;
  • hard cost gates — every real-hardware run demands an explicit --max-cost-usd and an interactive confirmation (no more hidden Fargate quietly billing for a month);
  • no "AI-powered" anything — device selection is a ~40-line rubric, not a fantasy;
  • ~600 lines, not 12,789.

It makes no grand claims — it says up front that quantum ML "almost certainly isn't" better than classical at this scale, and then measures exactly that. The honesty is the point. Q Studio is the failure; qml-hardware-survey is what the failure taught us, written down as code.
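A cost gate of the kind described for qml-hardware-survey can be small. The sketch below is not that project's code. Only the --max-cost-usd flag name comes from the description above. The pricing model (an optional per-task fee plus a per-shot fee) is an assumption, and prices are passed in as parameters rather than hardcoded, so check current Amazon Braket pricing before trusting any estimate. The confirmation prompt is injectable so the gate can be tested without a terminal.

```python
import argparse


def estimate_cost_usd(shots: int, per_shot_usd: float, per_task_usd: float = 0.0) -> float:
    """Estimated charge for one hardware task under an assumed per-task + per-shot model."""
    return per_task_usd + shots * per_shot_usd


def cost_gate(estimate: float, max_cost_usd: float, confirm=input) -> bool:
    """Refuse runs over the cap; otherwise require the operator to type 'yes'."""
    if estimate > max_cost_usd:
        return False  # over budget: never even ask
    answer = confirm(f"Estimated cost ${estimate:.2f} (cap ${max_cost_usd:.2f}). Type 'yes' to run: ")
    return answer.strip().lower() == "yes"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run on real hardware (sketch).")
    parser.add_argument("--shots", type=int, default=100)
    # No default on purpose: every real-hardware run must state its budget explicitly.
    parser.add_argument("--max-cost-usd", type=float, required=True)
    return parser.parse_args(argv)
```

Because --max-cost-usd has no default, forgetting it makes argparse exit with an error before any task is submitted.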

Why we're publishing this

Dark Forest Labs is a small lab that builds with AI agents, and we treat our failures as artifacts worth preserving — for ourselves and for anyone else building the same way. The whole point of writing this down is that the next build remembers it.

Q Studio failed. The write-up is the part that succeeds.


Q Studio's code remains archived privately as the raw record. This post-mortem is the public distillation. No live credentials or account internals are included; the AWS account ID and the expired temporary keys that lived in the original repository have been deliberately omitted.

About

A post-mortem of Q Studio — Dark Forest Labs' quantum platform that wasn't, and what we learned about building with AI.
