FaultWorx — hands-on troubleshooting interviews

The problem

You've hired this person before.

They crushed the system-design interview. They named all four levels of cache. Then a pod went into CrashLoopBackOff in week two — and they opened a ticket instead of opening the logs.

Resumes lie. Years of experience ≠ ability to debug under pressure.

Whiteboards reward memorization, not the work the job actually involves.

LeetCode measures whether someone studied LeetCode.

Take-homes get outsourced — to a friend, or to an LLM.

None of it answers the one thing you need to know: when something breaks at 2am, can this person fix it?

How it works

Break something. Watch them fix it.

A real environment, a real fault, a real terminal, a running clock. Exactly like the job.

Pick a scenario

A web app throwing 502s. A scheduled task failing silently. An archive job that can't auth to SQL. Choose from the open scenario library or write your own. Each is a real environment with a deliberately injected fault.

kubernetes · windows · database · disk · dns

They see it working — then you break it

The candidate sees the system healthy first, so they know what "right" looks like. Then the fault is injected. A trouble ticket lands — exactly as vague as the ones your team actually gets. The clock starts.

They troubleshoot in a real terminal

Live SSH, RDP, and kubectl through the browser, no setup. Every session is recorded, so you see not just whether they fixed it, but how they got there. Did they check the logs first, or guess? Did they trace the chain, or thrash?

They submit when they think it's fixed

No "verify" button. Just like production — they decide when it's done, and you find out if they were right. Configs and manifests are snapshotted at submit for review.

Why it's different

Not a quiz. The actual work.

⬡

Real environments

Actual EC2, actual Kubernetes, actual databases. They're not clicking through a quiz — they're in a live box that's genuinely broken.

◎

Tests method, not answers

The recording shows their reasoning. Two people fix the same full disk: one ran df, then du, to find what was eating the space; the other blindly deleted logs. You'll see which is which.
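To make the "measure first" method concrete, here is a minimal Python sketch of the same idea: check overall usage, then rank subdirectories by size before deleting anything. It is an illustration only, not part of FaultWorx; the helper names tree_size and largest_children are hypothetical, and the sizes are apparent file sizes, so they only approximate what du reports.

```python
import os
import shutil


def tree_size(path):
    """Sum file sizes in bytes under path (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda err: None):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_children(root, top=3):
    """Return the `top` largest immediate subdirectories of root as (path, bytes)."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.path, tree_size(entry.path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


def percent_used(mount):
    """Overall usage of the filesystem holding `mount`, like the df step."""
    usage = shutil.disk_usage(mount)
    return 100 * usage.used / usage.total
```

Calling percent_used on a mount point and then largest_children on a suspect directory mirrors the df-then-du order: confirm the disk is actually full, then find out where the space went.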

⎇

Scenarios are Git folders

setup/ break/ validate/ — version them, PR them, fork someone else's. Point at any GitHub repo for community scenarios.

⌃

Escalating, multi-skill

Compose scenarios into role-based exams — L1 SRE, L3 SRE, Cloud Engineer. Pods, then DB connectivity, then a disk, then a cross-system incident.

⚿

You own your data

Self-host the whole platform. Your infra, your candidates, your scenarios, your control. No phone-home, no per-seat tax.

⏿

Built by on-call engineers

Not an HR product with a technical coat of paint. It's the test we wish we'd been able to give — made by people who've carried the pager.

Open source

Read the code. Run it yourself.

The entire platform — backend, frontend, example scenarios — is open and self-hostable. Clone it, point it at your own AWS account, run it behind your own firewall. No license keys, no phone-home, no per-seat tax. If you'd rather own the whole stack, you can.

★ Star on GitHub · Read the docs · Self-hosting guide

Built on the tools you already run: Terraform, Kubernetes, Guacamole.
Scenarios are portable folders. The schema is documented and versioned.

redis-crashloop-502/
├── scenario.yml    # the manifest
├── ticket.md       # what they see
├── diagram.svg
├── setup/          # build known-good
├── break/          # inject the fault
├── validate/       # defines "fixed"
└── solution.md     # for review
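Because a scenario is just a folder, you can sanity-check one before you open a PR. The Python sketch below looks for the entries shown in the example layout. It is not the platform's own validator: the function name missing_entries is hypothetical, and treating every entry as required is an assumption, since the documented schema is the authority on which ones are.

```python
from pathlib import Path

# Entries taken from the example layout above.
# Assumption: all of them are required; the real schema may make some optional.
EXPECTED_FILES = ("scenario.yml", "ticket.md", "diagram.svg", "solution.md")
EXPECTED_DIRS = ("setup", "break", "validate")


def missing_entries(scenario_dir):
    """Return the expected files and folders that are absent from scenario_dir."""
    root = Path(scenario_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means the folder matches the example layout. Anything returned is a file or folder to add before sharing the scenario.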

Hosted

Or skip the ops. Let us run it.

Self-hosting means your AWS bill, your provisioning reliability, your teardown logic, your 2am page when an environment won't spin up.

✓ Isolated environment per candidate. Security-group enforced — no candidate can reach another's box.
✓ One stack per quiz, auto-torn-down. No orphaned resources, no surprise bill.
✓ Recording, scoring, result capture — handled for you.
✓ Flat price per quiz. Finish in 30 minutes or take the full two hours — same cost. Predictable, per-candidate.

Pay as you hire

$20 / candidate

no subscription · flat per quiz

Start a quiz →

Self-hosting is free, forever.

Why we built this

I'm an SRE. I've sat through interviews where the candidate had every certification and couldn't tail a log.

I've also watched people with thin resumes calmly walk a broken cluster back to health. The difference never showed up on paper — only when something was actually on fire.

So we built the fire. Safely, in a box, on a timer. This is the interview I always wanted to give.

— the FaultWorx team

Stop interviewing people who can't read a log.