Loupe Internal MVP

Coding-agent failures grouped for triage

Failed runs: 24 (last 24h)
Open issues: 9 (3 regressed)
Replay gate: Fail (prompt-v13)
Avg cost: $0.34 per run

Issue groups

Repeated failures grouped by normalized test, evaluator, and interceptor signals.

Issue | Category | Runs | Status | Last seen
Refresh token test fails after auth refactor | test_regression | 12 | unresolved | 14m ago
Grader marks caught sessions as missed | wrong_evaluation | 7 | regressed | 38m ago
Interceptor skips defect injection on vague prompts | interceptor_failure | 5 | unresolved | 1h ago
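Grouping of this kind can be sketched as keying each failure on its category plus a normalized signal string, so that failures differing only in line numbers, ids, or spacing land in the same issue group. This is a minimal Python sketch under stated assumptions, not Loupe's implementation: the `Failure` record, the `normalize` rules (lowercasing, masking hex ids and digits, collapsing whitespace), and the run ids `run_x1` to `run_x3` are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    run_id: str
    category: str  # e.g. "test_regression", "wrong_evaluation", "interceptor_failure"
    signal: str    # raw failing test output, evaluator verdict, or interceptor message


def normalize(signal: str) -> str:
    """Hypothetical normalization so near-identical failures share one key."""
    s = signal.lower()
    s = re.sub(r"0x[0-9a-f]+", "<hex>", s)  # mask addresses and hex ids
    s = re.sub(r"\d+", "<n>", s)            # mask line numbers, counts, durations
    return re.sub(r"\s+", " ", s).strip()   # collapse runs of whitespace


def group_failures(failures: list[Failure]) -> dict[tuple[str, str], list[str]]:
    """Group run ids by (category, normalized signal)."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for f in failures:
        groups[(f.category, normalize(f.signal))].append(f.run_id)
    return dict(groups)


groups = group_failures([
    Failure("run_x1", "test_regression", "test_refresh_token FAILED at line 42"),
    Failure("run_x2", "test_regression", "test_refresh_token FAILED at line  57"),
    Failure("run_x3", "interceptor_failure", "defect injection skipped: prompt too vague"),
])
for key, run_ids in groups.items():
    print(key, len(run_ids))
```

The two refresh-token failures collapse into one group because only the line number and spacing differ; a real system would likely also normalize file paths and stack traces.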

Recent runs

Each run connects instruction, terminal, diff, tests, and grader output.

run_8f21 | network-nplusone | gpt-5.4-mini | failed | $0.41
run_8f20 | auth-refresh | gpt-5.4-mini | failed | $0.36
run_8f19 | sql-dept-avg | gpt-5.4-mini | passed | $0.28
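A run record that ties these artifacts together might look like the following minimal sketch. The field names, the `CheckResult` type, the placeholder artifact strings, and the pass rule (a run passes only when at least one test ran and every test passed) are assumptions for illustration, not Loupe's schema; only the run id, task, model, status, and cost come from the run list.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    name: str
    passed: bool


@dataclass
class RunRecord:
    """Hypothetical schema linking every artifact of one agent run."""
    run_id: str
    task: str
    model: str
    instruction: str   # prompt given to the coding agent
    terminal_log: str  # captured terminal session
    diff: str          # patch the agent produced
    tests: list[CheckResult] = field(default_factory=list)
    grader_output: str = ""
    cost_usd: float = 0.0

    @property
    def status(self) -> str:
        # Assumed rule: passed only if at least one test ran and all of them passed.
        if self.tests and all(t.passed for t in self.tests):
            return "passed"
        return "failed"


run = RunRecord(
    run_id="run_8f19",
    task="sql-dept-avg",
    model="gpt-5.4-mini",
    instruction="<instruction text>",
    terminal_log="<terminal log>",
    diff="<unified diff>",
    tests=[CheckResult("test_department_average", True)],  # hypothetical test name
    cost_usd=0.28,
)
print(run.run_id, run.status, f"${run.cost_usd:.2f}")
```

Keeping all artifacts on one record means a triage view can jump from a failing test straight to the diff and terminal output that produced it.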

Replay gate

The candidate grader prompt is blocked by the grading-agreement and false-negative thresholds.

Dataset: loupe-grading-regression-50
Baseline: grader-prompt-v12
Candidate: grader-prompt-v13
Agreement delta: -6pp
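A replay gate of this kind can be sketched as comparing the candidate grader's metrics with the baseline's on the same dataset and blocking when either limit is breached. In this minimal sketch the `GraderEval` fields, the `replay_gate` function, the 2pp maximum agreement drop, and the 10% false-negative ceiling are all hypothetical; the agreement values 0.90 and 0.84 are made up solely to reproduce the -6pp delta shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraderEval:
    agreement: float            # fraction of sessions where the grader matches the reference label
    false_negative_rate: float  # fraction of caught sessions the grader wrongly marks as missed


def replay_gate(baseline: GraderEval, candidate: GraderEval,
                max_agreement_drop_pp: float = 2.0,
                max_false_negative_rate: float = 0.10) -> tuple[bool, list[str]]:
    """Return (passed, reasons). Both thresholds are hypothetical defaults."""
    reasons: list[str] = []
    # Agreement delta in percentage points: candidate minus baseline.
    delta_pp = (candidate.agreement - baseline.agreement) * 100
    if delta_pp < -max_agreement_drop_pp:
        reasons.append(f"agreement delta {delta_pp:+.0f}pp below -{max_agreement_drop_pp:g}pp limit")
    if candidate.false_negative_rate > max_false_negative_rate:
        reasons.append(
            f"false-negative rate {candidate.false_negative_rate:.0%} above "
            f"{max_false_negative_rate:.0%} limit"
        )
    return (not reasons, reasons)


# Made-up metrics that reproduce the -6pp agreement delta.
passed, reasons = replay_gate(
    GraderEval(agreement=0.90, false_negative_rate=0.05),  # baseline
    GraderEval(agreement=0.84, false_negative_rate=0.12),  # candidate
)
print(passed, reasons)
```

Returning the list of reasons, rather than a bare boolean, lets the dashboard explain which threshold blocked the candidate.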