Loupe Internal MVP
Coding-agent failures grouped for triage
Issue groups
Repeated failures grouped by normalized test, evaluator, and interceptor signals.
| Issue | Category | Runs | Status | Last seen |
|---|---|---|---|---|
| Refresh token test fails after auth refactor | test_regression | 12 | unresolved | 14m ago |
| Grader marks caught sessions as missed | wrong_evaluation | 7 | regressed | 38m ago |
| Interceptor skips defect injection on vague prompts | interceptor_failure | 5 | unresolved | 1h ago |
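The following is a minimal sketch of how failures might be grouped by normalized signals. It assumes each failure carries three string signals: a test signal, an evaluator verdict, and an interceptor signal. The `Failure` record, its field names, and the normalization rule are hypothetical and are not Loupe's actual schema. The rule lowercases each signal, masks hex ids and numbers, and collapses whitespace.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical failure record; field names are illustrative, not Loupe's schema.
@dataclass(frozen=True)
class Failure:
    run_id: str
    test: str         # failing test name or assertion message
    evaluator: str    # grader verdict label
    interceptor: str  # interceptor signal (e.g. whether a defect was injected)
    seen_at: datetime

# Volatile tokens (hex ids, then any digit run) that differ between otherwise identical failures.
_VOLATILE = re.compile(r"0x[0-9a-f]+|\d+")

def normalize(signal: str) -> str:
    """Lowercase, mask volatile tokens as <n>, and collapse whitespace."""
    masked = _VOLATILE.sub("<n>", signal.lower())
    return " ".join(masked.split())

def group_failures(failures):
    """Group failures by their normalized (test, evaluator, interceptor) key."""
    groups = defaultdict(list)
    for f in failures:
        key = (normalize(f.test), normalize(f.evaluator), normalize(f.interceptor))
        groups[key].append(f)
    # Summarize each group like the issue table: distinct runs and last-seen time.
    return {
        key: {
            "runs": len({f.run_id for f in items}),
            "last_seen": max(f.seen_at for f in items),
        }
        for key, items in groups.items()
    }
```

Masking volatile tokens lets two failures that differ only by line numbers or ids land in the same group. A real normalizer would likely need more rules than this single regex.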
Recent runs
Each run connects instruction, terminal, diff, tests, and grader output.
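| Run | Task | Model | Result | Cost |
|---|---|---|---|---|
| run_8f21 | network-nplusone | gpt-5.4-mini | failed | $0.41 |
| run_8f20 | auth-refresh | gpt-5.4-mini | failed | $0.36 |
| run_8f19 | sql-dept-avg | gpt-5.4-mini | passed | $0.28 |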
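A run could be modeled as one record holding the five artifacts listed above. This sketch makes several assumptions, all of them hypothetical:

- the field names
- a task identifier
- a per-run cost in USD
- a pass/fail status derived from the test outcome alone

```python
from dataclasses import dataclass

# Hypothetical run record; field names and status rule are assumptions.
@dataclass(frozen=True)
class RunRecord:
    run_id: str
    task: str
    model: str
    instruction: str     # prompt given to the coding agent
    terminal_log: str    # captured terminal transcript
    diff: str            # unified diff produced by the agent
    tests_passed: bool   # outcome of the task's test suite
    grader_output: str   # raw grader verdict and rationale
    cost_usd: float

    @property
    def status(self) -> str:
        # Assumption: status reflects only the test outcome, not the grader.
        return "passed" if self.tests_passed else "failed"

    def summary_row(self) -> str:
        """Render the record as one row of the runs table."""
        return f"{self.run_id} | {self.task} | {self.model} | {self.status} | ${self.cost_usd:.2f}"
```

Keeping all artifacts on one record means the terminal, diff, and grader output can be opened directly from a row in the table.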
Replay gate
Candidate prompt is blocked by grading agreement and false-negative thresholds.
- Dataset: loupe-grading-regression-50
- Baseline: grader-prompt-v12
- Candidate: grader-prompt-v13
- Agreement delta: -6 pp
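The following is a minimal sketch of a replay gate. It assumes that each replayed session in the dataset has a reference label, which is True when the coding agent caught the injected defect. It also assumes each session has a verdict from each grader prompt. The following are all assumptions for illustration, not Loupe's actual gate:

- the function name
- the thresholds (`min_delta_pp`, `max_fn_rate`)
- the definition of a false negative as the grader marking a caught session as missed

```python
from dataclasses import dataclass, field

@dataclass
class GateResult:
    agreement_delta_pp: float   # candidate minus baseline agreement, in percentage points
    false_negative_rate: float  # caught sessions the candidate marks as missed / all caught sessions
    blocked: bool
    reasons: list = field(default_factory=list)

def replay_gate(labels, baseline, candidate, min_delta_pp=0.0, max_fn_rate=0.10):
    """labels, baseline, candidate: one bool per session, True = defect caught.

    Thresholds are hypothetical defaults, not values taken from Loupe.
    """
    if not labels or not (len(labels) == len(baseline) == len(candidate)):
        raise ValueError("need equal-length, non-empty verdict lists")
    n = len(labels)
    base_hits = sum(b == y for b, y in zip(baseline, labels))
    cand_hits = sum(c == y for c, y in zip(candidate, labels))
    # Computed from counts so the delta is exact for whole-session differences.
    delta_pp = 100 * (cand_hits - base_hits) / n

    positives = sum(labels)
    misses = sum(y and not c for c, y in zip(candidate, labels))
    fn_rate = misses / positives if positives else 0.0

    reasons = []
    if delta_pp < min_delta_pp:
        reasons.append(f"agreement delta {delta_pp:+.0f}pp < {min_delta_pp:+.0f}pp")
    if fn_rate > max_fn_rate:
        reasons.append(f"false-negative rate {fn_rate:.0%} > {max_fn_rate:.0%}")
    return GateResult(delta_pp, fn_rate, bool(reasons), reasons)
```

For example, on a 50-session dataset, suppose the candidate agrees with the labels on 3 fewer sessions than the baseline. That gives 3/50 = -6 pp, which this sketch would block at a 0 pp threshold.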