Sequant Workflow Analysis — March 2026
Generated: 2026-03-25
Data range: 2026-01-09 to 2026-03-25
Runs analyzed: 82 runs, 167 issues
Baselines
| Metric | Value |
|---|---|
| Overall success rate | 68.3% |
| Average run duration | 27m 52s |
| Single-issue success rate | 71.9% |
| Chain-mode success rate | 50.0% |
Phase Durations (avg)
| Phase | Avg Duration | Count |
|---|---|---|
| spec | 3m 56s | 139 |
| exec | 6m 11s | 171 |
| qa | 4m 16s | 134 |
| testgen | 4m 33s | 14 |
| test | 2m 26s | 2 |
Top 5 Optimization Targets (AC-6)
Ranked by estimated impact (frequency × severity):
1. Tooling Failures (60.6% of all failures)
Impact: HIGH. 43 of 71 failed phases are generic “process exited with code 1” errors. These are opaque and cannot be diagnosed from log data alone.
Root cause: Claude Code process crashes are logged as generic exit codes without structured error context. The actual failure reason (context overflow, API error, hook failure) is lost.
Recommended action: Improve error capture in the phase runner to extract structured error info before the process exits. Create follow-up issue.
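One possible shape for that structured error info is sketched below. It assumes the phase runner can see the exit code and the captured stderr of the crashed process. The names `PhaseFailure` and `classifyFailure` and the regex patterns are hypothetical, not taken from the Sequant source; the patterns in particular would need to be replaced with strings observed in real logs.

```typescript
// Hypothetical failure categories, based on the causes named above.
type FailureKind = "context_overflow" | "api_error" | "hook_failure" | "unknown";

interface PhaseFailure {
  phase: string;
  exitCode: number;
  kind: FailureKind;
  stderrTail: string; // last lines of stderr, kept for later diagnosis
}

// Assumed patterns; replace with strings actually seen in failing runs.
const PATTERNS: Array<[FailureKind, RegExp]> = [
  ["context_overflow", /context (window|length)|too many tokens/i],
  ["api_error", /api error|ECONNRESET/i],
  ["hook_failure", /hook .*(failed|exited)/i],
];

// Builds a structured record from the exit code and captured stderr.
function classifyFailure(
  phase: string,
  exitCode: number,
  stderr: string,
  tailLines = 20
): PhaseFailure {
  const lines = stderr.trimEnd().split("\n");
  const stderrTail = lines.slice(-tailLines).join("\n");
  const match = PATTERNS.find(([, re]) => re.test(stderrTail));
  return { phase, exitCode, kind: match ? match[0] : "unknown", stderrTail };
}
```

Writing the resulting record into the run log alongside the exit code would let a later analysis split the 43 generic failures by cause instead of lumping them together.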
2. Low First-Pass QA Rate (38.2%)
Impact: HIGH. Only 38.2% of issues pass QA on the first attempt. Each additional QA iteration costs ~4m 16s plus the fix cycle time. 18 issues required multiple QA attempts, 6 of which required 3 attempts.
Root cause candidates:
- 44% of QA phases have no verdict recorded (early logs predate verdict tracking)
- Excluding no-verdict phases, the first-pass rate among issues with recorded verdicts is higher than 38.2%
- Common first-verdict failures: AC_NOT_MET (8 issues), AC_MET_BUT_NOT_A_PLUS (5 issues)
Recommended action: Investigate whether exec quality can be improved to reduce QA iterations. The spec-to-exec handoff may be losing acceptance criteria precision. Create follow-up issue.
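The QA time lost to repeat attempts can be bounded from below using the figures above. The sketch assumes that the 12 multi-attempt issues not counted as 3-attempt issues took exactly 2 attempts and that no issue took more than 3. It counts QA time only, since the fix cycle time is not reported. The function names are illustrative.

```typescript
// Average QA phase duration from the Phase Durations table: 4m 16s.
const QA_AVG_SECONDS = 4 * 60 + 16;

// [attempts, issues]. Assumption: 18 multi-attempt issues = 12 with 2 attempts + 6 with 3.
const issuesByAttempts: Array<[number, number]> = [
  [2, 12],
  [3, 6],
];

// Every attempt beyond the first counts as one extra QA run.
function extraQaSeconds(rows: Array<[number, number]>, avgSeconds: number): number {
  return rows.reduce(
    (sum, [attempts, issues]) => sum + (attempts - 1) * issues * avgSeconds,
    0
  );
}

function formatDuration(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds}s`;
}

// 24 extra QA runs at 256s each.
console.log(formatDuration(extraQaSeconds(issuesByAttempts, QA_AVG_SECONDS))); // 102m 24s
```

Under these assumptions the repeat QA runs alone account for roughly 1h 42m across the data range, before any fix cycle time is added.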
3. Documentation Issues: 42% Success Rate
Impact: MEDIUM. Documentation-labeled issues have the lowest success rate of any significant label category (42%, versus 77% for enhancement and 100% for bug).
Root cause hypothesis: Documentation issues may require different workflow phases or have ambiguous ACs. The standard spec-exec-qa pipeline may be a poor fit for doc-only work.
Recommended action: Review failed documentation runs to identify common patterns. Consider a lighter workflow for docs-only issues (skip spec, simpler QA).
4. Chain-Mode Degradation (50% vs 72% single-issue)
Impact: MEDIUM. Chain mode (sequential issues branching from each other) succeeds in only 50% of runs, well below the 71.9% for single-issue runs. Multi-issue runs overall succeed at 67%, versus 72% for single-issue runs.
Root cause hypothesis: Chain mode accumulates technical debt across issues — a small failure in issue N breaks issue N+1. Worktree conflicts and merge issues compound.
Recommended action: Evaluate whether chain mode is worth the risk. Consider defaulting to parallel worktrees instead.
5. Spec Phase Failures (12 failures, 8.6% of spec runs)
Impact: MEDIUM. Spec failures prevent any work from starting. 12 failures out of 139 spec phase runs (8.6%) means roughly 1 in 12 issues cannot even begin.
Root cause: Likely GitHub API issues, context overflow on complex issues, or rate limits during issue fetch.
Recommended action: Add retry logic to spec phase for transient failures.
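A minimal sketch of such retry logic follows, using exponential backoff. It assumes the spec phase can be wrapped as an async function and that transient failures can be recognized from the error message. The helper names (`isTransient`, `backoffDelayMs`, `withRetry`), the message patterns, and the default limits are all hypothetical.

```typescript
// Hypothetical check: which messages count as transient is an assumption.
function isTransient(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return /rate limit|timeout|ECONNRESET|\b50[234]\b/i.test(msg);
}

// Exponential backoff: base, 2x base, 4x base, and so on, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// Retries fn on transient errors only; rethrows anything else immediately.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isTransient(err)) throw err;
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Restricting retries to transient errors matters here: a context overflow on a complex issue would fail the same way on every attempt, so retrying it only adds run time.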
Decision Framework (AC-7)
| Finding | Category | Action Type | Priority | Est. Effort |
|---|---|---|---|---|
| Opaque tooling failures (60.6%) | Error handling | Code change (phase runner) | P1 | Medium |
| Low first-pass QA rate (38.2%) | Workflow quality | Investigation + prompt change | P1 | Large |
| Doc issues 42% success | Workflow fit | Config change (lighter pipeline) | P2 | Small |
| Chain-mode 50% success | Feature design | Investigation + config | P2 | Medium |
| Spec failures 8.6% | Reliability | Code change (retry logic) | P3 | Small |
| No-verdict QA phases (44%) | Data quality | Code change (backfill) | P3 | Small |
| Rate limits (5.6% of failures) | Infrastructure | Config (throttling) | P3 | Small |
Action Type Key
- Code change: Modify TypeScript source (phase runner, stats, etc.)
- Prompt change: Update skill SKILL.md prompts (spec, exec, qa)
- Config change: Adjust default settings or workflow parameters
- Investigation: Deeper analysis needed before prescribing a fix
QA Verdict Distribution
| Verdict | Count | Percentage |
|---|---|---|
| no_verdict | 59 | 44.0% |
| READY_FOR_MERGE | 49 | 36.6% |
| AC_NOT_MET | 13 | 9.7% |
| AC_MET_BUT_NOT_A_PLUS | 11 | 8.2% |
| NEEDS_VERIFICATION | 2 | 1.5% |
Note: The high no_verdict rate (44%) is an artifact of early log versions before verdict tracking was implemented. Excluding no-verdict phases, the verdict distribution is: READY_FOR_MERGE 65.3%, AC_NOT_MET 17.3%, AC_MET_BUT_NOT_A_PLUS 14.7%, NEEDS_VERIFICATION 2.7%.
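The adjusted percentages in the note can be reproduced from the counts in the table. The sketch below is illustrative; `verdictShares` is not a function from the analysis script.

```typescript
// Counts copied from the QA Verdict Distribution table.
const verdictCounts: Record<string, number> = {
  no_verdict: 59,
  READY_FOR_MERGE: 49,
  AC_NOT_MET: 13,
  AC_MET_BUT_NOT_A_PLUS: 11,
  NEEDS_VERIFICATION: 2,
};

// Percentage share per verdict, leaving excluded verdicts out of the denominator.
function verdictShares(
  counts: Record<string, number>,
  exclude: string[] = []
): Record<string, number> {
  const kept = Object.keys(counts).filter((verdict) => exclude.indexOf(verdict) === -1);
  const total = kept.reduce((sum, verdict) => sum + counts[verdict], 0);
  const shares: Record<string, number> = {};
  for (const verdict of kept) {
    shares[verdict] = Math.round((counts[verdict] / total) * 1000) / 10; // one decimal
  }
  return shares;
}

// Denominator drops from 134 to 75 once no_verdict is excluded.
const withVerdict = verdictShares(verdictCounts, ["no_verdict"]);
console.log(withVerdict.READY_FOR_MERGE); // 65.3
```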
Temporal Trends
Success rate improved from 44% in week 1 to 79% in the most recent week, with dips in weeks that had more complex issues (Jan 26: 68%, Mar 16: 40%). Average run duration rose as issues became more complex, from 11m early on to 46m recently.
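Weekly figures like these can be produced by bucketing runs by the start of their week. The sketch below assumes Monday-based weeks, which matches the week labels above (Jan 26 and Mar 16, 2026 are both Mondays). The `RunOutcome` shape is hypothetical and may not match the actual run log schema.

```typescript
// Returns the date (YYYY-MM-DD) of the Monday starting the week that contains isoDate.
function weekStart(isoDate: string): string {
  const d = new Date(`${isoDate}T00:00:00Z`); // parse as UTC to avoid timezone drift
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay: 0 = Sunday
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10);
}

// Hypothetical minimal run record.
interface RunOutcome {
  date: string; // YYYY-MM-DD
  success: boolean;
}

// Groups runs by week and returns the success rate (0 to 1) per week start date.
function weeklySuccessRate(runs: RunOutcome[]): Map<string, number> {
  const buckets = new Map<string, { ok: number; total: number }>();
  for (const run of runs) {
    const key = weekStart(run.date);
    const bucket = buckets.get(key) ?? { ok: 0, total: 0 };
    bucket.total += 1;
    if (run.success) bucket.ok += 1;
    buckets.set(key, bucket);
  }
  const rates = new Map<string, number>();
  buckets.forEach((bucket, key) => rates.set(key, bucket.ok / bucket.total));
  return rates;
}
```

With this bucketing, the first run date in the data range (2026-01-09, a Friday) falls in the week starting 2026-01-05.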
Segmentation by Label
| Label | Issues | Success Rate | Avg Duration |
|---|---|---|---|
| bug | 17 | 100% | 10m 12s |
| planned | 17 | 82% | 8m 3s |
| cli | 17 | 82% | 17m 41s |
| enhancement | 84 | 77% | 15m 35s |
| auto-ready | 16 | 88% | 18m 37s |
| multi-agent | 13 | 69% | 22m 13s |
| documentation | 12 | 42% | 8m 44s |
Key insight: Bug fixes are 100% successful — well-scoped, clear ACs, predictable changes. Multi-agent and documentation issues are most challenging.
Repeatable Analysis Approach (AC-10)
To re-run this analysis after workflow changes:
```bash
# Full human-readable report
npx tsx scripts/analytics/analyze-runs.ts

# JSON output for programmatic processing
npx tsx scripts/analytics/analyze-runs.ts --json

# Custom log directory
npx tsx scripts/analytics/analyze-runs.ts --path /path/to/logs

# Enhanced stats with detailed metrics
npx sequant stats --detailed
```

The analysis script loads all .sequant/logs/run-*.json files and computes:
- Baselines — success rates, durations, chain vs single mode
- Temporal trends — weekly buckets showing progression
- QA analysis — verdict distribution and first-pass rate
- Failure forensics — categorization by failure mode
- Segmentation — per-label and per-issue-count breakdowns
Compare against this baseline report to measure the impact of optimizations.
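A comparison against this baseline could look like the sketch below. The baseline values are copied from this report. The `Metrics` field names are hypothetical, because the schema of the `--json` output is not shown here, so a real comparison would first need to map that output onto these fields.

```typescript
// Hypothetical metric names; values are percentages.
interface Metrics {
  overallSuccessRate: number;
  singleIssueSuccessRate: number;
  chainModeSuccessRate: number;
  firstPassQaRate: number;
}

// Baseline copied from this report (data range 2026-01-09 to 2026-03-25).
const BASELINE: Metrics = {
  overallSuccessRate: 68.3,
  singleIssueSuccessRate: 71.9,
  chainModeSuccessRate: 50.0,
  firstPassQaRate: 38.2,
};

// Change in percentage points per metric, rounded to one decimal (positive = improvement).
function deltaFromBaseline(current: Metrics, base: Metrics = BASELINE): Metrics {
  const diff = (a: number, b: number): number => Math.round((a - b) * 10) / 10;
  return {
    overallSuccessRate: diff(current.overallSuccessRate, base.overallSuccessRate),
    singleIssueSuccessRate: diff(current.singleIssueSuccessRate, base.singleIssueSuccessRate),
    chainModeSuccessRate: diff(current.chainModeSuccessRate, base.chainModeSuccessRate),
    firstPassQaRate: diff(current.firstPassQaRate, base.firstPassQaRate),
  };
}
```

Reporting changes in percentage points rather than relative percent keeps the deltas directly comparable with the tables in this report.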