Evaluation and tests | EDPA v1.0.0

314 / 314 checks PASSED

114 test scenarios across 8 categories. Per-person tests (invariants, dual-view, edge cases) run for each of the 6 team members → 314 total checks. All passing.

Person	Role	Capacity	Status
Alice	Arch	60h	✓
Bob	Dev	80h	✓
Carol	Dev	60h	✓
Dave	DevOps	40h	✓
Eve	PM	40h	✓
Frank	Dev	40h	✓

Category	Checks
Σ Mathematical Invariants	60
🔍 Evidence Detection	15
⚖ CW Heuristics	18
⇄ Dual-View Consistency	72
⚠ Edge Cases	108
🔄 Auto-calibration	12
🔒 Governance & Audit	17
📈 Capacity Planning	12

1. Mathematical Invariants — 10 scenarios × 6 persons = 60 checks

60/60 PASS

Fundamental mathematical guarantees of the EDPA engine — validated on every build. These tests verify that the model never breaks its key promises.

Each scenario runs per person: Alice, Bob, Carol, Dave, Eve, Frank — 60 checks total.

01 test_sum_equals_capacity PASS

A person's derived hours must sum to their declared capacity.

Expected: Σ(hours) = capacity ± 0.01h

02 test_ratio_sum_equals_one PASS

Ratios must sum to 1.0 for every person with items.

Expected: Σ(ratio) = 1.0 ± 0.001

03 test_no_negative_hours PASS

No person may have negative derived hours.

Expected: All hours ≥ 0

04 test_no_negative_scores PASS

No score may be negative.

Expected: All scores ≥ 0

05 test_score_formula_simple PASS

In simple mode, Score is Job Size (JS) multiplied by Contribution Weight (CW).

Expected: Score = JS × CW

06 test_score_formula_full PASS

In full mode, Score is JS × CW, additionally multiplied by the Relevance Signal (RS).

Expected: Score = JS × CW × RS

07 test_full_mode_invariants PASS

Full mode also guarantees that derived hours sum to capacity.

Expected: Full mode: Σ = capacity ± 0.01h

08 test_all_invariants_flag PASS

The invariant_ok flag must reflect the actual check results.

Expected: invariant_ok reflects actual checks

09 test_empty_items_no_crash PASS

A person with 0 items must get 0h, without crashing.

Expected: Person with 0 items → 0h, no crash

10 test_cw_ordering PASS

CW must preserve ordering: owner ≥ key ≥ reviewer ≥ consulted.

Expected: owner ≥ key ≥ reviewer ≥ consulted
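The invariants above can be illustrated with a minimal Python sketch of the per-person allocation, assuming each item carries a Job Size, a Contribution Weight and, for full mode, a Relevance Signal. `Item`, `derived_hours` and `check_invariants` are hypothetical names, not the EDPA engine's API; the allocation follows the Derived Hours formula DH = Score / ΣScore × Capacity used in the demo calculation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    """Hypothetical record of one person's involvement in one work item."""
    item_id: str
    js: float        # Job Size
    cw: float        # Contribution Weight
    rs: float = 1.0  # Relevance Signal, only used in full mode


def derived_hours(items: List[Item], capacity: float, mode: str = "simple") -> Dict[str, float]:
    """Split one person's capacity across items in proportion to their scores."""
    if any(it.js < 0 for it in items):
        raise ValueError("negative Job Size is rejected")
    relevant = [it for it in items if it.js > 0]  # JS = 0 is excluded, never divided by
    if mode == "simple":
        scores = {it.item_id: it.js * it.cw for it in relevant}
    elif mode == "full":
        scores = {it.item_id: it.js * it.cw * it.rs for it in relevant}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    total = sum(scores.values())
    if total == 0:
        return {}  # no relevant items: 0h and no crash
    return {item_id: s / total * capacity for item_id, s in scores.items()}


def check_invariants(hours: Dict[str, float], capacity: float) -> bool:
    """Tests 01-03: Σ hours = capacity ± 0.01h, Σ ratio = 1.0 ± 0.001, no negative hours."""
    if not hours:
        return True  # a person without items is exempt (test 09: 0h)
    total = sum(hours.values())
    ratio_sum = sum(h / total for h in hours.values()) if total else 1.0
    return (
        abs(total - capacity) <= 0.01
        and abs(ratio_sum - 1.0) <= 0.001
        and all(h >= 0 for h in hours.values())
    )


items = [Item("S-1", 5, 1.0, rs=0.8), Item("S-2", 3, 0.6), Item("S-3", 0, 1.0)]
for mode in ("simple", "full"):
    hours = derived_hours(items, capacity=60, mode=mode)
    print(mode, {k: round(v, 2) for k, v in hours.items()}, check_invariants(hours, 60))
```

Because hours are a normalized share of capacity, Σ hours = capacity holds by construction in both modes; the tolerance only absorbs floating-point error.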

2. Evidence Detection — 15 scenarios

15/15 PASS

Verifies that GitHub signals are detected correctly and mapped to Evidence Score and Contribution Weight. Each signal must be correctly identified and scored.

01 test_assignee_detection_cw PASS

Issue assignee must be detected as owner with CW = 1.0.

Expected: assignee signal → CW = 1.0 (owner)

02 test_pr_author_without_assignee PASS

PR author without assignee role must get CW = 0.6 (key contributor).

Expected: pr_author signal → CW = 0.6 (key)

03 test_commit_author_only PASS

Person with only a commit (no assignee/PR) gets CW = 0.25.

Expected: commit_author signal → CW = 0.25 (reviewer)

04 test_pr_reviewer_detection PASS

PR reviewer must be detected with CW = 0.25 (reviewer role).

Expected: pr_reviewer signal → CW = 0.25 (reviewer)

05 test_issue_comment_only PASS

Person with only an issue comment gets CW = 0.15 (consulted).

Expected: issue_comment signal → CW = 0.15 (consulted)

06 test_multiple_signals_highest_wins PASS

With multiple signals (assignee + commit) the strongest signal wins.

Expected: assignee + commit → CW = 1.0 (highest wins)

07 test_contribute_command_detection PASS

/contribute command in issue body must be detected with CW = 0.6.

Expected: /contribute @user → CW = 0.6 (key)

08 test_contribute_weight_override PASS

/contribute with explicit weight overrides automatically detected CW.

Expected: /contribute @user weight:0.8 → CW = 0.8
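A sketch of how the two `/contribute` forms shown in tests 07 and 08 could be parsed from an issue body. The regular expression, the assumption that the command starts its own line, and the GitHub-style username character set are illustrative guesses, not the documented syntax.

```python
import re
from typing import Dict

# Hypothetical pattern for the two forms in tests 07 and 08:
#   /contribute @user             -> key contributor, CW = 0.6
#   /contribute @user weight:0.8  -> explicit CW overrides auto-detection
CONTRIBUTE_RE = re.compile(
    r"^/contribute\s+@(?P<user>[A-Za-z0-9-]+)(?:\s+weight:(?P<weight>\d+(?:\.\d+)?))?\s*$",
    re.MULTILINE,
)
DEFAULT_CONTRIBUTE_CW = 0.6


def parse_contribute_commands(issue_body: str) -> Dict[str, float]:
    """Return {username: cw} for every line of the body that is a /contribute command."""
    result: Dict[str, float] = {}
    for match in CONTRIBUTE_RE.finditer(issue_body):
        weight = match.group("weight")
        result[match.group("user")] = float(weight) if weight else DEFAULT_CONTRIBUTE_CW
    return result


body = "OMOP parser\n/contribute @alice\n/contribute @bob weight:0.8\nsee /contribute @eve"
print(parse_contribute_commands(body))
```

The last line of the sample body is ignored because the command does not start the line.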

09 test_branch_naming_story_extraction PASS

Branch feature/S-200-omop-parser must extract reference S-200.

Expected: Branch regex: S-\d+ → S-200

10 test_branch_naming_feature_extraction PASS

Branch feature/F-15-auth-module must extract reference F-15.

Expected: Branch regex: F-\d+ → F-15

11 test_branch_naming_epic_extraction PASS

Branch epic/E-3-platform must extract reference E-3.

Expected: Branch regex: E-\d+ → E-3
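The branch references in tests 09 to 11 can be extracted with a single regular expression. This hypothetical helper assumes the item ID is an uppercase S, F or E prefix followed by a hyphen and digits, as in the examples; the real extraction rules may be stricter.

```python
import re
from typing import Optional

# Assumed ID format: uppercase S, F or E, a hyphen, then digits (as in S-200, F-15, E-3).
ITEM_REF_RE = re.compile(r"\b[SFE]-\d+\b")


def extract_item_ref(branch: str) -> Optional[str]:
    """Return the first work-item reference found in a branch name, or None."""
    match = ITEM_REF_RE.search(branch)
    return match.group(0) if match else None


for branch in ("feature/S-200-omop-parser", "feature/F-15-auth-module", "epic/E-3-platform"):
    print(branch, "->", extract_item_ref(branch))
```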

12 test_no_matching_signals_excluded PASS

Person with no signals on an item must not be assigned.

Expected: No signals → person excluded from item

13 test_evidence_score_threshold PASS

An Evidence Score below the threshold (default 1.0) excludes the person from the item.

Expected: ES < threshold → excluded

14 test_in_progress_items_excluded PASS

Items still in In-Progress status are excluded from the iteration calculation.

Expected: Status: In-Progress → excluded from calculation

15 test_commit_count_no_time_effect PASS

Commit count does not affect time allocation, only relevance: for CW, 1 commit counts the same as 10 commits.

Expected: commit_count independent of time allocation

3. CW Heuristics — 18 scenarios

18/18 PASS

Verifies the default heuristic weights and the rules for determining Contribution Weight. The heuristic must be consistent and reproducible.

01 test_default_role_weight_owner PASS

Default weight for owner role must be 1.0.

Expected: role_weights.owner = 1.0

02 test_default_role_weight_key PASS

Default weight for key role must be 0.6.

Expected: role_weights.key = 0.6

03 test_default_role_weight_reviewer PASS

Default weight for reviewer role must be 0.25.

Expected: role_weights.reviewer = 0.25

04 test_default_role_weight_consulted PASS

Default weight for consulted role must be 0.15.

Expected: role_weights.consulted = 0.15

05 test_signal_weights_assignee PASS

Assignee signal must have Evidence Score +4.0.

Expected: signals.assignee = 4.0

06 test_signal_weights_contribute PASS

The `contribute_command` signal must have Evidence Score +3.0.

Expected: signals.contribute_command = 3.0

07 test_signal_weights_pr_author PASS

PR author signal must have Evidence Score +2.0.

Expected: signals.pr_author = 2.0

08 test_signal_weights_commit PASS

Commit author signal must have Evidence Score +1.0.

Expected: signals.commit_author = 1.0

09 test_highest_signal_determines_cw PASS

CW is determined by the strongest signal, not the sum — no signal summing for CW.

Expected: CW = role_weights[highest_signal], no summing

10 test_manual_override_precedence PASS

Manual /contribute override must take precedence over auto-detection.

Expected: manual_cw ≠ null → use manual_cw

11 test_cw_strict_ordering PASS

CW ordering must be strict: owner > key > reviewer > consulted.

Expected: 1.0 > 0.6 > 0.25 > 0.15 (strict)

12 test_cw_minimum_floor PASS

Minimum CW is 0.15 (consulted floor) — no person may have lower CW.

Expected: CW ≥ 0.15 (consulted floor)

13 test_cw_maximum_ceiling PASS

Maximum CW is 1.0 (owner ceiling) — no automatic weight exceeds 1.0.

Expected: CW ≤ 1.0 (owner ceiling)

14 test_rs_normalization PASS

Relevance Signal is normalized: RS = min(ES/maxES, 1.0).

Expected: RS = min(ES / max_ES, 1.0)

15 test_rs_range_validation PASS

RS must be in range 0 to 1.0 — never negative, never greater than 1.

Expected: 0 ≤ RS ≤ 1.0

16 test_multiple_people_same_item PASS

Multiple people on the same item must have independent CW for each person.

Expected: CW[P1, item] independent of CW[P2, item]

17 test_same_person_multiple_items PASS

Same person on multiple items must have independent CW for each item.

Expected: CW[P, item1] independent of CW[P, item2]

18 test_architecture_role_detection PASS

Architect and PM roles are detected via comments plus the /contribute command.

Expected: comment + /contribute → key/consulted role

4. Dual-View Consistency — 12 scenarios × 6 persons = 72 checks

72/72 PASS

EDPA provides two views — per-person and per-item. Both must be mutually consistent and sums must match in both directions.

Each scenario runs per person: Alice, Bob, Carol, Dave, Eve, Frank — 72 checks total.

01 test_per_person_sum_equals_capacity PASS

Per-person view: sum of derived hours = capacity for every person.

Expected: Σ DerivedHours[P, *] = Capacity[P]

02 test_per_item_shares_sum_100 PASS

Per-item view: sum of shares of all contributors = 100% for every item.

Expected: Σ shares[*, item] = 100%

03 test_same_cw_same_results_both_views PASS

Same CW must produce same results in both views.

Expected: per-person hours consistent with per-item shares

04 test_mode_switch_preserves_guarantee PASS

Switching mode simple → full preserves the Σ = Capacity guarantee.

Expected: simple → full: Σ = Capacity still holds

05 test_per_person_hours_sum_cross_items PASS

Per-person: hours on item X + hours on all other items = total capacity.

Expected: hours[P, X] + hours[P, rest] = capacity[P]

06 test_zero_contribution_excluded_both_views PASS

Items with zero contribution do not appear in either view.

Expected: zero contribution → absent in both views

07 test_single_contributor_full_share PASS

Single contributor on an item gets 100% share in per-item view.

Expected: single contributor → 100% share

08 test_two_equal_contributors_equal_split PASS

Two contributors with equal CW get 50/50 split in per-item view.

Expected: equal CW → 50/50 share split

09 test_capacity_no_affect_per_item_share PASS

Different capacities do not affect percentage share in per-item view.

Expected: capacity[P1] ≠ capacity[P2] → share% unchanged

10 test_cross_check_hours_vs_capacities PASS

Cross-check: sum of all per-item hours across all items ≤ sum of all capacities.

Expected: ΣΣ hours[P, item] ≤ Σ capacity[P]

11 test_three_contributors_weighted_split PASS

Three contributors with CW 1.0, 0.6, 0.25 — shares match weight ratio.

Expected: 1.0:0.6:0.25 → 54%:32%:14% share
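The per-item view in tests 07 to 11 is consistent with each contributor's share being CW / ΣCW on that item, independent of capacity. A minimal sketch under that assumption (`per_item_shares` is a hypothetical name):

```python
from typing import Dict


def per_item_shares(cw_by_person: Dict[str, float]) -> Dict[str, float]:
    """Share of each contributor on one item: CW / Σ CW. Capacity is deliberately not an input."""
    total = sum(cw_by_person.values())
    if total <= 0:
        return {}  # zero contribution: the item is absent from the view
    return {person: cw / total for person, cw in cw_by_person.items()}


shares = per_item_shares({"P1": 1.0, "P2": 0.6, "P3": 0.25})
print({p: f"{s:.0%}" for p, s in shares.items()})
```

For the three-contributor case of test 11 this gives 1.0 / 1.85 ≈ 54%, 0.6 / 1.85 ≈ 32% and 0.25 / 1.85 ≈ 14%.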

12 test_per_item_hours_sum_matches_js_proportion PASS

Sum of hours on an item from all persons matches the Job Size proportion in total budget.

Expected: item hours reflect JS weight in total budget

5. Edge Cases — 18 scenarios × 6 persons = 108 checks

108/108 PASS

Boundary cases and extreme scenarios that the EDPA engine must handle without crashing, with correct results and no precision loss.

Each scenario runs per person: Alice, Bob, Carol, Dave, Eve, Frank — 108 checks total.

01 test_person_zero_relevant_items PASS

Person with 0 relevant items must get 0 hours without crashing.

Expected: 0 items → 0h, no crash

02 test_person_single_item_full_capacity PASS

Person with one item must get full capacity.

Expected: 1 item → hours = capacity

03 test_all_items_same_job_size PASS

All items with the same Job Size — hours distributed only by CW.

Expected: same JS → distribution by CW only

04 test_all_people_same_cw_on_item PASS

All people with the same CW on an item — hours proportional to capacity.

Expected: same CW → hours proportional to capacity

05 test_job_size_zero_excluded PASS

Item with Job Size = 0 must be excluded from calculation (no division by zero).

Expected: JS = 0 → item excluded, no division by zero

06 test_single_person_team PASS

Single-person team: person gets full capacity regardless of CW.

Expected: single person → full capacity

07 test_hundred_items_capacity_sum PASS

100 items for one person — capacity must still sum correctly.

Expected: 100 items: Σ hours = capacity

08 test_max_job_size_allocation PASS

Maximum Job Size (20) must produce correct proportional allocation.

Expected: JS = 20 → correct proportional allocation

09 test_min_job_size_allocation PASS

Minimum Job Size (1) must produce correct proportional allocation.

Expected: JS = 1 → correct proportional allocation

10 test_all_cw_equal_distribution PASS

All CW = 1.0: hours are distributed purely in proportion to Job Size.

Expected: all CW = 1.0 → distribution proportional to JS

11 test_very_unequal_capacities PASS

Very unequal capacities (10h vs 160h) — each person sums to their own capacity.

Expected: 10h + 160h: each sums to own capacity

12 test_floating_point_precision PASS

Floating-point precision: the sum must be within 0.01h of capacity.

Expected: Σ within 0.01h tolerance

13 test_unicode_item_titles PASS

Unicode characters in item titles must not cause processing errors.

Expected: Unicode titles → no processing errors

14 test_empty_iteration_graceful PASS

Empty iteration (no stories) must be handled without crashing.

Expected: empty iteration → graceful handling

15 test_person_only_epic_feature PASS

Person only on Epic/Feature (no stories) must still get allocation.

Expected: Epic/Feature only → still gets allocation

16 test_negative_job_size_rejected PASS

Negative Job Size must be rejected — no negative allocation.

Expected: JS < 0 → item rejected

17 test_duplicate_person_on_item_no_double_count PASS

Duplicate signals from the same person on an item must not double the allocation.

Expected: duplicate signals → single CW entry

18 test_large_team_scaling PASS

20+ people in a team — calculation still converges and invariants hold.

Expected: 20+ people: all invariants hold

6. Auto-calibration — 12 scenarios

12/12 PASS

Verification of the auto-calibration system inspired by Karpathy's autoresearch pattern. Calibration must be safe, reproducible, and efficient.

01 test_minimum_ground_truth_records PASS

Calibration requires a minimum of 20 manually confirmed CW records.

Expected: len(ground_truth) ≥ 20 required

02 test_mad_computation_correctness PASS

MAD (Mean Absolute Deviation) is correctly computed as mean of |auto_cw - confirmed_cw|.

Expected: MAD = mean(|auto_cw - confirmed_cw|)

03 test_lower_mad_better PASS

Lower MAD = better heuristic. Optimization direction must be "lower is better".

Expected: direction: lower is better

04 test_evaluator_locked PASS

Evaluator (evaluate_cw.py) is locked — must not be modified by the optimizer.

Expected: evaluate_cw.py: LOCKED, read-only

05 test_single_change_per_iteration PASS

Each experiment changes only one parameter — isolation of change effects.

Expected: one parameter change per experiment

06 test_git_commit_after_experiment PASS

A git commit is made after each experiment — memory = git log.

Expected: git commit after each experiment

07 test_revert_on_worse_mad PASS

When MAD worsens, the experiment is reverted (git reset --hard HEAD~1).

Expected: MAD worse → experiment reverted (git reset --hard HEAD~1)

08 test_keep_on_better_or_equal_mad PASS

When MAD improves or stays the same, the experiment is kept.

Expected: MAD better/equal → keep commit

09 test_budget_50_experiments_max PASS

Maximum budget is 50 experiments — protection against infinite loops.

Expected: budget ≤ 50 experiments

10 test_expected_improvement_range PASS

Expected improvement is 15–30% MAD reduction after 50 experiments.

Expected: 15–30% MAD reduction

11 test_ground_truth_format_validation PASS

Ground truth records must contain: item_id, person_id, evidence_role, auto_cw, confirmed_cw.

Expected: required fields: item_id, person_id, evidence_role, auto_cw, confirmed_cw
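Tests 01, 02 and 11 together describe the calibration input contract. A sketch of that contract, assuming ground truth is a list of dicts keyed by the five required field names; `validate_ground_truth` and `mad` are illustrative names, not the interface of `evaluate_cw.py`.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("item_id", "person_id", "evidence_role", "auto_cw", "confirmed_cw")
MIN_RECORDS = 20  # test 01: at least 20 manually confirmed CW records


def validate_ground_truth(records: List[Dict[str, Any]]) -> None:
    """Raise ValueError if there are too few records or a required field is missing."""
    if len(records) < MIN_RECORDS:
        raise ValueError(f"need at least {MIN_RECORDS} confirmed records, got {len(records)}")
    for i, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {i} is missing {missing}")


def mad(records: List[Dict[str, Any]]) -> float:
    """Test 02: MAD = mean(|auto_cw - confirmed_cw|); lower is better."""
    return sum(abs(r["auto_cw"] - r["confirmed_cw"]) for r in records) / len(records)
```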

12 test_no_data_leakage PASS

No data leakage between training and validation sets — strict separation.

Expected: no data leakage between train/validation

7. Governance & Audit — 17 scenarios

17/17 PASS

Verification of audit trail, freeze rules, governance processes, and compliance requirements. EDPA must be fully auditable and reproducible.

01 test_snapshot_frozen_after_close PASS

Snapshot is frozen after Iteration Close — must not be modified.

Expected: snapshot.frozen = true after close

02 test_frozen_snapshot_immutable PASS

Frozen snapshot must not be modified in-place.

Expected: frozen snapshot: no in-place modification

03 test_corrections_create_new_revision PASS

Corrections create a new revision (_rev2, _rev3), never overwrite the original.

Expected: correction → new revision (_rev2, _rev3)
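The freeze-and-revise rules in tests 01 to 03 map naturally onto immutable values. This sketch assumes a snapshot can be modeled as a frozen dataclass holding a read-only mapping; the `Snapshot` fields, the `snapshot_I1` name and the helper functions are hypothetical, and a real snapshot carries its 10 required top-level keys rather than a single `data` mapping.

```python
from dataclasses import dataclass, replace
from types import MappingProxyType
from typing import Dict, Mapping


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, heavily reduced snapshot: a name, a revision number and read-only data."""
    name: str
    revision: int
    data: Mapping[str, float]
    frozen: bool = True  # test 01: frozen after Iteration Close


def close_iteration(name: str, data: Dict[str, float]) -> Snapshot:
    # Freeze at Iteration Close: copy the data and expose it through a read-only proxy.
    return Snapshot(name, 1, MappingProxyType(dict(data)))


def correct(snapshot: Snapshot, changes: Dict[str, float]) -> Snapshot:
    # A correction never touches the original; it yields the next revision instead.
    merged = {**snapshot.data, **changes}
    return replace(snapshot, revision=snapshot.revision + 1, data=MappingProxyType(merged))


def label(snapshot: Snapshot) -> str:
    # Revision 1 is the original; later ones get the _rev2, _rev3, ... suffix.
    if snapshot.revision == 1:
        return snapshot.name
    return f"{snapshot.name}_rev{snapshot.revision}"


original = close_iteration("snapshot_I1", {"Alice": 40.0})
fixed = correct(original, {"Alice": 38.5})
print(label(original), dict(original.data), label(fixed), dict(fixed.data))
```

Both attribute assignment on the dataclass and item assignment on the mapping raise, so an in-place edit fails loudly instead of silently changing an audited snapshot.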

04 test_snapshot_required_fields PASS

Snapshot must contain all 10 required top-level keys.

Expected: 10 required keys present in snapshot

05 test_branch_naming_enforced PASS

Branch naming convention: {type}/{ITEM-ID}-description must be enforced.

Expected: branch: {type}/{ITEM-ID}-description

06 test_pr_references_work_item PASS

PR must reference a work item (S-XXX, F-XXX, E-XXX) in title or body.

Expected: PR references: S-XXX, F-XXX, or E-XXX

07 test_traceability_chain PASS

Full traceability chain: Initiative → Epic → Feature → Story → PR → Commit.

Expected: Initiative → Epic → Feature → Story → PR → Commit

08 test_wsjf_calculation PASS

WSJF is correctly calculated as (BV + TC + RR) / JS.

Expected: WSJF = (BV + TC + RR) / JS
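A sketch of the WSJF formula as tested, reading BV, TC and RR in the usual SAFe sense (business value, time criticality, risk reduction / opportunity enablement), so the numerator is the cost of delay. The item IDs and scores in the usage lines are made up.

```python
def wsjf(bv: float, tc: float, rr: float, js: float) -> float:
    """WSJF = (BV + TC + RR) / JS, i.e. cost of delay divided by Job Size."""
    if js <= 0:
        raise ValueError("Job Size must be positive")  # JS = 0 would divide by zero
    return (bv + tc + rr) / js


# Hypothetical backlog: item -> (BV, TC, RR, JS)
backlog = {"F-15": (8, 5, 3, 8), "F-16": (5, 8, 2, 3)}
ranked = sorted(backlog, key=lambda item: wsjf(*backlog[item]), reverse=True)
print(ranked)
```

The smaller item ranks first here (15 / 3 = 5.0 against 16 / 8 = 2.0), which is the intended effect of dividing by Job Size.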

09 test_job_size_guardrails_story PASS

Job Size guardrails for Story: JS ≤ 8 (ideally ≤ 5).

Expected: Story JS ≤ 8 (recommended ≤ 5)

10 test_job_size_guardrails_feature PASS

Job Size guardrails for Feature: JS ≤ 13.

Expected: Feature JS ≤ 13

11 test_job_size_guardrails_epic PASS

Job Size guardrails for Epic: JS ≤ 20.

Expected: Epic JS ≤ 20
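The Job Size guardrails in tests 09 to 11 amount to a per-type lookup. A sketch, assuming the Story recommendation (≤ 5) is advisory and produces a softer message than the hard limits; the function name and messages are illustrative.

```python
from typing import List

JS_MAX = {"Story": 8, "Feature": 13, "Epic": 20}  # hard limits from tests 09-11
STORY_RECOMMENDED_MAX = 5  # "ideally ≤ 5", treated here as advisory


def check_job_size(item_type: str, js: int) -> List[str]:
    """Return guardrail messages for one estimate; an empty list means it passes."""
    if js < 0:
        return [f"negative Job Size {js} is rejected"]
    limit = JS_MAX[item_type]
    if js > limit:
        return [f"{item_type} JS {js} exceeds the limit of {limit}"]
    if item_type == "Story" and js > STORY_RECOMMENDED_MAX:
        return [f"Story JS {js} is above the recommended {STORY_RECOMMENDED_MAX}"]
    return []


print(check_job_size("Story", 6), check_job_size("Feature", 13), check_job_size("Epic", 21))
```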

12 test_dor_checklist_validation PASS

Definition of Ready checklist: description, AC, estimate, parent linked.

Expected: DoR: description, AC, estimate, parent linked

13 test_dod_checklist_validation PASS

Definition of Done checklist: code reviewed, tests passed, PR merged.

Expected: DoD: code reviewed, tests passed, PR merged

14 test_wip_limit_enforcement PASS

WIP limit: ideally 1 Story per person at any given time.

Expected: WIP limit: 1 Story per person (ideal)

15 test_bankid_signature_support PASS

BankID electronic signature support (Act 21/2020 Coll.).

Expected: BankID signature: Act 21/2020 Coll.

16 test_reproducible_calculation PASS

Reproducible calculation: same inputs must always produce same outputs.

Expected: same inputs → same outputs (deterministic)

17 test_audit_trail_five_pillars PASS

Audit trail covers 5 pillars: GitHub evidence, capacity, snapshot, reproducible calculation, signature.

Expected: 5 pillars: evidence, capacity, snapshot, calc, signature

8. Capacity Planning — 12 scenarios

12/12 PASS

Verification of the Iteration Planning Protocol — planning_factor as a team-level property, 80% rule, buffer usage tracking, and capacity commitment workflow.

01 test_planning_factor_team_level PASS

planning_factor must be a team-level property, not a cadence or person-level property.

Expected: teams[].planning_factor (not cadence, not person)

02 test_planning_factor_default PASS

Default planning_factor must be 0.8 (plan to 80% of total capacity).

Expected: planning_factor default = 0.8

03 test_planning_factor_range PASS

planning_factor must be in range (0, 1.0] — never zero, never above 100%.

Expected: 0 < planning_factor ≤ 1.0

04 test_planning_capacity_formula PASS

Planning Capacity = Total Capacity × planning_factor for each team.

Expected: Planning_Capacity = Σ Capacity[P] × planning_factor

05 test_different_teams_different_factors PASS

Different teams may have different planning_factor values.

Expected: teams[A].planning_factor ≠ teams[B].planning_factor allowed

06 test_edpa_uses_total_not_planning PASS

EDPA calculation always uses Total Capacity (100%), not Planning Capacity.

Expected: DerivedHours uses Capacity[P], not Planning_Capacity

07 test_buffer_absorbs_unplanned PASS

Buffer (20% by default) absorbs support, maintenance, incidents, and unplanned work.

Expected: buffer = Total - Planning → unplanned work

08 test_unplanned_items_generate_evidence PASS

Unplanned items in the buffer generate evidence and are allocated normally by EDPA.

Expected: unplanned items → evidence → normal EDPA allocation

09 test_capacity_confirmed_at_planning PASS

Each team member must confirm availability at Iteration Planning (availability: confirmed).

Expected: availability = confirmed required

10 test_planning_factor_no_affect_invariant PASS

planning_factor must not affect the mathematical guarantee Σ DerivedHours = Capacity.

Expected: planning_factor → no effect on Σ = Capacity

11 test_buffer_usage_metric PASS

Buffer_Usage metric tracks how much of the reserve was consumed by unplanned work.

Expected: Buffer_Usage = unplanned / (Total - Planning) × 100%

12 test_high_buffer_usage_warning PASS

Consistently high buffer usage (>90%) should trigger a warning to adjust capacity or scope.

Expected: Buffer_Usage > 90% → warning
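The planning arithmetic from tests 02 to 04, 07, 11 and 12, sketched with the six-person test team listed at the top of this page. The function names are hypothetical, and checking the warning for a single iteration is a simplification of the "consistently high" rule in test 12.

```python
from typing import Dict

HIGH_BUFFER_USAGE = 90.0  # percent (test 12)


def planning_capacity(capacities: Dict[str, float], planning_factor: float = 0.8) -> float:
    """Planning_Capacity = Σ Capacity[P] × planning_factor, with a team-level factor."""
    if not 0 < planning_factor <= 1.0:
        raise ValueError("planning_factor must be in (0, 1.0]")
    return sum(capacities.values()) * planning_factor


def buffer_usage(capacities: Dict[str, float], planning_factor: float, unplanned: float) -> float:
    """Buffer_Usage = unplanned / (Total - Planning) × 100%."""
    total = sum(capacities.values())
    buffer = total - planning_capacity(capacities, planning_factor)
    if buffer <= 0:
        raise ValueError("planning_factor = 1.0 leaves no buffer to measure")
    return unplanned / buffer * 100.0


team = {"Alice": 60, "Bob": 80, "Carol": 60, "Dave": 40, "Eve": 40, "Frank": 40}
usage = buffer_usage(team, 0.8, unplanned=60)
print(planning_capacity(team), f"{usage:.2f}%", "WARNING" if usage > HIGH_BUFFER_USAGE else "ok")
```

With 320h total and planning_factor = 0.8, the team plans 256h and keeps a 64h buffer; 60h of unplanned work uses 93.75% of it, which crosses the 90% warning line. Derived hours are still computed against the full 320h (test 06).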

Auto-calibration (Karpathy loop)

Automatic calibration system inspired by Karpathy's autoresearch pattern. One file, one metric, one loop.

Configuration

Target	`cw_heuristics.yaml`
Metric	MAD (Mean Absolute Deviation)
Direction	lower is better
Budget	50 experiments (~2h)
Memory	`git log` on calibration branch
Evaluator	`evaluate_cw.py` (LOCKED)

Expected results

Typical improvement: 15–30% MAD reduction
After 50 experiments: heuristic matches real team patterns
Diminishing returns after ~30 experiments
Prerequisite: ≥ 20 manually confirmed CW records

Loop

git checkout -b calibration/{timestamp}
For each experiment (1..budget):
1. Load current heuristic + experiment history
2. Propose ONE parameter change (threshold, weight, signal score)
3. git commit -m "exp {n}: {param} {old} -> {new}"
4. Run: python evaluate_cw.py --ground-truth ... --heuristics ...
5. Parse MAD from output
6. If MAD ≤ previous_best: KEEP | Otherwise: REVERT
7. Log to calibration_log.tsv
Print summary: initial MAD, final MAD, % improvement
Ask user: merge calibration branch into main?
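The loop can be sketched in Python with the git operations replaced by an in-memory log, so the example runs without a repository. Assumptions: only the four role weights are tuned (the first escalation phase), each proposal nudges one weight by ±0.05, proposals that would break the CW ordering, the 1.0 ceiling or the 0.15 floor are rejected, and a change is kept when MAD is better or equal, following test_keep_on_better_or_equal_mad. `evaluate` only stands in for the locked `evaluate_cw.py`; none of these names come from EDPA.

```python
import random

# Defaults from the CW Heuristic section; only these four weights are tuned here.
DEFAULT_ROLE_WEIGHTS = {"owner": 1.0, "key": 0.6, "reviewer": 0.25, "consulted": 0.15}


def evaluate(role_weights: dict, ground_truth: list) -> float:
    """Stand-in for the locked evaluator: MAD of heuristic CW vs confirmed CW."""
    errors = [abs(role_weights[r["evidence_role"]] - r["confirmed_cw"]) for r in ground_truth]
    return sum(errors) / len(errors)


def respects_invariants(w: dict) -> bool:
    # Assumption: proposals must keep strict ordering, the 1.0 ceiling and the 0.15 floor.
    return 1.0 >= w["owner"] > w["key"] > w["reviewer"] > w["consulted"] >= 0.15


def calibrate(ground_truth: list, budget: int = 50, step: float = 0.05, seed: int = 0):
    rng = random.Random(seed)  # seeded, so a run is reproducible
    best = dict(DEFAULT_ROLE_WEIGHTS)
    initial_mad = best_mad = evaluate(best, ground_truth)
    log = []  # in-memory stand-in for calibration_log.tsv and the git history
    for n in range(1, budget + 1):
        param = rng.choice(sorted(best))  # exactly ONE parameter per experiment
        old = best[param]
        new = round(old + rng.choice((-step, step)), 2)
        candidate = {**best, param: new}
        ok = respects_invariants(candidate)
        mad = evaluate(candidate, ground_truth) if ok else float("inf")
        keep = mad <= best_mad  # keep on better or equal MAD, otherwise revert
        if keep:
            best, best_mad = candidate, mad
        log.append((n, param, old, new, mad, "KEEP" if keep else "REVERT"))
    improvement = (initial_mad - best_mad) / initial_mad * 100 if initial_mad else 0.0
    return best, initial_mad, best_mad, improvement, log


# Made-up ground truth: 20 records, 5 per role.
gt = [
    {"item_id": f"S-{i}", "person_id": "p", "evidence_role": role, "auto_cw": w, "confirmed_cw": c}
    for i, (role, w, c) in enumerate(
        [("owner", 1.0, 0.9), ("key", 0.6, 0.7), ("reviewer", 0.25, 0.3), ("consulted", 0.15, 0.2)] * 5
    )
]
best, initial, final, pct, log = calibrate(gt)
print(f"initial MAD {initial:.3f}, final MAD {final:.3f}, improvement {pct:.1f}%")
```

Because a candidate is only accepted when it does not raise MAD, the final MAD can never exceed the initial one; the budget caps the run at 50 experiments.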

Safety constraints

Evaluator is LOCKED: the agent must not edit evaluate_cw.py, which keeps the optimizer separate from the objective function.

One change per experiment — if you change 5 things, you don't know what worked.

Escalation strategy

Experiments	Focus	Parameters
1–10	role_weights	4 parameters (highest impact)
11–25	signal weights	6 parameters
26–50	threshold + fine-tuning	combined tuning

CW Heuristic

Default weights for automatic Contribution Weight assignment based on GitHub signals.

Role weights

Role	CW
owner	1.0
key	0.6
reviewer	0.25
consulted	0.15

Signal weights

Signal	Score
`assignee`	4.0
`contribute_command`	3.0
`pr_author`	2.0
`commit_author`	1.0
`pr_reviewer`	1.0
`issue_comment`	0.5

Rule: The strongest signal determines CW. Evidence Score = sum of all signals. If Evidence Score ≥ threshold (default 1.0), the person is assigned to the item.
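The rule above, sketched with the default tables. Assumptions: signals arrive as a set of distinct signal types (so repeated comments or commits count once), and the signal-to-role mapping is the one used in the Evidence Detection tests; `assign` is a hypothetical name.

```python
from typing import Optional, Set, Tuple

SIGNAL_SCORES = {
    "assignee": 4.0,
    "contribute_command": 3.0,
    "pr_author": 2.0,
    "commit_author": 1.0,
    "pr_reviewer": 1.0,
    "issue_comment": 0.5,
}
ROLE_WEIGHTS = {"owner": 1.0, "key": 0.6, "reviewer": 0.25, "consulted": 0.15}
# Signal -> role, as used in the Evidence Detection tests.
SIGNAL_ROLE = {
    "assignee": "owner",
    "contribute_command": "key",
    "pr_author": "key",
    "commit_author": "reviewer",
    "pr_reviewer": "reviewer",
    "issue_comment": "consulted",
}


def assign(signals: Set[str], manual_cw: Optional[float] = None,
           threshold: float = 1.0) -> Tuple[float, Optional[float]]:
    """Return (evidence_score, cw) for one person on one item; cw is None if excluded."""
    es = sum(SIGNAL_SCORES[s] for s in signals)  # Evidence Score: sum over distinct signals
    if not signals or es < threshold:
        return es, None
    if manual_cw is not None:
        return es, manual_cw  # an explicit /contribute weight wins over auto-detection
    # CW comes from the strongest signal alone; role weights are never summed.
    return es, max(ROLE_WEIGHTS[SIGNAL_ROLE[s]] for s in signals)


print(assign({"assignee", "commit_author"}))  # assignee outranks the commit
print(assign({"pr_author"}), assign({"commit_author"}))
```

Evidence Score and CW are deliberately separate: the sum decides whether the person is on the item at all, while only the single strongest signal decides how much weight they get.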

Monte Carlo calibration

Validated by Monte Carlo simulation (1,000 scenarios, 68,156 records, p<0.001).

Git measures activity, not value. Strategic roles (BO, PM, Arch) are systematically undervalued.

Role	Bias
Business Owner	+0.15
Product Manager	+0.05
Architect	+0.05
Developer	0.00

Method comparison

Criterion	EDPA v1.0.0	Manual timesheets	Fixed allocation
Accuracy	High	Medium	Low
Effort	Minimal	High	None
Auditability	Full	Partial	None
Dual-view	Yes	No	No
Math. guarantee	Σ = capacity	None	Complex
Automation	GitHub Actions	Manual	Partial

Demo calculation

Static demonstration of the EDPA calculation for 3 people and 5 work items, using the operational variant (simple mode).

Capacity

Person	FTE	Capacity (h)
Alice	0.5	40h
Bob	1.0	80h
Carol	0.75	60h

Work items & assignments

Item	JS	Alice (CW)	Bob (CW)	Carol (CW)
`S-101`	5	1.0 (owner)	0.25 (reviewer)	—
`S-102`	3	0.6 (key)	1.0 (owner)	—
`S-103`	8	—	1.0 (owner)	0.6 (key)
`S-104`	2	—	0.25 (reviewer)	1.0 (owner)
`S-105`	5	—	—	1.0 (owner)

Score calculation (Score = JS × CW)

Item	JS	Alice Score	Bob Score	Carol Score
`S-101`	5	5.0	1.25	—
`S-102`	3	1.8	3.0	—
`S-103`	8	—	8.0	4.8
`S-104`	2	—	0.5	2.0
`S-105`	5	—	—	5.0
Σ		6.8	12.75	11.8

Derived Hours (DH = Score / ΣScore × Capacity)

Item	Alice (40h)	Bob (80h)	Carol (60h)
`S-101`	29.41h	7.84h	—
`S-102`	10.59h	18.82h	—
`S-103`	—	50.20h	24.41h
`S-104`	—	3.14h	10.17h
`S-105`	—	—	25.42h
Σ	40.00h	80.00h	60.00h
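The demo can be reproduced in a few lines. This sketch hard-codes the capacity and assignment tables above and applies simple mode; the printed per-item hours, rounded to two decimals, match the Derived Hours table. `demo_derived_hours` is an illustrative name.

```python
from typing import Dict

CAPACITY = {"Alice": 40, "Bob": 80, "Carol": 60}
JOB_SIZE = {"S-101": 5, "S-102": 3, "S-103": 8, "S-104": 2, "S-105": 5}
CW = {  # (item, person) -> Contribution Weight, from the assignments table
    ("S-101", "Alice"): 1.0, ("S-101", "Bob"): 0.25,
    ("S-102", "Alice"): 0.6, ("S-102", "Bob"): 1.0,
    ("S-103", "Bob"): 1.0, ("S-103", "Carol"): 0.6,
    ("S-104", "Bob"): 0.25, ("S-104", "Carol"): 1.0,
    ("S-105", "Carol"): 1.0,
}


def demo_derived_hours() -> Dict[str, Dict[str, float]]:
    """Simple mode: Score = JS × CW, then DH = Score / ΣScore × Capacity per person."""
    result = {}
    for person, capacity in CAPACITY.items():
        scores = {item: JOB_SIZE[item] * cw for (item, p), cw in CW.items() if p == person}
        total = sum(scores.values())
        result[person] = {item: s / total * capacity for item, s in scores.items()}
    return result


for person, hours in demo_derived_hours().items():
    rounded = {item: round(h, 2) for item, h in hours.items()}
    print(person, rounded, f"Σ = {sum(hours.values()):.2f}h")
```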

Person	Σ DerivedHours	Capacity	Status
Alice	40.00h	40h	VERIFIED
Bob	80.00h	80h	VERIFIED
Carol	60.00h	60h	VERIFIED

Σ DerivedHours[P, *] = Capacity[P] holds for every person. Always. No exceptions.
