Evaluation and tests | EDPA v2.11.1

300 / 300

PASSED

105 test scenarios across 8 categories. Per-person tests (invariants, dual-view, edge cases) run for each of the 6 team members → 300 total checks. All passing.

Alice (Arch, 60h) ✓ · Bob (Dev, 80h) ✓ · Carol (Dev, 60h) ✓ · Dave (DevOps, 40h) ✓ · Eve (PM, 40h) ✓ · Frank (Dev, 40h) ✓

Σ Mathematical Invariants: 54 · 🔍 Evidence Detection: 10 · ⚖ CW Heuristics: 15 · ⇄ Dual-View Consistency: 72 · ⚠ Edge Cases: 108 · 🔄 Auto-calibration: 12 · 🔒 Governance & Audit: 17 · 📈 Capacity Planning: 12

1. Mathematical Invariants — 9 scenarios × 6 persons = 54 checks

54/54 PASS

Fundamental mathematical guarantees of the EDPA engine — validated on every build. These tests verify that the model never breaks its key promises.

Each scenario runs per person: Alice, Bob, Carol, Dave, Eve, Frank — 54 checks total.

01 test_sum_equals_capacity PASS

Derived hours must match the declared capacity of the person.

Expected: Σ(hours) = capacity ± 0.01h

02 test_ratio_sum_equals_one PASS

Ratios must sum to 1.0 for every person with items.

Expected: Σ(ratio) = 1.0 ± 0.001

03 test_no_negative_hours PASS

No person may have negative derived hours.

Expected: All hours ≥ 0

04 test_no_negative_scores PASS

No score may be negative.

Expected: All scores ≥ 0

05 test_score_formula PASS

Score is calculated as JS multiplied by cw (per-item normalized share).

Expected: Score = JS × cw

06 test_per_item_cw_sums_to_one PASS

Per-item invariant: Σ cw across all persons on an item equals 1.0.

Expected: Σ_persons cw[*, item] = 1.0

07 test_capacity_invariant PASS

Per-person invariant: Σ derived hours equals declared capacity.

Expected: Σ_items DerivedHours[P, *] = Capacity[P, I] ± 0.01h

08 test_all_invariants_flag PASS

The invariant_ok flag must reflect the actual check results.

Expected: invariant_ok reflects actual checks

09 test_empty_items_no_crash PASS

A person with 0 items must get 0h, without crashing.

Expected: Person with 0 items → 0h, no crash
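
The invariants above can be illustrated with a small checker. This is a minimal sketch, not the engine's own test code: the function name `check_invariants`, the result keys, and the definition of a ratio as hours divided by capacity are assumptions made for illustration. Only the tolerances (0.01h and 0.001) come from the scenarios.

```python
HOURS_TOL = 0.01   # tolerance quoted for the capacity invariant
RATIO_TOL = 0.001  # tolerance quoted for the ratio sum


def check_invariants(hours_by_item: dict[str, float], capacity: float) -> dict[str, bool]:
    """Check one person's derived hours against the per-person invariants.

    Assumes capacity > 0 and that a "ratio" is hours / capacity (hypothetical).
    """
    if not hours_by_item:
        # A person with 0 items gets 0h; there is nothing to sum or divide.
        return {"no_negative_hours": True, "sum_equals_capacity": True,
                "ratio_sum_equals_one": True, "invariant_ok": True}

    hours = list(hours_by_item.values())
    ratios = [h / capacity for h in hours]
    checks = {
        "no_negative_hours": all(h >= 0 for h in hours),
        "sum_equals_capacity": abs(sum(hours) - capacity) <= HOURS_TOL,
        "ratio_sum_equals_one": abs(sum(ratios) - 1.0) <= RATIO_TOL,
    }
    # invariant_ok is derived from the individual checks, never set by hand.
    checks["invariant_ok"] = all(checks.values())
    return checks
```

Deriving `invariant_ok` from the individual results is one way to satisfy the requirement that the flag reflects the actual checks.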

🔍

2. Evidence Detection — 10 scenarios

10/10 PASS

Verification of GitHub signal detection and additive aggregation into contribution_score. Signals add up; the resulting CW is the per-item normalized share (Σ across persons = 1.0 per item).

01 test_commit_author_signal PASS

Commit with S-XXX/F-XXX/E-XXX/I-XXX in branch/message adds signals.commit_author (default 4.00).

Expected: commit_author → +4.00 to score

02 test_pr_reviewer_signal PASS

Submitted PR review (excluding self) adds signals.pr_reviewer (default 2.17).

Expected: pr_reviewer → +2.17 to score

03 test_issue_comment_signal PASS

Issue/PR comment (bots excluded) adds signals.issue_comment (default 1.46).

Expected: issue_comment → +1.46 to score

04 test_signals_aggregate_additively PASS

When a person has commit + review + comment, weights sum (4.00 + 2.17 + 1.46 = 7.63); there is no highest-wins.

Expected: Σ fired signal_weights, not max

05 test_contribute_directive_additive PASS

/contribute @person weight:0.6 adds a manual:* signal with weight 0.6 — it does not override auto-detection.

Expected: /contribute @user weight:X → +X to score

06 test_per_item_normalization PASS

After aggregation, CW is normalized per item: cw = score / Σ persons. Invariant Σ cw[*, item] = 1.0.

Expected: Σ cw per item = 1.0 (±0.001)

07 test_branch_naming_extraction_S_F_E_I PASS

Branch regex extracts S-/F-/E-/I-XXX for Story, Feature, Epic, Initiative.

Expected: Detect: S-200, F-15, E-3, I-1

08 test_no_signals_no_contributor PASS

A person with no fired signal does not appear in contributors[] after aggregation.

Expected: score = 0 → excluded

09 test_bot_comments_filtered PASS

Comments from GitHub apps / bots do not count as issue_comment signals.

Expected: Bot login → signal does not fire

10 test_commit_count_no_time_effect PASS

Many commits by one author on an item count as a single commit_author signal (relevance, not volume).

Expected: 1 commit = 10 commits in terms of score
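
The aggregation rules in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the engine's implementation: the event tuple shape `(person, signal, is_bot)` and the function name are hypothetical, and treating every signal type as firing at most once per person and item is an assumption extended from the commit rule. The weights are the defaults quoted above.

```python
from collections import defaultdict

# Default signal weights quoted in the scenarios above.
SIGNAL_WEIGHTS = {"commit_author": 4.00, "pr_reviewer": 2.17, "issue_comment": 1.46}


def contribution_weights(events, weights=SIGNAL_WEIGHTS):
    """Compute {person: cw} for ONE item.

    events: iterable of (person, signal, is_bot) tuples (hypothetical shape).
    """
    fired = defaultdict(set)
    for person, signal, is_bot in events:
        if is_bot:
            continue  # bot activity never fires a signal
        # A set records relevance, not volume: 10 commits fire commit_author once.
        fired[person].add(signal)

    # Additive aggregation: sum the weights of all fired signals, no "max".
    scores = {p: sum(weights[s] for s in sigs) for p, sigs in fired.items()}
    total = sum(scores.values())
    if total == 0:
        return {}
    # Per-item normalization; persons without a score never appear.
    return {p: s / total for p, s in scores.items() if s > 0}
```

With ten commits, one review, and one comment from one person and a single comment from another, the first person's score is 4.00 + 2.17 + 1.46 = 7.63 and the second's is 1.46, so the shares are 7.63 / 9.09 and 1.46 / 9.09.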

⚖

3. CW Heuristics — 15 scenarios

15/15 PASS

Verification of heuristic signal weight correctness and rules for determining Contribution Weight. The heuristic must be consistent and reproducible.

01 test_signal_weight_ordering PASS

Hierarchy: commit_author ≥ pr_reviewer ≥ issue_comment (strictly ordered with the default weights).

Expected: sw.commit_author >= sw.pr_reviewer >= sw.issue_comment

02 test_per_item_cw_normalization PASS

Per-item cw share: Σ cw across persons = 1.0 per item.

Expected: Σ_persons cw[*, item] = 1.0 (engine invariant)

03 test_per_person_capacity_invariant PASS

Per-person hours invariant: Σ hours = capacity_per_iteration.

Expected: Σ_items DerivedHours[P, *] = Capacity[P, I]

04 test_no_role_overrides_in_heuristics PASS

Calibration runs on the 3 signal weights only; cw_heuristics.yaml must not contain a role_overrides block.

Expected: cw_heuristics.yaml.role_overrides not present

05 test_signal_weights_commit PASS

Commit author signal must have signal weight 4.00 in cw_heuristics.yaml.

Expected: signals.commit_author = 4.00

06 test_signal_weights_pr_reviewer PASS

PR reviewer signal must have signal weight 2.17 in cw_heuristics.yaml.

Expected: signals.pr_reviewer = 2.17

07 test_signal_weights_issue_comment PASS

Issue comment signal must have signal weight 1.46 in cw_heuristics.yaml.

Expected: signals.issue_comment = 1.46

08 test_signals_aggregate_additively PASS

CW is computed by additive signal aggregation + per-item normalization. No highest-signal-wins.

Expected: cw[P, item] = Σ signal_weights / Σ_persons Σ signal_weights

09 test_contribute_directive_additive PASS

Manual /contribute @person weight:X adds a manual:* signal with weight X — it does not override auto-detection.

Expected: cw_after = (cw_auto_score + X) / Σ_persons score

10 test_per_item_invariant PASS

Σ cw across persons on a single item = 1.0 (engine invariant, ±0.001).

Expected: Σ_persons cw[*, item] = 1.0

11 test_cw_range_0_1 PASS

CW is a per-item normalized share, always in [0, 1.0]. No fixed per-signal floor/ceiling.

Expected: 0 ≤ cw[P, item] ≤ 1.0

12 test_no_negative_contribution_score PASS

contribution_score sums positive signal weights — no signal subtracts.

Expected: contribution_score ≥ 0 always

13 test_per_item_independence PASS

A person's CW on item A is independent of CW on item B (per-item normalization).

Expected: cw[P, A] independent of cw[P, B]

14 test_per_person_independence PASS

Two persons' CW on the same item are independent (both come from their own signal aggregations).

Expected: cw[P1, item] independent of cw[P2, item]

15 test_strategic_role_via_signal_calibration PASS

PM/BO/Arch contribution is captured via issue_comment + pr_reviewer + manual /contribute. Per-role multipliers are not used — bias is handled by calibrating signal weights against ground truth.

Expected: No role_overrides in heuristics
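
The additive behaviour of the manual directive can be sketched as below. The regular expression is a hypothetical reading of the `/contribute @person weight:X` syntax quoted above, and the function names are illustrative; the actual parser may accept a different grammar.

```python
import re

# Hypothetical pattern for the directive syntax quoted in the scenarios.
CONTRIBUTE_RE = re.compile(
    r"/contribute\s+@(?P<person>[\w-]+)\s+weight:(?P<weight>\d+(?:\.\d+)?)"
)


def apply_contribute(scores: dict[str, float], comment: str) -> dict[str, float]:
    """Return a copy of scores with each directive in comment ADDED on top."""
    updated = dict(scores)
    for match in CONTRIBUTE_RE.finditer(comment):
        person = match.group("person")
        # Additive: the manual weight joins the auto-detected score, it does not replace it.
        updated[person] = updated.get(person, 0.0) + float(match.group("weight"))
    return updated


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Per-item normalization: cw = score / sum of all persons' scores."""
    total = sum(scores.values())
    return {p: s / total for p, s in scores.items()} if total else {}
```

For example, if one person has an auto-detected score of 1.46 on an item and receives `weight:0.6`, the score becomes 2.06 before normalization, which matches the cw_after formula in the scenario.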

⇄

4. Dual-View Consistency — 12 scenarios × 6 persons = 72 checks

72/72 PASS

EDPA provides two views — per-person and per-item. Both must be mutually consistent and sums must match in both directions.

Each scenario runs per person: Alice, Bob, Carol, Dave, Eve, Frank — 72 checks total.

01 test_per_person_sum_equals_capacity PASS

Per-person view: sum of derived hours = capacity for every person.

Expected: Σ DerivedHours[P, *] = Capacity[P]

02 test_per_item_shares_sum_100 PASS

Per-item view: sum of shares of all contributors = 100% for every item.

Expected: Σ shares[*, item] = 100%

03 test_same_cw_same_results_both_views PASS

Same CW must produce same results in both views.

Expected: per-person hours consistent with per-item shares

04 test_no_transitions_degenerates_to_done_credit PASS

When git history records no Feature/Epic/Initiative transitions, the engine credits only Story Done items.

Expected: gate_events empty → only Story Done credit fires

05 test_per_person_hours_sum_cross_items PASS

Per-person: hours on item X + hours on all other items = total capacity.

Expected: hours[P, X] + hours[P, rest] = capacity[P]

06 test_zero_contribution_excluded_both_views PASS

Items with zero contribution do not appear in either view.

Expected: zero contribution → absent in both views

07 test_single_contributor_full_share PASS

Single contributor on an item gets 100% share in per-item view.

Expected: single contributor → 100% share

08 test_two_equal_contributors_equal_split PASS

Two contributors with equal CW get 50/50 split in per-item view.

Expected: equal CW → 50/50 share split

09 test_capacity_no_affect_per_item_share PASS

Different capacities do not affect percentage share in per-item view.

Expected: capacity[P1] ≠ capacity[P2] → share% unchanged

10 test_cross_check_hours_vs_capacities PASS

Cross-check: sum of all per-item hours across all items ≤ sum of all capacities.

Expected: ΣΣ hours[P, item] ≤ Σ capacity[P]

11 test_three_contributors_weighted_split PASS

Three contributors with CW 1.0, 0.6, 0.25 — shares match weight ratio.

Expected: 1.0:0.6:0.25 → 54%:32%:14% share

12 test_per_item_hours_sum_matches_js_proportion PASS

Sum of hours on an item from all persons matches the Job Size proportion in total budget.

Expected: item hours reflect JS weight in total budget
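
The relationship between the two views can be illustrated with a short sketch. The function names and the `{(person, item): hours}` input shape are assumptions for illustration; the 1.0 : 0.6 : 0.25 weights are taken from the three-contributor scenario above.

```python
def per_item_shares(cw_by_person: dict[str, float]) -> dict[str, float]:
    """Percentage share of each contributor on one item (capacity plays no role)."""
    total = sum(cw_by_person.values())
    if total == 0:
        return {}
    return {p: 100.0 * cw / total for p, cw in cw_by_person.items() if cw > 0}


def dual_views(hours: dict[tuple[str, str], float]):
    """Pivot {(person, item): hours} into per-person and per-item views.

    Zero contributions are dropped so they are absent from both views.
    """
    per_person: dict[str, dict[str, float]] = {}
    per_item: dict[str, dict[str, float]] = {}
    for (person, item), h in hours.items():
        if h == 0:
            continue
        per_person.setdefault(person, {})[item] = h
        per_item.setdefault(item, {})[person] = h
    return per_person, per_item
```

For weights 1.0, 0.6, and 0.25 the total is 1.85, giving shares of about 54.05%, 32.43%, and 13.51%, which round to the 54% : 32% : 14% split quoted above.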

⚠

5. Edge Cases — 18 scenarios × 6 persons = 108 checks

108/108 PASS

Boundary cases and extreme scenarios that the EDPA engine must handle without crashing, with correct results and no precision loss.

Each scenario runs per person: Alice, Bob, Carol, Dave, Eve, Frank — 108 checks total.

01 test_person_zero_relevant_items PASS

Person with 0 relevant items must get 0 hours without crashing.

Expected: 0 items → 0h, no crash

02 test_person_single_item_full_capacity PASS

Person with one item must get full capacity.

Expected: 1 item → hours = capacity

03 test_all_items_same_job_size PASS

All items with the same Job Size — hours distributed only by CW.

Expected: same JS → distribution by CW only

04 test_all_people_same_cw_on_item PASS

All people with the same CW on an item — hours proportional to capacity.

Expected: same CW → hours proportional to capacity

05 test_job_size_zero_excluded PASS

Item with Job Size = 0 must be excluded from calculation (no division by zero).

Expected: JS = 0 → item excluded, no division by zero

06 test_single_person_team PASS

Single-person team: person gets full capacity regardless of CW.

Expected: single person → full capacity

07 test_hundred_items_capacity_sum PASS

100 items for one person — capacity must still sum correctly.

Expected: 100 items: Σ hours = capacity

08 test_max_job_size_allocation PASS

Maximum Job Size (20) must produce correct proportional allocation.

Expected: JS = 20 → correct proportional allocation

09 test_min_job_size_allocation PASS

Minimum Job Size (1) must produce correct proportional allocation.

Expected: JS = 1 → correct proportional allocation

10 test_all_cw_equal_distribution PASS

All CW = 1.0: hours are distributed in proportion to Job Size alone.

Expected: all CW = 1.0 → equal distribution per JS

11 test_very_unequal_capacities PASS

Very unequal capacities (10h vs 160h) — each person sums to their own capacity.

Expected: 10h + 160h: each sums to own capacity

12 test_floating_point_precision PASS

Float precision: sum must be within 0.01h tolerance of capacity.

Expected: Σ within 0.01h tolerance

13 test_unicode_item_titles PASS

Unicode characters in item titles must not cause processing errors.

Expected: Unicode titles → no processing errors

14 test_empty_iteration_graceful PASS

Empty iteration (no stories) must be handled without crashing.

Expected: empty iteration → graceful handling

15 test_person_only_epic_feature PASS

Person only on Epic/Feature (no stories) must still get allocation.

Expected: Epic/Feature only → still gets allocation

16 test_negative_job_size_rejected PASS

Negative Job Size must be rejected — no negative allocation.

Expected: JS < 0 → item rejected

17 test_duplicate_person_on_item_no_double_count PASS

Duplicate signals from the same person on an item must not double the allocation.

Expected: duplicate signals → single CW entry

18 test_large_team_scaling PASS

20+ people in a team — calculation still converges and invariants hold.

Expected: 20+ people: all invariants hold
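
Several of these edge cases come down to guarding the division in the hours formula. The sketch below shows one way to do that; the function names are hypothetical, and raising `ValueError` for a negative Job Size is an assumed interpretation of "rejected".

```python
def usable_items(job_sizes: dict[str, float]) -> dict[str, float]:
    """Drop JS = 0 items and reject negative Job Sizes before any division."""
    negative = [item for item, js in job_sizes.items() if js < 0]
    if negative:
        # Assumption: "rejected" is modelled here as an exception.
        raise ValueError(f"negative Job Size rejected: {negative}")
    return {item: js for item, js in job_sizes.items() if js > 0}


def derived_hours(job_sizes: dict[str, float], cw: dict[str, float],
                  capacity: float) -> dict[str, float]:
    """Derived hours for one person; cw maps item -> this person's cw on it."""
    items = usable_items(job_sizes)
    scores = {i: js * cw[i] for i, js in items.items() if cw.get(i, 0) > 0}
    total = sum(scores.values())
    if total == 0:
        return {}  # 0 relevant items: 0h, and no division by zero
    return {i: s / total * capacity for i, s in scores.items()}
```

Because every item's hours are a fraction of the same total, the sum equals the capacity up to floating-point error whether the person has one item or a hundred.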

🔄

6. Auto-calibration — 12 scenarios

12/12 PASS

Verification of the auto-calibration system inspired by Karpathy's autoresearch pattern. Calibration must be safe, reproducible, and efficient.

01 test_minimum_ground_truth_records PASS

Calibration requires a minimum of 20 manually confirmed CW records.

Expected: len(ground_truth) ≥ 20 required

02 test_mad_computation_correctness PASS

MAD (Mean Absolute Deviation) is correctly computed as mean of |auto_cw - confirmed_cw|.

Expected: MAD = mean(|auto_cw - confirmed_cw|)

03 test_lower_mad_better PASS

Lower MAD = better heuristic. Optimization direction must be "lower is better".

Expected: direction: lower is better

04 test_calibrator_locked PASS

Calibrator (calibrate_signals.py) is locked: the synthetic corpus and the MAD cost function live in one file. The agent must not modify it, which prevents gaming the metric.

Expected: calibrate_signals.py: LOCKED, read-only

05 test_single_change_per_iteration PASS

Each experiment changes only one parameter — isolation of change effects.

Expected: one parameter change per experiment

06 test_git_commit_after_experiment PASS

A git commit is made after each experiment — memory = git log.

Expected: git commit after each experiment

07 test_revert_on_worse_mad PASS

When MAD worsens, the experiment is reverted (git reset --hard HEAD~1).

Expected: MAD worse → experiment reverted (git reset --hard HEAD~1)

08 test_keep_on_better_or_equal_mad PASS

When MAD improves or stays the same, the experiment is kept.

Expected: MAD better/equal → keep commit

09 test_budget_50_experiments_max PASS

Maximum budget is 50 experiments — protection against infinite loops.

Expected: budget ≤ 50 experiments

10 test_expected_improvement_range PASS

Expected improvement is 15–30% MAD reduction after 50 experiments.

Expected: 15–30% MAD reduction

11 test_ground_truth_format_validation PASS

Ground truth records must contain: item_id, person_id, evidence_role, auto_cw, confirmed_cw.

Expected: required fields: item_id, person_id, evidence_role, auto_cw, confirmed_cw

12 test_no_data_leakage PASS

No data leakage between training and validation sets — strict separation.

Expected: no data leakage between train/validation
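
The metric and the keep/revert rule described above can be sketched in a few lines. The function names are illustrative and the record dictionaries only use the `auto_cw` and `confirmed_cw` fields listed in the ground-truth format; this is not the code in calibrate_signals.py.

```python
from statistics import fmean

MIN_RECORDS = 20  # minimum number of manually confirmed CW records


def mad(records: list[dict]) -> float:
    """Mean absolute deviation between auto_cw and confirmed_cw."""
    if len(records) < MIN_RECORDS:
        raise ValueError(f"need at least {MIN_RECORDS} ground-truth records, got {len(records)}")
    return fmean(abs(r["auto_cw"] - r["confirmed_cw"]) for r in records)


def decide(previous_mad: float, new_mad: float) -> str:
    """Keep an experiment when MAD improves or stays equal, otherwise revert it."""
    return "keep" if new_mad <= previous_mad else "revert"
```

Treating an equal MAD as "keep" follows the rule stated in the scenarios; a stricter loop could require a strict improvement instead.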

🔒

7. Governance & Audit — 17 scenarios

17/17 PASS

Verification of audit trail, freeze rules, governance processes, and compliance requirements. EDPA must be fully auditable and reproducible.

01 test_snapshot_frozen_after_close PASS

Snapshot is frozen after Iteration Close — must not be modified.

Expected: snapshot.frozen = true after close

02 test_frozen_snapshot_immutable PASS

Frozen snapshot must not be modified in-place.

Expected: frozen snapshot: no in-place modification

03 test_corrections_create_new_revision PASS

Corrections create a new revision (_rev2, _rev3), never overwrite the original.

Expected: correction → new revision (_rev2, _rev3)

04 test_snapshot_required_fields PASS

Snapshot must contain all 10 required top-level keys.

Expected: 10 required keys present in snapshot

05 test_branch_naming_enforced PASS

Branch naming convention: {type}/{ITEM-ID}-description must be enforced.

Expected: branch: {type}/{ITEM-ID}-description

06 test_pr_references_work_item PASS

PR must reference a work item (S-XXX, F-XXX, E-XXX) in title or body.

Expected: PR references: S-XXX, F-XXX, or E-XXX

07 test_traceability_chain PASS

Full traceability chain: Initiative → Epic → Feature → Story → PR → Commit.

Expected: Initiative → Epic → Feature → Story → PR → Commit

08 test_wsjf_calculation PASS

WSJF is correctly calculated as (BV + TC + RR&OE) / JS.

Expected: WSJF = (BV + TC + RR&OE) / JS

09 test_job_size_guardrails_story PASS

Job Size guardrails for Story: JS ≤ 8 (ideally ≤ 5).

Expected: Story JS ≤ 8 (recommended ≤ 5)

10 test_job_size_guardrails_feature PASS

Job Size guardrails for Feature: JS ≤ 13.

Expected: Feature JS ≤ 13

11 test_job_size_guardrails_epic PASS

Job Size guardrails for Epic: JS ≤ 20.

Expected: Epic JS ≤ 20

12 test_dor_checklist_validation PASS

Definition of Ready checklist: description, AC, estimate, parent linked.

Expected: DoR: description, AC, estimate, parent linked

13 test_dod_checklist_validation PASS

Definition of Done checklist: code reviewed, tests passed, PR merged.

Expected: DoD: code reviewed, tests passed, PR merged

14 test_wip_limit_enforcement PASS

WIP limit: ideally 1 Story per person at any given time.

Expected: WIP limit: 1 Story per person (ideal)

15 test_bankid_signature_support PASS

BankID electronic signature support (Act 21/2020 Coll.).

Expected: BankID signature: Act 21/2020 Coll.

16 test_reproducible_calculation PASS

Reproducible calculation: same inputs must always produce same outputs.

Expected: same inputs → same outputs (deterministic)

17 test_audit_trail_five_pillars PASS

Audit trail covers 5 pillars: GitHub evidence, capacity, snapshot, reproducible calculation, signature.

Expected: 5 pillars: evidence, capacity, snapshot, calc, signature
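
The WSJF formula and the Job Size guardrails above can be expressed directly. This is a minimal sketch: the function names and the `JS_LIMITS` table are illustrative, and only the limits (8, 13, 20) and the formula come from the scenarios.

```python
# Upper Job Size limits per item type, as listed in the guardrail scenarios.
JS_LIMITS = {"Story": 8, "Feature": 13, "Epic": 20}


def wsjf(bv: float, tc: float, rr_oe: float, js: float) -> float:
    """WSJF = (BV + TC + RR&OE) / JS."""
    if js <= 0:
        raise ValueError("Job Size must be positive")
    return (bv + tc + rr_oe) / js


def js_within_guardrail(item_type: str, js: float) -> bool:
    """True when the Job Size is positive and does not exceed the type's limit."""
    return 0 < js <= JS_LIMITS[item_type]
```

For instance, BV = 8, TC = 5, RR&OE = 3, and JS = 4 give a WSJF of 16 / 4 = 4.0.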

📈

8. Capacity Planning — 12 scenarios

12/12 PASS

Verification of the Iteration Planning Protocol — planning_factor as a team-level property, 80% rule, buffer usage tracking, and capacity commitment workflow.

01 test_planning_factor_team_level PASS

planning_factor must be a team-level property, not a cadence or person-level property.

Expected: teams[].planning_factor (not cadence, not person)

02 test_planning_factor_default PASS

Default planning_factor must be 0.8 (plan to 80% of total capacity).

Expected: planning_factor default = 0.8

03 test_planning_factor_range PASS

planning_factor must be in range (0, 1.0] — never zero, never above 100%.

Expected: 0 < planning_factor ≤ 1.0

04 test_planning_capacity_formula PASS

Planning Capacity = Total Capacity × planning_factor for each team.

Expected: Planning_Capacity = Σ Capacity[P] × planning_factor

05 test_different_teams_different_factors PASS

Different teams may have different planning_factor values.

Expected: teams[A].planning_factor ≠ teams[B].planning_factor allowed

06 test_edpa_uses_total_not_planning PASS

EDPA calculation always uses Total Capacity (100%), not Planning Capacity.

Expected: DerivedHours uses Capacity[P], not Planning_Capacity

07 test_buffer_absorbs_unplanned PASS

Buffer (20% by default) absorbs support, maintenance, incidents, and unplanned work.

Expected: buffer = Total - Planning → unplanned work

08 test_unplanned_items_generate_evidence PASS

Unplanned items in the buffer generate evidence and are allocated normally by EDPA.

Expected: unplanned items → evidence → normal EDPA allocation

09 test_capacity_confirmed_at_planning PASS

Each team member must confirm availability at Iteration Planning (availability: confirmed).

Expected: availability = confirmed required

10 test_planning_factor_no_affect_invariant PASS

planning_factor must not affect the mathematical guarantee Σ DerivedHours = Capacity.

Expected: planning_factor → no effect on Σ = Capacity

11 test_buffer_usage_metric PASS

Buffer_Usage metric tracks how much of the reserve was consumed by unplanned work.

Expected: Buffer_Usage = unplanned / (Total - Planning) × 100%

12 test_high_buffer_usage_warning PASS

Consistently high buffer usage (>90%) should trigger a warning to adjust capacity or scope.

Expected: Buffer_Usage > 90% → warning
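
The planning formulas above can be worked through with the six-person team from the summary (60 + 80 + 60 + 40 + 40 + 40 = 320h). The function names are hypothetical, and raising an error when no buffer exists is an assumption, since the source does not say how that case is handled.

```python
def planning_capacity(capacities: list[float], planning_factor: float = 0.8) -> float:
    """Planning_Capacity = sum of capacities x planning_factor."""
    if not 0 < planning_factor <= 1.0:
        raise ValueError("planning_factor must be in (0, 1.0]")
    return sum(capacities) * planning_factor


def buffer_usage(unplanned_hours: float, total: float, planning: float) -> float:
    """Buffer_Usage = unplanned / (Total - Planning) x 100%."""
    buffer = total - planning
    if buffer <= 0:
        # Assumption: with planning_factor = 1.0 there is no buffer to measure.
        raise ValueError("no buffer: Total equals Planning capacity")
    return unplanned_hours / buffer * 100.0


def needs_warning(usage_percent: float) -> bool:
    """Flag buffer usage above the 90% threshold."""
    return usage_percent > 90.0
```

With the default factor the team plans to 256h and keeps a 64h buffer; 60h of unplanned work would then consume 93.75% of the buffer and trigger the warning. The derived hours themselves are still computed from the full 320h.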

Auto-calibration (Karpathy loop)

Automatic calibration system inspired by Karpathy's autoresearch pattern. One file, one metric, one loop.

Configuration

Target	`cw_heuristics.yaml.tmpl` (signals: block)
Metric	MAD (Mean Absolute Deviation) on the MC corpus
Direction	lower is better
Budget	2000 MC samples + coordinate descent on top-5 (~10s)
Search space	3D signal weights, each in [0.1, 8.0]
Calibrator	`calibrate_signals.py` (LOCKED)

Expected results

Typical improvement: 15–30% MAD reduction
After 50 experiments: heuristic matches real team patterns
Diminishing returns after ~30 experiments
Prerequisite: ≥ 20 manually confirmed CW records

Loop

Run python calibrate_signals.py --scenarios 1000 --seed 42
The script will:
1. Generate a synthetic Monte Carlo corpus (1,000 scenarios, ~31k records in total) procedurally from a model where signal counts probabilistically reflect each person's true cw share
2. Compute baseline MAD against shipped defaults (commit_author=4.00, pr_reviewer=2.17, issue_comment=1.46)
3. Phase 1 — random sampling: 2000 random weight vectors in the 3D space [0.1, 8.0], sorted by MAD
4. Phase 2 — coordinate descent: refines the top-5 candidates, tries ±step per signal, halves step on no-improvement
5. Return best calibrated weights + MAD improvement %
6. With --apply, rewrite cw_heuristics.yaml.tmpl + refresh the calibration: metadata
Print summary: baseline MAD, calibrated MAD, % improvement, top weights
(Optional) --report report.json dumps the full run for audit
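
The two search phases described above (random sampling, then coordinate descent on the best candidates) can be sketched generically. This is not the code of calibrate_signals.py: all function names, the starting step of 0.5, and the stopping threshold are assumptions, and `toy_cost` is a stand-in for the real MAD evaluation on the corpus. Only the bounds [0.1, 8.0], the 2000 samples, the top-5 refinement, and the step-halving rule come from the text.

```python
import random

LOW, HIGH = 0.1, 8.0  # search bounds for each signal weight


def random_search(cost, n_samples, rng, dims=3):
    """Phase 1: draw random weight vectors and sort them by cost (best first)."""
    samples = [[rng.uniform(LOW, HIGH) for _ in range(dims)] for _ in range(n_samples)]
    return sorted(samples, key=cost)


def coordinate_descent(cost, start, step=0.5, min_step=1e-3):
    """Phase 2: try +/- step per coordinate; halve the step when nothing improves."""
    best, best_cost = list(start), cost(start)
    while step >= min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                cand[i] = min(HIGH, max(LOW, cand[i] + delta))  # stay inside bounds
                c = cost(cand)
                if c < best_cost:
                    best, best_cost, improved = cand, c, True
        if not improved:
            step /= 2
    return best, best_cost


def calibrate(cost, n_samples=2000, top_k=5, seed=42):
    """Refine the top_k random candidates and return (weights, cost) of the best."""
    rng = random.Random(seed)
    ranked = random_search(cost, n_samples, rng)
    refined = [coordinate_descent(cost, w) for w in ranked[:top_k]]
    return min(refined, key=lambda pair: pair[1])


def toy_cost(w):
    """Stand-in objective (NOT the real MAD): distance to an arbitrary target."""
    return abs(w[0] - 4.00) + abs(w[1] - 2.17) + abs(w[2] - 1.46)
```

Calling `calibrate(toy_cost)` returns a weight vector close to the toy target. In the real loop the cost function would be the MAD over the synthetic corpus, and a fixed seed keeps the run reproducible.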

Safety constraints

The calibrator is LOCKED — the agent must not edit calibrate_signals.py. The synthetic corpus generator + MAD cost function live in one file by design; separation by structure prevents gaming.

No parameters inside the cost function — evaluate_mad() takes only a weight vector and pure-reads signal_count × weight with per-item normalization.

Escalation strategy

Experiments	Focus	Parameters
1–30	signal weights (Monte Carlo)	3 parameters, random sampling
31–50	signal weights (Nelder-Mead refinement)	local descent around MC top-K

CW Heuristic

Default weights for automatic Contribution Weight assignment based on GitHub signals.

Derived role labels (display layer only)

EDPA doesn't store roles per person; it derives them from the dominant signal type for UX purposes (timesheets, reports). The engine math sees only cw values.

Dominant signal	Display role
`commit_author`	owner
`manual:*`	key
`pr_reviewer`	reviewer
`issue_comment`	consulted

Signal weights

Signal	Score
`commit_author`	4.00
`pr_reviewer`	2.17
`issue_comment`	1.46
`manual:*` (/contribute)	explicit

Rule: Signals are summed into contribution_score per (person, item). Per-item normalization then gives cw = score / Σ_persons score. There is no "highest signal wins" and no threshold: even a single comment yields a proportional share.

Monte Carlo calibration

1,000 synthetic scenarios (32,210 records); 5 candidates converged to MAD 0.0805 after Nelder-Mead refinement.

Strategic-role bias correction: EDPA handles strategic-role bias purely through signal weight calibration:

BO/PM contributions show up via issue_comment + manual /contribute directives. The calibrator boosts the issue_comment weight if BO/PM are under-credited.
Arch contributions show up via pr_reviewer. The calibrator boosts the pr_reviewer weight in the same way.
Dev/QA are the reference baseline: Git accurately captures their work.
The edge-case generator simulates pm_driven, pair_programmed, design_heavy, and silent_reviewer patterns so that calibration generalizes.

Note: The pre-pilot baseline comes from synthetic scenarios. After the kashealth pilot's first PI close, calibration will run against real ground truth (≥20 confirmed cw records).

Method comparison

Criterion	EDPA v2.11.1	Manual timesheets	Fixed allocation
Accuracy	High	Medium	Low
Effort	Minimal	High	None
Auditability	Full	Partial	None
Dual-view	Yes	No	No
Math. guarantee	Σ = capacity	None	Complex
Automation	GitHub Actions	Manual	Partial

Demo calculation

Static demonstration of EDPA calculation for 3 people and 5 work items. Operational variant (Simple mode).

Capacity

Person	FTE	Capacity (h)
Alice	0.5	40h
Bob	1.0	80h
Carol	0.75	60h

Work items & assignments (CW per-item normalized, Σ per row = 1.0)

Item	JS	Alice (CW)	Bob (CW)	Carol (CW)	Σ
`S-101`	5	0.70	0.30	—	1.00
`S-102`	3	0.55	0.45	—	1.00
`S-103`	8	—	0.65	0.35	1.00
`S-104`	2	—	0.30	0.70	1.00
`S-105`	5	—	—	1.00	1.00

Score calculation (Score = JS × CW)

Item	JS	Alice Score	Bob Score	Carol Score
`S-101`	5	3.50	1.50	—
`S-102`	3	1.65	1.35	—
`S-103`	8	—	5.20	2.80
`S-104`	2	—	0.60	1.40
`S-105`	5	—	—	5.00
Σ		5.15	8.65	9.20

Derived Hours (DH = Score / ΣScore × Capacity)

Item	Alice (40h)	Bob (80h)	Carol (60h)
`S-101`	27.18h	13.87h	—
`S-102`	12.82h	12.49h	—
`S-103`	—	48.09h	18.26h
`S-104`	—	5.55h	9.13h
`S-105`	—	—	32.61h
Σ	40.00h	80.00h	60.00h
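
The three tables above can be reproduced with a few lines of Python. This is a sketch of the demo arithmetic only, using the capacities, Job Sizes, and CW values from the tables; the names `CAPACITY`, `JOB_SIZE`, `CW`, and `derive_hours` are illustrative.

```python
CAPACITY = {"Alice": 40, "Bob": 80, "Carol": 60}
JOB_SIZE = {"S-101": 5, "S-102": 3, "S-103": 8, "S-104": 2, "S-105": 5}
CW = {  # per-item normalized contribution weights (each row sums to 1.0)
    "S-101": {"Alice": 0.70, "Bob": 0.30},
    "S-102": {"Alice": 0.55, "Bob": 0.45},
    "S-103": {"Bob": 0.65, "Carol": 0.35},
    "S-104": {"Bob": 0.30, "Carol": 0.70},
    "S-105": {"Carol": 1.00},
}


def derive_hours() -> dict[str, dict[str, float]]:
    """DH = Score / sum(Score) x Capacity, with Score = JS x CW, per person."""
    result = {}
    for person, capacity in CAPACITY.items():
        scores = {item: JOB_SIZE[item] * shares[person]
                  for item, shares in CW.items() if person in shares}
        total = sum(scores.values())
        result[person] = {item: s / total * capacity for item, s in scores.items()}
    return result


if __name__ == "__main__":
    for person, hours in derive_hours().items():
        line = ", ".join(f"{item}: {h:.2f}h" for item, h in hours.items())
        print(f"{person}: {line} (total {sum(hours.values()):.2f}h)")
```

Each person's hours are fractions of that person's own score total, which is why every column sums to the declared capacity regardless of how the other people's weights are distributed.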

Alice

Σ = 40.00h

Capacity: 40h

VERIFIED

Bob

Σ = 80.00h

Capacity: 80h

VERIFIED

Carol

Σ = 60.00h

Capacity: 60h

VERIFIED

Σ DerivedHours[P, *] = Capacity[P, I] holds for every person. Always. No exceptions.
