314 / 314
PASSED

114 test scenarios across 8 categories. Per-person tests (invariants, dual-view, edge cases) run for each of the 6 team members → 314 total checks. All passing.

Alice (Arch, 60h) ✓  Bob (Dev, 80h) ✓  Carol (Dev, 60h) ✓  Dave (DevOps, 40h) ✓  Eve (PM, 40h) ✓  Frank (Dev, 40h) ✓

1. Mathematical Invariants — 10 scenarios × 6 persons = 60 checks

60/60 PASS

Fundamental mathematical guarantees of the EDPA engine — validated on every build. These tests verify that the model never breaks its key promises.

Each scenario runs per person: Alice, Bob, Carol, Dave, Eve, Frank — 60 checks total.
01 test_sum_equals_capacity PASS

Derived hours must match the declared capacity of the person.

Expected: Σ(hours) = capacity ± 0.01h
02 test_ratio_sum_equals_one PASS

Ratios must sum to 1.0 for every person with items.

Expected: Σ(ratio) = 1.0 ± 0.001
03 test_no_negative_hours PASS

No person may have negative derived hours.

Expected: All hours ≥ 0
04 test_no_negative_scores PASS

No score may be negative.

Expected: All scores ≥ 0
05 test_score_formula_simple PASS

In simple mode: Score is calculated as JS multiplied by CW.

Expected: Score = JS × CW
06 test_score_formula_full PASS

In full mode: Score is calculated as JS multiplied by CW multiplied by RS.

Expected: Score = JS × CW × RS
07 test_full_mode_invariants PASS

Full mode also guarantees that sum equals capacity.

Expected: Full mode: Σ = capacity ± 0.01h
08 test_all_invariants_flag PASS

The invariant_ok flag must reflect the actual check results.

Expected: invariant_ok reflects actual checks
09 test_empty_items_no_crash PASS

A person with 0 items must get 0h, without crashing.

Expected: Person with 0 items → 0h, no crash
10 test_cw_ordering PASS

CW must preserve ordering: owner ≥ key ≥ reviewer ≥ consulted.

Expected: owner ≥ key ≥ reviewer ≥ consulted
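
The per-person checks above could be expressed roughly as follows. This is a minimal sketch, not the EDPA engine's code: the function name `check_invariants` and the input shape (lists of per-item hours and ratios for one person) are assumptions made for illustration; the tolerances are the ones stated in scenarios 01 and 02.

```python
def check_invariants(hours, ratios, capacity, hours_tol=0.01, ratio_tol=0.001):
    """Return per-check results for one person plus an overall invariant_ok flag.

    hours    -- derived hours per item (hypothetical input shape)
    ratios   -- share of the person's capacity per item
    capacity -- declared capacity in hours
    """
    has_items = len(hours) > 0
    # A person with no items gets 0h, so the capacity target only applies with items.
    expected_total = capacity if has_items else 0.0
    checks = {
        "sum_equals_capacity": abs(sum(hours) - expected_total) <= hours_tol,
        "ratio_sum_equals_one": (not has_items) or abs(sum(ratios) - 1.0) <= ratio_tol,
        "no_negative_hours": all(h >= 0 for h in hours),
    }
    # The flag is derived from the individual checks, never set independently.
    checks["invariant_ok"] = all(checks.values())
    return checks
```

Called as `check_invariants([29.41, 10.59], [0.7353, 0.2647], 40.0)`, every check passes; a person with no items passes with empty lists.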

2. Evidence Detection — 15 scenarios

15/15 PASS

Verifies that GitHub signals are detected correctly and mapped to the right Evidence Score and Contribution Weight. Each signal must be correctly identified and scored.

01 test_assignee_detection_cw PASS

Issue assignee must be detected as owner with CW = 1.0.

Expected: assignee signal → CW = 1.0 (owner)
02 test_pr_author_without_assignee PASS

PR author without assignee role must get CW = 0.6 (key contributor).

Expected: pr_author signal → CW = 0.6 (key)
03 test_commit_author_only PASS

Person with only a commit (no assignee/PR) gets CW = 0.25.

Expected: commit_author signal → CW = 0.25 (reviewer)
04 test_pr_reviewer_detection PASS

PR reviewer must be detected with CW = 0.25 (reviewer role).

Expected: pr_reviewer signal → CW = 0.25 (reviewer)
05 test_issue_comment_only PASS

Person with only an issue comment gets CW = 0.15 (consulted).

Expected: issue_comment signal → CW = 0.15 (consulted)
06 test_multiple_signals_highest_wins PASS

With multiple signals (assignee + commit) the strongest signal wins.

Expected: assignee + commit → CW = 1.0 (highest wins)
07 test_contribute_command_detection PASS

/contribute command in issue body must be detected with CW = 0.6.

Expected: /contribute @user → CW = 0.6 (key)
08 test_contribute_weight_override PASS

/contribute with explicit weight overrides automatically detected CW.

Expected: /contribute @user weight:0.8 → CW = 0.8
09 test_branch_naming_story_extraction PASS

Branch feature/S-200-omop-parser must extract reference S-200.

Expected: Branch regex: S-\d+ → S-200
10 test_branch_naming_feature_extraction PASS

Branch feature/F-15-auth-module must extract reference F-15.

Expected: Branch regex: F-\d+ → F-15
11 test_branch_naming_epic_extraction PASS

Branch epic/E-3-platform must extract reference E-3.

Expected: Branch regex: E-\d+ → E-3
12 test_no_matching_signals_excluded PASS

Person with no signals on an item must not be assigned.

Expected: No signals → person excluded from item
13 test_evidence_score_threshold PASS

An Evidence Score below the threshold (< 1.0) excludes the person from the item.

Expected: ES < threshold → excluded
14 test_in_progress_items_excluded PASS

Items in In-Progress status are not included in iteration calculation.

Expected: Status: In-Progress → excluded from calculation
15 test_commit_count_no_time_effect PASS

Commit count does not affect time (only relevance). 1 commit = 10 commits for CW.

Expected: commit_count independent of time allocation
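
Scenarios 07 to 11 describe two text-parsing steps: pulling a work-item reference out of a branch name and reading a /contribute command. The sketch below shows one way this could look. The regular expressions, function names, and the `weight:` syntax handling are illustrative assumptions based on the examples in the scenarios, not the project's actual parser.

```python
import re

# Work-item references as used in the branch-naming scenarios: S-200, F-15, E-3.
ITEM_REF = re.compile(r"\b([SFE]-\d+)\b")
# "/contribute @user" with an optional "weight:0.8" suffix (scenarios 07 and 08).
CONTRIBUTE = re.compile(r"/contribute\s+@([\w-]+)(?:\s+weight:(\d+(?:\.\d+)?))?")

DEFAULT_CONTRIBUTE_CW = 0.6  # key contributor, per scenario 07


def extract_item_ref(branch):
    """Return the first work-item reference in a branch name, or None."""
    match = ITEM_REF.search(branch)
    return match.group(1) if match else None


def parse_contribute(text):
    """Return {user: cw} for every /contribute command found in text."""
    result = {}
    for user, weight in CONTRIBUTE.findall(text):
        # An explicit weight overrides the default CW for the command.
        result[user] = float(weight) if weight else DEFAULT_CONTRIBUTE_CW
    return result
```

With these definitions, `extract_item_ref("feature/S-200-omop-parser")` yields `"S-200"`, and a body containing `/contribute @carol weight:0.8` maps `carol` to 0.8.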

3. CW Heuristics — 18 scenarios

18/18 PASS

Verification of heuristic weight correctness and rules for determining Contribution Weight. The heuristic must be consistent and reproducible.

01 test_default_role_weight_owner PASS

Default weight for owner role must be 1.0.

Expected: role_weights.owner = 1.0
02 test_default_role_weight_key PASS

Default weight for key role must be 0.6.

Expected: role_weights.key = 0.6
03 test_default_role_weight_reviewer PASS

Default weight for reviewer role must be 0.25.

Expected: role_weights.reviewer = 0.25
04 test_default_role_weight_consulted PASS

Default weight for consulted role must be 0.15.

Expected: role_weights.consulted = 0.15
05 test_signal_weights_assignee PASS

Assignee signal must have Evidence Score +4.0.

Expected: signals.assignee = 4.0
06 test_signal_weights_contribute PASS

Contribute_command signal must have Evidence Score +3.0.

Expected: signals.contribute_command = 3.0
07 test_signal_weights_pr_author PASS

PR author signal must have Evidence Score +2.0.

Expected: signals.pr_author = 2.0
08 test_signal_weights_commit PASS

Commit author signal must have Evidence Score +1.0.

Expected: signals.commit_author = 1.0
09 test_highest_signal_determines_cw PASS

CW is determined by the strongest signal, not the sum — no signal summing for CW.

Expected: CW = role_weights[highest_signal], no summing
10 test_manual_override_precedence PASS

Manual /contribute override must take precedence over auto-detection.

Expected: manual_cw ≠ null → use manual_cw
11 test_cw_strict_ordering PASS

CW ordering of the default weights must be strict: owner > key > reviewer > consulted.

Expected: 1.0 > 0.6 > 0.25 > 0.15 (strict)
12 test_cw_minimum_floor PASS

Minimum CW is 0.15 (consulted floor) — no person may have lower CW.

Expected: CW ≥ 0.15 (consulted floor)
13 test_cw_maximum_ceiling PASS

Maximum CW is 1.0 (owner ceiling) — no automatic weight exceeds 1.0.

Expected: CW ≤ 1.0 (owner ceiling)
14 test_rs_normalization PASS

Relevance Signal is normalized: RS = min(ES/maxES, 1.0).

Expected: RS = min(ES / max_ES, 1.0)
15 test_rs_range_validation PASS

RS must be in range 0 to 1.0 — never negative, never greater than 1.

Expected: 0 ≤ RS ≤ 1.0
16 test_multiple_people_same_item PASS

Multiple people on the same item must have independent CW for each person.

Expected: CW[P1, item] independent of CW[P2, item]
17 test_same_person_multiple_items PASS

Same person on multiple items must have independent CW for each item.

Expected: CW[P, item1] independent of CW[P, item2]
18 test_architecture_role_detection PASS

Architecture/PM role detected via comments + /contribute command.

Expected: comment + /contribute → key/consulted role
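
Scenarios 10, 14, and 15 reduce to a few lines of arithmetic, sketched below. The function names are hypothetical, `max_es` is treated as a plain parameter because the text does not define it further, and returning 0 for a non-positive `max_es` is an assumption of this sketch.

```python
def relevance_signal(es, max_es):
    """RS = min(ES / max_ES, 1.0), kept inside the range [0, 1]."""
    if max_es <= 0:
        return 0.0  # assumption: no usable maximum means no relevance
    return max(0.0, min(es / max_es, 1.0))


def effective_cw(auto_cw, manual_cw=None):
    """A manual /contribute weight, when present, wins over the detected CW."""
    return manual_cw if manual_cw is not None else auto_cw


def score(js, cw, rs=None):
    """Simple mode: JS x CW. Full mode (rs given): JS x CW x RS."""
    return js * cw if rs is None else js * cw * rs
```

For example, `relevance_signal(6.0, 4.0)` is capped at 1.0, and `effective_cw(0.25, 0.8)` returns the manual 0.8.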

4. Dual-View Consistency — 12 scenarios × 6 persons = 72 checks

72/72 PASS

EDPA provides two views — per-person and per-item. Both must be mutually consistent and sums must match in both directions.

Each scenario runs per person: Alice, Bob, Carol, Dave, Eve, Frank — 72 checks total.
01 test_per_person_sum_equals_capacity PASS

Per-person view: sum of derived hours = capacity for every person.

Expected: Σ DerivedHours[P, *] = Capacity[P]
02 test_per_item_shares_sum_100 PASS

Per-item view: sum of shares of all contributors = 100% for every item.

Expected: Σ shares[*, item] = 100%
03 test_same_cw_same_results_both_views PASS

Same CW must produce same results in both views.

Expected: per-person hours consistent with per-item shares
04 test_mode_switch_preserves_guarantee PASS

Switching mode simple → full preserves the Σ = Capacity guarantee.

Expected: simple → full: Σ = Capacity still holds
05 test_per_person_hours_sum_cross_items PASS

Per-person: hours on item X + hours on all other items = total capacity.

Expected: hours[P, X] + hours[P, rest] = capacity[P]
06 test_zero_contribution_excluded_both_views PASS

Items with zero contribution do not appear in either view.

Expected: zero contribution → absent in both views
07 test_single_contributor_full_share PASS

Single contributor on an item gets 100% share in per-item view.

Expected: single contributor → 100% share
08 test_two_equal_contributors_equal_split PASS

Two contributors with equal CW get 50/50 split in per-item view.

Expected: equal CW → 50/50 share split
09 test_capacity_no_affect_per_item_share PASS

Different capacities do not affect percentage share in per-item view.

Expected: capacity[P1] ≠ capacity[P2] → share% unchanged
10 test_cross_check_hours_vs_capacities PASS

Cross-check: sum of all per-item hours across all items ≤ sum of all capacities.

Expected: ΣΣ hours[P, item] ≤ Σ capacity[P]
11 test_three_contributors_weighted_split PASS

Three contributors with CW 1.0, 0.6, 0.25 — shares match weight ratio.

Expected: 1.0:0.6:0.25 → 54%:32%:14% share
12 test_per_item_hours_sum_matches_js_proportion PASS

Sum of hours on an item from all persons matches the Job Size proportion in total budget.

Expected: item hours reflect JS weight in total budget
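
The per-item split in scenarios 07, 08, and 11 can be checked by hand: each share is the contributor's CW divided by the sum of CWs on that item. The helper below is a sketch under that reading (simple mode, CW only); the name `per_item_shares` and the dictionary input are assumptions.

```python
def per_item_shares(cw_by_person):
    """Percentage share of each contributor on one item, proportional to CW."""
    total = sum(cw_by_person.values())
    if total == 0:
        return {}  # nobody contributed: the item appears in neither view
    return {person: cw / total * 100.0 for person, cw in cw_by_person.items()}
```

For CW values 1.0, 0.6, and 0.25 the total is 1.85, giving 54.05%, 32.43%, and 13.51%, which round to the 54%:32%:14% quoted in scenario 11.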

5. Edge Cases — 18 scenarios × 6 persons = 108 checks

108/108 PASS

Boundary cases and extreme scenarios that the EDPA engine must handle without crashing, with correct results and no precision loss.

Each scenario runs per person: Alice, Bob, Carol, Dave, Eve, Frank — 108 checks total.
01 test_person_zero_relevant_items PASS

Person with 0 relevant items must get 0 hours without crashing.

Expected: 0 items → 0h, no crash
02 test_person_single_item_full_capacity PASS

Person with one item must get full capacity.

Expected: 1 item → hours = capacity
03 test_all_items_same_job_size PASS

All items with the same Job Size — hours distributed only by CW.

Expected: same JS → distribution by CW only
04 test_all_people_same_cw_on_item PASS

All people with the same CW on an item — hours proportional to capacity.

Expected: same CW → hours proportional to capacity
05 test_job_size_zero_excluded PASS

Item with Job Size = 0 must be excluded from calculation (no division by zero).

Expected: JS = 0 → item excluded, no division by zero
06 test_single_person_team PASS

Single-person team: person gets full capacity regardless of CW.

Expected: single person → full capacity
07 test_hundred_items_capacity_sum PASS

100 items for one person — capacity must still sum correctly.

Expected: 100 items: Σ hours = capacity
08 test_max_job_size_allocation PASS

Maximum Job Size (20) must produce correct proportional allocation.

Expected: JS = 20 → correct proportional allocation
09 test_min_job_size_allocation PASS

Minimum Job Size (1) must produce correct proportional allocation.

Expected: JS = 1 → correct proportional allocation
10 test_all_cw_equal_distribution PASS

All CW = 1.0: hours are distributed in proportion to Job Size.

Expected: all CW = 1.0 → distribution proportional to JS
11 test_very_unequal_capacities PASS

Very unequal capacities (10h vs 160h) — each person sums to their own capacity.

Expected: 10h + 160h: each sums to own capacity
12 test_floating_point_precision PASS

Float precision: sum must be within 0.01h tolerance of capacity.

Expected: Σ within 0.01h tolerance
13 test_unicode_item_titles PASS

Unicode characters in item titles must not cause processing errors.

Expected: Unicode titles → no processing errors
14 test_empty_iteration_graceful PASS

Empty iteration (no stories) must be handled without crashing.

Expected: empty iteration → graceful handling
15 test_person_only_epic_feature PASS

Person only on Epic/Feature (no stories) must still get allocation.

Expected: Epic/Feature only → still gets allocation
16 test_negative_job_size_rejected PASS

Negative Job Size must be rejected — no negative allocation.

Expected: JS < 0 → item rejected
17 test_duplicate_person_on_item_no_double_count PASS

Duplicate signals from the same person on an item must not double the allocation.

Expected: duplicate signals → single CW entry
18 test_large_team_scaling PASS

20+ people in a team — calculation still converges and invariants hold.

Expected: 20+ people: all invariants hold
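
Several of these edge cases (scenarios 01, 02, 05, 07, 16) concern input handling before the allocation itself. The following sketch shows one possible arrangement; the tuple shape `(item_id, job_size, cw)` and both function names are hypothetical, and only simple mode is covered.

```python
def usable_items(items):
    """Drop zero-sized items and reject negative Job Sizes before allocation.

    items -- list of (item_id, job_size, cw) tuples (hypothetical shape)
    """
    kept = []
    for item_id, js, cw in items:
        if js < 0:
            raise ValueError(f"{item_id}: negative Job Size {js}")
        if js == 0:
            continue  # excluded, so it can never produce a zero denominator
        kept.append((item_id, js, cw))
    return kept


def allocate(items, capacity):
    """Distribute capacity over items in proportion to JS x CW (simple mode)."""
    scores = {item_id: js * cw for item_id, js, cw in usable_items(items)}
    total = sum(scores.values())
    if total == 0:
        return {}  # no relevant items: 0h, no crash
    return {item_id: s / total * capacity for item_id, s in scores.items()}
```

A single item receives the full capacity whatever its CW, and 100 items still sum to the capacity within the 0.01h tolerance.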

6. Auto-calibration — 12 scenarios

12/12 PASS

Verification of the auto-calibration system inspired by Karpathy's autoresearch pattern. Calibration must be safe, reproducible, and efficient.

01 test_minimum_ground_truth_records PASS

Calibration requires a minimum of 20 manually confirmed CW records.

Expected: len(ground_truth) ≥ 20 required
02 test_mad_computation_correctness PASS

MAD (Mean Absolute Deviation) is correctly computed as mean of |auto_cw - confirmed_cw|.

Expected: MAD = mean(|auto_cw - confirmed_cw|)
03 test_lower_mad_better PASS

Lower MAD = better heuristic. Optimization direction must be "lower is better".

Expected: direction: lower is better
04 test_evaluator_locked PASS

Evaluator (evaluate_cw.py) is locked — must not be modified by the optimizer.

Expected: evaluate_cw.py: LOCKED, read-only
05 test_single_change_per_iteration PASS

Each experiment changes only one parameter — isolation of change effects.

Expected: one parameter change per experiment
06 test_git_commit_after_experiment PASS

A git commit is made after each experiment — memory = git log.

Expected: git commit after each experiment
07 test_revert_on_worse_mad PASS

When MAD worsens, the experiment is reverted (git reset --hard HEAD~1).

Expected: MAD worse → experiment reverted (git reset --hard HEAD~1)
08 test_keep_on_better_or_equal_mad PASS

When MAD improves or stays the same, the experiment is kept.

Expected: MAD better/equal → keep commit
09 test_budget_50_experiments_max PASS

Maximum budget is 50 experiments — protection against infinite loops.

Expected: budget ≤ 50 experiments
10 test_expected_improvement_range PASS

Expected improvement is 15–30% MAD reduction after 50 experiments.

Expected: expected: 15-30% MAD reduction
11 test_ground_truth_format_validation PASS

Ground truth records must contain: item_id, person_id, evidence_role, auto_cw, confirmed_cw.

Expected: required fields: item_id, person_id, evidence_role, auto_cw, confirmed_cw
12 test_no_data_leakage PASS

No data leakage between training and validation sets — strict separation.

Expected: no data leakage between train/validation
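
The metric in scenarios 01, 02, and 11 is simple enough to write down. The sketch below is an illustration only and is not the locked evaluate_cw.py; the function name and the use of plain dictionaries for ground-truth records are assumptions.

```python
REQUIRED_FIELDS = {"item_id", "person_id", "evidence_role", "auto_cw", "confirmed_cw"}
MIN_RECORDS = 20  # minimum number of manually confirmed CW records


def mean_absolute_deviation(records):
    """MAD = mean(|auto_cw - confirmed_cw|) over validated ground-truth records."""
    if len(records) < MIN_RECORDS:
        raise ValueError(f"need at least {MIN_RECORDS} records, got {len(records)}")
    for record in records:
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
    return sum(abs(r["auto_cw"] - r["confirmed_cw"]) for r in records) / len(records)
```

Lower values mean the automatic CW sits closer to what the team confirmed by hand.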

7. Governance & Audit — 17 scenarios

17/17 PASS

Verification of audit trail, freeze rules, governance processes, and compliance requirements. EDPA must be fully auditable and reproducible.

01 test_snapshot_frozen_after_close PASS

Snapshot is frozen after Iteration Close — must not be modified.

Expected: snapshot.frozen = true after close
02 test_frozen_snapshot_immutable PASS

Frozen snapshot must not be modified in-place.

Expected: frozen snapshot: no in-place modification
03 test_corrections_create_new_revision PASS

Corrections create a new revision (_rev2, _rev3), never overwrite the original.

Expected: correction → new revision (_rev2, _rev3)
04 test_snapshot_required_fields PASS

Snapshot must contain all 10 required top-level keys.

Expected: 10 required keys present in snapshot
05 test_branch_naming_enforced PASS

Branch naming convention: {type}/{ITEM-ID}-description must be enforced.

Expected: branch: {type}/{ITEM-ID}-description
06 test_pr_references_work_item PASS

PR must reference a work item (S-XXX, F-XXX, E-XXX) in title or body.

Expected: PR references: S-XXX, F-XXX, or E-XXX
07 test_traceability_chain PASS

Full traceability chain: Initiative → Epic → Feature → Story → PR → Commit.

Expected: Initiative → Epic → Feature → Story → PR → Commit
08 test_wsjf_calculation PASS

WSJF is correctly calculated as (BV + TC + RR) / JS.

Expected: WSJF = (BV + TC + RR) / JS
09 test_job_size_guardrails_story PASS

Job Size guardrails for Story: JS ≤ 8 (ideally ≤ 5).

Expected: Story JS ≤ 8 (recommended ≤ 5)
10 test_job_size_guardrails_feature PASS

Job Size guardrails for Feature: JS ≤ 13.

Expected: Feature JS ≤ 13
11 test_job_size_guardrails_epic PASS

Job Size guardrails for Epic: JS ≤ 20.

Expected: Epic JS ≤ 20
12 test_dor_checklist_validation PASS

Definition of Ready checklist: description, AC, estimate, parent linked.

Expected: DoR: description, AC, estimate, parent linked
13 test_dod_checklist_validation PASS

Definition of Done checklist: code reviewed, tests passed, PR merged.

Expected: DoD: code reviewed, tests passed, PR merged
14 test_wip_limit_enforcement PASS

WIP limit: ideally 1 Story per person at any given time.

Expected: WIP limit: 1 Story per person (ideal)
15 test_bankid_signature_support PASS

BankID electronic signature support (Act 21/2020 Coll.).

Expected: BankID signature: Act 21/2020 Coll.
16 test_reproducible_calculation PASS

Reproducible calculation: same inputs must always produce same outputs.

Expected: same inputs → same outputs (deterministic)
17 test_audit_trail_five_pillars PASS

Audit trail covers 5 pillars: GitHub evidence, capacity, snapshot, reproducible calculation, signature.

Expected: 5 pillars: evidence, capacity, snapshot, calc, signature
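
Two of the governance rules are plain arithmetic: the WSJF formula (scenario 08) and the Job Size guardrails (scenarios 09 to 11). A minimal sketch, with hypothetical function names and the abbreviations BV, TC, RR, and JS kept as they appear above:

```python
# Upper Job Size limits per work-item type, from scenarios 09 to 11.
JS_LIMITS = {"Story": 8, "Feature": 13, "Epic": 20}


def wsjf(bv, tc, rr, js):
    """WSJF = (BV + TC + RR) / JS."""
    if js <= 0:
        raise ValueError("Job Size must be positive")
    return (bv + tc + rr) / js


def within_guardrail(item_type, js):
    """True if the Job Size respects the limit for its work-item type."""
    return js <= JS_LIMITS[item_type]
```

For instance, `wsjf(8, 5, 3, 4)` evaluates to 4.0, and a Story with JS = 13 fails the guardrail while an Epic with JS = 20 passes.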

8. Capacity Planning — 12 scenarios

12/12 PASS

Verification of the Iteration Planning Protocol — planning_factor as a team-level property, 80% rule, buffer usage tracking, and capacity commitment workflow.

01 test_planning_factor_team_level PASS

planning_factor must be a team-level property, not a cadence or person-level property.

Expected: teams[].planning_factor (not cadence, not person)
02 test_planning_factor_default PASS

Default planning_factor must be 0.8 (plan to 80% of total capacity).

Expected: planning_factor default = 0.8
03 test_planning_factor_range PASS

planning_factor must be in range (0, 1.0] — never zero, never above 100%.

Expected: 0 < planning_factor ≤ 1.0
04 test_planning_capacity_formula PASS

Planning Capacity = Total Capacity × planning_factor for each team.

Expected: Planning_Capacity = Σ Capacity[P] × planning_factor
05 test_different_teams_different_factors PASS

Different teams may have different planning_factor values.

Expected: teams[A].planning_factor ≠ teams[B].planning_factor allowed
06 test_edpa_uses_total_not_planning PASS

EDPA calculation always uses Total Capacity (100%), not Planning Capacity.

Expected: DerivedHours uses Capacity[P], not Planning_Capacity
07 test_buffer_absorbs_unplanned PASS

Buffer (20% by default) absorbs support, maintenance, incidents, and unplanned work.

Expected: buffer = Total - Planning → unplanned work
08 test_unplanned_items_generate_evidence PASS

Unplanned items in the buffer generate evidence and are allocated normally by EDPA.

Expected: unplanned items → evidence → normal EDPA allocation
09 test_capacity_confirmed_at_planning PASS

Each team member must confirm availability at Iteration Planning (availability: confirmed).

Expected: availability = confirmed required
10 test_planning_factor_no_affect_invariant PASS

planning_factor must not affect the mathematical guarantee Σ DerivedHours = Capacity.

Expected: planning_factor → no effect on Σ = Capacity
11 test_buffer_usage_metric PASS

Buffer_Usage metric tracks how much of the reserve was consumed by unplanned work.

Expected: Buffer_Usage = unplanned / (Total - Planning) × 100%
12 test_high_buffer_usage_warning PASS

Consistently high buffer usage (>90%) should trigger a warning to adjust capacity or scope.

Expected: Buffer_Usage > 90% → warning
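
The planning formulas in scenarios 03, 04, 11, and 12 can be sketched as below. Function names are hypothetical, the zero-buffer handling is an assumption, and the warning check looks at a single iteration only; detecting "consistently" high usage would need a history of iterations, which is outside this sketch.

```python
def planning_capacity(capacities, planning_factor=0.8):
    """Planning_Capacity = sum of Capacity[P] x planning_factor for one team."""
    if not 0 < planning_factor <= 1.0:
        raise ValueError("planning_factor must be in (0, 1.0]")
    return sum(capacities) * planning_factor


def buffer_usage(unplanned_hours, total, planning):
    """Buffer_Usage = unplanned / (Total - Planning) x 100%."""
    buffer_hours = total - planning
    if buffer_hours <= 0:
        return 0.0  # assumption: with no buffer there is nothing to consume
    return unplanned_hours / buffer_hours * 100.0


def needs_warning(usage_percent, limit=90.0):
    """True when buffer usage exceeds the 90% warning level."""
    return usage_percent > limit
```

For the six-person team listed at the top (320h in total), the default factor gives 256h of planning capacity and a 64h buffer; 60h of unplanned work is then 93.75% buffer usage, which is above the warning level.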

Auto-calibration (Karpathy loop)

Automatic calibration system inspired by Karpathy's autoresearch pattern. One file, one metric, one loop.

Configuration
Target: cw_heuristics.yaml
Metric: MAD (Mean Absolute Deviation)
Direction: lower is better
Budget: 50 experiments (~2h)
Memory: git log on calibration branch
Evaluator: evaluate_cw.py (LOCKED)
Expected results
  • Typical improvement: 15–30% MAD reduction
  • After 50 experiments: heuristic matches real team patterns
  • Diminishing returns after ~30 experiments
  • Prerequisite: ≥ 20 manually confirmed CW records

Loop

  1. git checkout -b calibration/{timestamp}
  2. For each experiment (1..budget):
    1. Load current heuristic + experiment history
    2. Propose ONE parameter change (threshold, weight, signal score)
    3. git commit -m "exp {n}: {param} {old} -> {new}"
    4. Run: python evaluate_cw.py --ground-truth ... --heuristics ...
    5. Parse MAD from output
    6. If MAD ≤ previous_best: KEEP | Otherwise: REVERT
    7. Log to calibration_log.tsv
  3. Print summary: initial MAD, final MAD, % improvement
  4. Ask user: merge calibration branch into main?
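
The control flow of the loop can be sketched without any git or file access. In the version below, `propose` and `evaluate` are hypothetical callables standing in for the parameter proposal step and for evaluate_cw.py; the real loop commits and reverts on a calibration branch, which this sketch replaces with an in-memory copy. Following test_keep_on_better_or_equal_mad, an experiment is kept when MAD improves or stays the same.

```python
def calibrate(heuristics, propose, evaluate, budget=50):
    """Greedy one-change-at-a-time search that minimises MAD.

    heuristics -- dict of parameter name -> value
    propose    -- callable(heuristics, history) -> (param, new_value)  [hypothetical]
    evaluate   -- callable(heuristics) -> MAD, standing in for evaluate_cw.py
    """
    best = dict(heuristics)
    best_mad = initial_mad = evaluate(best)
    history = []
    for n in range(1, budget + 1):
        param, new_value = propose(best, history)
        candidate = {**best, param: new_value}  # exactly one parameter changes
        mad = evaluate(candidate)
        kept = mad <= best_mad  # better or equal: keep; worse: revert
        history.append((n, param, best.get(param), new_value, mad, kept))
        if kept:
            best, best_mad = candidate, mad
    return best, initial_mad, best_mad, history
```

Because `evaluate` is passed in and never modified, the optimizer stays separated from the objective function, and each history row records the single change that was tried.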

Safety constraints

Evaluator is LOCKED — the agent must not edit evaluate_cw.py. Separation of optimizer from objective function.
One change per experiment — if you change 5 things, you don't know what worked.

Escalation strategy

Experiments  Focus                    Parameters
1–10         role_weights             4 parameters (highest impact)
11–25        signal weights           6 parameters
26–50        threshold + fine-tuning  combined tuning

CW Heuristic

Default weights for automatic Contribution Weight assignment based on GitHub signals.

Role weights
Role       CW
owner      1.0
key        0.6
reviewer   0.25
consulted  0.15
Signal weights
Signal              Score
assignee            4.0
contribute_command  3.0
pr_author           2.0
commit_author       1.0
pr_reviewer         1.0
issue_comment       0.5
Rule: The strongest signal determines CW. Evidence Score = sum of all signals. If Evidence Score ≥ threshold (default 1.0), the person is assigned to the item.
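
Applied literally, the rule could look like the sketch below. The signal-to-role mapping is taken from the Evidence Detection scenarios; the function name `assign_cw` is hypothetical, and counting each signal type once per person and item is an assumption of this sketch rather than something the rule states.

```python
SIGNAL_SCORES = {
    "assignee": 4.0, "contribute_command": 3.0, "pr_author": 2.0,
    "commit_author": 1.0, "pr_reviewer": 1.0, "issue_comment": 0.5,
}
# Signal -> role, as listed in the Evidence Detection scenarios.
SIGNAL_ROLES = {
    "assignee": "owner", "contribute_command": "key", "pr_author": "key",
    "commit_author": "reviewer", "pr_reviewer": "reviewer", "issue_comment": "consulted",
}
ROLE_WEIGHTS = {"owner": 1.0, "key": 0.6, "reviewer": 0.25, "consulted": 0.15}


def assign_cw(signals, threshold=1.0):
    """Return (evidence_score, cw); cw is None when the person is not assigned."""
    unique = set(signals)  # assumption: each signal type counts once
    evidence_score = sum(SIGNAL_SCORES[s] for s in unique)
    if not unique or evidence_score < threshold:
        return evidence_score, None
    # CW comes from the single strongest signal, never from the sum.
    strongest = max(unique, key=SIGNAL_SCORES.get)
    return evidence_score, ROLE_WEIGHTS[SIGNAL_ROLES[strongest]]
```

For a person who is both assignee and commit author, the Evidence Score is 5.0 while the CW stays at the owner weight of 1.0.
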
Monte Carlo calibration

Validated by Monte Carlo simulation (1,000 scenarios, 68,156 records, p<0.001).

Git measures activity, not value. Strategic roles (BO, PM, Arch) are systematically undervalued.

Role             Bias
Business Owner   +0.15
Product Manager  +0.05
Architect        +0.05
Developer        0.00

Method comparison

Criterion        EDPA v1.0.0     Manual timesheets  Fixed allocation
Accuracy         High            Medium             Low
Effort           Minimal         High               None
Auditability     Full            Partial            None
Dual-view        Yes             No                 No
Math. guarantee  Σ = capacity    None               Complex
Automation       GitHub Actions  Manual             Partial

Demo calculation

Static demonstration of EDPA calculation for 3 people and 5 work items. Operational variant (Simple mode).

Capacity

Person  FTE   Capacity (h)
Alice   0.5   40h
Bob     1.0   80h
Carol   0.75  60h

Work items & assignments

Item   JS  Alice (CW)   Bob (CW)         Carol (CW)
S-101  5   1.0 (owner)  0.25 (reviewer)  -
S-102  3   0.6 (key)    1.0 (owner)      -
S-103  8   -            1.0 (owner)      0.6 (key)
S-104  2   -            0.25 (reviewer)  1.0 (owner)
S-105  5   -            -                1.0 (owner)

Score calculation (Score = JS × CW)

Item   JS  Alice Score  Bob Score  Carol Score
S-101  5   5.0          1.25       -
S-102  3   1.8          3.0        -
S-103  8   -            8.0        4.8
S-104  2   -            0.5        2.0
S-105  5   -            -          5.0
Σ          6.8          12.75      11.8

Derived Hours (DH = Score / ΣScore × Capacity)

Item   Alice (40h)  Bob (80h)  Carol (60h)
S-101  29.41h       7.84h      -
S-102  10.59h       18.82h     -
S-103  -            50.20h     24.41h
S-104  -            3.14h      10.17h
S-105  -            -          25.42h
Σ      40.00h       80.00h     60.00h
Alice: Σ = 40.00h, Capacity: 40h, VERIFIED
Bob: Σ = 80.00h, Capacity: 80h, VERIFIED
Carol: Σ = 60.00h, Capacity: 60h, VERIFIED

Σ DerivedHours[P, *] = Capacity[P, I]
Holds for every person. Always. No exceptions.
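
The demo can be recomputed in a few lines. The script below is a sketch that hard-codes the capacities, Job Sizes, and CW assignments from the tables in this section and applies Score = JS × CW and DH = Score / ΣScore × Capacity; the variable and function names are chosen for illustration only.

```python
CAPACITY = {"Alice": 40.0, "Bob": 80.0, "Carol": 60.0}
JOB_SIZE = {"S-101": 5, "S-102": 3, "S-103": 8, "S-104": 2, "S-105": 5}
CW = {
    "Alice": {"S-101": 1.0, "S-102": 0.6},
    "Bob": {"S-101": 0.25, "S-102": 1.0, "S-103": 1.0, "S-104": 0.25},
    "Carol": {"S-103": 0.6, "S-104": 1.0, "S-105": 1.0},
}


def derived_hours(person):
    """DH = Score / sum(Score) x Capacity, with Score = JS x CW (simple mode)."""
    scores = {item: JOB_SIZE[item] * cw for item, cw in CW[person].items()}
    total = sum(scores.values())
    return {item: s / total * CAPACITY[person] for item, s in scores.items()}


for person in CAPACITY:
    hours = derived_hours(person)
    # Print per-item hours and the per-person total, rounded to two decimals.
    print(person, {item: round(h, 2) for item, h in hours.items()},
          round(sum(hours.values()), 2))
```

Rounded to two decimals, the per-item values match the Derived Hours table, and each person's total equals their capacity.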