Techies Pulse — Assessment & Scoring Criteria

1How Scoring Works

From raw metric to a single, fair number — in four steps.

Step	What happens
1 · Measure	Claude Code audits GitHub and pulls each raw metric for the period.
2 · Normalize	Each metric is converted to a 0–100 score vs its target (direction-aware).
3 · Weight	Metric scores roll up to a category score; categories roll up to a composite.
4 · Band	The composite maps to a rating band and a trend arrow vs last period.

Thresholds and weights below are starting defaults — they are calibrated against a 4-week baseline before any score counts.

2Per-Metric Scoring (0–100)

Every metric becomes a 0–100 score. Direction matters: some metrics are better high, others better low.

2.1 Normalization formulas

Higher-is-better → score = min(100, (actual / target) × 100)

Lower-is-better → score = min(100, (target / actual) × 100)

Threshold (pass/fail band) → 100 if within target, scaled down by distance outside

Scores are capped at 100 — beating target earns a full mark, not bonus inflation. A floor of 0 applies. "Exceeds" is recognised at the band stage (§6), not by letting one metric exceed 100.

2.2 Metric scoring table

Metric	Category	Direction	Default target	Weight in cat.
Cycle time	Productivity	Lower	≤ 2 days	35%
Merged PR throughput	Productivity	Higher	baseline	25%
Time to first review	Productivity	Lower	≤ 4 hrs	25%
PR size (median)	Productivity	Lower	≤ 400 LOC	15%
Change failure rate	Quality	Lower	< 15%	35%
Review pass rate	Quality	Higher	≥ 85%	30%
Test coverage (changed)	Quality	Higher	≥ 80%	20%
21-day rework ratio	Quality	Lower	baseline	15%
AI-assisted PR share	AI adoption	Higher	growth	45%
Active usage (days/wk)	AI adoption	Higher	baseline	30%
Workflow contributions	AI adoption	Higher	qualitative	25%
CI pass rate	Delivery	Higher	≥ 90%	30%
Deploy frequency*	Delivery	Higher	≥ weekly	25%
Lead time for changes*	Delivery	Lower	baseline	25%
MTTR on incidents*	Delivery	Lower	< 24 hrs	20%

*Delivery metrics are largely team-level; individuals inherit the team score for these, so no one is graded on a deploy they didn't own.

3Category Weights & Composite

Category scores combine into one composite using the policy weights.

Category	Weight	Reads as
Code quality	30%	Correctness is the highest-weighted aspect
Productivity & velocity	25%	Flow of shipped value
Delivery & reliability	25%	DORA-style stability
AI adoption	20%	Capability, leading indicator

Composite = 0.30·Quality + 0.25·Productivity + 0.25·Delivery + 0.20·AI adoption

Quality-weighted on purpose: in an AI workflow, velocity is cheap and correctness is the constraint. The weights stop high output from masking high defect rates.

4Weekly Assessment Rubric every week

A fast pulse — flow and quality signals only. Coaching-grade, not a formal rating.

Scored subset: cycle time, throughput, time-to-first-review, change failure rate, review pass rate, CI pass rate, AI-assisted share.
Output: a weekly score (0–100) + a band + trend arrow vs last week, per developer.
Pairing check: if velocity is up but change-failure or review-pass is down, the week is flagged amber regardless of the headline score.
Purpose: drives the Monday 1:1 conversation; not used alone for performance decisions.

Weekly score = 0.40·Quality(subset) + 0.35·Productivity(subset) + 0.25·Delivery(subset)

AI adoption is reported weekly but lightly weighted in the weekly score — it's a slower-moving capability better judged monthly.

5Monthly Assessment Rubric every month

The official composite — all four categories, plus trend and a light qualitative overlay.

Full composite across all four categories using §3 weights.
Trend factor: a ±5-point adjustment for clear improvement or decline over the month's weeks (rewards trajectory, not just snapshot).
Qualitative overlay: manager note on workflow contributions, collaboration, and changelog quality — capped influence so the score stays mostly objective.
Self vs audited gap: the month's average gap between self-rated % and audited % is reported as a calibration signal.

Monthly score = Composite (§3) ± trend(≤5) + qualitative(≤5)

Dimension	Weekly	Monthly
Categories scored	3 (subset)	All 4
Trend factor	Arrow only	±5 points
Qualitative overlay	No	Yes (≤5)
Used for	Coaching	Review record

6Rating Bands

The composite maps to one of four bands.

90–100Exceeds

75–89Meets

60–74Developing

< 60Needs attention

Bands describe performance against target, not against peers — everyone can land in "Exceeds" in a strong month. "Developing" and below trigger a support plan, never an automatic penalty.

7Worked Example

One developer, one week. Placeholder numbers showing the math end-to-end.

Metric	Actual	Target	Metric score
Cycle time (lower)	2.4 d	2.0 d	83
Throughput (higher)	6 PRs	5 PRs	100 (capped)
Time to first review (lower)	5 hrs	4 hrs	80
Change failure rate (lower)	10%	15%	100 (within)
Review pass rate (higher)	78%	85%	92
CI pass rate (higher)	88%	90%	98

Quality(subset) = 0.54·100 + 0.46·92 ≈ 96
Productivity(subset) = weighted(83,100,80) ≈ 88
Delivery(subset) = 98
Weekly = 0.40·96 + 0.35·88 + 0.25·98 ≈ 93 → Exceeds ▲

Self-rated 90% vs audited 93% → small, well-calibrated gap. No flag.

8Guardrails & Overrides

Metric caps: no metric exceeds 100, so one inflated number can't carry a weak month.
Pairing rule: velocity scores are voided if change-failure rate is in the bottom band — speed never outscores breakage.
Small-sample handling: weeks with very few PRs are marked low-confidence and lean on the monthly view.
Banned inputs: commit count, lines of code, and hours never enter the score.
Human override: the manager can annotate or hold a score with a written reason; the system never auto-penalises.
Leave/PTO: absence periods are excluded, not scored as zero.

9Automation & Cadence

How Claude Code carries the scoring out automatically — approved and in rollout.

Weekly run

Every Friday EOD, the audit + weekly rubric runs per developer and team; outputs the weekly scorecard for Monday 1:1s.

Monthly run

First working day of the month, the four weekly results roll into the monthly composite + trend + qualitative; outputs the review-record scorecard.

9.1 Scoring command

# Claude Code command: /kpi-score  (period = week | month)
For AUTHOR=[user] (or --team) over PERIOD:
  1. Pull metrics via the /weekly-audit data.
  2. Normalize each to 0-100 (direction-aware, capped at 100).
  3. Roll up to category scores, then the weighted composite.
  4. Apply pairing rule, small-sample flags, leave exclusions.
  5. (month only) add trend ±5 and qualitative ≤5.
  6. Map to band; emit scorecard: metric | actual | target | score,
     category scores, composite, band, trend, self-vs-audited gap.

Configuration notes: default targets & weights, the ±5 trend/qualitative caps, and the band thresholds are approved as documented. Scheduling cadence (on-demand vs scheduled job) is confirmed per rollout plan.