Activity Report Tag Refactor

Ticket #149: Activity Report Tag Refactor
Type: Governance / Documentation / Quality Assurance
Affected Component: docs/activity_report/*.md, docs/fr/activity_report/*.md, scripts/tag_inventory.py, scripts/tag_group_by_category.py, scripts/migrate_tags.py, tests/test_activity_report_governance.py

1. Context and initial intent

This work started from a strategic objective: make activity-report tags reliable, comparable, and governable over time across EN and FR documentation.

At the beginning, the problem was not only naming inconsistency. It was a broader governance issue:

duplicate and ambiguous tags;
mixed casing and legacy aliases;
year tags mixed with thematic tags;
uneven report metadata quality;
no strict guardrail to prevent regressions.

The expected outcome was clear: move from ad hoc tagging to a canonical, enforceable taxonomy with verifiable quality gates.

2. From idea to framing (reflections and decisions)

Before implementation, a taxonomy design and alignment phase was completed.

Main decisions from the reflection and framing phase:

establish one canonical taxonomy in English;
classify tags by explicit categories (workstream, technical domain, quality, technology, temporal metadata, etc.);
define alias mapping for legacy tags;
cap report tags to maintain readability and signal quality;
validate EN first, then propagate to FR with parity.

This phase established the foundation for deterministic migration and testable governance.

3. Implementation path

3.1 Inventory and diagnostic

Three deliverables were produced so that decisions could rest on data:

inventory extraction of existing tags from reports;
grouping by canonical categories with alias normalization;
migration matrix to map legacy tags to canonical tags with explicit rationale.

This made inconsistencies visible and removed subjective arbitration during migration.
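The inventory step can be sketched as follows. This is a minimal illustration, not the content of scripts/tag_inventory.py: the function names, the input shape (a mapping from report file name to its raw tag list), and the sample data are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_inventory(report_tags: dict[str, list[str]]) -> Counter:
    """Count how many reports use each raw tag, exactly as written."""
    inventory: Counter = Counter()
    for tags in report_tags.values():
        inventory.update(set(tags))  # count a tag at most once per report
    return inventory


def find_casing_variants(inventory: Counter) -> dict[str, list[str]]:
    """Group raw tags that differ only by casing or surrounding whitespace."""
    groups: dict[str, set[str]] = defaultdict(set)
    for raw in inventory:
        groups[raw.strip().lower()].add(raw)
    # Keep only the keys that have more than one spelling in the corpus.
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# Hypothetical sample data, for illustration only.
sample = {
    "report-a.md": ["Python", "bugfix", "year-2025"],
    "report-b.md": ["python", "Bugfix", "docs"],
}
inventory = build_inventory(sample)
variants = find_casing_variants(inventory)
```

On this sample, variants lists the two spellings of python and of bugfix, which is the kind of evidence the migration matrix can then resolve with an explicit rationale.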

3.2 Canonical migration (EN then FR)

A dedicated script applied canonicalization at scale:

normalize tag casing and format;
resolve aliases toward canonical tags;
enforce category uniqueness;
enforce maximum tag count;
preserve temporal metadata in the right category.

Migration was executed first on EN reports, then aligned to FR mirror reports.
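The canonicalization steps listed above might look like the sketch below. It is not the actual scripts/migrate_tags.py: the alias table and the category subset are hypothetical, "category uniqueness" is read here as de-duplication plus the per-category cap from the appendix, and the sketch assumes that year tags count toward the overall limit and are kept ahead of thematic tags when the limit is reached.

```python
import re
from collections import Counter

# Hypothetical alias table and category subset, for illustration only.
ALIASES = {"docs": "documentation", "postgres": "postgresql", "ci": "ci-cd"}
CATEGORY_OF = {
    "automation": "Workstream", "debugging": "Workstream",
    "documentation": "Workstream", "migration": "Workstream",
    "bugfix": "ChangeType", "refactor": "ChangeType",
    "ci-cd": "TechnicalDomain",
    "python": "TechnologyLanguage", "postgresql": "TechnologyLanguage",
    "year-2025": "TemporalMetadata", "year-2026": "TemporalMetadata",
}
MAX_TAGS = 5
MAX_PER_CATEGORY = 3
TEMPORAL = "TemporalMetadata"


def canonicalize(raw_tags: list[str]) -> list[str]:
    """Normalize, resolve aliases, de-duplicate, and cap a report's tags."""
    thematic: list[str] = []
    temporal: list[str] = []
    per_category: Counter = Counter()
    for raw in raw_tags:
        # Normalize casing and format: lowercase, hyphen-separated.
        tag = re.sub(r"[\s_]+", "-", raw.strip().lower())
        tag = ALIASES.get(tag, tag)
        category = CATEGORY_OF.get(tag)
        if category is None:
            # Fail loudly instead of silently passing an unmapped tag.
            raise ValueError(f"no canonical mapping for tag: {raw!r}")
        if tag in thematic or tag in temporal:
            continue  # duplicate after normalization
        if category == TEMPORAL:
            temporal.append(tag)
        elif per_category[category] < MAX_PER_CATEGORY:
            per_category[category] += 1
            thematic.append(tag)
    # Assumption: year tags are preserved and count toward the cap.
    room = max(0, MAX_TAGS - len(temporal))
    return thematic[:room] + temporal
```

Dropping surplus tags by input order is a simplification. A real migration would more likely take the surviving tags from the migration matrix, where each mapping carries its rationale.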

3.3 Formal governance validation

A formal Pytest gate was added to validate each report against the governance rules listed in the appendix. During validation hardening, a hidden root cause was identified: several files contained a UTF-8 byte order mark (BOM) before the opening --- delimiter, causing false "missing frontmatter" diagnostics.

Resolution applied:

remove BOM from impacted files;
harden the governance test to tolerate BOM safely;
fail explicitly when frontmatter is missing or incomplete (no silent skip).

Validation was therefore upgraded from permissive behavior to strict and auditable control.
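A BOM-tolerant, fail-explicit frontmatter reader could be sketched as below. The function and exception names are hypothetical and this is not the code in tests/test_activity_report_governance.py. It relies on Python's standard utf-8-sig codec, which drops a leading BOM when one is present and otherwise behaves like plain UTF-8.

```python
import codecs


class FrontmatterError(ValueError):
    """Raised when a report has no usable frontmatter block."""


def read_frontmatter_block(raw: bytes) -> str:
    """Return the text between the opening and closing --- delimiters."""
    # "utf-8-sig" strips a leading BOM if there is one.
    text = raw.decode("utf-8-sig")
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise FrontmatterError("missing frontmatter: file does not start with ---")
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:index])
    # An opening delimiter without a closing one is an error, not a skip.
    raise FrontmatterError("incomplete frontmatter: closing --- not found")


# A report saved with a BOM parses the same way as one without.
with_bom = codecs.BOM_UTF8 + b"---\ntitle: Example\n---\nBody text\n"
without_bom = b"---\ntitle: Example\n---\nBody text\n"
```

Raising an exception in both failure cases means a governance test built on this helper cannot silently skip a report.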

4. Operational result

At closure, tagging was under full governance control:

canonical tags are enforced across EN and FR reports;
governance validation runs cleanly on the full report set;
Speckit artifacts (spec, plan, tasks) are aligned with execution reality;
taxonomy guidance is documented for maintenance continuity.

Final validation snapshot at closure: governance tests passed across the full report set (58/58), with no tagging-rule failure.

5. Lessons learned

Taxonomy work must start with inventory and diagnosis, not direct editing.
Governance quality depends on strict tests; permissive skips hide risk.
Invisible encoding artifacts (like BOM) can invalidate otherwise-correct metadata workflows.
Documentation governance is sustainable only when design, scripts, tests, and process artifacts stay synchronized.

6. Conclusion

This initiative did more than clean tags: it established a durable governance system.

The project now has a repeatable cycle: idea → policy → automation → formal validation, aligned with the Speckit approach and ready for upcoming extension phases.

Appendix: Canonical tag taxonomy (governance)

To ensure documentation compliance and durable governance, the following taxonomy has been adopted for all activity reports:

Governance rules

Maximum 5 tags per report.
Maximum 3 tags per canonical category (except TemporalMetadata).
Year tags (year-2025, year-2026) are allowed only in TemporalMetadata.
YAML frontmatter is mandatory for every report.
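The three tag rules above can be expressed as a small checker, sketched here under stated assumptions: the input shape (a report's tags already grouped by canonical category) and all names are hypothetical, and year tags are assumed to count toward the five-tag limit. The frontmatter rule is not covered because it applies before tags can be read at all.

```python
MAX_TAGS_PER_REPORT = 5
MAX_TAGS_PER_CATEGORY = 3
TEMPORAL = "TemporalMetadata"


def check_tag_rules(tags_by_category: dict[str, list[str]]) -> list[str]:
    """Return human-readable violations; an empty list means compliant."""
    violations: list[str] = []
    total = sum(len(tags) for tags in tags_by_category.values())
    if total > MAX_TAGS_PER_REPORT:
        violations.append(f"{total} tags exceed the limit of {MAX_TAGS_PER_REPORT}")
    for category, tags in tags_by_category.items():
        if category == TEMPORAL:
            continue  # exempt from the per-category cap
        if len(tags) > MAX_TAGS_PER_CATEGORY:
            violations.append(
                f"{category} has {len(tags)} tags, limit is {MAX_TAGS_PER_CATEGORY}"
            )
        for tag in tags:
            if tag.startswith("year-"):
                violations.append(f"year tag {tag} is outside {TEMPORAL}")
    return violations


compliant = {"Workstream": ["migration"], "TemporalMetadata": ["year-2025"]}
broken = {
    "Workstream": ["automation", "debugging", "documentation", "migration"],
    "ChangeType": ["year-2025", "bugfix"],
}
```

Returning a list of messages rather than a boolean lets a test report every broken rule for a file in a single failure.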

Canonical categories

Workstream: automation, debugging, documentation, migration, operations, governance
ChangeType: bugfix, feature, refactor, hardening, improvement
TechnicalDomain: frontend, backend, api, database, infrastructure, pipeline, security, data, ci-cd
QualityAttribute: reliability, performance, observability, scalability, maintainability, compliance, data-integrity
TechnicalExpertise: architecture, design-patterns, incident-response, cost-optimization
TechnologyLanguage: python, typescript, playwright, pytest, mkdocs, docker, postgresql, marketstack, github-actions, nginx
NonTechnicalExpertise: documentation, process-improvement, risk-management, business-analysis, decision-support
TemporalMetadata: year-2025, year-2026
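For tooling, the categories above can be held as a plain mapping and inverted to look up a tag's category. The sketch below copies the lists as written. The variable and function names are hypothetical and are not taken from the project scripts.

```python
TAXONOMY = {
    "Workstream": ["automation", "debugging", "documentation", "migration",
                   "operations", "governance"],
    "ChangeType": ["bugfix", "feature", "refactor", "hardening", "improvement"],
    "TechnicalDomain": ["frontend", "backend", "api", "database", "infrastructure",
                        "pipeline", "security", "data", "ci-cd"],
    "QualityAttribute": ["reliability", "performance", "observability", "scalability",
                         "maintainability", "compliance", "data-integrity"],
    "TechnicalExpertise": ["architecture", "design-patterns", "incident-response",
                           "cost-optimization"],
    "TechnologyLanguage": ["python", "typescript", "playwright", "pytest", "mkdocs",
                           "docker", "postgresql", "marketstack", "github-actions",
                           "nginx"],
    "NonTechnicalExpertise": ["documentation", "process-improvement", "risk-management",
                              "business-analysis", "decision-support"],
    "TemporalMetadata": ["year-2025", "year-2026"],
}


def categories_by_tag(taxonomy: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the taxonomy: each tag maps to every category that lists it."""
    index: dict[str, list[str]] = {}
    for category, tags in taxonomy.items():
        for tag in tags:
            index.setdefault(tag, []).append(category)
    return index


index = categories_by_tag(TAXONOMY)
# Tags listed under more than one category need an explicit tie-break rule.
shared = {tag: cats for tag, cats in index.items() if len(cats) > 1}
```

With the lists as given, shared contains a single entry: documentation, which appears under both Workstream and NonTechnicalExpertise. A reverse lookup therefore has to return a list of categories, or rely on a tie-break rule that is not stated here.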

Validation command

To verify report compliance:

.venv/Scripts/python.exe -m pytest -v tests/test_activity_report_governance.py

Migration matrix reference:

specs/004-activity-tags-refactor/tag-migration-matrix.csv