
Token noise cleanup

Ticket #245: Token noise cleanup before LLM Wiki implementation
Type: Governance / Automation / Quality
Affected Component: scripts/rotate_logs.py, .github/workflows/rotate-logs.yml, .github/agents/copilot-instructions.md, tests/test_copilot_scope_rule.py, docs/reports/coverage.xml, docs/reports/report.xml, docs/reports/report.jsonl, docs/reports/quality_score.json, docs/tests/index.md, docs/fr/tests/index.md


1. Context and Strategy

Concerned for some time about my token consumption on this project, I was looking for a way to optimize my interactions with GitHub Copilot agents. I initially considered building a RAG (Retrieval-Augmented Generation) pipeline, but abandoned that idea in favor of the LLM Wiki, which I judged simpler and better suited to a personal project.

The goal of the LLM Wiki is to create a structured Markdown knowledge base that the AI reads first, in a targeted way, before exploring any other documentation source. The expected result: faster orientation, more precise responses, and better capitalization on past technical decisions.

However, before building this LLM Wiki, I first had to reduce the documentation noise that inflates token consumption.

Why this prerequisite was strategic:

  • an LLM Wiki only creates value if the knowledge base read by the AI is clean and prioritized;
  • without cleanup, the AI consumes tokens on run artifacts, temporary files, and operational logs instead of focusing on business knowledge;
  • this noise increases cost, latency, and the risk of less relevant responses;
  • by cleaning first, I create a governed and stable foundation that then makes the LLM Wiki implementation more reliable, faster, and more cost-efficient.

2. Implemented Solution

The workstream was executed in phases, in line with spec 005-ai-token-noise-cleanup.

  • full definition and clarification of the spec (automatic rotation every 180 days, .gz archiving, permanent deletion of archives after 180 days, automated validation of the AI scope rule);
  • implementation of a dedicated script scripts/rotate_logs.py to:
    • keep only active entries from the last 180 days,
    • archive expired entries as .gz files in logs/archives/ with a date in the filename,
    • automatically purge outdated archives;
  • creation of the GitHub Actions workflow .github/workflows/rotate-logs.yml with:
    • semiannual scheduling,
    • manual trigger (workflow_dispatch),
    • simulation mode (dry_run) and retention parameter;
  • formalization of the AI scope rule in .github/agents/copilot-instructions.md. Concretely, this rule tells the AI what to read first and what to ignore: priority to project knowledge sources written in English (because the documentation is bilingual), explicit exclusion of generated artifacts, temporary files, and operational logs;
  • addition of a non-regression test tests/test_copilot_scope_rule.py to lock in the presence of this rule;
  • removal of temporary files validated as non-useful (tmp_*, pytest_result*.txt, gate_result.txt, requirements.txt.tmp).

3. Validation and Results

Execution validation (full recomputation):

  • Pytest: 123 tests passed, 1 warning, 0 failures;
  • regenerated reports:
    • docs/reports/report.xml (JUnit XML),
    • docs/reports/report.jsonl (detailed log),
    • docs/reports/coverage.xml (code coverage).

Validation of coverage:

  • application code coverage (code_source_simule/*): 67.58% (494/731 lines covered);
  • updated quality score (docs/reports/quality_score.json):
    • coverage: 67.58,
    • pytest: 100.0,
    • e2e: 100.0,
    • overall score: 83.79 (rounded to 84).
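The reported figures are internally consistent and can be reproduced in a few lines. Note that the aggregation formula is my assumption: averaging the coverage and pytest components is the only simple rule that matches the reported 83.79, but the actual formula in docs/reports/quality_score.json may differ.

```python
# Figures taken from the report above: 494/731 lines covered.
covered, total = 494, 731
coverage = round(100 * covered / total, 2)

# Assumption: the overall score averages the coverage and pytest components.
pytest_score = 100.0
overall = round((coverage + pytest_score) / 2, 2)
```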

Governance validation:

  • the AI scope rule is now automatically tested;
  • the log rotation/purge flow is industrialized;
  • spec 005-ai-token-noise-cleanup has moved to Implemented status.

4. Business Impact

This cleanup directly improves the future profitability of the LLM Wiki:

  • fewer tokens wasted on low-value files;
  • better documentation signal for the AI;
  • more targeted and more consistent responses;
  • a more maintainable knowledge base over time.

In summary, this work turns the repository into a foundation ready for the LLM Wiki phase, with stronger cost/quality control from the outset.


5. Limits and Watchpoints

  • scenario SC-005 (post-implementation AI test session) remains a behavioral validation that must be run explicitly after your review;
  • the logs/ folder is ignored by Git locally, which is consistent with its operational nature. The expected discipline: let the scheduled workflow run on schedule, trigger it manually after an incident or an unusually large run, and check the execution summary to confirm that rotation and purge were applied correctly.

6. Lessons Learned

Here is a summary of token consumption during this working session:

  • Estimated total consumption: 19,602,640 tokens across 212 LLM requests;
  • Input/output split: 19,471,947 input tokens (~99.33%) vs 130,693 output tokens (~0.67%);
  • Main contributors:
    • main session: 19,395,379 tokens (~98.94% of total),
    • AI coach subagent: 206,952 tokens (~1.06%),
    • UI title generation: 309 tokens (~0.00%, negligible);
  • Most consuming models:
    • gpt-5.3-codex: 12,411,325 tokens (~63.31%),
    • claude-sonnet-4.6: 7,191,315 tokens (~36.69%);
  • Items that consumed the most tokens during this session:
    • requests with very large accumulated context (113 requests >= 80,000 input tokens);
    • reinjection of large context blocks and tool outputs throughout iterations, especially after read_file (88 calls) and run_in_terminal (61 calls);
    • inclusion of long attachments (12 requests), which remains secondary but still contributes to context growth;
  • Operational conclusion:
    • the session achieved its business objective, but token cost remains mostly driven by input context size;
    • the most direct lever for the next session is to reduce the size of injected excerpts and close each step with a short summary instead of accumulating raw outputs.
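As a closing sanity check, the reported token figures are arithmetically consistent with one another: the input/output split and the per-model breakdown both sum to the session total, and the percentages follow directly. All numbers below are copied verbatim from the session report; only the variable names are mine.

```python
# Totals copied verbatim from the session report above.
total = 19_602_640
inputs, outputs = 19_471_947, 130_693
by_model = {"gpt-5.3-codex": 12_411_325, "claude-sonnet-4.6": 7_191_315}

assert inputs + outputs == total        # input/output split is consistent
assert sum(by_model.values()) == total  # per-model figures cover the total

input_share = round(100 * inputs / total, 2)
codex_share = round(100 * by_model["gpt-5.3-codex"] / total, 2)
```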