AI usage coach — session collection hardening

Ticket #232: Follow up on AI usage coach — session collection hardening and external alternative experiment
Type: Automation / Workflow / Governance
Affected Component: .github/ai-coach.config.json, scripts/hooks/ai_coach_session_start.ps1, scripts/hooks/ai_coach_append_event.ps1, .github/agents/ai-coach.agent.md, .github/prompts/ai-coach-report.prompt.md, tests/test_ai_coach_infrastructure.py, logs/ai_coach_active_session.json


1. Context and objective

The AI usage coach introduced in ticket #224 was designed to analyze work sessions and help improve prompt quality, structure interactions with the AI more efficiently, and reduce token waste.

This session exposed a key limitation: the existing setup was correctly creating a session, but was not capturing enough information to produce a genuinely useful session report. The goal of this intervention was therefore to make the collection layer actually valuable, without switching to a permanent full transcript.

In parallel, an external track was opened: the GitHub Copilot usage insights extension was installed in order to experiment with an alternative to the custom solution under development.


2. Main finding

The checks performed during this session confirmed that the coach was working at the minimal technical level, but not yet at the expected analytical level.

Concretely:

  • the active session was being created correctly;
  • an event file was being generated;
  • but in practice that file contained only a single startup event (SessionStart);
  • it was therefore impossible to reconstruct the objective, the key actions, the validations, or the final outcome of the session.

The system could say that a session had started, but not explain what had happened during it.
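Concretely, the event log of a completed session plausibly looked like the following single-entry file (the field names and timestamp are illustrative assumptions, not the actual schema):

```json
{
  "events": [
    { "type": "SessionStart", "timestamp": "2025-01-15T09:00:00Z" }
  ]
}
```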


3. Decision made

Three options were considered:

  1. keep the current mode as-is;
  2. enable a permanent full transcript;
  3. introduce a small number of structured checkpoints.

The third option was selected.

The principle is simple: instead of trying to record everything, the coach captures a few key milestones — enough to reconstruct the session flow and produce a useful report afterward.

Four checkpoint types were defined:

  1. GoalSet — the session's starting objective;
  2. MajorAction — an important action or significant change of direction;
  3. Validation — evidence of a verification or confirmation;
  4. Outcome — the final result or closing decision.

This approach was judged to be the best trade-off between simplicity, readability, cost, and real coaching value.
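As an illustration, and with field names that are assumptions rather than the actual schema used by the hooks, a single checkpoint record could look like this:

```json
{
  "type": "Validation",
  "detail": "tests in tests/test_ai_coach_infrastructure.py passed",
  "timestamp": "2025-01-15T11:30:00Z"
}
```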


4. Actions taken

The following actions were carried out during this session:

  1. Audit of the existing solution

    • review of the active session's AI coach artifacts;
    • confirmation that the actual event log contained only a SessionStart;
    • re-reading of the configuration, the startup hook, the agent, and the report prompt.
  2. AI coach architecture hardening

    • addition of a structured audit section in .github/ai-coach.config.json;
    • explicit definition of the recommended checkpoints: GoalSet, MajorAction, Validation, Outcome.
  3. Startup hook evolution

    • enhancement of scripts/hooks/ai_coach_session_start.ps1;
    • addition of a StructuredAuditReady signal;
    • addition of the metadata needed to indicate that a structured audit is available.
  4. Creation of a checkpoint-writing mechanism

    • creation of the script scripts/hooks/ai_coach_append_event.ps1;
    • this script appends a structured checkpoint to the active session;
    • it also enforces allowed event types to prevent incoherent logs.
  5. Update of the coach analysis layer

    • update of .github/agents/ai-coach.agent.md;
    • update of .github/prompts/ai-coach-report.prompt.md;
    • when checkpoints exist, they become the primary analysis source.
  6. Addition of non-regression tests

    • creation of tests/test_ai_coach_infrastructure.py;
    • a test validating that a structured checkpoint is correctly written;
    • a test validating that an unauthorized event type is rejected.
  7. Exploration of an external alternative

    • installation of the GitHub Copilot usage insights extension;
    • objective: compare, over time, the value of a specialized external solution against the custom coach developed in this repository.
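The checkpoint-writing mechanism from step 4 is implemented in PowerShell (scripts/hooks/ai_coach_append_event.ps1). As a hedged sketch of the same logic, here is a minimal Python equivalent; the field names and the function signature are assumptions, only the four allowed checkpoint types come from the actual change:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# The four checkpoint types defined in the config; every field name
# below is an assumption about the actual log schema.
ALLOWED_EVENT_TYPES = {"GoalSet", "MajorAction", "Validation", "Outcome"}

def append_checkpoint(session_file: Path, event_type: str, detail: str) -> dict:
    """Append a structured checkpoint to the active session's event log,
    rejecting unauthorized event types to keep the log coherent."""
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unauthorized event type: {event_type}")
    session = json.loads(session_file.read_text(encoding="utf-8"))
    event = {
        "type": event_type,
        "detail": detail,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    session.setdefault("events", []).append(event)
    session_file.write_text(json.dumps(session, indent=2), encoding="utf-8")
    return event
```

Enforcing the allowed types at write time, rather than filtering at report time, is what keeps the log coherent by construction.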

5. Validation and result

The new approach was validated in three stages.

  1. Isolated validation

    • creation of a test session in a temporary environment;
    • correct generation of both SessionStart and StructuredAuditReady events;
    • successful addition of a GoalSet checkpoint.
  2. Automated validation

    • execution of the tests in tests/test_ai_coach_infrastructure.py;
    • result: successful validation of the new mechanism.
  3. Integration validation

    • joint execution of AI coach tests and documentation governance tests;
    • final observed result: 10 tests passed across the verified scope.

The concrete outcome of this session is as follows:

  • the coach is no longer limited to recording the start of a session;
  • it can now retain a few useful structured milestones;
  • a credible foundation now exists to produce a real session report during the next actual use.
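To illustrate why these few milestones are enough to produce a report, here is a hedged Python sketch of how a session summary could be reconstructed from the checkpoints; the "type"/"detail" field names are assumptions about the log schema, not the actual implementation in the agent or the report prompt:

```python
def summarize_checkpoints(events: list[dict]) -> dict:
    """Fold the structured checkpoints back into the shape of a session
    report: one goal, the major actions, the validations, one outcome."""
    summary = {"goal": None, "actions": [], "validations": [], "outcome": None}
    buckets = {"MajorAction": "actions", "Validation": "validations"}
    for event in events:
        kind = event.get("type")
        if kind == "GoalSet":
            summary["goal"] = event.get("detail")
        elif kind == "Outcome":
            summary["outcome"] = event.get("detail")
        elif kind in buckets:
            summary[buckets[kind]].append(event.get("detail"))
    return summary
```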

6. Limitations and points of attention

A few limitations remain despite this improvement:

  • the system is not yet fully automatic: to obtain a rich report, checkpoints must still be recorded during the session;
  • token measurement remains indirect;
  • the custom solution is progressing, but has not yet reached its final state;
  • the GitHub Copilot usage insights extension is installed, but its evaluation remains to be done.

7. Conclusion

This session transformed an initial base that was still too lightweight into a more credible setup.

The most important point is that the coach now has a simple, usable mechanism to document what actually happened during a session, without switching to a permanent full transcript. This choice makes the solution more realistic for daily use.

In parallel, the installation of the GitHub Copilot usage insights extension opens a second useful track: comparing, over time, a specialized external solution against the custom solution developed in this repository.