Task 7 of the automated test framework - Traceability, retry, and safe CI/CD

Ticket #173: Hardening the E2E automated test framework with self-healing, retry, and secure CI/CD
Type: Automation / CI-CD / Security / Reliability
Affected Components: e2e/playwright.config.ts, e2e/src/utils/selfHealingUtil.ts, e2e/src/pages/LoginPage.ts, e2e/package.json, .github/workflows/deploy.yml

1. Context and objective

This task is the seventh and final one in my E2E automated test framework implementation plan.

After laying the foundations of the Playwright framework (Issue #167), structuring navigation (Issue #168), securing environments (Issue #169), industrializing test data (Issue #170), optimizing execution (Issue #171), and adding advanced network controls (Issue #172), the objective was to establish execution resilience for the automated test framework and integrate it into CI.

The scope was, however, adjusted to account for a structural constraint of the project: the application runs on a personal production environment containing real financial data. The priority was therefore twofold:

strengthen the robustness of the automated test framework;
integrate a meaningful CI execution without exposing production credentials or normalizing production as a test environment.

2. Phase 1 — Retry and execution traceability verification

The Playwright configuration was verified to follow a resilience model tailored to CI:

retries enabled only when CI=true;
trace collection on the first retry via trace: 'on-first-retry';
parallelism limited in CI to reduce interference;
HTML report preserved for post-execution analysis.

This foundation provides a sufficient level of traceability to diagnose intermittent failures without unnecessarily burdening local runs.
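
The settings listed above could be expressed roughly as follows. This is a minimal sketch, not the project's actual playwright.config.ts: the helper name resilienceOptions is hypothetical, and the retry count (2) and CI worker count (1) are assumed values, since the exact numbers are not stated here.

```typescript
// Sketch of the CI-dependent resilience options described above.
// ASSUMPTIONS: the helper name, the retry count (2) and the CI worker
// count (1) are illustrative; only the option names come from Playwright.
interface ResilienceOptions {
  retries: number;
  workers: number | undefined;
  reporter: 'html';
  use: { trace: 'on-first-retry' };
}

function resilienceOptions(ciFlag: string | undefined): ResilienceOptions {
  const isCI = ciFlag === 'true';
  return {
    // Retries are enabled only when CI=true; local runs fail immediately.
    retries: isCI ? 2 : 0,
    // Limit parallelism in CI; undefined keeps Playwright's default locally.
    workers: isCI ? 1 : undefined,
    // Keep the HTML report for post-execution analysis.
    reporter: 'html',
    // Record a trace only when a test is retried for the first time.
    use: { trace: 'on-first-retry' },
  };
}

// In playwright.config.ts the result could be spread into the config:
//   export default defineConfig({ ...resilienceOptions(process.env.CI) });
```

Keeping the CI switch in one place makes it easy to see that local runs carry no retry or tracing overhead.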

3. Phase 2 - Self-healing selectors in the login flow

The selfHealingUtil utility was consolidated to try multiple selectors in priority order and return the first usable one.

The final implementation provides:

a configurable timeout;
an explicit choice between attached and visible verification;
a strict variant that fails clearly if no selector resolves.
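
The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of selfHealingUtil.ts: the function names, the option names, and the 2000 ms default are hypothetical. PageLike and LocatorLike are minimal stand-ins that Playwright's Page and Locator satisfy structurally, so the sketch runs without Playwright installed.

```typescript
// ASSUMPTION: all names below are illustrative, not the project's real API.
type HealState = 'attached' | 'visible';

// Minimal structural stand-ins for Playwright's Locator and Page.
interface LocatorLike {
  waitFor(options: { state: HealState; timeout: number }): Promise<void>;
}
interface PageLike {
  locator(selector: string): LocatorLike;
}

interface HealOptions {
  timeout?: number;  // per-selector wait in milliseconds (assumed default: 2000)
  state?: HealState; // 'attached' or 'visible' verification
}

interface HealMatch {
  selector: string;
  locator: LocatorLike;
}

// Tries each selector in priority order and returns the first usable one,
// or null when none resolves within the timeout.
async function findFirstUsable(
  page: PageLike,
  selectors: string[],
  options: HealOptions = {},
): Promise<HealMatch | null> {
  const { timeout = 2000, state = 'visible' } = options;
  for (const selector of selectors) {
    const locator = page.locator(selector);
    try {
      await locator.waitFor({ state, timeout });
      return { selector, locator };
    } catch {
      // This candidate did not reach the requested state; try the next one.
    }
  }
  return null;
}

// Strict variant: fails with an explicit message if no selector resolves.
async function findFirstUsableStrict(
  page: PageLike,
  selectors: string[],
  options: HealOptions = {},
): Promise<HealMatch> {
  const match = await findFirstUsable(page, selectors, options);
  if (match === null) {
    throw new Error(`No selector resolved. Tried: ${selectors.join(', ')}`);
  }
  return match;
}
```

Because candidates are tried sequentially, the worst-case wait is the timeout multiplied by the number of selectors, which is one reason to keep fallback lists short.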

This utility was then wired into LoginPage with fallback selectors for:

the username field;
the password field;
the submit button.

Result: the login flow becomes less brittle against minor UI adjustments, while preserving an explicit failure if the screen genuinely changes its contract.
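
A possible shape for this wiring is sketched below. The selector lists are hypothetical, as is the idea of injecting the resolver through the constructor; the real LoginPage.ts may be organized differently. The resolver is expected to behave like a strict self-healing lookup: return the first usable element or throw.

```typescript
// Minimal shape of a resolved element; Playwright's Locator satisfies it.
interface FieldLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

// ASSUMPTION: the resolver is injected by the caller, for example a strict
// self-healing lookup bound to the current page. It must throw if no
// selector in the list resolves.
type StrictResolver = (selectors: string[]) => Promise<FieldLike>;

// ASSUMPTION: hypothetical selector lists, most preferred first.
const LOGIN_SELECTORS = {
  username: ['[data-testid="username"]', '#username', 'input[name="username"]'],
  password: ['[data-testid="password"]', '#password', 'input[type="password"]'],
  submit: ['[data-testid="login-submit"]', 'button[type="submit"]'],
};

class LoginPage {
  private readonly resolve: StrictResolver;

  constructor(resolve: StrictResolver) {
    this.resolve = resolve;
  }

  // Resolves each element through its fallback list before interacting,
  // so a renamed id or attribute does not immediately break the flow.
  async login(username: string, password: string): Promise<void> {
    const usernameField = await this.resolve(LOGIN_SELECTORS.username);
    await usernameField.fill(username);

    const passwordField = await this.resolve(LOGIN_SELECTORS.password);
    await passwordField.fill(password);

    const submitButton = await this.resolve(LOGIN_SELECTORS.submit);
    await submitButton.click();
  }
}
```

If the resolver throws, login() rejects with that error, which keeps the failure explicit when the screen no longer matches any known selector.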

4. Phase 3 — Secure E2E gate integration in GitHub Actions

The main deployment workflow was extended with a dedicated E2E job that:

installs the e2e/ Node sub-project;
installs only Chromium on the CI runner;
runs a subset of Playwright tests compatible with a secrets-free execution;
publishes Playwright artifacts (playwright-report, test-results) for diagnostics.

Deployment is now gated by two verifications:

the existing backend tests;
the secure E2E gate.

The key decision of this phase was to exclude from the automated pipeline any scenarios that require real production credentials. CI therefore validates the automated test framework, but does not automatically connect to the sensitive personal environment.
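
One way such an exclusion could be enforced is with a title tag and Playwright's grepInvert option. The sketch below is an assumption for illustration: the @requires-secrets tag and both helper names are hypothetical, and the mechanism actually used to select the CI subset is not specified here.

```typescript
// ASSUMPTION: hypothetical tag placed in the title of any scenario that
// needs real credentials.
const REQUIRES_SECRETS = /@requires-secrets/;

// Returns the extra Playwright config to apply: in CI, tests whose title
// matches the tag are filtered out via grepInvert; locally nothing changes.
function ciFilter(isCI: boolean): { grepInvert?: RegExp } {
  return isCI ? { grepInvert: REQUIRES_SECRETS } : {};
}

// Mirrors the effect of grepInvert for a single title: a test stays in
// the CI subset only if its title does not match the pattern.
function runsInCi(title: string): boolean {
  return !REQUIRES_SECRETS.test(title);
}

// In playwright.config.ts this could be applied as:
//   export default defineConfig({ ...ciFilter(process.env.CI === 'true') });
```

With this approach a sensitive scenario is excluded by how it is labeled rather than by where it lives, so a new credentialed test stays out of CI as long as it carries the tag.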

5. Critical incident identified and governance resolution

Incident — Risk of indirect exposure of the production environment
The first version of the E2E CI job injected USER_ID and PASSWORD to reproduce the full authentication flow. Technically, this was feasible via GitHub Actions. From a governance standpoint, this choice was disproportionate for a personal project containing real financial data.
Resolution: the pipeline was refocused on a "safe" E2E execution with no production secrets, deferring sensitive scenarios to a manual suite or a future non-production environment.

This decision does not reduce the maturity of the project. It improves it by aligning automation with a correct risk assessment.

6. Validation and results

Final validations confirmed:

no structural errors in deploy.yml, LoginPage.ts, selfHealingUtil.ts, and e2e/package.json;
correct test detection by Playwright for the chromium project;
correct selection of the secure E2E subset run in CI;
effective integration of self-healing into the login flow.

The framework delivered today is therefore:

more resilient in execution;
more readable for diagnostics;
better aligned with the security requirements of the project.

7. Conclusion and next steps

Task 7 is complete within its deliverable scope for today: retry, execution traceability, login self-healing, and secure CI/CD integration.

Two complementary workstreams were explicitly moved out of scope for this delivery to be handled separately:

definition of a manual, protected workflow for optional production smoke tests;
development of a non-production E2E environment with synthetic data. The current /demo page will be used as the starting point.

These follow-ups are tracked in dedicated GitHub issues to maintain a clear roadmap without compromising the immediate security of the project.