Pipeline Reliability (D-1)
Ticket #70: Data Pipeline Reliability and Project Ecosystem Alignment
Type: Improvement / Refactoring
Affected Components: code_source_simule/pipeline.py, Cron Task, tests/test_pipeline.py, project_docs/
1. Context and Strategic Objective
In accordance with the project's non-functional requirements, particularly Reliability and Automation, the daily data pipeline must be infallible. The strategic objective is to ensure complete and lossless market data collection, even in the event of a minor technical incident.
2. Description of the Initial Approach (the Vulnerability)
The initial approach relied on strict execution timing. A cron task launched the pipeline.py script every evening at 11:59 PM to retrieve the closing prices for the same day (Day D). This design was fragile: a correct result depended on the script succeeding in the short window before midnight. If a run failed or was delayed past midnight, "the same day" no longer referred to Day D, and that day's data risked being lost irretrievably.
3. Implemented Architecture and Logic
The new architecture abandons time dependency in favor of explicit business logic. The script, now executed early in the morning of Day D, systematically requests the previous day's data (Day D-1) from the Marketstack API, ensuring data availability and finality.
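The D-1 rule can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the actual contents of pipeline.py: the function names target_date and build_eod_params are hypothetical, and the parameter names symbols, date_from, and date_to are assumed here and should be checked against the Marketstack documentation.

```python
from __future__ import annotations

from datetime import date, timedelta


def target_date(today: date | None = None) -> date:
    """Return the day whose closing prices should be fetched (Day D-1).

    `today` is injectable so the rule can be tested without touching the clock.
    """
    if today is None:
        today = date.today()
    return today - timedelta(days=1)


def build_eod_params(symbols: list[str], today: date | None = None) -> dict[str, str]:
    """Build query parameters for a single-day end-of-day request.

    The parameter names are assumptions for illustration only.
    """
    day = target_date(today).isoformat()
    return {"symbols": ",".join(symbols), "date_from": day, "date_to": day}
```

Because the target depends only on the calendar date, a run at 06:00 and a retry at 14:00 on the same Day D produce the same request.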
4. Justification and Benefits
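This architecture is more robust because the result no longer depends on the exact moment of execution: Day D-1 data is already final when the script runs, and a failed run can be retried later on Day D with the same request. It also decouples the business logic from the execution infrastructure, which improves the system's reliability and maintainability.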
5. Audit and Strengthening of the Test Suite
The change in logic necessitated an audit of tests/test_pipeline.py. The investigation revealed that the existing tests were insufficient: they covered neither input data corruption nor the new temporal logic. The test suite was therefore strengthened with two critical safeguard tests:
- An "Anti-Corruption" Test: To validate that the pipeline does not crash when faced with a malformed CSV.
- A "Time Guardian" Test: To lock in the D-1 data retrieval behavior.
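The two tests above might look like the following sketch. It does not reproduce tests/test_pipeline.py: parse_prices and target_date are hypothetical stand-ins for the pipeline's CSV loader and date rule, and the column names symbol and close are assumed for illustration.

```python
from __future__ import annotations

import csv
import io
from datetime import date, timedelta


def parse_prices(csv_text: str) -> list[dict]:
    """Hypothetical loader: keep well-formed rows, skip malformed ones."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            rows.append({"symbol": row["symbol"], "close": float(row["close"])})
        except (KeyError, TypeError, ValueError):
            # Missing column, missing value, or non-numeric price: skip the row.
            continue
    return rows


def target_date(today: date) -> date:
    """Hypothetical date rule: always request the previous calendar day."""
    return today - timedelta(days=1)


def test_anti_corruption_malformed_csv_does_not_crash():
    malformed = "symbol,close\nAAPL,189.5\nMSFT,not_a_number\nGOOG\n"
    # Only the valid row survives; the bad rows are dropped without raising.
    assert parse_prices(malformed) == [{"symbol": "AAPL", "close": 189.5}]
    # A file with the wrong header yields no rows rather than an exception.
    assert parse_prices("not,a,price,file\n1,2\n") == []


def test_time_guardian_requests_previous_day():
    # A fixed "today" locks in the D-1 behavior, including across a year boundary.
    assert target_date(date(2024, 1, 1)) == date(2023, 12, 31)
```

Passing the date in as an argument, rather than reading the clock inside the function, is what makes the D-1 behavior testable with a fixed input.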
6. Update of the Technical Documentation
A code change is only complete when its documentation is aligned. A complete audit of the documentation (project_docs/docs/features/) was conducted.
- Investigation Process: The analysis revealed that chapters 1, 2, and 3 contained obsolete descriptions concerning the pipeline's timing, the freshness of the data ("the latest prices"), and the manual commands.
- Root Cause: The documentation was no longer synchronized with the actual behavior of the code, creating a dangerous "documentation debt" for future maintenance.
- Implemented Solution: The entire documentation was meticulously rewritten to reflect the new D-1 logic, the increased robustness against file format errors, and the correct operational procedures (commands, cron configuration).
- Justification: This action ensures that the documentation remains a reliable source of truth, preventing future errors and facilitating the integration of new contributors.