Skip to content

ETL Pipeline Outage Resolution

Ticket #127: Resolution of a Complete ETL Data Pipeline Outage
Type: Debugging / Fix / Maintenance
Affected Component: code_source_simule/pipeline.py, code_source_simule/import_data.py, tests/test_pipeline.py


1. Context and Symptoms

A silent but complete ETL pipeline outage was detected: market prices (from the Marketstack API) were no longer refreshing in the UI, and portfolio composition changes (CSV updates to tickers or quantities) were not reflected anymore. The system appeared healthy at first glance (containers running, no obvious alerts), but database data was stale.


2. Investigation Process

The investigation was carried out through successive hypotheses, each validated or rejected with concrete evidence.

Hypothesis 1 - Working-directory issue (config.ini)
The initial diagnosis pointed to a relative path config.read('config.ini') in pipeline.py. When the script runs from /app (container root), Python looks for config.ini there, while the script itself is under /app/code_source_simule/. This was valid for the older code version.

Hypothesis 2 - The pipeline is not running at all
Rejected based on Marketstack metrics: 596 API requests recorded between March 10 and March 13, 2026 (exactly 149 requests per business day). The pipeline was running.

Key discovery - Drift between local code and production
The API request volume (149/day, i.e., 1 request per ticker) showed production was running an older script version using one call per ticker instead of batched calls by 100. The latest commit (3c476d1) fixed several behaviors but had not been fully reflected at first. In addition, that commit changed the date logic (from D-1 to current day) while tests were still expecting the old behavior, leaving CI red.

Production finding - Data-cleaning bug
During the first manual post-deployment run, MySQL stopped the pipeline with:

pymysql.err.DataError: (1366, "Incorrect integer value: '<0.1' for column `historique`.`quantite`")
The CSV No. of Shares column can contain textual values such as '<0.1', which were not cleaned before DB insertion, unlike the Price column.


3. Root Causes Identified

# Cause File Status
1 Relative path config.read('config.ini') fails depending on CWD pipeline.py Already fixed in commit 3c476d1
2 Tests not aligned with new date logic (D vs D-1) tests/test_pipeline.py Fixed
3 no_of_shares not cleaned before insertion (values like <0.1) pipeline.py Fixed

4. Implemented Solutions

4.1 - Align tests with current date logic
The pipeline behavior was changed (commit 3c476d1) to process current day instead of D-1. Two integration tests were still asserting the previous expectation and were updated.

4.2 - Temporary weekend-guard suspension, then restoration
To enable manual verification on a Saturday, the weekend early return was temporarily disabled and replaced with a warning log. After production verification, weekend blocking (Saturday/Sunday) was restored.

4.3 - no_of_shares cleaning
read_and_clean_csv() now applies numeric cleaning for quantities, matching the robustness already used for price.

if 'no_of_shares' in df.columns:
    df['no_of_shares'] = df['no_of_shares'].astype(str).str.replace(r'[^0-9.]', '', regex=True)
    df['no_of_shares'] = pd.to_numeric(df['no_of_shares'], errors='coerce').fillna(0.0)

4.4 - Test-suite and coverage-doc reinforcement
A dedicated unit test was added to lock the CSV quantity fix (<0.1, numbers with spaces): - test_read_and_clean_csv_no_of_shares_special_values_are_numeric

Coverage docs were updated accordingly: - Pipeline: 42% -> 60% - Global: 70% -> 76%


5. Verification and Results

  • Local test suite (tests/test_pipeline.py): 10/10 passing.
  • Manual production run: pipeline executed successfully, queried Marketstack for 149 tickers, and wrote data to DB.
  • UI verification: data is up to date and correctly rendered in the dashboard.
  • Post-validation stabilization: weekend import guard restored to prevent non-business-day imports.

6. Lessons and Watchpoints

  1. A silent pipeline is not necessarily a healthy pipeline. External metrics (API logs, request volumes) are crucial when app-level observability is limited.
  2. Any business-logic change requires synchronized test updates. Code/test drift kept CI red and obscured quality signals.
  3. Input robustness must be exhaustive. If one numeric CSV field is normalized (price), all equivalent numeric fields (no_of_shares) should follow the same rule.