
Chapter 2: Technical Architecture

2.1. Architectural Overview

The architecture of the "Augmented Analyst" project was designed to ensure robustness, reliability, and automation. It is based on a clear separation of responsibilities between three distinct environments: the local development environment, the Continuous Integration/Continuous Deployment (CI/CD) platform, and the production environment.

The standard workflow is as follows:

  1. Development and initial testing are carried out in the local environment.
  2. Each validated change pushed to the Git repository triggers the process in the CI/CD environment, which acts as a quality-control gateway.
  3. If, and only if, quality is validated, the code is automatically deployed to the production environment, where it is accessible to the end-user.

This structure guarantees that no change can reach production without being tested and validated, thereby ensuring maximum service stability.

2.2. The Main Application (Backend)

The logical core of the application is a backend developed in Python, orchestrated by the Flask micro-framework.

2.2.1. Framework

The application is built with Flask. The code follows the Application Factory design pattern (via the create_app function). This approach involves encapsulating the creation and configuration of the application in a function, which offers several advantages:

  • Testability: allows for the creation of different application instances for different contexts (production, testing) with distinct configurations.
  • Organization: avoids global application variables and promotes a cleaner, more modular code structure.
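
To make the pattern concrete, here is a minimal application-factory sketch; it illustrates the approach rather than the project's actual create_app, and the health route and configuration handling are assumptions.

```python
# Minimal application-factory sketch (illustrative; route and config handling
# are assumptions, not the project's actual code).
from flask import Flask


def create_app(config_object=None):
    """Build and configure an independent Flask application instance."""
    app = Flask(__name__)
    if config_object is not None:
        app.config.from_object(config_object)

    # Extensions (SQLAlchemy, Flask-Login, Flask-Bcrypt, ...) would be bound
    # here with their init_app(app) methods, and blueprints registered after.

    @app.route("/health")
    def health():
        return {"status": "ok"}

    return app


# A test suite can build its own instance with a distinct configuration:
# app = create_app("tests.config.TestingConfig")  # hypothetical config path
```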

2.2.2. Models and Database

Interaction with the database is managed by the SQLAlchemy ORM. The schema consists of three main tables:

  • user: stores authentication information.
  • titres (stocks): a reference table containing the list of assets. It includes columns for 52-week thresholds, which follow a denormalization strategy for performance:
    • an_haut, an_bas (52w_high, 52w_low): store the raw threshold value, in the stock's original currency (USD, CAD, etc.).
    • an_haut_cad, an_bas_cad (52w_high_cad, 52w_low_cad): store the threshold value, systematically converted to CAD. These columns are pre-calculated by the pipeline to optimize dashboard queries.
  • historique (history): records a "snapshot" of each stock for each day. It contains:
    • valeur, devise (value, currency): the raw price of the stock and its original currency.
    • cad_value: a pre-calculated column containing the price value, systematically converted to CAD.
    • quantite (quantity): the quantity held.
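
Expressed with Flask-SQLAlchemy, the schema described above could look roughly like the following sketch; column types, lengths, and any names not quoted in the text (such as username) are assumptions.

```python
# Illustrative Flask-SQLAlchemy models for the schema described above
# (types, lengths, and names not quoted in the text are assumptions).
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()


class User(db.Model):
    __tablename__ = "user"
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(64), unique=True, nullable=False)   # assumed
    password_hash = db.Column(db.String(128), nullable=False)          # bcrypt hash


class Titre(db.Model):
    __tablename__ = "titres"
    id = db.Column(db.Integer, primary_key=True)
    an_haut = db.Column(db.Numeric(12, 4))      # 52w high, original currency
    an_bas = db.Column(db.Numeric(12, 4))       # 52w low, original currency
    an_haut_cad = db.Column(db.Numeric(12, 4))  # 52w high, pre-converted to CAD
    an_bas_cad = db.Column(db.Numeric(12, 4))   # 52w low, pre-converted to CAD


class Historique(db.Model):
    __tablename__ = "historique"
    id = db.Column(db.Integer, primary_key=True)
    titre_id = db.Column(db.Integer, db.ForeignKey("titres.id"), nullable=False)
    date_releve = db.Column(db.Date, nullable=False)
    valeur = db.Column(db.Numeric(12, 4))       # raw price
    devise = db.Column(db.String(3))            # original currency (USD, CAD, ...)
    cad_value = db.Column(db.Numeric(12, 4))    # price pre-converted to CAD
    quantite = db.Column(db.Numeric(12, 4))     # quantity held

    # One record per stock and per day; the UPSERT in 2.3.4 relies on this key.
    __table_args__ = (db.UniqueConstraint("titre_id", "date_releve"),)
```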

2.2.3. Authentication Management

Access security is ensured by two key Flask extensions:

  • Flask-Login: manages the user session lifecycle (login, logout, protection of "private" routes).
  • Flask-Bcrypt: ensures the secure storage of passwords by saving only their cryptographic "hash," never the password in plain text.
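
As an illustration, password handling with Flask-Bcrypt typically looks like the following sketch; the helper names and the password_hash field are assumptions.

```python
# Illustrative password handling with Flask-Bcrypt
# (helper names and the password_hash field are assumptions).
from flask_bcrypt import Bcrypt

bcrypt = Bcrypt()  # bound in the factory with bcrypt.init_app(app)


def set_password(user, plain_password):
    # Only the salted bcrypt hash is stored, never the plain-text password.
    user.password_hash = bcrypt.generate_password_hash(plain_password).decode("utf-8")


def check_password(user, plain_password):
    return bcrypt.check_password_hash(user.password_hash, plain_password)
```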

2.3. The Data Pipeline (ETL)

The system is powered by a robust ETL (Extract, Transform, Load) pipeline, orchestrated by the pipeline.py script. This pipeline is designed to be independent of its execution time.

2.3.1. Temporal Logic: Robustness through D-1

The guiding principle of the pipeline is to never depend on the exact time of its execution. To ensure reliable and definitive closing data, the pipeline systematically processes data from the previous day (day D-1).

  • get_current_montreal_date(): the script first determines the current date in the Montreal time zone (America/Montreal).
  • date_a_traiter = current_date - 1 day: this date (D-1) becomes the source of truth for all pipeline operations (API calls, database insertions).
  • Weekend Management: the pipeline is designed not to run if launched on a Sunday or a Monday, because the previous day (Saturday or Sunday) was not a trading day; in that case it stops cleanly.
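
A minimal sketch of this temporal logic, assuming Python's standard zoneinfo module and a weekday-based weekend check:

```python
# Sketch of the D-1 logic, using the standard-library zoneinfo module
# (implementation details are assumptions).
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def get_current_montreal_date():
    """Current calendar date in the America/Montreal time zone."""
    return datetime.now(ZoneInfo("America/Montreal")).date()


def compute_date_a_traiter():
    """Return D-1, or None when D-1 falls on a weekend (markets closed)."""
    date_a_traiter = get_current_montreal_date() - timedelta(days=1)
    if date_a_traiter.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return None                    # Sunday/Monday run: stop cleanly
    return date_a_traiter
```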

2.3.2. Orchestration via run_full_pipeline

The entry point of the process is the run_full_pipeline function, which accepts a boolean argument fetch_market_data.

  1. fetch_market_data=True mode (behavior for the daily cronjob):

    • Extraction: The pipeline reads the reference CSV and calls the Marketstack API to retrieve the closing prices for the date_a_traiter (D-1).
    • Objective: To ensure the official price update with the definitive closing data from the previous day.
  2. fetch_market_data=False mode (for portfolio synchronization):

    • Extraction: The pipeline only reads the CSV file. No call to Marketstack is made.
    • Objective: Allows for immediate synchronization of the database structure after a portfolio modification (addition/removal of stocks). It inserts a record for the date_a_traiter (D-1) using the temporary price present in the CSV.
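
The control flow of this orchestration can be summarised by the skeleton below; apart from run_full_pipeline and its fetch_market_data flag, every helper name is a hypothetical placeholder for the steps described above.

```python
# Control-flow skeleton of the orchestration; apart from run_full_pipeline and
# fetch_market_data, every helper name is a hypothetical placeholder.
def run_full_pipeline(fetch_market_data: bool) -> None:
    date_a_traiter = compute_date_a_traiter()        # D-1 logic from 2.3.1
    if date_a_traiter is None:
        return                                       # weekend: nothing to process

    titres = read_and_clean_csv("portefeuille.csv")  # CSV path is assumed

    if fetch_market_data:
        # Daily cronjob: official closing prices for D-1 from Marketstack.
        prices = fetch_marketstack_eod(titres, date_a_traiter)
    else:
        # Portfolio sync: keep the temporary prices already present in the CSV.
        prices = prices_from_csv(titres)

    rows = transform(titres, prices, date_a_traiter)  # conversion to CAD, etc.
    upsert_historique(rows)                           # UPSERT load, see 2.3.4
```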

2.3.3. Transformation (Transform)

  • The read_and_clean_csv function standardizes column names, handles empty or corrupted files, and parses complex information (like the "52 Week Range" column).
  • The currency conversion logic is applied based on the Marketstack_Currency column and the usd_to_cad_rate from the config.ini file.
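
A hedged sketch of the conversion step, assuming a configparser read of config.ini (the section name used here is an assumption; the key and column names follow the text):

```python
# Illustrative conversion step; the config.ini section name and the helper
# are assumptions, the key and column names follow the text.
import configparser

config = configparser.ConfigParser()
config.read("config.ini")
usd_to_cad_rate = config.getfloat("conversion", "usd_to_cad_rate")  # section assumed


def to_cad(value, marketstack_currency):
    """Convert a price to CAD based on the Marketstack_Currency column."""
    if marketstack_currency == "USD":
        return value * usd_to_cad_rate
    return value  # CAD values are kept as-is
```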

2.3.4. Loading (Load)

Transformed data is inserted into the MariaDB database via UPSERT queries (INSERT ... ON DUPLICATE KEY UPDATE).

  • The unique key (titre_id, date_releve) of the historique table is fundamental. It ensures that there can only be one record per stock and per day.
  • Update Scenario: if a run in fetch_market_data=False mode created an entry for 2025-11-04 with a temporary price, a subsequent cronjob run that also targets 2025-11-04 will not create a duplicate. The UPSERT command simply updates the existing entry with the official price from Marketstack.
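
Using SQLAlchemy's MySQL/MariaDB dialect, such an UPSERT could be written roughly as follows; this reuses the illustrative Historique model and db object from the sketch in 2.2.2 and is not the project's actual loading code.

```python
# Illustrative UPSERT with SQLAlchemy's MySQL/MariaDB dialect (reuses the
# hypothetical Historique model and db object from the sketch in 2.2.2).
from sqlalchemy.dialects.mysql import insert


def upsert_historique(rows):
    """rows: dicts with titre_id, date_releve, valeur, devise, cad_value, quantite."""
    for row in rows:
        stmt = insert(Historique).values(**row)
        # On a (titre_id, date_releve) conflict, update the existing record
        # instead of inserting a duplicate.
        stmt = stmt.on_duplicate_key_update(
            valeur=stmt.inserted.valeur,
            devise=stmt.inserted.devise,
            cad_value=stmt.inserted.cad_value,
            quantite=stmt.inserted.quantite,
        )
        db.session.execute(stmt)
    db.session.commit()
```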

2.4. The Production Infrastructure (VPS)

The application is hosted on a Virtual Private Server (VPS) under Ubuntu, with a modern architecture based on containerization.

2.4.1. Containerization (Docker)

  • app service: a custom-built Docker container (via a Dockerfile) that encapsulates the Python application. The application is served by a production WSGI server, Gunicorn, optimized to handle multiple simultaneous requests.
  • db service: an official MariaDB 10.6 container, ensuring a stable and isolated database environment.
  • Orchestration: all services are defined, configured, and linked by Docker Compose (via the docker-compose.yml file), which acts as the infrastructure's orchestrator.
  • Data Persistence: to ensure no database data is lost during updates or restarts, MariaDB's files are stored in a named Docker volume, which is independent of the container's lifecycle.

2.4.2. Web Server

Nginx is used as the main web server. It acts as a reverse proxy:

  • It receives all incoming web requests on ports 80 (HTTP) and 443 (HTTPS).
  • It forwards requests to the application running in the Docker container on port 8000.
  • It handles SSL/TLS termination, serving certificates managed by Certbot to ensure a secure connection (HTTPS).

2.5. The Quality and Deployment System (CI/CD)

Automation is at the heart of the project, managed by a CI/CD pipeline hosted on GitHub Actions.

2.5.1. Workflow (deploy.yml)

  • Trigger: the workflow is automatically launched with every push to the main branch.
  • test job (Continuous Integration): before any deployment, the code is retrieved in an ephemeral Linux environment. A full test suite is executed with Pytest. To ensure realistic validation, this job launches its own temporary MariaDB database service. If a single test fails, the entire pipeline stops.
  • deploy job (Continuous Deployment): only if the test job succeeds, the workflow connects to the production VPS via SSH. It then executes a script that orchestrates the deployment:
    1. git pull to retrieve the validated code.
    2. docker compose up -d --build to rebuild the application image with the new code and restart the services without major interruption.

2.5.2. Testing Strategy

  • Framework: Pytest, for its simplicity and power, as well as pytest-mock to simulate external calls.
  • Test Environments: Use of an in-memory SQLite database for fast local tests, and a real MariaDB database in CI for maximum reliability.
  • TDD Philosophy: Development follows a Test-Driven Development approach.
  • Test Management and Traceability: The project includes a "docs-as-code" test management system:
    • Each test case is a Markdown file stored in the test_cases/ directory.
    • A Python script (scripts/sync_tests.py) ensures traceability by linking each test case to its pytest implementation via a @pytest.mark.test_id marker.
    • A functional coverage report website is automatically generated with MkDocs, providing a clear view of the validation status of each feature.
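
To illustrate how these pieces might fit together, the sketch below combines an in-memory SQLite test fixture with the @pytest.mark.test_id marker; the import path, configuration keys, and test-case ID format are all assumptions.

```python
# Illustrative test combining the in-memory SQLite setup with the
# @pytest.mark.test_id traceability marker; import path, config keys and the
# test-case ID format are assumptions.
import pytest

from app import create_app, db  # import path assumed


@pytest.fixture
def client():
    app = create_app()
    app.config.update(
        TESTING=True,
        SQLALCHEMY_DATABASE_URI="sqlite:///:memory:",  # fast local test database
    )
    with app.app_context():
        db.create_all()
        yield app.test_client()
        db.drop_all()


# The custom "test_id" marker would be declared in pytest.ini (or pyproject.toml)
# so that scripts/sync_tests.py can map it to the Markdown file in test_cases/.
@pytest.mark.test_id("TC-AUTH-001")  # hypothetical test-case identifier
def test_login_page_is_reachable(client):
    response = client.get("/login")
    assert response.status_code == 200
```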