Chapter 2: Technical Architecture
2.1. Architectural Overview
The architecture of the "Augmented Analyst" project was designed to ensure robustness, reliability, and automation. It is based on a clear separation of responsibilities between three distinct environments: the local development environment, the Continuous Integration/Continuous Deployment (CI/CD) platform, and the production environment.
The standard workflow is as follows:
- Development and initial testing are carried out in the local environment.
- Each validated change pushed to the Git repository triggers the CI/CD process, which acts as a quality-control gateway.
- If, and only if, quality is validated, the code is automatically deployed to the production environment, where it is accessible to the end-user.
This structure guarantees that no change can reach production without being tested and validated, thereby maximizing service stability.
2.2. The Main Application (Backend)
The logical core of the application is a backend developed in Python, orchestrated by the Flask micro-framework.
2.2.1. Framework
The application is built with Flask. The code follows the Application Factory design pattern (via the create_app function). This approach encapsulates the creation and configuration of the application in a single function (a minimal sketch follows the list below), which offers several advantages:
- Testability: allows for the creation of different application instances for different contexts (production, testing) with distinct configurations.
- Organization: avoids global application variables and promotes a cleaner, more modular code structure.
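In practice, such a factory typically has the following shape. This is a minimal sketch, not the project's exact code: the configuration keys and the environment-variable convention are assumptions.

```python
# Minimal application-factory sketch; configuration details are assumptions.
import os

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

def create_app(testing: bool = False) -> Flask:
    """Build a configured Flask instance for either production or testing."""
    app = Flask(__name__)

    if testing:
        # Isolated, in-memory database for the test suite
        app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///:memory:"
        app.config["TESTING"] = True
    else:
        # Production settings read from the environment (assumed convention)
        app.config["SQLALCHEMY_DATABASE_URI"] = os.environ["DATABASE_URI"]
        app.config["SECRET_KEY"] = os.environ["SECRET_KEY"]

    db.init_app(app)
    return app
```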
2.2.2. Models and Database
Interaction with the database is managed by the SQLAlchemy ORM. The schema consists of three main tables:
- user: stores authentication information.
- titres (stocks): a reference table containing the list of assets. It includes columns for the 52-week thresholds, which follow a denormalization strategy for performance:
  - an_haut, an_bas (52w_high, 52w_low): store the raw threshold value, in the stock's original currency (USD, CAD, etc.).
  - an_haut_cad, an_bas_cad (52w_high_cad, 52w_low_cad): store the threshold value, systematically converted to CAD. These columns are pre-calculated by the pipeline to optimize dashboard queries.
- historique (history): records a "snapshot" of each stock for each day. It contains:
  - valeur, devise (value, currency): the raw price of the stock and its original currency.
  - cad_value: a pre-calculated column containing the price value, systematically converted to CAD.
  - quantite (quantity): the quantity held.
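The simplified SQLAlchemy declaration below illustrates these three tables. Column names follow the description above; the types, lengths, and the symbole column are assumptions rather than the project's exact schema.

```python
# Simplified model sketch; types, lengths, and the symbole column are assumptions.
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class User(db.Model):
    __tablename__ = "user"
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(64), unique=True, nullable=False)
    password_hash = db.Column(db.String(128), nullable=False)

class Titre(db.Model):
    __tablename__ = "titres"
    id = db.Column(db.Integer, primary_key=True)
    symbole = db.Column(db.String(16), unique=True, nullable=False)  # assumed column
    an_haut = db.Column(db.Numeric(12, 4))      # 52-week high, original currency
    an_bas = db.Column(db.Numeric(12, 4))       # 52-week low, original currency
    an_haut_cad = db.Column(db.Numeric(12, 4))  # pre-converted to CAD by the pipeline
    an_bas_cad = db.Column(db.Numeric(12, 4))

class Historique(db.Model):
    __tablename__ = "historique"
    id = db.Column(db.Integer, primary_key=True)
    titre_id = db.Column(db.Integer, db.ForeignKey("titres.id"), nullable=False)
    date_releve = db.Column(db.Date, nullable=False)
    valeur = db.Column(db.Numeric(12, 4))     # raw price
    devise = db.Column(db.String(3))          # original currency
    cad_value = db.Column(db.Numeric(12, 4))  # price converted to CAD
    quantite = db.Column(db.Numeric(12, 4))   # quantity held
    # One snapshot per stock and per day (see section 2.3.4)
    __table_args__ = (db.UniqueConstraint("titre_id", "date_releve"),)
```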
2.2.3. Authentication Management
Access security is ensured by two key Flask extensions:
- Flask-Login: manages the user session lifecycle (login, logout, protection of "private" routes).
- Flask-Bcrypt: ensures the secure storage of passwords by saving only their cryptographic "hash," never the password in plain text.
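A minimal sketch of how the two extensions combine is shown below; the route and the helper methods on the user model are illustrative assumptions, not the project's actual code.

```python
# Minimal sketch combining Flask-Login and Flask-Bcrypt; the route and the
# helper methods on the user model are illustrative assumptions.
from flask import Flask
from flask_bcrypt import Bcrypt
from flask_login import LoginManager, login_required

app = Flask(__name__)
bcrypt = Bcrypt(app)
login_manager = LoginManager(app)   # manages the user session lifecycle

class User:
    """Stands in for the project's SQLAlchemy user model."""

    def set_password(self, password: str) -> None:
        # Only the bcrypt hash is ever stored, never the plain-text password.
        self.password_hash = bcrypt.generate_password_hash(password).decode("utf-8")

    def check_password(self, password: str) -> bool:
        return bcrypt.check_password_hash(self.password_hash, password)

@app.route("/dashboard")
@login_required   # unauthenticated requests are redirected to the login view
def dashboard():
    return "private content"
```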
2.3. The Data Pipeline (ETL)
The system is powered by a robust ETL (Extract, Transform, Load) pipeline, orchestrated by the pipeline.py script. This pipeline is designed to be independent of its execution time.
2.3.1. Temporal Logic: Robustness through D-1
The guiding principle of the pipeline is to never depend on the exact time of its execution. To ensure reliable and definitive closing data, the pipeline systematically processes data from the previous day (day D-1).
- get_current_montreal_date(): the script first determines the current date in the Montreal time zone (America/Montreal).
- date_a_traiter = current_date - 1 day: this date (D-1) becomes the source of truth for all pipeline operations (API calls, database insertions).
- Weekend management: the pipeline is designed not to run if launched on a Sunday or Monday, as the previous day's markets (Saturday and Sunday) were closed. It then stops cleanly.
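Under these rules, the date handling can be sketched as follows. The name get_current_montreal_date follows the description above; compute_date_a_traiter and its return convention are illustrative assumptions.

```python
# Sketch of the D-1 logic; only get_current_montreal_date is a name from the project.
from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo

def get_current_montreal_date() -> date:
    """Current calendar date in the America/Montreal time zone."""
    return datetime.now(ZoneInfo("America/Montreal")).date()

def compute_date_a_traiter() -> date | None:
    """Return D-1, or None when D-1 falls on a closed-market day (weekend)."""
    date_a_traiter = get_current_montreal_date() - timedelta(days=1)
    # Launched on Sunday or Monday -> D-1 is Saturday or Sunday: stop cleanly.
    if date_a_traiter.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return None
    return date_a_traiter
```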
2.3.2. Orchestration via run_full_pipeline
The entry point of the process is the run_full_pipeline function, which accepts a boolean argument fetch_market_data.
- fetch_market_data=True mode (behavior for the daily cronjob):
  - Extraction: the pipeline reads the reference CSV and calls the Marketstack API to retrieve the closing prices for the date_a_traiter (D-1).
  - Objective: to ensure the official price update with the definitive closing data from the previous day.
- fetch_market_data=False mode (for portfolio synchronization):
  - Extraction: the pipeline only reads the CSV file. No call to Marketstack is made.
  - Objective: allows for immediate synchronization of the database structure after a portfolio modification (addition or removal of stocks). It inserts a record for the date_a_traiter (D-1) using the temporary price present in the CSV.
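The resulting control flow can be sketched as follows, reusing the D-1 helper sketched in 2.3.1. Only run_full_pipeline is a name taken from the project; the other helpers and the CSV path are hypothetical and used purely for illustration.

```python
# Orchestration sketch; only run_full_pipeline is a name from the project,
# the other helpers and the CSV path are hypothetical.
def run_full_pipeline(fetch_market_data: bool) -> None:
    date_a_traiter = compute_date_a_traiter()   # D-1 helper sketched in 2.3.1
    if date_a_traiter is None:
        return  # weekend: markets were closed on D-1, stop cleanly

    rows = read_and_clean_csv("portfolio.csv")  # reference CSV (path assumed)

    if fetch_market_data:
        # Daily cronjob: fetch the official closing prices for D-1 from Marketstack.
        prices = fetch_marketstack_closes(rows, date_a_traiter)  # hypothetical helper
        rows = apply_official_prices(rows, prices)               # hypothetical helper
    # Otherwise (portfolio sync): keep the temporary prices already in the CSV.

    upsert_history(rows, date_a_traiter)        # load step, see section 2.3.4
```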
2.3.3. Transformation (Transform)
- The read_and_clean_csv function standardizes column names, handles empty or corrupted files, and parses complex information (like the "52 Week Range" column).
- The currency conversion logic is applied based on the Marketstack_Currency column and the usd_to_cad_rate from the config.ini file.
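A hedged sketch of the conversion step follows. The column and option names are those described above; the pandas-based implementation and the config.ini section name are assumptions.

```python
# Currency-conversion sketch; the pandas implementation and the config.ini
# section name are assumptions.
import configparser

import pandas as pd

def convert_to_cad(df: pd.DataFrame, config_path: str = "config.ini") -> pd.DataFrame:
    """Add a cad_value column based on Marketstack_Currency and usd_to_cad_rate."""
    config = configparser.ConfigParser()
    config.read(config_path)
    usd_to_cad_rate = float(config["rates"]["usd_to_cad_rate"])  # section name assumed

    def to_cad(row):
        if row["Marketstack_Currency"] == "USD":
            return row["valeur"] * usd_to_cad_rate
        return row["valeur"]  # CAD values are kept as-is in this simplified sketch

    df["cad_value"] = df.apply(to_cad, axis=1)
    return df
```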
2.3.4. Loading (Load)
Transformed data is inserted into the MariaDB database via UPSERT queries (INSERT ... ON DUPLICATE KEY UPDATE).
- The unique key (titre_id, date_releve) of the historique table is fundamental. It ensures that there can only be one record per stock and per day.
- Update scenario: if an execution in fetch_market_data=False mode created an entry for 2025-11-04 with a temporary price, and the next day's cronjob runs (also targeting 2025-11-04), the UPSERT command will not create a duplicate. It will update the existing entry with the official price from Marketstack.
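Expressed in SQL (executed here through SQLAlchemy), the UPSERT takes roughly the following form; the column list is simplified and the Python wrapper is an illustrative assumption.

```python
# UPSERT sketch for the historique table (MariaDB); column list simplified.
from sqlalchemy import text

UPSERT_SQL = text("""
    INSERT INTO historique (titre_id, date_releve, valeur, devise, cad_value, quantite)
    VALUES (:titre_id, :date_releve, :valeur, :devise, :cad_value, :quantite)
    ON DUPLICATE KEY UPDATE
        valeur = VALUES(valeur),
        devise = VALUES(devise),
        cad_value = VALUES(cad_value),
        quantite = VALUES(quantite)
""")

def upsert_history(engine, rows, date_a_traiter):
    """Insert or update one snapshot per (titre_id, date_releve)."""
    with engine.begin() as conn:
        for row in rows:
            conn.execute(UPSERT_SQL, {**row, "date_releve": date_a_traiter})
```

Because (titre_id, date_releve) is unique, re-running the pipeline for the same date is idempotent: a temporary row is overwritten rather than duplicated.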
2.4. The Production Infrastructure (VPS)
The application is hosted on a Virtual Private Server (VPS) under Ubuntu, with a modern architecture based on containerization.
2.4.1. Containerization (Docker)
- app service: a custom-built Docker container (via a Dockerfile) that encapsulates the Python application. The application is served by a production WSGI server, Gunicorn, optimized to handle multiple simultaneous requests.
- db service: an official MariaDB 10.6 container, ensuring a stable and isolated database environment.
- Orchestration: all services are defined, configured, and linked by Docker Compose (via the docker-compose.yml file), which acts as the infrastructure's orchestrator.
- Data persistence: to ensure no database data is lost during updates or restarts, MariaDB's files are stored in a named Docker volume, which is independent of the container's lifecycle.
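A stripped-down sketch of what such a docker-compose.yml can look like is shown below. Service names, the Gunicorn invocation, and the credentials are illustrative placeholders, not the project's actual file.

```yaml
# Stripped-down docker-compose sketch; names and values are illustrative.
services:
  app:
    build: .                      # image built from the project's Dockerfile
    command: gunicorn -w 4 -b 0.0.0.0:8000 "app:create_app()"   # module path assumed
    ports:
      - "8000:8000"               # Nginx proxies incoming traffic to this port
    depends_on:
      - db

  db:
    image: mariadb:10.6
    environment:
      MARIADB_ROOT_PASSWORD: change-me     # placeholder credential
    volumes:
      - db_data:/var/lib/mysql             # named volume: data survives rebuilds

volumes:
  db_data:
```

The named volume is the element that lets the database files outlive any `docker compose up --build`.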
2.4.2. Web Server
Nginx is used as the main web server. It acts as a reverse proxy:
- It receives all incoming web requests on ports 80 (HTTP) and 443 (HTTPS).
- It forwards requests to the application running in the Docker container on port 8000.
- It handles SSL/TLS termination, serving certificates managed by Certbot to ensure a secure connection (HTTPS).
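For illustration, a reduced version of such a reverse-proxy configuration might look like the following; the domain, certificate paths, and upstream address are placeholders.

```nginx
# Reduced reverse-proxy sketch; domain, certificate paths, and upstream are placeholders.
server {
    listen 80;
    server_name example.com;
    return 301 https://$host$request_uri;   # force HTTPS
}

server {
    listen 443 ssl;
    server_name example.com;

    # Certificates managed by Certbot (default Certbot paths shown)
    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;   # Gunicorn inside the app container
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```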
2.5. The Quality and Deployment System (CI/CD)
Automation is at the heart of the project, managed by a CI/CD pipeline hosted on GitHub Actions.
2.5.1. Workflow (deploy.yml)
- Trigger: the workflow is automatically launched with every push to the main branch.
- test job (Continuous Integration): before any deployment, the code is retrieved in an ephemeral Linux environment and the full test suite is executed with Pytest. To ensure realistic validation, this job launches its own temporary MariaDB database service. If a single test fails, the entire pipeline stops.
- deploy job (Continuous Deployment): only if the test job succeeds, the workflow connects to the production VPS via SSH and executes a script that orchestrates the deployment:
  - git pull to retrieve the validated code.
  - docker compose up -d --build to rebuild the application image with the new code and restart the services without major interruption.
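In outline, the workflow resembles the following. The job names and the overall flow follow the description above; the runner and action versions, the SSH action, the secret names, and the deployment path are assumptions.

```yaml
# Simplified outline of deploy.yml; versions, secrets, and paths are illustrative.
name: CI/CD
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      mariadb:
        image: mariadb:10.6
        env:
          MARIADB_ROOT_PASSWORD: test      # throwaway CI credential
        ports:
          - 3306:3306
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest

  deploy:
    needs: test                            # runs only if every test passed
    runs-on: ubuntu-latest
    steps:
      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1.0.0   # assumed third-party SSH action
        with:
          host: ${{ secrets.VPS_HOST }}
          username: ${{ secrets.VPS_USER }}
          key: ${{ secrets.SSH_KEY }}
          script: |
            cd /srv/augmented-analyst      # deployment path assumed
            git pull
            docker compose up -d --build
```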
2.5.2. Testing Strategy
- Framework: Pytest, for its simplicity and power, along with pytest-mock to simulate external calls.
- Test environments: use of an in-memory SQLite database for fast local tests, and a real MariaDB database in CI for maximum reliability.
- TDD Philosophy: Development follows a Test-Driven Development approach.
- Test Management and Traceability: The project includes a "docs-as-code" test management system:
  - Each test case is a Markdown file stored in the test_cases/ directory.
  - A Python script (scripts/sync_tests.py) ensures traceability by linking each test case to its pytest implementation via a @pytest.mark.test_id marker.
  - A functional coverage report website is automatically generated with MkDocs, providing a clear view of the validation status of each feature.
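For illustration, a test linked to its Markdown test case could look like the sketch below; the test ID, the fixture, and the assertions are hypothetical, not actual project test cases.

```python
# Hypothetical example of the traceability marker; the test ID, the client
# fixture, and the assertions are illustrative only.
import pytest

@pytest.mark.test_id("TC-042")   # matches a Markdown file in test_cases/
def test_dashboard_requires_authentication(client):
    response = client.get("/dashboard")
    # Unauthenticated users are redirected to the login page.
    assert response.status_code == 302
```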