Architecture Overview
- UI Backend: responsible for creating tasks, triggering pipelines, downloading outputs, updating configurations for the algo run, and reporting run status to the user.
- UI: Angular frontend
- Pipelines: there are six types of pipelines:
- Algo run: pipeline that runs the algorithms.
- File Ingestion: validates uploaded files, then transforms them into a format usable by the algorithms and reports.
- Create new project: creates the environment and schemas for a new project.
- ETL Pipeline: ADF pipelines required to ingest data from SFTP and other sources; API integrations are also supported in these pipelines. These pipelines hit the UI backend with the file. A client may have multiple such pipelines depending on the use case.
- Parent creation pipeline: creates the environment and schemas for parent data.
- Utility pipelines: pipelines used to create/update/delete other pipelines.
- MySQL DB: stores configurations and other metadata, such as algo run status and configuration logs.
- Data Lake: stores files organized into containers.
- Trino: distributed SQL query engine used to query files for reporting.
- Superset: dashboards.
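To illustrate the UI backend's role in the components above, here is a minimal sketch of an algo-run lifecycle: the backend creates a task, triggers the pipeline, and reports status back to the user. This is plain Python with the pipeline trigger stubbed out; all names (`AlgoRun`, `RunStatus`, `trigger_algo_run`) are hypothetical, not taken from the actual codebase.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class RunStatus(Enum):
    CREATED = "created"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class AlgoRun:
    """One algo-run task, as the UI backend might track it in MySQL."""
    run_id: str
    project: str
    status: RunStatus = RunStatus.CREATED
    history: list = field(default_factory=list)

    def update(self, status: RunStatus) -> None:
        # In the real system this would be an UPDATE against the MySQL DB,
        # so the UI can report the current status to the user.
        self.status = status
        self.history.append((datetime.now(timezone.utc), status))


def trigger_algo_run(run: AlgoRun) -> AlgoRun:
    """Stub: trigger the 'Algo run' pipeline and record its progress."""
    run.update(RunStatus.RUNNING)
    # ... here the backend would actually start the pipeline
    #     (e.g. via an ADF trigger) and poll until it finishes ...
    run.update(RunStatus.SUCCEEDED)
    return run
```

In the real system the trigger and the status polling would be asynchronous; the sketch collapses them into one call only to show the state transitions.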

Storage
We use Azure Data Lake Storage Gen2 for storing our data in the form of files.
Before we go into the details of the storage structure, we must look at two important Data Lake concepts:
- Storage Account
- Storage Container
Storage Account
A storage account contains all the data in the form of blobs. You can create storage containers within a storage account. Access is managed at the storage account level.
Storage Container
A storage container is used to organize blobs within a storage account. There can be any number of containers within a storage account, and a storage container can contain any number of blobs.
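The account → container → blob hierarchy above can be sketched with a small helper that builds the Data Lake Storage Gen2 URL for a blob. The account and container names here are made up for illustration; only the `<account>.dfs.core.windows.net` addressing scheme is standard Azure.

```python
def blob_url(account: str, container: str, blob_path: str) -> str:
    """Build the https URL for a blob in Azure Data Lake Storage Gen2.

    Uses the standard '<account>.dfs.core.windows.net' endpoint.
    'blob_path' may contain '/' separators, which Gen2's hierarchical
    namespace treats as real directories.
    """
    return f"https://{account}.dfs.core.windows.net/{container}/{blob_path}"


# Hypothetical layout: one container per project, blobs grouped by run.
print(blob_url("myaccount", "project-a", "runs/2024-01-15/output.csv"))
# https://myaccount.dfs.core.windows.net/project-a/runs/2024-01-15/output.csv
```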
The structure of storage is as follows: