Architecture Overview
- UI Backend: responsible for creating tasks, triggering pipelines, downloading outputs, updating configurations for the algo run, and reporting run status to the user.
- UI: Angular frontend
- Pipelines: there are six types of pipelines:
- Algo run: pipeline that runs the algorithms.
- File Ingestion: validates uploaded files, then transforms them into a format usable by the algorithms and reports.
- Create new project: creates the environment and schemas for a new project.
- ETL Pipeline: ADF pipelines required to ingest data from SFTP and other sources; API integrations are also supported in these pipelines. These pipelines hit the UI backend with the file. A client may have multiple such pipelines depending on the use case.
- Parent creation pipeline: creates the environment and schemas for parent data.
- Utility pipelines: pipelines used to create/update/delete other pipelines.
- MySQL DB: stores configurations and other metadata, such as algo run status and configuration logs.
- Data Lake: stores files organized into containers.
- Trino: distributed SQL query engine used to query files for reporting.
- Superset: dashboards.
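To illustrate the UI backend's role in the components above, here is a minimal sketch of an algo-run lifecycle: the backend creates a task, triggers the pipeline, and reports status back to the user. This is plain Python with the pipeline trigger stubbed out; all names (`AlgoRun`, `RunStatus`, `trigger_algo_run`) are hypothetical, not taken from the actual codebase.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class RunStatus(Enum):
    CREATED = "created"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class AlgoRun:
    """One algo-run task, as the UI backend might track it in MySQL."""
    run_id: str
    project: str
    status: RunStatus = RunStatus.CREATED
    history: list = field(default_factory=list)

    def update(self, status: RunStatus) -> None:
        # In the real system this would be an UPDATE against the MySQL DB,
        # so the UI can report the current status to the user.
        self.status = status
        self.history.append((datetime.now(timezone.utc), status))


def trigger_algo_run(run: AlgoRun) -> AlgoRun:
    """Stub: trigger the 'Algo run' pipeline and record its progress."""
    run.update(RunStatus.RUNNING)
    # ... here the backend would actually start the pipeline
    #     (e.g. via an ADF trigger) and poll until it finishes ...
    run.update(RunStatus.SUCCEEDED)
    return run
```

In the real system the trigger and the status polling would be asynchronous; the sketch collapses them into one call only to show the state transitions.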

Storage
We use Azure Data Lake Storage Gen2 for storing our data in the form of files.
Before we go into the details of the storage structure, we must look at two important Data Lake concepts:
- Storage Account
- Storage Container
Storage Account
A storage account contains all the data in the form of blobs. You can create storage containers within a storage account. Access is managed at the storage account level.
Storage Container
A storage container is used to organize blobs within a storage account. There can be any number of containers within a storage account, and a storage container can contain any number of blobs.
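The account → container → blob hierarchy above can be sketched with a small helper that builds the Data Lake Storage Gen2 URL for a blob. The account and container names here are made up for illustration; only the `<account>.dfs.core.windows.net` addressing scheme is standard Azure.

```python
def blob_url(account: str, container: str, blob_path: str) -> str:
    """Build the https URL for a blob in Azure Data Lake Storage Gen2.

    Uses the standard '<account>.dfs.core.windows.net' endpoint.
    'blob_path' may contain '/' separators, which Gen2's hierarchical
    namespace treats as real directories.
    """
    return f"https://{account}.dfs.core.windows.net/{container}/{blob_path}"


# Hypothetical layout: one container per project, blobs grouped by run.
print(blob_url("myaccount", "project-a", "runs/2024-01-15/output.csv"))
# https://myaccount.dfs.core.windows.net/project-a/runs/2024-01-15/output.csv
```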
The structure of storage is as follows: