A lightweight, hub-and-spoke dashboard for multi-account Data Science projects

Introduction

Modern Data Science environments often involve many independent projects, each spanning multiple accounts. In order to maintain a global overview of the activities within the projects, a mechanism to collect data from the different accounts into a central one is crucial.

In this example code, we show how one can leverage existing services (Amazon DynamoDB, AWS Lambda, Amazon EventBridge) to deploy a very lightweight infrastructure that allows the flow of relevant metrics from one or more Spoke accounts to one (or more) Hub accounts.

The quantities being monitored are called metrics in the following. We focus here on scalar metrics (i.e. numbers, not vectors); extending the approach to multi-dimensional metrics is straightforward. In this example we monitor quantities that are closely related to Amazon SageMaker, but the same architecture can be extended to monitor any other metric.

General Architecture

The overview of the solution is presented in the diagram below:

[Figure: Architecture]

As already mentioned, we use Amazon EventBridge for the cross-account information exchange, and Amazon DynamoDB as data store in the Hub account. AWS Lambda functions are used to extract information from the Spoke accounts and to store it in the Hub. The red arrows are the configuration flow, which happens only once. Green lines describe the flow for requesting new data from the Spokes. Blue lines show the flow of data from the Spokes to the Hub account.

Configuration

The use of Amazon EventBridge as communication layer means that the permissions needed to operate the dashboard are minimal. The information extraction runs in the Spoke account, and the Hub account does not need to have any cross-account access. We also chose to allow the Hub to trigger a refresh of the values for all Spokes: this is done by generating a special event in an AWS Lambda function and sending it to the Spokes, where a rule will trigger the extraction function.

The only cross-account permission that needs to be set is therefore the one that configures the event forward from the Spoke/Hub to the Hub/Spoke account. This requires that:

  1. The Hub account must allow (in the resource policy of the receiving event bus) events:PutEvents from each of the Spokes it is connected to. The Spokes must allow the same operation from the Hub.
  2. The Spoke account needs to define an Amazon EventBridge Rule that forwards events generated by the information extraction to the Hub account. The Hub must have a rule to forward the refresh command to the Spokes.
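
As an illustration of the first requirement, the following minimal sketch builds the kind of resource-policy statement that lets one account put events on another account's event bus. It is not taken from the example code: the function names, the account IDs, the region, and the use of the default event bus are all assumptions made for this sketch.

```python
import json


def default_bus_arn(account_id: str, region: str) -> str:
    """ARN of the default event bus in the given account and region."""
    return f"arn:aws:events:{region}:{account_id}:event-bus/default"


def put_events_statement(sender_account_id: str, bus_arn: str) -> dict:
    """Resource-policy statement letting one account put events on a bus."""
    return {
        "Sid": f"AllowPutEventsFrom{sender_account_id}",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{sender_account_id}:root"},
        "Action": "events:PutEvents",
        "Resource": bus_arn,
    }


# Placeholder account IDs: the Hub (111111111111) allows a Spoke
# (222222222222) to send events to its default bus. The same helper can be
# used in the other direction by swapping the two IDs.
hub_bus = default_bus_arn("111111111111", "eu-west-1")
statement = put_events_statement("222222222222", hub_bus)
print(json.dumps(statement, indent=2))
```

The bus ARN built here is also what a forwarding rule in the sending account would use as its target.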

We use the AWS Systems Manager Parameter Store to store, within each account, the information needed to configure the event forwards. This offers the advantage that the information concerning the structure of hubs and spokes is explicitly stored in the accounts. A dedicated Lambda function reads the configuration from the Parameter Store and applies the needed configuration in each account. The code is set up so that any account can be connected to multiple monitors and can itself serve, at the same time, as a monitor for other accounts. A connection requires two parameters to be set: one in the Spoke (pointing it to the Hub) and one in the Hub (pointing it to the Spoke).
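
To make the idea of reading the connection structure from the Parameter Store more concrete, here is a hedged sketch of how such a lookup could be written with boto3's get_parameters_by_path. The function name connected_accounts and the FakeSsm stand-in are hypothetical and exist only so that the sketch runs without AWS credentials; the actual connection function in the example code may work differently.

```python
def connected_accounts(ssm_client, prefix: str) -> dict:
    """Return {parameter name: account id} for all parameters under prefix."""
    accounts = {}
    kwargs = {"Path": prefix, "Recursive": True}
    while True:
        page = ssm_client.get_parameters_by_path(**kwargs)
        for param in page.get("Parameters", []):
            accounts[param["Name"]] = param["Value"]
        token = page.get("NextToken")
        if not token:
            return accounts
        # More results are available: ask for the next page.
        kwargs["NextToken"] = token


class FakeSsm:
    """Stand-in for boto3.client("ssm"), so the sketch runs offline."""

    def get_parameters_by_path(self, **kwargs):
        return {
            "Parameters": [
                {
                    "Name": "/monitored_projects/TestProject/dev",
                    "Value": "222222222222",  # placeholder Spoke account ID
                },
            ]
        }


spokes = connected_accounts(FakeSsm(), "/monitored_projects")
print(spokes)
```

With a real client, the call would be connected_accounts(boto3.client("ssm"), "/monitored_projects") in the Hub, or the same call with the "/monitors" prefix in a Spoke.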

Extraction of information

An AWS Lambda function in each Spoke account takes care of extracting the needed information. We chose to write this part of the code to be highly modular, and to allow fine-grained, least-privilege permissions management. In detail:

  • each metric is implemented in an independent Python class.
  • all metrics inherit from a base class which implements core functionality, such as communication with the event bus.
  • all metrics also define, as class variable, the IAM permissions they need to extract the information from the account
  • when deploying the solution in the Spoke, the list of metrics to be monitored needs to be provided
  • the extraction function is given, when deploying, only the permissions it needs to extract the metrics that are requested
  • at runtime, the extraction function loops over the metrics, emitting one event for each of them
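
The structure listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code shipped with the example: the method to_entry, the helpers required_permissions and extract, and the toy ConstantMetric are hypothetical names, and the event fields follow the schema used in this document.

```python
import json
from datetime import datetime, timezone


class Metric:
    """Hypothetical base class: builds the event, subclasses compute the value."""

    _iam_permissions = []  # IAM statements needed by every metric

    def __init__(self, project_name, environment):
        self.project_name = project_name
        self.environment = environment

    def _compute_value(self):
        raise NotImplementedError

    def to_entry(self):
        """Build one EventBridge PutEvents entry for this metric."""
        detail = {
            "MetricName": type(self).__name__,
            "MetricValue": self._compute_value(),
            "ExtractionDate": datetime.now(timezone.utc).isoformat(),
            "Metadata": {},
            "Environment": self.environment,
            "ProjectName": self.project_name,
        }
        return {
            "Source": "metric_extractor",
            "DetailType": "metric_extractor",
            "Resources": [],
            "Detail": json.dumps(detail),
        }


class ConstantMetric(Metric):
    """Toy metric that needs no AWS access; used only to exercise the loop."""

    def _compute_value(self):
        return 42


def required_permissions(metric_classes):
    """Collect the IAM statements of the selected metrics (deploy time)."""
    return [stmt for cls in metric_classes for stmt in cls._iam_permissions]


def extract(metric_classes, project_name, environment):
    """Build one event entry per requested metric (run time)."""
    return [cls(project_name, environment).to_entry() for cls in metric_classes]


entries = extract([ConstantMetric], "Project1", "dev")
print(entries[0]["Detail"])
```

Keeping the permissions next to the code that needs them is what allows the deployment to grant only the statements of the metrics that were actually requested.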

Fetching new data

In order to request new data from all Spokes, the Hub has to emit to its own event bus an event with contents:

{
    "source": "metric_extractor",
    "detail-type": "metric_extractor",
    "resources": [],
    "detail": "{}"
}

This event will be forwarded to all Spokes, which are configured to trigger a new extraction upon its reception. The results of the extractions are sent back to the Hub, again through Amazon EventBridge.
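
A refresh request like the one above could be sent from Python roughly as follows. This is a sketch, not the Hub's actual Lambda code: request_refresh and refresh_entry are hypothetical names, and FakeEvents is a stand-in that lets the example run without AWS access. With a real boto3 EventBridge client, omitting EventBusName sends the event to the default bus of the account.

```python
def refresh_entry() -> dict:
    """PutEvents entry equivalent to the refresh event shown above."""
    return {
        "Source": "metric_extractor",
        "DetailType": "metric_extractor",
        "Resources": [],
        "Detail": "{}",
    }


def request_refresh(events_client) -> int:
    """Send the refresh event; return the number of entries that failed."""
    response = events_client.put_events(Entries=[refresh_entry()])
    return response.get("FailedEntryCount", 0)


class FakeEvents:
    """Stand-in for boto3.client("events"), so the sketch runs offline."""

    def __init__(self):
        self.sent = []

    def put_events(self, Entries):
        self.sent.extend(Entries)
        return {"FailedEntryCount": 0, "Entries": [{"EventId": "fake"}]}


client = FakeEvents()
failed = request_refresh(client)
print(failed, client.sent[0]["DetailType"])
```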

Archival of information

The Hub account receives events from all the Spokes it is connected to. It extracts the payload and stores it to an Amazon DynamoDB table. In this example, we use a simple schema for the event:

{
    "source": "metric_extractor",
    "resources": [],
    "detail-type": "metric_extractor",
    "detail": {
        "MetricName": "aName",
        "MetricValue": "aValue",
        "ExtractionDate": "aTimeStamp",
        "Metadata": {"field1": "value1"},
        "Environment": "dev",
        "ProjectName": "aProject"
    }
}

Each MetricValue is identified by its MetricName and its ExtractionDate. Filtering by ProjectName is also possible. To support the case where a single project owns several accounts, the additional field Environment is also stored. This will typically refer to the stages of the CI/CD pipeline within a project (dev/int/prod).

An additional field is also supported, to store metadata concerning this particular extraction.

The Amazon DynamoDB table in the Hub account uses MetricName as the partition key and ExtractionDate as the sort key.
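
The archival step could look roughly like the sketch below, which maps an incoming event to a table item and stores it. The names item_from_event and archive are hypothetical, and FakeTable stands in for a boto3 DynamoDB Table resource so that the example runs offline; the Lambda function shipped with the example may be organized differently.

```python
def item_from_event(event: dict) -> dict:
    """Map an incoming EventBridge event to a DynamoDB item."""
    detail = event["detail"]
    return {
        "MetricName": detail["MetricName"],          # first part of the table key
        "ExtractionDate": detail["ExtractionDate"],  # sort key
        "MetricValue": detail["MetricValue"],
        "Metadata": detail.get("Metadata", {}),
        "Environment": detail["Environment"],
        "ProjectName": detail["ProjectName"],
    }


def archive(event: dict, table) -> dict:
    """Store one metric event; `table` must offer put_item(Item=...)."""
    item = item_from_event(event)
    table.put_item(Item=item)
    return item


class FakeTable:
    """Stand-in for a boto3 DynamoDB Table resource."""

    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)


sample_event = {
    "source": "metric_extractor",
    "detail-type": "metric_extractor",
    "detail": {
        "MetricName": "aName",
        "MetricValue": "aValue",
        "ExtractionDate": "aTimeStamp",
        "Metadata": {"field1": "value1"},
        "Environment": "dev",
        "ProjectName": "aProject",
    },
}

table = FakeTable()
stored = archive(sample_event, table)
print(stored["MetricName"], stored["ExtractionDate"])
```

Note that with a real boto3 Table resource, floating-point values have to be converted to Decimal before calling put_item.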

Deployment

We use the AWS Cloud Development Kit to deploy the solution in both Hub and Spokes.

For the deployment we will need two AWS accounts:

Account one, the Hub account, is used for the deployment of the HubStack. This stack contains the DynamoDB table, the EventBridge rules, and the associated Lambda functions that receive events from the Spoke accounts.

Account two is the Spoke account. For the purposes of this demonstration we use a single Spoke account, but the solution scales to any number of Spoke accounts.

This guide assumes that you have the tools used in the commands below installed and set up: Python 3 with pip, the AWS CLI, and the AWS CDK.

To get started, download the code attached to this guide to your local machine. The following steps must be executed from the folder where you downloaded the code.

First, prepare the local Python environment. The code includes a file requirements.txt, with the packages you will need. Execute in a terminal:

pip install -r requirements.txt

Now you need to be authenticated into the AWS account you wish to use as the Hub account. For more information on how to authenticate into your AWS accounts, please refer to https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html

To deploy the hub account infrastructure, run the following command:

cdk deploy --app "python3 hub.py"

If prompts appear asking you to approve the addition of IAM policies, please approve them.

After that has succeeded, in the terminal assume a role of the AWS account you wish to use as the spoke account, and run the following command:

cdk deploy \
    -c metrics=TotalCompletedTrainingJobs,NumberEndPointsInService,CompletedTrainingJobs24h \
    -c environment=dev \
    -c project_name=Project1

The -c flag in this command stands for context: it is a way of passing variables to the CDK code (more information can be found in the AWS CDK documentation). We use these variables for the following purposes:

  • metrics:
    • A comma-separated list that lets the user choose which metrics to retrieve from a Spoke account. More metrics can be added. The full list available in this example is:
      • TotalCompletedTrainingJobs
      • CompletedTrainingJobs24h
      • NumberEndPointsInService
  • environment:
    • This variable maps to your deployment environment, for example development, pre-prod or production. It is a string and can be any value you like.
  • project_name:
    • This variable is similar to environment: it is a free-form string that lets you identify the particular ML project you want data from.
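
As a small illustration of how the metrics context value could be handled, the sketch below splits the comma-separated string and rejects names that are not in the list of available metrics. The function parse_metrics is hypothetical and is not part of the example code. In a CDK app written in Python, the raw string would typically be read with node.try_get_context("metrics"); how the example code does this is an assumption here.

```python
AVAILABLE_METRICS = {
    "TotalCompletedTrainingJobs",
    "CompletedTrainingJobs24h",
    "NumberEndPointsInService",
}


def parse_metrics(raw: str) -> list:
    """Split the comma-separated context value and reject unknown names."""
    names = [name.strip() for name in raw.split(",") if name.strip()]
    unknown = [name for name in names if name not in AVAILABLE_METRICS]
    if unknown:
        raise ValueError(f"Unknown metrics: {', '.join(unknown)}")
    return names


selected = parse_metrics("TotalCompletedTrainingJobs,NumberEndPointsInService")
print(selected)
```

Failing early on an unknown name avoids deploying an extraction function that would be missing the permissions for a misspelled metric.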

Once the Hub and Spoke are deployed, we need to set up the connection between the two. We keep the connection step separate from deployment on purpose: the idea is to be able to add new Spokes without having to redeploy resources. The following script summarizes the commands you need:

# run this in each Spoke account
aws ssm put-parameter \
    --name "/monitors/TestHub" \
    --type "String" \
    --value "HUB_ACCOUNT_ID" \
    --overwrite

# run this in the Hub account, once for each Spoke you want to connect
aws ssm put-parameter \
    --name "/monitored_projects/TestProject/dev" \
    --type "String" \
    --value "SPOKE_ACCOUNT_ID" \
    --overwrite

Now that the deployment is done and the configuration data is stored, we can trigger the actual configuration of the accounts. The only issue here is that we cannot configure a rule to send events to another account unless the receiving account has first allowed the sender to put events. So we need to first configure the cross-account events:PutEvents permission on both Hub and Spoke; then we can configure, again on both Hub and Spoke, the event rule for forwarding.

# Note: with AWS CLI v2, add --cli-binary-format raw-in-base64-out to each
# command below, so that the JSON payload is passed as raw text.

# in the Hub
aws lambda invoke --function-name ds-dashboard-connection \
    --payload "{ \"action\": \"EBPut\"}" lambda.out.json

# in the Spoke
aws lambda invoke --function-name ds-dashboard-connection \
    --payload "{ \"action\": \"EBPut\"}" lambda.out.json
aws lambda invoke --function-name ds-dashboard-connection \
    --payload "{ \"action\": \"EBRule\"}" lambda.out.json

# in the Hub again: now we can create the event forwarding rule
aws lambda invoke --function-name ds-dashboard-connection \
    --payload "{ \"action\": \"EBRule\"}" lambda.out.json
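
The same sequence can be expressed in Python, which makes the required ordering explicit. This is only a sketch: CONNECTION_STEPS, invoke_connection, and connect are hypothetical names, and FakeLambda replaces the boto3 Lambda clients (one per account) so that the example runs without credentials.

```python
import json

# Order matters: both accounts must accept events (EBPut) before either
# side creates the rule that forwards events to the other (EBRule).
CONNECTION_STEPS = [
    ("hub", "EBPut"),
    ("spoke", "EBPut"),
    ("spoke", "EBRule"),
    ("hub", "EBRule"),
]


def invoke_connection(lambda_client, action: str) -> int:
    """Invoke the connection function with an action; return the status code."""
    response = lambda_client.invoke(
        FunctionName="ds-dashboard-connection",
        Payload=json.dumps({"action": action}).encode("utf-8"),
    )
    return response["StatusCode"]


def connect(clients: dict) -> list:
    """Run all steps; `clients` maps "hub"/"spoke" to a Lambda client."""
    return [
        invoke_connection(clients[account], action)
        for account, action in CONNECTION_STEPS
    ]


class FakeLambda:
    """Stand-in for boto3.client("lambda") that records what was invoked."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def invoke(self, FunctionName, Payload):
        self.log.append((self.name, json.loads(Payload)["action"]))
        return {"StatusCode": 200}


calls = []
statuses = connect({"hub": FakeLambda("hub", calls), "spoke": FakeLambda("spoke", calls)})
print(calls)
```

With real clients, each entry of the dictionary would be a boto3 Lambda client created from a session authenticated in the corresponding account.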

Implementing a new metric

In order to implement a new metric, users need to add a class in the file metric.py. The new class must inherit from Metric, as defined in the same file. Here is the implementation for one of the example metrics we provide:

class NumberEndPointsInService(Metric):
    # this class variable defines the Action and Resource for the IAM
    # permissions needed for this metric
    _iam_permissions = Metric._iam_permissions + [
        {
            "Action": "sagemaker:ListEndpoints",
            "Resource": "*"
        }
    ]

    # this internal method MUST be implemented. This is what computes and
    # returns the actual value
    def _compute_value(self):
        # note: list_endpoints is paginated, so this counts only the
        # endpoints returned in the first page of results
        eps = sagemaker_client.list_endpoints(
            StatusEquals='InService',
        )['Endpoints']
        return len(eps)

As you can see, very little code needs to be written, since most of the operations are handled by the parent class. When specifying the IAM permissions for the metric, you are allowed to use **ACCOUNT_ID** and **REGION** as placeholders for the real account and region, which will only be known at deploy time. In case you need more fine-grained placeholders (for example, a bucket name in the Resource section), you can implement your own get_iam_permissions method in the new class, to override the one provided by Metric.
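
As a further illustration of the pattern, here is a hedged sketch of a metric that counts completed training jobs using a boto3 paginator, so that all pages of results are counted. It is not the implementation shipped with the example: the class name CompletedTrainingJobsSketch is hypothetical, the Metric class below is a minimal stand-in for the real base class in metric.py, and the client is passed to the constructor (rather than taken from a module-level variable) only so that the sketch can run with a fake client.

```python
class Metric:
    """Minimal stand-in for the base class in metric.py (not the real one)."""

    _iam_permissions = []


class CompletedTrainingJobsSketch(Metric):
    # IAM permission needed to list training jobs
    _iam_permissions = Metric._iam_permissions + [
        {"Action": "sagemaker:ListTrainingJobs", "Resource": "*"}
    ]

    def __init__(self, sagemaker_client):
        self._client = sagemaker_client

    def _compute_value(self):
        # Walk through every page of results and count the job summaries.
        paginator = self._client.get_paginator("list_training_jobs")
        pages = paginator.paginate(StatusEquals="Completed")
        return sum(len(page["TrainingJobSummaries"]) for page in pages)


class FakePaginator:
    """Yields two pages, mimicking a paginated SageMaker response."""

    def paginate(self, **kwargs):
        yield {"TrainingJobSummaries": [{"TrainingJobName": "a"},
                                        {"TrainingJobName": "b"}]}
        yield {"TrainingJobSummaries": [{"TrainingJobName": "c"}]}


class FakeSageMaker:
    """Stand-in for boto3.client("sagemaker"), so the sketch runs offline."""

    def get_paginator(self, operation_name):
        return FakePaginator()


value = CompletedTrainingJobsSketch(FakeSageMaker())._compute_value()
print(value)
```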

Example dashboard

The technology to use for analysis and visualization of the collected data depends on the constraints of the specific setup, i.e. what solutions are already available and in use within the environment. A detailed discussion is beyond the scope of this example. Instead, we connected two spokes to the hub and ran a few training jobs, deploying one model to production. The Amazon DynamoDB table was connected to Amazon QuickSight and here is a simple table visualization with two historical plots:

[Figure: Example QuickSight Dashboard]

Cleanup

To avoid unnecessary costs, remove the resources created in this example as follows.

In the terminal, assume a role in the Hub account and run the following command to remove the Hub stack:

cdk destroy --app "python3 hub.py"

In the terminal, assume a role in the Spoke account and run the following command to remove the Spoke stack:

cdk destroy

In addition, some resources were created by the connection Lambda function and need to be removed manually:

  • In the Hub and Spokes, go to the Amazon EventBridge console and delete the rules whose name starts with forward.
  • In the Hub and Spokes, clean up the parameters created in the AWS Systems Manager Parameter Store.
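
If you prefer to script the first cleanup step instead of using the console, a minimal sketch with boto3 could look like the following. The function delete_forward_rules is hypothetical, the rule name used in the fake client is a placeholder, and for brevity only the first page of list_rules results is handled. Please review which rules match the prefix before deleting anything in a real account.

```python
def delete_forward_rules(events_client, prefix: str = "forward") -> list:
    """Delete all rules whose name starts with prefix; return their names."""
    deleted = []
    rules = events_client.list_rules(NamePrefix=prefix)["Rules"]
    for rule in rules:
        name = rule["Name"]
        targets = events_client.list_targets_by_rule(Rule=name)["Targets"]
        if targets:
            # A rule can only be deleted once its targets are removed.
            events_client.remove_targets(Rule=name, Ids=[t["Id"] for t in targets])
        events_client.delete_rule(Name=name)
        deleted.append(name)
    return deleted


class FakeEventsCleanup:
    """Stand-in for boto3.client("events"), so the sketch runs offline."""

    def __init__(self):
        self.rules = {"forward-example": [{"Id": "target-1"}]}

    def list_rules(self, NamePrefix):
        names = [n for n in self.rules if n.startswith(NamePrefix)]
        return {"Rules": [{"Name": n} for n in names]}

    def list_targets_by_rule(self, Rule):
        return {"Targets": list(self.rules[Rule])}

    def remove_targets(self, Rule, Ids):
        self.rules[Rule] = [t for t in self.rules[Rule] if t["Id"] not in Ids]

    def delete_rule(self, Name):
        del self.rules[Name]


fake_events = FakeEventsCleanup()
removed = delete_forward_rules(fake_events)
print(removed)
```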
Owner
AWS Samples