
GCSMTTR - GitHub Code Scanning Mean Time to Remediate (Data Storage + API)

Welcome to the GCSMTTR Data Storage & API Product! 👋


Overview

The GCSMTTR Data Storage & API product is an open-source initiative that helps teams collect and report mean time to remediate (MTTR) data for their GitHub organizations and the repositories within them.

How this works

A high-level design of how this solution works is shown below:

[GCSMTTR architecture diagram]

To explain the diagram above:

Whenever a Code Scanning alert is created, fixed, or manually closed, GitHub triggers a code_scanning_alert webhook. An Amazon API Gateway then ingests the webhook payload. Two factors of authentication occur first:

  1. The source IP address of the webhook is checked to be a valid GitHub hook IP.
  2. The webhook secret is checked to ensure it matches the expected secret.

If either factor of authentication fails, the webhook is rejected. If both pass, the webhook is accepted, and the payload is sent to an Amazon EventBridge event bus, which triggers an AWS Step Functions state machine.
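
As a concrete illustration of those two checks, here is a minimal TypeScript sketch (illustrative only: the function names, the environment variable, and the IPv4-only CIDR helper are assumptions, not the project's actual Lambda code):

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// IPv4-only helper for brevity; GitHub also publishes IPv6 hook ranges.
function ipInCidr(ip: string, cidr: string): boolean {
  const [range, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const toInt = (addr: string) =>
    addr.split(".").reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return (toInt(ip) & mask) === (toInt(range) & mask);
}

// Minimal sketch of the two authentication factors. In the real flow the
// webhook secret comes from AWS Parameter Store, not an environment variable.
export function verifyWebhook(
  rawBody: string,
  signatureHeader: string | undefined, // value of the X-Hub-Signature-256 header
  sourceIp: string,
  allowedHookCidrs: string[] // "hooks" ranges from GitHub's /meta API
): boolean {
  // Factor 1: source IP must fall inside one of GitHub's published hook ranges
  const ipOk = allowedHookCidrs.some((cidr) => ipInCidr(sourceIp, cidr));

  // Factor 2: HMAC-SHA256 of the raw body must match the signature header
  const expected =
    "sha256=" +
    createHmac("sha256", process.env.GITHUB_WEBHOOKS_SECRET ?? "")
      .update(rawBody)
      .digest("hex");
  const sigOk =
    signatureHeader !== undefined &&
    signatureHeader.length === expected.length &&
    timingSafeEqual(Buffer.from(signatureHeader), Buffer.from(expected));

  return ipOk && sigOk; // reject unless both factors pass
}
```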

The state machine first checks whether the code_scanning_alert action is created or fixed/manually closed. If created, the data is structured and entered into the All Events Table. If fixed/manually closed, the data is sent for processing: if the alert is already in the database (i.e. it was created while this solution was enabled), its record is updated to reflect the new information; if the alert was not already in the database (i.e. it was created before this solution was enabled), no data is entered and the state machine exits.
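
A rough sketch of that branching logic is below (illustrative only; an in-memory map stands in for the All Events Table, and the real branching lives in the state machine definition):

```typescript
// In-memory stand-in for the All Events Table, purely for illustration.
type AlertEvent = {
  action: "created" | "fixed" | "closed_by_user";
  id: string; // org/repo/alertID
  occurredAt: string;
};

const allEventsTable = new Map<string, AlertEvent>();

function routeAlertEvent(event: AlertEvent): void {
  if (event.action === "created") {
    allEventsTable.set(event.id, event); // new alert: insert a row
    return;
  }
  // fixed / manually closed: only act on alerts we saw being created
  if (!allEventsTable.has(event.id)) {
    return; // alert predates this solution -- exit without writing
  }
  allEventsTable.set(event.id, event); // update the row with the remediation event
}
```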

If data was entered or updated in the All Events Table, a DynamoDB stream event is triggered and sent to a Lambda, which forwards the payload straight onto an Amazon SQS FIFO queue. The queue sends the data for processing, which updates or creates a record within the Repository Overview Table.

Likewise, if data was entered or updated in the Repository Overview Table, a DynamoDB stream event is triggered and sent to a Lambda, which forwards the payload straight onto another Amazon SQS FIFO queue. The queue sends the data for processing, which updates or creates a record within the Organisation Overview Table.
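
Both forwarding Lambdas follow the same pattern. A minimal sketch of what one might look like (the queue URL environment variable and the message group id are assumptions, not the deployed code):

```typescript
import { DynamoDBStreamEvent } from "aws-lambda";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

// Forwards each DynamoDB stream record to the FIFO queue. A single
// MessageGroupId means the queue preserves strict ordering across records.
export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    await sqs.send(
      new SendMessageCommand({
        QueueUrl: process.env.QUEUE_URL, // assumed env var holding the queue URL
        MessageBody: JSON.stringify(record.dynamodb?.NewImage ?? {}),
        MessageGroupId: "gcsmttr", // one group => one-at-a-time processing
        MessageDeduplicationId: record.eventID ?? `${Date.now()}`,
      })
    );
  }
};
```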

For a non-technical description, see below.

Non-Technical Description

This solution allows users to query mean time to remediate data about a GitHub Organisation or a GitHub Repository. Data is stored in three formats:

  • All Events: This is the raw data that is collected from GitHub. Each row reflects an individual code scanning alert event. This table has code scanning events from multiple GitHub Repositories and Organisations.
  • Repository Overview: This is the next level up from the All Events table. Each row reflects an individual GitHub Repository. This table shows the total mean time to remediate (MTTR) for a specific repository.
  • Organisation Overview: This is the next level up from the Repository Overview table. Each row reflects an individual GitHub Organisation. This table shows the total mean time to remediate (MTTR) for a specific organisation.

This allows for total flexibility when querying for data across different formats.

Getting Started

This is a solution that you need to deploy yourself. Because it ingests and processes webhook data, a custom deployment is required. Specifically, you will need to deploy it into an AWS* environment. The good news is that there is an Infrastructure as Code (IaC) file that deploys the whole solution for you, making it effectively a one-click deployment. The guide on deploying this to AWS can be found below.

*This solution is currently specific to AWS. However, the IaC file could be copied and edited to work with Azure/GCP. I would love contributions to this.

Querying Data (GraphQL)

This service exposes data via a GraphQL API. See schema.graphql to understand how you can get data from this service. Below are some example queries that can be run to get data out of it.

Alert(s) Queries

A GraphQL Query that returns ALL alerts stored in the All Alerts Table.

query GetAllAlerts($nextToken: String) {
  getAlerts(nextToken: $nextToken) {
    data {
      alertID
      organisationName
      repositoryName
    }
    nextToken
  }
}

An example response could look like:

GetAllAlerts

A GraphQL Query that returns data about a single alert stored in the All Alerts Table. The format of the id is org/repo/alertID

query GetSpecificAlertDetail($id: ID) {
  getAlert(id: $id) {
    id
    alertCreatedAtDate
    alertCreatedAtFullTimestamp
    alertCreatedAtMonth
    alertCreatedAtYear
    alertID
    alertURL
    organisationName
    repositoryName
  }
}

An example response could look like:

GetSpecificAlertDetail

Repository Overview(s) Queries

A GraphQL Query that returns ALL repository overviews stored in the Repository Overview Table.

query GetAllRepositoryOverviews($nextToken: String) {
  getRepositoryOverviews(nextToken: $nextToken) {
    data {
      openAlerts
      numberFixed
      numberManuallyCosed
      repositoryName
      totalTimeToRemediate
      meanTimeToRemediate
    }
    nextToken
  }
}

An example response could look like:

GetAllRepositoryOverviews
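
As a side note on how the totals relate, here is a small illustrative calculation (assuming totalTimeToRemediate accumulates the remediation time of each fixed or manually closed alert; the hours and count below are made up for the example):

```typescript
// Illustrative arithmetic only. Suppose a repository remediated four alerts
// (fixed or manually closed) in 10, 2, 6 and 2 hours respectively:
const remediationTimesHours = [10, 2, 6, 2];

const totalTimeToRemediate = remediationTimesHours.reduce((a, b) => a + b, 0); // 20
const meanTimeToRemediate =
  totalTimeToRemediate / remediationTimesHours.length; // 20 / 4 = 5 hours
```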

A GraphQL Query that returns data about a single repository stored in the Repository Overview Table. The format of the repositoryName is org/repo

query GetOverviewDataAboutASpecificRepository($repositoryName: String!) {
  repositoryOverviewbyRepositoryName(repositoryName: $repositoryName) {
    data {
      openAlerts
      numberFixed
      numberManuallyCosed
      repositoryName
      totalTimeToRemediate
      meanTimeToRemediate
    }
    nextToken
  }
}

An example response could look like:

GetOverviewDataAboutASpecificRepository

A GraphQL Query that returns data about a monthlyPeriod stored in the Repository Overview Table. The format of the monthlyPeriod is yyyy-MONTH

query GetRepoOverviewDataFromASpecificMonth($monthlyPeriod: String!) {
  repositoryOverviewbyMonthlyPeriod(monthlyPeriod: $monthlyPeriod) {
    data {
      openAlerts
      numberFixed
      numberManuallyCosed
      monthlyPeriod
      totalTimeToRemediate
      meanTimeToRemediate
    }
    nextToken
  }
}

An example response could look like:

GetRepoOverviewDataFromASpecificMonth

Organisation(s) Overview

A GraphQL Query that returns ALL organisation overviews stored in the Organisation Overview Table.

query GetOrganisationOverviews($nextToken: String) {
  getOrganisationOverviews(nextToken: $nextToken) {
    data {
      openAlerts
      numberFixed
      numberManuallyCosed
      organisationName
      totalTimeToRemediate
      meanTimeToRemediate
    }
    nextToken
  }
}

An example response could look like:

GetOrganisationOverviews

A GraphQL Query that returns data about a single organization stored in the Organisation Overview Table. The format of the organisationName is org.

query GetOverviewDataAboutASpecificOrganisation($organisationName: String!) {
  organisationOverviewbyOrganisationName(organisationName: $organisationName) {
    data {
      openAlerts
      numberFixed
      numberManuallyCosed
      organisationName
      totalTimeToRemediate
      meanTimeToRemediate
    }
    nextToken
  }
}

An example response could look like:

GetOverviewDataAboutASpecificOrganisation

A GraphQL Query that returns data about a monthlyPeriod stored in the Organisation Overview Table. The format of the monthlyPeriod is yyyy-mm

query GetOverviewDataFromASpecificMonth($monthlyPeriod: String) {
  organisationOverviewbyMonthlyPeriod(monthlyPeriod: $monthlyPeriod) {
    data {
      openAlerts
      numberFixed
      numberManuallyCosed
      monthlyPeriod
      totalTimeToRemediate
      meanTimeToRemediate
    }
    nextToken
  }
}

An example response could look like:

GetOverviewDataFromASpecificMonth

These are only example queries; they can be adjusted, added to, or removed to fit whatever requirements are in line with schema.graphql.
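
If you would rather call the API from code than from a GraphQL console, the sketch below shows one possible TypeScript client. The endpoint URL and API-key header are placeholder assumptions; use whatever auth mode your AppSync API is actually configured with:

```typescript
// Hypothetical client call -- ENDPOINT is a placeholder, and API-key auth
// is an assumption, not necessarily how your deployment is configured.
const ENDPOINT = "https://example.appsync-api.us-east-1.amazonaws.com/graphql";

async function getAlerts(nextToken?: string): Promise<unknown> {
  const query = /* GraphQL */ `
    query GetAllAlerts($nextToken: String) {
      getAlerts(nextToken: $nextToken) {
        data { alertID organisationName repositoryName }
        nextToken
      }
    }
  `;
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": process.env.APPSYNC_API_KEY ?? "", // hypothetical env var
    },
    body: JSON.stringify({ query, variables: { nextToken } }),
  });
  return res.json();
}
```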

Technologies Used

The following technologies are used throughout this solution:

AWS SAM is used for the Lambda & HTTP API Gateway resources.

Note: Even though this solution is deployed to AWS, the code can be changed to work with the likes of Azure and GCP (Azure Functions, Google Cloud Functions, etc.).

Prerequisites

  1. Access to a cloud environment (AWS would be the quickest to get started)
  2. Access to a GitHub environment.
  3. A repository where the code for this solution is going to live.

Initial Installation

The steps below show the path-of-least-resistance way of deploying this solution into AWS. There are many different ways to deploy it. Every organization likely has different processes (especially when deploying into AWS), meaning you may have to pivot during these steps to accommodate organization-specific processes. This is okay. Please treat these instructions as an example and reference; if they work end-to-end, great; if not, please adjust to your company policies (and, if needed, contribute back!).

If you get an error you cannot get around, please log an issue on this repository.

Step One: Create IAM User

Create an IAM User. The IAM User will need to have the capability to do the following:

  • CRUD access over S3, IAM, API Gateway, Lambda, Cloudwatch, Step Functions, App Sync, DynamoDB and SQS Resources.

From that user, create an AWS access key and secret. Once you have both, create a GitHub Environment called main, and within that environment create two secrets, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, containing the relevant values from AWS. Set the environment to only deploy from the main branch. (This can be changed at any time later.)

NOTE: If your organization doesn't allow the use of IAM Users, this isn't a problem. We use the official configure-aws-credentials GitHub Action, meaning you can head to the .github/workflows/deploy.yaml file and swap the IAM User method for assuming an AWS Role. Or, if you have a custom GitHub Action that authenticates into AWS, remove the configure-aws-credentials action and swap in your custom one.

Step Two: Create and Configure GitHub App

Create a GitHub Application. You will need to be an administrator of your GitHub organization to do this. During the creation of the application, you only need to enter:

  1. GitHub App Name: GCSMTTR - GitHub Code Scanning Mean Time To Remediate
  2. Homepage URL: https://donotknowthisurlyet.com
  3. Webhook URL: https://donotknowthisurlyet.com
  4. Webhook Secret: enter a secret of your choice - keep this value secret, but note it down for later
  5. Permissions:
    • Code Scanning Alerts
  6. Subscribe to events:
    • Code Scanning Alerts
  7. Where can this integration be installed: Only on this account

You do not need to fill in the rest of the fields. Right now, you don't know what the URLs are going to be, so put any placeholder value in there.

Once the application is created, you need to install the GitHub App on your organization and then add the repositories you would like code scanning alerts to be processed. Follow the instructions here: Installing your private GitHub App on your repository.

NOTE: When you install the GitHub App on your GitHub Organisation, I would advise you do not have it connected to every repository to start with. To get familiar with the process, only install on a few repositories and once comfortable, you can install across the organization if you like.

Once it's installed, we need to collect some information:

  1. GitHub App Private Key. Follow the instructions here: Generating a private key to do that.
  2. Client Secret: Just above where you generated the private key, there will be an option for you to generate a client secret. Click the Generate a new Client Secret button and note down the secret.
  3. Client ID: Just above where you generated the client secret, you will see the Client ID; take a note of the id.
  4. App ID: Just above where you generated the client secret, you will see the App ID; take a note of the id.
  5. Installation ID: The Installation ID is in a different location; head to your organization's GitHub Apps page (https://github.com/organizations/${orgName}/settings/installations). Click Configure next to the GitHub App you created. If you look at the URL, at the end you will see a number, after the installations/ part. Copy down that number.

Step Three: Create Parameters within AWS Systems Manager (Parameter Store)

Log into AWS, head to AWS Systems Manager, then Parameter Store. In total, you will need to create six parameters.

  1. /GCSMTTR/APP_CLIENT_ID: The GitHub App Client ID you got from Step Two.
  2. /GCSMTTR/APP_CLIENT_SECRET: The GitHub App Client Secret you got from Step Two.
  3. /GCSMTTR/APP_ID: The GitHub App ID you got from Step Two.
  4. /GCSMTTR/APP_INSTALLATION_ID: The GitHub App Installation ID you got from Step Two.
  5. /GCSMTTR/APP_PRIVATE_KEY: The GitHub App Private Key you got from Step Two.
  6. /GCSMTTR/GITHUB_WEBHOOKS_SECRET: The secret you assigned to the webhook in Step Two.

NOTE: It is recommended you make the: /GCSMTTR/APP_CLIENT_SECRET, /GCSMTTR/APP_PRIVATE_KEY, /GCSMTTR/GITHUB_WEBHOOKS_SECRET values SecureString within Parameter Store. The rest can be String types.
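
For reference, a Lambda inside the stack could read these values along the following lines (a sketch using the AWS SDK v3; note that WithDecryption is needed for the SecureString values):

```typescript
import { SSMClient, GetParameterCommand } from "@aws-sdk/client-ssm";

const ssm = new SSMClient({});

// Sketch of reading one of the parameters above from inside a Lambda.
// WithDecryption is required for the SecureString values.
async function getParam(name: string): Promise<string> {
  const out = await ssm.send(
    new GetParameterCommand({ Name: name, WithDecryption: true })
  );
  return out.Parameter?.Value ?? "";
}

// e.g. const webhookSecret = await getParam("/GCSMTTR/GITHUB_WEBHOOKS_SECRET");
```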

Step Four: Deployment into AWS

Second to last step! Before we do this, let's check a few things:

  • An environment has been created with two GitHub secrets that can deploy the solution to AWS.
  • A GitHub App has been created and connected to the repositories whose code scanning alerts you would like to track.
  • AWS Parameters have been created.

If the above is complete, pull the contents of this codebase and push it into the repository where you configured the GitHub Environment and Secrets. Make sure you push to the main branch (or the branch you configured in the environment to deploy from).

GitHub Actions should now trigger! You can watch the workflow within the Actions tab of your repository, but what it is doing is:

  • Linting
  • Building (Typescript -> Javascript)
  • Building (SAM)
  • Deploying (SAM)

The first time you deploy, it should take about 5-6 minutes. As long as the IAM User you created in Step One has the correct permissions mentioned above, your deployment should succeed. Log into AWS, head to CloudFormation, look for the GCSMTTR stack, head to Outputs, and you should see an output called HttpApiUrl. Note down this URL.

Step Five: Update the GitHub App to send webhooks to the URL output from Step Four

Head back to the GitHub App you created in Step Two. Head down to the Webhook URL field, enter the URL from Step Four, and add /GCSMTTR onto the end. The URL you got from the output is the domain, not the full URI where webhooks should be sent, so make sure to append the /GCSMTTR path to that URL.

Click Save

Done! From now on, whenever a Code Scanning alert is created, fixed, or closed_by_user, an event will be fired and processed.

FAQs

Why use an SQS queue? Why doesn't the DynamoDB stream directly invoke the Lambda that enters data into the Repo + Org Overview Table(s)?

SQS is used to ensure records are processed one at a time. The Lambda function that creates/updates rows within the Repository Overview Table needs a consistent read on the DynamoDB row it may be updating. 1,000+ code scanning alerts could fire simultaneously, which means 1,000+ rows could be entered into the All Events Table. Without the queue, those 1,000+ alerts would fire 1,000+ DynamoDB stream events, spinning up multiple Lambdas that read and write the same rows simultaneously, so data would be entered incorrectly because stale values had been read. We need FIFO, single-record processing to maintain data consistency, and the SQS queue provides this. The same goes for the Organisation Overview Table; as both the Org and Repo Overview tables get updated this way, single-record processing is critical.
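
To make the race concrete, here is a hedged sketch of such a consumer doing the read-modify-write (table and attribute names are assumptions, not the deployed schema). Because the FIFO queue hands it one record at a time, the mean is never computed from a stale row:

```typescript
import { SQSEvent } from "aws-lambda";
import {
  DynamoDBClient,
  GetItemCommand,
  PutItemCommand,
} from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});

// Illustrative consumer for the Repository Overview queue.
export const handler = async (event: SQSEvent): Promise<void> => {
  for (const message of event.Records) {
    const alert = JSON.parse(message.body);

    const current = await ddb.send(
      new GetItemCommand({
        TableName: "RepositoryOverview", // assumed table name
        Key: { repositoryName: { S: alert.repositoryName } },
        ConsistentRead: true, // strongly consistent read of the current totals
      })
    );

    // Read-modify-write: safe only because the FIFO queue serialises it
    const remediated = Number(current.Item?.numberRemediated?.N ?? "0") + 1;
    const total =
      Number(current.Item?.totalTimeToRemediate?.N ?? "0") +
      Number(alert.timeToRemediate ?? 0);

    await ddb.send(
      new PutItemCommand({
        TableName: "RepositoryOverview",
        Item: {
          repositoryName: { S: alert.repositoryName },
          numberRemediated: { N: String(remediated) },
          totalTimeToRemediate: { N: String(total) },
          meanTimeToRemediate: { N: String(total / remediated) },
        },
      })
    );
  }
};
```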

Why doesn't the API Gateway invoke the state machine directly after both authentication checks have passed? Why is EventBridge needed?

Sigh. You can't get the body of the payload within a Lambda authorizer. This means the secret validation can't live in a dedicated Lambda authorizer; it must be in a standard Lambda. This is painful, agreed, and adds about a second to processing time; it's something I will try to find a better way around. However, it works, and the most important thing is that there are two factors of authentication.

UMMM, there is no authentication directly on the API; why?!

This again is a great question. TL;DR: there is authentication, but it's one step past the API. Why? There are a few reasons, the main one being scale. If you have three repos with 4,000 alerts each, and they all trigger within a second, GitHub will send 12,000 webhooks within that second. If you had a Lambda authorizer validating the IP address of each incoming webhook, that Lambda would need to call AWS SSM or Secrets Manager for credentials to hit the GitHub Meta API. Firstly, you can't make 12,000 requests a second to the GitHub API, and secondly, AWS SSM and Secrets Manager don't allow that request rate either. So, to get around this, we send data straight from the API to an SQS queue for processing. This way, we can process records one at a time and ensure no rate limits are hit. There are still two authentication factors, but the auth is done in the first Lambda in the pipeline instead of directly on the API.

I do not see data consistency between GitHub and my MTTR Tables; why?!

There are a few (legitimate) reasons why this could be the case:

  • This solution only tracks alerts that were created/fixed/closed after it was deployed. For example, if you already had ten alerts on your repository before this solution was deployed, those ten alerts would not be processed. So if you had ten alerts before and six after deployment, only the six would be tracked in the database.
  • This solution does not handle deletion of code scanning alerts. E.g., if you delete a code scanning alert, the database is not notified and is therefore not updated.