
Building an Agentic CI/CD Pipeline with Amazon Bedrock, GitLab CI, and AWS CDK

Published
7 min read

Overview

What if your CI/CD pipeline could think? Not just run lint and tests, but actually understand your code changes, generate tests for new functions, and write a risk assessment with inline code review comments -- all automatically on every merge request?

This blog walks through building exactly that: an agentic CI/CD pipeline that integrates Amazon Bedrock's foundation models directly into GitLab CI, turning a traditional pipeline into an intelligent code review and quality assurance system.

The result is a 6-stage pipeline where two stages are AI-powered -- the agent generates tests for changed code and performs a holistic merge request review with risk scoring, all deployed as infrastructure-as-code with AWS CDK.

How AI Can Transform CI/CD Productivity

Traditional CI/CD pipelines are deterministic: they run the same checks the same way every time. Lint passes or fails. Tests pass or fail. There's no interpretation, no context, no judgement.

AI agents change this fundamentally:

  • Automated test generation -- When a developer adds a new function, the agent can inspect the code and generate meaningful test cases, not just import checks but actual logic tests with edge cases.

  • Holistic code review -- Instead of isolated tool outputs, an AI agent can correlate lint violations, test failures, security findings, and the actual diff to produce a unified risk assessment.

  • Contextual inline comments -- The agent can point to specific lines of code with architectural suggestions, anti-pattern warnings, or security concerns that static tools miss entirely.

  • Risk-based merge gating -- Rather than binary pass/fail, the pipeline produces a risk score (0-100) with weighted drivers, giving reviewers actionable context to make merge decisions.

The key insight is that AI agents don't replace existing CI tools -- they augment them. Ruff still lints, pytest still tests, bandit still scans. The agent sits on top, consuming all their outputs plus the raw diff to produce something none of them could alone.

What This Solution Solves

In a typical development workflow, code review is a bottleneck. Reviewers have to:

  1. Manually read every diff line

  2. Cross-reference lint, test, and security outputs

  3. Assess overall risk based on what changed

  4. Write comments and suggest improvements

  5. Decide whether to approve

This solution automates steps 1-4 completely and provides structured input for step 5:

  • AI-generated tests catch untested functions before the MR is even reviewed

  • Automated risk scoring quantifies merge risk on a 0-100 scale with weighted risk drivers

  • MR summary notes are posted directly to GitLab with risk score, review summary, and artifact links

  • Inline code comments highlight specific issues on the diff itself

  • Deterministic fallbacks ensure the pipeline never breaks even if the AI is unavailable

Running It Yourself

Prerequisites

  • Docker & Docker Compose

  • AWS CLI configured with Bedrock access

  • Node.js + AWS CDK CLI

  • Python 3.12

For a POC, self-hosted GitLab CE via Docker Compose gives full control over CI/CD configuration, runner setup, and API access without SaaS limitations. The Docker executor runs jobs in isolated containers with the Python 3.12 image.

Solution Architecture

The system has three main components: a self-hosted GitLab instance with CI/CD runners, an AWS backend with API Gateway + Lambda + Bedrock Agent, and CI scripts that orchestrate everything.

Architecture Overview

Component Breakdown

GitLab (Docker Compose)

  • Self-hosted GitLab CE on port 8080

  • GitLab Runner with Docker executor

  • Python 3.12 base image with ruff, pytest, bandit, pip-audit

AWS Cloud (CDK-deployed)

  • API Gateway (REST) with API key authentication

  • Lambda function (Python 3.12, 512MB, 120s timeout)

  • Amazon Bedrock Agent using the Amazon Nova Pro (v1:0) foundation model

  • IAM roles for agent invocation and model access

CI Scripts (Python)

  • detect_changes.py -- git diff analysis

  • generate_tests_from_diff.py -- AI test generation with fallback

  • validate_generated_tests.py -- AST safety validation

  • persist_generated_tests.py -- optional bot commit

  • mr_review_agent.py -- MR review with risk scoring and GitLab API posting

The 6-Stage Pipeline

The pipeline runs on every merge request update. Two stages are AI-powered (Prepare and Agent Review), while the middle three are traditional CI tools.
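The six stages map naturally onto a `.gitlab-ci.yml` definition. A minimal sketch of what that file might look like (stage and job names follow the post; the exact scripts and paths are assumptions):

```yaml
stages:
  - prepare
  - lint
  - test
  - security
  - agent_review
  - deploy

# One representative job per AI-powered stage; real file has more jobs.
generate_tests:
  stage: prepare
  image: python:3.12
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  script:
    - python ci/generate_tests_from_diff.py
  artifacts:
    paths:
      - artifacts/

mr_review:
  stage: agent_review
  image: python:3.12
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  script:
    - python ci/mr_review_agent.py
  artifacts:
    paths:
      - artifacts/
```

The `rules` clause restricts both AI jobs to merge request pipelines, matching the "runs on every merge request update" behavior above.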

Stage 1: PREPARE (AI-Powered)

This stage runs four jobs sequentially:

detect_changes -- Runs git diff against the target branch to produce three artifacts: the list of changed files, the list of changed Python files, and the full diff text. These artifacts feed every downstream job.
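The file-list extraction can be sketched as a small parser over the raw diff text. This is a simplified illustration of what `detect_changes.py` might do (the real script shells out to `git diff` against the target branch; the function name here is illustrative):

```python
import re

def changed_files_from_diff(diff_text: str) -> tuple[list[str], list[str]]:
    """Extract all changed files and the Python subset from raw git diff text.

    Each file change in unified git diff output starts with a line of the
    form: 'diff --git a/<path> b/<path>'. We capture the b/ (new) path.
    """
    files = re.findall(r"^diff --git a/.+? b/(.+)$", diff_text, flags=re.MULTILINE)
    py_files = [f for f in files if f.endswith(".py")]
    return files, py_files
```

The three artifacts (all files, Python files, full diff) would then be written to the `artifacts/` directory for downstream jobs.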

generate_tests -- Sends the diff and changed file list to the Bedrock Agent via API Gateway. The agent analyzes the code changes and generates pytest test files. If the agent returns malformed JSON, the Lambda retries up to 2 times. If all retries fail, the CI script falls back to AST-based test generation that produces import + function existence tests.

validate_generated_tests -- Parses every generated test file through Python's AST module to ensure they're syntactically valid and safe to run. No os.system, no subprocess, no eval.

persist_generated_tests -- Optionally commits the generated tests back to the feature branch (controlled by AUTO_COMMIT_GENERATED_TESTS variable).

Stage 2: LINT

Runs ruff check . and saves the output to artifacts/lint.txt. Ruff is a fast Python linter that catches style violations, import errors, and anti-patterns.

Stage 3: TEST

Runs pytest -v --tb=short --junitxml=artifacts/junit.xml to execute all tests -- both manually written tests and AI-generated tests from Stage 1. Results are saved as a JUnit XML report that GitLab renders in the MR UI.

Stage 4: SECURITY

Runs bandit (static security analysis) and pip-audit (dependency vulnerability scanning). Output is saved to artifacts/security.txt.

Stage 5: AGENT REVIEW (AI-Powered)

The most sophisticated stage. The mr_review_agent.py script:

  1. Collects all artifacts from previous stages (lint.txt, test output, security.txt, changes.diff)

  2. Calls the Bedrock Agent via API Gateway with the full context

  3. Computes a deterministic baseline risk score from the artifacts (test failures, lint violations, security findings)

  4. Merges AI and deterministic scores -- the final risk score is always >= the deterministic baseline

  5. Posts results to GitLab as an MR summary note with risk drivers table, review summary, and artifact links

  6. Posts inline comments on specific lines of the diff for critical findings
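Step 5 uses GitLab's standard Notes API (`POST /projects/:id/merge_requests/:iid/notes`). A sketch of how `mr_review_agent.py` might assemble that request; the markdown layout and function name are assumptions:

```python
def build_mr_note_request(base_url: str, project_id: str, mr_iid: str,
                          risk_score: int, summary: str, token: str):
    """Build URL, headers, and form payload for posting an MR summary note."""
    url = (f"{base_url}/api/v4/projects/{project_id}"
           f"/merge_requests/{mr_iid}/notes")
    body = f"## Agent Review\n\n**Risk score:** {risk_score}/100\n\n{summary}"
    headers = {"PRIVATE-TOKEN": token}
    return url, headers, {"body": body}
```

The tuple would then be sent with something like `requests.post(url, headers=headers, data=payload)`, using the CI job's project ID and MR IID from GitLab's predefined variables (`CI_PROJECT_ID`, `CI_MERGE_REQUEST_IID`).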

Stage 6: DEPLOY

Manual deployment to staging, only available on the main branch.

The AWS Stack

The entire AWS infrastructure is defined in a single CDK stack (AgentGateStack), making it fully reproducible with cdk deploy.

Data Flow: End to End

Here's how data flows through the entire system from developer push to MR comment:

  1. Developer pushes to a feature branch and opens/updates an MR

  2. GitLab triggers the pipeline with merge_request_event source

  3. detect_changes produces the diff and file lists as CI artifacts

  4. generate_tests sends the diff to Bedrock and writes test files

  5. Lint, test, and security stages run independently, producing their own artifacts

  6. mr_review collects ALL artifacts, calls Bedrock for a holistic review

  7. The review result is posted as an MR note with risk score and inline comments

Risk Scoring: How It Works

The risk scoring system combines deterministic analysis with AI judgement:

AI Enhancement

The Bedrock Agent analyzes the actual code changes and produces its own risk score with drivers. The final score is the maximum of the AI score and the deterministic baseline -- ensuring that hard signals (failing tests, security vulnerabilities) can never be downplayed by the AI.

Agent response shown in the GitLab merge request screen

Conclusion

This project demonstrates that AI agents can be meaningfully integrated into CI/CD pipelines today -- not as replacements for existing tools, but as an intelligent layer that synthesizes their outputs into actionable insights.

The project repository used for this POC is available at agentic ci/cd pipeline.