Building an Agentic CI/CD Pipeline with Amazon Bedrock, GitLab CI, and AWS CDK

Overview
What if your CI/CD pipeline could think? Not just run lint and tests, but actually understand your code changes, generate tests for new functions, and write a risk assessment with inline code review comments -- all automatically on every merge request?
This blog walks through building exactly that: an agentic CI/CD pipeline that integrates Amazon Bedrock's foundation models directly into GitLab CI, turning a traditional pipeline into an intelligent code review and quality assurance system.
The result is a 6-stage pipeline where two stages are AI-powered -- the agent generates tests for changed code and performs a holistic merge request review with risk scoring, all deployed as infrastructure-as-code with AWS CDK.
How AI Can Transform CI/CD Productivity
Traditional CI/CD pipelines are deterministic: they run the same checks the same way every time. Lint passes or fails. Tests pass or fail. There's no interpretation, no context, no judgement.
AI agents change this fundamentally:
Automated test generation -- When a developer adds a new function, the agent can inspect the code and generate meaningful test cases, not just import checks but actual logic tests with edge cases.
Holistic code review -- Instead of isolated tool outputs, an AI agent can correlate lint violations, test failures, security findings, and the actual diff to produce a unified risk assessment.
Contextual inline comments -- The agent can point to specific lines of code with architectural suggestions, anti-pattern warnings, or security concerns that static tools miss entirely.
Risk-based merge gating -- Rather than binary pass/fail, the pipeline produces a risk score (0-100) with weighted drivers, giving reviewers actionable context to make merge decisions.
The key insight is that AI agents don't replace existing CI tools -- they augment them. Ruff still lints, pytest still tests, bandit still scans. The agent sits on top, consuming all their outputs plus the raw diff to produce something none of them could alone.
What This Solution Solves
In a typical development workflow, code review is a bottleneck. Reviewers have to:
Manually read every diff line
Cross-reference lint, test, and security outputs
Assess overall risk based on what changed
Write comments and suggest improvements
Decide whether to approve
This solution automates steps 1-4 completely and provides structured input for step 5:
AI-generated tests catch untested functions before the MR is even reviewed
Automated risk scoring quantifies merge risk on a 0-100 scale with weighted risk drivers
MR summary notes are posted directly to GitLab with risk score, review summary, and artifact links
Inline code comments highlight specific issues on the diff itself
Deterministic fallbacks ensure the pipeline never breaks even if the AI is unavailable
Running It Yourself
Prerequisites
Docker & Docker Compose
AWS CLI configured with Bedrock access
Node.js + AWS CDK CLI
Python 3.12
For a POC, self-hosted GitLab CE via Docker Compose gives full control over CI/CD configuration, runner setup, and API access without SaaS limitations. The Docker executor runs jobs in isolated containers with the Python 3.12 image.
Solution Architecture
The system has three main components: a self-hosted GitLab instance with CI/CD runners, an AWS backend with API Gateway + Lambda + Bedrock Agent, and CI scripts that orchestrate everything.
Architecture Overview
Component Breakdown
GitLab (Docker Compose)
Self-hosted GitLab CE on port 8080
GitLab Runner with Docker executor
Python 3.12 base image with ruff, pytest, bandit, pip-audit
AWS Cloud (CDK-deployed)
API Gateway (REST) with API key authentication
Lambda function (Python 3.12, 512MB, 120s timeout)
Amazon Bedrock Agent using Nova Pro v1:0 foundation model
IAM roles for agent invocation and model access
CI Scripts (Python)
detect_changes.py -- git diff analysis
generate_tests_from_diff.py -- AI test generation with fallback
validate_generated_tests.py -- AST safety validation
persist_generated_tests.py -- optional bot commit
mr_review_agent.py -- MR review with risk scoring and GitLab API posting
The 6-Stage Pipeline
The pipeline runs on every merge request update. Two stages are AI-powered (Prepare and Agent Review), while the middle three are traditional CI tools.
Stage 1: PREPARE (AI-Powered)
This stage runs four jobs sequentially:
detect_changes -- Runs git diff against the target branch to produce three artifacts: the list of changed files, the list of changed Python files, and the full diff text. These artifacts feed every downstream job.
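A minimal sketch of what detect_changes does, assuming the artifact paths shown here (artifacts/changes.diff, artifacts/changed_files.txt, artifacts/changed_py_files.txt); the actual script may differ. CI_MERGE_REQUEST_TARGET_BRANCH_NAME is a GitLab-predefined variable:

```python
import os
import subprocess

def main() -> None:
    # GitLab exposes the MR target branch in a predefined CI variable.
    target = os.environ.get("CI_MERGE_REQUEST_TARGET_BRANCH_NAME", "main")
    base = f"origin/{target}"

    os.makedirs("artifacts", exist_ok=True)

    # Full diff against the target branch -- consumed by the Bedrock agent later.
    diff = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

    # Names of all changed files.
    files = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    with open("artifacts/changes.diff", "w") as f:
        f.write(diff)
    with open("artifacts/changed_files.txt", "w") as f:
        f.write("\n".join(files))
    with open("artifacts/changed_py_files.txt", "w") as f:
        f.write("\n".join(p for p in files if p.endswith(".py")))

if __name__ == "__main__":
    main()
```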
generate_tests -- Sends the diff and changed file list to the Bedrock Agent via API Gateway. The agent analyzes the code changes and generates pytest test files. If the agent returns malformed JSON, the Lambda retries up to 2 times. If all retries fail, the CI script falls back to AST-based test generation that produces import + function existence tests.
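For illustration, the deterministic fallback might look like the sketch below, which emits import and function-existence tests for the top-level functions of a changed file. The function name and output format are assumptions, not the project's exact code:

```python
import ast
from pathlib import Path

def fallback_tests(source_path: str) -> str:
    """Emit a minimal pytest module for the top-level functions in source_path."""
    tree = ast.parse(Path(source_path).read_text())
    # Derive the import path from the file path, e.g. src/app.py -> src.app.
    module = Path(source_path).with_suffix("").as_posix().replace("/", ".")
    funcs = [n.name for n in tree.body if isinstance(n, ast.FunctionDef)]

    lines = [f"import {module}", "", "def test_import():", f"    assert {module}"]
    for name in funcs:
        lines += [
            "",
            f"def test_{name}_exists():",
            f"    assert callable(getattr({module}, {name!r}, None))",
        ]
    return "\n".join(lines) + "\n"
```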
validate_generated_tests -- Parses every generated test file through Python's AST module to ensure they're syntactically valid and safe to run. No os.system, no subprocess, no eval.
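A sketch of that validation, assuming the banned-pattern list shown here; the real validator may check more:

```python
import ast

BANNED_CALLS = {"eval", "exec", "system"}
BANNED_IMPORTS = {"subprocess"}

def is_safe(test_source: str) -> bool:
    try:
        tree = ast.parse(test_source)  # also proves the file is valid Python
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Reject `import subprocess` / `from subprocess import ...`
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [a.name for a in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module)
            if any(n.split(".")[0] in BANNED_IMPORTS for n in names):
                return False
        # Reject eval(...), exec(...), os.system(...)
        if isinstance(node, ast.Call):
            fn = node.func
            name = fn.id if isinstance(fn, ast.Name) else getattr(fn, "attr", "")
            if name in BANNED_CALLS:
                return False
    return True
```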
persist_generated_tests -- Optionally commits the generated tests back to the feature branch (controlled by AUTO_COMMIT_GENERATED_TESTS variable).
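An illustrative sketch of the bot commit, assuming a project access token in a GITLAB_BOT_TOKEN variable and a tests/generated/ output directory (both assumptions); the CI_* variables are GitLab-predefined:

```python
import os
import subprocess

def maybe_commit_tests() -> None:
    if os.environ.get("AUTO_COMMIT_GENERATED_TESTS") != "true":
        return
    subprocess.run(["git", "config", "user.name", "ci-bot"], check=True)
    subprocess.run(["git", "config", "user.email", "ci-bot@example.com"], check=True)
    subprocess.run(["git", "add", "tests/generated/"], check=True)
    # [skip ci] prevents the bot commit from retriggering the pipeline.
    subprocess.run(
        ["git", "commit", "-m", "ci: add AI-generated tests [skip ci]"], check=True
    )
    # Push back to the feature branch with the bot token.
    token = os.environ["GITLAB_BOT_TOKEN"]
    url = (
        f"https://oauth2:{token}@{os.environ['CI_SERVER_HOST']}/"
        f"{os.environ['CI_PROJECT_PATH']}.git"
    )
    subprocess.run(
        ["git", "push", url, f"HEAD:{os.environ['CI_COMMIT_REF_NAME']}"], check=True
    )
```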
Stage 2: LINT
Runs ruff check . and saves the output to artifacts/lint.txt. Ruff is a fast Python linter that catches style violations, import errors, and anti-patterns.
Stage 3: TEST
Runs pytest -v --tb=short --junitxml=artifacts/junit.xml to execute all tests -- both manually written tests and AI-generated tests from Stage 1. Results are saved as a JUnit XML report that GitLab renders in the MR UI.
Stage 4: SECURITY
Runs bandit (static security analysis) and pip-audit (dependency vulnerability scanning). Output is saved to artifacts/security.txt.
Stage 5: AGENT REVIEW (AI-Powered)
The most sophisticated stage. The mr_review_agent.py script (the posting step is sketched after this list):
Collects all artifacts from previous stages (lint.txt, test output, security.txt, changes.diff)
Calls the Bedrock Agent via API Gateway with the full context
Computes a deterministic baseline risk score from the artifacts (test failures, lint violations, security findings)
Merges AI and deterministic scores -- the final risk score is always >= the deterministic baseline
Posts results to GitLab as an MR summary note with risk drivers table, review summary, and artifact links
Posts inline comments on specific lines of the diff for critical findings
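The posting step uses GitLab's standard notes and discussions REST endpoints. A sketch, assuming a requests-based client and a bot token in GITLAB_BOT_TOKEN (the token variable name is an assumption); CI_API_V4_URL, CI_PROJECT_ID, and CI_MERGE_REQUEST_IID are GitLab-predefined:

```python
import os
import requests

API = f"{os.environ['CI_API_V4_URL']}/projects/{os.environ['CI_PROJECT_ID']}"
MR_IID = os.environ["CI_MERGE_REQUEST_IID"]
HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_BOT_TOKEN"]}

def post_summary_note(markdown_body: str) -> None:
    # Summary note with risk score, drivers table, and artifact links.
    requests.post(
        f"{API}/merge_requests/{MR_IID}/notes",
        headers=HEADERS,
        json={"body": markdown_body},
        timeout=30,
    ).raise_for_status()

def post_inline_comment(body: str, new_path: str, new_line: int, diff_refs: dict) -> None:
    # Inline comment anchored to a specific line of the MR diff;
    # diff_refs comes from GET /merge_requests/:iid (base/head/start SHAs).
    position = {
        "position_type": "text",
        "base_sha": diff_refs["base_sha"],
        "head_sha": diff_refs["head_sha"],
        "start_sha": diff_refs["start_sha"],
        "new_path": new_path,
        "new_line": new_line,
    }
    requests.post(
        f"{API}/merge_requests/{MR_IID}/discussions",
        headers=HEADERS,
        json={"body": body, "position": position},
        timeout=30,
    ).raise_for_status()
```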
Stage 6: DEPLOY
Manual deployment to staging, only available on the main branch.
The AWS Stack
The entire AWS infrastructure is defined in a single CDK stack (AgentGateStack), making it fully reproducible with cdk deploy.
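A condensed sketch of what the stack might contain, assuming the construct names and the lambda/ asset directory shown here; the Bedrock Agent resources and IAM role wiring are elided:

```python
from aws_cdk import Duration, Stack
from aws_cdk import aws_apigateway as apigw
from aws_cdk import aws_lambda as _lambda
from constructs import Construct

class AgentGateStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Lambda proxy that invokes the Bedrock Agent (512MB, 120s, as above).
        handler = _lambda.Function(
            self, "AgentProxy",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="handler.lambda_handler",
            code=_lambda.Code.from_asset("lambda"),
            memory_size=512,
            timeout=Duration.seconds(120),
        )

        # REST API with API-key authentication in front of the Lambda.
        api = apigw.LambdaRestApi(
            self, "AgentApi",
            handler=handler,
            default_method_options=apigw.MethodOptions(api_key_required=True),
        )
        key = api.add_api_key("CiApiKey")
        plan = api.add_usage_plan("CiUsagePlan")
        plan.add_api_key(key)
        plan.add_api_stage(stage=api.deployment_stage)
```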
Data Flow: End to End
Here's how data flows through the entire system from developer push to MR comment:
Developer pushes to a feature branch and opens/updates an MR
GitLab triggers the pipeline with a merge_request_event source
detect_changes produces the diff and file lists as CI artifacts
generate_tests sends the diff to Bedrock and writes test files
Lint, test, and security stages run independently, producing their own artifacts
mr_review collects ALL artifacts and calls Bedrock for a holistic review
The review result is posted as an MR note with risk score and inline comments
Risk Scoring: How It Works
The risk scoring system combines deterministic analysis with AI judgement:
AI Enhancement
The Bedrock Agent analyzes the actual code changes and produces its own risk score with drivers. The final score is the maximum of the AI score and the deterministic baseline -- ensuring that hard signals (failing tests, security vulnerabilities) can never be downplayed by the AI.
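In code, the merge rule is a simple max, sketched below with illustrative weights for the deterministic baseline (the project's actual weights may differ):

```python
def deterministic_baseline(test_failures: int, lint_errors: int, security_findings: int) -> int:
    # Weighted hard signals, capped per category and at 100 overall (illustrative).
    score = (
        25 * min(test_failures, 2)
        + 2 * min(lint_errors, 10)
        + 15 * min(security_findings, 2)
    )
    return min(score, 100)

def final_risk_score(ai_score: int, baseline: int) -> int:
    # The AI can raise the score but never lower it below the hard-signal floor.
    return max(min(ai_score, 100), baseline)
```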
Agent response posted on the GitLab merge request screen
Conclusion
This project demonstrates that AI agents can be meaningfully integrated into CI/CD pipelines today -- not as replacements for existing tools, but as an intelligent layer that synthesizes their outputs into actionable insights.
The project repository used for this POC is available at agentic ci/cd pipeline.