The Use Case
Estimated time: 10–15 minutes
Acme Inc. is facing issues with its CI/CD system and has asked you to build a generative AI-based solution.
The Challenge
Acme Inc. has a critical CI/CD pipeline that frequently fails due to various issues such as:
- Build failures due to dependency conflicts
- Deployment errors caused by configuration mismatches
- Resource constraints leading to timeouts
- Integration test failures requiring manual investigation
The current process requires developers to:
- Manually investigate error logs
- Search through documentation for solutions
- Create GitHub issues for tracking problems
- Manually restart failed pipelines
- Coordinate with team members for resolution
This manual intervention is:
- Time-consuming and delays releases
- Error-prone due to human oversight
- Not scalable as the team grows
- Lacking consistent documentation of solutions
How to think about the problem (DORA and MTTR)
- DORA metrics are industry-standard measures of software delivery performance, used to identify high-performing teams: Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Mean Time to Recovery (MTTR). See https://dora.dev for background.
- MTTR is the average time it takes to restore service after a failure. Lower MTTR means faster recovery and less downtime.
- In later modules you will codify the agent and wire it into the pipeline’s finally step, so that failures automatically trigger the agent and help drive down MTTR.
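To make the MTTR definition concrete, here is a minimal sketch that averages the time between failure and recovery. The incident timestamps and the function name mean_time_to_recovery are made up for illustration; they are not data from Acme Inc.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (failure detected, service restored).
incidents = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 45)),     # 45 minutes
    (datetime(2024, 5, 3, 14, 10), datetime(2024, 5, 3, 16, 10)),  # 120 minutes
    (datetime(2024, 5, 7, 22, 30), datetime(2024, 5, 7, 22, 45)),  # 15 minutes
]


def mean_time_to_recovery(records):
    """Return the average restore duration across all incidents."""
    if not records:
        raise ValueError("need at least one incident")
    # Sum the individual recovery durations, starting from a zero timedelta.
    total = sum((restored - failed for failed, restored in records), timedelta())
    return total / len(records)


mttr = mean_time_to_recovery(incidents)
print(mttr)  # 1:00:00
```

Here the three recoveries take 45, 120, and 15 minutes, so the MTTR is 180 / 3 = 60 minutes. Shortening any single recovery, for example by automating log triage, lowers the average.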
The Solution: AI-Powered CI/CD Agent
We will build an intelligent agent that can:
- Detect and analyze CI/CD failures
- Interact with OpenShift to query cluster resources and pod status
- Search for relevant solutions using web search and documentation
- Provide actionable recommendations to developers
- Create a GitHub issue in the specified repository for tracking
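The first and last capabilities above can be pictured with a rough sketch: classify a failure log into one of the categories from The Challenge, then draft an issue payload. Everything here is an assumption for illustration only. The keyword rules, the FailureReport class, the pipeline name, and the sample log line are hypothetical, and the agent built in later modules would likely rely on an LLM and real OpenShift and GitHub tooling rather than keyword matching.

```python
from dataclasses import dataclass

# Hypothetical keyword rules standing in for the agent's analysis step.
CATEGORY_KEYWORDS = {
    "dependency conflict": ("dependency", "version conflict", "could not resolve"),
    "configuration mismatch": ("config", "invalid value", "missing key"),
    "resource timeout": ("timeout", "timed out", "oomkilled"),
    "integration test failure": ("test failed", "assertionerror"),
}


@dataclass
class FailureReport:
    pipeline: str      # hypothetical pipeline name
    category: str      # one of the categories above, or "unknown"
    log_excerpt: str   # the log text that triggered the report


def classify(log_text: str) -> str:
    """Return the first category whose keywords appear in the log."""
    lowered = log_text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unknown"


def build_issue(report: FailureReport) -> dict:
    """Draft an issue as a title/body dict; nothing is sent anywhere."""
    title = f"[CI/CD] {report.pipeline}: {report.category}"
    body = f"Detected category: {report.category}\n\nLog excerpt:\n{report.log_excerpt}"
    return {"title": title, "body": body}


# Made-up log line, not real pipeline output.
log = "Pod build-42 timed out after 600s waiting for resources"
report = FailureReport(pipeline="build-and-deploy", category=classify(log), log_excerpt=log)
issue = build_issue(report)
print(issue["title"])  # [CI/CD] build-and-deploy: resource timeout
```

The sketch keeps detection and issue drafting as separate functions so that either one could be swapped out later, for instance replacing classify with a model call, without touching the other.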