
How 10 Engineers Built a Production-Grade DevOps Pipeline in 2 Weeks

I am an IT Practitioner passionate about Automation

Introduction

Two weeks. Ten engineers. One shared goal: deploy a production-grade microservices application on AWS with full CI/CD, GitOps, Kubernetes, and observability — from scratch.

This is the story of Team Achievers11, DMI Cohort 2, Group 11. I served as both Team Lead and Technical Lead. Here is what we built, how we built it, and what we learned.

What We Built

We deployed Spring PetClinic — an 8-service microservices application — to AWS EKS. The full stack included:

•      GitHub Actions CI/CD pipeline: Maven build, Trivy security scan, Docker build, ECR push, and ArgoCD GitOps sync — fully automated on every commit to main

•      AWS infrastructure provisioned entirely with Terraform: VPC with public/private subnets across 2 AZs, EKS managed node groups, RDS MySQL, 8 ECR repositories, and Terraform remote state stored in S3 with a DynamoDB table for state locking

•      GitOps deployment via ArgoCD: the cluster always reflects what is in Git, with selfHeal and prune enabled — no manual kubectl apply in production

•      Full observability: Prometheus scraping all services via ServiceMonitor CRDs, Grafana dashboards for cluster health and application performance, Zipkin distributed tracing, and Alertmanager firing to Slack on CrashLoopBackOff, high error rate, and CPU pressure

•      Security: OIDC authentication for GitHub Actions (no static AWS keys stored anywhere), IRSA for pod identity, Trivy blocking on CRITICAL/HIGH vulnerabilities
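The Trivy gate is the part of this stack that actually stops a vulnerable build. Trivy can enforce the policy on its own with --severity CRITICAL,HIGH --exit-code 1. The Python sketch below applies the same CRITICAL/HIGH rule as a standalone check over Trivy's JSON report, for example one produced by trivy image --format json --output trivy-report.json <image>, or by trivy fs for a source tree. It assumes the report's Results / Vulnerabilities / Severity fields. The function names and report filename are hypothetical, and this is not the team's actual pipeline step.

```python
import json
from typing import Iterable

# Severities that block a release, matching the CRITICAL/HIGH policy above.
BLOCKING_SEVERITIES = frozenset({"CRITICAL", "HIGH"})


def load_report(path: str) -> dict:
    """Read a report written by Trivy with --format json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def blocking_findings(
    report: dict, blocking: Iterable[str] = BLOCKING_SEVERITIES
) -> list[tuple[str, str, str]]:
    """Return (target, vulnerability ID, severity) for every blocking finding."""
    blocked = set(blocking)
    findings = []
    for result in report.get("Results") or []:
        # A clean target may have no "Vulnerabilities" key, or a null one.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity")
            if severity in blocked:
                findings.append(
                    (result.get("Target", "?"), vuln.get("VulnerabilityID", "?"), severity)
                )
    return findings


def gate(report: dict) -> int:
    """Print blocking findings and return a CI exit code: 0 passes, 1 fails the job."""
    findings = blocking_findings(report)
    for target, vuln_id, severity in findings:
        print(f"{severity}: {vuln_id} in {target}")
    return 1 if findings else 0
```

In a workflow step this could be wired as sys.exit(gate(load_report("trivy-report.json"))). The non-zero exit code fails the job, so the later Docker build and ECR push never run.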

The Team and How We Divided the Work

Ten people, each owning a domain completely:

•      Greg (me) — Team Lead / Technical Lead: architecture design, PR reviews, merge authority, Terraform validation, presentation lead

•      Idah Makena — Scrum Master: Jira sprint management, standup facilitation, burndown tracking

•      Sandra Olisama — PMO / Demo Lead: flow coordination, pre-demo checklist, presentation timing

•      Aderinto Adedayo — CI/CD Lead: GitHub Actions pipeline, Docker multi-stage builds, Trivy integration

•      Osenat Alonge — ArgoCD / Repo Owner: GitOps configuration, sync policies, rollback demonstrations

•      Anthonia Adekunle — Cloud Infrastructure Lead: Terraform modules for VPC, EKS, RDS, and IAM

•      Ogonna Umeh — Kubernetes Lead: all manifests, HPA, liveness/readiness probes, resource limits

•      Bigben GH — SRE / Observability: Prometheus, Grafana dashboards, Alertmanager rules, Zipkin, SLO definition

•      Olabode Aderoju — QA Lead: 47 test cases, Postman collections, Newman in CI, QA sign-off report

•      Ally Buruhani — Documentation Lead: README, Architecture Decision Records, Deployment Guide, post-mortem

The Hardest Moment

Sprint 2. Aderinto's CI pipeline was trying to push to ECR before Anthonia's Terraform had finished provisioning the repositories. The pipeline failed. The repositories did not exist yet.

We caught it in standup, not in production: Idah's daily sync had surfaced the cross-team dependency. The fix was sequencing. Terraform apply now runs first as a prerequisite job in the pipeline, and the ECR push job lists that infrastructure job under needs, so it cannot start until the repositories exist. Thirty minutes to diagnose, thirty minutes to fix.

That one standup saved us a full sprint of debugging.
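The sequencing fix is a dependency-ordering rule: a job may start only after every job it needs has finished. Here is a minimal Python model of that rule using the standard library's graphlib.TopologicalSorter. The job names are hypothetical stand-ins that follow the stage order from the pipeline description, not the team's actual workflow file. In GitHub Actions the same edges are declared with needs.

```python
from graphlib import TopologicalSorter

# Hypothetical job names following the stage order in the post.
# Each job maps to the set of jobs it needs (its prerequisites).
PIPELINE: dict[str, set[str]] = {
    "terraform-apply": set(),
    "maven-build": set(),
    "trivy-scan": {"maven-build"},
    "docker-build": {"trivy-scan"},
    # The fix: the push also waits for the job that creates the ECR repositories.
    "ecr-push": {"docker-build", "terraform-apply"},
    "argocd-sync": {"ecr-push"},
}


def run_order(jobs: dict[str, set[str]]) -> list[str]:
    """Return one valid serial order; raises graphlib.CycleError on a cycle."""
    # TopologicalSorter would silently add unknown names as new nodes,
    # so reject references to jobs that are not defined.
    unknown = {dep for deps in jobs.values() for dep in deps} - set(jobs)
    if unknown:
        raise ValueError(f"jobs need undefined jobs: {sorted(unknown)}")
    return list(TopologicalSorter(jobs).static_order())


def parallel_waves(jobs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves whose members have all their needs satisfied.

    Simplification: each wave is treated as finishing together before the next starts.
    """
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # also raises CycleError on a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Without the terraform-apply edge, nothing in the graph forces ecr-push to wait for the infrastructure job. A real runner starts each job as soon as its own needs are met, so a slow terraform apply could still be running when the push begins. That is the race the team hit.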

What the Agentic Workflow Added

I used Claude Code — an agentic AI tool — to accelerate the infrastructure work. Instead of writing every Terraform module from scratch, I gave Claude Code the goal and the existing directory structure. It generated the VPC, EKS, and RDS modules with correct syntax, variable references, and remote state configuration. What would have taken 3-4 hours took under 45 minutes.

The key lesson: agentic AI is strongest when you give it specific goals and real output to analyse — not vague requests. Paste the actual error. Describe the actual state. Let it reason from real data.

Key Lessons

•      Start security from day zero. Retrofitting IAM least-privilege and Trivy scanning is 10x harder than building it in from Sprint 1.

•      GitOps removes an entire class of human error. After switching to ArgoCD, we had zero deployment-caused incidents. Declarative is safer than imperative.

•      Document as you build. Ally's README, ADRs, and runbooks saved hours during demo prep. Documentation is not optional.

•      Cross-functional standups matter. The ECR dependency story proves it. Surface dependencies early — not in production.
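The GitOps lesson rests on reconciliation. A controller keeps comparing the state declared in Git with the live cluster and corrects the difference. In ArgoCD, automated sync applies new Git commits either way. selfHeal additionally reverts changes made directly in the cluster, and prune deletes tracked resources that were removed from Git. The toy Python model below shows how those two options change the response to drift. The resource names and the dict-based specs are illustrative assumptions, and this is not ArgoCD's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPolicy:
    # Toy stand-ins for the ArgoCD automated-sync options named in the post.
    self_heal: bool = True  # revert drift made directly in the cluster
    prune: bool = True      # delete tracked resources that are gone from Git


def reconcile(
    desired: dict[str, dict], live: dict[str, dict], policy: SyncPolicy
) -> list[tuple[str, str]]:
    """Return (action, resource) pairs that move the live state back to Git.

    desired: resources declared in Git; live: resources this app tracks in the cluster.
    """
    actions = []
    if policy.self_heal:
        for name, spec in sorted(desired.items()):
            if name not in live:
                actions.append(("create", name))  # missing from the cluster
            elif live[name] != spec:
                actions.append(("update", name))  # edited outside Git
    if policy.prune:
        for name in sorted(live.keys() - desired.keys()):
            actions.append(("delete", name))      # no longer declared in Git
    return actions
```

Once the live state matches Git, reconcile returns no actions no matter how often it runs. That idempotence is why a declarative loop is safer than a sequence of manual kubectl commands.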

This Is Part of DMI

This project was built as part of the DevOps Micro Internship (DMI) programme run by Pravin Mishra. DMI Cohort 3 starts 27 June 2026. If you want to build real DevOps skills — not just watch tutorials — apply here:

https://docs.google.com/forms/d/e/1FAIpQLSel7ai7nyb0P1qLW4vEyfB_nEsD4lUF1XG88vmAaFGBOb6hPA/viewform

#DMI #DevOps #AWS #Kubernetes #GitOps #CloudComputing #TheCloudAdvisory