Kubernetes GitOps Platform - Complete EKS Infrastructure & Platform Toolkit

Production-ready AWS EKS cluster with complete GitOps platform toolkit, automated deployment, monitoring, and observability

Published on Jan 15, 2025

Reading time: 8 minutes.


Why does this matter?

  • Complete Infrastructure as Code: Provision entire AWS EKS cluster and all platform services with a single command
  • GitOps Automation: ArgoCD automatically deploys and manages all platform applications from Git repositories
  • Production-Ready Platform: Comprehensive observability stack with monitoring, logging, and testing tools
  • High Availability: Multi-AZ deployment with auto-scaling worker nodes and Horizontal Pod Autoscaler (HPA)
  • Zero-Downtime Deployments: Kubernetes rolling updates ensure continuous service availability
  • Complete Observability: Prometheus, Grafana, Loki, and Promtail for metrics, logs, and dashboards
  • SRE-Ready: Availability testing and sanity checks built-in

How is automation accomplished?

  • Terraform Infrastructure: Complete AWS infrastructure provisioning (VPC, EKS, IAM, Security Groups) as code
  • Docker Containerization: Terraform runs in Docker for consistent execution across team members
  • Helm Charts: Application deployment automation with Helm templates and values management
  • ArgoCD GitOps: App-of-apps pattern automatically deploys entire platform toolkit from Git (a minimal manifest is sketched after this list)
  • Makefile Automation: One-command deployment (make deploy) for entire infrastructure stack
  • Auto-Scaling: HPA for pod scaling and AWS Auto Scaling Groups for worker node scaling
  • Automated Sync: ArgoCD continuously syncs Git changes to cluster with self-healing capabilities
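
For concreteness, the app-of-apps entry point is a single ArgoCD Application that points at a directory of further Application manifests. The sketch below is illustrative, not a copy of the repository's argocd/app-of-apps.yaml; the branch, path, and names are assumptions:

# Root "app-of-apps" Application: ArgoCD deploys this once, and it in turn
# creates one Application per manifest found under argocd/apps/ in the
# platform toolkit repository.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/Lforlinux/k8s-platform-toolkit.git
    targetRevision: main        # branch to track; assumption for this sketch
    path: argocd/apps           # directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true     # delete resources that were removed from Git
      selfHeal: true  # revert manual drift back to the Git state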

Prerequisites

  • AWS Account: Programmatic access with IAM user
  • IAM Policies: AmazonEC2FullAccess, IAMFullAccess, AutoScalingFullAccess, AmazonEKSClusterPolicy, AmazonEKSWorkerNodePolicy, AmazonVPCFullAccess, AmazonEKSServicePolicy, AmazonEKS_CNI_Policy
  • Docker: Installed and running for Terraform containerization
  • AWS CLI: Configured with credentials (~/.aws/credentials or AWS_PROFILE)
  • Git: For cloning repositories

Source Code

Infrastructure Repository: https://github.com/Lforlinux/k8s-infrastructure-as-code
Platform Toolkit Repository: https://github.com/Lforlinux/k8s-platform-toolkit

How to provision the infrastructure

Quick Start Deployment

# Clone the infrastructure repository
git clone https://github.com/Lforlinux/k8s-infrastructure-as-code.git
cd k8s-infrastructure-as-code

# Configure AWS credentials
aws configure
# OR set AWS_PROFILE environment variable
export AWS_PROFILE=your-profile

# Deploy entire stack with one command
make deploy

Review the Terraform plan and type yes to proceed. The deployment includes:

  • VPC and networking components (subnets, route tables, internet gateway)
  • EKS cluster with managed node groups
  • Metrics Server for HPA
  • NodeJS application (via Helm)
  • ArgoCD for GitOps
  • Platform toolkit applications (automatically deployed via ArgoCD)

Deployment Workflow

(Diagram: EKS GitOps workflow)

The infrastructure repository deploys ArgoCD, which then automatically references the platform toolkit repository through the app-of-apps pattern, deploying all platform applications.

Architecture

(Diagram: architecture overview)

Infrastructure Components

  • AWS VPC: Multi-AZ network with public and private subnets
  • EKS Cluster: Managed Kubernetes control plane
  • Worker Nodes: Auto-scaling node groups in private subnets
  • Application Load Balancer: External access to services
  • Security Groups: Network security and access control
  • IAM Roles: Service accounts and permissions

Platform Services (via ArgoCD)

The platform toolkit repository provides:

Monitoring Stack

  • Prometheus: Metrics collection and alerting
  • Grafana: Visualization dashboards
  • kube-state-metrics: Kubernetes object metrics
  • node-exporter: Node-level system metrics

Logging Stack

  • Loki: Centralized log aggregation
  • Promtail: Log collection agent (DaemonSet)

Testing & Validation

  • Sanity Test: Automated health check testing for microservices
  • Availability Test: SRE-style availability and reliability testing

Demo Applications

  • Online Boutique: Google’s microservices demo (11 services)
  • NodeJS Application: Sample application deployed via Helm

Key Features

Infrastructure as Code

  • Terraform Modules: Reusable infrastructure components
  • Docker-Based Execution: Consistent Terraform version across team
  • State Management: Remote state storage for team collaboration
  • Output Management: Automatic kubeconfig generation and access information

GitOps with ArgoCD

  • App-of-Apps Pattern: Hierarchical application management
  • Automated Sync: Continuous deployment from Git repositories
  • Self-Healing: Automatic reconciliation of cluster state
  • Multi-Repository: Infrastructure and platform toolkit separation
  • Sync Waves: Ordered deployment of dependent applications (see the annotation example below)
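
Sync waves are expressed with a standard ArgoCD annotation: applications in lower-numbered waves are synced and healthy before higher-numbered ones begin. A sketch of an annotated child Application follows; the name, path, and wave number are assumptions for illustration:

# Child Application annotated with a sync wave; ArgoCD applies waves in
# ascending order, so wave-0 apps (e.g. monitoring) settle before this one.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sanity-test                     # illustrative name
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # wave number assumed for the sketch
spec:
  project: default
  source:
    repoURL: https://github.com/Lforlinux/k8s-platform-toolkit.git
    targetRevision: main
    path: sanity-test                   # assumed manifest path
  destination:
    server: https://kubernetes.default.svc
    namespace: sanity-test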

High Availability & Auto-Scaling

  • Multi-AZ Deployment: High availability across availability zones
  • Horizontal Pod Autoscaler: Automatic pod scaling based on CPU/memory (manifest sketch after this list)
  • Worker Node Auto-Scaling: AWS Auto Scaling Groups for node capacity
  • Load Balancing: Application Load Balancer for traffic distribution
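
As a sketch of the pod-scaling piece, an autoscaling/v2 HorizontalPodAutoscaler targeting the NodeJS Deployment might look like this (the Deployment name and thresholds are assumptions, not values from the chart):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nodejs-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nodejs-app           # assumed Deployment name
  minReplicas: 2               # keep at least two pods for availability
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU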

Observability

  • Metrics Collection: Prometheus scrapes metrics from all services
  • Log Aggregation: Centralized logging with Loki and Promtail
  • Dashboards: Pre-configured Grafana dashboards for Kubernetes and applications
  • Alerting: Prometheus alerting rules for proactive monitoring (example rule below)
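
For instance, a Prometheus alerting rule in the standard rule-file format could flag crash-looping pods using the kube-state-metrics restart counter; the threshold and durations here are illustrative, not taken from the repository:

groups:
  - name: platform-availability
    rules:
      - alert: PodCrashLooping
        # more than 3 container restarts within 15 minutes
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 5m                  # must persist before firing
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"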

Testing & Validation

  • Sanity Testing: Automated health checks for all microservices
  • Availability Testing: SRE metrics (uptime %, MTTR) with real user simulation
  • Performance Testing: Integrated Locust container for load testing

Platform Toolkit Applications

1. Monitoring Stack

Purpose: Comprehensive observability and metrics collection
Namespace: monitoring

  • Prometheus: Metrics collection, storage, and alerting
  • Grafana: Visualization with pre-configured dashboards
  • kube-state-metrics: Kubernetes object state tracking
  • node-exporter: Node-level system metrics

(Screenshot: Grafana Online Boutique dashboard)

The Grafana dashboard provides real-time monitoring of the Online Boutique microservices, including CPU usage by pod, pod status metrics, and application logs for comprehensive observability.

2. Logging Stack

Purpose: Centralized log aggregation and analysis
Namespace: monitoring

  • Loki: Log aggregation server with Prometheus-inspired storage
  • Promtail: Log collector agent (DaemonSet) for all pods (minimal config sketched below)
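
A minimal Promtail configuration wires pod discovery and log shipping together. This sketch assumes Loki's push API is reachable in-cluster at loki:3100; the actual toolkit config may differ:

server:
  http_listen_port: 9080
positions:
  filename: /run/promtail/positions.yaml   # where Promtail records read offsets
clients:
  - url: http://loki:3100/loki/api/v1/push # assumed in-cluster Loki service
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                          # discover pods via the Kubernetes API
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod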

3. Sanity Test

Purpose: Automated health check testing
Namespace: sanity-test

  • Periodic health checks for all microservices (every 60 seconds; an illustrative CronJob sketch follows this list)
  • Web UI dashboard with test results
  • REST API for programmatic access
  • Response time metrics and error tracking
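
The toolkit ships this as its own dashboarded application; purely to illustrate a scheduled in-cluster health check at 60-second granularity, a plain CronJob would look like the sketch below. The probe image and target endpoint are hypothetical:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: frontend-health-check
  namespace: sanity-test
spec:
  schedule: "* * * * *"              # every minute, the finest CronJob granularity
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: curl
              image: curlimages/curl:8.5.0   # hypothetical probe image
              # fail the Job (non-zero exit) if the endpoint is unhealthy
              args: ["-fsS", "http://frontend.online-boutique.svc/healthz"]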

(Screenshot: Sanity Test health check dashboard)

The Sanity Test dashboard provides a comprehensive health check overview for all Online Boutique microservices, displaying total test runs, pass/fail status, and service health metrics in a Jenkins-like interface.

4. Availability Test

Purpose: SRE-style availability and reliability testing
Namespace: availability-test

  • Real user workflow simulation (add to cart, remove from cart)
  • Automated tests every 5 minutes
  • SRE metrics calculation (uptime %, MTTR)
  • Jenkins-like dashboard with green/red status

(Screenshot: Availability Test build status)

The Availability Test dashboard displays detailed build status and test case results, including complete user journey validation (visit → browse → add to cart → remove from cart) and microservices integration verification, providing comprehensive SRE-style reliability testing.

5. Online Boutique

Purpose: Microservices demonstration application
Namespace: online-boutique

  • 11 microservices (frontend, cart, checkout, payment, shipping, etc.)
  • Redis for cart storage
  • gRPC and REST API communication
  • Real-world microservices architecture patterns

Accessing the Cluster

Kubernetes Access

Option 1: Using kubectl

aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name)

Option 2: Using kubeconfig file

If running make deploy locally, a kubeconfig.yaml file is created in the project directory. Use this with Kubernetes tools like Lens, k9s, or other IDEs.

Application Access

View all LoadBalancer services:

kubectl get svc --all-namespaces -o wide | grep LoadBalancer

Available Services:

  • ArgoCD Server (argocd namespace): GitOps UI and API
  • NodeJS Application (default namespace): Main application
  • Grafana (monitoring namespace): Monitoring dashboards
  • Prometheus (monitoring namespace): Metrics collection
  • Microservices Demo Frontend (online-boutique namespace): Demo application
  • Availability Test (availability-test namespace): Testing service
  • Sanity Test (sanity-test namespace): Health check service

ArgoCD Access

ArgoCD is deployed with a LoadBalancer service. Access information is displayed after make deploy:

  • URL: Provided in deployment output
  • Username: admin
  • Password: Generated automatically (displayed in output)

To get the password manually:

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

Performance Testing

Load testing can be performed using the Locust container:

export TARGET_HOST=<your-alb-url>

docker run -i --rm \
  -v $PWD/reports:/opt/reports \
  -v ~/.aws:/root/.aws \
  -v $PWD/:/opt/script \
  -v $PWD/credentials:/meta/credentials \
  -p 8089:8089 \
  -e ROLE=standalone \
  -e TARGET_HOST=$TARGET_HOST \
  -e LOCUST_FILE=https://raw.githubusercontent.com/zalando-incubator/docker-locust/master/example/simple.py \
  -e SLAVE_MUL=4 \
  -e AUTOMATIC=False \
  registry.opensource.zalan.do/tip/docker-locust

  1. Open http://localhost:8089 in your browser
  2. Configure load test parameters (recommended: 1000+ concurrent users)
  3. Monitor pod scaling via HPA:

    kubectl get hpa
    kubectl get pods -w

Application Lifecycle Management

Zero-Downtime Deployments

The infrastructure supports zero-downtime deployments through Kubernetes rolling updates:

  1. Update Application Code

    • Make changes to the NodeJS application in Nodejs-Docker/
    • Build and push Docker image to your registry
  2. Update Helm Chart

    • Create a feature branch
    • Update image tag in charts/helm-nodejs-app/values.yaml
    • Create a Pull Request
  3. Deploy Changes

    make deploy
    
    • Helm performs rolling updates
    • Kubernetes ensures zero downtime during deployment (see the strategy sketch below)
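
The zero-downtime behavior comes from the Deployment's rolling-update strategy combined with a readiness probe, so traffic only shifts to pods that are already healthy. A sketch, with the Deployment name, image, and probe path assumed rather than taken from the chart:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodejs-app             # assumed name matching the Helm chart's Deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nodejs-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # never take a serving pod away early
      maxSurge: 1              # bring up one new pod at a time
  template:
    metadata:
      labels:
        app: nodejs-app
    spec:
      containers:
        - name: app
          image: your-registry/nodejs-app:v2   # new tag set in values.yaml
          readinessProbe:                      # traffic shifts only to ready pods
            httpGet:
              path: /healthz                   # assumed health endpoint
              port: 3000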

GitOps Workflow (ArgoCD)

For GitOps-based deployments:

  1. Push application manifests to Git repository
  2. ArgoCD automatically syncs changes
  3. Rolling updates handled by Kubernetes
  4. Monitor deployment status in ArgoCD UI

Project Structure

Infrastructure Repository (k8s-infrastructure-as-code)

.
├── argocd/                 # ArgoCD application manifests
│   └── app-of-apps.yaml
├── charts/                 # Helm charts
│   └── helm-nodejs-app/
├── Nodejs-Docker/          # NodeJS application source
├── *.tf                    # Terraform configuration files
├── Makefile               # Automation scripts
└── README.md              # Project documentation

Platform Toolkit Repository (k8s-platform-toolkit)

.
├── application/          # Demo microservices applications
│   ├── k8s-demo/        # Platform dashboard application
│   └── online-boutique/ # Google's microservices demo
├── argocd/              # ArgoCD configuration and application definitions
│   ├── apps/            # Individual ArgoCD application manifests
│   └── install/         # ArgoCD installation manifests
├── availability-test/   # SRE availability testing application
├── dashboards/          # Grafana dashboard configurations
├── monitoring/          # Monitoring stack (Prometheus, Grafana, Loki)
└── sanity-test/         # Health check testing application

Repository Relationship

  • k8s-infrastructure-as-code: Contains complete Kubernetes infrastructure (EKS cluster, networking, security groups, IAM, etc.) and deploys ArgoCD
  • k8s-platform-toolkit: Serves as the app-of-apps source repository, storing all platform application source code and manifests

When the infrastructure repository deploys ArgoCD, it automatically references the platform toolkit repository through the app-of-apps pattern, which then deploys all platform applications defined there.

Troubleshooting

Cluster Access Issues

# Verify AWS credentials
aws sts get-caller-identity

# Check cluster status
aws eks describe-cluster --name <cluster-name> --region <region>

ArgoCD Access Issues

# Get ArgoCD password manually
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

# Check ArgoCD server status
kubectl get svc -n argocd argocd-server

Application Not Accessible

# Check service status
kubectl get svc

# Check pod status
kubectl get pods

# View pod logs
kubectl logs <pod-name>

ArgoCD Sync Issues

# View all applications
kubectl get applications -n argocd

# Check application sync status
kubectl describe application <app-name> -n argocd

# Manual sync
argocd app sync <app-name>

Security Notes

  • Ensure IAM policies follow least privilege principles
  • Rotate ArgoCD admin password after first login
  • Use AWS Secrets Manager for sensitive data
  • Enable VPC flow logs for network monitoring
  • Regularly update container images and dependencies
  • Implement network policies for pod-to-pod communication (example below)
  • Use RBAC for Kubernetes access control
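
On the network-policy point, a per-namespace default-deny ingress policy is a common starting posture, with specific allows layered on top. A minimal example:

# Deny all ingress to pods in the namespace; re-allow traffic with
# additional, more specific NetworkPolicies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: online-boutique    # apply per namespace; shown here as an example
spec:
  podSelector: {}               # selects every pod in the namespace
  policyTypes:
    - Ingress                   # no ingress rules listed => all ingress denied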

Future Enhancements

Planned Features

  • Service Mesh: Istio or Linkerd integration for advanced traffic management
  • CI/CD Integration: Jenkins/GitHub Actions pipeline automation
  • Multi-Cluster: Support for multiple EKS clusters
  • Disaster Recovery: Automated backup and restore procedures

Technical Improvements

  • Helm Chart Registry: Separate registry for Helm chart distribution
  • GitOps Automation: Automated PR-based deployments via GitHooks
  • Advanced Monitoring: Custom metrics and alerting rules
  • Security Hardening: Enhanced network policies and pod security standards

Contributing

Development Setup

  1. Fork the repositories
  2. Create feature branch: git checkout -b feature/your-feature
  3. Make changes and test locally
  4. Commit changes: git commit -m "Add your feature"
  5. Push to branch: git push origin feature/your-feature
  6. Create Pull Request

Code Standards

  • Terraform: Follow HashiCorp best practices and module structure
  • Kubernetes: Adhere to Kubernetes resource naming conventions
  • Helm: Follow Helm chart best practices and versioning
  • Documentation: Clear setup and troubleshooting guides

Conclusion

This Kubernetes GitOps Platform project demonstrates enterprise-grade infrastructure and platform operations practices, showcasing:

  • Complete Infrastructure as Code with Terraform for AWS EKS
  • GitOps Automation with ArgoCD app-of-apps pattern
  • Production-Ready Observability with Prometheus, Grafana, and Loki
  • SRE Best Practices with availability testing
  • High Availability with multi-AZ deployment and auto-scaling
  • Zero-Downtime Deployments through Kubernetes rolling updates

The project serves as both a functional Kubernetes platform and a comprehensive example of modern DevOps and GitOps practices, making it an excellent addition to any infrastructure engineer’s portfolio.

Live Demo: Kubernetes GitOps Platform
Infrastructure Repository: https://github.com/Lforlinux/k8s-infrastructure-as-code
Platform Toolkit Repository: https://github.com/Lforlinux/k8s-platform-toolkit

