What is the significance of this?
- Complete Infrastructure as Code: Provision entire AWS EKS cluster and all platform services with a single command
- GitOps Automation: ArgoCD automatically deploys and manages all platform applications from Git repositories
- Production-Ready Platform: Comprehensive observability stack with monitoring, logging, and testing tools
- High Availability: Multi-AZ deployment with auto-scaling worker nodes and Horizontal Pod Autoscaler (HPA)
- Zero-Downtime Deployments: Kubernetes rolling updates ensure continuous service availability
- Complete Observability: Prometheus, Grafana, Loki, and Promtail for metrics, logs, and dashboards
- SRE-Ready: Availability testing and sanity checks built-in
How is automation accomplished?
- Terraform Infrastructure: Complete AWS infrastructure provisioning (VPC, EKS, IAM, Security Groups) as code
- Docker Containerization: Terraform runs in Docker for consistent execution across team members
- Helm Charts: Application deployment automation with Helm templates and values management
- ArgoCD GitOps: App-of-apps pattern automatically deploys entire platform toolkit from Git
- Makefile Automation: One-command deployment (`make deploy`) for the entire infrastructure stack
- Auto-Scaling: HPA for pod scaling and AWS Auto Scaling Groups for worker node scaling
- Automated Sync: ArgoCD continuously syncs Git changes to cluster with self-healing capabilities
Prerequisites
- AWS Account: Programmatic access with IAM user
- IAM Policies: AmazonEC2FullAccess, IAMFullAccess, AutoScalingFullAccess, AmazonEKSClusterPolicy, AmazonEKSWorkerNodePolicy, AmazonVPCFullAccess, AmazonEKSServicePolicy, AmazonEKS_CNI_Policy
- Docker: Installed and running for Terraform containerization
- AWS CLI: Configured with credentials (`~/.aws/credentials` or `AWS_PROFILE`)
- Git: For cloning repositories
Source Code
Infrastructure Repository: https://github.com/Lforlinux/k8s-infrastructure-as-code
Platform Toolkit Repository: https://github.com/Lforlinux/k8s-platform-toolkit
How to provision the infrastructure
Quick Start Deployment
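A minimal quick-start sketch, using the `make deploy` target and the infrastructure repository listed in the Source Code section:

```bash
# Clone the infrastructure repository
git clone https://github.com/Lforlinux/k8s-infrastructure-as-code.git
cd k8s-infrastructure-as-code

# One command provisions the full stack (Terraform runs inside Docker)
make deploy
```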
Review the Terraform plan and type `yes` to proceed. The deployment includes:
- VPC and networking components (subnets, route tables, internet gateway)
- EKS cluster with managed node groups
- Metrics Server for HPA
- NodeJS application (via Helm)
- ArgoCD for GitOps
- Platform toolkit applications (automatically deployed via ArgoCD)
Deployment Workflow

The infrastructure repository deploys ArgoCD, which then automatically references the platform toolkit repository through the app-of-apps pattern, deploying all platform applications.
Architecture

Infrastructure Components
- AWS VPC: Multi-AZ network with public and private subnets
- EKS Cluster: Managed Kubernetes control plane
- Worker Nodes: Auto-scaling node groups in private subnets
- Application Load Balancer: External access to services
- Security Groups: Network security and access control
- IAM Roles: Service accounts and permissions
Platform Services (via ArgoCD)
The platform toolkit repository provides:
Monitoring Stack
- Prometheus: Metrics collection and alerting
- Grafana: Visualization dashboards
- kube-state-metrics: Kubernetes object metrics
- node-exporter: Node-level system metrics
Logging Stack
- Loki: Centralized log aggregation
- Promtail: Log collection agent (DaemonSet)
Testing & Validation
- Sanity Test: Automated health check testing for microservices
- Availability Test: SRE-style availability and reliability testing
Demo Applications
- Online Boutique: Google’s microservices demo (11 services)
- NodeJS Application: Sample application deployed via Helm
Key Features
Infrastructure as Code
- Terraform Modules: Reusable infrastructure components
- Docker-Based Execution: Consistent Terraform version across team
- State Management: Remote state storage for team collaboration
- Output Management: Automatic kubeconfig generation and access information
GitOps with ArgoCD
- App-of-Apps Pattern: Hierarchical application management (sketched after this list)
- Automated Sync: Continuous deployment from Git repositories
- Self-Healing: Automatic reconciliation of cluster state
- Multi-Repository: Infrastructure and platform toolkit separation
- Sync Waves: Ordered deployment of dependent applications
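To make the app-of-apps pattern concrete, here is a minimal, illustrative root Application; field values such as the path and sync options are assumptions, not the repository's actual manifest:

```bash
kubectl apply -n argocd -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-toolkit        # root "app of apps"
spec:
  project: default
  source:
    repoURL: https://github.com/Lforlinux/k8s-platform-toolkit.git
    targetRevision: HEAD
    path: argocd/apps           # directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true               # assumption: remove resources deleted from Git
      selfHeal: true            # reconcile manual drift back to Git state
EOF
```

Because the root Application points at a directory of other Application manifests, syncing it pulls in the entire platform toolkit in one step.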
High Availability & Auto-Scaling
- Multi-AZ Deployment: High availability across availability zones
- Horizontal Pod Autoscaler: Automatic pod scaling based on CPU/memory (example after this list)
- Worker Node Auto-Scaling: AWS Auto Scaling Groups for node capacity
- Load Balancing: Application Load Balancer for traffic distribution
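As an illustration of the HPA half of this, a one-liner creates an autoscaler for a deployment; the deployment name and thresholds here are placeholders:

```bash
# Scale a hypothetical nodejs-app deployment between 2 and 10 replicas at 70% CPU
kubectl autoscale deployment nodejs-app --cpu-percent=70 --min=2 --max=10
kubectl get hpa   # watch current/target utilization and replica counts
```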
Observability
- Metrics Collection: Prometheus scrapes metrics from all services
- Log Aggregation: Centralized logging with Loki and Promtail
- Dashboards: Pre-configured Grafana dashboards for Kubernetes and applications
- Alerting: Prometheus alerting rules for proactive monitoring
Testing & Validation
- Sanity Testing: Automated health checks for all microservices
- Availability Testing: SRE metrics (uptime %, MTTR) with real user simulation
- Performance Testing: Integrated Locust container for load testing
Platform Toolkit Applications
1. Monitoring Stack
Purpose: Comprehensive observability and metrics collection
Namespace: monitoring
- Prometheus: Metrics collection, storage, and alerting
- Grafana: Visualization with pre-configured dashboards
- kube-state-metrics: Kubernetes object state tracking
- node-exporter: Node-level system metrics

The Grafana dashboard provides real-time monitoring of the Online Boutique microservices, including CPU usage by pod, pod status metrics, and application logs for comprehensive observability.
2. Logging Stack
Purpose: Centralized log aggregation and analysis
Namespace: monitoring
- Loki: Log aggregation server with Prometheus-inspired storage
- Promtail: Log collector agent (DaemonSet) for all pods
3. Sanity Test
Purpose: Automated health check testing
Namespace: sanity-test
- Periodic health checks for all microservices (every 60 seconds)
- Web UI dashboard with test results
- REST API for programmatic access
- Response time metrics and error tracking

The Sanity Test dashboard provides a comprehensive health check overview for all Online Boutique microservices, displaying total test runs, pass/fail status, and service health metrics in a Jenkins-like interface.
4. Availability Test
Purpose: SRE-style availability and reliability testing
Namespace: availability-test
- Real user workflow simulation (add to cart, remove from cart)
- Automated tests every 5 minutes
- SRE metrics calculation (uptime %, MTTR)
- Jenkins-like dashboard with green/red status

The Availability Test dashboard displays detailed build status and test case results, including complete user journey validation (visit → browse → add to cart → remove from cart) and microservices integration verification, providing comprehensive SRE-style reliability testing.
5. Online Boutique
Purpose: Microservices demonstration application
Namespace: online-boutique
- 11 microservices (frontend, cart, checkout, payment, shipping, etc.)
- Redis for cart storage
- gRPC and REST API communication
- Real-world microservices architecture patterns
Accessing the Cluster
Kubernetes Access
Option 1: Using kubectl
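A typical way to configure access, assuming placeholder values for the region and cluster name produced by Terraform:

```bash
# Write the EKS credentials into your kubeconfig (region/name are placeholders)
aws eks update-kubeconfig --region <aws-region> --name <cluster-name>

# Verify access
kubectl get nodes
```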
Option 2: Using kubeconfig file
If running `make deploy` locally, a `kubeconfig.yaml` file is created in the project directory. Use this with Kubernetes tools like Lens, k9s, or other IDEs.
Application Access
View all LoadBalancer services:
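For example, with standard kubectl:

```bash
kubectl get svc --all-namespaces | grep LoadBalancer
```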
Available Services:
- ArgoCD Server (`argocd` namespace): GitOps UI and API
- NodeJS Application (`default` namespace): Main application
- Grafana (`monitoring` namespace): Monitoring dashboards
- Prometheus (`monitoring` namespace): Metrics collection
- Microservices Demo Frontend (`online-boutique` namespace): Demo application
- Availability Test (`availability-test` namespace): Testing service
- Sanity Test (`sanity-test` namespace): Health check service
ArgoCD Access
ArgoCD is deployed with a LoadBalancer service. Access information is displayed after `make deploy`:
- URL: Provided in deployment output
- Username: `admin`
- Password: Generated automatically (displayed in output)
To retrieve the password manually:
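Assuming a standard ArgoCD installation, the initial admin password lives in a well-known secret:

```bash
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d
```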
Performance Testing
Load testing can be performed using the Locust container:
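A sketch of one way to launch it, following Locust's documented Docker usage; the locustfile path and target URL are placeholders:

```bash
# Expose the Locust web UI on localhost:8089 (file and host are assumptions)
docker run --rm -p 8089:8089 -v "$PWD:/mnt/locust" locustio/locust \
  -f /mnt/locust/locustfile.py --host http://<frontend-loadbalancer-url>
```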
- Open `http://localhost:8089` in your browser
- Configure load test parameters (recommended: 1000+ concurrent users)
- Monitor pod scaling via HPA:

```bash
kubectl get hpa
kubectl get pods -w
```
Application Lifecycle Management
Zero-Downtime Deployments
The infrastructure supports zero-downtime deployments through Kubernetes rolling updates:
1. Update Application Code
   - Make changes to the NodeJS application in `Nodejs-Docker/`
   - Build and push the Docker image to your registry
2. Update Helm Chart
   - Create a feature branch
   - Update the image tag in `charts/helm-nodejs-app/values.yaml`
   - Create a Pull Request
3. Deploy Changes
   - Run `make deploy`
   - Helm performs rolling updates
   - Kubernetes ensures zero downtime during deployment
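To watch a rolling update complete, the standard rollout commands apply; the deployment name is a placeholder:

```bash
kubectl rollout status deployment/<nodejs-deployment>    # blocks until the rollout finishes
kubectl rollout history deployment/<nodejs-deployment>   # review past revisions
```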
GitOps Workflow (ArgoCD)
For GitOps-based deployments:
- Push application manifests to Git repository
- ArgoCD automatically syncs changes
- Rolling updates handled by Kubernetes
- Monitor deployment status in ArgoCD UI
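Beyond the UI, sync state can also be checked from the `argocd` CLI; the application name is a placeholder:

```bash
argocd app list              # all Applications with sync/health status
argocd app get <app-name>    # details for one Application
argocd app sync <app-name>   # trigger a manual sync if needed
```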
Project Structure
Infrastructure Repository (k8s-infrastructure-as-code)
.
├── argocd/ # ArgoCD application manifests
│ └── app-of-apps.yaml
├── charts/ # Helm charts
│ └── helm-nodejs-app/
├── Nodejs-Docker/ # NodeJS application source
├── *.tf # Terraform configuration files
├── Makefile # Automation scripts
└── README.md # Project documentation
Platform Toolkit Repository (k8s-platform-toolkit)
.
├── application/ # Demo microservices applications
│ ├── k8s-demo/ # Platform dashboard application
│ └── online-boutique/ # Google's microservices demo
├── argocd/ # ArgoCD configuration and application definitions
│ ├── apps/ # Individual ArgoCD application manifests
│ └── install/ # ArgoCD installation manifests
├── availability-test/ # SRE availability testing application
├── dashboards/ # Grafana dashboard configurations
├── monitoring/ # Monitoring stack (Prometheus, Grafana, Loki)
└── sanity-test/ # Health check testing application
Repository Relationship
- k8s-infrastructure-as-code: Contains complete Kubernetes infrastructure (EKS cluster, networking, security groups, IAM, etc.) and deploys ArgoCD
- k8s-platform-toolkit: Serves as the app-of-apps source repository, storing all application source code and manifests
When the infrastructure repository deploys ArgoCD, it automatically references the platform toolkit repository through the app-of-apps pattern, which then deploys all platform applications defined there.
Troubleshooting
Cluster Access Issues
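Typical checks, with region and cluster name as placeholders:

```bash
# Confirm the AWS credentials kubectl will use
aws sts get-caller-identity

# Regenerate the kubeconfig and re-test connectivity
aws eks update-kubeconfig --region <aws-region> --name <cluster-name>
kubectl get nodes
```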
ArgoCD Access Issues
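Assuming the standard install, verify the server service and pods:

```bash
# The LoadBalancer should show an external address
kubectl get svc argocd-server -n argocd

# All ArgoCD pods should be Running
kubectl get pods -n argocd
```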
Application Not Accessible
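General-purpose checks; the namespace and service names are placeholders:

```bash
kubectl get pods -n <namespace>                              # pods healthy?
kubectl describe svc <service-name> -n <namespace>           # endpoints attached?
kubectl get events -n <namespace> --sort-by=.lastTimestamp   # recent failures
```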
ArgoCD Sync Issues
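ArgoCD Applications are ordinary custom resources, so kubectl can inspect them directly; the application name is a placeholder:

```bash
kubectl get applications -n argocd                   # sync and health columns
kubectl describe application <app-name> -n argocd    # conditions and sync errors
```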
Security Notes
- Ensure IAM policies follow least privilege principles
- Rotate ArgoCD admin password after first login
- Use AWS Secrets Manager for sensitive data
- Enable VPC flow logs for network monitoring
- Regularly update container images and dependencies
- Implement network policies for pod-to-pod communication
- Use RBAC for Kubernetes access control
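As one illustration of the RBAC point, kubectl can create a namespaced read-only role; the user and namespace here are hypothetical:

```bash
# Read-only access to pods in the monitoring namespace for a hypothetical user
kubectl create role pod-reader --verb=get,list,watch --resource=pods -n monitoring
kubectl create rolebinding pod-reader-binding --role=pod-reader --user=dev-user -n monitoring
```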
Future Enhancements
Planned Features
- Service Mesh: Istio or Linkerd integration for advanced traffic management
- CI/CD Integration: Jenkins/GitHub Actions pipeline automation
- Multi-Cluster: Support for multiple EKS clusters
- Disaster Recovery: Automated backup and restore procedures
Technical Improvements
- Helm Chart Registry: Separate registry for Helm chart distribution
- GitOps Automation: Automated PR-based deployments via GitHooks
- Advanced Monitoring: Custom metrics and alerting rules
- Security Hardening: Enhanced network policies and pod security standards
Contributing
Development Setup
- Fork the repositories
- Create a feature branch: `git checkout -b feature/your-feature`
- Make changes and test locally
- Commit changes: `git commit -m "Add your feature"`
- Push to the branch: `git push origin feature/your-feature`
- Create a Pull Request
Code Standards
- Terraform: Follow HashiCorp best practices and module structure
- Kubernetes: Adhere to Kubernetes resource naming conventions
- Helm: Follow Helm chart best practices and versioning
- Documentation: Clear setup and troubleshooting guides
Conclusion
This Kubernetes GitOps Platform project demonstrates enterprise-grade infrastructure and platform operations practices, showcasing:
- Complete Infrastructure as Code with Terraform for AWS EKS
- GitOps Automation with ArgoCD app-of-apps pattern
- Production-Ready Observability with Prometheus, Grafana, and Loki
- SRE Best Practices with availability testing
- High Availability with multi-AZ deployment and auto-scaling
- Zero-Downtime Deployments through Kubernetes rolling updates
The project serves as both a functional Kubernetes platform and a comprehensive example of modern DevOps and GitOps practices, making it an excellent addition to any infrastructure engineer’s portfolio.
Live Demo: Kubernetes GitOps Platform
Infrastructure Repository: https://github.com/Lforlinux/k8s-infrastructure-as-code
Platform Toolkit Repository: https://github.com/Lforlinux/k8s-platform-toolkit