AWS ECS CI/CD Deployment for Node.js App
Mar 2025 – Apr 2025 · DevOps / Cloud Infrastructure


Production-style deployment pipeline for a Node.js backend using Docker, GitHub Actions, ECR, ECS, ALB, and CloudWatch.

The Problem

Deployments were manual and error-prone. There was no consistent way to build, tag, or roll back container images. Every release required SSH access to the server.

The Solution

Automate the entire path from a git push to a running ECS task behind a load balancer, with logs centralized in CloudWatch and secrets managed outside the codebase.

Implementation Details

Problem

Before this project, deploying a Node.js service meant SSH-ing into an EC2 instance, pulling the latest code, and restarting the process manually. There was no audit trail, no rollback plan, and no way to detect a failed deployment early.

Architecture

GitHub Push
    │
    ▼
GitHub Actions CI
    ├── Build Docker image
    ├── Tag with commit SHA
    └── Push to Amazon ECR
            │
            ▼
    ECS Service Update (rolling deploy)
            │
            ▼
    Application Load Balancer
            │
            ▼
    ECS Task (Node.js container)
            │
            ▼
    CloudWatch Logs

Tech Stack

  • Runtime: Node.js (Express)
  • Containerization: Docker
  • CI/CD: GitHub Actions
  • Registry: Amazon ECR
  • Compute: Amazon ECS (Fargate)
  • Networking: Application Load Balancer, VPC, Security Groups
  • Observability: CloudWatch Logs, CloudWatch Metrics
  • Secrets: AWS Secrets Manager, GitHub Actions Secrets
  • IAM: Task execution role with least-privilege policies

What I Built

  • Dockerized a Node.js Express API with a multi-stage build to keep the image lean
  • Configured Amazon ECR as the private image registry; images are tagged with the git commit SHA for traceability
  • Set up an ECS cluster (Fargate launch type) with a task definition referencing the ECR image
  • Placed an Application Load Balancer in front of the ECS service for health-checked routing
  • Wrote a GitHub Actions workflow that builds, tags, pushes, and triggers a new ECS deployment on every push to main
  • Forwarded container stdout/stderr to CloudWatch Logs using the awslogs log driver
  • Stored database credentials and API keys in AWS Secrets Manager; injected at runtime as environment variables in the task definition
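The multi-stage build described above can be sketched roughly as follows (the entry point name, port, and directory layout are assumptions, not taken from the actual repo):

```dockerfile
# Build stage: install all dependencies (including dev) and run any build step
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm prune --omit=dev   # strip devDependencies before the final copy

# Runtime stage: only production dependencies and application code
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=build /app ./
USER node
EXPOSE 3000
CMD ["node", "server.js"]   # entry point name is an assumption
```

Because the runtime stage starts from a fresh base image, build tooling and devDependencies never reach the final layer, which keeps the pushed image small.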

Deployment Flow

  1. Developer pushes to main on GitHub
  2. GitHub Actions checks out the code and builds the Docker image
  3. The workflow authenticates to AWS using OIDC (no long-lived access keys stored)
  4. Image is pushed to ECR tagged as latest and <commit-sha>
  5. ECS service is updated via aws ecs update-service --force-new-deployment
  6. ECS performs a rolling replacement: new tasks start, health checks pass, old tasks drain
  7. CloudWatch Logs capture all container output in real time
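The flow above corresponds to a workflow along these lines (a sketch, not the project's exact file: the role ARN, region, repository, cluster, and service names are all placeholders):

```yaml
name: deploy
on:
  push:
    branches: [main]

permissions:
  id-token: write   # required for OIDC federation to AWS
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Assume an IAM role via OIDC; no static keys in GitHub Secrets
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-deploy  # placeholder
          aws-region: us-east-1

      - uses: aws-actions/amazon-ecr-login@v2
        id: ecr

      - name: Build, tag, and push
        run: |
          IMAGE="${{ steps.ecr.outputs.registry }}/my-app"   # repo name is a placeholder
          docker build -t "$IMAGE:${{ github.sha }}" -t "$IMAGE:latest" .
          docker push --all-tags "$IMAGE"

      - name: Trigger rolling deployment
        run: |
          aws ecs update-service \
            --cluster my-cluster --service my-service \
            --force-new-deployment
```

The `id-token: write` permission is what allows the job to mint a short-lived OIDC token for `sts:AssumeRoleWithWebIdentity`; without it the credentials step fails.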

Monitoring / Logs

A CloudWatch log group is created per ECS service. I set up a metric filter that counts error-level log lines, with a CloudWatch alarm that fires when the count exceeds a threshold. Logs are retained for 30 days.
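The retention policy and metric filter can be set up with two CLI calls like these (the log group name, pattern, and namespace are placeholders; a separate CloudWatch alarm on the resulting metric does the actual alerting):

```shell
# Cap storage costs: keep log events for 30 days only
aws logs put-retention-policy \
  --log-group-name /ecs/my-app --retention-in-days 30

# Emit a count metric for every log line containing the term ERROR
aws logs put-metric-filter \
  --log-group-name /ecs/my-app \
  --filter-name app-errors \
  --filter-pattern '"ERROR"' \
  --metric-transformations \
    metricName=AppErrorCount,metricNamespace=MyApp,metricValue=1
```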

Security Decisions

  • GitHub Actions uses OIDC federation to assume an IAM role — no static AWS credentials stored in GitHub Secrets
  • ECS task execution role is scoped to only the required ECR, Secrets Manager, and CloudWatch permissions
  • The ALB security group only allows inbound 80/443; the ECS task security group only allows traffic from the ALB
  • Secrets are never baked into the Docker image or the task definition in plain text
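The OIDC federation in the first bullet is enforced by the IAM role's trust policy. A sketch of what that policy looks like (account ID and repository are placeholders) — the `sub` condition is the important part, since it restricts which repo and branch may assume the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}
```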

Failure / Rollback Handling

If the new ECS tasks fail their health checks, ECS stops the rolling deployment and leaves the previous tasks running. To roll back intentionally, the workflow can be re-run against an older commit SHA, which re-pushes that image and triggers a new deployment.
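Because every image is tagged with its commit SHA, a more direct rollback is also possible without re-running CI: register a task-definition revision that points at the known-good image and update the service. A sketch with placeholder names (`ROLLBACK_SHA` is the commit to return to):

```shell
# Known-good image, addressed by its commit-SHA tag (placeholder registry/repo)
IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:$ROLLBACK_SHA"

# Fetch the current task definition and rewrite only the image field,
# dropping the read-only fields that register-task-definition rejects
aws ecs describe-task-definition --task-definition my-app \
  --query taskDefinition > taskdef.json
jq --arg img "$IMAGE" '.containerDefinitions[0].image = $img
  | del(.taskDefinitionArn, .revision, .status, .requiresAttributes,
        .compatibilities, .registeredAt, .registeredBy)' taskdef.json > rollback.json

aws ecs register-task-definition --cli-input-json file://rollback.json
# Without an explicit revision, this resolves to the latest (rollback) revision
aws ecs update-service --cluster my-cluster --service my-app \
  --task-definition my-app
```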

What I Learned

  • ECS rolling deployments are safe by default but require properly configured health checks on the ALB target group
  • OIDC-based AWS authentication from GitHub Actions eliminates the security risk of rotating long-lived IAM keys
  • CloudWatch log groups do not auto-delete; setting a retention policy from the start avoids unexpected storage costs
  • Tagging images with the commit SHA makes it easy to correlate a production issue with the exact code that caused it