Deploying MCP Client/Server Architecture on AWS ECS: A Case Study

Introduction

There’s been a lot of hype around MCP (Model Context Protocol) lately — and rightfully so. As more companies explore scalable architectures for tool-augmented AI applications, MCP is becoming a promising approach to bridge LLM intelligence with external tools and real-time data sources. I recently worked on deploying an MCP client/server setup on Amazon ECS, and I’d like to share a few practical lessons and observations from the journey.

What is MCP?

MCP is an open protocol that defines how AI assistants can communicate with external tools and resources. The protocol enables LLMs to discover available tools, understand their capabilities, and invoke them when needed to complete user requests.

Why Use MCP?

The importance of MCP lies in its ability to transform LLM applications into dynamic, tool-augmented systems. Instead of being limited to their training data, LLMs can now access real-time information, perform calculations, interact with databases, and execute complex workflows. This capability is essential for developing production-ready AI applications that can effectively handle real-world scenarios.

Key benefits include:

  • Extensibility: New tools can be added without modifying the core LLM application
  • Security: Tools run in isolated environments with controlled access to resources
  • Modularity: Different tools can be developed and maintained independently

Use Case Overview

Scenario Description

For this tutorial, I’ve chosen a document question-answering scenario that showcases MCP’s practical value.

Imagine a company with diverse data storage systems — S3 buckets for documents, relational databases for structured data, NoSQL databases for user information, and various other data sources. Users want to ask questions about specific files or data without needing to know where the information is stored or how to access it. They ask natural language questions such as “What is the contact information in contract.pdf?” or “What is the timeline for the cloud migration Star project?” and receive intelligent, contextual answers.

The system needs to:

  1. Understand the user’s question and identify which document they’re referring to
  2. Retrieve the document from secure cloud storage
  3. Process the document to extract readable text (handling PDFs, scanned documents, etc.)
  4. Analyze the content to find relevant information
  5. Present a coherent answer to the user

Solution Architecture

Component Overview

The solution I built consists of two main containerized services deployed on AWS ECS, with a clear separation between user-facing and backend components:

MCP Client (Public-facing):

  • Is a FastAPI application exposing the /query endpoint that users call with natural-language questions
  • Orchestrates the conversation between the user, the LLM (Claude via Amazon Bedrock), and the MCP tools
  • Deployed behind an internet-facing ALB
  • Forwards tool invocations to the MCP server inside the VPC

MCP Server (Private):

  • Is a FastMCP server providing document processing tools
  • Integrates various data sources and services — such as Amazon S3 and Amazon Textract — which will be the primary focus of this demo
  • Deployed behind an internal ALB for security isolation
  • Is accessible only from within the VPC

Solution Architecture

Originally, I considered deploying the MCP Client and Server as two ECS services within the same ECS cluster, allowing internal communication between them. While this is possible using ECS Service Connect, Service Discovery, or VPC Lattice, each introduces additional setup complexity, which can be overkill for this tutorial demo.

In a production environment where multiple containerized components need to work closely together, Kubernetes may be a better fit than ECS — it allows for grouping related containers, provides built-in service discovery, and simplifies coordination between modular AI components.

For now, to keep things simple, cost-effective, and focused on testing, I’ve deployed the MCP Client and Server independently on ECS Fargate, without tight coupling between the two.

Deep Dive: Application Logic

MCP Server Implementation

The MCP server provides tools that the client can invoke. The key tool for this demo is get_pdf_content, which handles S3 document retrieval and processing:

import os

import boto3
from botocore.exceptions import ClientError
from mcp.server.fastmcp import FastMCP

# Module-level setup: AWS clients, bucket name from the task environment, and the FastMCP instance
s3 = boto3.client("s3")
textract = boto3.client("textract")
S3_PDF_BUCKET_NAME = os.environ["S3_BUCKET_NAME"]
mcp = FastMCP("mcp-server")  # server name is illustrative

@mcp.tool()
def get_pdf_content(pdf_file_name: str) -> dict:
    """Search for a PDF file in an S3 bucket and, if found, analyze it with Amazon Textract async API."""
    # Check whether the file exists in S3
    try:
        s3.head_object(Bucket=S3_PDF_BUCKET_NAME, Key=pdf_file_name)
    except ClientError as e:
        if e.response['Error']['Code'] == '404':
            return {}
        return {"error": str(e)}
    # Start an asynchronous Textract job for text extraction
    response = textract.start_document_analysis(
        DocumentLocation={'S3Object': {'Bucket': S3_PDF_BUCKET_NAME, 'Name': pdf_file_name}},
        FeatureTypes=['TABLES', 'FORMS']
    )
    # Poll for completion and return extracted text by page
    # … polling logic …
    return {"text": text_by_page}

Containerization

To deploy the MCP server on ECS, I first need to dockerize it. Here's the server's Dockerfile:

FROM python:3.11-slim
WORKDIR /app
# Install uv for faster package management
RUN pip install uv
# Copy requirements file
COPY requirements.txt .
# Install dependencies using uv
RUN uv venv
RUN uv pip install -r requirements.txt
# Copy application code
COPY server.py .
# Expose the port the server runs on
EXPOSE 8050
# Command to run the server
CMD ["uv", "run", "server.py"]

Requirements file

mcp[cli]==1.10.1
fastapi
uvicorn
boto3>=1.37.28
langchain-aws>=0.2.18
langchain-community>=0.3.21
langchain-mcp-adapters>=0.0.7
python-dotenv
ipykernel
httpx>=0.27
httpcore>=1.0.7
anyio>=4.5

MCP Client Implementation

The client provides the main endpoint:

@app.post("/query", response_model=QueryResponse)
async def query_endpoint(request: QueryRequest):
    try:
        response = await llm_client.process_query(request.query)
        return QueryResponse(response=response)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

The llm_client.process_query method orchestrates the conversation between the user, LLM, and MCP tools. It takes the user’s query, sends it to Claude via Bedrock, checks if the LLM wants to use any tools, extracts the tool name and arguments from the LLM’s response, invokes the appropriate tool through the MCP server, receives the tool’s results, and then sends the original query, LLM response, and tool results back to the LLM to generate the final answer.

    async def process_query(self, query: str) -> str:
        logger.info(f"Processing query: {query}")
        messages = [HumanMessage(content=query)]
        response = await self.model.ainvoke(messages)
        tool_calls = getattr(response, "tool_calls", None)
        if tool_calls:
            for tool_call in tool_calls:
                tool_name = tool_call.get("name")
                tool_args = tool_call.get("args")
                tool_id = tool_call.get("id", "tool1")
                requested_tool = next((t for t in self.tools if t.name == tool_name), None)
                if not requested_tool:
                    logger.warning(f"Requested tool '{tool_name}' not found.")
                    continue
                logger.info(f"Invoking tool: {tool_name} with args: {tool_args}")
                tool_result = await requested_tool.ainvoke(tool_args)
                new_messages = [HumanMessage(content=query)]
                new_messages.append(response)
                new_messages.append(
                    ToolMessage(
                        content=str(tool_result),
                        tool_call_id=tool_id,
                        name=tool_name,
                    )
                )
                final_response = await self.model.ainvoke(new_messages)
                logger.info(f"Final response after tool call: {final_response.content}")
                return str(final_response.content)
        logger.info(f"Response without tool call: {response.content}")
        return str(response.content)

LLM Integration

To enable the LLM to use tools effectively, I provided a carefully crafted system prompt that guides it on how and when to use each tool. For my use case, the prompt looked like this:

SYSTEM_PROMPT = """
You are a helpful assistant that has access to various tools.
Your job is to:
- Decide if a user's request requires using a tool, or if you can answer directly.
- If a tool is needed, extract the correct parameters from the user's message and call the tool.
- If you receive a tool result, process it and return a clear, helpful answer.
If the user asks a question about a specific PDF file, use the get_pdf_content tool.
Extract the file name from the user's message and pass it as the pdf_file_name parameter.
"""

Tool Calling Flow

  1. User Query: “What is the contact information in deployment.pdf?”
  2. LLM Decision: Determines a tool is needed and extracts “deployment.pdf”
  3. Tool Extraction: Code extracts the tool suggested by the LLM
  4. Tool Invocation: Calls get_pdf_content with extracted filename
  5. Context Provision: Tool response (extracted file content) is provided back to the LLM as context
  6. Intelligent Answer: LLM uses the additional context to answer the user’s question intelligently

Infrastructure as Code (IaC)

ECR & OIDC Setup

First, I provision two container registries (ECR repositories) and the secure CI/CD configuration:

# IAM Role for GitHub Actions
resource "aws_iam_role" "github_actions_push_imag_role" {
  name = "${local.environment}-gha-role" 

  assume_role_policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Effect" : "Allow",
        "Principal" : {
          "Federated" : data.aws_iam_openid_connect_provider.github_actions.arn
        },
        "Action" : "sts:AssumeRoleWithWebIdentity",
        "Condition" : {
          "StringEquals" : {
            "token.actions.githubusercontent.com:aud" : "sts.amazonaws.com"
          },
          "StringLike" : {
            # Adjust the repository identifier to match your GitHub org and repo.
            "token.actions.githubusercontent.com:sub" : [
              "repo:Tiger4Code/Deploy-MCP-Server-Client-App-on-ECS:*"
            ]
          }
        }
      }
    ]
  })
}
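
The two repositories themselves are simple resources. Here is a minimal sketch; the repository names are illustrative and would come from locals or variables in the actual repo:

resource "aws_ecr_repository" "mcp_client_ecr" {
  name = "${local.environment}-mcp-client"

  image_scanning_configuration {
    scan_on_push = true # scan images on push
  }
}

resource "aws_ecr_repository" "mcp_server_ecr" {
  name = "${local.environment}-mcp-server"

  image_scanning_configuration {
    scan_on_push = true
  }
}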

Key Features:

  • Separate ECR repositories for client and server images
  • OIDC integration eliminates need for long-lived AWS credentials
  • Repository-specific permissions for security

ECS & Networking

I implemented a multi-tier architecture for the core infrastructure, with dedicated task definitions for each service plus the appropriate IAM roles and security groups. Here are the MCP server's ECS cluster, task definition, and service:

resource "aws_ecs_cluster" "mcp_server_ecs_cluster" {
  name = local.mcp_server_ecs_cluster_name
}

resource "aws_ecs_task_definition" "mcp_server_ecs_task" {
  family                   = local.mcp_server_ecs_task_name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.mcp_server_ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.mcp_server_ecs_task_execution_role.arn
  container_definitions = jsonencode([
    {
      name      = "server-app"
      image     = "${local.account_id}.dkr.ecr.${var.region}.amazonaws.com/${var.mcp_server_ecr_repo_name}:${var.server_image_tag}"
      essential = true
      portMappings = [
        {
          containerPort = 8050
          protocol      = "tcp"
        }
      ]
      environment = [
        {
          name  = "S3_BUCKET_NAME"
          value = module.pdf_s3_bucket.s3_bucket_id
        }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          awslogs-group         = aws_cloudwatch_log_group.mcp_server_ecs_log_group.name
          awslogs-region        = var.region
          awslogs-stream-prefix = "ecs"
        }
      }
    }
  ])
}

resource "aws_ecs_service" "mcp_server_ecs_service" {
  name            = local.mcp_server_ecs_service_name
  cluster         = aws_ecs_cluster.mcp_server_ecs_cluster.id
  task_definition = aws_ecs_task_definition.mcp_server_ecs_task.arn
  launch_type     = "FARGATE"
  desired_count   = 1

  network_configuration {
    subnets         = module.vpc.private_subnets
    security_groups = [aws_security_group.ecs_server_sg.id]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.ecs_mcp_server_alb_tg.arn
    container_name   = "server-app"
    container_port   = 8050
  }

  depends_on = [aws_iam_role_policy_attachment.mcp_server_ecs_task_execution_role_policy]
}
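
The security groups referenced above are not shown in full; the key point is that the server task only accepts traffic from the internal ALB on port 8050. A sketch of that rule, in which the ALB security group name is an assumption:

resource "aws_security_group" "ecs_server_sg" {
  name   = "${local.environment}-mcp-server-sg"
  vpc_id = module.vpc.vpc_id

  # Only the internal ALB may reach the MCP server port
  ingress {
    from_port       = 8050
    to_port         = 8050
    protocol        = "tcp"
    security_groups = [aws_security_group.mcp_server_alb_sg.id]
  }

  # Allow outbound traffic (S3, Textract, ECR image pulls, CloudWatch Logs)
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}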

Deployment & Operations

CI/CD Pipeline

I set up a GitHub Actions workflow that leverages the OIDC role for secure deployments. The pipeline automatically builds, tags, and pushes Docker images to ECR when code changes are pushed to the main branch.

  1. Authentication: Uses OIDC to assume the AWS role without storing credentials
  2. Build: Creates Docker images for both client and server components
  3. Push: Uploads images to the respective ECR repositories
  4. Deploy: Updates ECS services with new task definitions

Running the System

Deployment Steps:

  1. Deploy ECR Infrastructure
  2. Deploy Main Infrastructure
  3. Verify Deployment (example CLI checks below):
  • Check ECS services are running healthy
  • Verify ALB health checks are passing
  • Test client endpoint accessibility
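
A few AWS CLI commands are handy for these checks; the cluster, service, and target-group identifiers below are placeholders for the values created by the Terraform:

# Confirm the ECS service has reached its desired running count
aws ecs describe-services \
  --cluster <mcp-server-cluster-name> \
  --services <mcp-server-service-name> \
  --query 'services[0].{status:status,running:runningCount,desired:desiredCount}'

# Confirm the ALB target group reports healthy targets
aws elbv2 describe-target-health \
  --target-group-arn <mcp-server-target-group-arn> \
  --query 'TargetHealthDescriptions[].TargetHealth.State'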

Testing the Endpoints

Once deployed, I can test the system using the client ALB DNS name (I uploaded deployment.pdf to S3):

Query Endpoint:

curl -X POST "http://<client-alb-dns-name>:8000/query" \
-H "Content-Type: application/json" \
-d '{"query": "What is the contact information in deployment.pdf?"}'

Response:

{
  "response": "Based on the content of deployment.pdf, here is the contact information I found: Contact Information: Email: contact@company.com - Phone: +1 (555) 123-4567 - Address: 122 Business Street, Suite 110, City, State 12345 - Support Hours: Monday-Friday, 9:00 AM - 6:00 PM EST. Emergency Contact: - After Hours: +1 (555) 987-6543. This information is located on page 2 of the deployment document under the 'Contact Details' section."
}

Production Considerations

In production, the MCP client ALB should be behind a user-friendly DNS name with SSL/TLS certificates. I skipped this for the demo to focus on core MCP functionality.

Summary

I’ve successfully deployed a full MCP client/server architecture on AWS ECS, showcasing a practical application of MCP in bridging LLM intelligence with external tools and cloud services.

This implementation can be extended to include database access, API integrations, and support for various document formats. For production workloads, consider deploying on Kubernetes for improved orchestration and scalability.

If you have any questions, please reach out to me or us at New Math Data!

Stay tuned for more tutorials on AWS AI Services in action.

Author: Noor Sabahi | Senior AI & Cloud Engineer | AWS Ambassador

#AWS #AWSAmbassador #MCPClient #MCPServer #ECS #MCPDeployment