General

GenAI

Building Intelligent Data Query Systems with Amazon Bedrock Knowledge Base and Redshift

Introduction

In today’s data-driven world, organizations are constantly seeking ways to make their data more accessible to non-technical users. Traditional SQL queries require specialized knowledge and can be intimidating for business users who need quick insights from their data. Amazon Bedrock Knowledge Base with Redshift integration offers a revolutionary solution that bridges this gap by enabling natural language queries against structured data warehouses.

This article explores how to leverage Amazon Bedrock’s Knowledge Base feature with Amazon Redshift to create intelligent, conversational interfaces that can understand human-readable questions and automatically generate and execute SQL queries against your data warehouse.

What is Amazon Bedrock Knowledge Base with Redshift?

Amazon Bedrock Knowledge Base is a managed service that allows you to create intelligent systems capable of understanding natural language queries and converting them into structured database operations. When integrated with Amazon Redshift, it provides a powerful combination of:

Natural Language Processing: Understanding human-readable questions
SQL Generation: Automatically creating optimized SQL queries
Query Execution: Running queries directly against your Redshift cluster
Contextual Responses: Providing meaningful answers with proper formatting

Key Benefits

No SQL Required — Business users query data using plain English
Instant Self-Service Analytics — Get immediate insights without waiting for technical teams
Auto-Generated Optimized Queries — Converts natural language to error-free SQL
Serverless & Scalable — No infrastructure management, handles concurrent users
Conversational Context — Maintains conversation history and understands follow-up questions

Architecture Overview

The Knowledge Base with Redshift integration follows this architecture:

Query Systems with Amazon Knowledge Base and Redshift

Implementation Guide

Prerequisites

Before setting up your Knowledge Base, ensure you have:

An active AWS account with appropriate permissions
A Redshift cluster with your data loaded
IAM roles configured for Bedrock access
Descriptions of the Tables and Column in Redshift

Step 1: Prepare Your Data Schema (Optional but Recommended)

While not strictly required, providing detailed descriptions of your tables and columns significantly improves the Knowledge Base’s understanding of your data structure and query accuracy.
This configuration also allows you to control which tables and columns to include or exclude.

Create a JSON configuration that describes your schema:

# Define table configurations directly for our sample schema
table_configs = [
        {
            "name": "dev.public.products",
            "description": "Product catalog containing item details, pricing, and inventory information",
            "inclusion": "INCLUDE",
            "columns": [
                {
                    "name": "product_id",
                    "description": "Unique product identifier",
                    "inclusion": "INCLUDE"
                },
                {
                    "name": "product_name",
                    "description": "Product display name",
                    "inclusion": "INCLUDE"
                },
                {
                    "name": "category",
                    "description": "Product category classification",
                    "inclusion": "INCLUDE"
                },
                {
                    "name": "price",
                    "description": "Current selling price in USD",
                    "inclusion": "INCLUDE"
                },
                {
                    "name": "stock_quantity",
                    "description": "Available inventory count",
                    "inclusion": "INCLUDE"
                }
            ]
        },
        {
            "name": "dev.public.orders",
            "description": "Customer order transactions with status and fulfillment details",
            "inclusion": "INCLUDE",
            "columns": [
                {
                    "name": "order_id",
                    "description": "Unique order identifier",
                    "inclusion": "INCLUDE"
                },
                {
                    "name": "customer_id",
                    "description": "Customer who placed the order",
                    "inclusion": "INCLUDE"
                },
                {
                    "name": "order_date",
                    "description": "Date and time when order was placed",
                    "inclusion": "INCLUDE"
                },
                {
                    "name": "total_amount",
                    "description": "Total order value including taxes and shipping",
                    "inclusion": "INCLUDE"
                },
                {
                    "name": "order_status",
                    "description": "Current order status: pending, shipped, delivered, cancelled",
                    "inclusion": "INCLUDE"
                }
            ]
        }
    ]

Step 2: Create the Knowledge Base

Use the AWS SDK or console to create your Knowledge Base with Redshift configuration:

import boto3
def create_knowledge_base():
    bedrock_client = boto3.client('bedrock-agent')
    
    # Knowledge Base configuration using correct AWS structure
    kb_config = {
        "type": "SQL",
        "sqlKnowledgeBaseConfiguration": {
            "type": "REDSHIFT",
            "redshiftConfiguration": {
                "queryEngineConfiguration": {
                    "type": "PROVISIONED",
                    "provisionedConfiguration": {
                        "authConfiguration": {
                            "type": "IAM"
                        },
                        "clusterIdentifier": "your-redshift-cluster"
                    }
                },
                "queryGenerationConfiguration": {
                    "executionTimeoutSeconds": 200,
                    "generationContext": {
                        "tables": table_configs
                    }
                },
                "storageConfigurations": [{
                    "type": "REDSHIFT",
                    "redshiftConfiguration": {
                        "databaseName": "dev"
                    }
                }]
            }
        }
    }
    
    # Create the Knowledge Base
    response = bedrock_client.create_knowledge_base(
        name="ecommerce-redshift-kb",
        description="Knowledge Base for e-commerce Redshift data queries",
        roleArn="arn:aws:iam::YOUR_ACCOUNT:role/BedrockKnowledgeBaseRole",
        knowledgeBaseConfiguration=kb_config
    )
    
    return response['knowledgeBase']['knowledgeBaseId']

# Create the Knowledge Base
kb_id = create_knowledge_base()
print(f"Knowledge Base created with ID: {kb_id}")

Step 3: Query the Knowledge Base

Once your Knowledge Base is created and active, you can query it using natural language:

import boto3

def query_knowledge_base(question, kb_id, session_id=None):
    bedrock_client = boto3.client('bedrock-agent-runtime')
    
    # Foundation model ARN (adjust based on your region and model)
    foundation_model_arn = "arn:aws:bedrock:us-east-1:001262071246:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0"
    
    # Build the request parameters
    request_params = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": foundation_model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 5
                    }
                }
            }
        }
    }
    
    # Only add sessionId if it's not None
    if session_id is not None:
        request_params["sessionId"] = session_id
    
    response = bedrock_client.retrieve_and_generate(**request_params)
    
    # Extract session ID from response (new or existing)
    response_session_id = response.get('sessionId')
    
    return {
        'response': response,
        'session_id': response_session_id,
        'output': response['output']['text']
    }

# Example queries
questions = [
    "What are the top 5 customers by order volume?",
    "Show me all orders that are currently pending",
    "Which products have the highest stock quantity?",
    "What's the average order value for products in the electronics category?"
]

# Initialize session ID for conversation continuity
session_id = None

for question in questions:
    result = query_knowledge_base(question, kb_id, session_id)
    session_id = result['session_id']  # Use session ID for next iteration
    
    print(f"Question: {question}")
    print(f"Answer: {result['output']}")
    print(f"Generated SQL: {result['response']['citations'][0]['retrievedReferences'][0]['location']['sqlLocation']['query']}")
    print("-" * 50)

Here, I added the questions in a list, but this could be extended for use with Amazon Lex or a custom chatbot

Step 4: Session Management with DynamoDB

For conversational interfaces, you’ll want to maintain context across multiple questions. Here’s how to implement session tracking:

Session Storage Strategy:

Create a DynamoDB table with session_id as the partition key
Store session metadata, including creation time, last accessed time, and TTL for automatic cleanup
Optionally store conversation history (questions and answers) for audit trails or enhanced context

This approach ensures that users can have natural, contextual conversations with your Knowledge Base while maintaining proper session isolation and automatically cleaning up old data.

Enhanced Context with Curated Queries

To further improve the query engine’s ability to generate accurate SQL queries, you can optionally provide curated query examples in your generationContext. This helps the system understand common query patterns and business scenarios specific to your data.

"queryGenerationConfiguration": {
    "executionTimeoutSeconds": 200,
    "generationContext": {
        "tables": table_configs,
        "curatedQueries": [
            {
                "naturalLanguage": "What are the top 5 customers by order volume?",
                "sql": "SELECT customer_id, COUNT(*) as order_count FROM orders GROUP BY customer_id ORDER BY order_count DESC LIMIT 5"
            },
            {
                "naturalLanguage": "Show me all pending orders",
                "sql": "SELECT * FROM orders WHERE order_status = 'pending'"
            }
        ]
    }
}

Benefits of Curated Queries:

Helps the query engine understand your specific business terminology
Provides examples of common query patterns (aggregations, filtering, joins)
Improves accuracy for domain-specific queries
Reduces the need for query refinement iterations

Conclusion

Amazon Bedrock Knowledge Base with Redshift integration represents a significant advancement in making data accessible to non-technical users. By combining the power of natural language processing with the robustness of Amazon Redshift, organizations can democratize data access, reduce the time to insight, and enhance the user experience with conversational interfaces.

The implementation is straightforward, but the impact on organizational productivity and data utilization can be transformative.

References

AWS Bedrock Knowledge Base API Documentation — Official AWS documentation for creating Knowledge Bases with Bedrock Agent

Author

Noor Sabahi | Senior AI & Cloud Engineer | AWS Ambassador

#Redshift #KnowledgeBase #GenerativeAI #AmazonBedrock #LLM #QueryGeneration

AWS

Redshift

Amazon Bedrock

Sql

Cloud Architecture

Data Engineering

AWS Offerings

Cloud Migration

Data Science & Analytics

Generative AI

MLOps

Financial Services

Energy & Utilities

Healthcare & Life Sciences

Education

Careers

Resource Documents

Leadership

About Us

Texas DBITS

Events