General

Scaling NL-to-SQL Agents with GraphRAG on AWS

Introduction

This article addresses a specific problem in natural language to SQL generation: avoiding context overflow and LLM confusion when generating SQL queries against databases with a large number of tables and complex foreign key dependencies.

The described solution uses GraphRAG to improve LLM accuracy by eliminating guesswork and reducing context pollution.

The Problem

Consider: an RDS PostgreSQL instance on AWS. A single database contains multiple schemas (one per department or customer). Each schema has 100+ tables with foreign key dependencies between them. The goal is to build an agent that converts natural language questions into SQL and retrieves the correct data.

The agent has a fixed set of tools:

List tables in a schema
Inspect table metadata (column types, descriptions)
Sample rows from a table to understand the data
Execute a generated SQL query

Now imagine the schema has tables with cryptic names like rm_product_ts or l_finishes. How should the agent decide which tables to query? Should it inspect all 100+ tables? How does it discover the join paths between them? It could try and fail — get an error, adjust, and eventually figure it out. This works, but leads to real problems: significant slowdown from excessive tool calls and retries, context overflow from loading metadata for dozens of irrelevant tables, and missed dependencies that produce incorrect JOINs.

The Solution

GraphRAG offers a straightforward way to solve this. The idea is to use vector similarity search combined with a foreign key relationship graph to navigate table metadata intelligently.

Here is how it works in practice.

In the same database schema, we create a pgvector table to store embeddings of table metadata. Each embedding captures a rich description: the table name, its description, column names, data types, and column comments. When a table is created, updated, or dropped, its embedding is created, updated, or deleted accordingly.

cursor.execute(f"""
    INSERT INTO {table} (table_name, description, embedding)
    VALUES (%s, %s, %s::vector)
    ON CONFLICT (table_name)
    DO UPDATE SET description = EXCLUDED.description, embedding = EXCLUDED.embedding
""", (table_name, description, str(embedding)))
conn.commit()

This upserts each table’s metadata into the pgvector embeddings table. The description text and its vector embedding are inserted together — and if the table already has an entry, both are updated in place, keeping the embeddings always in sync with the actual schema.

When the agent container starts, it builds an in-memory graph of foreign key relationships by reading all FK constraints from information_schema. The graph is a simple structure: nodes are tables, edges are FK constraints.

cursor.execute("""
        SELECT
            tc.table_name       AS source_table,
            ccu.table_name      AS target_table
        FROM information_schema.table_constraints tc
        JOIN information_schema.key_column_usage kcu
            ON tc.constraint_name = kcu.constraint_name
            AND tc.table_schema = kcu.table_schema
        JOIN information_schema.constraint_column_usage ccu
            ON tc.constraint_name = ccu.constraint_name
            AND tc.table_schema = ccu.table_schema
        WHERE tc.constraint_type = 'FOREIGN KEY'
            AND tc.table_schema = %s
    """, (schema,))

Each row gives a source → target pair, which maps directly to a graph edge. The two joins are needed because the FK metadata is split across three information_schema views: table_constraints identifies the source table, constraint_column_usage identifies the target table, and key_column_usage ties them together at the column level. This graph is cached for the lifetime of the process. (Note: this can also be implemented with a dedicated graph database — more on that below.)

The graph needs to be refreshed in the situations: when a table is created /modified/dropped, and when a FK constraint is added/modified/removed. In these cases, a request is sent to the agent’s API to rebuild the graph.

When a user asks a question, the agent performs a similarity search against the vector table and retrieves the most relevant tables:

cursor.execute(f"""
    SELECT table_name, 1 - (embedding <=> %s::vector) AS similarity
    FROM {qualified}
    ORDER BY embedding <=> %s::vector
    LIMIT %s
""", (str(query_embedding), str(query_embedding), top_k))

The <=> operator is pgvector’s cosine distance — ordering by it gives us the closest matches. With top_k=5, only the five most relevant tables are returned as seed tables.

The agent then traverses the FK graph from those seeds to discover dependent tables needed for correct JOINs:

frontier = set(seed_tables)
for _ in range(max_hops):
    next_frontier = set()
    for table in frontier:
        neighbors = graph.get(table, set())
        new_neighbors = neighbors - expanded
        next_frontier.update(new_neighbors)
        expanded.update(new_neighbors)
    frontier = next_frontier
    if not frontier:
        break

This is a simple breadth-first traversal — at each hop, it follows FK edges to find neighboring tables not yet in the set. With max_hops=2, it captures both direct and second-degree relationships. Finally, the agent loads column schemas only for this filtered set of tables into the LLM context. The result is a faster, more accurate agent that works reliably even with large, complex schemas.

Challenges and alternative approaches

The main challenge with the in-memory graph approach is cache invalidation. When the database schema changes — a table is created or dropped, a foreign key is added or removed — the running agents’ in-memory graph becomes stale. The agent must be notified to rebuild it, either through an API call or a container restart. In a multi-container deployment, every instance needs to be notified independently, since each holds its own copy of the graph.

Multi-tenancy adds another dimension. Each schema represents a separate tenant with its own tables and relationships, requiring a separate graph per schema. The in-memory cache must be keyed by schema and invalidated independently — a DDL change in one tenant’s schema should not force a rebuild for all others.

An alternative is to offload graph management to a dedicated graph database. AWS Neptune can store FK relationships as a persistent, shared graph, eliminating per-container cache management. Combined with pgvector for similarity search, this gives you a two-system architecture where pgvector handles embeddings and Neptune handles graph traversal. Multiple agent containers share the same graph without synchronization concerns.

A more unified alternative is Neptune Analytics, which supports both vector similarity search and graph traversal in a single system. This eliminates the need for pgvector entirely: embeddings and relationships live in one place, with no cross-system synchronization required. It is the most expensive option, but also the lowest maintenance: one system to update, no cache invalidation, no split between vector and graph stores.

Conclusion

Large schemas with many tables and complex foreign key relationships present a real challenge for NL-to-SQL agents: context overflow, incorrect table selection, and missed join paths. GraphRAG addresses this by combining vector similarity search over table metadata with foreign key graph traversal, narrowing the schema to only the tables relevant to each query.

This article provided a high-level overview of the problem and one practical solution using AWS services: RDS PostgreSQL with pgvector for embeddings, Bedrock AgentCore for the agent runtime, and Neptune Database or Neptune Analytics as alternative approaches for graph management. The choice between in-memory graphs, Neptune Database, and Neptune Analytics comes down to a tradeoff between cost, maintenance effort, and multi-container consistency.

The core idea is simple: don’t ask the LLM to find a needle in a haystack. Use semantic search and graph traversal to hand it only the needles it needs.

Don’t hesitate to reach out if you would like to discuss GraphRAG or similar topics with us!

AWS

Retrieval Augmented Gen

postgresql

database

Cloud Architecture

Data Engineering

AWS Competency

Cloud Migration

Data Science & Analytics

Generative AI

MLOps

AI Path to Production

Production-Grade Agentic Solutions

Migration and Operation

Enterprise Adoption

Enterprise Intelligence

Financial Services

Energy & Utilities

Healthcare & Life Sciences

Education

Careers

Resource Documents

Leadership

About Us

Texas DBITS

Events

News

Scaling NL-to-SQL Agents with GraphRAG on AWS

Introduction

The Problem

The Solution

Challenges and alternative approaches

Conclusion

Table of Contents

Services

Multipliers

Industries

About