Core Concepts

Understanding the key concepts and architecture of GraphRAG.js.

Architecture Overview

GraphRAG.js is built on four core abstractions:

┌─────────────────────────────────────────────────┐
│              Graph (User API)                   │
│  insert() | query() | entities | relations      │
└────────────────────────┬────────────────────────┘
                         │
           ┌─────────────┼─────────────┐
           │             │             │
           ▼             ▼             ▼
     ┌──────────┐  ┌──────────┐  ┌──────────┐
     │ Provider │  │ Storage  │  │  AI SDK  │
     └──────────┘  └──────────┘  └──────────┘

1. Graph Class

The Graph class is your main interface. It provides methods for:

  • Inserting documents to build the knowledge graph
  • Querying with natural language questions
  • Accessing entities and relationships
  • Exporting the graph in various formats
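
A minimal sketch of that surface (the accessor names follow the architecture diagram above; treat the exact shapes as illustrative rather than definitive):

typescript
const graph = createGraph({
  // provider, storage, and model configuration as shown elsewhere on this page
});

// Build the knowledge graph from raw text
await graph.insert("Ada Lovelace collaborated with Charles Babbage on the Analytical Engine.");

// Ask a natural-language question
const result = await graph.query("Who did Ada Lovelace work with?");
console.log(result.text);

// Inspect the extracted graph (accessor names assumed from the diagram)
console.log(graph.entities, graph.relations);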

2. Provider

The provider implements the graph RAG algorithm. It determines:

  • How entities and relationships are extracted
  • How the graph is structured
  • How context is retrieved during queries
  • Which query modes are available

Providers implement the GraphProvider interface, so you can swap one algorithm for another without changing the rest of your code.
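
For example, switching algorithms is a one-line change (a sketch, assuming the provider is passed via a provider option; the factory names come from the Best Practices section below):

typescript
// Same storage, models, and calling code; only the algorithm changes
const quick = createGraph({ provider: fastGraph(), /* ... */ });
const deep = createGraph({ provider: microsoftGraph(), /* ... */ });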

3. Storage

Storage backends persist your data. GraphRAG.js uses three storage types:

  • GraphStore: Stores entities, relationships, and graph structure
  • VectorStore: Stores embeddings for similarity search
  • KVStore: Stores key-value metadata

All three are pluggable and can use different backends.
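
A configuration sketch (the storage option shape is an assumption for illustration; memoryStorage() appears in the Best Practices section below):

typescript
const graph = createGraph({
  // ...
  storage: {
    graph: memoryStorage(),   // GraphStore: entities, relationships, structure
    vector: memoryStorage(),  // VectorStore: embeddings for similarity search
    kv: memoryStorage(),      // KVStore: key-value metadata
  },
});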

4. AI SDK Integration

GraphRAG.js uses the Vercel AI SDK for:

  • LLM calls (answer generation, entity extraction)
  • Embeddings (vector representations)
  • Streaming (real-time response generation)

This gives you access to any LLM provider supported by the AI SDK.
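
In practice, you pass AI SDK model objects directly (the import below is the standard @ai-sdk/openai provider package; any AI SDK provider can be substituted):

typescript
import { openai } from "@ai-sdk/openai";

const graph = createGraph({
  model: openai("gpt-4o-mini"),
  embedding: openai.embedding("text-embedding-3-small"),
  // ...
});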

The Insert Pipeline

When you insert documents, the following happens:

Input Text
    ↓
Chunking (optional)
    ↓
Provider.extendGraph(chunks)
    ├─> Entity Extraction (via LLM)
    ├─> Relationship Extraction (via LLM)
    ├─> Graph Construction
    ├─> Embedding Generation
    └─> Storage (GraphStore + VectorStore + KVStore)

Chunking

By default, documents are split into chunks of ~1200 tokens with 100-token overlap. You can customize this:

typescript
const graph = createGraph({
  // ...
  chunking: {
    size: 800,
    overlap: 100,
  },
});

Or provide your own chunking function:

typescript
const graph = createGraph({
  // ...
  chunking: {
    fn: (text: string) => {
      // Custom chunking logic, e.g. split on blank lines
      return text
        .split(/\n{2,}/)
        .map((chunk) => chunk.trim())
        .filter((chunk) => chunk.length > 0);
    },
  },
});

Extraction

Different providers extract different structures:

  • Similarity: No extraction, chunks become nodes
  • LightRAG: Entities + relationships with dual-level vectors
  • Microsoft: Entities + relationships + communities + reports
  • Fast: Entities + relationships with PageRank
  • AWS: Chunks → Statements → Facts → Entities (hierarchical)

The Query Pipeline

When you query the graph, the following happens:

Query Text
    ↓
Embed Query
    ↓
Provider.retrieveContext(query, params)
    ├─> Vector Search (initial retrieval)
    ├─> Graph Traversal (expansion)
    ├─> Context Assembly
    └─> Return Context
    ↓
LLM Answer Generation (via AI SDK)
    ↓
Return Response

Query Modes

Different providers support different query modes:

LightRAG:

  • local: Entity-focused retrieval
  • global: Relationship-focused retrieval
  • hybrid: Combined approach (default)
  • naive: Pure vector search

Microsoft GraphRAG:

  • local: Entity neighborhoods + community context
  • global: High-level community reports
  • naive: Pure vector search

Fast GraphRAG:

  • pagerank: Personalized PageRank expansion
  • naive: Pure vector search

AWS GraphRAG:

  • traversal: Top-down + bottom-up graph traversal
  • semantic: Beam search through fact chains

Similarity Graph:

  • Uses maxDepth parameter (0 = vector only, >0 = graph expansion)
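
Selecting a mode is typically a per-query parameter (a sketch; the exact option name and defaults may vary by provider):

typescript
// LightRAG examples: hybrid is the default, so mode is only needed to override it
const specific = await graph.query("Who founded Acme Corp?", { mode: "local" });
const thematic = await graph.query("What are the recurring themes?", { mode: "global" });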

Entities and Relationships

Entities

Entities are the nodes in your knowledge graph. They represent:

  • People
  • Organizations
  • Locations
  • Concepts
  • Events
  • Custom types (algorithm-dependent)

Each entity has:

  • ID: Unique identifier
  • Name: Human-readable name
  • Type: Entity type (e.g., "person", "organization")
  • Description: Summary of what's known about the entity
  • Metadata: Additional properties

Relationships

Relationships are the edges connecting entities. They capture:

  • How entities relate to each other
  • The nature of connections
  • Directionality (source → target)

Each relationship has:

  • Source: Origin entity ID
  • Target: Destination entity ID
  • Type: Relationship type (e.g., "works_at", "located_in")
  • Description: Details about the relationship
  • Strength/Weight: Connection importance (optional)
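
Expressed as rough TypeScript shapes (field names follow the lists above; optionality and exact types are assumptions, and the library's own GNode/GEdge types may differ):

typescript
interface Entity {
  id: string;                         // unique identifier
  name: string;                       // human-readable name
  type: string;                       // e.g., "person", "organization"
  description: string;                // summary of what's known about the entity
  metadata?: Record<string, unknown>; // additional properties
}

interface Relationship {
  source: string;      // origin entity ID
  target: string;      // destination entity ID
  type: string;        // e.g., "works_at", "located_in"
  description: string; // details about the relationship
  weight?: number;     // connection importance (optional)
}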

Storage Abstraction

GraphRAG.js uses three storage interfaces:

GraphStore

Stores graph structure:

typescript
interface GraphStore {
  upsertNode(node: GNode): Promise<void>;
  upsertEdge(edge: GEdge): Promise<void>;
  getNode(id: string): Promise<GNode | null>;
  getNeighbors(id: string, direction?: "in" | "out" | "both"): Promise<GNode[]>;
  query(cypher: string): Promise<any>;
  close(): Promise<void>;
}

VectorStore

Stores embeddings for similarity search:

typescript
interface VectorStore {
  upsert(vectors: VectorRecord[]): Promise<void>;
  query(vector: number[], topK: number, filter?: any): Promise<VectorQueryResult[]>;
  delete(ids: string[]): Promise<void>;
  close(): Promise<void>;
}

KVStore

Stores key-value metadata:

typescript
interface KVStore {
  get(key: string): Promise<any>;
  set(key: string, value: any): Promise<void>;
  delete(key: string): Promise<void>;
  keys(pattern?: string): Promise<string[]>;
  close(): Promise<void>;
}
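
Because these are plain interfaces, a custom backend is just an object that implements one of them. A minimal in-memory KVStore sketch (the pattern semantics here are a guess; real backends may use glob or prefix matching):

typescript
class MapKVStore implements KVStore {
  private data = new Map<string, any>();

  async get(key: string) {
    return this.data.get(key);
  }

  async set(key: string, value: any) {
    this.data.set(key, value);
  }

  async delete(key: string) {
    this.data.delete(key);
  }

  async keys(pattern?: string) {
    const all = [...this.data.keys()];
    if (!pattern) return all;
    // Treat "prefix*" as a prefix match (assumed semantics)
    const prefix = pattern.replace(/\*+$/, "");
    return all.filter((k) => k.startsWith(prefix));
  }

  async close() {
    this.data.clear();
  }
}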

Provider Interface

Custom providers implement GraphProvider:

typescript
interface GraphProvider<TQueryParams = any, TResult = any> {
  type: string;

  setupGraph(ctx: ProviderContext): Promise<void>;

  extendGraph(chunks: GDocument[]): Promise<void>;

  retrieveContext(query: string, params: TQueryParams): Promise<TResult>;
}

Methods:

  • setupGraph: Initialize the provider (called once)
  • extendGraph: Process new chunks and update the graph
  • retrieveContext: Retrieve context for a query

Provider Context:

typescript
interface ProviderContext {
  storage: {
    graph: GraphStore;
    vector: VectorStore;
    kv: KVStore;
  };
  model: LanguageModel;
  embedding: EmbeddingModel;
  cheapModel?: LanguageModel;
  namespace: string;
  domain?: string;
  exampleQueries?: string[];
}
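
A skeletal provider tying these pieces together (a sketch only: the extraction step is left as a placeholder, embed is the AI SDK's embedding helper, and the GDocument/VectorRecord field names are assumptions):

typescript
import { embed } from "ai";

class NaiveVectorProvider implements GraphProvider<{ topK?: number }, VectorQueryResult[]> {
  type = "naive-vector";
  private ctx!: ProviderContext;

  async setupGraph(ctx: ProviderContext) {
    this.ctx = ctx; // keep storage and model handles for later calls
  }

  async extendGraph(chunks: GDocument[]) {
    // A real provider would extract entities/relationships here;
    // this sketch only embeds and stores each chunk.
    for (const chunk of chunks) {
      const { embedding } = await embed({ model: this.ctx.embedding, value: chunk.content }); // field names assumed
      await this.ctx.storage.vector.upsert([{ id: chunk.id, vector: embedding }]); // record shape assumed
    }
  }

  async retrieveContext(query: string, params: { topK?: number }) {
    const { embedding } = await embed({ model: this.ctx.embedding, value: query });
    return this.ctx.storage.vector.query(embedding, params.topK ?? 10);
  }
}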

Multi-Tenancy with Namespaces

Use namespaces to isolate data in shared storage:

typescript
const graph1 = createGraph({
  // ...
  namespace: "project-a",
});

const graph2 = createGraph({
  // ...
  namespace: "project-b",
});

Different namespaces can share the same storage backend without interference.

Context and Provenance

Query results include context and provenance:

typescript
const result = await graph.query("Your question");

// Generated answer
console.log(result.text);

// Source context used
console.log(result.context);

// Token usage
console.log(result.usage);

// Response metadata
console.log(result.metadata);

Best Practices

1. Choose the Right Algorithm

  • Prototyping: Start with similarityGraph()
  • General purpose: Use lightrag() (balanced cost/performance)
  • Deep analysis: Use microsoftGraph() (expensive but thorough)
  • Fast/cheap: Use fastGraph() (good for large datasets)
  • Multi-hop reasoning: Use awsGraph() (complex queries)

2. Storage Selection

  • Development: Use memoryStorage() (fast, no setup)
  • Production: Use external databases (Neo4j, Qdrant, PostgreSQL)
  • Hybrid: Mix and match (e.g., Neo4j + Qdrant + Redis)

3. Model Selection

  • LLM: Use faster models (gpt-4o-mini) for cost efficiency
  • Embeddings: Use smaller models (text-embedding-3-small) when possible
  • Cheap model: Provide a cheaper model for summarization tasks

typescript
const graph = createGraph({
  model: openai("gpt-4o"),               // main model for extraction and answers
  cheapModel: openai("gpt-4o-mini"),     // cheaper model for summarization tasks
  embedding: openai.embedding("text-embedding-3-small"),
  // ...
});

4. Chunking Strategy

  • Small chunks (500-800 tokens): Better precision, more graph nodes
  • Large chunks (1200-1500 tokens): Better context, fewer nodes
  • Overlap: 10-15% of chunk size for continuity

5. Query Optimization

  • Use appropriate query modes for your question type
  • Use contextOnly: true to inspect retrieved context
  • Adjust topK and maxDepth based on your needs
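
For example, inspecting retrieval without generating an answer (parameter names from the list above; exact shapes may vary by provider):

typescript
const inspection = await graph.query("Your question", {
  contextOnly: true, // skip LLM answer generation
  topK: 20,          // widen the initial vector retrieval
  maxDepth: 2,       // allow two hops of graph expansion
});
console.log(inspection.context);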
