GraphProvider Interface

The GraphProvider interface defines how graph RAG algorithms are implemented in GraphRAG.js.

Overview

A provider controls three key aspects:

Extraction: How entities and relationships are extracted from documents
Structure: How the graph is organized and stored
Retrieval: How context is retrieved during queries

All providers implement the same interface, allowing you to swap algorithms without changing your application code.

Interface Definition

typescript

interface GraphProvider<TQueryParams = any, TResult = any> {
  type: string;

  setupGraph(ctx: ProviderContext): Promise<void>;

  extendGraph(chunks: GDocument[]): Promise<void>;

  retrieveContext(query: string, params: TQueryParams): Promise<TResult>;
}

Methods

`setupGraph()`

Called once when the graph is initialized. Use this to:

Set up storage schemas
Initialize indexes
Prepare provider state

typescript

async setupGraph(ctx: ProviderContext): Promise<void>

Parameters:

ctx: ProviderContext - Provider context with storage, models, and configuration

`extendGraph()`

Called when new documents are inserted. Use this to:

Extract entities and relationships
Build graph structures
Generate embeddings
Store data

typescript

async extendGraph(chunks: GDocument[]): Promise<void>

Parameters:

chunks: GDocument[] - Chunked documents to process

`retrieveContext()`

Called during queries. Use this to:

Perform vector search
Traverse the graph
Assemble context
Return results

typescript

async retrieveContext(query: string, params: TQueryParams): Promise<TResult>

Parameters:

query: string - The user's question
params: TQueryParams - Provider-specific query parameters

Returns:

Context that will be passed to the LLM for answer generation
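
The call order described above (set up once, extend on insert, retrieve on query) can be sketched with a toy in-memory provider. This is a minimal illustration, not one of the shipped providers: `keywordProvider` is a hypothetical name, the interface is redeclared locally so the block stands alone, and `GDocument` and `ProviderContext` are reduced to stand-ins with only the fields used here.

```typescript
// Local stand-ins so the sketch compiles on its own; the real types live in @graphrag-js/core.
interface GDocument { id: string; content: string }
interface ProviderContext { namespace: string }

interface GraphProvider<TQueryParams = any, TResult = any> {
  type: string;
  setupGraph(ctx: ProviderContext): Promise<void>;
  extendGraph(chunks: GDocument[]): Promise<void>;
  retrieveContext(query: string, params: TQueryParams): Promise<TResult>;
}

// Hypothetical toy provider: keeps chunks in memory and ranks them by shared words.
function keywordProvider(): GraphProvider<{ topK: number }, string[]> {
  let namespace = "";
  const stored: GDocument[] = [];
  return {
    type: "keyword-toy",
    async setupGraph(ctx) {
      namespace = ctx.namespace; // prepare provider state
    },
    async extendGraph(chunks) {
      stored.push(...chunks); // the "store data" step
    },
    async retrieveContext(query, params) {
      const words = query.toLowerCase().split(/\W+/).filter(Boolean);
      return stored
        .map((chunk) => ({
          chunk,
          hits: words.filter((w) => chunk.content.toLowerCase().includes(w)).length,
        }))
        .filter((scored) => scored.hits > 0)
        .sort((a, b) => b.hits - a.hits)
        .slice(0, params.topK)
        .map((scored) => `[${namespace}] ${scored.chunk.content}`);
    },
  };
}

// Lifecycle: setupGraph once, extendGraph per insert, retrieveContext per query.
async function demo(): Promise<string[]> {
  const provider = keywordProvider();
  await provider.setupGraph({ namespace: "docs" });
  await provider.extendGraph([
    { id: "1", content: "Alice founded Acme" },
    { id: "2", content: "Bob likes tea" },
  ]);
  return provider.retrieveContext("Who founded Acme?", { topK: 5 });
}
```

Because the application only ever touches the three interface methods, replacing `keywordProvider()` with another provider leaves `demo` unchanged.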

Provider Context

The ProviderContext provides access to storage, models, and configuration:

typescript

interface ProviderContext {
  storage: {
    graph: GraphStore;
    vector: VectorStore;
    kv: KVStore;
  };
  model: LanguageModel;
  embedding: EmbeddingModel;
  cheapModel?: LanguageModel;
  namespace: string;
  domain?: string;
  exampleQueries?: string[];
}

Fields:

storage - Access to graph, vector, and key-value stores
model - Main language model for generation and extraction
embedding - Embedding model for vector representations
cheapModel - Optional cheaper model for summarization tasks
namespace - Multi-tenancy namespace
domain - Natural language domain description
exampleQueries - Example queries to guide extraction
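
As a rough illustration of how a provider might use the optional fields, the sketch below falls back to the main model when no `cheapModel` is configured and prefixes storage keys with the namespace. `ContextSlice`, `summarizerFor`, and `scopedKey` are hypothetical names, `LanguageModel` is a stand-in type, and the key scheme is an assumption rather than the library's actual layout.

```typescript
// Stand-in types; only the fields used in this sketch are modeled.
interface LanguageModel { name: string }
interface ContextSlice {
  model: LanguageModel;
  cheapModel?: LanguageModel;
  namespace: string;
}

// Use the cheaper model for summaries when one is configured, else the main model.
function summarizerFor(ctx: ContextSlice): LanguageModel {
  return ctx.cheapModel ?? ctx.model;
}

// Hypothetical key scheme that keeps tenants apart in a shared store.
function scopedKey(ctx: ContextSlice, id: string): string {
  return `${ctx.namespace}:${id}`;
}
```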

Built-in Providers

Similarity Graph

Simple baseline using cosine similarity and BFS expansion.

typescript

import { similarityGraph } from "@graphrag-js/similarity";

provider: similarityGraph({
  similarityThreshold?: number;  // default: 0.7
  maxDepth?: number;             // default: 2
})

How it works:

Each chunk becomes a graph node
Edges are created between chunks whose cosine similarity exceeds the threshold
Retrieval expands outward from seed nodes with breadth-first search, up to maxDepth hops
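
The three steps above can be sketched in a few functions. This is a generic illustration of thresholded cosine edges plus BFS, not the package's source: `cosine`, `buildEdges`, and `expand` are hypothetical names, and nodes are identified by their index in the vector array.

```typescript
type Vec = number[];

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Undirected adjacency list: connect i and j when similarity is above the threshold.
function buildEdges(vectors: Vec[], threshold: number): Map<number, number[]> {
  const adj = new Map<number, number[]>();
  vectors.forEach((_, i) => adj.set(i, []));
  vectors.forEach((a, i) => {
    vectors.forEach((b, j) => {
      if (j > i && cosine(a, b) > threshold) {
        adj.get(i)!.push(j);
        adj.get(j)!.push(i);
      }
    });
  });
  return adj;
}

// Breadth-first expansion from the seeds, at most maxDepth hops away.
function expand(adj: Map<number, number[]>, seeds: number[], maxDepth: number): number[] {
  const seen = new Set<number>(seeds);
  let frontier = seeds.slice();
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: number[] = [];
    for (const node of frontier) {
      for (const neighbor of adj.get(node) ?? []) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return Array.from(seen);
}
```

In this sketch a higher threshold yields a sparser graph, and a larger `maxDepth` pulls in more distant chunks at the cost of a longer context.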

Best for: Quick prototyping, simple use cases

LightRAG

Dual-level retrieval with entity and relationship vectors.

typescript

import { lightrag } from "@graphrag-js/lightrag";

provider: lightrag({
  entityTypes?: string[];        // default: ["person", "organization", "location", "event", "concept"]
  maxGleanings?: number;         // default: 1
  summarizeThreshold?: number;   // default: 8
  summaryMaxTokens?: number;     // default: 500
  concurrency?: number;          // default: 8
  topK?: number;                 // default: 60
  maxEntityTokens?: number;      // default: 6000
  maxRelationTokens?: number;    // default: 8000
  maxTotalTokens?: number;       // default: 30000
})

Query modes:

local - Entity-focused retrieval
global - Relationship-focused retrieval
hybrid - Combined (default)
naive - Pure vector search

Best for: General purpose, balanced cost/performance
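
One plausible reading of the three token limits (`maxEntityTokens`, `maxRelationTokens`, `maxTotalTokens`) is that entity and relation results are each trimmed to their own budget and the combined context is then trimmed to the total. The sketch below illustrates that reading only; it is an assumption about the behavior, and `takeWithinBudget` and `assembleHybrid` are hypothetical helpers that expect pre-ranked items with known token counts.

```typescript
interface Item { text: string; tokens: number }
interface Budgets { entity: number; relation: number; total: number }

// Keep items in ranked order until the next one would exceed the budget.
function takeWithinBudget(items: Item[], budget: number): Item[] {
  const kept: Item[] = [];
  let used = 0;
  for (const item of items) {
    if (used + item.tokens > budget) break;
    kept.push(item);
    used += item.tokens;
  }
  return kept;
}

// Assumed hybrid assembly: trim each level, then trim the combined list.
function assembleHybrid(entities: Item[], relations: Item[], budgets: Budgets): Item[] {
  const entityPart = takeWithinBudget(entities, budgets.entity);
  const relationPart = takeWithinBudget(relations, budgets.relation);
  return takeWithinBudget([...entityPart, ...relationPart], budgets.total);
}
```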

Microsoft GraphRAG

Community detection with hierarchical reports.

typescript

import { microsoftGraph } from "@graphrag-js/microsoft";

provider: microsoftGraph({
  entityTypes?: string[];                     // default: ["person", "organization", "location", "event"]
  relationTypes?: string[];                   // optional
  entityExtractMaxGleaning?: number;          // default: 1
  entitySummaryMaxTokens?: number;            // default: 500
  graphClusterAlgorithm?: "leiden" | "louvain";  // default: "leiden"
  maxGraphClusterSize?: number;               // default: 10
  communityReportMaxTokens?: number;          // default: 1500
  similarityThreshold?: number;               // default: 0.7
  nodeEmbeddingAlgorithm?: "node2vec";        // default: "node2vec"
})

Query modes:

local - Entity neighborhoods + community context
global - High-level community reports
naive - Pure vector search

Best for: Deep thematic analysis, understanding communities

Fast GraphRAG

PageRank-based retrieval without communities.

typescript

import { fastGraph } from "@graphrag-js/fast";

provider: fastGraph({
  entityTypes?: string[];       // default: auto-detect
  domain?: string;              // natural language domain description
  exampleQueries?: string[];    // help LLM optimize extraction
  maxGleanings?: number;        // default: 1
  concurrency?: number;         // default: 8
  pagerank?: {
    damping?: number;           // default: 0.85
    maxIterations?: number;     // default: 100
    tolerance?: number;         // default: 1e-6
    maxEntities?: number;       // default: 128
    scoreThreshold?: number;    // default: 0.05
  };
  tokenBudgets?: {
    entities?: number;          // default: 4000
    relations?: number;         // default: 3000
    chunks?: number;            // default: 9000
  };
  mergePolicy?: {
    maxNodeDescriptionSize?: number;  // default: 512
    edgeMergeThreshold?: number;      // default: 5
  };
})

Query modes:

pagerank - Personalized PageRank expansion (default)
naive - Pure vector search

Best for: Fast, cheap, incremental updates
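
To make the `pagerank` options concrete, here is a textbook power-iteration sketch of personalized PageRank on an undirected graph. It is not the package's implementation: `personalizedPageRank` and `PageRankOptions` are hypothetical names, seeds are assumed to be unique node indices, and only `damping`, `maxIterations`, and `tolerance` from the options above are modeled.

```typescript
interface PageRankOptions { damping: number; maxIterations: number; tolerance: number }

function personalizedPageRank(
  nodeCount: number,
  edges: Array<[number, number]>, // undirected pairs of node indices
  seeds: number[],                // nodes matched by the query
  opts: PageRankOptions = { damping: 0.85, maxIterations: 100, tolerance: 1e-6 },
): number[] {
  const { damping, maxIterations, tolerance } = opts;
  const neighbors: number[][] = Array.from({ length: nodeCount }, () => []);
  for (const [a, b] of edges) {
    neighbors[a].push(b);
    neighbors[b].push(a);
  }

  // Restart distribution: uniform over the seed nodes.
  const restart = new Array<number>(nodeCount).fill(0);
  for (const seed of seeds) restart[seed] = 1 / seeds.length;

  let rank = restart.slice();
  for (let iter = 0; iter < maxIterations; iter++) {
    // With probability (1 - damping) the walk jumps back to a seed.
    const next = restart.map((r) => (1 - damping) * r);
    rank.forEach((mass, node) => {
      const out = neighbors[node];
      if (out.length === 0) {
        // Dangling node: return its mass to the seeds so the total stays 1.
        restart.forEach((r, i) => { next[i] += damping * mass * r; });
      } else {
        for (const neighbor of out) next[neighbor] += (damping * mass) / out.length;
      }
    });
    // Stop once the L1 change between iterations falls below the tolerance.
    const delta = next.reduce((sum, value, i) => sum + Math.abs(value - rank[i]), 0);
    rank = next;
    if (delta < tolerance) break;
  }
  return rank;
}
```

A retrieval step could then keep only nodes whose score clears `scoreThreshold`, capped at `maxEntities`; how the package applies those two options is not shown here.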

AWS GraphRAG

Fact-centric hierarchical graphs.

typescript

import { awsGraph } from "@graphrag-js/aws";

provider: awsGraph({
  entityTypes?: string[];          // default: auto-detect
  maxGleanings?: number;           // default: 1
  concurrency?: number;            // default: 4
  traversal?: {
    maxSearchResults?: number;     // default: 10
    reranker?: "tfidf" | "model";  // default: "tfidf"
  };
  semantic?: {
    beamWidth?: number;            // default: 5
    maxPaths?: number;             // default: 10
    diversityWeight?: number;      // default: 0.3
  };
})

Query modes:

traversal - Top-down + bottom-up graph traversal
semantic - Beam search through fact chains

Best for: Multi-hop reasoning, cross-document connections
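
The role of `beamWidth` in the semantic mode can be illustrated with a generic beam search over a weighted graph. This is a sketch of the general technique, not the package's algorithm: `beamSearch`, `Path`, and `Edge` are hypothetical names, edge weights stand in for relevance scores, and `diversityWeight` and `maxPaths` are not modeled.

```typescript
interface Edge { to: string; weight: number }
interface Path { nodes: string[]; score: number }

// Extend paths one hop at a time, keeping only the beamWidth best after each hop.
function beamSearch(
  graph: Map<string, Edge[]>,
  start: string,
  steps: number,
  beamWidth: number,
): Path[] {
  let beam: Path[] = [{ nodes: [start], score: 0 }];
  for (let step = 0; step < steps; step++) {
    const candidates: Path[] = [];
    for (const path of beam) {
      const last = path.nodes[path.nodes.length - 1];
      for (const edge of graph.get(last) ?? []) {
        if (path.nodes.includes(edge.to)) continue; // avoid cycles
        candidates.push({
          nodes: [...path.nodes, edge.to],
          score: path.score + edge.weight,
        });
      }
    }
    if (candidates.length === 0) break; // nothing left to extend
    beam = candidates.sort((a, b) => b.score - a.score).slice(0, beamWidth);
  }
  return beam;
}
```

A narrow beam is cheaper but can discard a path that starts weak and ends strong; a wider beam keeps more alternatives alive at each hop.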

Creating Custom Providers

You can create custom providers by implementing the GraphProvider interface:

typescript

import { GraphProvider, ProviderContext, GDocument } from "@graphrag-js/core";

// MyConfig, extractEntities, expandNodes, and formatContext are placeholders
// for your own config type and helper functions.
export function myCustomGraph(config: MyConfig): GraphProvider {
  // Captured in setupGraph so the other methods can reach storage and models.
  let ctx: ProviderContext;

  return {
    type: "my-custom-graph",

    async setupGraph(providerCtx: ProviderContext) {
      ctx = providerCtx;

      // Initialize storage, create indexes, etc.
      await ctx.storage.vector.createIndex({
        dimension: 1536, // must match the embedding model's output size
        metric: "cosine",
      });
    },

    async extendGraph(chunks: GDocument[]) {
      // Extract entities, build graph, create embeddings
      for (const chunk of chunks) {
        // Your extraction logic
        const entities = await extractEntities(chunk);

        // Store in graph
        for (const entity of entities) {
          await ctx.storage.graph.upsertNode(entity);
        }

        // Create embeddings
        const embedding = await ctx.embedding.embed(chunk.content);
        await ctx.storage.vector.upsert([{
          id: chunk.id,
          vector: embedding,
          metadata: { chunkId: chunk.id },
        }]);
      }
    },

    async retrieveContext(query: string, params: any) {
      // Vector search
      const queryEmbedding = await ctx.embedding.embed(query);
      const results = await ctx.storage.vector.query(
        queryEmbedding,
        params.topK ?? 10
      );

      // Graph traversal
      const expandedNodes = await expandNodes(results);

      // Assemble context
      return {
        context: formatContext(expandedNodes),
        metadata: { resultCount: results.length },
      };
    },
  };
}

Example: Domain-Specific Provider

typescript

import { GraphProvider } from "@graphrag-js/core";

export function medicalGraphProvider(): GraphProvider {
  return {
    type: "medical-graph",

    async setupGraph(ctx) {
      // Medical-specific initialization
    },

    async extendGraph(chunks) {
      // Extract medical entities (diseases, drugs, symptoms, etc.)
      // Build medical knowledge graph with ICD-10 codes
      // Create specialized embeddings for medical terminology
    },

    async retrieveContext(query, params) {
      // Medical-specific retrieval:
      // 1. Normalize medical terminology
      // 2. Search with medical synonyms
      // 3. Include ICD-10 hierarchies
      // 4. Return evidence-based context
    },
  };
}

Provider Comparison

Feature	Similarity	LightRAG	Microsoft	Fast	AWS
Entity extraction	❌	✅	✅	✅	✅
Relationship extraction	❌	✅	✅	✅	✅
Community detection	❌	❌	✅	❌	❌
Dual-level vectors	❌	✅	❌	❌	❌
PageRank	❌	❌	❌	✅	❌
Fact extraction	❌	❌	❌	❌	✅
Cost	Low	Medium	High	Low	Medium-High
Best for	Prototyping	General use	Deep analysis	Fast/cheap	Multi-hop

Type Safety

TypeScript infers the valid query parameters from the provider you pass to createGraph:

typescript

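import { createGraph } from "@graphrag-js/core";
import { lightrag } from "@graphrag-js/lightrag";
// Assumption: `openai` comes from the Vercel AI SDK's OpenAI provider.
import { openai } from "@ai-sdk/openai";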

const graph = createGraph({
  model: openai("gpt-4o-mini"),
  embedding: openai.embedding("text-embedding-3-small"),
  provider: lightrag(),
});

// TypeScript knows the valid modes
await graph.query("question", { mode: "hybrid" });  // ✅
await graph.query("question", { mode: "invalid" }); // ❌ Type error
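
The inference above relies on the provider's generic parameters flowing through to `query()`. The following self-contained sketch shows the mechanism with toy stand-ins; `toyLightrag`, `createToyGraph`, and `LightRagParams` are hypothetical names, and the real `createGraph` signature may differ.

```typescript
// Reduced copy of the interface: only what the sketch needs.
interface GraphProvider<TQueryParams = any, TResult = any> {
  type: string;
  retrieveContext(query: string, params: TQueryParams): Promise<TResult>;
}

// The union of literals is what makes invalid modes a compile-time error.
type LightRagParams = { mode?: "local" | "global" | "hybrid" | "naive" };

function toyLightrag(): GraphProvider<LightRagParams, string> {
  return {
    type: "toy-lightrag",
    async retrieveContext(query, params) {
      return `${params.mode ?? "hybrid"}:${query}`;
    },
  };
}

// P and R are inferred from the provider, so query() reuses its parameter type.
function createToyGraph<P, R>(provider: GraphProvider<P, R>) {
  return {
    query: (question: string, params: P): Promise<R> =>
      provider.retrieveContext(question, params),
  };
}

const toyGraph = createToyGraph(toyLightrag());
// toyGraph.query("q", { mode: "invalid" }); // rejected by the compiler
```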

Best Practices

1. Choose Based on Use Case

Prototyping: Start with similarityGraph()
Production: Use lightrag() for balanced performance
Deep analysis: Use microsoftGraph() when you need communities
Cost-sensitive: Use fastGraph() to minimize LLM calls
Complex queries: Use awsGraph() for multi-hop reasoning

2. Configure Appropriately

Match provider settings to your data:

typescript

provider: lightrag({
  entityTypes: ["person", "company", "product"],  // Domain-specific
  maxGleanings: 1,                               // More = better quality, higher cost
  topK: 60,                                      // More = better recall, higher cost
})

3. Test Query Modes

Different modes work better for different questions:

typescript

// Specific entity questions → local mode
await graph.query("Who is John Doe?", { mode: "local" });

// Broad thematic questions → global mode
await graph.query("What are the main themes?", { mode: "global" });

// Complex questions → hybrid mode
await graph.query("How do the entities interact?", { mode: "hybrid" });
