GraphProvider Interface

The GraphProvider interface defines how graph RAG algorithms are implemented in GraphRAG.js.

Overview

A provider controls three key aspects:

  1. Extraction: How entities and relationships are extracted from documents
  2. Structure: How the graph is organized and stored
  3. Retrieval: How context is retrieved during queries

All providers implement the same interface, allowing you to swap algorithms without changing your application code.

Interface Definition

typescript
interface GraphProvider<TQueryParams = any, TResult = any> {
  type: string;

  setupGraph(ctx: ProviderContext): Promise<void>;

  extendGraph(chunks: GDocument[]): Promise<void>;

  retrieveContext(query: string, params: TQueryParams): Promise<TResult>;
}

Methods

setupGraph()

Called once when the graph is initialized. Use this to:

  • Set up storage schemas
  • Initialize indexes
  • Prepare provider state

typescript
async setupGraph(ctx: ProviderContext): Promise<void>

Parameters:

  • ctx: ProviderContext - Provider context with storage, models, and configuration

extendGraph()

Called when new documents are inserted. Use this to:

  • Extract entities and relationships
  • Build graph structures
  • Generate embeddings
  • Store data

typescript
async extendGraph(chunks: GDocument[]): Promise<void>

Parameters:

  • chunks: GDocument[] - Chunked documents to process

retrieveContext()

Called during queries. Use this to:

  • Perform vector search
  • Traverse the graph
  • Assemble context
  • Return results

typescript
async retrieveContext(query: string, params: TQueryParams): Promise<TResult>

Parameters:

  • query: string - The user's question
  • params: TQueryParams - Provider-specific query parameters

Returns:

  • TResult - Provider-specific context that is passed to the LLM for answer generation
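
To see how the three methods fit together, here is a minimal sketch of a toy provider that keeps chunks in memory and ranks them by keyword overlap. The `GDocument` shape (`id`, `content`), the reduced `ProviderContext`, and the `keywordProvider` factory are assumptions made so the snippet stands alone; the real types come from `@graphrag-js/core`, and a real provider would use the stores and models from the context instead.

```typescript
// Local stand-ins so the sketch compiles on its own (assumed shapes).
interface GDocument { id: string; content: string; }
interface ProviderContext { namespace: string; }

interface GraphProvider<TQueryParams = any, TResult = any> {
  type: string;
  setupGraph(ctx: ProviderContext): Promise<void>;
  extendGraph(chunks: GDocument[]): Promise<void>;
  retrieveContext(query: string, params: TQueryParams): Promise<TResult>;
}

interface KeywordParams { topK?: number; }
interface KeywordResult { context: string; chunkIds: string[]; }

// Hypothetical toy provider: no LLM, no embeddings, just word overlap.
function keywordProvider(): GraphProvider<KeywordParams, KeywordResult> {
  let namespace = "";
  const chunksByKey = new Map<string, GDocument>();

  const tokenize = (text: string): Set<string> =>
    new Set(text.toLowerCase().split(/\W+/).filter(Boolean));

  return {
    type: "keyword-demo",

    // Called once: remember the namespace and reset provider state.
    async setupGraph(ctx) {
      namespace = ctx.namespace;
      chunksByKey.clear();
    },

    // Called on insert: store each chunk under a namespaced key.
    async extendGraph(chunks) {
      for (const chunk of chunks) {
        chunksByKey.set(`${namespace}:${chunk.id}`, chunk);
      }
    },

    // Called on query: score chunks by shared words, keep the best topK.
    async retrieveContext(query, params) {
      const queryWords = Array.from(tokenize(query));
      const scored = Array.from(chunksByKey.values())
        .map((chunk) => {
          const words = tokenize(chunk.content);
          const overlap = queryWords.filter((w) => words.has(w)).length;
          return { chunk, overlap };
        })
        .filter((entry) => entry.overlap > 0)
        .sort((a, b) => b.overlap - a.overlap)
        .slice(0, params.topK ?? 3);

      return {
        context: scored.map((entry) => entry.chunk.content).join("\n"),
        chunkIds: scored.map((entry) => entry.chunk.id),
      };
    },
  };
}
```

The call order mirrors the lifecycle described above: `setupGraph()` once, `extendGraph()` for each batch of inserted chunks, then `retrieveContext()` for each query.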

Provider Context

The ProviderContext provides access to storage, models, and configuration:

typescript
interface ProviderContext {
  storage: {
    graph: GraphStore;
    vector: VectorStore;
    kv: KVStore;
  };
  model: LanguageModel;
  embedding: EmbeddingModel;
  cheapModel?: LanguageModel;
  namespace: string;
  domain?: string;
  exampleQueries?: string[];
}

Fields:

  • storage - Access to graph, vector, and key-value stores
  • model - Main language model for generation and extraction
  • embedding - Embedding model for vector representations
  • cheapModel - Optional cheaper model for summarization tasks
  • namespace - Namespace used to keep each tenant's data separate
  • domain - Optional natural-language description of the domain
  • exampleQueries - Optional example queries to guide extraction
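
As a rough illustration of how a custom provider might use the optional fields, the sketch below falls back to `model` when `cheapModel` is absent, prefixes keys with `namespace`, and turns `domain` and `exampleQueries` into prompt hints. The `ContextSlice` and `LanguageModel` stand-ins and all three helper names are hypothetical; whether the built-in providers behave this way is not stated here.

```typescript
// Stand-in types (assumptions); the real ones come from @graphrag-js/core.
interface LanguageModel { name: string; }
interface ContextSlice {
  model: LanguageModel;
  cheapModel?: LanguageModel;
  namespace: string;
  domain?: string;
  exampleQueries?: string[];
}

// Prefer the cheaper model for summarization, else use the main model.
function summarizationModel(ctx: ContextSlice): LanguageModel {
  return ctx.cheapModel ?? ctx.model;
}

// Keep tenants apart by prefixing every storage key with the namespace.
function scopedKey(ctx: ContextSlice, key: string): string {
  return `${ctx.namespace}:${key}`;
}

// Turn the optional hints into a preamble for an extraction prompt.
function extractionHints(ctx: ContextSlice): string {
  const lines: string[] = [];
  if (ctx.domain) lines.push(`Domain: ${ctx.domain}`);
  for (const q of ctx.exampleQueries ?? []) lines.push(`Example query: ${q}`);
  return lines.join("\n");
}
```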

Built-in Providers

Similarity Graph

Simple baseline using cosine similarity and BFS expansion.

typescript
import { similarityGraph } from "@graphrag-js/similarity";

provider: similarityGraph({
  similarityThreshold?: number;  // default: 0.7
  maxDepth?: number;             // default: 2
})

How it works:

  • Chunks become graph nodes
  • Edges are created between chunks whose cosine similarity exceeds similarityThreshold
  • BFS expansion from seed nodes during retrieval
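
The steps listed here can be sketched in a few pure functions. This is an illustrative approximation, not the package's actual code: the function names and the in-memory `Map` adjacency list are assumptions, and a real provider would read embeddings from the vector store.

```typescript
type Vector = number[];

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Connect every pair of chunks whose similarity is above the threshold.
function buildSimilarityEdges(
  embeddings: Map<string, Vector>,
  similarityThreshold = 0.7,
): Map<string, string[]> {
  const ids = Array.from(embeddings.keys());
  const adjacency = new Map<string, string[]>();
  for (const id of ids) adjacency.set(id, []);
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      const sim = cosineSimilarity(embeddings.get(ids[i])!, embeddings.get(ids[j])!);
      if (sim > similarityThreshold) {
        adjacency.get(ids[i])!.push(ids[j]);
        adjacency.get(ids[j])!.push(ids[i]);
      }
    }
  }
  return adjacency;
}

// Breadth-first expansion from the seed nodes, at most maxDepth hops away.
function expandBfs(
  adjacency: Map<string, string[]>,
  seeds: string[],
  maxDepth = 2,
): string[] {
  const visited = new Set<string>(seeds);
  let frontier = Array.from(visited);
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of adjacency.get(id) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return Array.from(visited);
}
```

The pairwise comparison is quadratic in the number of chunks, which is acceptable for a prototype but is one reason this approach is described as a baseline.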

Best for: Quick prototyping, simple use cases

Learn more →


LightRAG

Dual-level retrieval with entity and relationship vectors.

typescript
import { lightrag } from "@graphrag-js/lightrag";

provider: lightrag({
  entityTypes?: string[];        // default: ["person", "organization", "location", "event", "concept"]
  maxGleanings?: number;         // default: 1
  summarizeThreshold?: number;   // default: 8
  summaryMaxTokens?: number;     // default: 500
  concurrency?: number;          // default: 8
  topK?: number;                 // default: 60
  maxEntityTokens?: number;      // default: 6000
  maxRelationTokens?: number;    // default: 8000
  maxTotalTokens?: number;       // default: 30000
})

Query modes:

  • local - Entity-focused retrieval
  • global - Relationship-focused retrieval
  • hybrid - Combined (default)
  • naive - Pure vector search

Best for: General purpose, balanced cost/performance

Learn more →


Microsoft GraphRAG

Community detection with hierarchical reports.

typescript
import { microsoftGraph } from "@graphrag-js/microsoft";

provider: microsoftGraph({
  entityTypes?: string[];                     // default: ["person", "organization", "location", "event"]
  relationTypes?: string[];                   // optional
  entityExtractMaxGleaning?: number;          // default: 1
  entitySummaryMaxTokens?: number;            // default: 500
  graphClusterAlgorithm?: "leiden" | "louvain";  // default: "leiden"
  maxGraphClusterSize?: number;               // default: 10
  communityReportMaxTokens?: number;          // default: 1500
  similarityThreshold?: number;               // default: 0.7
  nodeEmbeddingAlgorithm?: "node2vec";        // default: "node2vec"
})

Query modes:

  • local - Entity neighborhoods + community context
  • global - High-level community reports
  • naive - Pure vector search

Best for: Deep thematic analysis, understanding communities

Learn more →


Fast GraphRAG

PageRank-based retrieval without communities.

typescript
import { fastGraph } from "@graphrag-js/fast";

provider: fastGraph({
  entityTypes?: string[];       // default: auto-detect
  domain?: string;              // natural language domain description
  exampleQueries?: string[];    // help LLM optimize extraction
  maxGleanings?: number;        // default: 1
  concurrency?: number;         // default: 8
  pagerank?: {
    damping?: number;           // default: 0.85
    maxIterations?: number;     // default: 100
    tolerance?: number;         // default: 1e-6
    maxEntities?: number;       // default: 128
    scoreThreshold?: number;    // default: 0.05
  };
  tokenBudgets?: {
    entities?: number;          // default: 4000
    relations?: number;         // default: 3000
    chunks?: number;            // default: 9000
  };
  mergePolicy?: {
    maxNodeDescriptionSize?: number;  // default: 512
    edgeMergeThreshold?: number;      // default: 5
  };
})

Query modes:

  • pagerank - Personalized PageRank expansion (default)
  • naive - Pure vector search
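
For readers unfamiliar with personalized PageRank, the sketch below shows the textbook power-iteration form, using `damping`, `maxIterations`, and `tolerance` with the defaults listed above. It is not the package's implementation. The function names are hypothetical, and applying `scoreThreshold` and `maxEntities` as a final filter is an assumption about how those options are used.

```typescript
interface PageRankOptions {
  damping?: number;
  maxIterations?: number;
  tolerance?: number;
}

// adjacency maps each node to its outgoing neighbors. Assumes every node
// (including every neighbor and every seed) appears as a key, and that the
// seeds are unique.
function personalizedPageRank(
  adjacency: Map<string, string[]>,
  seeds: string[],
  { damping = 0.85, maxIterations = 100, tolerance = 1e-6 }: PageRankOptions = {},
): Map<string, number> {
  const nodes = Array.from(adjacency.keys());

  // Restart distribution: uniform over the seed nodes, zero elsewhere.
  const restart = new Map<string, number>();
  for (const node of nodes) restart.set(node, 0);
  for (const seed of seeds) restart.set(seed, 1 / seeds.length);

  let scores = new Map(restart);
  for (let iter = 0; iter < maxIterations; iter++) {
    const next = new Map<string, number>();
    for (const node of nodes) next.set(node, (1 - damping) * restart.get(node)!);

    for (const node of nodes) {
      const out = adjacency.get(node)!;
      const mass = damping * scores.get(node)!;
      if (out.length === 0) {
        // Dangling node: return its mass to the seeds.
        for (const seed of seeds) next.set(seed, next.get(seed)! + mass / seeds.length);
      } else {
        for (const neighbor of out) {
          next.set(neighbor, next.get(neighbor)! + mass / out.length);
        }
      }
    }

    // Stop once the total change (L1 distance) falls below the tolerance.
    let delta = 0;
    for (const node of nodes) delta += Math.abs(next.get(node)! - scores.get(node)!);
    scores = next;
    if (delta < tolerance) break;
  }
  return scores;
}

// Assumed post-processing: drop low scores, keep the highest-ranked entities.
function selectTopEntities(
  scores: Map<string, number>,
  scoreThreshold = 0.05,
  maxEntities = 128,
): string[] {
  return Array.from(scores.entries())
    .filter(([, score]) => score >= scoreThreshold)
    .sort((a, b) => b[1] - a[1])
    .slice(0, maxEntities)
    .map(([id]) => id);
}
```

Seeding the restart distribution with the entities matched by the query biases the ranking toward their neighborhood, which is what makes the expansion "personalized".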

Best for: Fast, cheap, incremental updates

Learn more →


AWS GraphRAG

Fact-centric hierarchical graphs.

typescript
import { awsGraph } from "@graphrag-js/aws";

provider: awsGraph({
  entityTypes?: string[];          // default: auto-detect
  maxGleanings?: number;           // default: 1
  concurrency?: number;            // default: 4
  traversal?: {
    maxSearchResults?: number;     // default: 10
    reranker?: "tfidf" | "model";  // default: "tfidf"
  };
  semantic?: {
    beamWidth?: number;            // default: 5
    maxPaths?: number;             // default: 10
    diversityWeight?: number;      // default: 0.3
  };
})

Query modes:

  • traversal - Top-down + bottom-up graph traversal
  • semantic - Beam search through fact chains
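
To give a feel for the semantic mode, here is a generic beam search over a scored graph, using `beamWidth` and `maxPaths` as in the options above. It is a simplified sketch and not the package's algorithm. The `FactEdge` and `FactPath` shapes, the `maxHops` parameter, and scoring a path by multiplying its edge scores are all assumptions, and `diversityWeight` is left out entirely.

```typescript
interface FactEdge { to: string; score: number; }
interface FactPath { nodes: string[]; score: number; }

// Explore paths from `start`, keeping only the best `beamWidth` partial
// paths after each hop, and return the `maxPaths` highest-scoring paths.
function beamSearch(
  graph: Map<string, FactEdge[]>,
  start: string,
  maxHops: number,
  beamWidth = 5,
  maxPaths = 10,
): FactPath[] {
  let beam: FactPath[] = [{ nodes: [start], score: 1 }];
  const found: FactPath[] = [];

  for (let hop = 0; hop < maxHops && beam.length > 0; hop++) {
    const candidates: FactPath[] = [];
    for (const path of beam) {
      const last = path.nodes[path.nodes.length - 1];
      for (const edge of graph.get(last) ?? []) {
        if (path.nodes.includes(edge.to)) continue; // avoid cycles
        candidates.push({
          nodes: [...path.nodes, edge.to],
          score: path.score * edge.score,
        });
      }
    }
    // Prune: only the top beamWidth candidates survive to the next hop.
    candidates.sort((a, b) => b.score - a.score);
    beam = candidates.slice(0, beamWidth);
    found.push(...beam);
  }

  return found.sort((a, b) => b.score - a.score).slice(0, maxPaths);
}
```

A narrow beam keeps the search cheap but can miss a chain whose first hop scores poorly; widening `beamWidth` trades cost for recall.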

Best for: Multi-hop reasoning, cross-document connections

Learn more →

Creating Custom Providers

You can create custom providers by implementing the GraphProvider interface:

typescript
import { GraphProvider, ProviderContext, GDocument } from "@graphrag-js/core";

// MyConfig, extractEntities, expandNodes, and formatContext are placeholders
// for your own types and helpers; they are not part of @graphrag-js/core.
export function myCustomGraph(config: MyConfig): GraphProvider {
  // Set in setupGraph() so the other methods can reach storage and models.
  let ctx: ProviderContext;

  return {
    type: "my-custom-graph",

    async setupGraph(providerCtx: ProviderContext) {
      ctx = providerCtx;

      // Initialize storage, create indexes, etc.
      await ctx.storage.vector.createIndex({
        dimension: 1536, // must match the embedding model's output size
        metric: "cosine",
      });
    },

    async extendGraph(chunks: GDocument[]) {
      // Extract entities, build graph, create embeddings
      for (const chunk of chunks) {
        // Your extraction logic
        const entities = await extractEntities(chunk);

        // Store in graph
        for (const entity of entities) {
          await ctx.storage.graph.upsertNode(entity);
        }

        // Create embeddings
        const embedding = await ctx.embedding.embed(chunk.content);
        await ctx.storage.vector.upsert([{
          id: chunk.id,
          vector: embedding,
          metadata: { chunkId: chunk.id },
        }]);
      }
    },

    async retrieveContext(query: string, params: any) {
      // Vector search
      const queryEmbedding = await ctx.embedding.embed(query);
      const results = await ctx.storage.vector.query(
        queryEmbedding,
        params.topK ?? 10
      );

      // Graph traversal
      const expandedNodes = await expandNodes(results);

      // Assemble context
      return {
        context: formatContext(expandedNodes),
        metadata: { resultCount: results.length },
      };
    },
  };
}

Example: Domain-Specific Provider

typescript
import { GraphProvider } from "@graphrag-js/core";

export function medicalGraphProvider(): GraphProvider {
  return {
    type: "medical-graph",

    async setupGraph(ctx) {
      // Medical-specific initialization
    },

    async extendGraph(chunks) {
      // Extract medical entities (diseases, drugs, symptoms, etc.)
      // Build medical knowledge graph with ICD-10 codes
      // Create specialized embeddings for medical terminology
    },

    async retrieveContext(query, params) {
      // Medical-specific retrieval:
      // 1. Normalize medical terminology
      // 2. Search with medical synonyms
      // 3. Include ICD-10 hierarchies
      // 4. Return evidence-based context
    },
  };
}

Provider Comparison

Feature                 | Similarity  | LightRAG    | Microsoft     | Fast       | AWS
Entity extraction       |             |             |               |            |
Relationship extraction |             |             |               |            |
Community detection     |             |             |               |            |
Dual-level vectors      |             |             |               |            |
PageRank                |             |             |               |            |
Fact extraction         |             |             |               |            |
Cost                    | Low         | Medium      | High          | Low        | Medium-High
Best for                | Prototyping | General use | Deep analysis | Fast/cheap | Multi-hop

Type Safety

TypeScript infers the query parameter types from the provider you pass to createGraph, so invalid options are caught at compile time:

typescript
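import { createGraph } from "@graphrag-js/core";
import { lightrag } from "@graphrag-js/lightrag";
// Assumption: `openai` is the model factory from the AI SDK's OpenAI provider.
import { openai } from "@ai-sdk/openai";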

const graph = createGraph({
  model: openai("gpt-4o-mini"),
  embedding: openai.embedding("text-embedding-3-small"),
  provider: lightrag(),
});

// TypeScript knows the valid modes
await graph.query("question", { mode: "hybrid" });  // ✅
await graph.query("question", { mode: "invalid" }); // ❌ Type error
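
The inference can be pictured with a small self-contained sketch: a generic factory takes a provider typed as `GraphProvider<TParams, TResult>` and reuses `TParams` in its `query()` signature. The names `demoProvider`, `createDemoGraph`, `DemoMode`, and `DemoParams` are hypothetical, and the trimmed-down interface is a stand-in; the real `createGraph` may be typed differently.

```typescript
// Trimmed-down stand-in for the GraphProvider interface.
interface GraphProvider<TQueryParams = any, TResult = any> {
  type: string;
  retrieveContext(query: string, params: TQueryParams): Promise<TResult>;
}

type DemoMode = "local" | "global" | "hybrid" | "naive";
interface DemoParams { mode?: DemoMode; }

// Hypothetical provider factory with a concrete query parameter type.
function demoProvider(): GraphProvider<DemoParams, string> {
  return {
    type: "demo",
    async retrieveContext(query, params) {
      return `[${params.mode ?? "hybrid"}] ${query}`;
    },
  };
}

// TParams and TResult are inferred from the provider argument, so query()
// only accepts parameters that this particular provider understands.
function createDemoGraph<TParams, TResult>(options: {
  provider: GraphProvider<TParams, TResult>;
}) {
  return {
    query: (question: string, params: TParams): Promise<TResult> =>
      options.provider.retrieveContext(question, params),
  };
}

const demoGraph = createDemoGraph({ provider: demoProvider() });
const pending = demoGraph.query("question", { mode: "local" }); // compiles
// demoGraph.query("question", { mode: "invalid" });            // would not compile
```

Swapping in a provider with a different `TQueryParams` changes what `query()` accepts without any change to the calling code's imports or setup.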

Best Practices

1. Choose Based on Use Case

  • Prototyping: Start with similarityGraph()
  • Production: Use lightrag() for balanced performance
  • Deep analysis: Use microsoftGraph() when you need communities
  • Cost-sensitive: Use fastGraph() to minimize LLM calls
  • Complex queries: Use awsGraph() for multi-hop reasoning

2. Configure Appropriately

Match provider settings to your data:

typescript
provider: lightrag({
  entityTypes: ["person", "company", "product"],  // Domain-specific
  maxGleanings: 1,                               // More = better quality, higher cost
  topK: 60,                                      // More = better recall, higher cost
})

3. Test Query Modes

Different modes work better for different questions:

typescript
// Specific entity questions → local mode
await graph.query("Who is John Doe?", { mode: "local" });

// Broad thematic questions → global mode
await graph.query("What are the main themes?", { mode: "global" });

// Complex questions → hybrid mode
await graph.query("How do the entities interact?", { mode: "hybrid" });
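
When comparing modes, it can help to run one question through several of them and inspect the answers side by side. The helper below is a small sketch for that. `compareModes` and `QueryFn` are hypothetical names, and the query function is passed in so the snippet does not depend on a particular graph instance.

```typescript
// Any function that answers a question in a given mode.
type QueryFn<TMode extends string> = (
  question: string,
  params: { mode: TMode },
) => Promise<unknown>;

// Run the same question through each mode, one after another, and collect
// the answers. Sequential calls keep load on the model provider low.
async function compareModes<TMode extends string>(
  query: QueryFn<TMode>,
  question: string,
  modes: TMode[],
): Promise<Array<{ mode: TMode; answer: unknown }>> {
  const rows: Array<{ mode: TMode; answer: unknown }> = [];
  for (const mode of modes) {
    rows.push({ mode, answer: await query(question, { mode }) });
  }
  return rows;
}

// Usage with a graph instance (not executed here):
// const rows = await compareModes(
//   (q, p) => graph.query(q, p),
//   "How do the entities interact?",
//   ["local", "global", "hybrid"],
// );
```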

Released under the Elastic License 2.0.