GraphProvider Interface
The GraphProvider interface defines how graph RAG algorithms are implemented in GraphRAG.js.
Overview
A provider controls three key aspects:
- Extraction: How entities and relationships are extracted from documents
- Structure: How the graph is organized and stored
- Retrieval: How context is retrieved during queries
All providers implement the same interface, allowing you to swap algorithms without changing your application code.
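For example, switching algorithms is a one-line change to the graph configuration. The sketch below assumes the same createGraph() setup used in the Type Safety section later on this page; the openai import path is a placeholder for whatever model provider you use.

```ts
import { createGraph } from "@graphrag-js/core";
import { similarityGraph } from "@graphrag-js/similarity";
import { lightrag } from "@graphrag-js/lightrag";
import { openai } from "@ai-sdk/openai"; // placeholder model provider

// Start with the simple cosine-similarity baseline...
const prototype = createGraph({
  model: openai("gpt-4o-mini"),
  embedding: openai.embedding("text-embedding-3-small"),
  provider: similarityGraph(),
});

// ...and later swap in LightRAG by changing only the provider line.
const production = createGraph({
  model: openai("gpt-4o-mini"),
  embedding: openai.embedding("text-embedding-3-small"),
  provider: lightrag(),
});

// Application code such as queries stays the same.
await production.query("What are the main themes?", { mode: "hybrid" });
```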
Interface Definition
interface GraphProvider<TQueryParams = any, TResult = any> {
  type: string;
  setupGraph(ctx: ProviderContext): Promise<void>;
  extendGraph(chunks: GDocument[]): Promise<void>;
  retrieveContext(query: string, params: TQueryParams): Promise<TResult>;
}
Methods
setupGraph()
Called once when the graph is initialized. Use this to:
- Set up storage schemas
- Initialize indexes
- Prepare provider state
async setupGraph(ctx: ProviderContext): Promise<void>
Parameters:
- ctx: ProviderContext - Provider context with storage, models, and configuration
extendGraph()
Called when new documents are inserted. Use this to:
- Extract entities and relationships
- Build graph structures
- Generate embeddings
- Store data
async extendGraph(chunks: GDocument[]): Promise<void>
Parameters:
- chunks: GDocument[] - Chunked documents to process
retrieveContext()
Called during queries. Use this to:
- Perform vector search
- Traverse the graph
- Assemble context
- Return results
async retrieveContext(query: string, params: TQueryParams): Promise<TResult>
Parameters:
- query: string - The user's question
- params: TQueryParams - Provider-specific query parameters
Returns:
- Context that will be passed to the LLM for answer generation
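The concrete shape is whatever TResult your provider declares; GraphRAG.js does not mandate one. As an illustration, a provider might return something like the hypothetical type below, which matches the custom provider example later on this page.

```ts
// Hypothetical result type for a custom provider; field names are not part of the core API.
interface MyRetrievalResult {
  context: string;                    // assembled text handed to the LLM as grounding
  metadata?: { resultCount: number }; // optional extras for logging or debugging
}
```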
Provider Context
The ProviderContext provides access to storage, models, and configuration:
interface ProviderContext {
  storage: {
    graph: GraphStore;
    vector: VectorStore;
    kv: KVStore;
  };
  model: LanguageModel;
  embedding: EmbeddingModel;
  cheapModel?: LanguageModel;
  namespace: string;
  domain?: string;
  exampleQueries?: string[];
}
Fields:
- storage - Access to graph, vector, and key-value stores
- model - Main language model for generation and extraction
- embedding - Embedding model for vector representations
- cheapModel - Optional cheaper model for summarization tasks
- namespace - Multi-tenancy namespace
- domain - Natural language domain description
- exampleQueries - Example queries to guide extraction
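As a quick sketch of how these fields work together, here is a hypothetical helper a provider might use internally; the helper name and id scheme are made up, while the ctx calls mirror the storage and embedding usage shown elsewhere on this page.

```ts
import type { ProviderContext } from "@graphrag-js/core";

// Hypothetical helper (not part of the library): embed a description and store it
// under a namespace-scoped id so tenants sharing the same stores stay isolated.
async function indexDescription(ctx: ProviderContext, id: string, text: string) {
  const vector = await ctx.embedding.embed(text);

  await ctx.storage.vector.upsert([
    {
      id: `${ctx.namespace}:${id}`, // illustrative namespacing scheme
      vector,
      metadata: { text },
    },
  ]);

  // Low-stakes LLM work such as summarization would typically prefer the cheaper
  // model when one is configured: const model = ctx.cheapModel ?? ctx.model;
}
```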
Built-in Providers
Similarity Graph
Simple baseline using cosine similarity and BFS expansion.
import { similarityGraph } from "@graphrag-js/similarity";
provider: similarityGraph({
similarityThreshold?: number; // default: 0.7
maxDepth?: number; // default: 2
})
How it works:
- Chunks become graph nodes
- Edges created from cosine similarity above threshold
- BFS expansion from seed nodes during retrieval
Best for: Quick prototyping, simple use cases
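A rough sketch of the edge-construction step described under "How it works". It is illustrative only: cosine similarity is computed inline, and writing the edge is delegated to a callback rather than a specific GraphStore method.

```ts
// Connect chunks whose embedding similarity exceeds the threshold (illustrative only).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function linkSimilarChunks(
  chunks: { id: string; embedding: number[] }[],
  addEdge: (from: string, to: string, weight: number) => Promise<void>,
  threshold = 0.7 // mirrors the provider's default similarityThreshold
) {
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i + 1; j < chunks.length; j++) {
      const score = cosineSimilarity(chunks[i].embedding, chunks[j].embedding);
      if (score >= threshold) {
        await addEdge(chunks[i].id, chunks[j].id, score);
      }
    }
  }
}
```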
LightRAG
Dual-level retrieval with entity and relationship vectors.
import { lightrag } from "@graphrag-js/lightrag";
provider: lightrag({
entityTypes?: string[]; // default: ["person", "organization", "location", "event", "concept"]
maxGleanings?: number; // default: 1
summarizeThreshold?: number; // default: 8
summaryMaxTokens?: number; // default: 500
concurrency?: number; // default: 8
topK?: number; // default: 60
maxEntityTokens?: number; // default: 6000
maxRelationTokens?: number; // default: 8000
maxTotalTokens?: number; // default: 30000
})
Query modes:
- local - Entity-focused retrieval
- global - Relationship-focused retrieval
- hybrid - Combined (default)
- naive - Pure vector search
Best for: General purpose, balanced cost/performance
Microsoft GraphRAG
Community detection with hierarchical reports.
import { microsoftGraph } from "@graphrag-js/microsoft";
provider: microsoftGraph({
entityTypes?: string[]; // default: ["person", "organization", "location", "event"]
relationTypes?: string[]; // optional
entityExtractMaxGleaning?: number; // default: 1
entitySummaryMaxTokens?: number; // default: 500
graphClusterAlgorithm?: "leiden" | "louvain"; // default: "leiden"
maxGraphClusterSize?: number; // default: 10
communityReportMaxTokens?: number; // default: 1500
similarityThreshold?: number; // default: 0.7
nodeEmbeddingAlgorithm?: "node2vec"; // default: "node2vec"
})
Query modes:
- local - Entity neighborhoods + community context
- global - High-level community reports
- naive - Pure vector search
Best for: Deep thematic analysis, understanding communities
Fast GraphRAG
PageRank-based retrieval without communities.
import { fastGraph } from "@graphrag-js/fast";
provider: fastGraph({
entityTypes?: string[]; // default: auto-detect
domain?: string; // natural language domain description
exampleQueries?: string[]; // help LLM optimize extraction
maxGleanings?: number; // default: 1
concurrency?: number; // default: 8
pagerank?: {
damping?: number; // default: 0.85
maxIterations?: number; // default: 100
tolerance?: number; // default: 1e-6
maxEntities?: number; // default: 128
scoreThreshold?: number; // default: 0.05
};
tokenBudgets?: {
entities?: number; // default: 4000
relations?: number; // default: 3000
chunks?: number; // default: 9000
};
mergePolicy?: {
maxNodeDescriptionSize?: number; // default: 512
edgeMergeThreshold?: number; // default: 5
};
})
Query modes:
- pagerank - Personalized PageRank expansion (default)
- naive - Pure vector search
Best for: Fast, cheap, incremental updates
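For example, domain and exampleQueries let the extraction step focus on what you actually plan to ask. The sketch below reuses the createGraph() shape from the Type Safety section; the model and embedding choices are placeholders.

```ts
import { createGraph } from "@graphrag-js/core";
import { fastGraph } from "@graphrag-js/fast";
import { openai } from "@ai-sdk/openai"; // placeholder model provider

const graph = createGraph({
  model: openai("gpt-4o-mini"),
  embedding: openai.embedding("text-embedding-3-small"),
  provider: fastGraph({
    domain: "Customer support tickets for a SaaS billing product",
    exampleQueries: [
      "Which customers reported invoice errors last quarter?",
      "What workarounds were suggested for failed payments?",
    ],
    pagerank: { maxEntities: 64 }, // tighten expansion for a small corpus
  }),
});

// Personalized PageRank expansion is the default query mode.
await graph.query("What billing issues come up most often?", { mode: "pagerank" });
```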
AWS GraphRAG
Fact-centric hierarchical graphs.
import { awsGraph } from "@graphrag-js/aws";
provider: awsGraph({
entityTypes?: string[]; // default: auto-detect
maxGleanings?: number; // default: 1
concurrency?: number; // default: 4
traversal?: {
maxSearchResults?: number; // default: 10
reranker?: "tfidf" | "model"; // default: "tfidf"
};
semantic?: {
beamWidth?: number; // default: 5
maxPaths?: number; // default: 10
diversityWeight?: number; // default: 0.3
};
})
Query modes:
- traversal - Top-down + bottom-up graph traversal
- semantic - Beam search through fact chains
Best for: Multi-hop reasoning, cross-document connections
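A usage sketch contrasting the two modes, again assuming the createGraph() setup from the Type Safety section with placeholder models; the questions are invented for illustration.

```ts
import { createGraph } from "@graphrag-js/core";
import { awsGraph } from "@graphrag-js/aws";
import { openai } from "@ai-sdk/openai"; // placeholder model provider

const graph = createGraph({
  model: openai("gpt-4o-mini"),
  embedding: openai.embedding("text-embedding-3-small"),
  provider: awsGraph(),
});

// Multi-hop question across documents: beam search through fact chains.
await graph.query("How did the 2021 supplier change affect the 2023 recall?", {
  mode: "semantic",
});

// Broad connective question: top-down + bottom-up traversal.
await graph.query("What connects the finance and logistics incidents?", {
  mode: "traversal",
});
```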
Creating Custom Providers
You can create custom providers by implementing the GraphProvider interface:
import { GraphProvider, ProviderContext, GDocument } from "@graphrag-js/core";
// MyConfig is whatever configuration shape your provider accepts.
export function myCustomGraph(config: MyConfig): GraphProvider {
  // Keep a reference to the provider context so extendGraph() and
  // retrieveContext() can reach storage and models after setup.
  let ctx: ProviderContext;

  return {
    type: "my-custom-graph",

    async setupGraph(context: ProviderContext) {
      ctx = context;
      // Initialize storage, create indexes, etc.
      await ctx.storage.vector.createIndex({
        dimension: 1536,
        metric: "cosine",
      });
    },

    async extendGraph(chunks: GDocument[]) {
      // Extract entities, build graph, create embeddings
      for (const chunk of chunks) {
        // Your extraction logic
        const entities = await extractEntities(chunk);

        // Store in graph
        for (const entity of entities) {
          await ctx.storage.graph.upsertNode(entity);
        }

        // Create embeddings
        const embedding = await ctx.embedding.embed(chunk.content);
        await ctx.storage.vector.upsert([{
          id: chunk.id,
          vector: embedding,
          metadata: { chunkId: chunk.id },
        }]);
      }
    },

    async retrieveContext(query: string, params: any) {
      // Vector search
      const queryEmbedding = await ctx.embedding.embed(query);
      const results = await ctx.storage.vector.query(
        queryEmbedding,
        params.topK || 10
      );

      // Graph traversal (expandNodes is your own helper)
      const expandedNodes = await expandNodes(results);

      // Assemble context
      return {
        context: formatContext(expandedNodes),
        metadata: { resultCount: results.length },
      };
    },
  };
}
Example: Domain-Specific Provider
import { GraphProvider } from "@graphrag-js/core";
export function medicalGraphProvider(): GraphProvider {
return {
type: "medical-graph",
async setupGraph(ctx) {
// Medical-specific initialization
},
async extendGraph(chunks) {
// Extract medical entities (diseases, drugs, symptoms, etc.)
// Build medical knowledge graph with ICD-10 codes
// Create specialized embeddings for medical terminology
},
async retrieveContext(query, params) {
// Medical-specific retrieval:
// 1. Normalize medical terminology
// 2. Search with medical synonyms
// 3. Include ICD-10 hierarchies
// 4. Return evidence-based context
},
};
}
Provider Comparison
| Feature | Similarity | LightRAG | Microsoft | Fast | AWS |
|---|---|---|---|---|---|
| Entity extraction | ❌ | ✅ | ✅ | ✅ | ✅ |
| Relationship extraction | ❌ | ✅ | ✅ | ✅ | ✅ |
| Community detection | ❌ | ❌ | ✅ | ❌ | ❌ |
| Dual-level vectors | ❌ | ✅ | ❌ | ❌ | ❌ |
| PageRank | ❌ | ❌ | ❌ | ✅ | ❌ |
| Fact extraction | ❌ | ❌ | ❌ | ❌ | ✅ |
| Cost | Low | Medium | High | Low | Medium-High |
| Best for | Prototyping | General use | Deep analysis | Fast/cheap | Multi-hop |
Type Safety
TypeScript automatically infers query parameter types:
import { createGraph } from "@graphrag-js/core";
import { lightrag } from "@graphrag-js/lightrag";
const graph = createGraph({
model: openai("gpt-4o-mini"),
embedding: openai.embedding("text-embedding-3-small"),
provider: lightrag(),
});
// TypeScript knows the valid modes
await graph.query("question", { mode: "hybrid" }); // ✅
await graph.query("question", { mode: "invalid" }); // ❌ Type errorBest Practices
1. Choose Based on Use Case
- Prototyping: Start with similarityGraph()
- Production: Use lightrag() for balanced performance
- Deep analysis: Use microsoftGraph() when you need communities
- Cost-sensitive: Use fastGraph() to minimize LLM calls
- Complex queries: Use awsGraph() for multi-hop reasoning
2. Configure Appropriately
Match provider settings to your data:
provider: lightrag({
entityTypes: ["person", "company", "product"], // Domain-specific
maxGleanings: 1, // More = better quality, higher cost
topK: 60, // More = better recall, higher cost
})
3. Test Query Modes
Different modes work better for different questions:
// Specific entity questions → local mode
await graph.query("Who is John Doe?", { mode: "local" });
// Broad thematic questions → global mode
await graph.query("What are the main themes?", { mode: "global" });
// Complex questions → hybrid mode
await graph.query("How do the entities interact?", { mode: "hybrid" });See Also
- Algorithms Overview - Detailed algorithm explanations
- createGraph() - Graph configuration
- Storage Interfaces - Storage backends