DozerDB Graph Storage
The @graphrag-js/dozerdb package provides production-ready graph storage using DozerDB, an open-source Neo4j-compatible graph database.
Installation
pnpm add @graphrag-js/dozerdb
Features
- Neo4j-Compatible - Uses Bolt protocol and Cypher query language
- Label Propagation Clustering - Built-in community detection (no GDS required)
- Optional GDS Support - Leiden/Louvain algorithms when GDS is installed
- ACID Transactions - Data consistency guarantees
- Multi-Tenant Support - Label-based namespace isolation
- Open Source - Fully open-source database
Prerequisites
DozerDB Database
You need a running DozerDB instance:
Option 1: Docker (Recommended)
docker run -d \
  --name dozerdb \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=dozerdb/password \
  graphstack/dozerdb:latest
Option 2: Docker Compose
version: '3.8'
services:
  dozerdb:
    image: graphstack/dozerdb:latest
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: dozerdb/password
    volumes:
      - dozerdb-data:/data
volumes:
  dozerdb-data:
Verify Connection
Connect using the Neo4j Browser at http://localhost:7474 or via Cypher:
RETURN "DozerDB connected!" AS message
Quick Start
import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { dozerDBGraph } from '@graphrag-js/dozerdb';
import { openai } from '@ai-sdk/openai';
const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph(),
  storage: {
    graph: dozerDBGraph({
      url: 'bolt://localhost:7687',
      username: 'dozerdb',
      password: 'password',
      database: 'neo4j',
      hasGDS: false, // Set true if GDS plugin is installed
    }),
  }
});
await graph.insert('Your documents...');
const result = await graph.query('Your question?');
Configuration
dozerDBGraph(config)
interface DozerDBGraphConfig {
  url: string; // DozerDB connection URL (bolt://host:port)
  username: string; // Database username
  password: string; // Database password
  database?: string; // Database name (default: 'neo4j')
  workingDir?: string; // Namespace prefix (default: 'default')
  maxGraphClusterSize?: number; // Max clustering levels (default: 10)
  graphClusterSeed?: number; // Clustering random seed (default: 42)
  hasGDS?: boolean; // GDS plugin available (default: false)
}
Connection URL Formats
// Local instance
url: 'bolt://localhost:7687'
// Remote instance
url: 'bolt://my-dozerdb-server.com:7687'
// Custom port
url: 'bolt://my-server.com:17687'
Usage Examples
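Connection settings such as the URL forms listed under Connection URL Formats are often read from the environment rather than hard-coded. The following is a minimal sketch of that idea, not part of @graphrag-js/dozerdb: the helper `configFromEnv` and the variable names `DOZERDB_URL`, `DOZERDB_USERNAME`, `DOZERDB_PASSWORD`, and `DOZERDB_DATABASE` are hypothetical, and `ConnectionConfig` is a local copy of the connection fields documented under Configuration. The URL check only accepts the `bolt://host:port` form used in this guide.

```typescript
// Local copy of the connection fields documented under Configuration.
interface ConnectionConfig {
  url: string;
  username: string;
  password: string;
  database?: string;
}

// Matches bolt://host or bolt://host:port (the only URL form shown in this guide).
const BOLT_URL = /^bolt:\/\/([^\s:\/]+)(?::(\d{1,5}))?$/;

// Hypothetical helper: reads settings from an env-like object and validates them.
// The DOZERDB_* variable names are assumptions made for this sketch.
function configFromEnv(env: Record<string, string | undefined>): ConnectionConfig {
  const url = env.DOZERDB_URL ?? 'bolt://localhost:7687';
  if (!BOLT_URL.test(url)) {
    throw new Error(`Expected a bolt://host:port URL, got "${url}"`);
  }
  const password = env.DOZERDB_PASSWORD;
  if (!password) {
    throw new Error('DOZERDB_PASSWORD is not set');
  }
  return {
    url,
    username: env.DOZERDB_USERNAME ?? 'dozerdb',
    password,
    database: env.DOZERDB_DATABASE ?? 'neo4j',
  };
}
```

In an application the result could then be passed on as `dozerDBGraph(configFromEnv(process.env))`, which keeps credentials out of source control.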
Basic Graph Operations
import { dozerDBGraph } from '@graphrag-js/dozerdb';
const graphStore = dozerDBGraph({
  url: 'bolt://localhost:7687',
  username: 'dozerdb',
  password: 'password',
})('my-namespace');
// Add nodes
await graphStore.upsertNode('entity-1', {
  entity_type: 'person',
  description: 'John Doe, software engineer',
  source_id: 'doc-1',
});
// Add edges
await graphStore.upsertEdge('entity-1', 'entity-2', {
  relationship: 'works_with',
  weight: 0.8,
  description: 'Collaborates on projects',
});
// Run label propagation clustering (no GDS required)
await graphStore.clustering('label_propagation');
// Get community structure
const communities = await graphStore.communitySchema();
With Microsoft GraphRAG
DozerDB works well with Microsoft GraphRAG's community detection:
import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { dozerDBGraph } from '@graphrag-js/dozerdb';
import { openai } from '@ai-sdk/openai';
const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph({
    entityTypes: ['person', 'organization', 'location', 'event'],
    graphClusterAlgorithm: 'label_propagation', // Works without GDS
    maxGraphClusterSize: 10,
  }),
  storage: {
    graph: dozerDBGraph({
      url: 'bolt://localhost:7687',
      username: 'dozerdb',
      password: 'password',
    }),
  }
});
With GDS Plugin
If you have the GDS plugin installed, you can use Leiden clustering:
const graphStore = dozerDBGraph({
  url: 'bolt://localhost:7687',
  username: 'dozerdb',
  password: 'password',
  hasGDS: true, // Enable GDS features
})('my-namespace');
// Now you can use Leiden clustering
await graphStore.clustering('leiden');
// And node embeddings
const [embeddings, nodeIds] = await graphStore.embedNodes('node2vec');
Community Detection
Label Propagation (Default)
Label propagation works without any additional plugins:
await graphStore.clustering('label_propagation');
const communities = await graphStore.communitySchema();
// Structure:
{
  "community-123": {
    level: 0,
    title: "Cluster 123",
    nodes: ["entity-1", "entity-2", ...],
    edges: [["entity-1", "entity-2"], ...],
    chunkIds: ["doc-1", "doc-2", ...],
    occurrence: 0.85,
    subCommunities: [],
  }
}
Leiden Algorithm (Requires GDS)
When GDS is available, Leiden provides hierarchical clustering:
const graphStore = dozerDBGraph({
  // ... config
  hasGDS: true,
  maxGraphClusterSize: 10,
  graphClusterSeed: 42,
})('my-namespace');
await graphStore.clustering('leiden');
Supported Algorithms
| Algorithm | Requires GDS | Hierarchical | Best For |
|---|---|---|---|
| label_propagation | No | No | General use, no setup |
| leiden | Yes | Yes | High-quality communities |
| louvain | Yes | Yes | Large graphs |
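The structure returned by communitySchema() (shown under Label Propagation above) can be post-processed with plain TypeScript. The sketch below assumes the field names from that example; the `Community` type is a local reconstruction rather than a type exported by the package, `subCommunities` is assumed to hold community IDs, and both helper functions are hypothetical.

```typescript
// Local reconstruction of the community structure shown in this section.
// Assumption: subCommunities holds IDs of child communities.
interface Community {
  level: number;
  title: string;
  nodes: string[];
  edges: [string, string][];
  chunkIds: string[];
  occurrence: number;
  subCommunities: string[];
}
type CommunitySchema = Record<string, Community>;

// Maps every node ID to the IDs of the communities that contain it.
function indexNodesByCommunity(schema: CommunitySchema): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const [communityId, community] of Object.entries(schema)) {
    for (const nodeId of community.nodes) {
      const existing = index.get(nodeId);
      if (existing) existing.push(communityId);
      else index.set(nodeId, [communityId]);
    }
  }
  return index;
}

// Returns up to `limit` community IDs, ordered from most to fewest member nodes.
function largestCommunities(schema: CommunitySchema, limit: number): string[] {
  return Object.entries(schema)
    .sort(([, a], [, b]) => b.nodes.length - a.nodes.length)
    .slice(0, limit)
    .map(([id]) => id);
}
```

A reverse index like this is handy when a query surfaces an entity and the surrounding community is needed for context.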
Node Embeddings (GDS Required)
When GDS is installed, you can generate graph-based node embeddings:
const graphStore = dozerDBGraph({
  // ... config
  hasGDS: true,
})('my-namespace');
// Node2Vec embeddings
const [node2vecEmbeddings, node2vecNodeIds] = await graphStore.embedNodes('node2vec');
// FastRP embeddings (faster)
const [fastRPEmbeddings, fastRPNodeIds] = await graphStore.embedNodes('fastRP');
Advanced Features
Knowledge Graph Extraction
const kg = await graphStore.getKnowledgeGraph(
  'person', // Node label to start from
  2, // Max depth
  1, // Min degree (connectivity)
  true // Include nodes at exactly minDegree
);
// Returns:
{
  nodes: [
    { id: "entity-1", labels: ["person"], properties: {...} },
    ...
  ],
  edges: [
    { id: "edge-1", type: "RELATED", source: "entity-1", target: "entity-2", properties: {...} },
    ...
  ]
}
Direct Cypher Queries
For advanced use cases, access the driver directly:
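import neo4j from 'neo4j-driver';
const driver = neo4j.driver(
  'bolt://localhost:7687',
  neo4j.auth.basic('dozerdb', 'password')
);
const session = driver.session();
try {
  const result = await session.run(`
    MATCH (n:person)-[r:works_with]->(m:person)
    WHERE n.id = $nodeId
    RETURN m.id AS colleague
  `, { nodeId: 'entity-1' });
  // Each record exposes the RETURN aliases by name
  console.log(result.records.map((record) => record.get('colleague')));
} finally {
  // Release the session and driver even if the query throws
  await session.close();
  await driver.close();
}
Neo4j vs DozerDB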
| Feature | Neo4j | DozerDB |
|---|---|---|
| Protocol | Bolt | Bolt |
| Query Language | Cypher | Cypher |
| GDS Support | Built-in (Enterprise) | Optional plugin |
| License | GPL / Commercial | Open Source |
| Clustering | Requires GDS | Label propagation built-in |
DozerDB is a great choice when you need:
- A fully open-source graph database
- Neo4j compatibility without licensing concerns
- Basic community detection without GDS
Use Neo4j when you need:
- Advanced GDS algorithms (PageRank, centrality, etc.)
- Enterprise support
- Neo4j Aura cloud hosting
Performance Optimization
Indexes
DozerDB automatically creates indexes on node IDs. For better performance:
// Index on entity descriptions
CREATE INDEX entity_description
FOR (n:Node)
ON (n.description);
// Composite index for frequent queries
CREATE INDEX entity_type_source
FOR (n:Node)
ON (n.entity_type, n.source_id);
Query Optimization
// Use EXPLAIN to analyze query plans
EXPLAIN MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;
// Use PROFILE for execution stats
PROFILE MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;
Production Deployment
Docker Compose
version: '3.8'
services:
  dozerdb:
    image: graphstack/dozerdb:latest
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: dozerdb/your-secure-password
      NEO4J_dbms_memory_heap_initial__size: 2G
      NEO4J_dbms_memory_heap_max__size: 4G
      NEO4J_dbms_memory_pagecache_size: 2G
    volumes:
      - dozerdb-data:/data
      - dozerdb-logs:/logs
volumes:
  dozerdb-data:
  dozerdb-logs:
Memory Configuration
For production workloads:
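# Heap size (for query execution)
# An underscore inside a setting name is written as a double underscore in Docker environment variables
NEO4J_dbms_memory_heap_initial__size=4G
NEO4J_dbms_memory_heap_max__size=8G
# Page cache (for graph data)
NEO4J_dbms_memory_pagecache_size=4G
Troubleshooting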
Connection Refused
Error: ServiceUnavailable: Connection refused
Solution:
- Verify DozerDB is running: docker ps
- Check port mapping: 7687 for Bolt
- Test with Browser at http://localhost:7474
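A refused connection can also be transient, for example when the application starts before a freshly launched container is accepting Bolt connections. One common mitigation is to retry the first database operation with exponential backoff. The sketch below is a generic illustration under that assumption; `backoffDelays` and `withRetry` are hypothetical helpers, not part of @graphrag-js/dozerdb, and the default delays are arbitrary.

```typescript
// Delay schedule in milliseconds: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelays(retries: number, baseMs = 500, maxMs = 8000): number[] {
  return Array.from({ length: retries }, (_, i) => Math.min(baseMs * 2 ** i, maxMs));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: tries `attempt`, and after each failure waits for the
// next delay before trying again. Total tries = delays.length + 1.
async function withRetry<T>(attempt: () => Promise<T>, delays: number[]): Promise<T> {
  for (const delay of delays) {
    try {
      return await attempt();
    } catch {
      await sleep(delay); // wait, then try again
    }
  }
  return attempt(); // final try; its error propagates to the caller
}
```

For instance, the first insert could be wrapped as `await withRetry(() => graph.insert('Your documents...'), backoffDelays(5))`. Persistent failures still surface as the original error once the schedule is exhausted.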
Clustering Algorithm Not Supported
Error: Clustering algorithm leiden not supported
Solution:
- Use label_propagation instead (no GDS required)
- Or install GDS plugin and set hasGDS: true
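The fallback described above can be applied automatically before calling clustering. This is a minimal sketch: the GDS requirements are copied from the Supported Algorithms table, while the `ClusteringAlgorithm` type and the `resolveAlgorithm` helper are hypothetical names introduced for illustration.

```typescript
type ClusteringAlgorithm = 'label_propagation' | 'leiden' | 'louvain';

// Mirrors the Supported Algorithms table: which algorithms need the GDS plugin.
const REQUIRES_GDS: Record<ClusteringAlgorithm, boolean> = {
  label_propagation: false,
  leiden: true,
  louvain: true,
};

// Hypothetical helper: returns the preferred algorithm when it is usable,
// otherwise falls back to label_propagation, which needs no plugin.
function resolveAlgorithm(preferred: ClusteringAlgorithm, hasGDS: boolean): ClusteringAlgorithm {
  return REQUIRES_GDS[preferred] && !hasGDS ? 'label_propagation' : preferred;
}
```

With this in place, a call such as `await graphStore.clustering(resolveAlgorithm('leiden', hasGDS))` degrades to label propagation instead of throwing when GDS is absent.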
Out of Memory
Error: OutOfMemoryError: Java heap space
Solution:
- Increase heap size in Docker environment variables
- Use pagination for large result sets
- Add LIMIT clauses to queries
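Pagination and LIMIT clauses can be combined by appending parameterized SKIP and LIMIT to a query. The sketch below only builds the query text and parameters; `paginate` and `PagedQuery` are hypothetical names, and the helper assumes the Cypher passed in ends with a RETURN clause (ideally with ORDER BY so that pages are stable).

```typescript
interface PagedQuery {
  text: string;
  params: { skip: number; limit: number };
}

// Hypothetical helper: appends SKIP/LIMIT parameters to a Cypher query.
// `page` is zero-based; a trailing semicolon on the input is removed.
function paginate(cypher: string, page: number, pageSize: number): PagedQuery {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError('page must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be a positive integer');
  }
  return {
    text: `${cypher.trim().replace(/;$/, '')}\nSKIP $skip LIMIT $limit`,
    params: { skip: page * pageSize, limit: pageSize },
  };
}
```

When the result is passed to `session.run(text, params)` with neo4j-driver, note that plain JavaScript numbers are sent as floats, so the values may need wrapping with `neo4j.int(...)` first.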
Cost Considerations
| Deployment | Cost | Best For |
|---|---|---|
| Self-hosted (Docker) | Free + hosting | All use cases |
| Kubernetes | Free + infrastructure | Production scale |
| Managed cloud | Variable | Simplified operations |
DozerDB is fully open-source, so there are no licensing fees regardless of scale.
Next Steps
- Neo4j Storage - For GDS integration
- FalkorDB Storage - Redis-based alternative
- Qdrant Storage - For vector search
- Microsoft GraphRAG Algorithm