DozerDB Graph Storage

The @graphrag-js/dozerdb package provides production-ready graph storage using DozerDB, an open-source Neo4j-compatible graph database.

Installation

bash
pnpm add @graphrag-js/dozerdb

Features

  • Neo4j-Compatible - Uses Bolt protocol and Cypher query language
  • Label Propagation Clustering - Built-in community detection (no GDS required)
  • Optional GDS Support - Leiden/Louvain algorithms when GDS is installed
  • ACID Transactions - Data consistency guarantees
  • Multi-Tenant Support - Label-based namespace isolation
  • Open Source - Fully open-source database

Prerequisites

DozerDB Database

You need a running DozerDB instance:

Option 1: Docker (Recommended)

bash
docker run -d \
  --name dozerdb \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=dozerdb/password \
  graphstack/dozerdb:latest

Option 2: Docker Compose

yaml
version: '3.8'
services:
  dozerdb:
    image: graphstack/dozerdb:latest
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: dozerdb/password
    volumes:
      - dozerdb-data:/data

volumes:
  dozerdb-data:

Verify Connection

Open the Neo4j Browser at http://localhost:7474 and log in with the credentials set in NEO4J_AUTH, then run a test statement:

cypher
RETURN "DozerDB connected!" AS message

Quick Start

typescript
import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { dozerDBGraph } from '@graphrag-js/dozerdb';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph(),
  storage: {
    graph: dozerDBGraph({
      url: 'bolt://localhost:7687',
      username: 'dozerdb',
      password: 'password',
      database: 'neo4j',
      hasGDS: false, // Set true if GDS plugin is installed
    }),
  }
});

await graph.insert('Your documents...');
const result = await graph.query('Your question?');

Configuration

dozerDBGraph(config)

typescript
interface DozerDBGraphConfig {
  url: string;                    // DozerDB connection URL (bolt://host:port)
  username: string;               // Database username
  password: string;               // Database password
  database?: string;              // Database name (default: 'neo4j')
  workingDir?: string;            // Namespace prefix (default: 'default')
  maxGraphClusterSize?: number;   // Max clustering levels (default: 10)
  graphClusterSeed?: number;      // Clustering random seed (default: 42)
  hasGDS?: boolean;               // GDS plugin available (default: false)
}
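
In deployed environments, credentials usually come from the environment rather than from source code. The sketch below builds a config object from environment variables. The `DOZERDB_*` variable names and the `configFromEnv` helper are hypothetical, and the interface is a local copy of the options listed above, not an import from the package.

```typescript
// Local mirror of the options documented above (assumption: field names match the package).
interface DozerDBGraphConfig {
  url: string;
  username: string;
  password: string;
  database?: string;
  workingDir?: string;
  maxGraphClusterSize?: number;
  graphClusterSeed?: number;
  hasGDS?: boolean;
}

type Env = Record<string, string | undefined>;

// Hypothetical variable names (DOZERDB_*); rename them to match your deployment.
function configFromEnv(env: Env): DozerDBGraphConfig {
  const password = env['DOZERDB_PASSWORD'];
  if (!password) {
    // Fail fast instead of falling back to a default password.
    throw new Error('DOZERDB_PASSWORD is required');
  }
  return {
    url: env['DOZERDB_URL'] ?? 'bolt://localhost:7687',
    username: env['DOZERDB_USERNAME'] ?? 'dozerdb',
    password,
    database: env['DOZERDB_DATABASE'] ?? 'neo4j',
    // Only the literal string 'true' enables GDS features.
    hasGDS: env['DOZERDB_HAS_GDS'] === 'true',
  };
}
```

In a Node.js application this would typically be called as `dozerDBGraph(configFromEnv(process.env))`.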

Connection URL Formats

typescript
// Local instance
url: 'bolt://localhost:7687'

// Remote instance
url: 'bolt://my-dozerdb-server.com:7687'

// Custom port
url: 'bolt://my-server.com:17687'
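
A malformed URL otherwise only surfaces when the first query runs, so it can help to validate it at startup. The sketch below checks the `bolt://host:port` form shown above and falls back to port 7687 when none is given. `parseBoltUrl` is a hypothetical helper, not part of the package, and it deliberately ignores IPv6 hosts and other URL schemes.

```typescript
interface BoltEndpoint {
  host: string;
  port: number;
}

// Accepts bolt://host or bolt://host:port (optionally with a trailing slash).
function parseBoltUrl(url: string): BoltEndpoint {
  const match = /^bolt:\/\/([^\s/:]+)(?::(\d{1,5}))?\/?$/.exec(url);
  if (!match) {
    throw new Error(`Not a bolt:// URL: ${url}`);
  }
  const [, host = '', portText] = match;
  // 7687 is the Bolt port used throughout the examples above.
  const port = portText ? Number(portText) : 7687;
  if (port < 1 || port > 65535) {
    throw new Error(`Port out of range: ${port}`);
  }
  return { host, port };
}
```

For example, `parseBoltUrl('bolt://my-server.com:17687')` yields the host `my-server.com` and the port `17687`, while an `http://` URL is rejected.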

Usage Examples

Basic Graph Operations

typescript
import { dozerDBGraph } from '@graphrag-js/dozerdb';

const graphStore = dozerDBGraph({
  url: 'bolt://localhost:7687',
  username: 'dozerdb',
  password: 'password',
})('my-namespace');

// Add nodes
await graphStore.upsertNode('entity-1', {
  entity_type: 'person',
  description: 'John Doe, software engineer',
  source_id: 'doc-1',
});

// Add edges
await graphStore.upsertEdge('entity-1', 'entity-2', {
  relationship: 'works_with',
  weight: 0.8,
  description: 'Collaborates on projects',
});

// Run label propagation clustering (no GDS required)
await graphStore.clustering('label_propagation');

// Get community structure
const communities = await graphStore.communitySchema();

With Microsoft GraphRAG

The Microsoft GraphRAG provider can run its community detection on DozerDB. With label propagation, no GDS plugin is needed:

typescript
import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { dozerDBGraph } from '@graphrag-js/dozerdb';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph({
    entityTypes: ['person', 'organization', 'location', 'event'],
    graphClusterAlgorithm: 'label_propagation', // Works without GDS
    maxGraphClusterSize: 10,
  }),
  storage: {
    graph: dozerDBGraph({
      url: 'bolt://localhost:7687',
      username: 'dozerdb',
      password: 'password',
    }),
  }
});

With GDS Plugin

If you have the GDS plugin installed, you can use Leiden clustering:

typescript
const graphStore = dozerDBGraph({
  url: 'bolt://localhost:7687',
  username: 'dozerdb',
  password: 'password',
  hasGDS: true, // Enable GDS features
})('my-namespace');

// Now you can use Leiden clustering
await graphStore.clustering('leiden');

// And node embeddings
const [embeddings, nodeIds] = await graphStore.embedNodes('node2vec');

Community Detection

Label Propagation (Default)

Label propagation works without any additional plugins:

typescript
await graphStore.clustering('label_propagation');

const communities = await graphStore.communitySchema();

// Structure:
{
  "community-123": {
    level: 0,
    title: "Cluster 123",
    nodes: ["entity-1", "entity-2", ...],
    edges: [["entity-1", "entity-2"], ...],
    chunkIds: ["doc-1", "doc-2", ...],
    occurrence: 0.85,
    subCommunities: [],
  }
}
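
Once the schema is loaded, a common first step is to see which communities are largest. The sketch below ranks communities by node count. The `Community` type is copied from the structure shown above (the element type of `subCommunities` is an assumption), and `largestCommunities` is a hypothetical helper.

```typescript
// Shape taken from the example structure above; subCommunities is assumed to hold community ids.
interface Community {
  level: number;
  title: string;
  nodes: string[];
  edges: [string, string][];
  chunkIds: string[];
  occurrence: number;
  subCommunities: string[];
}

type CommunitySchema = Record<string, Community>;

interface CommunitySummary {
  id: string;
  title: string;
  nodeCount: number;
  edgeCount: number;
}

// Return up to `limit` communities, largest node count first.
function largestCommunities(schema: CommunitySchema, limit = 5): CommunitySummary[] {
  return Object.entries(schema)
    .map(([id, community]) => ({
      id,
      title: community.title,
      nodeCount: community.nodes.length,
      edgeCount: community.edges.length,
    }))
    .sort((a, b) => b.nodeCount - a.nodeCount)
    .slice(0, limit);
}
```

Passing the result of `graphStore.communitySchema()` to this helper gives a quick overview before generating community reports.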

Leiden Algorithm (Requires GDS)

When GDS is available, Leiden provides hierarchical clustering:

typescript
const graphStore = dozerDBGraph({
  // ... config
  hasGDS: true,
  maxGraphClusterSize: 10,
  graphClusterSeed: 42,
})('my-namespace');

await graphStore.clustering('leiden');

Supported Algorithms

| Algorithm | Requires GDS | Hierarchical | Best For |
| --- | --- | --- | --- |
| label_propagation | No | No | General use, no setup |
| leiden | Yes | Yes | High-quality communities |
| louvain | Yes | Yes | Large graphs |
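
The GDS requirement in the table can be encoded so that an application falls back to label propagation when the plugin is missing. This is a minimal sketch. `resolveAlgorithm` and the `REQUIRES_GDS` map are hypothetical names that only restate the table, not part of the package API.

```typescript
type ClusteringAlgorithm = 'label_propagation' | 'leiden' | 'louvain';

// Mirrors the "Requires GDS" column of the table above.
const REQUIRES_GDS: Record<ClusteringAlgorithm, boolean> = {
  label_propagation: false,
  leiden: true,
  louvain: true,
};

// Use the preferred algorithm when possible, otherwise fall back to label propagation.
function resolveAlgorithm(
  preferred: ClusteringAlgorithm,
  hasGDS: boolean,
): ClusteringAlgorithm {
  return REQUIRES_GDS[preferred] && !hasGDS ? 'label_propagation' : preferred;
}
```

The result can then be passed to `graphStore.clustering(...)`, which avoids the "not supported" error on instances without GDS.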

Node Embeddings (GDS Required)

When GDS is installed, you can generate graph-based node embeddings:

typescript
const graphStore = dozerDBGraph({
  // ... config
  hasGDS: true,
})('my-namespace');

// Node2Vec embeddings
const [embeddings, nodeIds] = await graphStore.embedNodes('node2vec');

// FastRP embeddings (faster); separate names avoid redeclaring the constants above
const [fastRPEmbeddings, fastRPNodeIds] = await graphStore.embedNodes('fastRP');
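
One use for these embeddings is finding structurally similar nodes. The sketch below ranks nodes by cosine similarity to a target node. It assumes `embeddings` is an array of numeric vectors whose rows line up with `nodeIds`, which the destructuring above suggests but does not guarantee. `cosine` and `nearestNodes` are hypothetical helpers.

```typescript
// Cosine similarity of two vectors; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Return the k nodes most similar to targetId, best match first.
function nearestNodes(
  embeddings: number[][],
  nodeIds: string[],
  targetId: string,
  k = 3,
): { id: string; score: number }[] {
  const targetIndex = nodeIds.indexOf(targetId);
  if (targetIndex === -1) {
    throw new Error(`Unknown node: ${targetId}`);
  }
  const target = embeddings[targetIndex] ?? [];
  return nodeIds
    .map((id, i) => ({ id, score: cosine(target, embeddings[i] ?? []) }))
    .filter((entry) => entry.id !== targetId)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

A linear scan like this is fine for small graphs. For larger ones a vector index is the usual choice.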

Advanced Features

Knowledge Graph Extraction

typescript
const kg = await graphStore.getKnowledgeGraph(
  'person',      // Node label to start from
  2,             // Max depth
  1,             // Min degree (connectivity)
  true           // Include nodes at exactly minDegree
);

// Returns:
{
  nodes: [
    { id: "entity-1", labels: ["person"], properties: {...} },
    ...
  ],
  edges: [
    { id: "edge-1", type: "RELATED", source: "entity-1", target: "entity-2", properties: {...} },
    ...
  ]
}
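
The returned node and edge lists are convenient for serialization, but traversal is easier on an adjacency map. The sketch below converts one into the other, treating edges as undirected. The types are local copies of the shape shown above, and `toAdjacency` is a hypothetical helper.

```typescript
// Local types mirroring the return shape shown above.
interface KGNode {
  id: string;
  labels: string[];
  properties: Record<string, unknown>;
}

interface KGEdge {
  id: string;
  type: string;
  source: string;
  target: string;
  properties: Record<string, unknown>;
}

interface KnowledgeGraph {
  nodes: KGNode[];
  edges: KGEdge[];
}

// Build an undirected adjacency map: node id -> set of neighbour ids.
function toAdjacency(kg: KnowledgeGraph): Map<string, Set<string>> {
  const adjacency = new Map<string, Set<string>>();
  // Register every node first so isolated nodes still appear in the map.
  for (const node of kg.nodes) {
    adjacency.set(node.id, new Set<string>());
  }
  const link = (from: string, to: string): void => {
    const neighbours = adjacency.get(from) ?? new Set<string>();
    neighbours.add(to);
    adjacency.set(from, neighbours);
  };
  for (const edge of kg.edges) {
    link(edge.source, edge.target);
    link(edge.target, edge.source);
  }
  return adjacency;
}
```

The degree of a node is then the size of its neighbour set, for example `toAdjacency(kg).get('entity-1')?.size`.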

Direct Cypher Queries

For advanced use cases, connect with the neo4j-driver package and run Cypher yourself:

typescript
import neo4j from 'neo4j-driver';

const driver = neo4j.driver(
  'bolt://localhost:7687',
  neo4j.auth.basic('dozerdb', 'password')
);

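const session = driver.session();
try {
  const result = await session.run(`
    MATCH (n:person)-[r:works_with]->(m:person)
    WHERE n.id = $nodeId
    RETURN m.id AS colleague
  `, { nodeId: 'entity-1' });

  // Each record exposes the RETURN aliases by name
  const colleagues = result.records.map((record) => record.get('colleague'));
} finally {
  // Release the connection even if the query throws
  await session.close();
  await driver.close();
}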

Neo4j vs DozerDB

| Feature | Neo4j | DozerDB |
| --- | --- | --- |
| Protocol | Bolt | Bolt |
| Query Language | Cypher | Cypher |
| GDS Support | Built-in (Enterprise) | Optional plugin |
| License | GPL / Commercial | Open Source |
| Clustering | Requires GDS | Label propagation built-in |

DozerDB is a great choice when you need:

  • A fully open-source graph database
  • Neo4j compatibility without licensing concerns
  • Basic community detection without GDS

Use Neo4j when you need:

  • Advanced GDS algorithms (PageRank, centrality, etc.)
  • Enterprise support
  • Neo4j Aura cloud hosting

Performance Optimization

Indexes

DozerDB automatically creates indexes on node IDs. For better performance, add indexes on the properties you filter by most often:

cypher
// Index on entity descriptions
CREATE INDEX entity_description
FOR (n:Node)
ON (n.description);

// Composite index for frequent queries
CREATE INDEX entity_type_source
FOR (n:Node)
ON (n.entity_type, n.source_id);

Query Optimization

cypher
// Use EXPLAIN to inspect the query plan without running the query
EXPLAIN MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;

// Use PROFILE to run the query and collect execution statistics
PROFILE MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;

Production Deployment

Docker Compose

yaml
version: '3.8'
services:
  dozerdb:
    image: graphstack/dozerdb:latest
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: dozerdb/your-secure-password
      NEO4J_dbms_memory_heap_initial__size: 2G
      NEO4J_dbms_memory_heap_max__size: 4G
      NEO4J_dbms_memory_pagecache_size: 2G
    volumes:
      - dozerdb-data:/data
      - dozerdb-logs:/logs

volumes:
  dozerdb-data:
  dozerdb-logs:

Memory Configuration

For production workloads:

bash
# Heap size (for query execution)
NEO4J_dbms_memory_heap_initial__size=4G
NEO4J_dbms_memory_heap_max__size=8G

# Page cache (for graph data)
NEO4J_dbms_memory_pagecache_size=4G
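
The Neo4j Docker image derives these variable names from setting names. It adds the `NEO4J_` prefix, doubles each underscore, and turns each dot into a single underscore. The sketch below applies that rule, assuming the DozerDB image follows the same convention. `toDockerEnvName` is a hypothetical helper.

```typescript
// Convert a setting name such as 'dbms.memory.heap.initial_size'
// into the environment variable name read by the Docker image.
function toDockerEnvName(setting: string): string {
  // Double the underscores first so the dots converted next are not doubled too.
  return 'NEO4J_' + setting.replace(/_/g, '__').replace(/\./g, '_');
}
```

For instance, `dbms.memory.heap.initial_size` maps to `NEO4J_dbms_memory_heap_initial__size`. The doubled underscore is easy to miss when typing variable names by hand.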

Troubleshooting

Connection Refused

Error: ServiceUnavailable: Connection refused

Solution:

  1. Verify DozerDB is running: docker ps
  2. Check port mapping: 7687 for Bolt
  3. Test with Browser at http://localhost:7474

Clustering Algorithm Not Supported

Error: Clustering algorithm leiden not supported

Solution:

  1. Use label_propagation instead (no GDS required)
  2. Or install GDS plugin and set hasGDS: true

Out of Memory

Error: OutOfMemoryError: Java heap space

Solution:

  1. Increase heap size in Docker environment variables
  2. Use pagination for large result sets
  3. Add LIMIT clauses to queries
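
Steps 2 and 3 can be combined by paging through results with SKIP and LIMIT. The sketch below builds a parameterized query and its parameters. `pagedNodeQuery` is a hypothetical helper, and the `n.id` property follows the direct Cypher example in this document. The `toInteger(...)` calls are there because plain JavaScript numbers reach the server as floats, which SKIP and LIMIT reject.

```typescript
interface PagedQuery {
  text: string;
  params: { skip: number; limit: number };
}

// Build a query that returns one page of node ids in a stable order.
function pagedNodeQuery(page: number, pageSize: number): PagedQuery {
  if (!Number.isInteger(page) || page < 0) {
    throw new Error('page must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error('pageSize must be a positive integer');
  }
  return {
    // ORDER BY keeps pages consistent between calls.
    text:
      'MATCH (n) RETURN n.id AS nodeId ORDER BY nodeId ' +
      'SKIP toInteger($skip) LIMIT toInteger($limit)',
    params: { skip: page * pageSize, limit: pageSize },
  };
}
```

With the neo4j-driver session from the direct Cypher example, a page is fetched with `session.run(query.text, query.params)`, increasing `page` until fewer than `pageSize` records come back.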

Cost Considerations

| Deployment | Cost | Best For |
| --- | --- | --- |
| Self-hosted (Docker) | Free + hosting | All use cases |
| Kubernetes | Free + infrastructure | Production scale |
| Managed cloud | Variable | Simplified operations |

DozerDB is fully open-source, so there are no licensing fees regardless of scale.

Released under the Elastic License 2.0.