DozerDB Graph Storage

The @graphrag-js/dozerdb package provides production-ready graph storage using DozerDB, an open-source Neo4j-compatible graph database.

Installation

bash

pnpm add @graphrag-js/dozerdb

Features

Neo4j-Compatible - Uses Bolt protocol and Cypher query language
Label Propagation Clustering - Built-in community detection (no GDS required)
Optional GDS Support - Leiden/Louvain algorithms when GDS is installed
ACID Transactions - Data consistency guarantees
Multi-Tenant Support - Label-based namespace isolation
Open Source - Fully open-source database

Prerequisites

DozerDB Database

You need a running DozerDB instance:

Option 1: Docker (Recommended)

bash

docker run -d \
  --name dozerdb \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=dozerdb/password \
  graphstack/dozerdb:latest

Option 2: Docker Compose

yaml

version: '3.8'
services:
  dozerdb:
    image: graphstack/dozerdb:latest
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: dozerdb/password
    volumes:
      - dozerdb-data:/data

volumes:
  dozerdb-data:

Verify Connection

Connect using the Neo4j Browser at http://localhost:7474 or via Cypher:

cypher

RETURN "DozerDB connected!" AS message

Quick Start

typescript

import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { dozerDBGraph } from '@graphrag-js/dozerdb';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph(),
  storage: {
    graph: dozerDBGraph({
      url: 'bolt://localhost:7687',
      username: 'dozerdb',
      password: 'password',
      database: 'neo4j',
      hasGDS: false, // Set true if GDS plugin is installed
    }),
  }
});

await graph.insert('Your documents...');
const result = await graph.query('Your question?');

Configuration

`dozerDBGraph(config)`

typescript

interface DozerDBGraphConfig {
  url: string;                    // DozerDB connection URL (bolt://host:port)
  username: string;               // Database username
  password: string;               // Database password
  database?: string;              // Database name (default: 'neo4j')
  workingDir?: string;            // Namespace prefix (default: 'default')
  maxGraphClusterSize?: number;   // Max clustering levels (default: 10)
  graphClusterSeed?: number;      // Clustering random seed (default: 42)
  hasGDS?: boolean;               // GDS plugin available (default: false)
}

Connection URL Formats

typescript

// Local instance
url: 'bolt://localhost:7687'

// Remote instance
url: 'bolt://my-dozerdb-server.com:7687'

// Custom port
url: 'bolt://my-server.com:17687'

Usage Examples

Basic Graph Operations

typescript

import { dozerDBGraph } from '@graphrag-js/dozerdb';

const graphStore = dozerDBGraph({
  url: 'bolt://localhost:7687',
  username: 'dozerdb',
  password: 'password',
})('my-namespace');

// Add nodes
await graphStore.upsertNode('entity-1', {
  entity_type: 'person',
  description: 'John Doe, software engineer',
  source_id: 'doc-1',
});

// Add edges
await graphStore.upsertEdge('entity-1', 'entity-2', {
  relationship: 'works_with',
  weight: 0.8,
  description: 'Collaborates on projects',
});

// Run label propagation clustering (no GDS required)
await graphStore.clustering('label_propagation');

// Get community structure
const communities = await graphStore.communitySchema();

With Microsoft GraphRAG

DozerDB works well with Microsoft GraphRAG's community detection:

typescript

import { microsoftGraph } from '@graphrag-js/microsoft';
import { dozerDBGraph } from '@graphrag-js/dozerdb';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph({
    entityTypes: ['person', 'organization', 'location', 'event'],
    graphClusterAlgorithm: 'label_propagation', // Works without GDS
    maxGraphClusterSize: 10,
  }),
  storage: {
    graph: dozerDBGraph({
      url: 'bolt://localhost:7687',
      username: 'dozerdb',
      password: 'password',
    }),
  }
});

With GDS Plugin

If you have the GDS plugin installed, you can use Leiden clustering:

typescript

const graphStore = dozerDBGraph({
  url: 'bolt://localhost:7687',
  username: 'dozerdb',
  password: 'password',
  hasGDS: true, // Enable GDS features
})('my-namespace');

// Now you can use Leiden clustering
await graphStore.clustering('leiden');

// And node embeddings
const [embeddings, nodeIds] = await graphStore.embedNodes('node2vec');

Community Detection

Label Propagation (Default)

Label propagation works without any additional plugins:

typescript

await graphStore.clustering('label_propagation');

const communities = await graphStore.communitySchema();

// Structure:
{
  "community-123": {
    level: 0,
    title: "Cluster 123",
    nodes: ["entity-1", "entity-2", ...],
    edges: [["entity-1", "entity-2"], ...],
    chunkIds: ["doc-1", "doc-2", ...],
    occurrence: 0.85,
    subCommunities: [],
  }
}

Leiden Algorithm (Requires GDS)

When GDS is available, Leiden provides hierarchical clustering:

typescript

const graphStore = dozerDBGraph({
  // ... config
  hasGDS: true,
  maxGraphClusterSize: 10,
  graphClusterSeed: 42,
})('my-namespace');

await graphStore.clustering('leiden');

Supported Algorithms

Algorithm	Requires GDS	Hierarchical	Best For
`label_propagation`	No	No	General use, no setup
`leiden`	Yes	Yes	High-quality communities
`louvain`	Yes	Yes	Large graphs

Node Embeddings (GDS Required)

When GDS is installed, you can generate graph-based node embeddings:

typescript

const graphStore = dozerDBGraph({
  // ... config
  hasGDS: true,
})('my-namespace');

// Node2Vec embeddings
const [embeddings, nodeIds] = await graphStore.embedNodes('node2vec');

// FastRP embeddings (faster)
const [embeddings, nodeIds] = await graphStore.embedNodes('fastRP');

Advanced Features

Knowledge Graph Extraction

typescript

const kg = await graphStore.getKnowledgeGraph(
  'person',      // Node label to start from
  2,             // Max depth
  1,             // Min degree (connectivity)
  true           // Include nodes at exactly minDegree
);

// Returns:
{
  nodes: [
    { id: "entity-1", labels: ["person"], properties: {...} },
    ...
  ],
  edges: [
    { id: "edge-1", type: "RELATED", source: "entity-1", target: "entity-2", properties: {...} },
    ...
  ]
}

Direct Cypher Queries

For advanced use cases, access the driver directly:

typescript

import neo4j from 'neo4j-driver';

const driver = neo4j.driver(
  'bolt://localhost:7687',
  neo4j.auth.basic('dozerdb', 'password')
);

const session = driver.session();
const result = await session.run(`
  MATCH (n:person)-[r:works_with]->(m:person)
  WHERE n.id = $nodeId
  RETURN m.id AS colleague
`, { nodeId: 'entity-1' });

await session.close();
await driver.close();

Neo4j vs DozerDB

Feature	Neo4j	DozerDB
Protocol	Bolt	Bolt
Query Language	Cypher	Cypher
GDS Support	Built-in (Enterprise)	Optional plugin
License	GPL / Commercial	Open Source
Clustering	Required for GDS	Label propagation built-in

DozerDB is a great choice when you need:

A fully open-source graph database
Neo4j compatibility without licensing concerns
Basic community detection without GDS

Use Neo4j when you need:

Advanced GDS algorithms (PageRank, centrality, etc.)
Enterprise support
Neo4j Aura cloud hosting

Performance Optimization

Indexes

DozerDB automatically creates indexes on node IDs. For better performance:

cypher

-- Index on entity descriptions
CREATE INDEX entity_description
FOR (n:Node)
ON (n.description)

-- Composite index for frequent queries
CREATE INDEX entity_type_source
FOR (n:Node)
ON (n.entity_type, n.source_id)

Query Optimization

cypher

-- Use EXPLAIN to analyze query plans
EXPLAIN MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m

-- Use PROFILE for execution stats
PROFILE MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m

Production Deployment

Docker Compose

yaml

version: '3.8'
services:
  dozerdb:
    image: graphstack/dozerdb:latest
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: dozerdb/your-secure-password
      NEO4J_dbms_memory_heap_initial__size: 2G
      NEO4J_dbms_memory_heap_max__size: 4G
      NEO4J_dbms_memory_pagecache_size: 2G
    volumes:
      - dozerdb-data:/data
      - dozerdb-logs:/logs

volumes:
  dozerdb-data:
  dozerdb-logs:

Memory Configuration

For production workloads:

bash

# Heap size (for query execution)
NEO4J_dbms_memory_heap_initial_size=4G
NEO4J_dbms_memory_heap_max_size=8G

# Page cache (for graph data)
NEO4J_dbms_memory_pagecache_size=4G

Troubleshooting

Connection Refused

Error: ServiceUnavailable: Connection refused

Solution:

Verify DozerDB is running: docker ps
Check port mapping: 7687 for Bolt
Test with Browser at http://localhost:7474

Clustering Algorithm Not Supported

Error: Clustering algorithm leiden not supported

Solution:

Use label_propagation instead (no GDS required)
Or install GDS plugin and set hasGDS: true

Out of Memory

Error: OutOfMemoryError: Java heap space

Solution:

Increase heap size in Docker environment variables
Use pagination for large result sets
Add LIMIT clauses to queries

Cost Considerations

Deployment	Cost	Best For
Self-hosted (Docker)	Free + hosting	All use cases
Kubernetes	Free + infrastructure	Production scale
Managed cloud	Variable	Simplified operations

DozerDB is fully open-source, so there are no licensing fees regardless of scale.

Next Steps

Neo4j Storage - For GDS integration
FalkorDB Storage - Redis-based alternative
Qdrant Storage - For vector search
Microsoft GraphRAG Algorithm

DozerDB Graph Storage ​

Installation ​

Features ​

Prerequisites ​

DozerDB Database ​

Verify Connection ​

Quick Start ​

Configuration ​

dozerDBGraph(config) ​

Connection URL Formats ​

Usage Examples ​

Basic Graph Operations ​

With Microsoft GraphRAG ​

With GDS Plugin ​

Community Detection ​

Label Propagation (Default) ​

Leiden Algorithm (Requires GDS) ​

Supported Algorithms ​

Node Embeddings (GDS Required) ​

Advanced Features ​

Knowledge Graph Extraction ​

Direct Cypher Queries ​

Neo4j vs DozerDB ​

Performance Optimization ​

Indexes ​

Query Optimization ​

Production Deployment ​

Docker Compose ​

Memory Configuration ​

Troubleshooting ​

Connection Refused ​

Clustering Algorithm Not Supported ​

Out of Memory ​

Cost Considerations ​

Next Steps ​

DozerDB Graph Storage

Installation

Features

Prerequisites

DozerDB Database

Verify Connection

Quick Start

Configuration

`dozerDBGraph(config)`

Connection URL Formats

Usage Examples

Basic Graph Operations

With Microsoft GraphRAG

With GDS Plugin

Community Detection

Label Propagation (Default)

Leiden Algorithm (Requires GDS)

Supported Algorithms

Node Embeddings (GDS Required)

Advanced Features

Knowledge Graph Extraction

Direct Cypher Queries

Neo4j vs DozerDB

Performance Optimization

Indexes

Query Optimization

Production Deployment

Docker Compose

Memory Configuration

Troubleshooting

Connection Refused

Clustering Algorithm Not Supported

Out of Memory

Cost Considerations

Next Steps