Neo4j Graph Storage

The @graphrag-js/neo4j package provides production-ready graph storage using Neo4j with Graph Data Science (GDS) support for advanced graph algorithms.

Installation

bash
pnpm add @graphrag-js/neo4j

Features

  • Leiden Community Detection - Built-in GDS integration
  • Cypher Query Language - Powerful graph queries
  • ACID Transactions - Data consistency guarantees
  • Horizontal Scaling - Cluster support (Enterprise)
  • Advanced Algorithms - PageRank, shortest paths, centrality
  • Label-based Isolation - Multi-tenant support

Prerequisites

Neo4j Database

You need Neo4j with the Graph Data Science (GDS) plugin:

Option 1: Docker (Recommended)

bash
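# The Enterprise image refuses to start until the license agreement is accepted
docker run -d \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
  -e NEO4J_PLUGINS='["graph-data-science"]' \
  neo4j:5-enterprise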

Option 2: Neo4j Desktop

  1. Download Neo4j Desktop
  2. Create a new database
  3. Install the GDS plugin from the plugins tab

Option 3: Neo4j AuraDB

  • Sign up at Neo4j Aura
  • GDS is available on Professional and Enterprise tiers

Verify GDS Installation

cypher
CALL gds.version()

This should return the installed GDS version (e.g., 2.6.0).

Quick Start

typescript
import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { neo4jGraph } from '@graphrag-js/neo4j';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph(), // Neo4j works best with community-based algorithms
  storage: {
    graph: neo4jGraph({
      url: 'bolt://localhost:7687',
      username: 'neo4j',
      password: 'password',
      database: 'neo4j', // optional
      maxGraphClusterSize: 10,
      graphClusterSeed: 42,
    }),
  }
});

await graph.insert('Your documents...');
const result = await graph.query('Your question?');

Configuration

neo4jGraph(config)

typescript
interface Neo4jGraphConfig {
  url: string;                    // Neo4j connection URL
  username: string;               // Database username
  password: string;               // Database password
  database?: string;              // Database name (default: 'neo4j')
  workingDir?: string;            // Namespace prefix (default: 'default')
  maxGraphClusterSize?: number;   // Leiden max levels (default: 10)
  graphClusterSeed?: number;      // Clustering random seed (default: 42)
}
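
In a deployment you will usually want these values to come from the environment rather than from source code. The sketch below shows one way to do that. The environment variable names (NEO4J_URL, GRAPH_CLUSTER_SEED, and so on) and the helper functions are assumptions made for illustration and are not part of the package; the interface and defaults are copied from the listing above.

```typescript
// Mirrors the Neo4jGraphConfig interface documented above.
interface Neo4jGraphConfig {
  url: string;
  username: string;
  password: string;
  database?: string;
  workingDir?: string;
  maxGraphClusterSize?: number;
  graphClusterSeed?: number;
}

// A plain key/value view of the environment (process.env fits this shape).
type Env = Record<string, string | undefined>;

// Read a variable that must be present and non-empty.
function required(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable ${key}`);
  }
  return value;
}

// Read an optional integer, falling back to the documented default.
function optionalInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === '') return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed)) {
    throw new Error(`${key} must be an integer, got "${raw}"`);
  }
  return parsed;
}

// Hypothetical variable names; rename them to match your deployment.
function configFromEnv(env: Env): Neo4jGraphConfig {
  return {
    url: required(env, 'NEO4J_URL'),
    username: required(env, 'NEO4J_USERNAME'),
    password: required(env, 'NEO4J_PASSWORD'),
    // The fallbacks repeat the documented defaults so they are explicit.
    database: env['NEO4J_DATABASE'] ?? 'neo4j',
    workingDir: env['GRAPHRAG_WORKING_DIR'] ?? 'default',
    maxGraphClusterSize: optionalInt(env, 'GRAPH_CLUSTER_MAX_LEVELS', 10),
    graphClusterSeed: optionalInt(env, 'GRAPH_CLUSTER_SEED', 42),
  };
}
```

The result can then be passed straight through, for example neo4jGraph(configFromEnv(process.env)). Failing early on a missing password tends to be easier to debug than a driver authentication error later on.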

Connection URL Formats

typescript
// Local instance
url: 'bolt://localhost:7687'

// Aura cloud instance
url: 'neo4j+s://xxxxx.databases.neo4j.io'

// Custom port
url: 'bolt://my-server.com:7687'
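
The scheme at the front of the URL decides how the driver connects: bolt schemes open a direct connection to one server, neo4j schemes use cluster-aware routing, a +s suffix enables TLS with full certificate verification, and +ssc enables TLS while accepting self-signed certificates. The helper below is a minimal sketch that makes those rules checkable before you hand a URL to neo4jGraph. The function name describeNeo4jUrl and the UrlInfo shape are hypothetical and not part of the package.

```typescript
interface UrlInfo {
  scheme: string;
  routing: boolean;             // neo4j* schemes route across a cluster
  encrypted: boolean;           // +s and +ssc enable TLS
  verifiesCertificate: boolean; // false for +ssc (self-signed accepted)
}

// URI schemes understood by the official Neo4j drivers.
const KNOWN_SCHEMES: string[] = [
  'bolt', 'bolt+s', 'bolt+ssc',
  'neo4j', 'neo4j+s', 'neo4j+ssc',
];

function describeNeo4jUrl(url: string): UrlInfo {
  // Extract everything before "://"; an unparsable URL yields an empty scheme.
  const match = /^([a-z0-9+]+):\/\//i.exec(url);
  const scheme = (match?.[1] ?? '').toLowerCase();
  if (KNOWN_SCHEMES.indexOf(scheme) === -1) {
    throw new Error(`Unsupported Neo4j URL scheme in "${url}"`);
  }
  const encrypted = scheme.indexOf('+s') !== -1; // matches +s and +ssc
  const selfSigned = scheme.slice(-4) === '+ssc';
  return {
    scheme,
    routing: scheme.indexOf('neo4j') === 0,
    encrypted,
    verifiesCertificate: encrypted && !selfSigned,
  };
}
```

A check such as describeNeo4jUrl(config.url).encrypted can guard against accidentally pointing a production service at an unencrypted bolt:// endpoint.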

Usage Examples

Basic Graph Operations

typescript
import { neo4jGraph } from '@graphrag-js/neo4j';

const graphStore = neo4jGraph({
  url: 'bolt://localhost:7687',
  username: 'neo4j',
  password: 'password',
})('my-namespace');

// Add nodes
await graphStore.upsertNode('entity-1', {
  entity_type: 'person',
  description: 'John Doe, software engineer',
  source_id: 'doc-1',
});

// Add edges
await graphStore.upsertEdge('entity-1', 'entity-2', {
  relationship: 'works_with',
  weight: 0.8,
  description: 'Collaborates on projects',
});

// Run Leiden clustering
await graphStore.clustering('leiden');

// Get community structure
const communities = await graphStore.communitySchema();

With Microsoft GraphRAG

Neo4j is ideal for Microsoft GraphRAG's community detection:

typescript
import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { neo4jGraph } from '@graphrag-js/neo4j';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph({
    entityTypes: ['person', 'organization', 'location', 'event'],
    graphClusterAlgorithm: 'leiden',
    maxGraphClusterSize: 10,
  }),
  storage: {
    graph: neo4jGraph({
      url: 'bolt://localhost:7687',
      username: 'neo4j',
      password: 'password',
      maxGraphClusterSize: 10,
    }),
  }
});

Multi-Database Setup

typescript
// Use a separate database per tenant
// (running multiple user databases requires Neo4j Enterprise)
const tenant1 = neo4jGraph({
  url: 'bolt://localhost:7687',
  username: 'neo4j',
  password: 'password',
  database: 'tenant1',
})('namespace-1');

const tenant2 = neo4jGraph({
  url: 'bolt://localhost:7687',
  username: 'neo4j',
  password: 'password',
  database: 'tenant2',
})('namespace-2');

Community Detection

Leiden Algorithm

Neo4j GDS provides hierarchical Leiden clustering:

typescript
await graphStore.clustering('leiden');

const communities = await graphStore.communitySchema();

// Structure:
{
  "community-123": {
    level: 0,           // Hierarchy level
    title: "Cluster 123",
    nodes: ["entity-1", "entity-2", ...],
    edges: [["entity-1", "entity-2"], ...],
    chunk_ids: ["doc-1", "doc-2", ...],
    occurrence: 0.85,   // Importance score
    sub_communities: ["community-456", ...],
  }
}
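
Once you have the schema, a common next step is to pick out the communities at one hierarchy level and rank them by size. The sketch below does that over plain data. The Community type is inferred from the structure shown above, and communitiesAtLevel is a hypothetical helper, not a function exported by the package.

```typescript
// Shape inferred from the example structure above.
interface Community {
  level: number;
  title: string;
  nodes: string[];
  edges: [string, string][];
  chunk_ids: string[];
  occurrence: number;
  sub_communities: string[];
}

type CommunitySchema = Record<string, Community>;

interface CommunitySummary {
  id: string;
  size: number;       // number of member nodes
  occurrence: number;
}

// List the communities at one level, largest first (ties broken by id).
function communitiesAtLevel(
  schema: CommunitySchema,
  level: number,
): CommunitySummary[] {
  const rows: CommunitySummary[] = [];
  for (const id of Object.keys(schema)) {
    const community = schema[id];
    if (community !== undefined && community.level === level) {
      rows.push({
        id,
        size: community.nodes.length,
        occurrence: community.occurrence,
      });
    }
  }
  rows.sort((a, b) => b.size - a.size || a.id.localeCompare(b.id));
  return rows;
}
```

Calling communitiesAtLevel(communities, 0) on the result of communitySchema() gives a quick overview of how evenly the graph was partitioned at that level.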

GDS Configuration

The clustering uses these GDS parameters:

cypher
CALL gds.leiden.write(
  'graph_namespace',
  {
    writeProperty: 'communityIds',
    includeIntermediateCommunities: true,
    relationshipWeightProperty: 'weight',
    maxLevels: 10,                    // maxGraphClusterSize
    tolerance: 0.0001,
    gamma: 1.0,
    theta: 0.01,
    randomSeed: 42                    // graphClusterSeed
  }
)

Advanced Features

Knowledge Graph Extraction

typescript
const kg = await graphStore.getKnowledgeGraph(
  'person',      // Node label to start from
  2,             // Max depth
  1,             // Min degree (connectivity)
  true           // Include nodes at exactly minDegree
);

// Returns:
{
  nodes: [
    { id: "entity-1", labels: ["person"], properties: {...} },
    ...
  ],
  edges: [
    { id: "edge-1", type: "RELATED", source: "entity-1", target: "entity-2", properties: {...} },
    ...
  ]
}
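
The node and edge lists are convenient for serialization but awkward for neighbor lookups. As a minimal sketch, the function below turns the returned object into an adjacency map. The type names and toAdjacency are hypothetical, with field names taken from the return value shown above, and edges are treated as undirected, which is an assumption you may want to change.

```typescript
// Field names follow the getKnowledgeGraph() result shown above.
interface KgNode {
  id: string;
  labels: string[];
  properties: Record<string, unknown>;
}

interface KgEdge {
  id: string;
  type: string;
  source: string;
  target: string;
  properties: Record<string, unknown>;
}

interface KnowledgeGraph {
  nodes: KgNode[];
  edges: KgEdge[];
}

// Build node id -> neighbor ids, treating every edge as undirected.
function toAdjacency(kg: KnowledgeGraph): Record<string, string[]> {
  const adjacency: Record<string, string[]> = {};
  for (const node of kg.nodes) {
    adjacency[node.id] = [];
  }
  for (const edge of kg.edges) {
    const fromList = adjacency[edge.source];
    const toList = adjacency[edge.target];
    // Skip edges whose endpoints were not part of the returned node list.
    if (fromList === undefined || toList === undefined) continue;
    if (fromList.indexOf(edge.target) === -1) fromList.push(edge.target);
    if (toList.indexOf(edge.source) === -1) toList.push(edge.source);
  }
  return adjacency;
}
```

With the map in hand, toAdjacency(kg)['entity-1'] lists every entity directly connected to entity-1 in the extracted subgraph.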

Direct Cypher Queries

For advanced use cases, open your own connection with the official neo4j-driver package and run Cypher directly:

typescript
import neo4j from 'neo4j-driver';

const driver = neo4j.driver(
  'bolt://localhost:7687',
  neo4j.auth.basic('neo4j', 'password')
);

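const session = driver.session({ database: 'neo4j' });
try {
  const result = await session.run(`
    MATCH (n:person)-[r:works_with]->(m:person)
    WHERE n.id = $nodeId
    RETURN m.id AS colleague
  `, { nodeId: 'entity-1' });

  // Each record exposes the RETURN aliases by name
  const colleagues = result.records.map((record) => record.get('colleague'));
  console.log(colleagues);
} finally {
  // Release the session and the driver even if the query fails
  await session.close();
  await driver.close();
}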

Performance Optimization

Indexes

Lookups by node ID are indexed automatically. Other access patterns benefit from indexes you create yourself:

cypher
// Index on entity descriptions (full-text search)
CREATE FULLTEXT INDEX entity_descriptions
FOR (n:person|organization|location)
ON EACH [n.description];

// Composite index for frequent queries
CREATE INDEX entity_type_source
FOR (n:Node)
ON (n.entity_type, n.source_id);

Query Optimization

cypher
// Use EXPLAIN to inspect the query plan without running the query
EXPLAIN MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;

// Use PROFILE to run the query and collect execution stats
PROFILE MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;

Batch Operations

For bulk inserts, group the writes into one explicit transaction so they commit or roll back together:

typescript
const session = driver.session();
const tx = session.beginTransaction();

try {
  for (const node of nodes) {
    await tx.run(
      'MERGE (n:Node {id: $id}) SET n += $props',
      { id: node.id, props: node.properties }
    );
  }
  await tx.commit();
} catch (error) {
  await tx.rollback();
  throw error;
} finally {
  await session.close();
}
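
Running one statement per node costs a network round trip each time. A common alternative is to send many rows per statement with UNWIND. The sketch below only builds the statements, so it can be used with the transaction pattern above. The names buildBatches, NodeInput, and Statement are hypothetical, and the :Node label and id property are taken from the example above rather than from the package's actual schema.

```typescript
interface NodeInput {
  id: string;
  properties: Record<string, unknown>;
}

interface UpsertRow {
  id: string;
  props: Record<string, unknown>;
}

interface Statement {
  query: string;
  params: { rows: UpsertRow[] };
}

// One statement upserts every row in the $rows parameter.
const UPSERT_BATCH =
  'UNWIND $rows AS row MERGE (n:Node {id: row.id}) SET n += row.props';

// Split the nodes into fixed-size batches, one statement per batch.
function buildBatches(nodes: NodeInput[], batchSize: number): Statement[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer');
  }
  const statements: Statement[] = [];
  for (let start = 0; start < nodes.length; start += batchSize) {
    const rows = nodes
      .slice(start, start + batchSize)
      .map((node) => ({ id: node.id, props: node.properties }));
    statements.push({ query: UPSERT_BATCH, params: { rows } });
  }
  return statements;
}
```

Inside the try block, the loop then becomes: for (const s of buildBatches(nodes, 1000)) await tx.run(s.query, s.params). The batch size of 1000 is only a starting point; the best value depends on property sizes and available heap.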

Monitoring & Debugging

Neo4j Browser

Neo4j Browser is available at http://localhost:7474.

Useful queries:

cypher
// Count nodes by label
MATCH (n)
RETURN labels(n) AS label, count(*) AS count
ORDER BY count DESC;

// View graph structure
MATCH (n)-[r]->(m)
RETURN n, r, m
LIMIT 100;

// Check community distribution
MATCH (n)
WHERE n.communityIds IS NOT NULL
RETURN n.communityIds[0] AS community, count(*) AS size
ORDER BY size DESC;

GDS Monitoring

cypher
// List projected graphs
CALL gds.graph.list();

// View graph stats
CALL gds.graph.list('graph_namespace')
YIELD nodeCount, relationshipCount;

// Drop stale graphs
CALL gds.graph.drop('graph_namespace');

Production Deployment

Docker Compose

yaml
version: '3.8'
services:
  neo4j:
    image: neo4j:5-enterprise
    ports:
      - "7474:7474"  # HTTP
      - "7687:7687"  # Bolt
    environment:
      NEO4J_AUTH: neo4j/your-secure-password
      NEO4J_ACCEPT_LICENSE_AGREEMENT: "yes"  # required by the Enterprise image
      NEO4J_PLUGINS: '["graph-data-science"]'
      NEO4J_server_memory_heap_initial__size: 2G
      NEO4J_server_memory_heap_max__size: 4G
      NEO4J_server_memory_pagecache_size: 2G
    volumes:
      - neo4j-data:/data
      - neo4j-logs:/logs

volumes:
  neo4j-data:
  neo4j-logs:

Memory Configuration

For production workloads:

bash
# In Docker variable names "." becomes "_" and a literal "_" becomes "__"

# Heap size (for query execution)
NEO4J_server_memory_heap_initial__size=4G
NEO4J_server_memory_heap_max__size=8G

# Page cache (for graph data)
NEO4J_server_memory_pagecache_size=4G

Backup Strategy

bash
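# Back up the database (it must be stopped while the dump runs)
neo4j-admin database dump neo4j --to-path=/backups

# Restore from the directory that holds neo4j.dump, replacing any existing copy
neo4j-admin database load neo4j --from-path=/backups --overwrite-destination=true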

Troubleshooting

GDS Plugin Not Found

Error: There is no procedure with the name gds.leiden.write

Solution:

  1. Verify GDS is installed: CALL gds.version()
  2. Restart Neo4j after installing plugins
  3. Check Neo4j logs for plugin loading errors

Connection Refused

Error: ServiceUnavailable: Connection refused

Solution:

  1. Verify Neo4j is running: docker ps
  2. Check port mapping: 7687 for Bolt
  3. Test with Neo4j Browser at http://localhost:7474

Out of Memory

Error: OutOfMemoryError: Java heap space

Solution:

  1. Increase heap size in neo4j.conf or Docker env
  2. Use pagination for large result sets
  3. Add LIMIT clauses to queries
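
Steps 2 and 3 can be combined by fetching results one page at a time. The sketch below builds a paged query with SKIP and LIMIT. The function pagedQuery and its types are hypothetical. The toInteger() wrappers are there because the JavaScript driver sends plain numbers as floats, which SKIP and LIMIT reject. The orderBy argument is interpolated into the query text, so it must be a trusted constant and never user input.

```typescript
interface PageRequest {
  query: string;
  params: { skip: number; limit: number };
}

// Build the query and parameters for one zero-based page of results.
function pagedQuery(
  baseQuery: string,
  orderBy: string,
  page: number,
  pageSize: number,
): PageRequest {
  if (!Number.isInteger(page) || page < 0) {
    throw new Error('page must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error('pageSize must be a positive integer');
  }
  return {
    // A stable ORDER BY keeps pages from overlapping or skipping rows.
    query:
      `${baseQuery} ORDER BY ${orderBy} ` +
      'SKIP toInteger($skip) LIMIT toInteger($limit)',
    params: { skip: page * pageSize, limit: pageSize },
  };
}
```

A caller would run pagedQuery('MATCH (n:person) RETURN n.id AS id', 'id', page, 100) for page 0, 1, 2, and so on, stopping once a page returns fewer than 100 records.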

Cost Considerations

| Deployment               | Cost           | Best For                      |
|--------------------------|----------------|-------------------------------|
| Self-hosted (Community)  | Free + hosting | Development, small production |
| Self-hosted (Enterprise) | License fee    | Large-scale production        |
| Neo4j Aura Free          | Free (limited) | Development, testing          |
| Neo4j Aura Professional  | ~$65/month+    | Production                    |
| Neo4j Aura Enterprise    | Custom         | Enterprise scale              |

Released under the Elastic License 2.0.