Neo4j Graph Storage

The @graphrag-js/neo4j package provides production-ready graph storage using Neo4j with Graph Data Science (GDS) support for advanced graph algorithms.

Installation

bash

pnpm add @graphrag-js/neo4j

Features

✅ Leiden Community Detection - Built-in GDS integration
✅ Cypher Query Language - Powerful graph queries
✅ ACID Transactions - Data consistency guarantees
✅ Horizontal Scaling - Cluster support (Enterprise)
✅ Advanced Algorithms - PageRank, shortest paths, centrality
✅ Label-based Isolation - Multi-tenant support

Prerequisites

Neo4j Database

You need Neo4j with the Graph Data Science (GDS) plugin:

Option 1: Docker (Recommended)

bash

docker run -d \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
  -e NEO4J_PLUGINS='["graph-data-science"]' \
  neo4j:5-enterprise

Option 2: Neo4j Desktop

Download Neo4j Desktop
Create a new database
Install the GDS plugin from the plugins tab

Option 3: Neo4j AuraDB

Sign up at Neo4j Aura
GDS is available on Professional and Enterprise tiers

Verify GDS Installation

cypher

RETURN gds.version()

It should return the installed GDS version (e.g., 2.6.0).

Quick Start

typescript

import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { neo4jGraph } from '@graphrag-js/neo4j';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph(), // Neo4j works best with community-based algorithms
  storage: {
    graph: neo4jGraph({
      url: 'bolt://localhost:7687',
      username: 'neo4j',
      password: 'password',
      database: 'neo4j', // optional
      maxGraphClusterSize: 10,
      graphClusterSeed: 42,
    }),
  }
});

await graph.insert('Your documents...');
const result = await graph.query('Your question?');
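Hard-coded credentials are fine for a local demo, but deployed code usually reads them from the environment. A minimal sketch, assuming hypothetical variable names (NEO4J_URL, NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_DATABASE) that @graphrag-js/neo4j does not define itself; `neo4jOptionsFromEnv` is an illustrative helper that returns the connection fields used in the `neo4jGraph()` call above.

```typescript
// Connection fields passed to neo4jGraph() in the Quick Start.
interface Neo4jConnectionOptions {
  url: string;
  username: string;
  password: string;
  database?: string;
}

// Hypothetical variable names: NEO4J_URL, NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_DATABASE.
function neo4jOptionsFromEnv(
  env: Record<string, string | undefined>,
): Neo4jConnectionOptions {
  const url = env.NEO4J_URL;
  const username = env.NEO4J_USERNAME;
  const password = env.NEO4J_PASSWORD;

  if (!url || !username || !password) {
    // Report every missing variable at once instead of failing on the first.
    const missing = Object.entries({
      NEO4J_URL: url,
      NEO4J_USERNAME: username,
      NEO4J_PASSWORD: password,
    })
      .filter(([, value]) => !value)
      .map(([name]) => name);
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  const options: Neo4jConnectionOptions = { url, username, password };
  // Only set database when provided, so the package default ('neo4j') still applies.
  if (env.NEO4J_DATABASE) options.database = env.NEO4J_DATABASE;
  return options;
}
```

The result can be spread into the factory, for example `neo4jGraph({ ...neo4jOptionsFromEnv(process.env), maxGraphClusterSize: 10 })`.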

Configuration

`neo4jGraph(config)`

typescript

interface Neo4jGraphConfig {
  url: string;                    // Neo4j connection URL
  username: string;               // Database username
  password: string;               // Database password
  database?: string;              // Database name (default: 'neo4j')
  workingDir?: string;            // Namespace prefix (default: 'default')
  maxGraphClusterSize?: number;   // Leiden max levels (default: 10)
  graphClusterSeed?: number;      // Clustering random seed (default: 42)
}
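To see the effective settings for a given config, the documented defaults can be applied explicitly. A minimal sketch: `resolveNeo4jGraphConfig` is a hypothetical helper, not an export of @graphrag-js/neo4j, and the positive-integer check on `maxGraphClusterSize` (which feeds GDS's `maxLevels`) is our own assumption rather than documented package behavior.

```typescript
interface Neo4jGraphConfig {
  url: string;
  username: string;
  password: string;
  database?: string;
  workingDir?: string;
  maxGraphClusterSize?: number;
  graphClusterSeed?: number;
}

// Defaults as documented in the interface above.
const NEO4J_GRAPH_DEFAULTS = {
  database: 'neo4j',
  workingDir: 'default',
  maxGraphClusterSize: 10,
  graphClusterSeed: 42,
};

function resolveNeo4jGraphConfig(config: Neo4jGraphConfig): Required<Neo4jGraphConfig> {
  const resolved: Required<Neo4jGraphConfig> = {
    url: config.url,
    username: config.username,
    password: config.password,
    // `??` rather than `||`, so explicit values such as a seed of 0 are kept.
    database: config.database ?? NEO4J_GRAPH_DEFAULTS.database,
    workingDir: config.workingDir ?? NEO4J_GRAPH_DEFAULTS.workingDir,
    maxGraphClusterSize: config.maxGraphClusterSize ?? NEO4J_GRAPH_DEFAULTS.maxGraphClusterSize,
    graphClusterSeed: config.graphClusterSeed ?? NEO4J_GRAPH_DEFAULTS.graphClusterSeed,
  };
  // Assumed validation: the Leiden level limit should be a positive integer.
  if (!Number.isInteger(resolved.maxGraphClusterSize) || resolved.maxGraphClusterSize < 1) {
    throw new Error(`maxGraphClusterSize must be a positive integer, got ${resolved.maxGraphClusterSize}`);
  }
  return resolved;
}
```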

Connection URL Formats

typescript

// Local instance
url: 'bolt://localhost:7687'

// Aura cloud instance
url: 'neo4j+s://xxxxx.databases.neo4j.io'

// Remote server (7687 is the default Bolt port)
url: 'bolt://my-server.com:7687'
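The official Neo4j drivers accept `bolt://` (direct connection to a single server) and `neo4j://` (routing, used for clusters and Aura), each with a `+s` variant (TLS with certificate verification against trusted CAs) and a `+ssc` variant (TLS that accepts self-signed certificates). A sketch of a hypothetical `describeNeo4jUrl` helper that classifies a URL before it is passed to `neo4jGraph()`; it relies only on the standard `URL` class.

```typescript
interface Neo4jUrlInfo {
  routing: boolean;             // neo4j:// schemes use cluster routing; bolt:// connects to one server
  encrypted: boolean;           // +s and +ssc enable TLS
  verifiesCertificate: boolean; // +s checks trusted CAs; +ssc accepts self-signed certificates
  host: string;
  port: number;
}

const NEO4J_SCHEMES = new Set(['bolt', 'bolt+s', 'bolt+ssc', 'neo4j', 'neo4j+s', 'neo4j+ssc']);
const DEFAULT_BOLT_PORT = 7687;

function describeNeo4jUrl(url: string): Neo4jUrlInfo {
  const parsed = new URL(url);
  const scheme = parsed.protocol.replace(/:$/, ''); // 'neo4j+s:' -> 'neo4j+s'
  if (!NEO4J_SCHEMES.has(scheme)) {
    throw new Error(`Unsupported Neo4j URL scheme: ${scheme}`);
  }
  return {
    routing: scheme.startsWith('neo4j'),
    encrypted: scheme.endsWith('+s') || scheme.endsWith('+ssc'),
    verifiesCertificate: scheme.endsWith('+s'),
    host: parsed.hostname,
    // An omitted port means the default Bolt port.
    port: parsed.port ? Number(parsed.port) : DEFAULT_BOLT_PORT,
  };
}
```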

Usage Examples

Basic Graph Operations

typescript

import { neo4jGraph } from '@graphrag-js/neo4j';

const graphStore = neo4jGraph({
  url: 'bolt://localhost:7687',
  username: 'neo4j',
  password: 'password',
})('my-namespace');

// Add nodes
await graphStore.upsertNode('entity-1', {
  entity_type: 'person',
  description: 'John Doe, software engineer',
  source_id: 'doc-1',
});

// Add edges
await graphStore.upsertEdge('entity-1', 'entity-2', {
  relationship: 'works_with',
  weight: 0.8,
  description: 'Collaborates on projects',
});

// Run Leiden clustering
await graphStore.clustering('leiden');

// Get community structure
const communities = await graphStore.communitySchema();

With Microsoft GraphRAG

Neo4j pairs well with Microsoft GraphRAG, whose community summaries are built on hierarchical Leiden clustering:

typescript

import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { neo4jGraph } from '@graphrag-js/neo4j';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: microsoftGraph({
    entityTypes: ['person', 'organization', 'location', 'event'],
    graphClusterAlgorithm: 'leiden',
    maxGraphClusterSize: 10,
  }),
  storage: {
    graph: neo4jGraph({
      url: 'bolt://localhost:7687',
      username: 'neo4j',
      password: 'password',
      maxGraphClusterSize: 10,
    }),
  }
});

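Multi-Database Setup

Multiple databases per instance require Neo4j Enterprise Edition; Community Edition supports a single user database. Create each database before connecting to it (for example, CREATE DATABASE tenant1, run against the system database).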

typescript

// Use different databases for different namespaces
const tenant1 = neo4jGraph({
  url: 'bolt://localhost:7687',
  username: 'neo4j',
  password: 'password',
  database: 'tenant1',
})('namespace-1');

const tenant2 = neo4jGraph({
  url: 'bolt://localhost:7687',
  username: 'neo4j',
  password: 'password',
  database: 'tenant2',
})('namespace-2');
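Neo4j's driver documentation recommends creating one driver per application and reusing it, since drivers are expensive to create and manage their own connection pools. Whether `neo4jGraph()` opens a driver per call is not documented here, so a cautious pattern is to build each tenant's store once and reuse it. A sketch with a hypothetical `createTenantRegistry` helper; the factory is injected so the example has no dependencies.

```typescript
// Hypothetical helper: builds each tenant's store on first use, then reuses it.
function createTenantRegistry<Store>(factory: (tenant: string) => Store) {
  const stores = new Map<string, Store>();
  return {
    get(tenant: string): Store {
      if (!stores.has(tenant)) {
        stores.set(tenant, factory(tenant));
      }
      return stores.get(tenant) as Store;
    },
    // Number of tenants whose store has been built so far.
    size(): number {
      return stores.size;
    },
  };
}
```

For example, `createTenantRegistry((tenant) => neo4jGraph({ url, username, password, database: tenant })(tenant))`, then `registry.get('tenant1')` whenever a request for that tenant arrives (using the tenant name as the namespace is an illustrative choice).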

Community Detection

Leiden Algorithm

Neo4j GDS provides hierarchical Leiden clustering:

typescript

await graphStore.clustering('leiden');

const communities = await graphStore.communitySchema();

// Structure:
{
  "community-123": {
    level: 0,           // Hierarchy level
    title: "Cluster 123",
    nodes: ["entity-1", "entity-2", ...],
    edges: [["entity-1", "entity-2"], ...],
    chunk_ids: ["doc-1", "doc-2", ...],
    occurrence: 0.85,   // Importance score
    sub_communities: ["community-456", ...],
  }
}
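With `includeIntermediateCommunities: true`, GDS writes `communityIds` as an array holding one community id per Leiden level, with the final (coarsest) community last. A hierarchy like the structure above can be derived from those arrays. This sketch covers only `level`, `nodes`, and `sub_communities`; the `<level>-<communityId>` key format is hypothetical, and numbering level 0 as the coarsest partition (so sub-communities sit one level down) is our assumption, not necessarily how @graphrag-js/neo4j numbers levels. The subset check is quadratic in the number of communities, which is fine for illustration.

```typescript
interface CommunitySketch {
  level: number;
  nodes: string[];
  sub_communities: string[];
}

// Input: each node's communityIds array as written by gds.leiden.write with
// includeIntermediateCommunities: true (one id per level, final level last).
function buildCommunityHierarchy(
  communityIdsByNode: Record<string, number[]>,
): Record<string, CommunitySketch> {
  const communities: Record<string, CommunitySketch> = {};

  for (const [nodeId, ids] of Object.entries(communityIdsByNode)) {
    // Reverse so that level 0 is the final (coarsest) partition.
    [...ids].reverse().forEach((communityId, level) => {
      const key = `${level}-${communityId}`; // hypothetical key format
      (communities[key] ??= { level, nodes: [], sub_communities: [] }).nodes.push(nodeId);
    });
  }

  // Sub-communities: communities one level down whose members all belong to this one.
  for (const community of Object.values(communities)) {
    const members = new Set(community.nodes);
    community.sub_communities = Object.entries(communities)
      .filter(([, candidate]) =>
        candidate.level === community.level + 1 &&
        candidate.nodes.every((node) => members.has(node)))
      .map(([key]) => key);
  }
  return communities;
}
```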

GDS Configuration

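The clustering calls gds.leiden.write on the projected in-memory graph with these parameters (GDS runs Leiden only on undirected relationships):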

cypher

CALL gds.leiden.write(
  'graph_namespace',
  {
    writeProperty: 'communityIds',
    includeIntermediateCommunities: true,
    relationshipWeightProperty: 'weight',
    maxLevels: 10,                    // maxGraphClusterSize
    tolerance: 0.0001,
    gamma: 1.0,
    theta: 0.01,
    randomSeed: 42                    // graphClusterSeed
  }
)
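The same call can be issued from TypeScript with the graph name and configuration passed as Cypher parameters instead of being interpolated into the query string. A sketch: `buildLeidenWriteCall` is hypothetical, the mapping of `maxGraphClusterSize` to `maxLevels` and `graphClusterSeed` to `randomSeed` follows the comments in the Cypher above, and the graph is assumed to be already projected into the GDS catalog under `graphName`.

```typescript
interface LeidenWriteOptions {
  graphName: string;            // name of the graph projected into the GDS catalog
  maxGraphClusterSize?: number; // becomes maxLevels
  graphClusterSeed?: number;    // becomes randomSeed
}

function buildLeidenWriteCall({
  graphName,
  maxGraphClusterSize = 10,
  graphClusterSeed = 42,
}: LeidenWriteOptions) {
  return {
    // Parameters instead of string interpolation keep the query injection-safe.
    query: 'CALL gds.leiden.write($graphName, $config)',
    params: {
      graphName,
      config: {
        writeProperty: 'communityIds',
        includeIntermediateCommunities: true, // one id per level, final level last
        relationshipWeightProperty: 'weight',
        maxLevels: maxGraphClusterSize,
        tolerance: 0.0001,
        gamma: 1.0,
        theta: 0.01,
        randomSeed: graphClusterSeed,
      },
    },
  };
}
```

With neo4j-driver this would run as `session.run(call.query, call.params)`. The driver sends plain JavaScript numbers as floats, so wrap integer settings such as `maxLevels` and `randomSeed` with `neo4j.int()` if the procedure rejects them.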

Advanced Features

Knowledge Graph Extraction

typescript

const kg = await graphStore.getKnowledgeGraph(
  'person',      // Node label to start from
  2,             // Max depth
  1,             // Min degree (connectivity)
  true           // Include nodes at exactly minDegree
);

// Returns:
{
  nodes: [
    { id: "entity-1", labels: ["person"], properties: {...} },
    ...
  ],
  edges: [
    { id: "edge-1", type: "RELATED", source: "entity-1", target: "entity-2", properties: {...} },
    ...
  ]
}
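The returned nodes and edges are plain objects, so they are easy to hand to other tooling. For example, a sketch that turns the result into an adjacency list, assuming the result shape shown above; `toAdjacency` is a hypothetical helper, and treating relationships as undirected is a choice made for this sketch (drop the second push if direction matters).

```typescript
// Result shape of getKnowledgeGraph() as shown above.
interface KgNode { id: string; labels: string[]; properties: Record<string, unknown>; }
interface KgEdge {
  id: string;
  type: string;
  source: string;
  target: string;
  properties: Record<string, unknown>;
}
interface KnowledgeGraph { nodes: KgNode[]; edges: KgEdge[]; }

// Undirected adjacency list keyed by node id.
function toAdjacency(kg: KnowledgeGraph): Map<string, string[]> {
  const adjacency = new Map<string, string[]>(
    kg.nodes.map((node): [string, string[]] => [node.id, []]),
  );
  for (const edge of kg.edges) {
    const fromSource = adjacency.get(edge.source);
    const fromTarget = adjacency.get(edge.target);
    // Skip edges whose endpoints are not in the returned node set.
    if (!fromSource || !fromTarget) continue;
    fromSource.push(edge.target);
    fromTarget.push(edge.source);
  }
  return adjacency;
}
```

`toAdjacency(kg).get('entity-1')` then lists the neighbors of entity-1.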

Direct Cypher Queries

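For advanced use cases, open your own connection with the official neo4j-driver package and run Cypher directly:

typescript

import neo4j from 'neo4j-driver';

// Create one driver per application and reuse it; it manages a connection pool.
const driver = neo4j.driver(
  'bolt://localhost:7687',
  neo4j.auth.basic('neo4j', 'password')
);

const session = driver.session({ database: 'neo4j' });
try {
  const result = await session.run(`
    MATCH (n:person)-[r:works_with]->(m:person)
    WHERE n.id = $nodeId
    RETURN m.id AS colleague
  `, { nodeId: 'entity-1' });

  // Each record exposes the RETURN aliases by name.
  const colleagues = result.records.map((record) => record.get('colleague'));
  console.log(colleagues);
} finally {
  // Release resources even if the query throws.
  await session.close();
  await driver.close();
}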

Performance Optimization

Indexes

By default Neo4j creates only lookup indexes for node labels and relationship types, so fast lookups by id rely on a property index or uniqueness constraint being in place. For other frequent filters, consider:

cypher

// Full-text index on entity descriptions
CREATE FULLTEXT INDEX entity_descriptions
FOR (n:person|organization|location)
ON EACH [n.description];

// Composite index for frequent filters on type and source
CREATE INDEX entity_type_source
FOR (n:Node)
ON (n.entity_type, n.source_id);
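If the entity types come from configuration (such as `entityTypes` in the Microsoft GraphRAG example), the full-text index statement can be generated instead of hard-coding the label list. A sketch, assuming entity types map directly to node labels as the index example implies; `fullTextIndexStatement` and `quoteIdentifier` are hypothetical helpers. Identifiers are wrapped in backticks (with embedded backticks doubled, Cypher's escaping rule) so types containing spaces or punctuation stay valid.

```typescript
// Cypher quotes identifiers with backticks; a literal backtick is written twice.
function quoteIdentifier(name: string): string {
  return '`' + name.replace(/`/g, '``') + '`';
}

function fullTextIndexStatement(indexName: string, labels: string[], properties: string[]): string {
  if (labels.length === 0 || properties.length === 0) {
    throw new Error('At least one label and one property are required');
  }
  const labelList = labels.map(quoteIdentifier).join('|');
  const propertyList = properties.map((p) => `n.${quoteIdentifier(p)}`).join(', ');
  // IF NOT EXISTS makes the statement safe to run on every startup.
  return `CREATE FULLTEXT INDEX ${quoteIdentifier(indexName)} IF NOT EXISTS ` +
    `FOR (n:${labelList}) ON EACH [${propertyList}]`;
}
```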

Query Optimization

cypher

// Use EXPLAIN to see the query plan without running the query
EXPLAIN MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;

// Use PROFILE to run the query and collect execution statistics
PROFILE MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;

Batch Operations

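For bulk inserts, run the writes inside one explicit transaction so they commit, or roll back, together:

typescript

import neo4j from 'neo4j-driver';

const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'password'));

// Nodes to insert; property values must be Neo4j property types (primitives or arrays of them).
const nodes: { id: string; properties: Record<string, unknown> }[] = [];

const session = driver.session();
const tx = session.beginTransaction();

try {
  for (const node of nodes) {
    await tx.run(
      'MERGE (n:Node {id: $id}) SET n += $props',
      { id: node.id, props: node.properties }
    );
  }
  await tx.commit();
} catch (error) {
  await tx.rollback();
  throw error;
} finally {
  await session.close();
}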
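Running one statement per node costs a round trip each. A common alternative is to send the rows as a list parameter and let UNWIND expand them server-side. The sketch only prepares the batches; `buildUpsertBatches` and the default size of 1000 are illustrative choices, not library settings. Each batch can then be sent with `tx.run(batch.query, batch.params)` inside the transaction above.

```typescript
interface NodeRow {
  id: string;
  properties: Record<string, unknown>; // values must be Neo4j property types
}

// UNWIND expands the $rows list server-side: one statement per batch instead of per node.
const UPSERT_NODES_QUERY = `
UNWIND $rows AS row
MERGE (n:Node {id: row.id})
SET n += row.properties`;

function buildUpsertBatches(nodes: NodeRow[], batchSize = 1000) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer');
  }
  const batches: { query: string; params: { rows: NodeRow[] } }[] = [];
  for (let start = 0; start < nodes.length; start += batchSize) {
    batches.push({
      query: UPSERT_NODES_QUERY,
      params: { rows: nodes.slice(start, start + batchSize) },
    });
  }
  return batches;
}
```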

Monitoring & Debugging

Neo4j Browser

Access at http://localhost:7474

Useful queries:

cypher

// Count nodes by label
MATCH (n)
RETURN labels(n) AS label, count(*) AS count
ORDER BY count DESC;

// View graph structure
MATCH (n)-[r]->(m)
RETURN n, r, m
LIMIT 100;

// Check community distribution
MATCH (n)
WHERE n.communityIds IS NOT NULL
RETURN n.communityIds[0] AS community, count(*) AS size
ORDER BY size DESC;

GDS Monitoring

cypher

// List projected graphs
CALL gds.graph.list();

// View stats for one projected graph
CALL gds.graph.list('graph_namespace')
YIELD nodeCount, relationshipCount;

// Drop a stale projection to free memory
CALL gds.graph.drop('graph_namespace');

Production Deployment

Docker Compose

yaml

version: '3.8'
services:
  neo4j:
    image: neo4j:5-enterprise
    ports:
      - "7474:7474"  # HTTP
      - "7687:7687"  # Bolt
    environment:
      NEO4J_AUTH: neo4j/your-secure-password
      NEO4J_ACCEPT_LICENSE_AGREEMENT: "yes"
      NEO4J_PLUGINS: '["graph-data-science"]'
      NEO4J_server_memory_heap_initial__size: 2G
      NEO4J_server_memory_heap_max__size: 4G
      NEO4J_server_memory_pagecache_size: 2G
    volumes:
      - neo4j-data:/data
      - neo4j-logs:/logs

volumes:
  neo4j-data:
  neo4j-logs:

Memory Configuration

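For production workloads (Neo4j 5 setting names, shown as Docker environment variables):

bash

# Heap size (query execution); neo4j.conf: server.memory.heap.initial_size, server.memory.heap.max_size
NEO4J_server_memory_heap_initial__size=4G
NEO4J_server_memory_heap_max__size=8G

# Page cache (graph data); neo4j.conf: server.memory.pagecache.size
NEO4J_server_memory_pagecache_size=4G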
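In the official Docker image, NEO4J_* variables map to neo4j.conf settings by a fixed rule: prefix NEO4J_, write each underscore in the setting name as a double underscore, then replace each dot with a single underscore. The hypothetical helpers below apply that rule, which makes it easy to check a variable name against the setting it is meant to configure.

```typescript
// Converts a neo4j.conf setting name to the Docker environment variable name.
// Underscores are doubled first so they stay distinguishable from converted dots.
function toDockerEnvName(setting: string): string {
  return 'NEO4J_' + setting.replace(/_/g, '__').replace(/\./g, '_');
}

// Converts a map of neo4j.conf settings into Docker environment variables.
function toDockerEnv(settings: Record<string, string>): Record<string, string> {
  return Object.fromEntries(
    Object.entries(settings).map(([setting, value]) => [toDockerEnvName(setting), value]),
  );
}
```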

Backup Strategy

bash

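# Dump the database (it must be stopped first; Enterprise also offers online `neo4j-admin database backup`)
neo4j-admin database dump neo4j --to-path=/backups

# Restore: --from-path takes the directory that contains neo4j.dump
neo4j-admin database load neo4j --from-path=/backups --overwrite-destination=true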

Troubleshooting

GDS Plugin Not Found

Error: There is no procedure with the name gds.leiden.write

Solution:

Verify GDS is installed: RETURN gds.version()
Restart Neo4j after installing plugins
Check Neo4j logs for plugin loading errors

Connection Refused

Error: ServiceUnavailable: Connection refused

Solution:

Verify Neo4j is running: docker ps
Check port mapping: 7687 for Bolt
Test with Neo4j Browser at http://localhost:7474
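A freshly started container can take several seconds before it accepts Bolt connections, so scripts that connect right after docker run may hit ServiceUnavailable. One option is to retry with exponential backoff. A sketch: `withRetry` is a hypothetical helper that retries on any error; in practice you may want to retry only when `error.code === 'ServiceUnavailable'`.

```typescript
// Retries an async operation, doubling the wait after each failed attempt.
async function withRetry<T>(
  operation: () => Promise<T>,
  { attempts = 5, baseDelayMs = 500 }: { attempts?: number; baseDelayMs?: number } = {},
): Promise<T> {
  if (attempts < 1) throw new Error('attempts must be at least 1');
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // With the defaults: wait 500 ms, 1 s, 2 s, 4 s between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

For example, `await withRetry(() => driver.verifyConnectivity())` before running the first query.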

Out of Memory

Error: OutOfMemoryError: Java heap space

Solution:

Increase the heap size (server.memory.heap.max_size in neo4j.conf, or NEO4J_server_memory_heap_max__size in Docker)
Use pagination for large result sets
Add LIMIT clauses to queries

Cost Considerations

Deployment	Cost	Best For
Self-hosted (Community)	Free + hosting	Development, small production
Self-hosted (Enterprise)	License fee	Large-scale production
Neo4j Aura Free	Free (limited)	Development, testing
Neo4j Aura Professional	~$65/month+	Production
Neo4j Aura Enterprise	Custom	Enterprise scale
