PostgreSQL + pgvector Storage
The @graphrag-js/pgvector package provides vector storage using PostgreSQL with the pgvector extension, ideal for SQL-based workflows.
Installation
bash
pnpm add @graphrag-js/pgvectorFeatures
- ✅ SQL-Based - Familiar PostgreSQL database
- ✅ ACID Transactions - Data consistency guarantees
- ✅ JSONB Metadata - Rich filtering capabilities
- ✅ IVFFlat Indexing - Fast approximate search
- ✅ Horizontal Scaling - Sharding and replication
- ✅ Unified Storage - Combine with other PostgreSQL data
Prerequisites
PostgreSQL + pgvector
Option 1: Docker (Recommended)
bash
docker run -d \
--name postgres \
-e POSTGRES_PASSWORD=password \
-e POSTGRES_DB=graphrag \
-p 5432:5432 \
-v pgdata:/var/lib/postgresql/data \
pgvector/pgvector:pg16Option 2: Install pgvector on Existing PostgreSQL
bash
# macOS (Homebrew)
brew install pgvector
# Ubuntu/Debian
sudo apt install postgresql-16-pgvector
# From source
cd /tmp
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make installThen enable the extension:
sql
CREATE EXTENSION vector;Option 3: Hosted PostgreSQL
- Supabase - pgvector enabled by default
- Neon - Enable in project settings
- AWS RDS - Install pgvector extension
Verify Installation
sql
SELECT * FROM pg_extension WHERE extname = 'vector';Quick Start
typescript
import { createGraph } from '@graphrag-js/core';
import { lightrag } from '@graphrag-js/lightrag';
import { pgVector } from '@graphrag-js/pgvector';
import { openai } from '@ai-sdk/openai';
const graph = createGraph({
model: openai('gpt-4o-mini'),
embedding: openai.embedding('text-embedding-3-small'),
provider: lightrag(),
storage: {
vector: pgVector({
host: 'localhost',
port: 5432,
database: 'graphrag',
user: 'postgres',
password: 'password',
}),
}
});
await graph.insert('Your documents...');
const result = await graph.query('Your question?');Configuration
pgVector(config)
typescript
interface PgVectorConfig {
host?: string; // PostgreSQL host (default: 'localhost')
port?: number; // PostgreSQL port (default: 5432)
database?: string; // Database name (default: 'graphrag')
user?: string; // Database user
password?: string; // Database password
}Connection Examples
typescript
// Local instance
pgVector({
host: 'localhost',
port: 5432,
database: 'graphrag',
user: 'postgres',
password: 'password',
})
// Connection string
pgVector({
connectionString: 'postgresql://user:password@localhost:5432/graphrag',
})
// Supabase
pgVector({
host: 'db.xxxyyy.supabase.co',
port: 5432,
database: 'postgres',
user: 'postgres',
password: 'your-supabase-password',
})
// SSL connection
pgVector({
host: 'production.example.com',
port: 5432,
database: 'graphrag',
user: 'app',
password: 'secure-password',
ssl: {
rejectUnauthorized: false,
},
})Usage Examples
Basic Vector Operations
typescript
import { pgVector } from '@graphrag-js/pgvector';
const vectorStore = pgVector({
host: 'localhost',
port: 5432,
database: 'graphrag',
user: 'postgres',
password: 'password',
})('my-namespace');
// Create index
await vectorStore.createIndex({
indexName: 'documents',
dimension: 1536,
metric: 'cosine',
});
// Upsert vectors
await vectorStore.upsert({
indexName: 'documents',
vectors: [[0.1, 0.2, ...], [0.3, 0.4, ...]],
metadata: [
{ text: 'Document 1', category: 'tech' },
{ text: 'Document 2', category: 'science' },
],
ids: ['doc-1', 'doc-2'],
});
// Query with metadata filtering
const results = await vectorStore.query({
queryVector: [0.15, 0.25, ...],
topK: 10,
filter: { category: 'tech' },
});SQL Queries
Direct SQL access for advanced use cases:
typescript
import { Pool } from 'pg';
import pgvector from 'pgvector/pg';
const pool = new Pool({
host: 'localhost',
port: 5432,
database: 'graphrag',
user: 'postgres',
password: 'password',
});
const client = await pool.connect();
await pgvector.registerType(client);
// Vector similarity search
const result = await client.query(`
SELECT id, metadata, vector <=> $1::vector AS distance
FROM my_namespace_documents
WHERE metadata->>'category' = 'tech'
ORDER BY vector <=> $1::vector
LIMIT 10
`, [JSON.stringify(queryVector)]);
client.release();JSONB Metadata Filtering
typescript
// Complex metadata queries
const results = await client.query(`
SELECT id, metadata
FROM my_namespace_documents
WHERE
metadata->>'category' = 'tech'
AND (metadata->>'year')::int >= 2023
AND metadata->'tags' ? 'ai'
ORDER BY vector <=> $1::vector
LIMIT 10
`, [vectorStr]);Distance Operators
pgvector provides three distance operators:
| Operator | Metric | Best For |
|---|---|---|
<-> | L2 (Euclidean) | Non-normalized vectors |
<=> | Cosine | Normalized vectors (default) |
<#> | Inner product | Dot product similarity |
sql
-- Cosine distance (default)
ORDER BY vector <=> query_vector
-- L2 distance
ORDER BY vector <-> query_vector
-- Inner product
ORDER BY vector <#> query_vectorIndexing
IVFFlat Index
For fast approximate similarity search:
sql
-- Create IVFFlat index
CREATE INDEX ON my_namespace_documents
USING ivfflat (vector vector_cosine_ops)
WITH (lists = 100);
-- Available operators
-- vector_l2_ops for L2 distance (<->)
-- vector_ip_ops for inner product (<#>)
-- vector_cosine_ops for cosine distance (<=>)Index Parameters
typescript
// Optimal lists = rows / 1000 (for IVFFlat)
const rows = 100000;
const lists = Math.max(10, Math.floor(rows / 1000));
await client.query(`
CREATE INDEX ON my_namespace_documents
USING ivfflat (vector vector_cosine_ops)
WITH (lists = ${lists})
`);Search Tuning
sql
-- Increase search accuracy (default: 10)
SET ivfflat.probes = 20;
-- Query with custom probes
BEGIN;
SET LOCAL ivfflat.probes = 50;
SELECT * FROM my_namespace_documents
ORDER BY vector <=> query_vector
LIMIT 10;
COMMIT;Performance Optimization
Connection Pooling
typescript
import { Pool } from 'pg';
const pool = new Pool({
host: 'localhost',
port: 5432,
database: 'graphrag',
user: 'postgres',
password: 'password',
max: 20, // Max connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});Batch Inserts
typescript
// Use COPY for bulk loading
await client.query('BEGIN');
const copyStream = client.query(
copyFrom('COPY my_namespace_documents (id, vector, metadata) FROM STDIN')
);
for (const row of rows) {
copyStream.write(`${row.id}\t${row.vector}\t${JSON.stringify(row.metadata)}\n`);
}
copyStream.end();
await client.query('COMMIT');Vacuum and Analyze
sql
-- Reclaim space and update statistics
VACUUM ANALYZE my_namespace_documents;
-- After bulk inserts
VACUUM (ANALYZE) my_namespace_documents;Production Deployment
Docker Compose
yaml
version: '3.8'
services:
postgres:
image: pgvector/pgvector:pg16
ports:
- "5432:5432"
environment:
POSTGRES_DB: graphrag
POSTGRES_USER: postgres
POSTGRES_PASSWORD: secure-password
POSTGRES_INITDB_ARGS: "-E UTF8"
volumes:
- pgdata:/var/lib/postgresql/data
shm_size: 256mb
volumes:
pgdata:Memory Configuration
For production workloads:
bash
# postgresql.conf
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM
maintenance_work_mem = 1GB # For index creation
work_mem = 256MB # For query operationsBackup Strategy
bash
# Full backup
pg_dump -U postgres -d graphrag > backup.sql
# Table-specific backup
pg_dump -U postgres -d graphrag -t my_namespace_documents > vectors.sql
# Restore
psql -U postgres -d graphrag < backup.sqlMonitoring
Table Size
sql
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE tablename LIKE 'my_namespace%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;Index Usage
sql
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename LIKE 'my_namespace%';Query Performance
sql
-- Enable timing
\timing on
-- Explain query plan
EXPLAIN ANALYZE
SELECT id, vector <=> $1::vector AS distance
FROM my_namespace_documents
ORDER BY vector <=> $1::vector
LIMIT 10;Troubleshooting
Extension Not Found
Error: extension "vector" is not available
Solution:
- Install pgvector:
sudo apt install postgresql-16-pgvector - Restart PostgreSQL
- Run:
CREATE EXTENSION vector;
Dimension Mismatch
Error: vector must have X dimensions, not Y
Solution:
- Ensure all vectors have consistent dimensions
- OpenAI
text-embedding-3-small: 1536 - OpenAI
text-embedding-3-large: 3072
Slow Queries
Problem: Vector search is slow
Solution:
- Create IVFFlat index
- Increase
ivfflat.probesfor accuracy - Add metadata indexes:
CREATE INDEX ON table ((metadata->>'field')) - Use
EXPLAIN ANALYZEto identify bottlenecks
Cost Considerations
| Deployment | Cost | Best For |
|---|---|---|
| Self-hosted | Free + hosting | Full control |
| Supabase | $25+/month | Managed PostgreSQL |
| Neon | $19+/month | Serverless PostgreSQL |
| AWS RDS | $50+/month | Enterprise |
| Azure PostgreSQL | $40+/month | Azure ecosystem |
Benchmarks
On 1M vectors (1536-dim, 8GB RAM):
| Operation | Latency |
|---|---|
| Insert | 10-20ms |
| Search (k=10, no index) | 500-1000ms |
| Search (k=10, IVFFlat) | 20-50ms |
Next Steps
- Qdrant Storage - Specialized vector database
- Memory Storage - For development
- pgvector Documentation
- PostgreSQL Documentation