pgvector

Open-source vector similarity search for Postgres

GitHub: 20.5k stars (+1,706 stars/month), 0 releases in the last 6 months

Overview

pgvector is an open-source PostgreSQL extension that brings vector similarity search directly into your relational database. It lets you store and query high-dimensional vectors alongside your regular data and supports both exact and approximate nearest neighbor search. Supported distance metrics include L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance. Supported vector types range from single-precision and half-precision to binary and sparse vectors, which makes the extension adaptable to many AI and machine learning applications.

What makes pgvector particularly valuable is that it keeps PostgreSQL's core strengths (ACID compliance, point-in-time recovery, JOINs, and transaction support) while adding vector search. You can therefore write queries that combine vector similarity with ordinary SQL filtering and joins, which in many scenarios removes the need for a separate vector database.

With over 20,000 GitHub stars, pgvector has become the de facto standard for vector search in PostgreSQL. It works with Postgres 13+ and with any programming language that has a Postgres client, so it fits into a wide range of development stacks and deployment setups.
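To make the dense-vector metrics concrete, here is a minimal pure-Python sketch of what each pgvector operator computes for two vectors. The helper function names are hypothetical and this is not pgvector's code (the extension implements these in C inside Postgres); only the operator mapping comes from pgvector's documentation: <-> is L2 distance, <#> is negative inner product, <=> is cosine distance, and <+> is L1 distance.

```python
import math

# Hypothetical pure-Python helpers reproducing the math behind pgvector's
# dense-vector distance operators. Not pgvector's actual implementation.

def _check_same_dims(a, b):
    # pgvector rejects comparisons between vectors of different dimensions.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensions")

def l2_distance(a, b):
    """Euclidean distance: what the <-> operator returns."""
    _check_same_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """Negated dot product: what <#> returns, so smaller still means closer."""
    _check_same_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity: what <=> returns (this sketch fails on zero vectors)."""
    _check_same_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def l1_distance(a, b):
    """Manhattan (taxicab) distance: what <+> returns."""
    _check_same_dims(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))

a, b = [1, 2, 3], [3, 1, 2]
print(l2_distance(a, b))             # sqrt(4 + 1 + 1) = sqrt(6), about 2.449
print(negative_inner_product(a, b))  # -(3 + 2 + 6) = -11
print(cosine_distance(a, b))         # 1 - 11/14, about 0.214
print(l1_distance(a, b))             # 2 + 1 + 1 = 4
```

The inner product is negated because Postgres only supports ascending-order index scans on operators, so "most similar first" has to correspond to "smallest value first" for every metric.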

Pros

  • + Native PostgreSQL integration preserves ACID compliance, transactions, and allows complex JOINs between vector and relational data
  • + Supports multiple vector types (single/half-precision, binary, sparse) and distance metrics (L2, L1, cosine, inner product, Hamming, Jaccard)
  • + Works with any language that has a Postgres client and is available through multiple installation methods (source builds, package managers, hosted providers)
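The Hamming and Jaccard metrics in the list above apply to binary vectors (pgvector's bit type), where Hamming (operator <~>) counts differing bits and Jaccard (operator <%>) compares the sets of 1-bits. The following is a hypothetical pure-Python sketch over bit strings such as '101'; the function names are illustrative only, and the handling of two all-zero vectors is an assumption of this sketch, not documented pgvector behavior.

```python
# Hypothetical pure-Python sketch of the binary-vector metrics pgvector
# offers for its bit type. Inputs are bit strings such as '101'.

def _check_same_length(a: str, b: str) -> None:
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")

def hamming_distance(a: str, b: str) -> int:
    """Number of positions where the bits differ (pgvector operator <~>)."""
    _check_same_length(a, b)
    return sum(x != y for x, y in zip(a, b))

def jaccard_distance(a: str, b: str) -> float:
    """1 - |A and B| / |A or B| over the positions set to 1 (operator <%>)."""
    _check_same_length(a, b)
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    if either == 0:
        # Assumption of this sketch: two all-zero vectors count as identical.
        return 0.0
    return 1 - both / either

query = "101"
for bits in ["000", "111"]:
    print(bits, hamming_distance(bits, query), jaccard_distance(bits, query))
```

Here '111' differs from '101' in one position and shares two of its three set bits, so it ranks closer to the query than '000' under both metrics.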

Cons

  • - Requires PostgreSQL expertise and may have a steeper learning curve than dedicated vector databases
  • - Installation effort varies by platform and is typically greatest on Windows
  • - Performance may not match specialized vector databases for very large-scale vector workloads

Getting Started

1. Install the pgvector extension: compile it from source, use a package manager, or choose a hosted provider.
2. Enable the extension in your database and create a table with a vector column:
     CREATE EXTENSION vector;
     CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
3. Insert vector data, then query the nearest neighbors by L2 distance (the <-> operator):
     INSERT INTO items (embedding) VALUES ('[1,2,3]');
     SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
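The final query, ORDER BY embedding <-> '[3,1,2]' LIMIT 5, is an exact nearest-neighbor search when no vector index exists: every row is scored and the closest ones are returned. The sketch below emulates those semantics in plain Python over an in-memory list of rows. The names parse_vector and nearest, and the extra rows, are hypothetical; only the '[1,2,3]' text format and the L2 ordering come from the steps above.

```python
import heapq
import math

def parse_vector(text: str) -> list[float]:
    """Parse a pgvector-style text literal such as '[1,2,3]' into floats."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError(f"not a vector literal: {text!r}")
    return [float(part) for part in inner[1:-1].split(",")]

def l2_distance(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(rows, query: str, limit: int = 5):
    """Exact k-NN: score every (id, embedding) row, keep the `limit` closest.
    Mirrors ORDER BY embedding <-> query LIMIT limit without an index."""
    q = parse_vector(query)
    return heapq.nsmallest(limit, rows, key=lambda row: l2_distance(row[1], q))

# Hypothetical rows standing in for the items table.
items = [
    (1, parse_vector("[1,2,3]")),
    (2, parse_vector("[4,5,6]")),
    (3, parse_vector("[3,1,2]")),
]
for item_id, embedding in nearest(items, "[3,1,2]", limit=2):
    print(item_id, embedding)
```

By default pgvector performs this kind of exact search, which gives perfect recall but touches every row. Adding an HNSW or IVFFlat index switches the same query to approximate nearest neighbor search, trading some recall for speed.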