pgvector support for Python
Supports Django, SQLAlchemy, SQLModel, Psycopg 3, Psycopg 2, asyncpg, pg8000, and Peewee
Run:
pip install pgvector
And follow the instructions for your database library: Django, SQLAlchemy, SQLModel, Psycopg 3, Psycopg 2, asyncpg, pg8000, or Peewee.
Or check out some examples:
- Retrieval-augmented generation with Ollama
- Embeddings with OpenAI
- Binary embeddings with Cohere
- Sentence embeddings with SentenceTransformers
- Hybrid search with SentenceTransformers (Reciprocal Rank Fusion)
- Hybrid search with SentenceTransformers (cross-encoder)
- Sparse search with Transformers
- Late interaction search with ColBERT
- Visual document retrieval with ColPali
- Image search with PyTorch
- Image search with perceptual hashing
- Morgan fingerprints with RDKit
- Topic modeling with Gensim
- Implicit feedback recommendations with Implicit
- Explicit feedback recommendations with Surprise
- Recommendations with LightFM
- Horizontal scaling with Citus
- Bulk loading with COPY
Django
Create a migration to enable the extension
from pgvector.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]
Add a vector field to your model
from pgvector.django import VectorField

class Item(models.Model):
    embedding = VectorField(dimensions=3)
Also supports HalfVectorField, BitField, and SparseVectorField
Insert a vector
item = Item(embedding=[1, 2, 3])
item.save()
Get the nearest neighbors to a vector
from pgvector.django import L2Distance

Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]
Also supports MaxInnerProduct, CosineDistance, L1Distance, HammingDistance, and JaccardDistance
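For reference, the distance measures named above can be sketched in plain Python. This is an illustration of the standard definitions, not pgvector's implementation, and every function name below is made up for the example. It assumes that the inner-product ordering uses the negative inner product (so that smaller means closer) and that bit vectors are written as strings of '0' and '1'.

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    # Assumption: ordering by this value ascending puts the largest inner product first
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 minus the cosine similarity; undefined for zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def l1_distance(a, b):
    # Sum of absolute element-wise differences
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_distance(a, b):
    # Number of positions where two equal-length bit strings differ
    return sum(x != y for x, y in zip(a, b))

def jaccard_distance(a, b):
    # 1 minus (bits set in both) / (bits set in either)
    both = sum(x == '1' and y == '1' for x, y in zip(a, b))
    either = sum(x == '1' or y == '1' for x, y in zip(a, b))
    if either == 0:
        # Choice made for this sketch: two all-zero inputs count as maximally distant
        return 1.0
    return 1 - both / either

stored = [1, 2, 3]
query = [3, 1, 2]
print(l2_distance(stored, query))
print(cosine_distance(stored, query))
print(hamming_distance('101', '110'))
```

Whichever measure a query uses, an approximate index only helps if it was built with the matching operator class.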
Get the distance
Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))
Get items within a certain distance
Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)
Average vectors
from django.db.models import Avg

Item.objects.aggregate(Avg('embedding'))
Also supports Sum
Add an approximate index
from pgvector.django import HnswIndex, IvfflatIndex

class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                name='my_index',
                fields=['embedding'],
                m=16,
                ef_construction=64,
                opclasses=['vector_l2_ops']
            ),
            # or
            IvfflatIndex(
                name='my_index',
                fields=['embedding'],
                lists=100,
                opclasses=['vector_l2_ops']
            )
        ]
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
Index vectors at half-precision
from django.contrib.postgres.indexes import OpClass
from django.db.models.functions import Cast
from pgvector.django import HnswIndex, HalfVectorField

class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                OpClass(Cast('embedding', HalfVectorField(dimensions=3)), name='halfvec_l2_ops'),
                name='my_index',
                m=16,
                ef_construction=64
            )
        ]
Note: Add 'django.contrib.postgres' to INSTALLED_APPS to use OpClass
Get the nearest neighbors
distance = L2Distance(Cast('embedding', HalfVectorField(dimensions=3)), [3, 1, 2])
Item.objects.order_by(distance)[:5]
SQLAlchemy
Enable the extension
session.execute(text('CREATE EXTENSION IF NOT EXISTS vector'))
Add a vector column
from pgvector.sqlalchemy import Vector

class Item(Base):
    embedding = mapped_column(Vector(3))
Also supports HALFVEC, BIT, and SPARSEVEC
Insert a vector
item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()
Get the nearest neighbors to a vector
session.scalars(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))
Also supports max_inner_product, cosine_distance, l1_distance, hamming_distance, and jaccard_distance
Get the distance
session.scalars(select(Item.embedding.l2_distance([3, 1, 2])))
Get items within a certain distance
session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))
Average vectors
from pgvector.sqlalchemy import avg

session.scalars(select(avg(Item.embedding))).first()
Also supports sum
Add an approximate index
index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)

index.create(engine)
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
Index vectors at half-precision
from pgvector.sqlalchemy import HALFVEC
from sqlalchemy.sql import func

index = Index(
    'my_index',
    func.cast(Item.embedding, HALFVEC(3)).label('embedding'),
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'halfvec_l2_ops'}
)
Get the nearest neighbors
order = func.cast(Item.embedding, HALFVEC(3)).l2_distance([3, 1, 2])
session.scalars(select(Item).order_by(order).limit(5))
Add an array column
from pgvector.sqlalchemy import Vector
from sqlalchemy import ARRAY

class Item(Base):
    embeddings = mapped_column(ARRAY(Vector(3)))
And register the types with the underlying driver
For Psycopg 3, use
from pgvector.psycopg import register_vector
from sqlalchemy import event

@event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
    register_vector(dbapi_connection)
For async connections with Psycopg 3, use
from pgvector.psycopg import register_vector_async
from sqlalchemy import event

@event.listens_for(engine.sync_engine, "connect")
def connect(dbapi_connection, connection_record):
    dbapi_connection.run_async(register_vector_async)
For Psycopg 2, use
from pgvector.psycopg2 import register_vector
from sqlalchemy import event

@event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
    register_vector(dbapi_connection, arrays=True)
SQLModel
Enable the extension
session.exec(text('CREATE EXTENSION IF NOT EXISTS vector'))
Add a vector column
from pgvector.sqlalchemy import Vector

class Item(SQLModel, table=True):
    embedding: Any = Field(sa_type=Vector(3))
Also supports HALFVEC, BIT, and SPARSEVEC
Insert a vector
item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()
Get the nearest neighbors to a vector
session.exec(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))
Also supports max_inner_product, cosine_distance, l1_distance, hamming_distance, and jaccard_distance
Get the distance
session.exec(select(Item.embedding.l2_distance([3, 1, 2])))
Get items within a certain distance
session.exec(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))
Average vectors
from pgvector.sqlalchemy import avg

session.exec(select(avg(Item.embedding))).first()
Also supports sum
Add an approximate index
from sqlmodel import Index

index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)

index.create(engine)
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
Psycopg 3
Enable the extension
conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the types with your connection
from pgvector.psycopg import register_vector

register_vector(conn)
For connection pools, use
def configure(conn):
    register_vector(conn)

pool = ConnectionPool(..., configure=configure)
For async connections, use
from pgvector.psycopg import register_vector_async

await register_vector_async(conn)
Create a table
conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
embedding = np.array([1, 2, 3])
conn.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))
Get the nearest neighbors to a vector
conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()
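The query above orders rows by L2 distance (`<->`) to the parameter and keeps the first five. Without an approximate index this is an exact search. A rough in-memory equivalent, using a hypothetical `nearest_neighbors` helper and made-up rows, might look like this:

```python
import heapq
import math

def nearest_neighbors(rows, query, k=5):
    """Return the k rows closest to query by L2 distance.

    rows is an iterable of (id, embedding) pairs; this helper is
    illustrative only and is not part of pgvector.
    """
    def distance(row):
        # math.dist computes the Euclidean distance between two points
        return math.dist(row[1], query)

    # nsmallest plays the role of ORDER BY ... LIMIT k
    return heapq.nsmallest(k, rows, key=distance)

# Made-up sample data standing in for the items table
rows = [(1, [1, 2, 3]), (2, [4, 5, 6]), (3, [3, 1, 2]), (4, [0, 0, 0])]
nearest = nearest_neighbors(rows, [3, 1, 2], k=2)
print([row_id for row_id, _ in nearest])
```

An exact scan like this compares the query against every row, which is the cost an approximate index is meant to avoid.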
Add an approximate index
conn.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
conn.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
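The operator class chosen for the index should match the distance operator used in queries. One way to keep the two in sync is a small lookup table, sketched below with hypothetical names (`OPS`, `create_index_sql`, `nearest_sql`). The operator symbols for inner product (`<#>`) and cosine distance (`<=>`) are assumed from pgvector's SQL interface rather than taken from the text above, so check them against your pgvector version.

```python
# Metric name -> (query operator, index operator class)
# The <#> and <=> operators are assumptions; see the note above.
OPS = {
    'l2': ('<->', 'vector_l2_ops'),
    'inner_product': ('<#>', 'vector_ip_ops'),
    'cosine': ('<=>', 'vector_cosine_ops'),
}

def create_index_sql(table, column, metric, method='hnsw'):
    # Identifiers are interpolated directly, so pass only trusted names
    _, opclass = OPS[metric]
    return f'CREATE INDEX ON {table} USING {method} ({column} {opclass})'

def nearest_sql(table, column, metric, limit=5):
    # The query vector stays a bound parameter (%s), as in the examples above
    operator, _ = OPS[metric]
    return f'SELECT * FROM {table} ORDER BY {column} {operator} %s LIMIT {int(limit)}'

print(create_index_sql('items', 'embedding', 'cosine'))
print(nearest_sql('items', 'embedding', 'cosine'))
```

The resulting strings can be passed to `conn.execute` in the same way as the statements above.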
Psycopg 2
Enable the extension
cur = conn.cursor()
cur.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the types with your connection or cursor
from pgvector.psycopg2 import register_vector

register_vector(conn)
Create a table
cur.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
embedding = np.array([1, 2, 3])
cur.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))
Get the nearest neighbors to a vector
cur.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,))
cur.fetchall()
Add an approximate index
cur.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
cur.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
asyncpg
Enable the extension
await conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the types with your connection
from pgvector.asyncpg import register_vector

await register_vector(conn)
or your pool
async def init(conn):
    await register_vector(conn)

pool = await asyncpg.create_pool(..., init=init)
Create a table
await conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
embedding = np.array([1, 2, 3])
await conn.execute('INSERT INTO items (embedding) VALUES ($1)', embedding)
Get the nearest neighbors to a vector
await conn.fetch('SELECT * FROM items ORDER BY embedding <-> $1 LIMIT 5', embedding)
Add an approximate index
await conn.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
await conn.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
pg8000
Enable the extension
conn.run('CREATE EXTENSION IF NOT EXISTS vector')
Register the types with your connection
from pgvector.pg8000 import register_vector

register_vector(conn)
Create a table
conn.run('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
embedding = np.array([1, 2, 3])
conn.run('INSERT INTO items (embedding) VALUES (:embedding)', embedding=embedding)
Get the nearest neighbors to a vector
conn.run('SELECT * FROM items ORDER BY embedding <-> :embedding LIMIT 5', embedding=embedding)
Add an approximate index
conn.run('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
conn.run('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
Peewee
Add a vector column
from pgvector.peewee import VectorField

class Item(BaseModel):
    embedding = VectorField(dimensions=3)
Also supports HalfVectorField, FixedBitField, and SparseVectorField
Insert a vector
item = Item.create(embedding=[1, 2, 3])
Get the nearest neighbors to a vector
Item.select().order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5)
Also supports max_inner_product, cosine_distance, l1_distance, hamming_distance, and jaccard_distance
Get the distance
Item.select(Item.embedding.l2_distance([3, 1, 2]).alias('distance'))
Get items within a certain distance
Item.select().where(Item.embedding.l2_distance([3, 1, 2]) < 5)
Average vectors
from peewee import fn

Item.select(fn.avg(Item.embedding).coerce(True)).scalar()
Also supports sum
Add an approximate index
Item.add_index('embedding vector_l2_ops',using='hnsw')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
Half Vectors
Create a half vector from a list
vec = HalfVector([1, 2, 3])
Or a NumPy array
vec = HalfVector(np.array([1, 2, 3]))
Get a list
lst = vec.to_list()
Get a NumPy array
arr = vec.to_numpy()
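Half vectors trade precision for space. Assuming each element is stored as a 16-bit IEEE 754 float, values are rounded when they are converted. The standard-library sketch below shows the size of that rounding for single values; the helper name `to_half` is made up for the example and is not part of pgvector.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest half-precision value."""
    # The 'e' format code packs a float as IEEE 754 binary16
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Small integers survive the round trip, most decimal fractions do not
for value in (1.0, 3.0, 0.1, 0.3):
    rounded = to_half(value)
    print(value, rounded, abs(value - rounded))
```

Whether this loss matters depends on the embeddings; comparing search results at both precisions on a sample of real queries is a reasonable check before switching.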
Sparse Vectors
Create a sparse vector from a list
vec = SparseVector([1, 0, 2, 0, 3, 0])
Or a NumPy array
vec = SparseVector(np.array([1, 0, 2, 0, 3, 0]))
Or a SciPy sparse array
arr = coo_array(([1, 2, 3], ([0, 2, 4],)), shape=(6,))
vec = SparseVector(arr)
Or a dictionary of non-zero elements
vec = SparseVector({0: 1, 2: 2, 4: 3}, 6)
Note: Indices start at 0
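The dictionary form maps zero-based indices to non-zero values, with the total number of dimensions passed separately. The relationship between the dense and dictionary forms can be sketched without the library; `dense_to_sparse` and `sparse_to_dense` are hypothetical helpers, not pgvector functions.

```python
def dense_to_sparse(values):
    """Return (non-zero elements keyed by zero-based index, dimensions)."""
    elements = {i: v for i, v in enumerate(values) if v != 0}
    return elements, len(values)

def sparse_to_dense(elements, dimensions):
    """Rebuild the dense list, filling the missing indices with zeros."""
    dense = [0] * dimensions
    for index, value in elements.items():
        dense[index] = value
    return dense

elements, dimensions = dense_to_sparse([1, 0, 2, 0, 3, 0])
print(elements, dimensions)
print(sparse_to_dense(elements, dimensions))
```

The dimension count has to travel with the dictionary because trailing zeros cannot be recovered from the non-zero entries alone.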
Get the number of dimensions
dim = vec.dimensions()
Get the indices of non-zero elements
indices = vec.indices()
Get the values of non-zero elements
values = vec.values()
Get a list
lst = vec.to_list()
Get a NumPy array
arr = vec.to_numpy()
Get a SciPy sparse array
arr = vec.to_coo()
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/pgvector/pgvector-python.git
cd pgvector-python
pip install -r requirements.txt
createdb pgvector_python_test
pytest
To run an example:
cd examples/loading
pip install -r requirements.txt
createdb pgvector_example
python3 example.py