License

MIT license

pgvector-python

pgvector support for Python

Supports Django, SQLAlchemy, SQLModel, Psycopg 3, Psycopg 2, asyncpg, pg8000, and Peewee

Installation

Run:

pip install pgvector

And follow the instructions for your database library:

Django
SQLAlchemy
SQLModel
Psycopg 3
Psycopg 2
asyncpg
pg8000
Peewee

Or check out some examples:

Retrieval-augmented generation with Ollama
Embeddings with OpenAI
Binary embeddings with Cohere
Sentence embeddings with SentenceTransformers
Hybrid search with SentenceTransformers (Reciprocal Rank Fusion)
Hybrid search with SentenceTransformers (cross-encoder)
Sparse search with Transformers
Late interaction search with ColBERT
Visual document retrieval with ColPali
Image search with PyTorch
Image search with perceptual hashing
Morgan fingerprints with RDKit
Topic modeling with Gensim
Implicit feedback recommendations with Implicit
Explicit feedback recommendations with Surprise
Recommendations with LightFM
Horizontal scaling with Citus
Bulk loading with COPY

Django

Create a migration to enable the extension

from pgvector.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]

Add a vector field to your model

from pgvector.django import VectorField

class Item(models.Model):
    embedding = VectorField(dimensions=3)

Also supports HalfVectorField, BitField, and SparseVectorField

Insert a vector

item = Item(embedding=[1, 2, 3])
item.save()

Get the nearest neighbors to a vector

from pgvector.django import L2Distance

Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]

Also supports MaxInnerProduct, CosineDistance, L1Distance, HammingDistance, and JaccardDistance
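To see what these functions compute, here is a plain-Python sketch of the four floating-point distances. It never touches the database, and the helper names are illustrative, not part of pgvector. MaxInnerProduct maps to pgvector's <#> operator, which returns the negative inner product so that an ascending ORDER BY puts the largest inner product first; the sketch mirrors that sign.

```python
import math
from typing import Sequence

Vec = Sequence[float]

def l2_distance(a: Vec, b: Vec) -> float:
    # Euclidean distance: square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a: Vec, b: Vec) -> float:
    # Mirrors pgvector's <#> operator, which negates the inner product
    # so that smaller values still mean "more similar"
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a: Vec, b: Vec) -> float:
    # 1 minus the cosine of the angle between the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def l1_distance(a: Vec, b: Vec) -> float:
    # Taxicab distance: sum of absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

a, b = [1, 2, 3], [3, 1, 2]
print(l2_distance(a, b))             # sqrt(6), about 2.449
print(negative_inner_product(a, b))  # -11
print(cosine_distance(a, b))         # 1 - 11/14 = 3/14, about 0.214
print(l1_distance(a, b))             # 4
```

For vectors normalized to unit length, cosine distance equals 1 plus the negative inner product, so both rank neighbors in the same order.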

Get the distance

Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))

Get items within a certain distance

Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)
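A radius filter like this returns every row under the threshold, with no ordering or limit. The sketch below reproduces that behavior in plain Python over hypothetical in-memory (id, embedding) pairs; it is illustrative only and the function names are not part of pgvector.

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def within_distance(rows, query, max_distance):
    # Return the ids of all rows strictly closer than max_distance,
    # like .filter(distance__lt=max_distance); no ORDER BY or LIMIT
    return [row_id for row_id, vec in rows if l2_distance(vec, query) < max_distance]

# Hypothetical in-memory rows: (id, embedding)
rows = [(1, [1, 2, 3]), (2, [3, 1, 2]), (3, [10, 10, 10])]
print(within_distance(rows, [3, 1, 2], 5))  # [1, 2]
```

Because the number of matches is unbounded, the pgvector documentation suggests combining such a filter with ORDER BY and LIMIT when you want an approximate index to be used.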

Average vectors

from django.db.models import Avg

Item.objects.aggregate(Avg('embedding'))

Also supports Sum
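Avg and Sum over a vector column work element by element. A plain-Python sketch of that behavior, with hypothetical helper names; returning None for no rows is an assumption meant to mirror SQL returning NULL for an empty aggregate:

```python
from typing import List, Optional, Sequence

def vector_sum(vectors: Sequence[Sequence[float]]) -> Optional[List[float]]:
    # Element-wise sum: dimension i of the result adds dimension i of every input
    if not vectors:
        return None  # assumption: mirror SQL returning NULL for zero rows
    return [sum(column) for column in zip(*vectors)]

def vector_avg(vectors: Sequence[Sequence[float]]) -> Optional[List[float]]:
    # Element-wise mean: each dimension is averaged independently
    totals = vector_sum(vectors)
    if totals is None:
        return None
    return [total / len(vectors) for total in totals]

embeddings = [[1, 2, 3], [3, 1, 2]]
print(vector_sum(embeddings))  # [4, 3, 5]
print(vector_avg(embeddings))  # [2.0, 1.5, 2.5]
```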

Add an approximate index

from pgvector.django import HnswIndex, IvfflatIndex

class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                name='my_index',
                fields=['embedding'],
                m=16,
                ef_construction=64,
                opclasses=['vector_l2_ops']
            ),
            # or
            IvfflatIndex(
                name='my_index',
                fields=['embedding'],
                lists=100,
                opclasses=['vector_l2_ops']
            )
        ]

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
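The lists=100 value above is only an example. The pgvector documentation suggests starting with rows / 1000 lists for up to 1M rows and sqrt(rows) beyond that, and sqrt(lists) for the ivfflat.probes setting at query time. The helpers below (hypothetical names) simply encode those starting points; tune them against your own recall measurements.

```python
import math

def suggest_ivfflat_lists(row_count: int) -> int:
    # Starting point from the pgvector docs: rows / 1000 for up to 1M rows,
    # sqrt(rows) above that; never fewer than one list
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return math.isqrt(row_count)

def suggest_ivfflat_probes(lists: int) -> int:
    # Starting point for the ivfflat.probes setting: sqrt(lists), at least 1
    return max(1, math.isqrt(lists))

for rows in (50_000, 1_000_000, 4_000_000):
    lists = suggest_ivfflat_lists(rows)
    print(rows, lists, suggest_ivfflat_probes(lists))
# 50000 50 7
# 1000000 1000 31
# 4000000 2000 44
```

pgvector also recommends creating an IVFFlat index after the table has some data, since the lists are built from the rows present at index time. The probes value is set per session, for example with SET ivfflat.probes = 10, before querying.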

Half-Precision Indexing

Index vectors at half-precision

from django.contrib.postgres.indexes import OpClass
from django.db.models.functions import Cast
from pgvector.django import HnswIndex, HalfVectorField

class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                OpClass(Cast('embedding', HalfVectorField(dimensions=3)), name='halfvec_l2_ops'),
                name='my_index',
                m=16,
                ef_construction=64
            )
        ]

Note: Add 'django.contrib.postgres' to INSTALLED_APPS to use OpClass

Get the nearest neighbors

distance = L2Distance(Cast('embedding', HalfVectorField(dimensions=3)), [3, 1, 2])
Item.objects.order_by(distance)[:5]
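Casting to HalfVectorField stores each element as a 2-byte half-precision float, so the index is smaller but less precise. Assuming halfvec follows IEEE 754 binary16 rounding, Python's struct format 'e' shows how much precision a value keeps; to_half is a hypothetical helper.

```python
import struct

def to_half(x: float) -> float:
    # Round-trip through IEEE 754 binary16 (struct format 'e') to see the
    # value a half-precision copy would hold
    return struct.unpack('<e', struct.pack('<e', x))[0]

for x in (0.1, 0.5, 1234.5678):
    h = to_half(x)
    print(x, h, abs(x - h))
```

Binary16 has a 10-bit significand, so values between 1024 and 2048 are already rounded to whole numbers, while elements in [-1, 1] keep about three significant decimal digits.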

SQLAlchemy

Enable the extension

session.execute(text('CREATE EXTENSION IF NOT EXISTS vector'))

Add a vector column

from pgvector.sqlalchemy import Vector

class Item(Base):
    embedding = mapped_column(Vector(3))

Also supports HALFVEC, BIT, and SPARSEVEC

Insert a vector

item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()

Get the nearest neighbors to a vector

session.scalars(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))

Also supports max_inner_product, cosine_distance, l1_distance, hamming_distance, and jaccard_distance
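hamming_distance and jaccard_distance apply to BIT columns, for example binary-quantized embeddings. The sketch below computes both on bit strings written as '0'/'1' text, a representation assumed here for readability; the helper names are hypothetical.

```python
def _check_lengths(a: str, b: str) -> None:
    # Both bit strings must have the same number of bits
    if len(a) != len(b):
        raise ValueError('bit strings must have the same length')

def hamming_distance(a: str, b: str) -> int:
    # Number of positions where the bits differ
    _check_lengths(a, b)
    return sum(x != y for x, y in zip(a, b))

def jaccard_distance(a: str, b: str) -> float:
    # 1 - (bits set in both) / (bits set in either)
    _check_lengths(a, b)
    both = sum(x == '1' and y == '1' for x, y in zip(a, b))
    either = sum(x == '1' or y == '1' for x, y in zip(a, b))
    if either == 0:
        return 0.0  # arbitrary choice for two all-zero strings in this sketch
    return 1.0 - both / either

print(hamming_distance('101', '111'))  # 1
print(jaccard_distance('101', '111'))  # 1 - 2/3, about 0.333
```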

Get the distance

session.scalars(select(Item.embedding.l2_distance([3, 1, 2])))

Get items within a certain distance

session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))

Average vectors

from pgvector.sqlalchemy import avg

session.scalars(select(avg(Item.embedding))).first()

Also supports sum

Add an approximate index

index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)

index.create(engine)

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
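The operator class must match the distance you order by, or the index cannot serve that query. The sketch below (hypothetical helpers that only build SQL strings) pairs each metric with the pgvector operator behind the SQLAlchemy methods, l2_distance (<->), max_inner_product (<#>), and cosine_distance (<=>), and with the matching operator class.

```python
# Metric -> (pgvector operator used in ORDER BY, matching index opclass)
METRICS = {
    'l2': ('<->', 'vector_l2_ops'),
    'inner_product': ('<#>', 'vector_ip_ops'),
    'cosine': ('<=>', 'vector_cosine_ops'),
}

def hnsw_index_sql(table: str, column: str, metric: str,
                   m: int = 16, ef_construction: int = 64) -> str:
    # CREATE INDEX statement whose opclass matches the chosen metric
    _, opclass = METRICS[metric]
    return (
        f'CREATE INDEX ON {table} USING hnsw ({column} {opclass}) '
        f'WITH (m = {m}, ef_construction = {ef_construction})'
    )

def knn_sql(table: str, column: str, metric: str, k: int = 5) -> str:
    # Nearest-neighbor query using the same metric's operator;
    # %s is left as a driver placeholder for the query vector
    operator, _ = METRICS[metric]
    return f'SELECT * FROM {table} ORDER BY {column} {operator} %s LIMIT {k}'

print(hnsw_index_sql('items', 'embedding', 'cosine'))
print(knn_sql('items', 'embedding', 'cosine'))
```

Table and column names are interpolated directly, so pass only trusted identifiers; the query vector still goes through the driver's placeholder.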

Half-Precision Indexing

Index vectors at half-precision

from pgvector.sqlalchemy import HALFVEC
from sqlalchemy.sql import func

index = Index(
    'my_index',
    func.cast(Item.embedding, HALFVEC(3)).label('embedding'),
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'halfvec_l2_ops'}
)

Get the nearest neighbors

order = func.cast(Item.embedding, HALFVEC(3)).l2_distance([3, 1, 2])
session.scalars(select(Item).order_by(order).limit(5))

Arrays

Add an array column

from pgvector.sqlalchemy import Vector
from sqlalchemy import ARRAY

class Item(Base):
    embeddings = mapped_column(ARRAY(Vector(3)))

And register the types with the underlying driver

For Psycopg 3, use

from pgvector.psycopg import register_vector
from sqlalchemy import event

@event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
    register_vector(dbapi_connection)

For async connections with Psycopg 3, use

from pgvector.psycopg import register_vector_async
from sqlalchemy import event

@event.listens_for(engine.sync_engine, "connect")
def connect(dbapi_connection, connection_record):
    dbapi_connection.run_async(register_vector_async)

For Psycopg 2, use

from pgvector.psycopg2 import register_vector
from sqlalchemy import event

@event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
    register_vector(dbapi_connection, arrays=True)

SQLModel

Enable the extension

session.exec(text('CREATE EXTENSION IF NOT EXISTS vector'))

Add a vector column

from pgvector.sqlalchemy import Vector

class Item(SQLModel, table=True):
    embedding: Any = Field(sa_type=Vector(3))

Also supports HALFVEC, BIT, and SPARSEVEC

Insert a vector

item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()

Get the nearest neighbors to a vector

session.exec(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))

Also supports max_inner_product, cosine_distance, l1_distance, hamming_distance, and jaccard_distance

Get the distance

session.exec(select(Item.embedding.l2_distance([3, 1, 2])))

Get items within a certain distance

session.exec(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))

Average vectors

from pgvector.sqlalchemy import avg

session.exec(select(avg(Item.embedding))).first()

Also supports sum

Add an approximate index

from sqlmodel import Index

index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)

index.create(engine)

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

Psycopg 3

Enable the extension

conn.execute('CREATE EXTENSION IF NOT EXISTS vector')

Register the types with your connection

from pgvector.psycopg import register_vector

register_vector(conn)

For connection pools, use

def configure(conn):
    register_vector(conn)

pool = ConnectionPool(..., configure=configure)

For async connections, use

from pgvector.psycopg import register_vector_async

await register_vector_async(conn)

Create a table

conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')

Insert a vector

embedding = np.array([1, 2, 3])
conn.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))

Get the nearest neighbors to a vector

conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()
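Without an index, this query is an exact nearest-neighbor scan. The sketch below does the same thing in plain Python over hypothetical in-memory rows. HNSW and IVFFlat indexes approximate this result, trading some recall for speed, so their answers can differ from an exact scan.

```python
import heapq
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(rows, query, k=5):
    # Exact search: score every row and keep the k closest,
    # like ORDER BY embedding <-> %s LIMIT k without an index
    return heapq.nsmallest(k, rows, key=lambda row: l2_distance(row[1], query))

# Hypothetical in-memory rows: (id, embedding)
rows = [(1, [1, 2, 3]), (2, [3, 1, 2]), (3, [10, 10, 10]), (4, [3, 1, 3])]
print([row_id for row_id, _ in nearest_neighbors(rows, [3, 1, 2], k=2)])  # [2, 4]
```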

Add an approximate index

conn.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
conn.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

Psycopg 2

Enable the extension

cur = conn.cursor()
cur.execute('CREATE EXTENSION IF NOT EXISTS vector')

Register the types with your connection

from pgvector.psycopg2 import register_vector

register_vector(conn)
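register_vector installs type adapters so NumPy arrays can be passed and returned directly. If you would rather not register them, one option is to send pgvector's text format yourself: a bracketed, comma-separated list such as '[1,2,3]'. The helpers below are hypothetical, not part of pgvector-python, and assume plain Python sequences of numbers.

```python
from typing import List, Sequence

def to_vector_literal(values: Sequence[float]) -> str:
    # pgvector's text input format: numbers separated by commas, in brackets
    return '[' + ','.join(repr(float(v)) for v in values) + ']'

def from_vector_literal(text: str) -> List[float]:
    # Parse the text output format, e.g. '[1,2,3]', back into floats
    inner = text.strip()[1:-1]
    return [float(part) for part in inner.split(',')] if inner else []

literal = to_vector_literal([1, 2, 3])
print(literal)                         # [1.0,2.0,3.0]
print(from_vector_literal('[1,2,3]'))  # [1.0, 2.0, 3.0]

# Hypothetical use without register_vector (cur is a psycopg2 cursor):
# cur.execute('INSERT INTO items (embedding) VALUES (%s)', (literal,))
```

Without registered adapters, psycopg2 returns vector columns as strings such as '[1,2,3]', which from_vector_literal can parse.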

Create a table

cur.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')

Insert a vector

embedding = np.array([1, 2, 3])
cur.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))

Get the nearest neighbors to a vector

cur.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,))
cur.fetchall()

Add an approximate index

cur.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
cur.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

asyncpg

Enable the extension

await conn.execute('CREATE EXTENSION IF NOT EXISTS vector')

Register the types with your connection

from pgvector.asyncpg import register_vector

await register_vector(conn)

or your pool

async def init(conn):
    await register_vector(conn)

pool = await asyncpg.create_pool(..., init=init)
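asyncpg exchanges most values in PostgreSQL's binary format. As we understand pgvector's binary layout for vector, it is a big-endian 16-bit dimension count, a 16-bit unused field, then one big-endian float32 per element. The sketch below encodes and decodes that assumed layout with the standard struct module; the function names are hypothetical and this is our reading of the format, not the library's code, so prefer register_vector in real applications.

```python
import struct
from typing import List, Sequence

def encode_vector_binary(values: Sequence[float]) -> bytes:
    # Assumed layout: uint16 dimensions, uint16 unused (0), then one
    # float32 per element, all big-endian (network byte order)
    header = struct.pack('>HH', len(values), 0)
    return header + struct.pack(f'>{len(values)}f', *values)

def decode_vector_binary(data: bytes) -> List[float]:
    # Read the 4-byte header, then exactly `dim` float32 values after it
    dim, _unused = struct.unpack_from('>HH', data, 0)
    return list(struct.unpack_from(f'>{dim}f', data, 4))

payload = encode_vector_binary([1.0, 2.0, 3.0])
print(len(payload))                   # 16: 4-byte header + 3 * 4 bytes
print(decode_vector_binary(payload))  # [1.0, 2.0, 3.0]
```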

Create a table

await conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')

Insert a vector

embedding = np.array([1, 2, 3])
await conn.execute('INSERT INTO items (embedding) VALUES ($1)', embedding)

Get the nearest neighbors to a vector

await conn.fetch('SELECT * FROM items ORDER BY embedding <-> $1 LIMIT 5', embedding)

Add an approximate index

await conn.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
await conn.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

pg8000

Enable the extension

conn.run('CREATE EXTENSION IF NOT EXISTS vector')

Register the types with your connection

from pgvector.pg8000 import register_vector

register_vector(conn)

Create a table

conn.run('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')

Insert a vector

embedding = np.array([1, 2, 3])
conn.run('INSERT INTO items (embedding) VALUES (:embedding)', embedding=embedding)

Get the nearest neighbors to a vector

conn.run('SELECT * FROM items ORDER BY embedding <-> :embedding LIMIT 5', embedding=embedding)

Add an approximate index

conn.run('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
conn.run('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

Peewee

Add a vector column

from pgvector.peewee import VectorField

class Item(BaseModel):
    embedding = VectorField(dimensions=3)

Also supports HalfVectorField, FixedBitField, and SparseVectorField

Insert a vector

item = Item.create(embedding=[1, 2, 3])

Get the nearest neighbors to a vector

Item.select().order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5)

Also supports max_inner_product, cosine_distance, l1_distance, hamming_distance, and jaccard_distance

Get the distance

Item.select(Item.embedding.l2_distance([3, 1, 2]).alias('distance'))

Get items within a certain distance

Item.select().where(Item.embedding.l2_distance([3, 1, 2]) < 5)

Average vectors

from peewee import fn

Item.select(fn.avg(Item.embedding).coerce(True)).scalar()

Also supports sum

Add an approximate index

Item.add_index('embedding vector_l2_ops', using='hnsw')

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

Reference

Half Vectors

Create a half vector from a list

vec = HalfVector([1, 2, 3])

Or a NumPy array

vec = HalfVector(np.array([1, 2, 3]))

Get a list

lst = vec.to_list()

Get a NumPy array

arr = vec.to_numpy()

Sparse Vectors

Create a sparse vector from a list

vec = SparseVector([1, 0, 2, 0, 3, 0])

Or a NumPy array

vec = SparseVector(np.array([1, 0, 2, 0, 3, 0]))

Or a SciPy sparse array

arr = coo_array(([1, 2, 3], ([0, 2, 4],)), shape=(6,))
vec = SparseVector(arr)

Or a dictionary of non-zero elements

vec = SparseVector({0: 1, 2: 2, 4: 3}, 6)

Note: Indices start at 0
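These 0-based indices differ from pgvector's SQL text format for sparsevec, '{index:value,...}/dimensions', whose indices start at 1 like SQL arrays. A sketch of the conversion, with hypothetical helper names, useful when writing sparse vectors as SQL literals by hand:

```python
from typing import Dict, Sequence

def dense_to_sparse(values: Sequence[float]) -> Dict[int, float]:
    # Keep only non-zero elements, keyed by 0-based position
    return {i: v for i, v in enumerate(values) if v != 0}

def to_sparsevec_literal(elements: Dict[int, float], dimensions: int) -> str:
    # pgvector's sparsevec text format is '{index:value,...}/dimensions'
    # with 1-based indices, so shift each 0-based index by one
    body = ','.join(f'{i + 1}:{v}' for i, v in sorted(elements.items()))
    return '{' + body + '}/' + str(dimensions)

elements = dense_to_sparse([1, 0, 2, 0, 3, 0])
print(elements)                           # {0: 1, 2: 2, 4: 3}
print(to_sparsevec_literal(elements, 6))  # {1:1,3:2,5:3}/6
```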

Get the number of dimensions

dim = vec.dimensions()

Get the indices of non-zero elements

indices = vec.indices()

Get the values of non-zero elements

values = vec.values()

Get a list

lst = vec.to_list()

Get a NumPy array

arr = vec.to_numpy()

Get a SciPy sparse array

arr = vec.to_coo()

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

Report bugs
Fix bugs and submit pull requests
Write, clarify, or fix documentation
Suggest or add new features

To get started with development:

git clone https://github.com/pgvector/pgvector-python.git
cd pgvector-python
pip install -r requirements.txt
createdb pgvector_python_test
pytest

To run an example:

cd examples/loading
pip install -r requirements.txt
createdb pgvector_example
python3 example.py
