Commit 1d3fde6

Ready to ship

0 parents, commit 1d3fde6

5 files changed: +220 -0 lines changed

‎LICENSE

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
Copyright (c) 2024 PostgresML Team

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

‎README.md

Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
#postgresml-django
2+
3+
postgresml-django is a Python module that integrates PostgresML with Django ORM, enabling automatic in-database embedding of Django models. It simplifies the process of creating and searching vector embeddings for your text data.
4+
5+
##Introduction
6+
7+
This module provides a seamless way to:
8+
- Automatically generate in-databse embeddings for specified fields in your Django models
9+
- Perform vector similarity searches in-database
10+
11+
##Installation
12+
13+
1. Ensure you have[pgml](https://github.com/postgresml/postgresml) installed and configured in your database. The easiest way to do that is to sign up for a free serverless database at[postgresml.org](https://postgresml.org). You can also host it your self.
14+
15+
2. Install the package using pip:
16+
17+
```
18+
pip install postgresml-django
19+
```
20+
21+
You are ready to go!

## Usage Examples

### Example 1: Using intfloat/e5-small-v2

This example demonstrates using the `intfloat/e5-small-v2` transformer, which has an embedding size of 384.

```python
from django.db import models
from postgresml_django import VectorField, Embed

class Document(Embed):
    text = models.TextField()
    text_embedding = VectorField(
        field_to_embed="text",
        dimensions=384,
        transformer="intfloat/e5-small-v2"
    )

# Searching
results = Document.vector_search("text_embedding", "some query to search against")
```

### Example 2: Using mixedbread-ai/mxbai-embed-large-v1

This example shows how to use the `mixedbread-ai/mxbai-embed-large-v1` transformer, which has an embedding size of 1024 and requires specific parameters for recall.

```python
from django.db import models
from postgresml_django import VectorField, Embed

class Article(Embed):
    content = models.TextField()
    content_embedding = VectorField(
        field_to_embed="content",
        dimensions=1024,
        transformer="mixedbread-ai/mxbai-embed-large-v1",
        transformer_recall_parameters={
            "query": "Represent this sentence for searching relevant passages:"
        }
    )

# Searching
results = Article.vector_search("content_embedding", "search query")
```

Note the differences between the two examples:
1. The `dimensions` parameter is set to 384 for `intfloat/e5-small-v2` and 1024 for `mixedbread-ai/mxbai-embed-large-v1`.
2. The `mixedbread-ai/mxbai-embed-large-v1` transformer requires additional parameters for recall, which are specified in the `transformer_recall_parameters` argument.

Both examples automatically generate embeddings when instances are saved and allow vector similarity searches through the `vector_search` method.
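
As a rough illustration of what ordering by distance means, the sketch below ranks a few hand-written vectors by cosine distance in plain Python. `vector_search` defaults to pgvector's `CosineDistance`, and cosine distance is 1 minus cosine similarity. The three-dimensional vectors and document names are made up for illustration, and the helper functions are hypothetical; the real computation happens inside the database.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query, documents):
    """Sort (name, vector) pairs by ascending cosine distance to the query."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in documents]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
documents = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.0, 1.0, 0.0]),
    ("doc_c", [0.7, 0.7, 0.0]),
]
query = [1.0, 0.1, 0.0]

ranked = rank_by_distance(query, documents)
for name, distance in ranked:
    print(f"{name}: {distance:.3f}")
```

The closest document comes first, which mirrors the `order_by("distance")` step that the returned QuerySet applies.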

## Contributing

We welcome contributions to postgresml-django! Whether it's bug reports, feature requests, documentation improvements, or code contributions, your input is valuable to us. Feel free to open issues or submit pull requests on our GitHub repository.

‎pyproject.toml

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
[project]
2+
name ="postgresml-django"
3+
requires-python =">=3.8"
4+
version ="0.1.0"
5+
description ="PostgresML Django integration that enables automatic embedding of specified fields."
6+
authors = [
7+
{name ="PostgresML",email ="team@postgresml.org"},
8+
]
9+
readme ="README.md"
10+
keywords = ["django","machine learning","vector databases","embeddings"]
11+
classifiers = [
12+
"Programming Language :: Python :: 3",
13+
"License :: OSI Approved :: MIT License",
14+
"Operating System :: OS Independent",
15+
]
16+
dependencies = [
17+
"Django",
18+
"pgvector"
19+
]
20+
21+
[project.urls]
22+
Homepage ="https://postgresml.org"
23+
Repository ="https://github.com/postgresml/postgresml-django"
24+
Documentation ="https://github.com/postgresml/postgresml-django"
25+
26+
[build-system]
27+
requires = ["hatchling"]
28+
build-backend ="hatchling.build"

‎src/postgresml_django/__init__.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
from .mainimport*

‎src/postgresml_django/main.py

Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,95 @@
1+
fromdjango.dbimportmodels
2+
fromdjango.db.modelsimportFunc,Value,F
3+
fromdjango.db.models.functionsimportCast
4+
importpgvector.django
5+
importjson
6+
7+
8+
classGenerateEmbedding(Func):
9+
function="pgml.embed"
10+
template="%(function)s('%(transformer)s', %(expressions)s, '%(parameters)s')"
11+
allowed_default=False
12+
13+
def__init__(self,expression,transformer,parameters={}):
14+
self.transformer=transformer
15+
self.parameters=parameters
16+
super().__init__(expression)
17+
18+
defas_sql(self,compiler,connection,**extra_context):
19+
extra_context["transformer"]=self.transformer
20+
extra_context["parameters"]=json.dumps(self.parameters)
21+
returnsuper().as_sql(compiler,connection,**extra_context)


class Embed(models.Model):
    class Meta:
        abstract = True

    def save(self, *args, **kwargs):
        update_fields = kwargs.get("update_fields")

        # Check for fields to embed
        for field in self._meta.get_fields():
            if isinstance(field, VectorField):
                if not hasattr(self, field.field_to_embed):
                    raise AttributeError(
                        f"Field to embed does not exist: `{field.field_to_embed}`"
                    )

                # Only embed if it's a new instance, full save, or this field is being updated
                if not self.pk or update_fields is None or field.name in update_fields:
                    value_to_embed = getattr(self, field.field_to_embed)
                    setattr(
                        self,
                        field.name,
                        GenerateEmbedding(
                            Value(value_to_embed),
                            field.transformer,
                            field.transformer_store_parameters,
                        ),
                    )

        super().save(*args, **kwargs)

    @classmethod
    def vector_search(
        cls, field, query_text, distance_function=pgvector.django.CosineDistance
    ):
        # Look up the VectorField instance by name
        field_instance = getattr(cls._meta.model, field).field

        # Generate an embedding for the query text with the field's own
        # transformer, so query and stored vectors share the same dimensions
        query_embedding = GenerateEmbedding(
            Value(query_text),
            field_instance.transformer,
            field_instance.transformer_recall_parameters,
        )

        # Return the QuerySet
        return cls.objects.annotate(
            distance=distance_function(
                F(field),
                Cast(
                    query_embedding,
                    output_field=VectorField(dimensions=field_instance.dimensions),
                ),
            )
        ).order_by("distance")


class VectorField(pgvector.django.VectorField):
    def __init__(
        self,
        field_to_embed=None,
        dimensions=None,
        transformer=None,
        transformer_store_parameters=None,
        transformer_recall_parameters=None,
        *args,
        **kwargs,
    ):
        self.field_to_embed = field_to_embed
        self.transformer = transformer
        # Fall back to fresh dicts so field instances never share a mutable default
        self.transformer_store_parameters = transformer_store_parameters or {}
        self.transformer_recall_parameters = transformer_recall_parameters or {}
        super().__init__(*args, dimensions=dimensions, **kwargs)
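
To make the `GenerateEmbedding` template concrete, the sketch below reproduces its string interpolation in plain Python, without Django. It assumes that the template is filled with the `%` operator and that a `Value` expression compiles to a `%s` placeholder whose value is passed separately as a query parameter, which is how Django's `Func.as_sql` behaves. The helper name `render_embed_sql` is hypothetical and is not part of this commit.

```python
import json

# Same template string as GenerateEmbedding.template in main.py.
TEMPLATE = "%(function)s('%(transformer)s', %(expressions)s, '%(parameters)s')"


def render_embed_sql(transformer, parameters=None):
    """Hypothetical helper: fill the template the way the Func subclass would."""
    context = {
        "function": "pgml.embed",
        "transformer": transformer,
        # The text to embed stays a driver placeholder; it is never inlined.
        "expressions": "%s",
        "parameters": json.dumps(parameters or {}),
    }
    return TEMPLATE % context


store_sql = render_embed_sql("intfloat/e5-small-v2")
recall_sql = render_embed_sql(
    "mixedbread-ai/mxbai-embed-large-v1",
    {"query": "Represent this sentence for searching relevant passages:"},
)
print(store_sql)
print(recall_sql)
```

The transformer name and the JSON parameters are interpolated into the SQL text directly, while the text to embed travels as a bound parameter. A single quote inside either interpolated value would end the SQL string literal early, so these values are best kept to trusted, developer-supplied constants.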
