Nearest neighbor search for Rails

ankane/neighbor

Supports:

  • Postgres (cube and pgvector)
  • MariaDB 11.8
  • MySQL 9 (searching requires HeatWave) - experimental
  • SQLite (sqlite-vec) - experimental

Also available for Redis and S3 Vectors

Installation

Add this line to your application’s Gemfile:

gem "neighbor"

For Postgres

Neighbor supports two extensions: cube and pgvector. cube ships with Postgres, while pgvector supports more dimensions and approximate nearest neighbor search.

For cube, run:

rails generate neighbor:cube
rails db:migrate

For pgvector, install the extension and run:

rails generate neighbor:vector
rails db:migrate

For SQLite

Add this line to your application’s Gemfile:

gem "sqlite-vec"

And run:

rails generate neighbor:sqlite

Getting Started

Create a migration

class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    # cube
    add_column :items, :embedding, :cube
    # pgvector, MariaDB, and MySQL
    add_column :items, :embedding, :vector, limit: 3 # dimensions
    # sqlite-vec
    add_column :items, :embedding, :binary
  end
end

Add to your model

class Item < ApplicationRecord
  has_neighbors :embedding
end

Update the vectors

item.update(embedding: [1.0, 1.2, 0.5])

Get the nearest neighbors to a record

item.nearest_neighbors(:embedding, distance: "euclidean").first(5)

Get the nearest neighbors to a vector

Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)

Records returned from nearest_neighbors will have a neighbor_distance attribute

nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
nearest_item.neighbor_distance
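
As a mental model of what a nearest neighbor query returns, here is a minimal plain-Ruby sketch of exact search. It is not how the gem works internally (the ranking runs inside the database); `euclidean_distance` and `brute_force_neighbors` are hypothetical helpers written only for this illustration.

```ruby
# Conceptual sketch only: the gem performs this ranking in SQL.
# Both helpers below are hypothetical and not part of Neighbor.

def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Returns the k closest rows as [id, distance] pairs, nearest first.
def brute_force_neighbors(rows, query, k)
  rows
    .map { |id, embedding| [id, euclidean_distance(embedding, query)] }
    .sort_by { |_, distance| distance }
    .first(k)
end

rows = {
  1 => [1.0, 1.2, 0.5],
  2 => [0.9, 1.3, 1.1],
  3 => [5.0, 5.0, 5.0]
}

brute_force_neighbors(rows, [0.9, 1.3, 1.1], 2).each do |id, distance|
  puts "item #{id}: #{distance.round(3)}"
end
```

The distance in each pair plays the role of neighbor_distance on the returned records.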

See the additional docs for:

  • cube
  • pgvector
  • MariaDB
  • MySQL
  • sqlite-vec

Or check out some examples

cube

Distance

Supported values are:

  • euclidean
  • cosine
  • taxicab
  • chebyshev
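
For reference, the four metrics can be written out in plain Ruby. This is only a sketch of the standard definitions; the method names are hypothetical, and the database computes the real values.

```ruby
# Standard definitions of the four metrics, as hypothetical plain-Ruby helpers.

def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# 1 - cos(angle between a and b); undefined for zero vectors.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1 - dot / (norm_a * norm_b)
end

# Sum of absolute coordinate differences (L1).
def taxicab(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

# Largest absolute coordinate difference (L-infinity).
def chebyshev(a, b)
  a.zip(b).map { |x, y| (x - y).abs }.max
end

a = [1.0, 1.2, 0.5]
b = [0.9, 1.3, 1.1]
puts euclidean(a, b), cosine_distance(a, b), taxicab(a, b), chebyshev(a, b)
```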

For cosine distance with cube, vectors must be normalized before being stored.

class Item < ApplicationRecord
  has_neighbors :embedding, normalize: true
end
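
One way to see why normalization matters: for unit-length vectors, the squared Euclidean distance is exactly twice the cosine distance, so ranking by one gives the same order as ranking by the other. The sketch below checks that identity in plain Ruby. Whether the gem relies on this identity internally is an assumption, and all method names here are hypothetical.

```ruby
# Hypothetical helpers illustrating the unit-vector identity
#   |a - b|^2 = 2 * (1 - cos(a, b))

# Scale a vector to length 1 (a zero vector would produce NaN values).
def normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  v.map { |x| x / norm }
end

def squared_euclidean(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1 - dot / (norm_a * norm_b)
end

a = normalize([1.0, 1.2, 0.5])
b = normalize([0.9, 1.3, 1.1])
puts squared_euclidean(a, b)
puts 2 * cosine_distance(a, b)
```

The two printed values agree up to floating point rounding.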

For inner product with cube, see this example.

Dimensions

The cube type can have up to 100 dimensions by default. See the Postgres docs for how to increase this.

For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.

class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 3
end

pgvector

Distance

Supported values are:

  • euclidean
  • inner_product
  • cosine
  • taxicab
  • hamming
  • jaccard
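
Hamming and Jaccard distance apply to binary vectors. As a rough illustration of the two definitions, here is a plain-Ruby sketch over bit strings such as "101". The helper names are hypothetical, and returning 1.0 when no set bits are shared is an assumption of this sketch rather than documented behavior.

```ruby
# Hypothetical helpers over bit strings like "1010".

# Number of positions where the bits differ.
def hamming_distance(a, b)
  raise ArgumentError, "length mismatch" unless a.length == b.length
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# 1 - |A and B| / |A or B|, treating each string as a set of "on" positions.
def jaccard_distance(a, b)
  raise ArgumentError, "length mismatch" unless a.length == b.length
  pairs = a.chars.zip(b.chars)
  intersection = pairs.count { |x, y| x == "1" && y == "1" }
  return 1.0 if intersection.zero? # assumption: nothing shared means maximally distant
  union = pairs.count { |x, y| x == "1" || y == "1" }
  1.0 - intersection.to_f / union
end

puts hamming_distance("101", "100")
puts jaccard_distance("110", "011")
```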

Dimensions

The vector type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.

The halfvec type can have up to 16,000 dimensions, and half vectors with up to 4,000 dimensions can be indexed.

The bit type can have up to 83 million dimensions, and bit vectors with up to 64,000 dimensions can be indexed.

The sparsevec type can have up to 16,000 non-zero elements, and sparse vectors with up to 1,000 non-zero elements can be indexed.

Indexing

Add an approximate index to speed up queries. Create a migration with:

class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.1]
  def change
    add_index :items, :embedding, using: :hnsw, opclass: :vector_l2_ops
    # or
    add_index :items, :embedding, using: :ivfflat, opclass: :vector_l2_ops
  end
end

Use :vector_cosine_ops for cosine distance and :vector_ip_ops for inner product.
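
The operator class of the index needs to match the distance used in the query for the index to help. A small lookup like the one below can keep the two in sync. It is a hypothetical helper, not part of the gem, and it covers only the three pairs named in this section.

```ruby
# Hypothetical lookup table, not provided by Neighbor.
OPCLASS_FOR_DISTANCE = {
  "euclidean" => :vector_l2_ops,
  "cosine" => :vector_cosine_ops,
  "inner_product" => :vector_ip_ops
}.freeze

# Returns the opclass for a distance name, or raises for unlisted distances.
def opclass_for(distance)
  OPCLASS_FOR_DISTANCE.fetch(distance) do
    raise ArgumentError, "no opclass listed for #{distance.inspect}"
  end
end

puts opclass_for("cosine")
```

In a migration, the result could be passed as the opclass option of add_index.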

Set the size of the dynamic candidate list with HNSW

Item.connection.execute("SET hnsw.ef_search = 100")

Or the number of probes with IVFFlat

Item.connection.execute("SET ivfflat.probes = 3")

Half-Precision Vectors

Use the halfvec type to store half-precision vectors

class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :halfvec, limit: 3 # dimensions
  end
end

Half-Precision Indexing

Index vectors at half precision for smaller indexes

class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.1]
  def change
    add_index :items, "(embedding::halfvec(3)) halfvec_l2_ops", using: :hnsw
  end
end

Get the nearest neighbors

Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean", precision: "half").first(5)

Binary Vectors

Use the bit type to store binary vectors

class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :bit, limit: 3 # dimensions
  end
end

Get the nearest neighbors by Hamming distance

Item.nearest_neighbors(:embedding, "101", distance: "hamming").first(5)

Binary Quantization

Use expression indexing for binary quantization

class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.1]
  def change
    add_index :items, "(binary_quantize(embedding)::bit(3)) bit_hamming_ops", using: :hnsw
  end
end
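
To make the indexed expression concrete, here is a plain-Ruby sketch of what binary quantization does to a vector, assuming the usual rule that positive values become 1 and everything else becomes 0. The method below is a hypothetical stand-in for illustration, not pgvector's implementation.

```ruby
# Hypothetical stand-in for binary quantization:
# one bit per dimension, 1 for positive values and 0 otherwise.
def binary_quantize(vector)
  vector.map { |x| x > 0 ? "1" : "0" }.join
end

puts binary_quantize([0.9, -1.3, 1.1])
```

The resulting bit strings can then be compared by Hamming distance, which is why the index above uses bit_hamming_ops.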

Sparse Vectors

Use the sparsevec type to store sparse vectors

class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :sparsevec, limit: 3 # dimensions
  end
end

Get the nearest neighbors

embedding = Neighbor::SparseVector.new({0 => 0.9, 1 => 1.3, 2 => 1.1}, 3)
Item.nearest_neighbors(:embedding, embedding, distance: "euclidean").first(5)
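
The hash passed above maps each index to its non-zero value, and the second argument is the total number of dimensions. If you start from a dense array, a helper along these lines produces that pair. `dense_to_sparse` is a hypothetical name used only for this sketch.

```ruby
# Hypothetical helper: turn a dense array into the
# {index => value} hash and dimension count used for sparse vectors.
def dense_to_sparse(values)
  elements = {}
  values.each_with_index do |value, index|
    elements[index] = value unless value.zero?
  end
  [elements, values.length]
end

elements, dimensions = dense_to_sparse([0.0, 1.5, 0.0, 2.0])
p elements
p dimensions
```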

MariaDB

Distance

Supported values are:

  • euclidean
  • cosine
  • hamming

Indexing

Vector columns must use null: false to add a vector index

class CreateItems < ActiveRecord::Migration[8.1]
  def change
    create_table :items do |t|
      t.vector :embedding, limit: 3, null: false
      t.index :embedding, type: :vector
    end
  end
end

Binary Vectors

Use the bigint type to store binary vectors

class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :bigint
  end
end

Note: Binary vectors can have up to 64 dimensions

Get the nearest neighbors by Hamming distance

Item.nearest_neighbors(:embedding, 5, distance: "hamming").first(5)
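
Here the binary vector is stored as an integer, so the query value 5 stands for the bits 101. A short plain-Ruby sketch of that encoding and of Hamming distance between two integers follows. The helper names are hypothetical, and the sketch assumes non-negative values.

```ruby
# Hypothetical helpers; assumes non-negative integers.

# "101" -> 5
def bits_to_integer(bits)
  Integer(bits, 2)
end

# Count the bit positions where a and b differ (XOR, then count the ones).
def integer_hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

puts bits_to_integer("101")
puts integer_hamming_distance(5, 4)
```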

MySQL

Distance

Supported values are:

  • euclidean
  • cosine
  • hamming

Note: The DISTANCE() function is only available on HeatWave

Binary Vectors

Use the binary type to store binary vectors

class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :binary
  end
end

Get the nearest neighbors by Hamming distance

Item.nearest_neighbors(:embedding, "\x05", distance: "hamming").first(5)
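
The query value "\x05" is a single byte whose bits are 00000101. If you think in bit strings, Ruby's pack and unpack1 with the "B*" directive convert between the two forms. The wrapper names below are hypothetical.

```ruby
# Hypothetical wrappers around String#unpack1 and Array#pack.

# "00000101" -> "\x05" (most significant bit first)
def pack_bits(bits)
  [bits].pack("B*")
end

# "\x05" -> "00000101"
def unpack_bits(bytes)
  bytes.unpack1("B*")
end

p pack_bits("00000101")
p unpack_bits("\x05")
```

Note that "B*" pads on the right when the bit string is not a multiple of 8 long, so pack_bits("101") produces the byte 10100000 rather than 00000101.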

sqlite-vec

Distance

Supported values are:

  • euclidean
  • cosine
  • taxicab
  • hamming

Dimensions

For sqlite-vec, it’s a good idea to specify the number of dimensions to ensure all records have the same number.

class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 3
end

Virtual Tables

You can also use virtual tables

class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    # Rails 8+
    create_virtual_table :items, :vec0, [
      "id integer PRIMARY KEY AUTOINCREMENT NOT NULL",
      "embedding float[3] distance_metric=L2"
    ]
    # Rails < 8
    execute <<~SQL
      CREATE VIRTUAL TABLE items USING vec0(
        id integer PRIMARY KEY AUTOINCREMENT NOT NULL,
        embedding float[3] distance_metric=L2
      )
    SQL
  end
end

Use distance_metric=cosine for cosine distance

You can optionally ignore any shadow tables that are created

ActiveRecord::SchemaDumper.ignore_tables += ["items_chunks", "items_rowids", "items_vector_chunks00"]

Get the k nearest neighbors

Item.where("embedding MATCH ?", [1, 2, 3].to_s).where(k: 5).order(:distance)

Filter by primary key

Item.where(id: [2, 3]).where("embedding MATCH ?", [1, 2, 3].to_s).where(k: 5).order(:distance)

Int8 Vectors

Use the type option for int8 vectors

class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 3, type: :int8
end

Binary Vectors

Use the type option for binary vectors

class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 8, type: :bit
end

Get the nearest neighbors by Hamming distance

Item.nearest_neighbors(:embedding, "\x05", distance: "hamming").first(5)

Examples

OpenAI Embeddings

Generate a model

rails generate model Document content:text embedding:vector{1536}
rails db:migrate

And add has_neighbors

class Document < ApplicationRecord
  has_neighbors :embedding
end

Create a method to call the embeddings API

require "json"
require "net/http"
def embed(input)
  url = "https://api.openai.com/v1/embeddings"
  headers = {
    "Authorization" => "Bearer #{ENV.fetch("OPENAI_API_KEY")}",
    "Content-Type" => "application/json"
  }
  data = {
    input: input,
    model: "text-embedding-3-small"
  }
  # tap(&:value) raises unless the response is a success
  response = Net::HTTP.post(URI(url), data.to_json, headers).tap(&:value)
  JSON.parse(response.body)["data"].map { |v| v["embedding"] }
end

Pass your input

input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = embed(input)

Store the embeddings

documents = []
input.zip(embeddings) do |content, embedding|
  documents << {content: content, embedding: embedding}
end
Document.insert_all!(documents)

And get similar documents

document = Document.first
document.nearest_neighbors(:embedding, distance: "cosine").first(5).map(&:content)

See the complete code

Cohere Embeddings

Generate a model

rails generate model Document content:text embedding:bit{1536}
rails db:migrate

And add has_neighbors

class Document < ApplicationRecord
  has_neighbors :embedding
end

Create a method to call the embed API

require "json"
require "net/http"
def embed(input, input_type)
  url = "https://api.cohere.com/v2/embed"
  headers = {
    "Authorization" => "Bearer #{ENV.fetch("CO_API_KEY")}",
    "Content-Type" => "application/json"
  }
  data = {
    texts: input,
    model: "embed-v4.0",
    input_type: input_type,
    embedding_types: ["ubinary"]
  }
  # tap(&:value) raises unless the response is a success
  response = Net::HTTP.post(URI(url), data.to_json, headers).tap(&:value)
  # convert each embedding from an array of bytes to a bit string
  JSON.parse(response.body)["embeddings"]["ubinary"].map { |e| e.map { |v| v.chr.unpack1("B*") }.join }
end
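
The last line of embed turns each embedding from an array of byte values (0 to 255) into a string of 0s and 1s, which is the form used for the bit column. The sketch below isolates that step in plain Ruby and shows an equivalent spelling with format. Both method names are hypothetical.

```ruby
# Hypothetical helpers isolating the byte-to-bit-string step.

# Same approach as in embed: one character per byte, then 8 bits per character.
def ubinary_to_bits(bytes)
  bytes.map { |v| v.chr.unpack1("B*") }.join
end

# Equivalent alternative using zero-padded binary formatting.
def ubinary_to_bits_alt(bytes)
  bytes.map { |v| format("%08b", v) }.join
end

puts ubinary_to_bits([5, 255])
puts ubinary_to_bits_alt([5, 255])
```

Each byte contributes 8 bits, so a 1536-dimension binary embedding corresponds to 192 bytes.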

Pass your input

input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = embed(input, "search_document")

Store the embeddings

documents = []
input.zip(embeddings) do |content, embedding|
  documents << {content: content, embedding: embedding}
end
Document.insert_all!(documents)

Embed the search query

query = "forest"
query_embedding = embed([query], "search_query")[0]

And search the documents

Document.nearest_neighbors(:embedding, query_embedding, distance: "hamming").first(5).map(&:content)

See the complete code

Sentence Embeddings

You can generate embeddings locally with Informers.

Generate a model

rails generate model Document content:text embedding:vector{384}
rails db:migrate

And add has_neighbors

class Document < ApplicationRecord
  has_neighbors :embedding
end

Load a model

model = Informers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")

Pass your input

input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = model.(input)

Store the embeddings

documents = []
input.zip(embeddings) do |content, embedding|
  documents << {content: content, embedding: embedding}
end
Document.insert_all!(documents)

And get similar documents

document = Document.first
document.nearest_neighbors(:embedding, distance: "cosine").first(5).map(&:content)

See the complete code

Hybrid Search

You can use Neighbor for hybrid search with Informers.

Generate a model

rails generate model Document content:text embedding:vector{768}
rails db:migrate

And add has_neighbors and a scope for keyword search

class Document < ApplicationRecord
  has_neighbors :embedding
  scope :search, ->(query) {
    where("to_tsvector(content) @@ plainto_tsquery(?)", query)
      .order(Arel.sql("ts_rank_cd(to_tsvector(content), plainto_tsquery(?)) DESC", query))
  }
end

Create some documents

Document.create!(content: "The dog is barking")
Document.create!(content: "The cat is purring")
Document.create!(content: "The bear is growling")

Generate an embedding for each document

embed = Informers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
embed_options = {model_output: "sentence_embedding", pooling: "none"} # specific to embedding model
Document.find_each do |document|
  embedding = embed.(document.content, **embed_options)
  document.update!(embedding: embedding)
end

Perform keyword search

query = "growling bear"
keyword_results = Document.search(query).limit(20).load_async

And semantic search in parallel (the query prefix is specific to the embedding model)

query_prefix = "Represent this sentence for searching relevant passages: "
query_embedding = embed.(query_prefix + query, **embed_options)
semantic_results = Document.nearest_neighbors(:embedding, query_embedding, distance: "cosine").limit(20).load_async

To combine the results, use Reciprocal Rank Fusion (RRF)

Neighbor::Reranking.rrf(keyword_results, semantic_results).first(5)
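
For intuition, Reciprocal Rank Fusion scores each result by summing 1 / (k + rank) over every list it appears in, then sorts by that score. The plain-Ruby sketch below shows the general technique. It is not the gem's implementation, the method name is hypothetical, and k = 60 is the constant commonly used with RRF rather than a confirmed default of Neighbor::Reranking.rrf.

```ruby
# Generic sketch of Reciprocal Rank Fusion; not Neighbor's implementation.
# Each ranking is an array ordered best first. k: 60 is an assumed constant.
def reciprocal_rank_fusion(*rankings, k: 60)
  scores = Hash.new(0.0)
  rankings.each do |ranking|
    ranking.each_with_index do |item, index|
      rank = index + 1 # ranks start at 1
      scores[item] += 1.0 / (k + rank)
    end
  end
  # Highest combined score first
  scores.sort_by { |_, score| -score }.map(&:first)
end

keyword = [:a, :b, :c]
semantic = [:b, :d, :a]
p reciprocal_rank_fusion(keyword, semantic)
```

Items that rank well in both lists rise to the top, and no score normalization between keyword and semantic results is needed.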

Or a reranking model

rerank = Informers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-xsmall-v1")
results = (keyword_results + semantic_results).uniq
rerank.(query, results.map(&:content)).first(5).map { |v| results[v[:doc_id]] }

See the complete code

Sparse Search

You can generate sparse embeddings locally with Transformers.rb.

Generate a model

rails generate model Document content:text embedding:sparsevec{30522}
rails db:migrate

And add has_neighbors

class Document < ApplicationRecord
  has_neighbors :embedding
end

Load a model to generate embeddings

class EmbeddingModel
  def initialize(model_id)
    @model = Transformers::AutoModelForMaskedLM.from_pretrained(model_id)
    @tokenizer = Transformers::AutoTokenizer.from_pretrained(model_id)
    @special_token_ids = @tokenizer.special_tokens_map.map { |_, token| @tokenizer.vocab[token] }
  end
  def embed(input)
    feature = @tokenizer.(input, padding: true, truncation: true, return_tensors: "pt", return_token_type_ids: false)
    output = @model.(**feature)[0]
    values = Torch.max(output * feature[:attention_mask].unsqueeze(-1), dim: 1)[0]
    values = Torch.log(1 + Torch.relu(values))
    values[0.., @special_token_ids] = 0
    values.to_a
  end
end
model = EmbeddingModel.new("opensearch-project/opensearch-neural-sparse-encoding-v1")

Pass your input

input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = model.embed(input)

Store the embeddings

documents = []
input.zip(embeddings) do |content, embedding|
  documents << {content: content, embedding: Neighbor::SparseVector.new(embedding)}
end
Document.insert_all!(documents)

Embed the search query

query = "forest"
query_embedding = model.embed([query])[0]

And search the documents

Document.nearest_neighbors(:embedding, Neighbor::SparseVector.new(query_embedding), distance: "inner_product").first(5).map(&:content)

See the complete code

Disco Recommendations

You can use Neighbor for online item-based recommendations with Disco. We’ll use MovieLens data for this example.

Generate a model

rails generate model Movie name:string factors:cube
rails db:migrate

And add has_neighbors

class Movie < ApplicationRecord
  has_neighbors :factors, dimensions: 20, normalize: true
end

Fit the recommender

data = Disco.load_movielens
recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)

Store the item factors

movies = []
recommender.item_ids.each do |item_id|
  movies << {name: item_id, factors: recommender.item_factors(item_id)}
end
Movie.create!(movies)

And get similar movies

movie = Movie.find_by(name: "Star Wars (1977)")
movie.nearest_neighbors(:factors, distance: "cosine").first(5).map(&:name)

See the complete code for cube and pgvector

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/neighbor.git
cd neighbor
bundle install
# Postgres
createdb neighbor_test
bundle exec rake test:postgresql
# SQLite
bundle exec rake test:sqlite
# MariaDB
docker run -e MARIADB_ALLOW_EMPTY_ROOT_PASSWORD=1 -e MARIADB_DATABASE=neighbor_test -p 3307:3306 mariadb:11.8
bundle exec rake test:mariadb
# MySQL
docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 -e MYSQL_DATABASE=neighbor_test -p 3306:3306 mysql:9
bundle exec rake test:mysql
