Hybrid Search
In addition to vectors, Milvus supports data types such as boolean, integers, floating-point numbers, and more. A collection in Milvus can hold multiple fields for accommodating different data features or properties. Milvus is a flexible vector database that pairs scalar filtering with powerful vector similarity search.
A hybrid search is a vector similarity search during which you filter scalar data by specifying a boolean expression.
For example:
In Python
```python
import random

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

# Connect to server
connections.connect("default", host='localhost', port='19530')

# Create a collection
collection_name = "test_collection_search"
schema = CollectionSchema([
    FieldSchema("film_id", DataType.INT64, is_primary=True),
    FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2),
])
collection = Collection(collection_name, schema, using='default', shards_num=2)

# Insert some random data
data = [
    [i for i in range(10)],
    [[random.random() for _ in range(2)] for _ in range(10)],
]
collection.insert(data)
collection.num_entities

# Load collection to memory
collection.load()

# Conduct a similarity search with an expression filtering ID column
search_param = {
    "data": [[1.0, 1.0]],
    "anns_field": "films",
    "param": {"metric_type": "L2"},
    "limit": 2,
    "expr": "film_id in [2,4,6,8]",
}
res = collection.search(**search_param)

# Check results
hits = res[0]
print(f"- Total hits: {len(hits)}, hits ids: {hits.ids} ")
print(f"- Top1 hit id: {hits[0].id}, distance: {hits[0].distance}, score: {hits[0].score} ")
```
In Node.js
```javascript
import { MilvusClient } from "@zilliz/milvus2-sdk-node";
// DataType (used below) is also exported by the SDK and must be imported too;
// its import path depends on the SDK version.

const milvusClient = new MilvusClient("localhost:19530");

// Prepare a test collection
const COLLECTION_NAME = "test_collection_search";
await milvusClient.collectionManager.createCollection({
  collection_name: COLLECTION_NAME,
  fields: [
    {
      name: "films",
      description: "vector field",
      data_type: DataType.FloatVector,
      type_params: {
        dim: "2",
      },
    },
    {
      name: "film_id",
      data_type: DataType.Int64,
      autoID: false,
      is_primary_key: true,
      description: "",
    },
  ],
});

// Insert some random data
let id = 1;
const entities = Array.from({ length: 10 }, () => ({
  films: Array.from({ length: 2 }, () => Math.random() * 10),
  film_id: id++,
}));
await milvusClient.dataManager.insert({
  collection_name: COLLECTION_NAME,
  fields_data: entities,
});

// Load collection to memory & conduct a search with a boolean expression
await milvusClient.collectionManager.loadCollection({
  collection_name: COLLECTION_NAME,
});
const result = await milvusClient.dataManager.search({
  collection_name: COLLECTION_NAME,
  // partition_names: [],
  expr: "film_id in [1,4,6,8]",
  vectors: [entities[0].films],
  search_params: {
    anns_field: "films",
    topk: "4",
    metric_type: "L2",
    params: JSON.stringify({ nprobe: 10 }),
  },
  vector_type: DataType.FloatVector, // float vector -> 101 (100 is BinaryVector)
});

// The search result will look like:
// {
//   status: { error_code: 'Success', reason: '' },
//   results: [
//     { score: 0, id: '1' },
//     { score: 9.266796112060547, id: '4' },
//     { score: 28.263811111450195, id: '8' },
//     { score: 41.055686950683594, id: '6' }
//   ]
// }
```
Milvus conducts scalar filtering by searching with predicates. A predicate expression, when evaluated, returns either TRUE or FALSE.
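To make the filtering semantics concrete, here is a minimal pure-Python sketch of what a hybrid search computes conceptually: rows whose scalar field fails the predicate are excluded, and only the remaining rows are ranked by vector distance. It is an illustration under stated assumptions, not how Milvus executes searches internally. The in-memory `rows` list and the `filtered_search` helper are hypothetical, and the sketch ranks by squared L2 distance, which orders results the same way as L2.

```python
# Hypothetical in-memory data: (film_id, 2-D vector). Not a Milvus API.
rows = [
    (1, [0.0, 0.0]),
    (2, [1.0, 1.0]),
    (4, [2.0, 2.0]),
    (6, [0.5, 1.0]),
    (8, [3.0, 0.0]),
]


def squared_l2(a, b):
    """Squared Euclidean distance; ranks vectors in the same order as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_search(rows, query, predicate, limit):
    """Hypothetical helper: drop rows whose scalar field fails `predicate`,
    then return the `limit` closest remaining rows as (film_id, distance)."""
    survivors = [(fid, vec) for fid, vec in rows if predicate(fid)]
    scored = sorted(
        ((fid, squared_l2(query, vec)) for fid, vec in survivors),
        key=lambda pair: pair[1],
    )
    return scored[:limit]


# Rough analogue of expr="film_id in [2,4,6,8]", limit=2, query [1.0, 1.0].
hits = filtered_search(rows, [1.0, 1.0], lambda fid: fid in {2, 4, 6, 8}, limit=2)
print(hits)  # [(2, 0.0), (6, 0.25)]
```

Row 1 never appears in the result, however close it is to the query, because the predicate removes it before ranking.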
The following EBNF grammar rules describe boolean expressions:
```
Expr = LogicalExpr | NIL;

LogicalExpr = LogicalExpr BinaryLogicalOp LogicalExpr
            | UnaryLogicalOp LogicalExpr
            | "(" LogicalExpr ")"
            | SingleExpr;

BinaryLogicalOp = "&&" | "and" | "||" | "or";

UnaryLogicalOp = "not";

SingleExpr = TermExpr | CompareExpr;

TermExpr = IDENTIFIER "in" ConstantArray;

Constant = INTEGER | FLOAT;

ConstantExpr = Constant
             | ConstantExpr BinaryArithOp ConstantExpr
             | UnaryArithOp ConstantExpr;

ConstantArray = "[" ConstantExpr { "," ConstantExpr } "]";

UnaryArithOp = "+" | "-";

BinaryArithOp = "+" | "-" | "*" | "/" | "%" | "**";

CompareExpr = IDENTIFIER CmpOp IDENTIFIER
            | IDENTIFIER CmpOp ConstantExpr
            | ConstantExpr CmpOp IDENTIFIER
            | ConstantExpr CmpOpRestricted IDENTIFIER CmpOpRestricted ConstantExpr;

CmpOpRestricted = "<" | "<=";

CmpOp = ">" | ">=" | "<" | "<=" | "==" | "!=";
```
The following table describes each symbol used in the boolean expression rules above:
Notation | Description |
---|---|
= | Definition. |
, | Concatenation. |
; | Termination. |
\| | Alternation. |
{...} | Repetition. |
(...) | Grouping. |
NIL | Empty. The expression can be an empty string. |
INTEGER | Integers such as 1, 2, 3. |
FLOAT | Float numbers such as 1.0, 2.0. |
Constant | Integers or float numbers. |
IDENTIFIER | Identifier. In Milvus, the IDENTIFIER represents the field name. |
LogicalOp | A LogicalOp is a logical operator that combines more than one relational operation in one comparison. The return value of a LogicalOp is either TRUE (1) or FALSE (0). There are two types of LogicalOps: BinaryLogicalOps and UnaryLogicalOps. |
UnaryLogicalOp | UnaryLogicalOp refers to the unary logical operator "not". |
BinaryLogicalOp | A BinaryLogicalOp is a logical operator that acts on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules. |
ArithmeticOp | An ArithmeticOp, namely an arithmetic operator, performs mathematical operations such as addition and subtraction on operands. |
UnaryArithOp | A UnaryArithOp is an arithmetic operator that performs an operation on a single operand. The negative UnaryArithOp changes a positive expression into a negative one, or the other way round. |
BinaryArithOp | A BinaryArithOp, namely a binary operator, performs operations on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules. |
CmpOp | CmpOp is a relational operator that performs an action on two operands. |
CmpOpRestricted | CmpOpRestricted is restricted to "Less than" (<) and "Less than or equal" (<=). |
ConstantExpr | ConstantExpr can be a Constant, a BinaryArithOp on two ConstantExprs, or a UnaryArithOp on a single ConstantExpr. It is defined recursively. |
ConstantArray | ConstantArray is wrapped in square brackets, and ConstantExpr can be repeated within the brackets, separated by commas. A ConstantArray must include at least one ConstantExpr. |
TermExpr | TermExpr checks whether the value of an IDENTIFIER appears in a ConstantArray. TermExpr uses the "in" operator. |
CompareExpr | A CompareExpr (comparison expression) can be a relational operation on two IDENTIFIERs, a relational operation on one IDENTIFIER and one ConstantExpr, or a ternary operation on two ConstantExprs and one IDENTIFIER. |
SingleExpr | A SingleExpr (single expression) can be either a TermExpr or a CompareExpr. |
LogicalExpr | A LogicalExpr can be a BinaryLogicalOp on two LogicalExprs, or a UnaryLogicalOp on a single LogicalExpr, or a LogicalExpr grouped within parentheses, or a SingleExpr. The LogicalExpr is defined recursively. |
Expr | Expr, an abbreviation meaning expression, can be LogicalExpr or NIL. |
Logical operators combine two expressions into a single boolean result.
Symbol | Operation | Example | Description |
---|---|---|---|
'and' && | and | expr1 && expr2 | True if both expr1 and expr2 are true. |
'or' \|\| | or | expr1 \|\| expr2 | True if either expr1 or expr2 is true. |
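Because the filter is passed to the SDK as a plain string (the `expr` argument in the examples above), it can be convenient to assemble it programmatically. The sketch below uses hypothetical helpers that are not part of pymilvus: `term_expr` renders a TermExpr from a Python list, and `combine` joins sub-expressions with a logical operator, wrapping each operand in parentheses so that the relative precedence of `&&` and `||` cannot change the intended meaning.

```python
def term_expr(field, values):
    """Hypothetical helper: render a TermExpr such as 'film_id in [2, 4, 6]'."""
    if not values:
        # The grammar requires at least one ConstantExpr inside the brackets.
        raise ValueError("ConstantArray must contain at least one value")
    return f"{field} in [{', '.join(str(v) for v in values)}]"


def combine(op, *exprs):
    """Hypothetical helper: join sub-expressions with a BinaryLogicalOp,
    parenthesizing each operand to make the grouping explicit."""
    if op not in ("&&", "and", "||", "or"):
        raise ValueError(f"unsupported logical operator: {op!r}")
    return f" {op} ".join(f"({e})" for e in exprs)


expr = combine(
    "||",
    combine("&&", term_expr("film_id", [2, 4, 6, 8]), "film_id > 3"),
    "film_id == 1",
)
print(expr)
# ((film_id in [2, 4, 6, 8]) && (film_id > 3)) || (film_id == 1)
```

The resulting string could then be passed as `expr` to a search call like the ones shown earlier.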
Binary arithmetic operators take two operands, perform a basic arithmetic operation, and return the result.
Symbol | Operation | Example | Description |
---|---|---|---|
+ | Addition | a + b | Add the two operands. |
- | Subtraction | a - b | Subtract the second operand from the first operand. |
* | Multiplication | a * b | Multiply the two operands. |
/ | Division | a / b | Divide the first operand by the second operand. |
** | Power | a ** b | Raise the first operand to the power of the second operand. |
% | Modulo | a % b | Divide the first operand by the second operand and yield the remainder portion. |
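In the grammar above, arithmetic operators appear only inside a ConstantExpr, which is defined recursively. As a rough illustration of that recursion, the sketch below folds a ConstantExpr encoded as nested Python tuples into a single number. The tuple encoding and the `fold` function are hypothetical, and mapping `/` and `%` to Python's true division and modulo is an assumption that may not match Milvus for every operand (for example, negative numbers).

```python
import operator

# Binary arithmetic operators from the table, mapped to Python equivalents.
BINARY_ARITH = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,  # assumption: true division
    "%": operator.mod,      # assumption: Python modulo semantics
    "**": operator.pow,
}


def fold(node):
    """Evaluate a hypothetical ConstantExpr encoding:
    - a number                    -> Constant
    - ("+", child) / ("-", child) -> UnaryArithOp ConstantExpr
    - (op, left, right)           -> ConstantExpr BinaryArithOp ConstantExpr
    """
    if isinstance(node, (int, float)):
        return node
    if len(node) == 2:
        op, child = node
        value = fold(child)
        return -value if op == "-" else +value
    op, left, right = node
    return BINARY_ARITH[op](fold(left), fold(right))


# The ConstantArray [2 * 3, 10 % 4, 2 ** 3 - 1, -(1 + 1)] folds to:
array = [("*", 2, 3), ("%", 10, 4), ("-", ("**", 2, 3), 1), ("-", ("+", 1, 1))]
print([fold(e) for e in array])  # [6, 2, 7, -2]
```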
Relational operators use symbols to check for equality, inequality, or relative order between two expressions.
Symbol | Operation | Example | Description |
---|---|---|---|
< | Less than | a < b | True if a is less than b. |
> | Greater than | a > b | True if a is greater than b. |
== | Equal | a == b | True if a is equal to b. |
!= | Not equal | a != b | True if a is not equal to b. |
<= | Less than or equal | a <= b | True if a is less than or equal to b. |
>= | Greater than or equal | a >= b | True if a is greater than or equal to b. |
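The relational operators also appear in the restricted ternary form of CompareExpr (`ConstantExpr CmpOpRestricted IDENTIFIER CmpOpRestricted ConstantExpr`), which expresses a range such as `1 < film_id <= 5`. Assuming the usual mathematical reading, such a range holds when both comparisons hold. The sketch below checks a value against such a range and, like CmpOpRestricted, rejects any operator other than `<` and `<=`. The `in_range` function is hypothetical and only illustrates the semantics.

```python
import operator

# Only "<" and "<=" are allowed in the ternary form (CmpOpRestricted).
RESTRICTED = {"<": operator.lt, "<=": operator.le}


def in_range(low, op1, value, op2, high):
    """Evaluate `low op1 value op2 high`, e.g. 1 < film_id <= 5,
    as (low op1 value) and (value op2 high)."""
    if op1 not in RESTRICTED or op2 not in RESTRICTED:
        raise ValueError("ternary comparisons only accept '<' and '<='")
    return RESTRICTED[op1](low, value) and RESTRICTED[op2](value, high)


# Which film IDs from 0 to 6 satisfy 1 < film_id <= 5?
selected = [fid for fid in range(7) if in_range(1, "<", fid, "<=", 5)]
print(selected)  # [2, 3, 4, 5]
```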
The following table lists the precedence and associativity of operators. Operators are listed top to bottom, in descending precedence.
Precedence | Operator | Description | Associativity |
---|---|---|---|
1 | + - | UnaryArithOp | Left-to-right |
2 | not | UnaryLogicalOp | Right-to-left |
3 | ** | BinaryArithOp | Left-to-right |
4 | * / % | BinaryArithOp | Left-to-right |
5 | + - | BinaryArithOp | Left-to-right |
6 | < <= > >= | CmpOp | Left-to-right |
7 | == != | CmpOp | Left-to-right |
8 | && and | BinaryLogicalOp | Left-to-right |
9 | \|\| or | BinaryLogicalOp | Left-to-right |
- Expressions are normally evaluated from left to right. Complex expressions are evaluated one at a time. The order in which the expressions are evaluated is determined by the precedence of the operators used.
- If an expression contains two or more operators with the same precedence, the operator to the left is evaluated first.
- To process a lower-precedence operation first, enclose it within parentheses.
- Parentheses can be nested within expressions. Innermost parenthetical expressions are evaluated first.
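To see how the precedence table and the rules above interact, the following sketch evaluates a subset of the expression language (numbers, field names, parentheses, unary operators, and the binary operators from the table; no `in` lists or ternary comparisons) against a dictionary of field values, using precedence climbing. It is a hypothetical illustration, not Milvus's parser: it follows the table literally, so unary operators bind tighter than every binary operator and `**` groups left-to-right, and `/` uses Python true division as an assumption.

```python
import operator
import re

# Binary operators: (binding power, function). Higher power binds tighter;
# the numbers mirror precedence rows 3-9 of the table, and every binary
# operator is left-associative ("Left-to-right").
BINARY = {
    "||": (1, lambda a, b: bool(a) or bool(b)),
    "or": (1, lambda a, b: bool(a) or bool(b)),
    "&&": (2, lambda a, b: bool(a) and bool(b)),
    "and": (2, lambda a, b: bool(a) and bool(b)),
    "==": (3, operator.eq), "!=": (3, operator.ne),
    "<": (4, operator.lt), "<=": (4, operator.le),
    ">": (4, operator.gt), ">=": (4, operator.ge),
    "+": (5, operator.add), "-": (5, operator.sub),
    "*": (6, operator.mul), "/": (6, operator.truediv), "%": (6, operator.mod),
    "**": (7, operator.pow),
}

TOKEN = re.compile(r"\s*(\*\*|&&|\|\||<=|>=|==|!=|[-+*/%<>()]|\d+\.\d+|\d+|[A-Za-z_]\w*)")


def tokenize(text):
    """Split an expression string into operator, number, and name tokens."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, fields):
    """Evaluate `text`, resolving field names from the `fields` dict."""
    tokens = tokenize(text)
    if not tokens:
        return True  # NIL: treat an empty expression as "no filter"
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_unary():
        # Prefix operators apply to the operand immediately after them,
        # which makes them bind tighter than any binary operator (rows 1-2).
        tok = next_token()
        if tok == "not":
            return not parse_unary()
        if tok in ("+", "-"):
            value = parse_unary()
            return -value if tok == "-" else +value
        if tok == "(":
            value = parse_binary(0)
            if next_token() != ")":
                raise SyntaxError("expected ')'")
            return value
        if re.fullmatch(r"\d+", tok):
            return int(tok)
        if re.fullmatch(r"\d+\.\d+", tok):
            return float(tok)
        if tok in fields:
            return fields[tok]
        raise SyntaxError(f"unexpected token {tok!r}")

    def parse_binary(min_power):
        # Consume operators that bind tighter than min_power; parsing the
        # right operand at the operator's own power groups equal-precedence
        # operators left-to-right.
        left = parse_unary()
        while pos < len(tokens) and tokens[pos] in BINARY and BINARY[tokens[pos]][0] > min_power:
            power, func = BINARY[next_token()]
            left = func(left, parse_binary(power))
        return left

    result = parse_binary(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("2 + 3 * 4", {}))    # 14
print(evaluate("(2 + 3) * 4", {}))  # 20
print(evaluate("2 ** 3 ** 2", {}))  # 64, i.e. (2 ** 3) ** 2
print(evaluate("film_id > 2 && film_id < 8 || film_id == 10", {"film_id": 10}))  # True
```

Because `&&` binds tighter than `||`, the last expression is read as `(film_id > 2 && film_id < 8) || film_id == 10`; adding parentheses is the way to force a different grouping.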