Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Cpp engine of DingoDB Expression.

License

NotificationsYou must be signed in to change notification settings

dingodb/dingo-libexpr

Repository files navigation

Build

This project is the coprocessor part ofDingo Expression, written in C++. The project is made standalone to be easily integrated into other C++ projects.

Getting Started

#include"expr/runner.h"usingnamespacedingodb::expr;

Assumebytes is an array of typeexpr::Byte[size], which contains the encoded expression

Runner runner;runner.Decode(bytes, size);runner.Run();Operand result = runner.Get();

Operand is a class to hold any supported data types. AnOperand object can be created from any supported typeT by the constructorOperand<T>(T value), and the value contained in anOperand object can be retrieved by

TOperand::GetValue()const;

Operand can also hold aNULL value, and can be tested by the overloaded "equals/non-equals" operator, likeoperand == nullptr oroperand != nullptr.

Variables

Dingo Expression Coprocessor supports variables. Internally, variables are indexed by integers, start from 0. If the expression contains variables, aTuple must be bind to theRunner before run

Tuple tuple{1,"Alice",100.0};runner.BindTuple(&tuple);

whereTuple is a type alias ofstd::vector<Operand>. The variables indexed by 0, 1, 2 would be assigned the 1st, 2nd, 3rd values of thetuple, respectively.

Note:

  • Therunner does not take ownership of thetuple, which should be released by the caller if necessary
  • The number of elemnets in thetuple is not checked when retrieving the value and it is the caller's duty to ensure the index of variables are valid

Relational Algebra

Dingo Expression Coprocessor has also limited implementation for relational algebra. To use it, add the following to the source code

#include"rel/rel_runner.h"usingnamespacedingodb::expr;usingnamespacedingodb::rel;

Assumebytes is an array of typeexpr::Byte[size], which contains the encoded relational algebra

auto *rel =new RelRunner();rel->Decode(buf, size);

ThenTuples can be put into theRelRunner

for (int i =0; i < tuples.size(); ++i) {constauto *output = rel->Put(tuples[i]);}

Theoutput returned is anotherTuple ornullptr. The former is a valid result and the latter means no result can be retrieved immediately for the input may be fitered out or cached.

If there are datum cached in theRelRunner (which is typically when the relational algebra contains any aggregation), the cached results must be taken out after all tuples are put in

while ((constauto *output = rel->Get()) != nullptr) {do_some_thing(output);};

Note:

  • TheRelRunner takes over the ownership of theTuple put in. The caller must not try to release it
  • If theoutput returned either byPut orGet is notnullptr, it must be released by the caller
  • The implementation ofRelRunner is not thread-safe

Implementations

Expression Evaluating

TheRunner class is for expression evaluating. The following figure illustrate the evaluating process, in which the expression is like2 + T[0] * 3

Implementation of runner

TheRunner contains an operand stack and an operator verctor. The operator vector is constructed byDecode method from the encoded bytes of an expression. Each operator can manipulate (push/pop) operands in the operand stack following pre-defined process. WhenRun method is called, operators in the vector are carried out one by one. By callingGet method, the top elemement in the stack is poped out as the returned result. Mostly, there is only one oprand left in the stack afterRun for a valid expression.

Encodings

Data Types

Data types are represented by 4 bit, as described in the following table

TypeDescriptionDecimal CodeHex CodeCorresponding C++ Type
INT3232 bit integer10x1int32_t
INT6464 bit integer20x2int64_t
BOOLboolean30x3bool
FLOATsingle precision floating point40x4float
DOUBLEdouble precision floating point50x5double
DECIMALarbitrary-precision decimal number60x6Not supported yet
STRINGstring70x7std::shared_ptr<std::string>

Note the C++ types are wrapped inOperand class.

Immediate Numbers

Immediate numbers are const values embedded in the encoded expressions, to implement const, index of variables, length of arrays, etc. The encoding of immediate numbers is described in the following table

Type of Immediate NumberDescription of Encoding
INT32See encoding of VarInt inProtocol Buffers
INT64The same as above
FLOAT4 bytes big-endian representation
DOUBLE8 bytes big-endian representation
DECIMALTo be defined
STRINGEncoding the byte length of it first, followed by all the types of it. The length is encoded asINT32 type

End of Expression

If the expressions are used in relational algebra, a special byte may be needed to indicate the end of expression. Byte0x00 is used for that purpose. But byte0x00 is not always means the termination of expression, because there may be any bytes in the presentation of an immediate number.

Operators

The expressions Dingo Expression Coprocessor processed are represented in suffix style, i.e. the operator is put after its operands. Consts and variables are also treated as operator in this representation. All the operators are encoded into unique bytes representation. Operators may have one or more immediate numbers followed. Generally

  • For 1-byte representation, the HSB is set to0, and the lower 4 bits can be used to indicate a data type if needed
  • For 2-bytes representation, the HSB is set to1, and the higher 4 bits and the lower 4 bits of the 2nd byte can be used to indicate two data types individually if needed
  • No byte0x00 or0xFF is allowed in the representation except for immediate numbers

The encoding of operators is illustrated in the following figure

Encoding of operators

One-byte Operators

The encoding of one-byte operators is as follows (The template-likeT is any of theData Types)

OperatorHigher 4 BitsLower 4 BitsImmediate NumberDescription
NULL<T>0x0Encode typeTNoneNULL value
CONST<T>0x1Encode typeTT type valueT type const,T !=BOOL
CONST<BOOL>0x1Encode typeBOOLNoneBOOL valuetrue
CONST_N<T>0x2Encode typeTT type valueT type const,T ==INT32 orT ==INT64, the real value is the inverse of the encoded immediate number
CONST_N<BOOL>0x2Encode typeBOOLNoneBOOL valuefalse
VAR<T>0x3Encode typeTINT32 type valueT type variable indexed by an integer
VAR_S<T>0x4Encode typeTSTRING type valueT type variable indexed by a string.Not implemented yet
NOT0x50x1NoneUnaryNOT
AND0x50x2NoneBinaryAND
OR0x50x3NoneBinaryOR
AND_FUN0x50x4INT32 type valueVariadicAND, the immediate number is the number of parameters.Not implemented yet
OR_FUN0x50x5INT32 type valueVariadicOR, the immediate number is the number of parameters.Not implemented yet

Arrays

OperatorHigher 4 BitsLower 4 BitsImmediate number 0Imediate number 1..NDescription
ARRAY<T>0x6Encode typeTINT32 type numberN, the number of elementsencodeN values ofT type continouslyArray ofT type consts. Not implement yet
ARRAY<AGG>NoneNoneINT32 type numberN, the number of elementsencodeN aggregation functions continouslyArray of aggregations, used in relational algebra
ARRAY<CONST>NoneNoneINT32 type numberN, the number of elementsencodeNCONST/CONST_N expressions continouslyTuple of consts.Not implemented yet

Map (Key-Value Data)

Operator1st ByteHigher 4 Bits of 2nd ByteLower 4 Bits of 2nd ByteImmediate Number 0Immediate Number 1..2NDescription
MAP<K, V>0x80Encode typeKEncode typeVINT32 type numberN, tye number of key-value pairsencodeK andV type consts alternatelyMap of consts,K andV is the type of key and value.Not implemented yet

Two-bytes operators

The encoding of two-bytes operators is as follows (The template-likeT is any of theData Types)

Operator1st ByteHigher 4 Bits of 2nd ByteLower 4 Bits of 2nd ByteImmediate NumberDescription
POS<T>0x810x0Encode typeTNoneUnary+
SUM<T>0x810x1Encode typeTINT32 type value, the number of parametersVaridiacSUM.Not implemented yet
NEG<T>0x820x0Encode typeTNoneUnary-
ADD<T>0x830x0Encode typeTNoneBinary+ 
SUB<T>0x840x0Encode typeTNoneBinary-
MUL<T>0x850x0Encode typeTNoneBinary*
DIV<T>0x860x0Encode typeTNoneBinary/
MOD<T>0x870x0Encode typeTNoneBinary%
EQ<T>0x910x0Encode typeTNoneBinary==
GE<T>0x920x0Encode typeTNoneBinary>=
GT<T>0x930x0Encode typeTNoneBinary>
LE<T>0x940x0Encode typeTNoneBinary<=
LT<T>0x950x0Encode typeTNoneBinary<
NE<T>0x960x0Encode typeTNoneBinary<> 
IS_NULL<T>0xA10x0Encode typeTNoneUnaryIS_NULL function
IS_TRUE<T>0xA20x0Encode typeTNoneUnaryIS_TRUE function
IS_FALSE<T>0xA30x0Encode typeTNoneUnaryIS_FALSE function
MIN<T>0xB10x0Encode typeTNoneBinaryMIN function
VARG_MIN<T>0xB10x1Encode typeTINT32 type value, the number of parametersVariadicMIN function
MAX<T>0xB20x0Encode typeTNoneBinaryMAX function
VARG_MAX<T>0xB20x1Encode typeTINT32 type value, the number of parametersVariadicMAX function
ABS<T>0xB30x0Encode typeTNoneUnaryABS function
CAST<D, T>0xF0Encode typeDEncode typeTNoneCasting fromT type toD type

Functions

The coding of functions is as follows

Operator1st Byte2nd ByteImmediate NumberDescription
FUN0xF11-byte positive integer, the sequence number of the functionNonePre-defined functions
VARG_FUN0xF21-byte positive integer, the sequence number of the functionINT32 type value, the number of parametersPre-defined varidiac functions.Not implemented yet

Sequence Number of Functions

Mathematical Functions

FunctionType of ParametersType of Return ValueSequence NumberDescription
CEILDOUBLEDOUBLE0x01Round up to the smallest integer
FLOORDOUBLEDOUBLE0x02Round down to the largest integer
ROUNDINT64,INT32INT640x03Round.Not implemented yet
ROUNDDOUBLE,INT32DOUBLE0x04Round.Not implemented yet
POWDOUBLEDOUBLE0x05Power.Not implemented yet
POWINT64INT640x06Power.Not implemented yet
SINDOUBLEDOUBLE0x07Sine
COSDOUBLEDOUBLE0x08Cosine
TANDOUBLEDOUBLE0x09Tangent
ASINDOUBLEDOUBLE0x0AArc sine
ACOSDOUBLEDOUBLE0x0BArc cosine
ATANDOUBLEDOUBLE0x0CArc tangent
SINHDOUBLEDOUBLE0x0DHyperbolic sine
COSHDOUBLEDOUBLE0x0EHyperbolic cosine
TANHDOUBLEDOUBLE0x0FHyperbolic tangent
EXPDOUBLEDOUBLE0x10Exponential
LOGDOUBLEDOUBLE0x11Logarithm

String Functions

FunctionType of ParametersType of Return ValueSequence NumberDescription
CHAR_LENGTHSTRINGINT320x20Return the length of a string.Not implemented yet
CONCATSTRING,STRINGSTRING0x21Concatenate two strings
LOWERSTRINGSTRING0x22Converts to lowercase
UPPERSTRINGSTRING0x23Converts to uppercase
LEFTSTRING,INT32STRING0x24Sub string from left
RIGHTSTRING,INT32STRING0x25Sub string from right
TRIMSTRINGSTRING0x26Trim whitespaces
TRIMSTRING,STRINGSTRING0x27Trim specified string.Not implemented yet
LTRIMSTRINGSTRING0x28Trim whitespaces from left
LTRIMSTRING,STRINGSTRING0x29Trim specified string from left.Not implemented yet
RTRIMSTRINGSTRING0x2ATrim whitespaces from right
RTRIMSTRING,STRINGSTRING0x2BTrime specified string from right.Not implemented yet
SUBSTRSTRING,INT32,INT32STRING0x2CSub string from a "position" to another "position". The 1st "position" is0
SUBSTRSTRING,INT32STRING0x2DSub string from a "position" to the end. The 1st "position" is0
MIDSTRING,INT32,INT32STRING0x2ESub string from a "position" to the specified length. The 1st "position" is1
MIDSTRING,INT32STRING0x2FSub string from a "position" to the end. The 1st "position" is1
REPEATSTRING,INT32STRING0x30Repeat the string.Not implemented yet
REVERSESTRINGSTRING0x31Reverse the string.Not implemented yet
REPLACESTRING,STRING,STRINGSTRING0x32Search and replace.Not implemented yet
LOCATESTRING,STRING,INT32INT320x33Find the "position" of a string in another string. The 1st "position" is1.Not implemented yet
LOCATESTRING,STRINGINT320x34Find the "position" of a string in another string. The 1st "position" is1.Not implemented yet
FORMATDOUBLE,INT32STRING0x35Formatted output of a number.Not implemented yet

Aggregation Functions

Aggregation functions are used only in relational algebra. Complicated aggregations such asAVG is not supported here. Actually,AVG can be converted toSUM andCOUNT with a projection before encoded.

Encoding of aggregations does not follow the way to encode normal function and has no leading0xF1 byte, but is more like that of one-byte operators, as in the following table

AggregationType of ParameterType of Return ValueHigher 4 BitsLow 4 BitsImmediate NumberDescription
COUNT_ALLNoneINT640x10x0NoneCount all rows
COUNT<T>TINT640x1Encode typeTINT32 type value, the column indexCount only non-null values
SUM<T>TT0x2Encode typeTINT32 type value, the column indexSum the values
MAX<T>TT0x3Encode typeTINT32 type value, the column indexMaximum of the values
MIN<T>TT0x4Encode typeTINT32 type values, the column indexMinimum of the values

Note:

  • The column indices are all started from0
  • Some aggregation functions are requried to return0 if there are no input rows, such asCOUNT. For simplicity,NULL is returned for all aggregation functions here and it is in the final reducing stage to convertNULL to0

Relational Algebra Operators

Only "simple" (the operators have only one single input and one output) relational algebra expressions are supported. These operators are chained with one's output connected to another's input to form a whole relational algebra expression. They are encoded one by one from the input end to the output end.

The encoding of each algebra operators contains an unique "leading byte", a "payload" for the parameters and expressions in the operator and a "trailing byte" which is alwaysEOE (End Of Expression). The details are in the following table

OperatorLeading BytePayloadTrailing Byte
Filter0x71Encode the filter expressionEOE
Project0x72Encode the project expression one by oneEOE
Grouped Aggregation0x73Encode group indices asARRAY<INT32> type, then the list of aggregation functions asARRAY<AGG> typeEOE
Ungrouped Aggregation0x74Encode the list of aggregation functions asARRAY<AGG> typeEOE

A "Project" operator may contains several expressions but they can be concatenated into one "huge" expression without any separator simplify the evaluating process. The "huge" expression is decoded by oneRunner, and after evaluating there will be several results left in the operand stack just as needed. These results can be taken out by multiple calls toGet method.

Used by

About

Cpp engine of DingoDB Expression.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors3

  •  
  •  
  •  

[8]ページ先頭

©2009-2025 Movatter.jp