The following sections describe the Alfresco Full Text Search (FTS) syntax.
The Alfresco Full Text Search (FTS) query text can be used standalone or it can be embedded in CMIS-SQL using the contains() predicate function. The CMIS specification supports a subset of FTS. You cannot use the full power of FTS and still maintain portability between CMIS repositories.
FTS is exposed directly by the interface, which adds its own template, and is also used as its default field. The default template is:
%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)
When FTS is embedded in CMIS-SQL, only the CMIS-SQL-style property identifiers (cmis:name) and aliases, CMIS-SQL column aliases, and the special fields listed can be used to identify fields. The SQL query defines tables and table aliases after from and join clauses. If the SQL query references more than one table, the contains() function must specify a single table to use by its alias. All properties in the embedded FTS query are added to this table and all column aliases used in the FTS query must refer to the same table. For a single table, the table alias is not required as part of the contains() function.
When FTS is used standalone, fields can also be identified using prefix:local-name and {uri}local-name styles.
Query time boosts allow matches on certain parts of the query to influence the score more than others.
All query elements can be boosted: terms, phrases, exact terms, expanded terms, proximity (only in field groups), ranges, and groups.
term^2.4
"phrase"^3
term~0.8^4
=term^3
~term^4
cm:name:(big * yellow)^4
1..2^2
[1 TO 2]^2
yellow AND (car OR bus)^3
The date field types in Solr support date math expressions.
A date math expression makes it easy to create times relative to fixed moments in time, including the current time, which is represented by the special value NOW.
A date math expression consists of either adding some quantity of time in a specified unit, or rounding the current time by a specified unit. Expressions can be chained and are evaluated left to right.
For example, to represent a point in time two months from now, use:
NOW+2MONTHS
To represent a point in time one day ago, use:
NOW-1DAY
A slash is used to indicate rounding. To represent the beginning of the current hour, use:
NOW/HOUR
To represent a point in time six months and three days into the future, rounded to the beginning of that day, use:
NOW+6MONTHS+3DAYS/DAY
While date math is most commonly used relative to NOW, it can be applied to any fixed moment in time as well:
1972-05-20T17:33:18.772Z+6MONTHS+3DAYS/DAY
Note: Solr 6 date math supports TODAY.
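The left-to-right evaluation described above can be illustrated with a small JavaScript sketch. This is not the Solr implementation: the function names (evalDateMath, add, roundDown) are hypothetical, only the units YEAR through SECOND are handled, and month arithmetic relies on JavaScript's own overflow rules, which may differ from Solr at month ends.

```javascript
// Minimal date math sketch (UTC only). Supports +N<UNIT>, -N<UNIT> and /<UNIT>.
const UNITS = ["YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"];

function normalizeUnit(unit) {
  // Accept singular and plural spellings, for example MONTH and MONTHS.
  const u = unit.endsWith("S") ? unit.slice(0, -1) : unit;
  if (!UNITS.includes(u)) throw new Error("Unsupported unit: " + unit);
  return u;
}

function add(date, amount, unit) {
  const d = new Date(date.getTime());
  switch (unit) {
    case "YEAR": d.setUTCFullYear(d.getUTCFullYear() + amount); break;
    case "MONTH": d.setUTCMonth(d.getUTCMonth() + amount); break;
    case "DAY": d.setUTCDate(d.getUTCDate() + amount); break;
    case "HOUR": d.setUTCHours(d.getUTCHours() + amount); break;
    case "MINUTE": d.setUTCMinutes(d.getUTCMinutes() + amount); break;
    case "SECOND": d.setUTCSeconds(d.getUTCSeconds() + amount); break;
  }
  return d;
}

function roundDown(date, unit) {
  const parts = [date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate(),
    date.getUTCHours(), date.getUTCMinutes(), date.getUTCSeconds()];
  const floor = [0, 0, 1, 0, 0, 0]; // smallest value of each part (days start at 1)
  for (let i = UNITS.indexOf(unit) + 1; i < parts.length; i++) parts[i] = floor[i];
  return new Date(Date.UTC(...parts)); // milliseconds are reset to 0
}

function evalDateMath(expr, now) {
  // The base is NOW or an ISO instant; the rest is a chain of operations.
  const m = /^(NOW|\d{4}-\d{2}-\d{2}T[\d:.]+Z)(.*)$/.exec(expr);
  if (!m) throw new Error("Cannot parse: " + expr);
  let date = m[1] === "NOW" ? new Date(now.getTime()) : new Date(m[1]);
  const rest = m[2];
  let consumed = 0;
  for (const op of rest.matchAll(/([+-])(\d+)([A-Z]+)|\/([A-Z]+)/g)) {
    if (op.index !== consumed) throw new Error("Cannot parse: " + expr);
    consumed += op[0].length;
    if (op[4]) {
      date = roundDown(date, normalizeUnit(op[4]));
    } else {
      const amount = Number(op[2]) * (op[1] === "-" ? -1 : 1);
      date = add(date, amount, normalizeUnit(op[3]));
    }
  }
  if (consumed !== rest.length) throw new Error("Cannot parse: " + expr);
  return date;
}

const result = evalDateMath("1972-05-20T17:33:18.772Z+6MONTHS+3DAYS/DAY", new Date());
console.log(result.toISOString()); // 1972-11-23T00:00:00.000Z
```

Because each operation is applied to the result of the previous one, placing the rounding step last truncates the already shifted date.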
Single terms, phrases, and so on can be combined using OR in upper, lower, or mixed case.
The OR operator is interpreted as “at least one is required, more than one or all can be returned”.
If not otherwise specified, by default search fragments will be ORed together.
big yellow banana
big OR yellow OR banana
TEXT:big TEXT:yellow TEXT:banana
TEXT:big OR TEXT:yellow OR TEXT:banana
These queries search for nodes that contain at least one of the terms big, yellow, or banana in any content.
Any character can be escaped using the backslash “\” in terms, IDs (field identifiers), and phrases. Java unicode escape sequences are supported. Whitespace can be escaped in terms and IDs.
For example:
cm:my\ content:my\ name
To search for an exact term you must prefix it with “=”. The supported syntax:
=term
=term1 =term2
="multi term phrase"
Note: ="multi term phrase" returns documents only with the exact phrase and terms in the exact order.
=field:term
=field:term1 =field:term2
=field:"multi term phrase"
If you don’t specify a field, the search runs against name, description, title, and content. If the field specified is TOKENIZED=false, only the full field is matched. If the field you specified is TOKENIZED=TRUE or TOKENIZED=BOTH, then the search is run on the cross locale tokenized version of the field.
Note: If cross locale is not configured for the field then an exception occurs.
The list of default supported Alfresco properties is declared in the <insight_engine_home>/solrhome/conf/shared.properties file:
alfresco.cross.locale.property.0={http://www.alfresco.org/model/content/1.0}name
alfresco.cross.locale.property.1={http://www.alfresco.org/model/content/1.0}lockOwner
You can extend that capability by uncommenting the lines below and performing a full reindex. This enables cross locale on all properties defined with those property types:
alfresco.cross.locale.datatype.0={http://www.alfresco.org/model/dictionary/1.0}text
alfresco.cross.locale.datatype.1={http://www.alfresco.org/model/dictionary/1.0}content
alfresco.cross.locale.datatype.2={http://www.alfresco.org/model/dictionary/1.0}mltext
Search specific fields rather than the default. Terms, phrases, etc. can all be preceded by a field. If not, the default field TEXT is used.
field:term
field:"phrase"
=field:exact
~field:expand
Fields fall into three types: property fields, special fields, and fields for data types.
Property fields evaluate the search term against a particular property, special fields are described in the following table, and data type fields evaluate the search term against all properties of the given type.
| Type | Description |
|---|---|
| Property | Fully qualified property, for example {http://www.alfresco.org/model/content/1.0}name:apple |
| Property | Fully qualified property, for example @{http://www.alfresco.org/model/content/1.0}name:apple |
| Property | CMIS style property, for example cm_name:apple |
| Property | Prefix style property, for example cm:name:apple |
| Property | Prefix style property, for example @cm:name:apple |
| Property | TEXT, for example TEXT:apple |
| Special | ID, for example ID:"NodeRef" |
| Special | ISROOT, for example ISROOT:T |
| Special | TX, for example TX:"TX" |
| Special | PARENT, for example PARENT:"NodeRef" |
| Special | PRIMARYPARENT, for example PRIMARYPARENT:"NodeRef" |
| Special | QNAME, for example QNAME:"app:company_home" |
| Special | CLASS, for example CLASS:"qname" |
| Special | EXACTCLASS, for example EXACTCLASS:"qname" |
| Special | TYPE, for example TYPE:"qname" |
| Special | EXACTTYPE, for example EXACTTYPE:"qname" |
| Special | ASPECT, for example ASPECT:"qname" |
| Special | EXACTASPECT, for example EXACTASPECT:"qname" |
| Special | ISUNSET, for example ISUNSET:"property-qname" |
| Special | ISNULL, for example ISNULL:"property-qname" |
| Special | ISNOTNULL, for example ISNOTNULL:"property-qname" |
| Special | EXISTS, for example EXISTS:"name of the property" |
| Special | SITE, for example SITE:"shortname of the site" |
| Special | TAG, for example TAG:"name of the tag". Note: TAG must be in upper case. |
| Fully qualified data type | Data Type, for example {http://www.alfresco.org/model/dictionary/1.0}content:apple |
| Prefixed data type | Data Type, for example d:content:apple |
When you search in multi-value fields, more options are available than when you search in fields. To search in multi-value fields, your properties must have Multiple values enabled; for more, see Create a property.
The following example queries are executed using a sample multi-valued property "mul:os" that stores values "MacOS" and "Linux".
mul:os:"MacOS"
Returns the document because "MacOS" is one of the values of the property.
mul:os:("MacOS" AND "Windows")
Does not return a document because the property doesn’t contain the value "Windows".
mul:os:("MacOS" OR "Windows")
Returns the document because "MacOS" is one of the values of the property, even though "Windows" is not.
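The three outcomes above can be modelled with a short JavaScript sketch. It is only an in-memory illustration: the array stands in for the sample mul:os property, the helper names are hypothetical, and values are compared as exact strings, whereas the real index may tokenize them.

```javascript
// Hypothetical in-memory stand-in for the multi-valued property mul:os.
const os = ["MacOS", "Linux"];

// mul:os:("A" AND "B"): every listed value must be among the property values.
const matchesAll = (values, wanted) => wanted.every((w) => values.includes(w));

// mul:os:("A" OR "B"): at least one listed value must be among the property values.
const matchesAny = (values, wanted) => wanted.some((w) => values.includes(w));

console.log(matchesAny(os, ["MacOS"]));            // true
console.log(matchesAll(os, ["MacOS", "Windows"])); // false
console.log(matchesAny(os, ["MacOS", "Windows"])); // true
```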
This relates to the priority defined on properties in the data dictionary, which can be tokenized, untokenized, or both.
Explicit priority is set by prefixing the query with “=” for identifier pattern matches.
The tilde (~) can be used to force tokenization.
Alfresco supports fuzzy searches based on the Lucene default Levenshtein Distance.
To do a fuzzy search use the tilde (~) symbol at the end of a single word term with a parameter between 0 and 1 to specify the required similarity. Use a value closer to 1 for higher similarity.
For example, to search for a term similar in spelling to roam use the fuzzy search:
roam~0.9
This search will find terms like foam, roaming, and roams.
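For readers unfamiliar with the Levenshtein distance mentioned above, the following JavaScript sketch computes it for the example terms. It shows only the edit distance itself; how the search engine maps that distance to the similarity parameter between 0 and 1 is not covered here, and the function name is illustrative.

```javascript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions, or substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("roam", "foam"));    // 1
console.log(levenshtein("roam", "roams"));   // 1
console.log(levenshtein("roam", "roaming")); // 3
```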
Use parentheses to encapsulate OR statements for the search engine to execute them properly.
The OR operator is executed as “I would like at least one of these terms”.
Groupings of terms are made using ( and ). Groupings of all query elements are supported in general. Groupings are also supported after a field - field group.
The query elements in field groups all apply to the same field and cannot include a field.
(big OR large) AND banana
title:((big OR large) AND banana)
When you search, entries are generally a term or a phrase. The string representation you type in will be transformed to the appropriate type for each property when executing the query. For convenience, there are numeric literals but string literals can also be used.
You can specify either a particular date or a date literal. A date literal is a fixed expression that represents a relative range of time, for example last month, this week, or next year.
dateTime field values are stored as Coordinated Universal Time (UTC). The date fields represent a point in time with millisecond precision. For date field formatting, Solr uses DateTimeFormatter.ISO_INSTANT. The ISO instant formatter formats an instant in Coordinated Universal Time (UTC), for example:
YYYY-MM-DDThh:mm:ssZ
where:
- YYYY is the year.
- MM is the month.
- DD is the day of the month.
- hh is the hour of the day as on a 24-hour clock.
- mm is minutes.
- ss is seconds.
- Z is a literal Z character indicating that this string representation of the date is in UTC.
Note: No time zone can be specified. The string representation of dates is always expressed in UTC, for example:
1972-05-20T17:33:18Z
String literals for phrases can be enclosed in double quotes or single quotes. Java single character and \uXXXX-based escaping are supported within these literals.
Integer and decimal literals conform to the Java definitions.
Dates, like any other literal, can be expressed as a term or phrase. Dates are in the format...... Any or all of the time can be truncated.
In range queries, strings, terms, and phrases that do not parse to a valid type instance for the property are treated as open ended.
test:integer[ 0 TO MAX] matches anything positive
You can narrow your search results by excluding words with the NOT syntax.
Single terms, phrases, and so on can be combined using “NOT” in upper, lower, or mixed case, or prefixed with “!” or “-”.
The first three queries search for nodes that contain the term yellow but not banana in any content; the last three search for nodes that contain banana but not yellow.
yellow NOT banana
yellow !banana
yellow -banana
NOT yellow banana
-yellow banana
!yellow banana
The NOT operator can only be used for string keywords; it doesn’t work for numerals or dates.
Prefixing any search qualifier with a - excludes all results that are matched by that qualifier.
Sometimes AND and OR are not enough, for example when you want to find documents that must contain the term “car”, score those that also contain the term “red” higher, but not match those containing only “red”.
| Operator | Description |
|---|---|
| “\|” | The field, phrase, group is optional; a match increases the score. |
| “+” | The field, phrase, group is mandatory (Note: this differs from Google - see “=”) |
| “-”, “!” | The field, phrase, group must not match. |
The following example finds documents that contain the term “car”, scores those that also contain the term “red” higher, but does not match those containing only “red”:
+car |red
Note: At least one element of a query must match (or not match) for there to be any results.
All AND and OR constructs can be expressed with these operators.
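The effect of the mandatory, optional, and excluded markers can be sketched in JavaScript as follows. This is a toy model, not the engine's scoring: a document is reduced to a set of terms, each match adds one point (an assumption made for illustration), and the function name score is hypothetical.

```javascript
// Each clause is { op, term } where op is "+", "|" or "-".
// A document is modeled as a Set of terms; one point per match is an assumption.
function score(clauses, docTerms) {
  let total = 0;
  for (const { op, term } of clauses) {
    const hit = docTerms.has(term);
    if (op === "+" && !hit) return null; // a mandatory element is missing
    if (op === "-" && hit) return null;  // an excluded element is present
    if (op !== "-" && hit) total += 1;   // mandatory and optional matches raise the score
  }
  const hasPositive = clauses.some((c) => c.op !== "-");
  // When the query has mandatory or optional elements, at least one has to match.
  return hasPositive && total === 0 ? null : total;
}

const query = [{ op: "+", term: "car" }, { op: "|", term: "red" }]; // +car |red
console.log(score(query, new Set(["car"])));        // 1
console.log(score(query, new Set(["car", "red"]))); // 2
console.log(score(query, new Set(["red"])));        // null
```

A null result means the document is not returned at all, which is different from being returned with a low score.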
Phrases are enclosed in double quotes. Any embedded quotes can be escaped using “\”. If no field is specified then the default TEXT field will be used, as with searches for a single term.
The whole phrase will be tokenized before the search according to the appropriate data dictionary definition(s).
"big yellow banana"
Operator precedence is SQL-like (not Java-like). When there is more than one logical operator in a statement, and they are not explicitly grouped using parentheses, NOT is evaluated first, then AND, and finally OR.
The following shows the operator precedence from highest to lowest:
"
[, ], <, >
()
~ (prefix and postfix), =
^
+, |, -
NOT
AND
OR
AND and OR can be combined with +, |, - with the following meanings:
| AND (no prefix is the same as +) | Description |
|---|---|
| big AND dog | big and dog must occur |
| +big AND +dog | big and dog must occur |
| big AND +dog | big and dog must occur |
| +big AND dog | big and dog must occur |
| big AND \|dog | big must occur and dog should occur |
| \|big AND dog | big should occur and dog must occur |
| \|big AND \|dog | both big and dog should occur, and at least one must match |
| big AND -dog | big must occur and dog must not occur |
| -big AND dog | big must not occur and dog must occur |
| -big AND -dog | both big and dog must not occur |
| \|big AND -dog | big should occur and dog must not occur |

| OR (no prefix is the same as +) | Description |
|---|---|
| dog OR wolf | dog and wolf should occur, and at least one must match |
| +dog OR +wolf | dog and wolf should occur, and at least one must match |
| dog OR +wolf | dog and wolf should occur, and at least one must match |
| +dog OR wolf | dog and wolf should occur, and at least one must match |
| dog OR \|wolf | dog and wolf should occur, and at least one must match |
| \|dog OR wolf | dog and wolf should occur, and at least one must match |
| \|dog OR \|wolf | dog and wolf should occur, and at least one must match |
| dog OR -wolf | dog should occur and wolf should not occur, one of the clauses must be valid for any result |
| -dog OR wolf | dog should not occur and wolf should occur, one of the clauses must be valid for any result |
| -dog OR -wolf | dog and wolf should not occur, one of the clauses must be valid for any result |
These examples show how to embed queries in CMIS.
- strict queries
SELECT * FROM Document WHERE CONTAINS("zebra")
SELECT * FROM Document WHERE CONTAINS("quick")
- Alfresco extensions
SELECT * FROM Document D WHERE CONTAINS(D, 'cmis:name:\'Tutorial\'')
SELECT cmis:name as BOO FROM Document D WHERE CONTAINS('BOO:\'Tutorial\'')
ResultSet results = searchService.query(storeRef, SearchService.LANGUAGE_FTS_ALFRESCO, "quick");
SearchService.LANGUAGE_FTS_ALFRESCO = "fts-alfresco"
FTS is supported in the node browser.
search
{
   query: string,          mandatory, in appropriate format and encoded for the given language
   store: string,          optional, defaults to 'workspace://SpacesStore'
   language: string,       optional, one of: lucene, xpath, jcr-xpath, fts-alfresco - defaults to 'lucene'
   templates: [],          optional, Array of query language template objects (see below) - if supported by the language
   sort: [],               optional, Array of sort column objects (see below) - if supported by the language
   page: object,           optional, paging information object (see below) - if supported by the language
   namespace: string,      optional, the default namespace for properties
   defaultField: string,   optional, the default field for query elements when not explicit in the query
   onerror: string         optional, result on error - one of: exception, no-results - defaults to 'exception'
}
sort
{
   column: string,         mandatory, sort column in appropriate format for the language
   ascending: boolean      optional, defaults to false
}
page
{
   maxItems: int,          optional, max number of items to return in result set
   skipCount: int          optional, number of items to skip over before returning results
}
template
{
   field: string,          mandatory, custom field name for the template
   template: string        mandatory, query template replacement for the template
}
For example:
var def = { query: "cm:name:test*", language: "fts-alfresco" };
var results = search.query(def);
FTS is not supported in FreeMarker.
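As a further illustration of the search definition object described above, the sketch below assembles one in plain JavaScript and fills in the documented defaults explicitly so that they are visible. The helper buildSearchDefinition is hypothetical and not part of the API, and the sort column value is an assumed format. The search.query call exists only inside the repository, so it is left as a comment.

```javascript
// Hypothetical helper: builds a search definition and makes the documented
// defaults (store, language, onerror, ascending) explicit.
function buildSearchDefinition(query, options) {
  options = options || {};
  if (typeof query !== "string" || query.length === 0) {
    throw new Error("query is mandatory");
  }
  var def = {
    query: query,
    store: options.store || "workspace://SpacesStore",
    language: options.language || "lucene",
    onerror: options.onerror || "exception"
  };
  if (options.sort) {
    def.sort = options.sort.map(function (s) {
      return { column: s.column, ascending: s.ascending === true };
    });
  }
  if (options.page) {
    def.page = { maxItems: options.page.maxItems, skipCount: options.page.skipCount };
  }
  return def;
}

var pagedDef = buildSearchDefinition("cm:name:test*", {
  language: "fts-alfresco",
  sort: [{ column: "cm:name" }],        // assumed column format, adjust for the language
  page: { maxItems: 10, skipCount: 0 }
});
console.log(JSON.stringify(pagedDef, null, 2));
// Inside the repository: var results = search.query(pagedDef);
```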
Google-style proximity is supported.
To specify proximity for fields, use grouping.
big * apple
TEXT:(big * apple)
big *(3) apple
TEXT:(big *(3) apple)
The FTS query language supports query templates. These are intended to help when building application specific searches.
A template is a query but with additional support to specify template substitution.
%field
Insert the parse tree for the current ftstest and replace all references to fields in the current parse tree with the supplied field.
%(field1, field2)
%(field1 field2)
(The comma is optional.) Create a disjunction, and for each field, add the parse tree for the current ftstest to the disjunction, and then replace all references to fields in the current parse tree with the current field from the list.
| Name | Template | Example Query | Expanded Query |
|---|---|---|---|
| t1 | %cm:name | t1:n1 | cm:name:n1 |
| t1 | %cm:name | t1:"n1" | cm:name:"n1" |
| t1 | %cm:name | ~t1:n1^4 | ~cm:name:n1^4 |
| t2 | %(cm:name, cm:title) | t2:"woof" | (cm:name:"woof" OR cm:title:"woof") |
| t2 | %(cm:name, cm:title) | ~t2:woof^4 | (~cm:name:woof OR ~cm:title:woof)^4 |
| t3 | %cm:name AND my:boolean:true | t3:banana | (cm:name:banana AND my:boolean:true) |
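The t1 and t2 rows of the table above can be reproduced with a small string-based JavaScript sketch. The real engine works on parse trees, so this is only an approximation under simplifying assumptions: it handles a single templated term or phrase with an optional ~ or = prefix and an optional boost, and it does not cover templates such as t3 that contain additional clauses. The names templates, templateFields, and expand are illustrative.

```javascript
// Hypothetical map of template name to template text (t1 and t2 from the table).
const templates = { t1: "%cm:name", t2: "%(cm:name, cm:title)" };

function templateFields(template) {
  const list = /^%\((.*)\)$/.exec(template);
  if (list) return list[1].split(/[\s,]+/).filter(Boolean); // the comma is optional
  const single = /^%(\S+)$/.exec(template);
  if (single) return [single[1]];
  throw new Error("Unsupported template: " + template);
}

function expand(query) {
  // Groups: prefix (~ or =), template name, value (term or phrase), optional boost.
  const m = /^([~=]?)(\w+):("[^"]*"|[^\s^]+)(\^\d+(?:\.\d+)?)?$/.exec(query);
  if (!m || !Object.prototype.hasOwnProperty.call(templates, m[2])) {
    return query; // not a templated field, leave the query unchanged
  }
  const [, prefix, name, value, boost = ""] = m;
  const parts = templateFields(templates[name]).map((f) => `${prefix}${f}:${value}`);
  // A single field needs no disjunction; several fields are ORed and boosted as a group.
  return parts.length === 1 ? parts[0] + boost : `(${parts.join(" OR ")})${boost}`;
}

console.log(expand("t1:n1"));      // cm:name:n1
console.log(expand("~t2:woof^4")); // (~cm:name:woof OR ~cm:title:woof)^4
```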
Templates can refer to other templates.
nameAndTitle -> %(cm:name, cm:title)
nameAndTitleAndDesciption -> %(nameAndTitle, cm:description)
Inclusive ranges can be specified in Google-style. There is an extended syntax for more complex ranges. Unbounded ranges can be defined using MIN and MAX for numeric and date types and “\u0000” and “\uFFFF” for text (anything that is invalid).
| Lucene | Google | Description | Example |
|---|---|---|---|
| [#1 TO #2] | #1..#2 | The range #1 to #2 inclusive: #1 <= x <= #2 | 0..5 or [0 TO 5] |
| <#1 TO #2] | | The range #1 to #2 including #2 but not #1: #1 < x <= #2 | <0 TO 5] |
| [#1 TO #2> | | The range #1 to #2 including #1 but not #2: #1 <= x < #2 | [0 TO 5> |
| <#1 TO #2> | | The range #1 to #2 exclusive: #1 < x < #2 | <0 TO 5> |
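The bracket notation in the table above maps directly to inclusive and exclusive bounds, as the following JavaScript sketch shows for numeric ranges. It is an illustration only: parseRange and inRange are hypothetical names, only numbers plus MIN and MAX are handled, and bounds that are not numbers are rejected here rather than treated as open ended.

```javascript
function toNumber(token) {
  if (token === "MIN") return -Infinity;
  if (token === "MAX") return Infinity;
  const n = Number(token);
  if (Number.isNaN(n)) throw new Error("Not a number: " + token);
  return n;
}

function parseRange(text) {
  // Google style: #1..#2 is always inclusive at both ends.
  const google = /^(\S+?)\.\.(\S+)$/.exec(text);
  if (google) {
    return { low: toNumber(google[1]), high: toNumber(google[2]),
      lowInclusive: true, highInclusive: true };
  }
  // Extended style: [ or ] marks an inclusive bound, < or > an exclusive one.
  const m = /^([\[<])\s*(\S+)\s+TO\s+(\S+?)\s*([\]>])$/.exec(text);
  if (!m) throw new Error("Cannot parse range: " + text);
  return { low: toNumber(m[2]), high: toNumber(m[3]),
    lowInclusive: m[1] === "[", highInclusive: m[4] === "]" };
}

function inRange(range, x) {
  const aboveLow = range.lowInclusive ? x >= range.low : x > range.low;
  const belowHigh = range.highInclusive ? x <= range.high : x < range.high;
  return aboveLow && belowHigh;
}

console.log(inRange(parseRange("0..5"), 5));     // true
console.log(inRange(parseRange("[0 TO 5>"), 5)); // false
console.log(inRange(parseRange("<0 TO 5]"), 0)); // false
```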
TEXT:apple..banana
my:int:[0 TO 10]
my:float:2.5..3.5
my:float:0..MAX
mt:text:[l TO "\uFFFF"]
Single terms are tokenized before the search according to the appropriate data dictionary definition(s).
If you do not specify a field, it will search in the content and properties. This is a shortcut for searching all properties of type content. Terms cannot contain whitespace.
banana
TEXT:banana
Both of these queries will find any nodes with the word “banana” in any property of type d:content.
If the appropriate data dictionary definition(s) for the field supports both FTS and untokenized search, then FTS search will be used. FTS will include synonyms if the analyzer generates them. Terms cannot contain whitespace.
Spans and positions are not implemented. Positions will depend on tokenization.
Anything more detailed than one *(2) two is arbitrarily dependent on the tokenization. An identifier and pattern matching, or dual FTS and ID tokenization, might be the answer in these cases.
term[^] - start
term[$] - end
term[position]
These are of possible use but excluded for now. Lucene surround extensions:
and(terms etc)
99w(terms etc)
97n(terms etc)
To force tokenization and term expansion, prefix the term with ~.
For a property with both ID and FTS indexes, where the ID index is the default, this forces the use of the FTS index.
~running
Wildcards are supported in terms, phrases, and exact phrases using * to match zero, one, or more characters and ? to match a single character.
The * wildcard character can appear on its own and implies Google-style. The “anywhere after” wildcard pattern can be combined with the = prefix for identifier based pattern matching. Search will return and highlight any word that begins with the root of the word truncated by the * wildcard character.
The following will all find the term apple.
TEXT:app?e
TEXT:app*
TEXT:*pple
appl?
*ple
=*ple
"ap*le"
"***le"
"?????"
When performing a search that includes a wildcard character, it is best to wrap your search term in double quotation marks. This ensures all metadata and content are searched.
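The wildcard semantics described above (* for zero or more characters, ? for exactly one) can be checked with a short JavaScript sketch that translates a pattern into a regular expression. It matches against a single, whole term and is case sensitive, which are assumptions of this sketch rather than statements about how the index analyzes text; wildcardToRegExp is a hypothetical name.

```javascript
// Translate an FTS-style wildcard pattern into an anchored regular expression.
function wildcardToRegExp(pattern) {
  const source = pattern
    .split("")
    .map((ch) => {
      if (ch === "*") return ".*"; // zero, one, or more characters
      if (ch === "?") return ".";  // exactly one character
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    })
    .join("");
  return new RegExp("^" + source + "$");
}

// The patterns from the examples above, without field names, prefixes, or quotes.
const patterns = ["app?e", "app*", "*pple", "appl?", "*ple", "ap*le", "***le", "?????"];
console.log(patterns.every((p) => wildcardToRegExp(p).test("apple"))); // true
```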