PMS full-text search engine with no external dependencies written in C#

pms-search/FullTextSearch

 
 


License: MIT

A full-text search engine with no external dependencies, written in C# for .NET Core.

The aim of this project is to showcase algorithms, data structures and techniques that are used to create full-text search engines.

Getting Started

On Windows:

  1. Download and build the code. Use the following commands:

    dotnet restore
    dotnet build
  2. Open the folder with binaries: bin\Debug\netcoreapp2.0

  3. Run the following command, replacing DATA_PATH with the path to the Datasets folder:

    run_test.bat DATA_PATH
  4. If everything goes well, the following messages are printed:

    Log from index construction:

    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll index "F:\Sources\FullTextSearch\Datasets"
    PMS Full-Text Search (c) Petro Protsyk 2017-2018
    F:\Sources\FullTextSearch\Datasets\Simple\TestFile001.txt
    F:\Sources\FullTextSearch\Datasets\Simple\TestFile002.txt
    F:\Sources\FullTextSearch\Datasets\Simple\TestFile003.txt
    Indexed documents: 3, time: 00:00:00.1010004

    Dump of the index (for each term in the dictionary - the list of all occurrences):

    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll print
    PMS Full-Text Search (c) Petro Protsyk 2017-2018
    2017 -> [1,1,9]
    algorithms -> [1,1,19]
    and -> [1,1,20]
    apple -> [3,1,1]
    banana -> [3,1,2]
    build -> [1,1,25]
    c -> [1,1,16]
    data -> [1,1,21]
    demonstrate -> [1,1,18]
    ...

    Search with query WORD(pms):

    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll search "WORD(pms)"
    {filename:"TestFile001.txt", size:"180", created:"2018-04-02T10:09:41.4208444+02:00"}
    {[1,1,1]}
    {filename:"TestFile002.txt", size:"29", created:"2018-04-02T10:09:41.4248447+02:00"}
    {[2,1,1]}
    Documents found: 2, matches: 2, time: 00:00:00.0564721

    Lookup in the dictionary using a pattern, i.e. list all terms matching the pattern:

    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll lookup pet*
    petro-mariya-sophie
    Terms found: 1, time: 00:00:00.0704173

    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll lookup projct~1
    project
    Terms found: 1, time: 00:00:00.0847931
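
The index dump above lists, for each term, the places where it occurs. As an illustration only, the following Python sketch builds a small in-memory positional inverted index and prints it in a similar style. It is not the project's C# code. It assumes that each triple means [document id, field id, token position] with 1-based numbering, which the log suggests but does not state, and the tokenizer (lowercase alphanumeric words, hyphenated words kept whole) is likewise an assumption.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words (hyphenated words stay whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def build_index(documents):
    """Map each term to a list of (doc_id, field_id, position) occurrences.

    Ids and positions are 1-based. field_id is always 1 because this
    sketch indexes a single text field per document (an assumption).
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(documents, start=1):
        for position, term in enumerate(tokenize(text), start=1):
            index[term].append((doc_id, 1, position))
    return index

def dump(index):
    """Return 'term -> [d,f,p], ...' lines, sorted by term."""
    lines = []
    for term in sorted(index):
        occurrences = ", ".join("[%d,%d,%d]" % occ for occ in index[term])
        lines.append("%s -> %s" % (term, occurrences))
    return lines

# Hypothetical sample documents, not the contents of the Datasets folder.
docs = ["PMS full-text search", "pms demo", "apple banana"]
index = build_index(docs)
for line in dump(index):
    print(line)
```

A dictionary of lists is the simplest possible layout; a persistent index would store the dictionary and the occurrence lists on disk instead.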

Query Language

  • WORD(apple) - single word
  • WILD(app*) - wildcard pattern
  • EDIT(apple, 1) - fuzzy search: terms within Levenshtein (edit) distance 1 of apple
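
To make the WILD and EDIT term types concrete, here is a hedged Python sketch that matches both against a plain set of terms. It scans every term linearly, which a real engine would likely avoid by walking its dictionary structure. The helper names `wild` and `edit` and the sample terms are invented for illustration, and the wildcard syntax is whatever Python's `fnmatchcase` accepts, which may differ from the project's pattern rules.

```python
from fnmatch import fnmatchcase

def levenshtein(a, b):
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]

def wild(terms, pattern):
    """All terms matching a shell-style wildcard pattern, sorted."""
    return sorted(t for t in terms if fnmatchcase(t, pattern))

def edit(terms, word, max_distance):
    """All terms within max_distance edits of word, sorted."""
    return sorted(t for t in terms if levenshtein(t, word) <= max_distance)

# Hypothetical dictionary contents.
terms = {"apple", "apply", "ample", "banana", "project"}

print(wild(terms, "app*"))       # ['apple', 'apply']
print(edit(terms, "apple", 1))   # ['ample', 'apple', 'apply']
print(edit(terms, "projct", 1))  # ['project']
```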

Conjunction operators

  • OR - boolean or
  • AND - boolean and
  • SEQ - sequence of words, phrase

Examples of queries:

  • AND(WORD(apple), OR(WILD(a*), EDIT(apple, 1)))
  • SEQ(WORD(hello), WORD(world))
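
One way to evaluate such queries over positional occurrences is sketched below in Python. This is an assumption-laden illustration rather than the engine's actual algorithm: queries are written as nested tuples instead of being parsed from text, only WORD leaves are supported, each match is a (doc, field, start, end) span, and SEQ is taken to mean adjacent positions within the same document and field.

```python
def evaluate(query, index):
    """Return the set of (doc, field, start, end) spans matching the query."""
    op, *args = query
    if op == "WORD":
        return {(d, f, p, p) for d, f, p in index.get(args[0], [])}
    results = [evaluate(arg, index) for arg in args]
    if op == "OR":
        # Any child may match: union of all spans.
        return set().union(*results)
    if op == "AND":
        # Keep spans only from documents matched by every child.
        common = set.intersection(*({s[0] for s in r} for r in results))
        return {s for r in results for s in r if s[0] in common}
    if op == "SEQ":
        # Extend each span with a following span that starts right after it.
        spans = results[0]
        for following in results[1:]:
            spans = {(d, f, start, end2)
                     for d, f, start, end in spans
                     for d2, f2, start2, end2 in following
                     if (d2, f2, start2) == (d, f, end + 1)}
        return spans
    raise ValueError("unknown operator: %s" % op)

def doc_ids(spans):
    """Sorted list of distinct document ids in a set of spans."""
    return sorted({s[0] for s in spans})

# Hypothetical index: term -> list of (doc, field, position).
index = {
    "hello": [(1, 1, 1), (2, 1, 3)],
    "world": [(1, 1, 2), (2, 1, 1)],
    "apple": [(2, 1, 2)],
}

phrase = ("SEQ", ("WORD", "hello"), ("WORD", "world"))
print(evaluate(phrase, index))  # {(1, 1, 1, 2)}
print(doc_ids(evaluate(("AND", ("WORD", "hello"), ("WORD", "apple")), index)))  # [2]
```

Document 2 contains both hello and world, but not next to each other in that order, so the SEQ query matches only document 1.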

Data Structures

  • The dictionary of the persistent index is implemented using a Ternary Search Tree.
  • Key-value storage for document metadata is based on a persistent B-Tree implementation.
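
For readers unfamiliar with ternary search trees, the following Python sketch shows the basic idea: each node holds one character and three children (lower, equal, higher), so lookups behave like a trie while each node stays small. This is a minimal in-memory version with invented class and method names. The project's dictionary is persistent and written in C#, so its layout will differ.

```python
class Node:
    __slots__ = ("char", "low", "equal", "high", "is_term")

    def __init__(self, char):
        self.char = char
        self.low = self.equal = self.high = None
        self.is_term = False  # True if a stored term ends at this node

class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if not word:
            raise ValueError("empty terms are not supported")
        self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.char:
            node.low = self._insert(node.low, word, i)
        elif ch > node.char:
            node.high = self._insert(node.high, word, i)
        elif i + 1 < len(word):
            node.equal = self._insert(node.equal, word, i + 1)
        else:
            node.is_term = True
        return node

    def _find(self, word):
        """Return the node reached by the last character of word, or None."""
        node, i = self.root, 0
        while node is not None:
            ch = word[i]
            if ch < node.char:
                node = node.low
            elif ch > node.char:
                node = node.high
            elif i + 1 == len(word):
                return node
            else:
                node, i = node.equal, i + 1
        return None

    def __contains__(self, word):
        node = self._find(word) if word else None
        return node is not None and node.is_term

    def with_prefix(self, prefix):
        """Return all stored terms starting with prefix, in sorted order."""
        found = []
        if not prefix:
            self._collect(self.root, "", found)
            return found
        node = self._find(prefix)
        if node is None:
            return found
        if node.is_term:
            found.append(prefix)
        self._collect(node.equal, prefix, found)
        return found

    def _collect(self, node, prefix, found):
        if node is None:
            return
        self._collect(node.low, prefix, found)
        if node.is_term:
            found.append(prefix + node.char)
        self._collect(node.equal, prefix + node.char, found)
        self._collect(node.high, prefix, found)

tree = TernarySearchTree()
for term in ["project", "petro", "pms", "apple", "app", "banana"]:
    tree.insert(term)

print("app" in tree)          # True
print(tree.with_prefix("p"))  # ['petro', 'pms', 'project']
```

Prefix enumeration like `with_prefix` is what makes this structure a natural fit for wildcard patterns such as `pet*`.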

Algorithms

References

Links
