Movatterモバイル変換

M4rukku/Ukkonens-Linear-Time-Shortest-Common-SuperstringPublic

NotificationsYou must be signed in to change notification settings
Fork0
Star0

An Implementation of Ukkonens 1990 linear-time algorithm for finding an approximate shortest superstring in Java. Also includes an extendable version of Aho Corasick's efficient string matcher.

License

MIT license

0 stars 0 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
src		src
.gitignore		.gitignore
License.md		License.md
Readme.md		Readme.md
UkkonensAlgorithm.iml		UkkonensAlgorithm.iml
pom.xml		pom.xml

Repository files navigation

Notes

This is an implementation of the AhoCorasick efficient string matcher (https://dl.acm.org/doi/10.1145/360825.360855) and Ukkonens' linear time approximate shortest common superstring finder (https://link.springer.com/article/10.1007/BF01840391).

Originally inspired by IMCs 32BIDS coding challenge (already finished), where I implemented the initial version, I decided to clean up and publish my code (under a MIT license).

Both algorithms can be used independently; you can find usage examples either in the tests or in the javadoc. Should you need to build a new algorithm based on ACs' string matcher, you can create a new node extending ACTrieNode and a corresponding implementation of an AbstractACNodeFactory which you then can inject into the AhoCorasickTrieFactory.

You can run the Build with "mvn package". This will run and execute the tests, generate the docs and produce a jar.

Usage

Find SCS (Ukkonen):

List<String>keys =List.of("aki","ele","kiki","kira","lea");UkkonenSCSFinderfinder =UkkonenSCSFinder.createFromKeys(keys);Stringscs =finder.getSCS();

Find all matches of a set of keys in a text (AC):

KeywordTextMatchermatcher =KeywordTextMatcher.createFromParameters(LanguageParameterFactory.defaultParameter,keys);List<Match>matches =matcher.matchText(text);

A quick intro to Language Parameters

The class LanguageParameter is a utility class which encodes language information. In particular, it stores the characters that embody the aphabet, as well as a mapper function between the characters inside and the integers from 0 (inclusive) to the total alphabet size (exclusive). We need this mapping function because our nodes have a "goto" function that is basically an array linking our current nodes with successor nodes. The successor node in the trie for character c can be found by accessing gotoArray(mapped(c)). In the aftermath, it might have been easier to replace the array with a HashMap, but for small alphabets an array encoding is faster. Nonetheless, the LanguageParameter class pops-up all over the place in my implementation; so it is important to be aware of it.

Create a new LanguageParameter: (from Alphabet alone)

privatestaticfinalList<Character>lowerCaseEnglishAlphabet ="abcdefghijklmnopqrstuvwxyz".chars().mapToObj(c -> (char)c).collect(Collectors.toList());LanguageParameterenglishLowerCaseLanguageParameter =LanguageParameterFactory.createLanguageParametersFromAlphabet(lowerCaseEnglishAlphabet);

Create a new LanguageParameter: (from Alphabet + Custom Mapping Function)

LanguageParameterenglishLowerCaseLanguageParameter =LanguageParameterFactory.createLanguageParametersFromParams(myChar -> (int) (myChar -'a'),lowerCaseEnglishAlphabet);

About

An Implementation of Ukkonens 1990 linear-time algorithm for finding an approximate shortest superstring in Java. Also includes an extendable version of Aho Corasick's efficient string matcher.

Releases

No releases published

Packages

No packages published

Languages

Java100.0%

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Notes

Usage

A quick intro to Language Parameters

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages

Languages

Movatterモバイル変換

License

M4rukku/Ukkonens-Linear-Time-Shortest-Common-Superstring

Folders and files

Latest commit

History

Repository files navigation

Notes

Usage

A quick intro to Language Parameters

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Languages

Packages