kearch/kearch

Distributed search engine

License

GPL-3.0 license

kearch is a distributed search engine. You can set up your own search engine using kearch and connect your search engine to another search engine.

You can access our search engine at https://kearch.info.

There are two types of search engines in kearch. One is the specialist search engine and the other is the meta search engine. A specialist search engine is a search engine specialized for one topic. For example, you can build a search engine for history, for programming languages, or for anything else you want.

On the other hand, a meta search engine is used for connecting specialist search engines. You can connect any specialist search engines using a meta search engine. For example, you get a search engine about programming languages when you connect specialist search engines about Lisp, Haskell, C#, etc.

If you want to set up your own specialist search engine, please read from 1. Specialist search engine. If you want to set up your own meta search engine, please read from 2. Meta search engine.

1. Specialist search engine

1.1 Prepare a server for a specialist search engine

First of all, you need to prepare a server for a specialist search engine. The minimum spec for a specialist search engine is as follows.

RAM: 8GiB
SSD/HDD: 100GiB
CPU: Dual core processor
OS: Ubuntu 18.04
Global IP address or domain
SSH login using public key authentication

You can get a qualified server using Sakura Cloud, AWS, GCP or Microsoft Azure.

1.2 Deploy a specialist search engine to your server using Ansible

Second, deploy a specialist search engine using Ansible. If Ansible is not installed on your local machine, please install it first. You can install Ansible with the following commands.

Debian/Ubuntu: sudo apt install ansible
Mac: brew install ansible

Then clone this repository to your local machine with the following command.

~$ git clone https://github.com/kearch/kearch.git

Finally, deploy a specialist search engine using Ansible. Please replace <HOSTNAME> and <USERNAME> depending on your environment. (In most cases, <HOSTNAME> is the IP address of your server. Don't forget the comma after <HOSTNAME>.) This takes some time to finish. I recommend you take a coffee break.

~/kearch$ ansible-playbook sp-playbook.yml -i <HOSTNAME>, -u <USERNAME> --ask-become-pass -vvv
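The comma after <HOSTNAME> matters because Ansible reads the value of -i as a path to an inventory file unless it contains a comma, in which case it is treated as an inline host list. The sketch below shows one way to assemble the same command so that the comma is never forgotten. The function name build_deploy_command and the example address and user name are assumptions for illustration only, not part of kearch.

```python
import shlex


def build_deploy_command(playbook, hostname, username):
    """Return the ansible-playbook argument list used in this README.

    `hostname` and `username` are placeholders for your own values.
    """
    # Strip any comma the caller already typed, then add exactly one,
    # so Ansible reads the value as an inline host list.
    inventory = hostname.rstrip(",") + ","
    return [
        "ansible-playbook", playbook,
        "-i", inventory,
        "-u", username,
        "--ask-become-pass",
        "-vvv",
    ]


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only example address.
    cmd = build_deploy_command("sp-playbook.yml", "203.0.113.10", "ubuntu")
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(cmd))
```

Passing "me-playbook.yml" instead of "sp-playbook.yml" gives the equivalent command for a meta search engine.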

1.3 Configuration of your specialist search engine

Please access http://HOSTNAME-OR-IP-ADDRESS-OF-YOUR-SERVER:32700. You will see the admin page if the setup succeeded.

The default username and password are "root" and "password". We strongly recommend that you update the password immediately after login.

After updating the password, please set the engine name here.

And set the global IP address of your server here.

1.4 Set a topic to your specialist search engine and start crawling

Now you can set a topic for your specialist search engine. There are two ways to set a topic. One uses a word frequency dictionary (Method A) and the other uses URLs (Method B). You must choose one of them. I think the word frequency dictionary (Method A) is better.

1.4.1.A Use word frequency dictionary

You must choose a language and then input word frequencies in your crawling topic and word frequencies in random topic.

You should input characteristic words and their ratios in word frequencies in your crawling topic. If entering them by hand is troublesome, please have a look at Appendix 4. It describes an easy way to generate the text to input.

You should input all words and their ratios on the Web in word frequencies in random topic. But that is very difficult, so I recommend that you check use default dict.

1.4.1.B Use URLs

You must choose a language and input some URLs related to your own topic in URLs in your crawling topic. Then input some URLs about random topics in URLs in random topic.

Though this method is easier than the frequency dictionary one, it is rougher. This is why I recommend that you use Method A.

1.4.2 Start crawling

Then you can start crawling from some URLs. Please specify the starting URLs here.

1.5 Use your specialist search engine

Now you can use your specialist search engine at http://HOSTNAME-OR-IP-ADDRESS-OF-YOUR-SERVER:32550.

1.6 Connect your specialist search engine to a meta search engine

There are two cases for connecting a specialist search engine and a meta search engine. One is sending a connection request from a specialist search engine and the other is sending one from a meta search engine.

1.6.1.A Connect from your specialist search engine to a meta search engine

In this case, you send a connection request from your specialist search engine.

After you send a connection request, the administrator of the meta search engine will approve your request. Then the two search engines are connected. You can confirm it by checking here.

1.6.1.B Connect from a meta search engine to your specialist search engine

In this case, you receive a connection request from a meta search engine. When a meta search engine sends a connection request to your specialist search engine, it is displayed in this way.

You can approve a connection request just by pushing the approve button.

2. Meta search engine

2.1 Prepare a server for a meta search engine

First of all, you need to prepare a server for a meta search engine. The minimum spec for a meta search engine is as follows.

RAM: 4GiB
SSD/HDD: 100GiB
CPU: Dual core processor
OS: Ubuntu 18.04
Global IP address or domain
SSH login using public key authentication

You can get a qualified server using Sakura Cloud, AWS, GCP or Microsoft Azure.

2.2 Deploy a meta search engine to your server using Ansible

Second, deploy a meta search engine using Ansible. If Ansible is not installed on your local machine, please install it first. You can install Ansible with the following commands.

Debian/Ubuntu: sudo apt install ansible
Mac: brew install ansible

Then clone this repository to your local machine with the following command.

~$ git clone https://github.com/kearch/kearch.git

Finally, deploy a meta search engine using Ansible. Please replace <HOSTNAME> and <USERNAME> depending on your environment. (In most cases, <HOSTNAME> is the IP address of your server. Don't forget the comma after <HOSTNAME>.) This takes some time to finish. I recommend you take a coffee break.

~/kearch$ ansible-playbook me-playbook.yml -i <HOSTNAME>, -u <USERNAME> --ask-become-pass -vvv

2.3 Configuration of your meta search engine

Please access http://HOSTNAME-OR-IP-ADDRESS-OF-YOUR-SERVER:32600 (the admin setting page port of meta search engines listed in Appendix 2). You will see the admin page if the setup succeeded.

The default username and password are "root" and "password". We strongly recommend that you update the password immediately after login.

And set the global IP address of your server here.

2.4 Connect your meta search engine to a specialist search engine

There are two cases for connecting a meta search engine and a specialist search engine. One is sending a connection request from a meta search engine and the other is sending one from a specialist search engine.

2.4.1.A Connect from your meta search engine to a specialist search engine

In this case, you send a connection request from your meta search engine.

After you send a connection request, the administrator of the specialist search engine will approve your request. Then the two search engines are connected. You can confirm it by checking here.

2.4.1.B Connect from a specialist search engine to your meta search engine

In this case, you receive a connection request from a specialist search engine. When a specialist search engine sends a connection request to your meta search engine, it is displayed in this way.

You can approve a connection request just by pushing the approve button.

2.5 Use your meta search engine

Now you can use your meta search engine at http://HOSTNAME-OR-IP-ADDRESS-OF-YOUR-SERVER:32450.

Appendix

Appendix 1. How to deploy kearch to your Kubernetes cluster

git clone https://github.com/kearch/kearch.git
cd kearch
./sp_deploy.sh spdb spes all
./me_deploy.sh medb all

Appendix 2. Port numbers for services

32700: Admin setting page port of specialist search engines
32600: Admin setting page port of meta search engines
32500: Gateway port of specialist search engines
32400: Gateway port of meta search engines
32550: Search engine front page port of specialist search engines
32450: Search engine front page port of meta search engines
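The port table above can also be kept as a small lookup so that scripts build the right URL for each service. This is only a sketch: the dictionary layout, the short names "admin", "gateway" and "front", and the function service_url are assumptions made for illustration and do not exist in kearch. Only the port numbers come from the list above.

```python
# Port numbers copied from Appendix 2. The dictionary layout and the
# function name are illustrative and are not part of kearch.
PORTS = {
    ("specialist", "admin"): 32700,
    ("meta", "admin"): 32600,
    ("specialist", "gateway"): 32500,
    ("meta", "gateway"): 32400,
    ("specialist", "front"): 32550,
    ("meta", "front"): 32450,
}


def service_url(host, engine, service):
    """Build the http URL for one kearch service running on `host`."""
    try:
        port = PORTS[(engine, service)]
    except KeyError:
        # Fail loudly on a misspelled engine or service name.
        raise ValueError(f"unknown engine/service: {engine}/{service}") from None
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only example address.
    for engine, service in PORTS:
        print(engine, service, service_url("203.0.113.10", engine, service))
```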

Appendix 3. Check your DB in kearch

Check the specialist DB.

./sp_db_checker.sh

Check the meta DB.

./me_db_checker.sh

Appendix 4. Generate word frequencies from URLs

You can generate frequencies from URLs easily using generate_frequencies_from_URLs.py in the utils directory.

$ cd utils
$ python3 generate_frequencies_from_URLs.py haskell_list
haskell 213
language 55
programming 43
ghc 42
...

Please replace haskell_list with your own URL list and generate your frequencies. A URL list is just a text file of newline-separated URLs.
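To give a feeling for the output format shown above, here is a minimal sketch of counting words and printing them as "word count" lines. It is not the code of generate_frequencies_from_URLs.py. The function names, the simple a-z tokenizer, and the sample texts are all assumptions, and the sketch works on text that has already been downloaded instead of fetching the URLs itself.

```python
import re
from collections import Counter


def read_url_list(text):
    """Return the non-empty lines of a newline-separated URL list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def word_frequencies(texts):
    """Count lowercase alphabetic words across an iterable of page texts."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizer: runs of ASCII letters only.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def format_frequencies(counts):
    """Render counts as "word count" lines, most frequent first."""
    return "\n".join(f"{word} {n}" for word, n in counts.most_common())


if __name__ == "__main__":
    # Placeholder URL list; example.com addresses are for illustration only.
    urls = read_url_list("https://example.com/a\n\nhttps://example.com/b\n")
    print(len(urls), "URLs in the list")
    # Stand-in page texts, since this sketch does not download anything.
    pages = ["Haskell is a programming language.", "GHC compiles Haskell."]
    print(format_frequencies(word_frequencies(pages)))
```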
