kearch/kearch


kearch is a distributed search engine. You can set up your own search engine using kearch and connect your search engine to another search engine.

You can access our search engine from https://kearch.info.

There are two types of search engines in kearch: the specialist search engine and the meta search engine. A specialist search engine is specialized in a single topic, for example history, a programming language, or anything else you want.

A meta search engine, on the other hand, connects specialist search engines. You can connect any specialist search engines through a meta search engine. For example, you get a search engine about programming languages when you connect specialist search engines about Lisp, Haskell, C#, etc.
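
The relationship between the two types can be pictured with a toy model. This is only a sketch for illustration and not how kearch itself routes or ranks queries: the `meta_search` function, the two hard-coded engines and the example.org URLs are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A specialist is modelled as a function: query -> list of (url, score).
Specialist = Callable[[str], List[Tuple[str, float]]]


def meta_search(query: str, specialists: Dict[str, Specialist],
                limit: int = 10) -> List[Tuple[str, float, str]]:
    """Send the query to every connected specialist and merge the hits."""
    merged = []
    for name, search in specialists.items():
        for url, score in search(query):
            merged.append((url, score, name))
    # Highest score first; ties are broken by URL so the order is stable.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    return merged[:limit]


# Hypothetical specialists with hard-coded results, for illustration only.
def lisp_engine(query: str) -> List[Tuple[str, float]]:
    return [("https://example.org/lisp/macros", 0.9)] if "macro" in query else []


def haskell_engine(query: str) -> List[Tuple[str, float]]:
    return [("https://example.org/haskell/template", 0.4)] if "macro" in query else []


engines = {"lisp": lisp_engine, "haskell": haskell_engine}
results = meta_search("macro systems", engines)
print(results)
```

Each hit keeps the name of the specialist it came from, so the merged list still shows which engine answered.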

If you want to set up your own specialist search engine, start from 1. Specialist search engine. If you want to set up your own meta search engine, start from 2. Meta search engine.

1. Specialist search engine

1.1 Prepare a server for a specialist search engine

First of all, you need to prepare a server for a specialist search engine. The minimum specifications for a specialist search engine are as follows.

  • RAM: 8GiB
  • SSD/HDD: 100GiB
  • CPU: Dual core processor
  • OS: Ubuntu 18.04
  • Global IP address or domain
  • SSH login using public key authentication
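
If you want to check a candidate server against these numbers, a small comparison like the following may help. It is a minimal sketch: `ServerSpec` and `shortfalls` are hypothetical names, not part of kearch, and only the numeric minimums (RAM, disk, CPU) are covered.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3


@dataclass(frozen=True)
class ServerSpec:
    ram_bytes: int
    disk_bytes: int
    cpu_cores: int


# Minimums from the list above for a specialist search engine.
SPECIALIST_MINIMUM = ServerSpec(ram_bytes=8 * GIB, disk_bytes=100 * GIB, cpu_cores=2)


def shortfalls(actual: ServerSpec,
               minimum: ServerSpec = SPECIALIST_MINIMUM) -> List[str]:
    """Return the names of the requirements that the server does not meet."""
    problems = []
    if actual.ram_bytes < minimum.ram_bytes:
        problems.append("RAM")
    if actual.disk_bytes < minimum.disk_bytes:
        problems.append("SSD/HDD")
    if actual.cpu_cores < minimum.cpu_cores:
        problems.append("CPU")
    return problems


small = ServerSpec(ram_bytes=4 * GIB, disk_bytes=100 * GIB, cpu_cores=2)
print(shortfalls(small))  # ['RAM']
```

On the server itself, `os.cpu_count()` and `shutil.disk_usage("/").total` are one way to obtain the core count and disk size to pass in.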

You can get a suitable server from Sakura Cloud, AWS, GCP or Microsoft Azure.

1.2 Deploy a specialist search engine to your server using Ansible

Second, deploy a specialist search engine using Ansible. If Ansible is not installed on your local machine, please install it first. You can install Ansible with the following commands.

  • Debian/Ubuntu: sudo apt install ansible
  • Mac: brew install ansible

Then clone this repository to your local machine with the following command.

~$ git clone https://github.com/kearch/kearch.git

Finally, deploy a specialist search engine using Ansible. Please replace <HOSTNAME> and <USERNAME> according to your environment. (In most cases, <HOSTNAME> is the IP address of your server. Don't forget the comma after <HOSTNAME>.) This takes some time to finish, so I recommend taking a coffee break.

~/kearch$ ansible-playbook sp-playbook.yml -i <HOSTNAME>, -u <USERNAME> --ask-become-pass -vvv
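
The comma after <HOSTNAME> is easy to forget. With the comma, Ansible reads the value of -i as an inline list of hosts; without it, Ansible treats the value as the path to an inventory file. If you script the deployment, a helper like this can add the comma for you. It is a sketch only: `ansible_command` is a hypothetical name, and the host 203.0.113.10 and user ubuntu are placeholders.

```python
import shlex
from typing import List


def ansible_command(playbook: str, hostname: str, username: str) -> List[str]:
    """Build the ansible-playbook argument list used in this README."""
    # Exactly one trailing comma marks the value as an inline host list.
    inventory = hostname.rstrip(",") + ","
    return ["ansible-playbook", playbook, "-i", inventory,
            "-u", username, "--ask-become-pass", "-vvv"]


cmd = ansible_command("sp-playbook.yml", "203.0.113.10", "ubuntu")
# Print the command so that you can review it before running it yourself.
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, cwd=...)` from the cloned kearch directory, which avoids shell quoting problems.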

1.3 Configuration of your specialist search engine

Please access http://HOSTNAME-OR-IP-ADRESS-OF-YOUR-SERVER:32700. If the setup succeeded, you will see the admin login screen.

The default username and password are "root" and "password". We strongly recommend that you update the password immediately after login.

After updating the password, please set the engine name.

Then set the global IP address of your server.

1.4 Set a topic to your specialist search engine and start crawling

Now you can set a topic for your specialist search engine. There are two ways to set a topic: one uses a word frequency dictionary (Method A) and the other uses URLs (Method B). You must choose one of them. I think the word frequency dictionary (Method A) is better.

1.4.1.A Use word frequency dictionary

You must choose a language and then fill in word frequencies in your crawling topic and word frequencies in random topic.

In word frequencies in your crawling topic, you should enter characteristic words and their ratios. If entering them by hand is troublesome, please have a look at Appendix 4, which describes an easy way to generate the text to enter.

In word frequencies in random topic, you should enter all words on the Web and their ratios. This is very difficult, so I recommend checking use default dict.
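
To see why two dictionaries are useful, compare how often a word appears in your topic with how often it appears in general text. The following is only an illustrative sketch and not the classifier kearch uses: `to_ratios` and `characteristic_score` are hypothetical names, the topic counts are the sample from Appendix 4, and the background counts are made up.

```python
from typing import Dict


def to_ratios(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts into ratios that sum to 1."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def characteristic_score(word: str, topic: Dict[str, float],
                         background: Dict[str, float],
                         floor: float = 1e-6) -> float:
    """Ratio of topic frequency to background frequency.

    Words missing from a dictionary get a small floor value so that the
    division is always defined.
    """
    return topic.get(word, floor) / background.get(word, floor)


topic = to_ratios({"haskell": 213, "language": 55, "programming": 43, "ghc": 42})
# Made-up background counts, standing in for "random topic" text.
background = to_ratios({"the": 500, "language": 30, "programming": 10, "haskell": 1})

ranked = sorted(topic, key=lambda w: characteristic_score(w, topic, background),
                reverse=True)
for word in ranked:
    print(word, round(characteristic_score(word, topic, background), 1))
```

Words such as "language" are frequent in both dictionaries and score low, while words that are rare in general text but frequent in the topic score high.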

1.4.1.B Use URLs

You must choose a language and enter some URLs related to your own topic in URLs in your crawling topic. Then enter some URLs about random topics in URLs in random topic.

Though this method is easier than the frequency dictionary method, it is rougher. This is why I recommend Method A.

1.4.2 Start crawling

Then you can start crawling from some URLs. Please specify some starting URLs.

1.5 Use your specialist search engine

Now you can use your specialist search engine from http://HOSTNAME-OR-IP-ADRESS-OF-YOUR-SERVER:32550.
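
If the page does not load, it can help to first check whether the port accepts connections at all. This is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper, and 127.0.0.1 stands in for your server's address.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 32550 is the front page port of a specialist search engine.
    print(port_is_open("127.0.0.1", 32550))
```

An open port only shows that something is listening; it does not prove that the search engine itself is healthy.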

1.6 Connect your specialist search engine to a meta search engine

There are two ways to connect a specialist search engine and a meta search engine. One is to send a connection request from the specialist search engine, and the other is to send one from the meta search engine.

1.6.1.A Connect from your specialist search engine to a meta search engine

In this case, you send a connection request from your specialist search engine.

After you send the connection request, the administrator of the meta search engine approves it. Then the two search engines are connected. You can confirm that they are connected.

1.6.1.B Connect from a meta search engine to your specialist search engine

In this case, you receive a connection request from a meta search engine. When a meta search engine sends a connection request to your specialist search engine, the incoming request is displayed.

You can approve a connection request just by pushing the approve button.

2. Meta search engine

2.1 Prepare a server for a meta search engine

First of all, you need to prepare a server for a meta search engine. The minimum specifications for a meta search engine are as follows.

  • RAM: 4GiB
  • SSD/HDD: 100GiB
  • CPU: Dual core processor
  • OS: Ubuntu 18.04
  • Global IP address or domain
  • SSH login using public key authentication

You can get a suitable server from Sakura Cloud, AWS, GCP or Microsoft Azure.

2.2 Deploy a meta search engine to your server using Ansible

Second, deploy a meta search engine using Ansible. If Ansible is not installed on your local machine, please install it first. You can install Ansible with the following commands.

  • Debian/Ubuntu: sudo apt install ansible
  • Mac: brew install ansible

Then clone this repository to your local machine with the following command.

~$ git clone https://github.com/kearch/kearch.git

Finally, deploy a meta search engine using Ansible. Please replace <HOSTNAME> and <USERNAME> according to your environment. (In most cases, <HOSTNAME> is the IP address of your server. Don't forget the comma after <HOSTNAME>.) This takes some time to finish, so I recommend taking a coffee break.

~/kearch$ ansible-playbook me-playbook.yml -i <HOSTNAME>, -u <USERNAME> --ask-become-pass -vvv

2.3 Configuration of your meta search engine

Please access http://HOSTNAME-OR-IP-ADRESS-OF-YOUR-SERVER:32600 (the admin setting page port of meta search engines listed in Appendix 2). If the setup succeeded, you will see the admin login screen.

The default username and password are "root" and "password". We strongly recommend that you update the password immediately after login.

Then set the global IP address of your server.

2.4 Connect your meta search engine to a specialist search engine

There are two ways to connect a meta search engine and a specialist search engine. One is to send a connection request from the meta search engine, and the other is to send one from the specialist search engine.

2.4.1.A Connect from your meta search engine to a specialist search engine

In this case, you send a connection request from your meta search engine.

After you send the connection request, the administrator of the specialist search engine approves it. Then the two search engines are connected. You can confirm that they are connected.

2.4.1.B Connect from a specialist search engine to your meta search engine

In this case, you receive a connection request from a specialist search engine. When a specialist search engine sends a connection request to your meta search engine, the incoming request is displayed.

You can approve a connection request just by pushing the approve button.

2.5 Use your meta search engine

Now you can use your meta search engine from http://HOSTNAME-OR-IP-ADRESS-OF-YOUR-SERVER:32450.

Appendix

Appendix 1. How to deploy kearch to your Kubernetes cluster

git clone https://github.com/kearch/kearch.git
cd kearch
./sp_deploy.sh spdb spes all
./me_deploy.sh medb all

Appendix 2. Port numbers for services

  • 32700: Admin setting page port of specialist search engines
  • 32600: Admin setting page port of meta search engines
  • 32500: Gateway port of specialist search engines
  • 32400: Gateway port of meta search engines
  • 32550: Search engine front page port of specialist search engines
  • 32450: Search engine front page port of meta search engines
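
If you work with several servers, the table above can be kept in one place and turned into URLs. This is a small sketch: the `PORTS` mapping copies the numbers from the list, while `service_url` and the key names ("specialist", "meta", "admin", "gateway", "front") are hypothetical.

```python
from typing import Dict, Tuple

# (engine type, service) -> port, copied from the list above.
PORTS: Dict[Tuple[str, str], int] = {
    ("specialist", "admin"): 32700,
    ("meta", "admin"): 32600,
    ("specialist", "gateway"): 32500,
    ("meta", "gateway"): 32400,
    ("specialist", "front"): 32550,
    ("meta", "front"): 32450,
}


def service_url(host: str, engine: str, service: str) -> str:
    """Build the http URL of a kearch service on the given host."""
    try:
        port = PORTS[(engine, service)]
    except KeyError:
        raise ValueError(f"unknown engine/service: {engine}/{service}") from None
    return f"http://{host}:{port}"


# 203.0.113.10 is a placeholder address.
print(service_url("203.0.113.10", "specialist", "front"))
```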

Appendix 3. Check your DB in kearch

Check the specialist DB.

./sp_db_checker.sh

Check the meta DB.

./me_db_checker.sh

Appendix 4. Generate word frequencies from URLs

You can easily generate frequencies from URLs using generate_frequencies_from_URLs.py in the utils directory.

$ cd utils
$ python3 generate_frequencies_from_URLs.py haskell_list
haskell 213
language 55
programming 43
ghc 42
...

Please replace haskell_list with your own URL list to generate your frequencies. A URL list is just a text file of newline-separated URLs.
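
To make the two formats concrete, here is a rough sketch of reading a URL list and of producing "word count" lines from page text that has already been downloaded. It is not the code of generate_frequencies_from_URLs.py: the function names are hypothetical, downloading is left out, and the simple regular expression only handles ASCII words.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional


def read_url_list(text: str) -> List[str]:
    """Parse newline-separated URLs, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def word_frequencies(pages: Iterable[str]) -> Counter:
    """Count lowercase ASCII words over all page texts."""
    counts: Counter = Counter()
    for page in pages:
        counts.update(re.findall(r"[a-z]+", page.lower()))
    return counts


def format_frequencies(counts: Counter, top: Optional[int] = None) -> str:
    """Render counts as 'word count' lines, most frequent first."""
    return "\n".join(f"{word} {count}" for word, count in counts.most_common(top))


urls = read_url_list("https://example.org/a\n\nhttps://example.org/b\n")
pages = ["Haskell is a language. Haskell uses GHC.", "GHC compiles Haskell"]
print(format_frequencies(word_frequencies(pages), top=2))
```

For languages that are not written with ASCII letters or spaces between words, the tokenizing step would need to be replaced.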

