kearch is a distributed search engine. You can set up your own search engine using kearch and connect your search engine to another search engine.
You can access our search engine from https://kearch.info.
There are two types of search engines in kearch. One is the specialist search engine and the other is the meta search engine. A specialist search engine is a search engine specialized in one topic, for example history, a programming language, or anything else you want.
A meta search engine, on the other hand, is used for connecting specialist search engines. You can connect any specialist search engines using a meta search engine. For example, you get a search engine about programming languages when you connect specialist search engines about Lisp, Haskell, C#, etc.
If you want to set up your own specialist search engine, please read from 1. Specialist search engine. If you want to set up your own meta search engine, please read from 2. Meta search engine.
First of all, you need to prepare a server for a specialist search engine. The minimum spec for a specialist search engine is as follows.
- RAM: 8GiB
- SSD/HDD: 100GiB
- CPU: Dual core processor
- OS: Ubuntu 18.04
- Global IP address or domain
- SSH login using public key authentication
You can get a qualified server using Sakura Cloud, AWS, GCP or Microsoft Azure.
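Before deploying, it can help to confirm that the server really meets the minimum spec. The following is a minimal sketch, not part of kearch: `check_spec` and `local_resources` are hypothetical helper names, the default thresholds are the specialist values listed above, and `local_resources` assumes it runs on the Linux server itself.

```python
import os
import shutil

GIB = 1024 ** 3


def check_spec(ram_gib, disk_gib, cpu_cores,
               min_ram_gib=8, min_disk_gib=100, min_cores=2):
    """Return a list of shortfalls; an empty list means the minimum is met."""
    problems = []
    if ram_gib < min_ram_gib:
        problems.append(f"RAM: {ram_gib:.1f} GiB < {min_ram_gib} GiB")
    if disk_gib < min_disk_gib:
        problems.append(f"Disk: {disk_gib:.1f} GiB < {min_disk_gib} GiB")
    if cpu_cores < min_cores:
        problems.append(f"CPU: {cpu_cores} cores < {min_cores} cores")
    return problems


def local_resources(path="/"):
    """Measure the machine this script runs on (assumes Linux)."""
    ram_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB
    disk_gib = shutil.disk_usage(path).total / GIB
    return ram_gib, disk_gib, os.cpu_count() or 0
```

On the server you could run `check_spec(*local_resources())`. The RAM reported by the operating system can be slightly below the nominal size of the instance, so a narrow miss on RAM does not necessarily mean the server is unsuitable.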
Second, deploy a specialist search engine using Ansible. If Ansible is not installed on your local machine, please install it first. You can install Ansible with one of the following commands.
- Debian/Ubuntu:
sudo apt install ansible
- Mac:
brew install ansible
Then clone this repository to your local machine with the following command.
~$ git clone https://github.com/kearch/kearch.git
Finally, deploy a specialist search engine using Ansible. Please replace `<HOSTNAME>` and `<USERNAME>` depending on your environment. (In most cases, `<HOSTNAME>` is the IP address of your server. Don't forget the comma after `<HOSTNAME>`.) This takes some time to finish, so I recommend taking a coffee break.
~/kearch$ ansible-playbook sp-playbook.yml -i <HOSTNAME>, -u <USERNAME> --ask-become-pass -vvv
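The comma after `<HOSTNAME>` is easy to forget. It tells Ansible to treat the `-i` value as an inline list of hosts rather than as the path to an inventory file. As a minimal sketch, the command above could be assembled like this; `build_deploy_command` is a hypothetical helper, and the host `203.0.113.10` and user `ubuntu` are placeholder values.

```python
import shlex


def build_deploy_command(hostname, username, playbook="sp-playbook.yml"):
    """Build the ansible-playbook argument list shown in this README."""
    if not hostname or "," in hostname:
        raise ValueError("pass a single hostname or IP address without commas")
    # The trailing comma makes Ansible read the value as an inline host list
    # instead of looking for an inventory file with that name.
    inventory = hostname + ","
    return ["ansible-playbook", playbook, "-i", inventory,
            "-u", username, "--ask-become-pass", "-vvv"]


cmd = build_deploy_command("203.0.113.10", "ubuntu")
print(shlex.join(cmd))
# prints: ansible-playbook sp-playbook.yml -i 203.0.113.10, -u ubuntu --ask-become-pass -vvv
```

The printed line can be pasted into a shell in the `kearch` directory. `shlex.join` requires Python 3.8 or later.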
Please access http://HOSTNAME-OR-IP-ADRESS-OF-YOUR-SERVER:32700. If the setup succeeded, the admin page is displayed.
The default Username and Password are "root" and "password". We strongly recommend that you update the password immediately after login.
After updating the password, please set the engine name.
Then set the global IP address of your server.
Now, you can set a topic for your specialist search engine. There are two ways to set a topic. One is using a word frequency dictionary (Method A) and the other is using URLs (Method B). You must choose one of them. I think the word frequency dictionary (Method A) is better.
For Method A, you must choose a language and then fill in "word frequencies in your crawling topic" and "Word frequencies in random topic".
In "word frequencies in your crawling topic", you should input characteristic words of your topic and their ratios. If entering them by hand is tedious, please have a look at Appendix 4, which describes an easy way to generate the text to input.
In "word frequencies in random topic", you should ideally input all words on the Web and their ratios. Because that is very difficult, I recommend checking "use default dict" instead.
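To see why two dictionaries are needed, it may help to look at how a topic score can be computed from them. The sketch below is only an illustration and is not kearch's actual classifier: it assumes a simple smoothed log-ratio comparison, and `topic_score` and the sample frequencies are made up for this example.

```python
import math


def topic_score(words, topic_freq, random_freq, smoothing=1.0):
    """Sum of log-ratios of smoothed relative frequencies.

    A positive score means the words look more like the crawling topic,
    a negative score means they look more like random Web text.
    """
    vocab = set(topic_freq) | set(random_freq)
    topic_total = sum(topic_freq.values()) + smoothing * len(vocab)
    random_total = sum(random_freq.values()) + smoothing * len(vocab)
    score = 0.0
    for word in words:
        if word not in vocab:
            continue  # words unknown to both dictionaries carry no evidence
        p_topic = (topic_freq.get(word, 0) + smoothing) / topic_total
        p_random = (random_freq.get(word, 0) + smoothing) / random_total
        score += math.log(p_topic / p_random)
    return score


# Hypothetical dictionaries, for illustration only.
topic = {"haskell": 213, "language": 55, "programming": 43, "ghc": 42}
random_web = {"the": 500, "language": 20, "news": 30, "haskell": 1}

print(topic_score(["haskell", "ghc"], topic, random_web) > 0)   # True
print(topic_score(["the", "news"], topic, random_web) < 0)      # True
```

A word such as "language" is frequent in both dictionaries, so it contributes little to the score. This is why the random-topic dictionary matters as much as the topic one.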
For Method B, you must choose a language and input some URLs related to your own topic in "URLs in your crawling topic". Then input some URLs about random topics in "URLs in random topic".
Though this method is easier than the frequency dictionary one, it is rougher. This is why I recommend Method A.
Then you can start crawling. Please specify some URLs to start crawling from.
Now, you can use your specialist search engine from http://HOSTNAME-OR-IP-ADRESS-OF-YOUR-SERVER:32550.
There are two cases for connecting a specialist search engine and a meta search engine. One is sending a connection request from a specialist search engine and the other is sending one from a meta search engine.
In the first case, you send a connection request from your specialist search engine.
After you send a connection request, the administrator of the meta search engine approves it. Then the two search engines are connected, and you can confirm the connection afterwards.
In the second case, you receive a connection request from a meta search engine. When a meta search engine sends a connection request to your specialist search engine, the request is displayed for you to review.
You can approve a connection request just by pushing the approve button.
First of all, you need to prepare a server for a meta search engine. The minimum spec for a meta search engine is as follows.
- RAM: 4GiB
- SSD/HDD: 100GiB
- CPU: Dual core processor
- OS: Ubuntu 18.04
- Global IP address or domain
- SSH login using public key authentication
You can get a qualified server using Sakura Cloud, AWS, GCP or Microsoft Azure.
Second, deploy a meta search engine using Ansible. If Ansible is not installed on your local machine, please install it first. You can install Ansible with one of the following commands.
- Debian/Ubuntu:
sudo apt install ansible
- Mac:
brew install ansible
Then clone this repository to your local machine with the following command.
~$ git clone https://github.com/kearch/kearch.git
Finally, deploy a meta search engine using Ansible. Please replace `<HOSTNAME>` and `<USERNAME>` depending on your environment. (In most cases, `<HOSTNAME>` is the IP address of your server. Don't forget the comma after `<HOSTNAME>`.) This takes some time to finish, so I recommend taking a coffee break.
~/kearch$ ansible-playbook me-playbook.yml -i <HOSTNAME>, -u <USERNAME> --ask-become-pass -vvv
Please access http://HOSTNAME-OR-IP-ADRESS-OF-YOUR-SERVER:32600. If the setup succeeded, the admin page is displayed.
The default Username and Password are "root" and "password". We strongly recommend that you update the password immediately after login.
Then set the global IP address of your server.
There are two cases for connecting a meta search engine and a specialist search engine. One is sending a connection request from a meta search engine and the other is sending one from a specialist search engine.
In the first case, you send a connection request from your meta search engine.
After you send a connection request, the administrator of the specialist search engine approves it. Then the two search engines are connected, and you can confirm the connection afterwards.
In the second case, you receive a connection request from a specialist search engine. When a specialist search engine sends a connection request to your meta search engine, the request is displayed for you to review.
You can approve a connection request just by pushing the approve button.
Now, you can use your meta search engine from http://HOSTNAME-OR-IP-ADRESS-OF-YOUR-SERVER:32450.
git clone https://github.com/kearch/kearch.git
cd kearch
./sp_deploy.sh spdb spes all
./me_deploy.sh medb all
- 32700: Admin setting page port of specialist search engines
- 32600: Admin setting page port of meta search engines
- 32500: Gateway port of specialist search engines
- 32400: Gateway port of meta search engines
- 32550: Search engine front page port of specialist search engines
- 32450: Search engine front page port of meta search engines
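The port list above can be kept in one place when you script against a deployment. This is a minimal sketch: the port numbers are taken from the list, while `PORTS`, `service_url`, the role names (`admin`, `gateway`, `front`) and the host `example.com` are hypothetical names chosen for this example.

```python
# Port numbers from the list above, keyed by (engine type, role).
PORTS = {
    ("specialist", "admin"): 32700,
    ("meta", "admin"): 32600,
    ("specialist", "gateway"): 32500,
    ("meta", "gateway"): 32400,
    ("specialist", "front"): 32550,
    ("meta", "front"): 32450,
}


def service_url(host, engine, role):
    """Return the HTTP URL of one kearch service on the given host."""
    try:
        port = PORTS[(engine, role)]
    except KeyError:
        raise ValueError(f"unknown service: {engine} {role}") from None
    return f"http://{host}:{port}"


print(service_url("example.com", "specialist", "front"))
# prints: http://example.com:32550
```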
Check the specialist DB.
./sp_db_checker.sh
Check the meta DB.
./me_db_checker.sh
You can generate frequencies from URLs easily using `generate_frequencies_from_URLs.py` in the `utils` directory.
$ cd utils
$ python3 generate_frequencies_from_URLs.py haskell_list
haskell 213
language 55
programming 43
ghc 42
...
Please replace `haskell_list` with your own URL list and generate your frequencies. A URL list is just a text file of newline-separated URLs.
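To make the input and output formats concrete, here is a minimal sketch of the counting step. It is not the code of `generate_frequencies_from_URLs.py`: it assumes the page texts have already been downloaded, and `read_url_list`, `word_frequencies` and `format_frequencies` are hypothetical helper names.

```python
import re
from collections import Counter


def read_url_list(text):
    """Parse a newline-separated URL list, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def word_frequencies(documents):
    """Count lowercase alphabetic words over all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


def format_frequencies(counts, limit=None):
    """Render 'word count' lines, most frequent first."""
    return "\n".join(f"{word} {n}" for word, n in counts.most_common(limit))


# Stand-in page texts; a real run would fetch each URL in the list.
pages = ["Haskell is a language. Haskell uses GHC.", "GHC compiles Haskell"]
print(format_frequencies(word_frequencies(pages), limit=2))
# prints:
# haskell 3
# ghc 2
```

The output uses the same "word count" line format as the example above, so it can be pasted into the word frequency field in the same way.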