Rewriting web proxy and archival tool. At this point, it just tries to download all the things.


fake-name/ReadableWebProxy


Reading long-form content on the internet is a shitty experience.
This is a web-proxy that tries to make it better.

This is a rewriting proxy. In other words, it proxies arbitrary web content, while allowing the rewriting of the remote content as driven by a set of rule-files. The goal is to effectively allow the complete customization of any existing web-sites as driven by predefined rules.

Functionally, it's used for extracting just the actual content body of a site and reproducing it in a clean layout. It also modifies all links on the page to point to internal addresses, so following a link leads to the proxied version of the file, rather than the original.
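As a rough illustration of the link rewriting described above, here is a minimal sketch using only the standard library. The /view route, the url query parameter and the function name are assumptions made for this example; the project's real internal address scheme may look quite different.

```python
from urllib.parse import urljoin, urlencode, urlsplit

# Hypothetical internal route; not taken from the project itself.
PROXY_ROUTE = "/view"


def to_proxied_link(page_url, href):
    """Resolve href against the page it appeared on, then wrap it in an internal address."""
    absolute = urljoin(page_url, href)
    # Leave non-HTTP links (mailto:, javascript:, ...) untouched.
    if urlsplit(absolute).scheme not in ("http", "https"):
        return href
    return PROXY_ROUTE + "?" + urlencode({"url": absolute})


print(to_proxied_link("http://example.com/a/b.html", "c.html"))
```

Resolving against the page URL first means relative links keep working once the page is served from the proxy's own address.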


While the above was the original scope, the project has mutated heavily. At this point, it has a complete web spider and archives entire websites to local storage. Additionally, multiple versions of each page are kept, with an overall rolling refresh of the entire database at configurable intervals (configurable on a per-domain or global basis).
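To make the per-domain versus global refresh interval concrete, here is a small sketch of how such a lookup could work. The setting names, the interval values and the is_due helper are all hypothetical; they are not taken from the project's settings.py.

```python
import datetime
from urllib.parse import urlsplit

# Hypothetical values for illustration only.
GLOBAL_REFETCH_INTERVAL = datetime.timedelta(days=30)
PER_DOMAIN_REFETCH_INTERVAL = {
    "example.org": datetime.timedelta(days=7),
}


def refetch_interval(url):
    """Return the per-domain interval if one is configured, else the global one."""
    host = urlsplit(url).hostname or ""
    return PER_DOMAIN_REFETCH_INTERVAL.get(host, GLOBAL_REFETCH_INTERVAL)


def is_due(url, last_fetched, now):
    """True when the stored copy of url is old enough to be fetched again."""
    return now - last_fetched >= refetch_interval(url)
```

A scheduler built on this idea would periodically walk the stored pages and queue a new fetch for every page where is_due returns True, keeping the older version alongside the new one.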

There are also a lot of facilities responsible for feeding the releases/RSS views as part of wlnupdates.com.


Quick installation overview:

  • Install Redis
  • (optional) install InfluxDB
  • (optional) install Graphite
  • Install Postgresql >= 10.
  • Build the community extensions for Postgresql.
  • Create a database for the project.
  • In the project database, install the pg_trgm and citext extensions from the community extensions modules.
  • Copy settings.example.py to settings.py.
  • Fill in all settings in settings.py
  • Set up the virtualenv by running build-venv.sh
  • Activate the virtualenv: source flask/bin/activate
  • Bootstrap DB: alembic upgrade head
  • (on another machine/session) Run the local fetch RPC server, run_local.sh, from https://github.com/fake-name/AutoTriever
  • Run server: python3 run.py
  • If you want to run the spider, it has a LOT more complicated components:
    • Main scraper is started by: python runScrape.py
    • Raw scraper is started by: python runScrape.py raw
    • Scraper periodic scheduler is started by: python runScrape.py scheduler
    • The scraper requires substantial RPC infrastructure. You will need:
      • A RabbitMQ instance with a public DNS address
      • A machine running saltstack + salt-master with a public DNS address. On the salt machine, run https://github.com/fake-name/AutoTriever/tree/master/marshaller/salt_scheduler.py
      • A variable number of RPC workers to execute fetch tasks. The AutoTriever project can be used to manage these.
      • A machine to run the RPC local demultiplexing agent (run_agent.sh). The RPC agent allows multiple projects to use the RPC system simultaneously. Since the RPC system basically allows executing either predefined jobs or arbitrary code on the worker swarm, it is fairly useful in general, so I've implemented it as a service that several of my projects use.
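The demultiplexing idea behind the RPC agent can be sketched in a few lines. This is an in-memory toy under stated assumptions, not the actual run_agent.sh implementation: the ResponseDemultiplexer class, the "project" key on each response and the use of queue.Queue are all made up for illustration, and the real agent talks to RabbitMQ rather than local queues.

```python
import queue
from collections import defaultdict


class ResponseDemultiplexer:
    """Routes responses from a shared worker pool back to the project that asked."""

    def __init__(self):
        # One inbox per project name, created on first use.
        self._inboxes = defaultdict(queue.Queue)

    def dispatch(self, response):
        # Assumption: every response carries the name of the originating project.
        self._inboxes[response["project"]].put(response)

    def get_for(self, project, timeout=None):
        # Blocks until a response for `project` arrives; raises queue.Empty on timeout.
        return self._inboxes[project].get(timeout=timeout)
```

Each project only ever reads its own inbox, so several projects can share one worker swarm without seeing each other's results.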

Ubuntu dependencies

  • postgresql-common libpq-dev libenchant-dev
  • probably more I've forgotten
