rajatomar788/pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.

rajatomar788.github.io/pywebcopy/

        ____       _       __     __    ______                     _____
       / __ \__  _| |     / /__  / /_  / ____/___  ____  __  __   /__  /
      / /_/ / / / / | /| / / _ \/ __ \/ /   / __ \/ __ \/ / / /     / /
     / ____/ /_/ /| |/ |/ /  __/ /_/ / /___/ /_/ / /_/ / /_/ /     / /
    /_/    \__, / |__/|__/\___/_.___/\____/\____/ .___/\__, /     /_/
          /____/                               /_/    /____/

Created By: Raja Tomar
License: Apache License 2.0
Email: rajatomar788@gmail.com

PyWebCopy is a free tool for copying full or partial websites locally onto your hard disk for offline viewing.

PyWebCopy will scan the specified website and download its content onto your hard disk. Links to resources such as style sheets, images, and other pages in the website will automatically be remapped to match the local path. Using its extensive configuration, you can define which parts of a website will be copied and how.
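To illustrate what "remapped to match the local path" can mean in practice, here is a minimal sketch using only the standard library. The function name `local_path_for` and the `project_root/host/path` layout are assumptions made for this illustration; PyWebCopy's actual on-disk layout may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path_for(url, project_root="my_site"):
    """Map a URL to a relative path under project_root (illustrative only)."""
    parts = urlsplit(url)
    path = parts.path
    # Treat directory-style URLs as index pages.
    if path == "" or path.endswith("/"):
        path += "index.html"
    # Hypothetical layout: <project_root>/<host>/<url path>
    return str(PurePosixPath(project_root, parts.netloc, path.lstrip("/")))


print(local_path_for("https://httpbin.org/"))
print(local_path_for("https://httpbin.org/static/css/main.css"))
```

A saved page would then reference the second resource by its local path instead of the original URL.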

What can PyWebCopy do?

PyWebCopy will examine the HTML markup of a website and attempt to discover all linked resources such as other pages, images, videos, and file downloads. It will download all of these resources and continue to search for more. In this manner, PyWebCopy can "crawl" an entire website and download everything it sees in an effort to create a reasonable facsimile of the source website.
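The discovery step described above can be sketched with the standard library's `html.parser`. This is not PyWebCopy's own parser; `LinkCollector` and `discover_links` are hypothetical names, and only the `href` and `src` attributes are considered here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from href/src attributes (illustrative only)."""

    URL_ATTRS = {"href", "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def discover_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


sample = '<a href="/about">About</a><img src="logo.png">'
print(discover_links(sample, "https://example.com/docs/"))
```

A crawler would download each discovered URL, parse it the same way, and repeat until no new links appear.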

What can PyWebCopy not do?

PyWebCopy does not include a virtual DOM or any form of JavaScript parsing. If a website relies heavily on JavaScript, PyWebCopy is unlikely to make a true copy, because it cannot discover links that JavaScript generates dynamically.
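The limitation can be seen with a small experiment. The sketch below scans markup for static `href="..."` attributes with a regular expression (a simplification chosen for brevity, not how PyWebCopy parses HTML). The link that the script would create at runtime never appears in the result, because nothing executes the script.

```python
import re

PAGE = """
<a href="/static-page">Static</a>
<script>
  const a = document.createElement("a");
  a.href = "/generated-" + 42;
  document.body.appendChild(a);
</script>
"""


def static_hrefs(html):
    """Return href values that are literally present in the markup."""
    return re.findall(r'href="([^"]+)"', html)


# Only the static anchor is found; the JavaScript-built link is invisible.
print(static_hrefs(PAGE))
```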

PyWebCopy does not download the raw source code of a website; it can only download what the HTTP server returns. While it will do its best to create an offline copy of a website, advanced data-driven websites may not work as expected once they have been copied.

Installation

pywebcopy is available on PyPI and is easily installable using pip:

$ pip install pywebcopy

You are ready to go. Read the tutorials below to get started.

First steps

You should always check that the latest pywebcopy was installed successfully.

    >>> import pywebcopy
    >>> pywebcopy.__version__
    7.x.x

Your version may be different. Now you can continue with the tutorial.
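If you prefer to check the installation from a script rather than the interactive console, a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) could look like this. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed pywebcopy version, or None if it is not installed.
print(installed_version("pywebcopy"))
```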

Basic Usages

To save a single page, run the following in a Python console:

    from pywebcopy import save_webpage

    save_webpage(
        url="https://httpbin.org/",
        project_folder="E://savedpages//",
        project_name="my_site",
        bypass_robots=True,
        debug=True,
        open_in_browser=True,
        delay=None,
        threaded=False,
    )

To save a full website (this could overload the target server, so be careful):

    from pywebcopy import save_website

    save_website(
        url="https://httpbin.org/",
        project_folder="E://savedpages//",
        project_name="my_site",
        bypass_robots=True,
        debug=True,
        open_in_browser=True,
        delay=None,
        threaded=False,
    )
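Because a full crawl can put real load on a server, it may be worth checking what the site's `robots.txt` permits before bypassing it. The sketch below uses the standard library's `urllib.robotparser` on an inline example file; the rules, URLs, and the helper name `allowed` are made up for illustration and are not part of PyWebCopy.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical).
RULES = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(RULES, "https://example.com/public"))
print(allowed(RULES, "https://example.com/private/data"))
```

If the site asks for a crawl delay, the `delay` argument shown in the calls above is the natural place to honor it.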

Running Tests

Running tests is simple and doesn't require any external library. Just run this command from the root directory of the pywebcopy package.

$ python -m pywebcopy -t

Command Line Interface

pywebcopy has an easy-to-use command-line interface that lets you run common tasks without writing any Python code.

Getting list of commands
```
$ python -m pywebcopy --help
```

Using CLI

    Usage: pywebcopy [-p|--page|-s|--site|-t|--tests] [--url=URL [,--location=LOCATION [,--name=NAME [,--pop [,--bypass_robots [,--quite [,--delay=DELAY]]]]]]]

    Python library to clone/archive pages or sites from the Internet.

    Options:
      --version             show program's version number and exit
      -h, --help            show this help message and exit
      --url=URL             url of the entry point to be retrieved.
      --location=LOCATION   Location where files are to be stored.
      -n NAME, --name=NAME  Project name of this run.
      -d DELAY, --delay=DELAY
                            Delay between consecutive requests to the server.
      --bypass_robots       Bypass the robots.txt restrictions.
      --threaded            Use threads for faster downloading.
      -q, --quite           Suppress the logging from this library.
      --pop                 open the html page in default browser window after
                            finishing the task.

      CLI Actions List:
        Primary actions available through cli.

        -p, --page          Quickly saves a single page.
        -s, --site          Saves the complete site.
        -t, --tests         Runs tests for this library.
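As an illustration of combining these options, the sketch below assembles a command line from the flags listed in the usage text (`--page`, `--site`, `--url`, `--location`, `--name`). The helper `build_command` is hypothetical and only builds the argument list; running it is left to `subprocess.run`.

```python
import sys


def build_command(action, url, location, name):
    """Build an argv list for `python -m pywebcopy` (illustrative helper)."""
    # Map a friendly action name onto the CLI action flags from the usage text.
    flag = {"page": "--page", "site": "--site"}[action]
    return [
        sys.executable, "-m", "pywebcopy", flag,
        f"--url={url}",
        f"--location={location}",
        f"--name={name}",
    ]


cmd = build_command("page", "https://httpbin.org/", "saved_pages", "my_site")
print(" ".join(cmd[1:]))

# To actually run it (requires pywebcopy to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```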

Running tests
```
  $ python -m pywebcopy run_tests
```

Authentication and Cookies

Often, authentication is needed to access a certain page. It's really easy to authenticate with `pywebcopy` because it uses a `requests.Session` object for its base HTTP activity, which can be accessed through the `WebPage.session` attribute. There are plenty of tutorials on setting up authentication with `requests.Session`.

Here is an example to fill forms

    from pywebcopy.configs import get_config

    config = get_config('http://httpbin.org/')
    wp = config.create_page()
    wp.get(config['project_url'])
    form = wp.get_forms()[0]
    form.inputs['email'].value = 'bar'  # etc
    form.inputs['password'].value = 'baz'  # etc
    wp.submit_form(form)
    wp.get_links()
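Besides filling forms, credentials can also be attached directly to the underlying session. The sketch below sets HTTP Basic credentials and extra headers using the standard `requests.Session` attributes `auth` and `headers`. The helper `apply_credentials` is hypothetical, and whether a given site accepts Basic authentication is an assumption you would need to verify.

```python
def apply_credentials(session, username, password, extra_headers=None):
    """Attach HTTP Basic credentials and optional headers to a session."""
    # requests.Session sends `auth` with every request made through it.
    session.auth = (username, password)
    if extra_headers:
        session.headers.update(extra_headers)
    return session


# Hypothetical usage, assuming `wp` is a page object created with
# config.create_page() and exposing the session as `wp.session`:
# apply_credentials(wp.session, "user", "secret", {"User-Agent": "my-archiver"})
# wp.get(config['project_url'])
```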

You can read more in the docs folder of the GitHub repository.

Latest release: 7.1 (May 13, 2025)
