karlicoss/rexport
Reddit takeout: export your account data as JSON: comments, submissions, upvotes etc. 🦖
- The easiest way is `pip3 install --user git+https://github.com/karlicoss/rexport`. Alternatively, use `git clone --recursive`, or `git pull && git submodule update --init`. After that, you can use `pip3 install --editable .`.
- To use the API, you need to register a custom ‘personal script’ app and get the `client_id` and `client_secret` parameters. See more here.
- To access the user’s personal data (e.g. saved posts/comments), the Reddit API also requires `username` and `password`. Yes, unfortunately it wants your plaintext Reddit password; you can read more about it here.
Usage:
Recommended: create `secrets.py` keeping your API parameters, e.g.:

    username = "USERNAME"
    password = "PASSWORD"
    client_id = "CLIENT_ID"
    client_secret = "CLIENT_SECRET"
If you have two-factor authentication enabled, append the six-digit 2FA token to the password, separated by a colon:
    password = "PASSWORD:343642"
The token will, however, be short-lived.
After that, use:
python3 -m rexport.export --secrets /path/to/secrets.py
That way you type less and have control over where you keep your plaintext secrets.
Alternatively, you can pass parameters directly, e.g.
python3 -m rexport.export --username <username> --password <password> --client_id <client_id> --client_secret <client_secret>
However, this is verbose and prone to leaking your keys/tokens/passwords in shell history.
You can also import `export.py` as a module and call the `get_json` function directly to get raw JSON.
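A minimal sketch of what that might look like; the keyword arguments mirror the CLI flags above, but treat the exact `get_json` signature as an assumption and check rexport/export.py if it differs:

```python
# Sketch: calling the exporter from Python instead of the CLI.
# Assumption: get_json accepts the same credential parameters as the CLI flags.
import json

from rexport.export import get_json

data = get_json(
    username='USERNAME',
    password='PASSWORD',
    client_id='CLIENT_ID',
    client_secret='CLIENT_SECRET',
)

# 'data' is the raw JSON-serializable export; dump it wherever you like
with open('export.json', 'w') as fo:
    json.dump(data, fo, ensure_ascii=False, indent=1)
```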
I highly recommend checking the exported files at least once just to make sure they contain everything you expect from your export. If not, please feel free to ask or raise an issue!
WARNING: the Reddit API limits your queries to 1000 entries.
I highly recommend backing up regularly and keeping old exports. An easy way to achieve this is a command like:
python3 -m rexport.export --secrets /path/to/secrets.py >"export-$(date -I).json"
Or, you can use arctee, which automates this.
Check out these links if you’re interested in getting older data that’s inaccessible via the API:
- comment by /u/binkarus
- Reddit admins say that the rationale behind the API limitation is performance and caching
- perhaps you can request all of your data under GDPR? I haven’t tried that personally though.
- pushshift can help you retrieve old data. You can use purarue/pushshift_comment_export to get the data.
See example-output.json; it contains some example data you might find in your export. I’ve cleaned it up a bit, as it has lots of different fields, many of which are probably not relevant.
However, this is pretty API-dependent and changes all the time, so it’s better to check the Reddit API if you are looking for something specific.
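If you want a quick look at what your own export contains, a small check like this can help. This is just a sketch: it assumes the export is a single JSON object whose top-level keys map to lists of items (as in example-output.json), and the filename is hypothetical:

```python
# Sketch: peek at which sections your export contains and how big they are.
# Assumption: the export is one JSON object mapping section names to lists.
import json

with open('export-2024-01-01.json') as f:  # hypothetical filename
    export = json.load(f)

for key, value in export.items():
    size = len(value) if isinstance(value, (list, dict)) else 1
    print(f'{key}: {size} item(s)')
```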
You can use `rexport.dal` (which stands for “Data Access/Abstraction Layer”) to access your exported data, even offline. I elaborate on the motivation behind it here.
- the main use case is to be imported as a Python module to allow for programmatic access to your data (see the sketch after the example output below). You can find some inspiration in the my. package that I’m using as an API to all my personal data.
- to test it against your export, simply run:
python3 -m rexport.dal --source /path/to/export
- you can also try it interactively:
python3 -m rexport.dal --source /path/to/export --interactive
Example output:
Your most saved subreddits:
[('orgmode', 50), ('emacs', 36), ('QuantifiedSelf', 33), ('AskReddit', 33), ('selfhosted', 29)]
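For programmatic access (the main use case mentioned above), something along these lines should work. This is a rough sketch: the DAL constructor taking a sequence of export files and the saved() iterator exposing a subreddit attribute are assumptions based on the example output above, so check rexport/dal.py for the actual interface:

```python
# Sketch: reproducing the "most saved subreddits" example via rexport.dal.
# Assumptions: DAL() takes a sequence of export file paths and merges them,
# and items from dal.saved() expose a .subreddit attribute.
from collections import Counter
from pathlib import Path

from rexport.dal import DAL

# feed it all of your dated exports; merging them is how you work around
# the 1000-entry API limit mentioned above
exports = sorted(Path('/path/to/backups').glob('export-*.json'))
dal = DAL(exports)

counts = Counter(s.subreddit for s in dal.saved())
print('Your most saved subreddits:')
print(counts.most_common(5))
```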