lmr97/letterboxd_get_list


A CLI program to collect arbitrary film information from a list on Letterboxd.com, powered by a versatile LetterboxdFilm class.

License

MIT license


letterboxd_get_list

This repo has a CLI program that converts any Letterboxd list into a CSV file and collects any desired information for each film in the list (title and year are added by default). It also has a `LetterboxdFilm` class definition (upon which `get_list.py` depends), which allows quick access to film information from Letterboxd given the URL of the film.

Please note: this program will take a while for long lists. Since I have no API access for Letterboxd, the program has to make 1-2 GET requests for each film on the list (at least one for any info, with a separate one for statistics). I've optimized as best I can given this, and I've provided a progress bar with the estimated time remaining on it so at least expectations can be set appropriately.
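To get a feel for the cost before running a long list, the request count described above can be turned into a rough estimate. This is a back-of-the-envelope sketch, not code from this repo: `estimate_requests`, `estimate_seconds`, and the `seconds_per_request` default are hypothetical, and the estimate ignores whatever requests are needed to fetch the list pages themselves.

```python
def estimate_requests(n_films: int, needs_statistics: bool) -> int:
    """Film-page GET requests: one per film, plus one more per film for statistics."""
    if n_films < 0:
        raise ValueError("n_films must be non-negative")
    per_film = 2 if needs_statistics else 1
    return n_films * per_film


def estimate_seconds(n_films: int, needs_statistics: bool,
                     seconds_per_request: float = 0.5) -> float:
    """Rough run time in seconds.

    seconds_per_request is an assumed average, not a measured value;
    time a few requests on your own connection to get a realistic figure.
    """
    return estimate_requests(n_films, needs_statistics) * seconds_per_request
```

Under these assumptions, a 100-film list needs between 100 and 200 film-page requests, depending on whether any statistics are requested.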

Installation

Clone the repository:

git clone https://github.com/lmr97/letterboxd_get_list
cd letterboxd_get_list

Then install dependencies with Poetry:

poetry install

Then run with Poetry. For example:

poetry run python get_list.py \
    --list-url https://letterboxd.com/crew/list/2024-highest-rated-films/ \
    --attributes director watches avg_rating \
    --output-file 2024-highest-rated.csv

Docker container

There is a GitHub Package for this repo called `letterboxd_get_list`, which runs out of a Docker container (linked in the Packages section in the sidebar). You can pull the image with the command below, or build it yourself with the Dockerfile included here.

docker pull ghcr.io/lmr97/letterboxd_get_list:latest

For easy retrieval of output files, I recommend binding a folder to the container when you run it, like so (if you built the image yourself, use your own image tag in place of the `ghcr.io` one):

mkdir OutputFiles
docker run -d \
    --name get-lb-list \
    -v "$(pwd)"/OutputFiles:/home/runner/OutputFiles \
    ghcr.io/lmr97/letterboxd_get_list:latest

`get_list.py` CLI usage

get_list.py [-h] -u, --list-url LIST_URL
                 [-a, --attributes VALID_ATTRIBUTE [...]]
                 [-o, --output-file OUTPUT_FILE]

Abbreviated options are accepted as well. In a bit more detail:

| Option | Description |
| --- | --- |
| `--help`, `-h` | Print usage and help. |
| `--list-url`, `-u` | (Required) The URL of the Letterboxd list you'd like to convert to a CSV file. |
| `--attributes`, `-a` | (Optional) One or more kinds of information about each film that you would like included in the output, chosen from the list of valid attributes below. |
| `--output-file`, `-o` | (Optional) A path/file to place the output. If none is given, the filename defaults to the last part of the URL with `.csv` at the end, placed in the working directory (e.g. for `https://letterboxd.com/user/list/name-of-list/`, the file name would be `name-of-list.csv`). |
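The options above map naturally onto Python's `argparse`. The following is a minimal sketch of such a parser, not the actual contents of `get_list.py`: the function names `build_parser`, `default_output_name`, and `parse_cli` are hypothetical, and the default-filename logic is only one way to implement the behaviour described for `--output-file`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three documented options (plus -h from argparse)."""
    parser = argparse.ArgumentParser(prog="get_list.py")
    parser.add_argument("-u", "--list-url", required=True,
                        help="URL of the Letterboxd list to convert")
    parser.add_argument("-a", "--attributes", nargs="+", default=[],
                        metavar="VALID_ATTRIBUTE",
                        help="extra information to include for each film")
    parser.add_argument("-o", "--output-file", default=None,
                        help="path of the CSV file to write")
    return parser


def default_output_name(list_url: str) -> str:
    """Last path segment of the list URL with '.csv' appended."""
    return list_url.rstrip("/").split("/")[-1] + ".csv"


def parse_cli(argv=None) -> argparse.Namespace:
    """Parse arguments and fill in the default output file name if needed."""
    args = build_parser().parse_args(argv)
    if args.output_file is None:
        args.output_file = default_output_name(args.list_url)
    return args
```

With `argparse`, unambiguous prefixes of long options (such as `--list` for `--list-url`) are accepted by default, which is one way abbreviated options can work.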

The valid attribute arguments are as follows:

actor
additional-directing
additional-photography
art-direction
assistant-director
avg_rating
camera-operator
casting
choreography
cinematography
composer
costume-design
country
director
editor
executive-producer
genre
hairstyling
language
lighting
likes
makeup
mini-theme
original-writer
producer
production-design
set-decoration
songs
sound
special-effects
studio
stunts
theme
title-design
visual-effects
watches
writer

Additional notes about output formatting:

For ranked lists, the rank is added as the first column. For non-ranked lists, this column is omitted.
For films where a given attribute is not found on the page, the program writes "Not listed" in that cell (it does not crash).
If an attribute has multiple values (e.g. the film has 3 directors), the elements of that attribute are separated by a `;`. In the case of casting, the key-value pairs (see the `get_casting()` heading below) are separated by a semicolon as well, with the key separated from the value by a colon, as is convention.
For films that have a comma in the title, the comma is removed so as to not mess up the CSV file.
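The formatting rules above can be sketched in a few lines. This is an illustration only, not the code in `get_list.py`: the helper names are hypothetical, and the exact spacing around the `;` and `:` separators is an assumption.

```python
NOT_LISTED = "Not listed"


def clean_title(title: str) -> str:
    """Drop commas from a title so it cannot split a CSV row."""
    return title.replace(",", "")


def format_values(values: list) -> str:
    """Join multiple values with ';', or fall back to 'Not listed' when empty."""
    return ";".join(values) if values else NOT_LISTED


def format_casting(casting: dict) -> str:
    """Render {actor: character} as 'actor: character' pairs joined by ';'."""
    if not casting:
        return NOT_LISTED
    # Assumption: one space after the colon, none around the semicolon.
    return ";".join(f"{actor}: {character}" for actor, character in casting.items())
```

For example, `format_values(["A", "B", "C"])` gives `A;B;C`, and an empty list gives `Not listed`.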

A loading bar will display to show the progress, and once the program has written to the output file, it will print `Retrieval complete!` and terminate. The first few lines of the CSV that results from the above command are shown below:

Rank,Title,Year,Director,Watches,Avg_rating
1,"Dune: Part Two",2024,Denis Villeneuve,2628745,4.41
2,"I'm Still Here",2024,Walter Salles,424304,4.34
3,"How to Make Millions Before Grandma Dies",2024,Pat Boonnitipat,105415,4.33
4,"Look Back",2024,Kiyotaka Oshiyama,258750,4.27
5,"Sing Sing",2023,Greg Kwedar,165271,4.27
...
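Output in this shape can be read back with Python's standard `csv` module. The sketch below parses the first two rows of the sample output from a string; `load_rows` is a hypothetical helper, and the numeric conversions assume those cells hold numbers rather than "Not listed".

```python
import csv
import io

SAMPLE = '''Rank,Title,Year,Director,Watches,Avg_rating
1,"Dune: Part Two",2024,Denis Villeneuve,2628745,4.41
2,"I'm Still Here",2024,Walter Salles,424304,4.34
'''


def load_rows(text: str) -> list:
    """Parse CSV text into dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Year"] = int(row["Year"])
        row["Watches"] = int(row["Watches"])
        row["Avg_rating"] = float(row["Avg_rating"])
        # Multi-valued cells use ';' as the separator.
        row["Director"] = row["Director"].split(";")
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
```

To read a real output file, pass an open file object to `csv.DictReader` instead of the `io.StringIO` wrapper.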

`LetterboxdFilm` class

This class takes care of all the requesting and parsing of HTML files for a given film's Letterboxd page, so the information is easily accessible to the user. Object initialization only requires the URL of the film. After initialization, the following information is available as class attributes:

Title
Year
Film page URL
Film page HTML

These attributes are all `str`s. Any other information about the film comes through class methods that query the HTML via CSS selectors, a kind of lazy evaluation that saves initialization time and storage space. I intended the class to be as intuitive as possible, but I feel the methods below warrant further description:

`get_tabbed_attribute(attribute)`

Returns: `list`

This method returns data from the tabbed section of a Letterboxd film page, which holds the cast, crew, details, genres, and releases information. The Releases tab is the exception, as that section follows a different structure.

It returns `["Not listed"]` (a list with that string as its only element) if the attribute is not found for the given film, whether or not the attribute is valid. If the attribute is not valid, a `ValueError` is raised and an invalid-argument warning is printed.

Always use the full, singular form of the attribute you'd like, and replace each space with a `-` (ASCII 45).

See example below:

>>> from LetterboxdFilm import LetterboxdFilm
>>> film = LetterboxdFilm("https://letterboxd.com/film/rango")
>>> film.get_tabbed_attribute("assistant-director")
['Adam Somner', 'Ian Calip']
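The naming rule for attributes (replace each space with a `-`) is easy to apply mechanically. The helpers below are a hypothetical convenience, not part of the `LetterboxdFilm` class. Lowercasing is an assumption based on the list of valid attributes, and the code does not convert plurals to singular, so that part is still up to the caller.

```python
NOT_LISTED = ["Not listed"]


def to_attribute_name(label: str) -> str:
    """Turn a human-readable label into the hyphenated argument form."""
    # split() with no arguments also collapses repeated whitespace.
    return "-".join(label.strip().lower().split())


def is_listed(values: list) -> bool:
    """True unless the result is the documented ["Not listed"] placeholder."""
    return values != NOT_LISTED
```

For instance, `to_attribute_name("Assistant Director")` produces `assistant-director`, which matches the argument used in the example above.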

`get_casting()`

Returns: `dict`

This method returns a dictionary that encodes the casting of a film, with the actors' names as keys and the characters they play as values.
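Because the dictionary is keyed by actor, looking up who plays a given character means inverting it. A small sketch of that, using a made-up placeholder dictionary rather than real output from `get_casting()` (the function name `actors_by_character` is hypothetical):

```python
from collections import defaultdict


def actors_by_character(casting: dict) -> dict:
    """Invert {actor: character} into {character: [actors]}."""
    inverted = defaultdict(list)
    for actor, character in casting.items():
        inverted[character].append(actor)
    return dict(inverted)


# Placeholder data for illustration only, not taken from a real film page.
example_casting = {"Actor A": "Hero", "Actor B": "Villain", "Actor C": "Hero"}
lookup = actors_by_character(example_casting)
```

Each character maps to a list because more than one actor can be credited with the same character name.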

Feedback

If anything goes wrong as you use the program or class, don't hesitate to open a GitHub issue on this repository. If there is some functionality you'd like to see added, fork the repo and submit a pull request here.
