An R package for extracting 'tidy' data frames from RSS, Atom and JSON feeds
tidyRSS is a package for extracting data from RSS feeds, including Atom feeds and JSON feeds. For geo-type feeds, see the section on changes in version 2 below, or jump directly to tidygeoRSS, which is designed for that purpose.
It is easy to use as it only has one function, `tidyfeed()`, which takes five arguments:
- the url of the feed;
- a logical flag for whether you want the feed returned as a tibble or a list containing two tibbles;
- a logical flag for whether you want HTML tags removed from columns in the dataframe;
- a config list that is passed off to `httr::GET()`;
- and a `parse_dates` argument, a logical flag, which will attempt to parse dates if `TRUE` (see below).
If `parse_dates` is `TRUE`, `tidyfeed()` will attempt to parse dates using the anytime package. Note that this removes some lower-level control that you may wish to retain over how dates are parsed. See this issue for an example.
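As a rough sketch of how the five arguments fit together, the call below spells each one out. The argument names (`feed`, `config`, `clean_tags`, `list`, `parse_dates`) are assumed from the package documentation rather than stated in this README, and the 10-second timeout is only an illustrative choice; check `?tidyfeed` for the signature in your installed version.

```r
library(tidyRSS)

# Feed URL taken from the usage example in this README
feed_url <- "http://journal.r-project.org/rss.atom"

# Argument names below are assumptions; confirm them with ?tidyfeed
parts <- tidyfeed(
  feed_url,
  config      = httr::timeout(10), # passed off to httr::GET(); timeout is illustrative
  clean_tags  = TRUE,              # strip HTML tags from text columns
  list        = TRUE,              # return a list of two tibbles instead of one tibble
  parse_dates = FALSE              # keep dates unparsed so you control the parsing
)

# Inspect what came back without assuming any element or column names
names(parts)
str(parts, max.level = 1)
```

Setting `parse_dates = FALSE` leaves the date columns as they arrive, so you can parse them yourself with whichever tool and format handling you prefer.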
It can be installed directly from CRAN with:

```r
install.packages("tidyRSS")
```

The development version can be installed from GitHub with the remotes package:

```r
remotes::install_github("robertmyles/tidyrss")
```

Here is how you can get the contents of the R Journal:

```r
library(tidyRSS)

tidyfeed("http://journal.r-project.org/rss.atom")
```

The biggest change in version 2 is that tidyRSS no longer attempts to parse geo-type feeds into sf tibbles. This functionality has been moved to tidygeoRSS.
XML feeds can be finicky things. If you find one that doesn’t work with `tidyfeed()`, feel free to create an issue with the url of the feed that you are trying. Pull Requests are welcome if you’d like to try and fix it yourself. For older RSS feeds, some fields will almost never be ‘clean’, that is, they will contain things like newlines (`\n`) or extra quote marks. Cleaning these in a generic way is more or less impossible, so I suggest you use stringr, strex and/or tools from base R such as `gsub()` to clean these. This will mainly affect the `item_description` column of a parsed RSS feed, and will not often affect Atom feeds (and should never be a problem with JSON).
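A minimal sketch of that kind of cleanup with base R is shown here. The data frame is made up to stand in for a parsed feed, and `clean_text()` is a hypothetical helper, not part of tidyRSS; the rules it applies (collapsing repeated quote marks and whitespace) are only one possible choice and will need adjusting per feed.

```r
# Made-up data standing in for the output of tidyfeed() on an older RSS feed
feed <- data.frame(
  item_title = c("First post", "Second post"),
  item_description = c(
    "He said \"\"hello\"\"\nand left  ",
    "  Line one\n\nLine two "
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: not part of tidyRSS
clean_text <- function(x) {
  x <- gsub("\"{2,}", "\"", x)        # collapse runs of double quotes into one
  x <- gsub("[[:space:]]+", " ", x)   # newlines, tabs, repeated spaces -> one space
  trimws(x)                           # drop leading and trailing whitespace
}

feed$item_description <- clean_text(feed$item_description)
feed$item_description
```

The same function can be applied to any other text column that turns out to be messy in a particular feed.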
There are two other related packages that I’m aware of:
- rss
- feedeR

In comparison to feedeR, tidyRSS returns more information from the RSS feed (if it exists), and development on rss seems to have stopped some time ago.
For the schemas used to develop the parsers in this package, see:
- RSS: https://validator.w3.org/feed/docs/rss2.html
- Atom: https://validator.w3.org/feed/docs/atom.html
- JSON: https://jsonfeed.org/version/1
I’ve implemented most of the items in the schemas above. The following are not yet implemented:
Atom meta info:
- contributor, generator, logo, subtitle
RSS meta info:
- cloud
- image
- textInput
- skipHours
- skipDays
