An R package for extracting 'tidy' data frames from RSS, Atom and JSON feeds


RobertMyles/tidyRSS


tidyRSS is a package for extracting data from RSS feeds, including Atom feeds and JSON feeds. For geo-type feeds, see the section on changes in version 2 below, or jump directly to tidygeoRSS, which is designed for that purpose.

It is easy to use, as it has only one function, tidyfeed(), which takes five arguments:

  • the URL of the feed;
  • a logical flag for whether you want the feed returned as a tibble or as a list containing two tibbles;
  • a logical flag for whether you want HTML tags removed from columns in the data frame;
  • a config list that is passed on to httr::GET();
  • and a parse_dates argument, a logical flag which, if TRUE, makes tidyfeed() attempt to parse dates (see below).
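All five arguments can be spelled out in one call. The sketch below wraps tidyfeed() in a small helper so every argument is visible by name. The argument names feed, config, clean_tags, list and parse_dates are my reading of the CRAN documentation (check ?tidyfeed if they have changed); the helper name read_feed_verbose() and the 30-second timeout are purely illustrative. Defining the helper does not touch the network; calling it does.

```r
# Hypothetical helper that names every tidyfeed() argument explicitly.
# Argument names are assumed from the CRAN documentation; see ?tidyfeed.
read_feed_verbose <- function(url, as_list = FALSE) {
  tidyRSS::tidyfeed(
    feed        = url,                # the URL of the feed
    config      = httr::timeout(30),  # handed to httr::GET(); illustrative timeout
    clean_tags  = TRUE,               # strip HTML tags from text columns
    list        = as_list,            # TRUE: list of two tibbles; FALSE: one tibble
    parse_dates = TRUE                # try to parse dates with anytime
  )
}

# Usage (requires tidyRSS, httr and a network connection):
# rj <- read_feed_verbose("http://journal.r-project.org/rss.atom")
```

Naming the arguments rather than relying on position keeps the call readable and protects it against any future reordering of the signature.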

If parse_dates is TRUE, tidyfeed() will attempt to parse dates using the anytime package. Note that this removes some lower-level control that you may wish to retain over how dates are parsed. See this issue for an example.
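If you need that control, one option is to call tidyfeed() with parse_dates = FALSE and convert the date column yourself with an explicit format. The sketch below shows only that conversion step, on a plain character vector, using base R's as.POSIXct() with an ISO 8601 format of the kind Atom feeds typically use. The sample timestamps are made up, and the name of the date column in the real output depends on the feed type, so check names() of your result before applying it.

```r
# Made-up Atom-style timestamps. In practice these would come from a date
# column of tidyfeed(..., parse_dates = FALSE).
raw_dates <- c("2003-12-13T18:30:02Z", "2021-06-01T09:15:00Z")

# Parse with an explicit format and time zone instead of letting anytime guess.
# The literal "T" and "Z" in the format match the separators in the strings.
parsed <- as.POSIXct(raw_dates, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Strings that do not match the format become NA, so check for them.
if (anyNA(parsed)) warning("some dates did not match the expected format")

format(parsed, "%Y-%m-%d %H:%M")
# "2003-12-13 18:30" "2021-06-01 09:15"
```

RSS feeds usually carry RFC 822 style dates instead, which need a different format string; the point is that the format is stated explicitly rather than inferred.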

Installation

It can be installed directly from CRAN with:

install.packages("tidyRSS")

The development version can be installed from GitHub with the remotes package:

remotes::install_github("robertmyles/tidyrss")

Usage

Here is how you can get the contents of the R Journal:

library(tidyRSS)
tidyfeed("http://journal.r-project.org/rss.atom")

Changes in version 2.0.0

The biggest change in version 2 is that tidyRSS no longer attempts to parse geo-type feeds into sf tibbles. This functionality has been moved to tidygeoRSS.

Issues

XML feeds can be finicky things. If you find one that doesn’t work with tidyfeed(), feel free to create an issue with the URL of the feed you are trying. Pull requests are welcome if you’d like to try to fix it yourself.

For older RSS feeds, some fields will almost never be ‘clean’; that is, they will contain things like newlines (\n) or extra quote marks. Cleaning these in a generic way is more or less impossible, so I suggest you use stringr, strex and/or base R tools such as gsub() to clean them. This will mainly affect the item_description column of a parsed RSS feed; it will rarely affect Atom feeds and should never be a problem with JSON feeds.
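As an illustration of that kind of clean-up, the base R sketch below strips double quotes, collapses runs of whitespace (including newlines) into single spaces, and trims the ends. The sample strings are made-up stand-ins for an item_description column, and clean_description() is a hypothetical helper, not part of tidyRSS. Removing every double quote is a deliberately blunt choice that may be wrong for feeds where quotes carry meaning, so adjust the patterns to your feed.

```r
# Made-up stand-ins for values from an item_description column.
descriptions <- c("\n  An R package\n for feeds  ", "\"Quoted\" summary\n")

# Hypothetical helper: one possible cleaning recipe using base R only.
clean_description <- function(x) {
  x <- gsub("\"", "", x)     # drop stray double quote marks
  x <- gsub("\\s+", " ", x)  # collapse newlines, tabs and repeated spaces
  trimws(x)                  # remove leading/trailing whitespace
}

clean_description(descriptions)
# "An R package for feeds" "Quoted summary"
```

On a parsed feed this could be applied as, for example, feed$item_description <- clean_description(feed$item_description).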

Related

There are two other related packages that I’m aware of:

  • feedeR
  • rss

In comparison to feedeR, tidyRSS returns more information from the RSS feed (if it exists), and development on rss seems to have stopped some time ago.

Other

For the schemas used to develop the parsers in this package, see:

I’ve implemented most of the items in the schemas above. The following are not yet implemented:

Atom meta info:

  • contributor, generator, logo, subtitle

RSS meta info:

  • cloud
  • image
  • textInput
  • skipHours
  • skipDays
