A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library

andrewstuart/goq

Example

    package main

    import (
        "log"
        "net/http"

        "astuart.co/goq"
    )

    // Structured representation for github file name table
    type example struct {
        Title string   `goquery:"h1"`
        Files []string `goquery:"table.files tbody tr.js-navigation-item td.content,text"`
    }

    func main() {
        res, err := http.Get("https://github.com/andrewstuart/goq")
        if err != nil {
            log.Fatal(err)
        }
        defer res.Body.Close()

        var ex example

        err = goq.NewDecoder(res.Body).Decode(&ex)
        if err != nil {
            log.Fatal(err)
        }

        log.Println(ex.Title, ex.Files)
    }

Details

goq

    import "astuart.co/goq"

Package goq was built to allow users to declaratively unmarshal HTML into Go structs using struct tags composed of CSS selectors.

I've made a best effort to behave very similarly to JSON and XML decoding, as well as exposing as much information as possible in the event of an error to help you debug your unmarshaling issues.

When creating struct types to be unmarshaled into, the following general rules apply:

  • Any type that implements the Unmarshaler interface will be passed a slice of *html.Node so that manual unmarshaling may be done. This takes the highest precedence.

  • Any struct fields may be annotated with goquery metadata, which takes the form of an element selector followed by arbitrary comma-separated "value selectors."

  • A value selector may be one of html, text, or [someAttrName]. html and text will result in the methods of the same name being called on the *goquery.Selection to obtain the value. [someAttrName] will result in *goquery.Selection.Attr("someAttrName") being called for the value.

  • A primitive value type will default to the text value of the resulting nodes if no value selector is given.

  • At least one value selector is required for maps, to determine the map key. The key type must follow both the rules applicable to Go map indexing, as well as these unmarshaling rules. The value of each key will be unmarshaled in the same way the element value is unmarshaled.

  • For maps, keys will be retrieved from the same level of the DOM. The key selector may be arbitrarily nested, though: the first level of children with any number of matching elements will be used.

  • For maps, any values must be nested below the level of the key selector. Parents or siblings of the element matched by the key selector will not be considered.

  • Once used, a "value selector" will be shifted off of the comma-separated list. This allows you to nest arbitrary levels of value selectors. For example, the type []map[string][]string would require one selector for the map key, and take an optional second selector for the values of the string slice.

  • Any struct type encountered in nested types (e.g. map[string]SomeStruct) will override any remaining "value selectors" that had not been used. For example, given:

    type S struct { F string `goquery:",[bang]"` }

    struct { T map[string]S `goquery:"#someId,[foo],[bar],[baz]"` }

[foo] will be used to determine the string map key, but [bar] and [baz] will be ignored, with the [bang] tag present in the S struct type taking precedence.

Usage

func NodeSelector

func NodeSelector(nodes []*html.Node) *goquery.Selection

NodeSelector is a quick utility function to get a goquery.Selection from a slice of *html.Node. Useful for performing unmarshaling, since the decision was made to use []*html.Node for maximum flexibility.

func Unmarshal

func Unmarshal(bs []byte, v interface{}) error

Unmarshal takes a byte slice and a destination pointer to any interface{}, and unmarshals the document into the destination based on the rules above. Any error returned here will likely be of type CannotUnmarshalError, though an initial goquery error will pass through directly.

func UnmarshalSelection

func UnmarshalSelection(s *goquery.Selection, iface interface{}) error

UnmarshalSelection will unmarshal a goquery.Selection into an interface appropriately annotated with goquery tags.

type CannotUnmarshalError

type CannotUnmarshalError struct {
    Err      error
    Val      string
    FldOrIdx interface{}
}

CannotUnmarshalError represents an error returned by the goquery Unmarshaler and helps consumers in programmatically diagnosing the cause of their error.

func (*CannotUnmarshalError) Error

func (e *CannotUnmarshalError) Error() string

type Decoder

type Decoder struct {}

Decoder implements the same API you will see in encoding/xml and encoding/json, except that we do not currently support proper streaming decoding, as it is not supported by goquery upstream.

func NewDecoder

func NewDecoder(r io.Reader) *Decoder

NewDecoder returns a new decoder given an io.Reader.

func (*Decoder) Decode

func (d *Decoder) Decode(dest interface{}) error

Decode will unmarshal the contents of the decoder when given an instance of an annotated type as its argument. It will return any errors encountered during either parsing the document or unmarshaling into the given object.
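
For readers unfamiliar with the call shape being mirrored, here is the encoding/json version of the NewDecoder/Decode pattern as a stdlib-only sketch. The example type and decodeTitle helper are hypothetical; with goq the equivalent call would be goq.NewDecoder(r).Decode(&ex), using goquery struct tags in place of json ones.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// example is a hypothetical destination type using json tags.
type example struct {
	Title string `json:"title"`
}

// decodeTitle wraps a reader in a decoder and decodes into a tagged struct,
// the same two-step shape that goq's Decoder follows.
func decodeTitle(doc string) (string, error) {
	var ex example
	if err := json.NewDecoder(strings.NewReader(doc)).Decode(&ex); err != nil {
		return "", err
	}
	return ex.Title, nil
}

func main() {
	title, err := decodeTitle(`{"title": "goq"}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(title)
}
```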

type Unmarshaler

type Unmarshaler interface {
    UnmarshalHTML([]*html.Node) error
}

Unmarshaler allows for custom implementations of unmarshaling logic.

TODO

  • Callable goquery methods with args, via reflection
