omniparser

Version: v1.0.5
Note: This package is not in the latest version of its module.
Published: Jul 3, 2024
License: MIT
Imports: 14
Imported by: 3

Details

Repository

github.com/jf-tech/omniparser

README

omniparser

Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in a streaming fashion and transforms the data into the desired JSON output based on a schema written in JSON.

Min Golang Version: 1.16

Licenses and Sponsorship

Omniparser is publicly available under the MIT License. Individual and corporate sponsorships are welcome and gratefully appreciated, and will be listed on the SPONSORS page. Company-level sponsors get additional benefits and support granted in the COMPANY LICENSE.

Documentation

Docs:

References:

Examples:

In the example folders above you will find pairs of input files and their schema files. Then in the .snapshots subdirectory, you'll find their corresponding output files.

Online Playground (not functioning)

Use The Playground (you may need to wait a few seconds for the instance to wake up) to try out schemas and inputs, yours or existing samples, and see how ingestion and transform work.

As of 2023/03/14, all of our previous free Docker hosting solutions have gone away and we haven't found another one yet. For now, please clone the repo and use ./cli.sh as described on the Getting Started page.

Why

  • No good ETL transform/parser library exists in Golang.
  • Even looking into Java and other languages, choices aren't many and all have limitations:
    • Smooks is dead, plus its EDI parsing/transform is too heavyweight, needing code-gen.
    • BeanIO can't deal with EDI input.
    • Jolt can't deal with anything other than JSON input.
    • JSONata still supports only JSON -> JSON transforms.
  • Many of the parsers/transforms don't support streaming reads and load the entire input into memory, which is not acceptable in some situations.

Requirements

  • Golang 1.16 or later.

Recent Major Feature Additions/Changes

  • 2022/09: v1.0.4 released: added csv2 file format that supersedes the original csv format with support of hierarchical and nested records.
  • 2022/09: v1.0.3 released: added fixedlength2 file format that supersedes the original fixed-length format with support of hierarchical and nested envelopes.
  • 1.0.0 Released!
  • Added Transform.RawRecord() for callers of omniparser to access the raw ingested record.
  • Deprecated custom_parse in favor of custom_func (custom_parse is still usable for backward compatibility; it is just removed from all public docs and samples).
  • Added NonValidatingReader EDI segment reader.
  • Added fixed-length file format support in omniv21 handler.
  • Added EDI file format support in omniv21 handler.
  • Major restructure/refactoring
    • Upgraded omni schema version to omni.2.1 due to a number of incompatible schema changes:
      • 'result_type' -> 'type'
      • 'ignore_error_and_return_empty_str' -> 'ignore_error'
      • 'keep_leading_trailing_space' -> 'no_trim'
    • Changed how we handle custom functions: previously we always used strings as the input param type as well as the result param type. Not anymore: all types are supported for custom function in and out params.
    • Changed the way we package custom functions for extensions: previously we collected custom functions from all extensions and then passed all of them to the extension that is used. This felt weird; now only the custom functions included in a particular extension are used in that extension.
    • Deprecated/removed most of the custom functions in favor of using 'javascript'.
    • A number of package renamings.
  • Added CSV file format support in omniv2 handler.
  • Introduced IDR node cache for allocation recycling.
  • Introduced IDR for in-memory data representation.
  • Added trie-based high-performance times.SmartParse.
  • Command line interface (one-off transform cmd or long-running http server mode).
  • javascript engine integration as a custom_func.
  • JSON stream parser.
  • Extensibility:
    • Ability to provide custom functions.
    • Ability to provide custom schema handler.
    • Ability to customize the built-in omniv2 schema handler's parsing code.
    • Ability to provide a new file format support to built-in omniv2 schema handler.

Footnotes

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Extension

type Extension struct {
    CreateSchemaHandler       schemahandler.CreateFunc
    CreateSchemaHandlerParams interface{}
    CustomFuncs               customfuncs.CustomFuncs
}

Extension allows user of omniparser to add new schema handlers, and/or new custom functions in addition to the builtin handlers and functions.

type Schema

type Schema interface {
    NewTransform(name string, input io.Reader, ctx *transformctx.Ctx) (Transform, error)
    Header() header.Header
    Content() []byte
}

Schema is an interface that represents a schema used by omniparser. One instance of Schema is associated with one and only one schema. The instance of Schema can be reused for ingesting and transforming multiple input files/streams, as long as they are all intended for the same schema.

Each ingestion/transform, however, needs a separate instance of Transform. A Transform must not be shared and reused across different input files/streams.

While the same instance of Schema can be shared across multiple threads, Transform is not multi-thread safe. All operations on it must be done within the same goroutine.
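The sharing rule above (one Schema for many inputs, one Transform per input and per goroutine) can be sketched with the standard library alone. The `schema` and `transform` types below are hypothetical stand-ins, not omniparser's own types, and "one line equals one record" is an assumption made only to keep the sketch small.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"sync"
)

// schema is a hypothetical stand-in for omniparser's Schema: it holds no
// per-input state, so one instance can be shared by many goroutines.
type schema struct{ name string }

// transform is a hypothetical stand-in for Transform: it owns the read
// position of exactly one input and must stay on one goroutine.
type transform struct{ sc *bufio.Scanner }

// newTransform binds a fresh transform to a single input stream.
func (s *schema) newTransform(input io.Reader) *transform {
	return &transform{sc: bufio.NewScanner(input)}
}

// read returns the next line as one "record", or io.EOF once the input
// is fully consumed.
func (t *transform) read() (string, error) {
	if !t.sc.Scan() {
		if err := t.sc.Err(); err != nil {
			return "", err
		}
		return "", io.EOF
	}
	return t.sc.Text(), nil
}

// countRecords processes every input concurrently. The schema is shared,
// but each goroutine creates and uses its own transform.
func countRecords(s *schema, inputs []string) []int {
	counts := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			t := s.newTransform(strings.NewReader(in)) // one transform per input
			for {
				if _, err := t.read(); err != nil {
					return // io.EOF or a read error ends this input
				}
				counts[i]++ // each goroutine writes only its own slot
			}
		}(i, in)
	}
	wg.Wait()
	return counts
}

func main() {
	s := &schema{name: "demo"}
	fmt.Println(countRecords(s, []string{"a\nb\n", "c\n", ""}))
}
```

With the real package, the same shape would presumably apply: build the Schema once, then call NewTransform inside each goroutine rather than passing a Transform between goroutines.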

func NewSchema

func NewSchema(name string, schemaReader io.Reader, exts ...Extension) (Schema, error)

NewSchema creates a new instance of Schema. Caller can use the optional Extensions for customization. NewSchema will scan through exts left to right to find the first extension with a schema handler (specified by CreateSchemaHandler field) that supports the input schema. If no ext is provided, or no ext has a handler that supports the schema, then NewSchema will fall back to the builtin extension (currently for schema version 'omni.2.1'). If the input schema is still not supported by the builtin extension, NewSchema will fail with ErrSchemaNotSupported. Each extension must be fully self-contained, meaning all the custom functions it intends to use in the schemas supported by it must be included in the same extension.
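The lookup order described above can be illustrated with a simplified model. This is not omniparser's implementation: `extension`, `handles`, `pickExtension`, and `errSchemaNotSupported` are hypothetical names, and a plain version-string predicate stands in for the real CreateSchemaHandler field.

```go
package main

import (
	"errors"
	"fmt"
)

// errSchemaNotSupported is a local stand-in for ErrSchemaNotSupported.
var errSchemaNotSupported = errors.New("schema not supported")

// extension is a simplified, hypothetical model of Extension. A nil
// supports func models an extension that carries no schema handler.
type extension struct {
	name     string
	supports func(schemaVersion string) bool
}

// handles builds an extension whose handler accepts exactly one version.
func handles(name, version string) extension {
	return extension{name: name, supports: func(v string) bool { return v == version }}
}

// pickExtension scans exts left to right and returns the first one whose
// handler supports the schema, then falls back to the builtin extension.
func pickExtension(schemaVersion string, exts ...extension) (string, error) {
	for _, e := range exts {
		if e.supports != nil && e.supports(schemaVersion) {
			return e.name, nil
		}
	}
	builtin := handles("builtin", "omni.2.1")
	if builtin.supports(schemaVersion) {
		return builtin.name, nil
	}
	return "", errSchemaNotSupported
}

func main() {
	funcsOnly := extension{name: "funcs-only"} // custom functions only, no handler
	custom := handles("custom", "acme.1")
	for _, v := range []string{"acme.1", "omni.2.1", "other.9"} {
		name, err := pickExtension(v, funcsOnly, custom)
		fmt.Println(v, name, err)
	}
}
```

Because the first match wins, the order in which extensions are passed matters when two of them can handle the same schema.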

type Transform

type Transform interface {
    // Read returns a JSON byte slice representing one ingested and transformed record.
    // io.EOF should be returned when input stream is completely consumed and future calls
    // to Read should always return io.EOF.
    // errs.ErrTransformFailed should be returned when a record ingestion and transformation
    // failed and such failure isn't considered fatal. Future calls to Read will attempt
    // new record ingestion and transformations.
    // Any other error returned is considered fatal and future calls to Read will always
    // return the same error.
    // Note if returned error isn't nil, then returned []byte will be nil.
    Read() ([]byte, error)
    // RawRecord returns the current raw record ingested from the input stream. If the last
    // Read call failed, or Read hasn't been called yet, it will return an error.
    RawRecord() (schemahandler.RawRecord, error)
}

Transform is an interface that represents one input stream ingestion and transform operation. An instance of a Transform must not be shared and reused among different input streams. An instance of a Transform must not be used across multiple goroutines.
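The three outcomes of Read documented above (io.EOF, a non-fatal per-record failure, and any other fatal error) suggest a read loop along the following lines. This is a sketch under assumptions: `reader`, `fakeTransform`, `drain`, and `errRecordFailed` are hypothetical, and `errRecordFailed` is only a local stand-in for errs.ErrTransformFailed. How the real package expects callers to detect that error is not shown in this page.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errRecordFailed is a local stand-in for errs.ErrTransformFailed.
var errRecordFailed = errors.New("record transform failed")

// reader is a hypothetical interface covering only the Read method of Transform.
type reader interface {
	Read() ([]byte, error)
}

// step is one scripted result for fakeTransform.
type step struct {
	out string
	err error
}

// fakeTransform replays scripted results, then returns io.EOF forever.
type fakeTransform struct {
	steps []step
	pos   int
}

func (f *fakeTransform) Read() ([]byte, error) {
	if f.pos >= len(f.steps) {
		return nil, io.EOF
	}
	s := f.steps[f.pos]
	f.pos++
	if s.err != nil {
		return nil, s.err // the byte slice is nil whenever err is non-nil
	}
	return []byte(s.out), nil
}

// drain reads until io.EOF. It skips records that fail non-fatally and
// stops on any other error, which is treated as fatal.
func drain(t reader) ([]string, int, error) {
	var records []string
	skipped := 0
	for {
		b, err := t.Read()
		switch {
		case err == nil:
			records = append(records, string(b))
		case errors.Is(err, io.EOF):
			return records, skipped, nil
		case errors.Is(err, errRecordFailed):
			skipped++ // non-fatal: the next Read attempts a new record
		default:
			return records, skipped, err
		}
	}
}

func main() {
	t := &fakeTransform{steps: []step{
		{out: `{"id":1}`},
		{err: errRecordFailed},
		{out: `{"id":2}`},
	}}
	records, skipped, err := drain(t)
	fmt.Println(records, skipped, err)
}
```

Counting skipped records instead of aborting keeps one bad record from ending the whole stream, which matches the non-fatal contract in the interface comments.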

Directories

Path    Synopsis
extensions
