parse-xml

A fast, safe, compliant XML parser for Node.js and browsers.

Installation

npm install @rgrove/parse-xml

Or, if you like living dangerously, you can load the minified bundle in a browser via Unpkg and use the parseXml global.

Features

Returns a convenient object tree representing an XML document.
Works great in Node.js and browsers.
Provides helpful, detailed error messages with context when a document is not well-formed.
Mostly conforms to XML 1.0 (Fifth Edition) as a non-validating parser (see below for details).
Passes all relevant tests in the XML Conformance Test Suite.
Written in TypeScript and compiled to ES2020 JavaScript for Node.js and ES2017 JavaScript for browsers. The browser build is also optimized for minification.
Extremely fast and surprisingly small.
Zero dependencies.

Not Features

While this parser is capable of parsing document type declarations (<!DOCTYPE ... >) and including them in the node tree, it doesn't actually do anything with them. External document type definitions won't be loaded, and the parser won't validate the document against a DTD or resolve custom entity references defined in a DTD.

In addition, the only supported character encoding is UTF-8 because it's not feasible (or useful) to support other character encodings in JavaScript.

Examples

Basic Usage

ESM

    import { parseXml } from '@rgrove/parse-xml';
    parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');

CommonJS

    const { parseXml } = require('@rgrove/parse-xml');
    parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');

The result is an XmlDocument instance containing the parsed document, with a structure that looks like this (some properties and methods are excluded for clarity; see the API docs for details):

    {
      type: 'document',
      children: [
        {
          type: 'element',
          name: 'kittens',
          attributes: {
            fuzzy: 'yes'
          },
          children: [
            {
              type: 'text',
              text: 'I like fuzzy kittens.'
            }
          ],
          parent: { ... },
          isRootNode: true
        }
      ]
    }
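As a rough illustration of consuming a tree with this shape, the sketch below concatenates the text of every descendant text node. It runs against a hand-written plain object that mirrors the structure shown above rather than a real XmlDocument, and the helper name collectText is hypothetical, not part of the parse-xml API.

```javascript
// Hypothetical helper (not part of parse-xml): recursively gathers the text
// of every `text` node beneath `node`, in document order.
function collectText(node) {
  if (node.type === 'text') {
    return node.text;
  }
  // Documents and elements carry a `children` array; any other node type
  // is treated as contributing no text.
  return (node.children || []).map(collectText).join('');
}

// Plain-object stand-in for the parsed document shown above.
const doc = {
  type: 'document',
  children: [{
    type: 'element',
    name: 'kittens',
    attributes: { fuzzy: 'yes' },
    children: [{ type: 'text', text: 'I like fuzzy kittens.' }],
  }],
};

console.log(collectText(doc)); // prints "I like fuzzy kittens."
```

Because the function only relies on the type, text, and children properties, the same walk should apply to any subtree, not just the document root.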

All parse-xml objects have toJSON() methods that return JSON-serializable objects, so you can easily convert an XML document to JSON:

    let json = JSON.stringify(parseXml(xml));
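For background, JSON.stringify() calls an object's toJSON() method, if it has one, and serializes the returned value instead of the object itself. The sketch below shows that mechanism with a hypothetical SketchNode class. It is not parse-xml's implementation; it only illustrates how a node that holds a parent reference can still be serialized by leaving the back-reference out.

```javascript
// Hypothetical node class for illustration only; parse-xml's real classes
// are not reproduced here.
class SketchNode {
  constructor(name, children = []) {
    this.name = name;
    this.parent = null;
    this.children = children;
    for (const child of children) {
      child.parent = this; // back-reference creates a cycle
    }
  }

  // JSON.stringify() serializes this return value instead of the instance,
  // so the circular `parent` reference never reaches the serializer.
  toJSON() {
    return { name: this.name, children: this.children };
  }
}

const root = new SketchNode('kittens', [new SketchNode('kitten')]);
const json = JSON.stringify(root);
console.log(json);
// prints {"name":"kittens","children":[{"name":"kitten","children":[]}]}
```

The child nodes are serialized through their own toJSON() methods, so the whole tree is converted in one call.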

Friendly Errors

When something goes wrong, parse-xml throws an error that tells you exactly what happened and shows you where the problem is so you can fix it.

parseXml('<foo><bar>baz</foo>');

Output

    Error: Missing end tag for element bar (line 1, column 14)
      <foo><bar>baz</foo>
                   ^

In addition to a helpful message, error objects have the following properties:

column (Number)
Column where the error occurred (1-based).
excerpt (String)
Excerpt from the input string that contains the problem.
line (Number)
Line where the error occurred (1-based).
pos (Number)
Character position where the error occurred relative to the beginning of the input (0-based).
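These properties make it possible to build custom error reports. The sketch below assumes the properties are named column, excerpt, line, and pos, and uses a hand-built stand-in error with made-up values so that it runs without the library installed. The helper name summarizeXmlError is hypothetical and not part of parse-xml.

```javascript
// Hypothetical helper (not part of parse-xml): builds a one-line summary from
// the error properties described above. The property names are an assumption.
function summarizeXmlError(err) {
  const where = `line ${err.line}, column ${err.column} (offset ${err.pos})`;
  return `XML error at ${where}: ${err.excerpt}`;
}

// Stand-in error object with illustrative values. With parse-xml, `err` would
// instead be the value caught from a failed parseXml() call.
const fakeError = Object.assign(new Error('Missing end tag for element bar'), {
  column: 14,
  excerpt: '<foo><bar>baz</foo>',
  line: 1,
  pos: 13,
});

console.log(summarizeXmlError(fakeError));
// prints "XML error at line 1, column 14 (offset 13): <foo><bar>baz</foo>"
```

In real code the helper would typically be called from a catch block wrapped around parseXml().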

Why another XML parser?

There are many XML parsers for Node, and some of them are good. However, most of them suffer from one or more of the following shortcomings:

Native dependencies.
Loose, non-standard parsing behavior that can lead to unexpected or even unsafe results when given input the author didn't anticipate.
Kitchen sink APIs that tightly couple a parser with DOM manipulation functions, a stringifier, or other tooling that isn't directly related to parsing and consuming XML.
Stream-based parsing. This is great in the rare case that you need to parse truly enormous documents, but can be a pain to work with when all you want is a node tree.
Poor error handling.
Too big or too Node-specific to work well in browsers.

parse-xml's goal is to be a small, fast, safe, compliant, non-streaming, non-validating, browser-friendly parser, because I think this is an under-served niche.

I think parse-xml demonstrates that it's not necessary to jettison the spec entirely or to write complex code in order to implement a small, fast XML parser.

Also, it was fun.

Benchmark

Here's how parse-xml's performance stacks up against a few comparable libraries:

fast-xml-parser, which claims to be the fastest pure JavaScript XML parser
libxmljs2, which is based on the native libxml library written in C
xmldoc, which is based on sax-js

While libxmljs2 is faster at parsing medium and large documents, its performance comes at the expense of a large C dependency, no browser support, and a history of security vulnerabilities in the underlying libxml2 library.

In these results, "ops/s" refers to operations per second. Higher is faster.

    Node.js v22.10.0 / Darwin arm64
    Apple M1 Max

    Running "Small document (291 bytes)" suite...
    Progress: 100%

      @rgrove/parse-xml 4.2.0:
        253 082 ops/s, ±0.16%   | fastest

      fast-xml-parser 4.5.0:
        127 232 ops/s, ±0.44%   | 49.73% slower

      libxmljs2 0.35.0 (native):
        68 709 ops/s, ±2.77%    | slowest, 72.85% slower

      xmldoc 1.3.0 (sax-js):
        122 345 ops/s, ±0.15%   | 51.66% slower

    Finished 4 cases!
      Fastest: @rgrove/parse-xml 4.2.0
      Slowest: libxmljs2 0.35.0 (native)

    Running "Medium document (72081 bytes)" suite...
    Progress: 100%

      @rgrove/parse-xml 4.2.0:
        1 350 ops/s, ±0.18%   | 29.5% slower

      fast-xml-parser 4.5.0:
        560 ops/s, ±0.48%     | slowest, 70.76% slower

      libxmljs2 0.35.0 (native):
        1 915 ops/s, ±2.64%   | fastest

      xmldoc 1.3.0 (sax-js):
        824 ops/s, ±0.20%     | 56.97% slower

    Finished 4 cases!
      Fastest: libxmljs2 0.35.0 (native)
      Slowest: fast-xml-parser 4.5.0

    Running "Large document (1162464 bytes)" suite...
    Progress: 100%

      @rgrove/parse-xml 4.2.0:
        109 ops/s, ±0.17%   | 40.11% slower

      fast-xml-parser 4.5.0:
        48 ops/s, ±0.55%    | slowest, 73.63% slower

      libxmljs2 0.35.0 (native):
        182 ops/s, ±1.16%   | fastest

      xmldoc 1.3.0 (sax-js):
        73 ops/s, ±0.50%    | 59.89% slower

    Finished 4 cases!
      Fastest: libxmljs2 0.35.0 (native)
      Slowest: fast-xml-parser 4.5.0

See the parse-xml-benchmark repo for instructions on how to run this benchmark yourself.

License

ISC License

About

rgrove.github.io/parse-xml

Latest release: v4.2.0 (Oct 25, 2024)
