A semantic diff utility and library for tree-like files such as JSON, JSON5, XML, HTML, YAML, and CSV.
trailofbits/graphtage
Graphtage is a command-line utility and underlying library for semantically comparing and merging tree-like structures, such as JSON, XML, HTML, YAML, plist, and CSS files. Its name is a portmanteau of “graph” and “graftage”, the latter being the horticultural practice of joining two trees together such that they grow as one.
    $ echo Original: && cat original.json && echo Modified: && cat modified.json
    Original:
    {"foo": [1, 2, 3, 4], "bar": "testing"}
    Modified:
    {"foo": [2, 3, 4, 5], "zab": "testing", "woo": ["foobar"]}
    $ graphtage original.json modified.json
{"z̟b̶ab̟r̶":"testing","foo": [1̶,̶2,3,4,̟5̟ ],̟"̟w̟o̟o̟"̟:̟ ̟[̟"̟f̟o̟o̟b̟a̟r̟"̟]̟}
    $ pip3 install graphtage
Graphtage performs its analysis on an intermediate representation of the trees that is divorced from the filetypes of the input files. This means, for example, that you can diff a JSON file against a YAML file. The output format can also differ from the input format(s). By default, Graphtage formats the output diff in the same file format as the first input file, but one could, for example, diff two JSON files and format the output in YAML. Several command-line arguments specify these transformations, such as `--format`; please check the `--help` output for more information.
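The idea of a format-independent intermediate representation can be illustrated with the Python standard library alone. The sketch below is not Graphtage's own representation or API: `load_tree` is a hypothetical helper, and plain Python objects stand in for the intermediate tree. It parses a JSON document and a plist document into the same kind of structure and then compares them without regard to the original filetypes.

```python
import json
import plistlib

# Two inputs in different file formats that describe almost the same tree.
json_text = '{"foo": [1, 2, 3], "bar": "testing"}'
plist_bytes = plistlib.dumps({"foo": [1, 2, 3], "bar": "testing!"})


def load_tree(data, filetype):
    """Hypothetical helper: parse input into plain Python objects.

    After this step nothing remembers whether the data came from JSON
    or from a plist, which is the property the comparison relies on.
    """
    if filetype == "json":
        return json.loads(data)
    if filetype == "plist":
        return plistlib.loads(data)
    raise ValueError(f"unsupported filetype: {filetype}")


left = load_tree(json_text, "json")
right = load_tree(plist_bytes, "plist")

# Compare the trees key by key, independent of the source formats.
changed_keys = sorted(
    key for key in left.keys() | right.keys() if left.get(key) != right.get(key)
)
print(changed_keys)
```

This only reports which top-level keys differ; it does not attempt the minimum-edit matching that Graphtage performs.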
By default, Graphtage pretty-prints its output with as many line breaks and indents as possible.

    {
        "foo": [
            1,
            2,
            3
        ],
        "bar": "baz"
    }
Use the `--join-lists` or `-jl` option to suppress linebreaks after list items:

    {
        "foo": [1, 2, 3],
        "bar": "baz"
    }
Likewise, use the `--join-dict-items` or `-jd` option to suppress linebreaks after key/value pairs in a dict:

    {"foo": [
        1,
        2,
        3
    ], "bar": "baz"}
Use `--condensed` or `-j` to apply both of these options:

    {"foo": [1, 2, 3], "bar": "baz"}
The `--only-edits` or `-e` option will print out a list of edits rather than applying them to the input file in place.
The `--edit-digest` or `-d` option is like `--only-edits` but prints a more concise, more human-readable context for each edit.
By default, Graphtage tries to match all possible pairs of elements in a dictionary.
Matching two dictionaries with each other is hard. Although computationally tractable, it can be onerous for input files with huge dictionaries. Graphtage has three different strategies for matching dictionaries:
- `--dict-strategy match` (the most computationally expensive) tries to match all pairs of keys and values between the two dictionaries, resulting in a match of minimum edit distance;
- `--dict-strategy none` (the least computationally expensive) will not attempt to match any key/value pairs unless they have the exact same key; and
- `--dict-strategy auto` (the default) will automatically match the values of any key/value pairs that have identical keys and then use the `match` strategy for the remainder of key/value pairs.
See Pull Request #51 for some examples of how these strategies affect output.
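The difference between the three strategies can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Graphtage's implementation: `pair_cost`, `best_matching`, and `match_dicts` are hypothetical names, the cost function is invented for illustration, and the brute-force search only works for very small dictionaries.

```python
from difflib import SequenceMatcher
from itertools import permutations


def pair_cost(left_item, right_item):
    """Toy cost (an assumption): key dissimilarity, plus 1 if values differ."""
    (left_key, left_value), (right_key, right_value) = left_item, right_item
    key_cost = 1.0 - SequenceMatcher(None, left_key, right_key).ratio()
    return key_cost + (0.0 if left_value == right_value else 1.0)


def best_matching(left_items, right_items):
    """Brute-force minimum-cost pairing of items; viable only for tiny inputs."""
    if len(left_items) > len(right_items):
        flipped = best_matching(right_items, left_items)
        return [(left, right) for right, left in flipped]
    best, best_cost = [], float("inf")
    for candidate in permutations(right_items, len(left_items)):
        cost = sum(pair_cost(l, r) for l, r in zip(left_items, candidate))
        if cost < best_cost:
            best, best_cost = list(zip(left_items, candidate)), cost
    return best


def match_dicts(left, right, strategy="auto"):
    """Return (left_key, right_key) pairs chosen by the given strategy."""
    if strategy == "match":
        # Consider every key/value pair on both sides.
        pairs = best_matching(list(left.items()), list(right.items()))
        return [(l[0], r[0]) for l, r in pairs]
    # Both "none" and "auto" start by pairing identical keys.
    pairs = [(key, key) for key in left if key in right]
    if strategy == "none":
        return pairs
    if strategy != "auto":
        raise ValueError(f"unknown strategy: {strategy}")
    # "auto" then falls back to full matching for the leftover items only.
    rest_left = [(k, v) for k, v in left.items() if k not in right]
    rest_right = [(k, v) for k, v in right.items() if k not in left]
    pairs += [(l[0], r[0]) for l, r in best_matching(rest_left, rest_right)]
    return pairs


original = {"foo": [1, 2, 3, 4], "bar": "testing"}
modified = {"foo": [2, 3, 4, 5], "zab": "testing", "woo": ["foobar"]}
for name in ("none", "auto", "match"):
    print(name, match_dicts(original, modified, name))
```

In this model, `none` never pairs `bar` with `zab` because the keys differ, while `auto` does so at a lower cost than `match` because it only searches the items that remain after identical keys have been paired.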
The `--no-list-edits` or `-l` option will not consider interstitial insertions and removals when comparing two lists. The `--no-list-edits-when-same-length` or `-ll` option is a less drastic version of `-l` that will behave normally for lists of different lengths but behave like `-l` for lists of the same length.
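A small Python sketch can make the effect of these two options concrete. It is an approximation under stated assumptions, not Graphtage's algorithm: `positional_diff`, `aligned_diff`, and `list_diff` are hypothetical names, and `difflib.SequenceMatcher` stands in for the real list matching (it requires hashable list elements).

```python
from difflib import SequenceMatcher


def positional_diff(a, b):
    """Compare element i of a with element i of b; elements never shift."""
    edits = [("replace", x, y) for x, y in zip(a, b) if x != y]
    edits += [("remove", x, None) for x in a[len(b):]]
    edits += [("insert", None, y) for y in b[len(a):]]
    return edits


def aligned_diff(a, b):
    """Allow insertions and removals anywhere in the list."""
    edits = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            edits += [("remove", x, None) for x in a[i1:i2]]
        if tag in ("insert", "replace"):
            edits += [("insert", None, y) for y in b[j1:j2]]
    return edits


def list_diff(a, b, no_list_edits=False, no_list_edits_when_same_length=False):
    """Pick the comparison mode the way the two options are described."""
    same_length = len(a) == len(b)
    if no_list_edits or (no_list_edits_when_same_length and same_length):
        return positional_diff(a, b)
    return aligned_diff(a, b)


old, new = [1, 2, 3, 4], [2, 3, 4, 5]
print(list_diff(old, new))
print(list_diff(old, new, no_list_edits=True))
```

For these same-length lists, the aligned comparison finds one removal and one insertion, whereas the positional comparison reports every position as changed. The trade-off is that the positional mode does far less work.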
By default, Graphtage will only use ANSI color in its output if it is run from a TTY. If, for example, you would like Graphtage to emit colorized output from a script or pipe, use the `--color` or `-c` argument. To disable color even when running on a TTY, use `--no-color`.
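The decision described above follows a common pattern for command-line tools and can be sketched as below. `should_use_color` and its parameters are hypothetical names that mirror the documented flags; the precedence of `no_color` over `force_color` when both are set is an assumption, not documented behavior.

```python
import io
import sys


def should_use_color(stream, force_color=False, no_color=False):
    """Decide whether to emit ANSI color codes on the given stream.

    Assumption: an explicit request to disable color wins over an
    explicit request to enable it.
    """
    if no_color:
        return False
    if force_color:
        return True
    # Default: color only when the stream is attached to a terminal.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


# An in-memory buffer behaves like a pipe: it is not a TTY.
piped = io.StringIO()
print(should_use_color(piped))
print(should_use_color(piped, force_color=True))
print(should_use_color(sys.stdout, no_color=True))
```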
Graphtage can optionally emit the diff in HTML with the `--html` option.

    $ graphtage --html original.json modified.json > diff.html
By default, Graphtage prints status messages and a progress bar to STDERR. To suppress this, use the `--no-status` option. To additionally suppress all but critical log messages, use `--quiet`. Fine-grained control of log messages is available via the `--log-level` option.
Diffing tree-like structures with unordered elements is tough. Say you want to compare two JSON files. There are limited tools available, which are effectively equivalent to canonicalizing the JSON (e.g., sorting dictionary elements by key) and performing a standard diff. This is not always sufficient. For example, if a key in a dictionary is changed but its value is not, a traditional diff will conclude that the entire key/value pair was replaced by the new one, even though the only change was the key itself. See our documentation for more information.
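The limitation of the canonicalize-then-diff approach is easy to reproduce with the Python standard library. The sketch below uses `json.dumps(sort_keys=True)` as the canonicalization step and `difflib.unified_diff` as the standard line diff; `canonical_lines` is a hypothetical helper name.

```python
import difflib
import json


def canonical_lines(obj):
    """Canonicalize by sorting keys, then split into lines for a text diff."""
    return json.dumps(obj, sort_keys=True, indent=4).splitlines()


original = {"foo": [1, 2, 3], "bar": "testing"}
renamed = {"foo": [1, 2, 3], "zab": "testing"}  # only the key "bar" changed

diff = list(
    difflib.unified_diff(
        canonical_lines(original), canonical_lines(renamed), lineterm="", n=0
    )
)

# Drop the "---"/"+++" file headers and keep the changed lines themselves.
removed = [l[1:] for l in diff if l.startswith("-") and not l.startswith("---")]
added = [l[1:] for l in diff if l.startswith("+") and not l.startswith("+++")]

print("\n".join(diff))
```

The line diff reports the whole `"bar": "testing"` pair as removed and a whole `"zab": "testing"` pair as added. Because sorting moved the renamed key to a different position, it also flags a neighboring line whose only change is a trailing comma. Nothing in the output says that a single key was renamed.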
Graphtage has a complete API for programmatically operating its diffing capabilities. When used as a library, it can also diff in-memory Python objects. This can be useful when debugging Python code, for example to determine the difference between two objects. See our documentation for more information.
Graphtage is designed to be extensible: new filetypes can easily be defined, as well as new node types, edit types, formatters, and printers. See our documentation for more information.
Complete API documentation is available here.
This research was developed by Trail of Bits with partial funding from the Defense Advanced Research Projects Agency (DARPA) under the SafeDocs program as a subcontractor to Galois. It is licensed under the GNU Lesser General Public License v3.0. Contact us if you're looking for an exception to the terms. © 2020–2023, Trail of Bits.