mardukbp/daff
This is a library for comparing tables, producing a summary of their differences, and using such a summary as a patch file. It is optimized for comparing tables that share a common origin, in other words multiple versions of the "same" table.
For a live demo, see:
Install the library for your favorite language:

    npm install daff -g                 # node/javascript
    pip install daff                    # python
    gem install daff                    # ruby
    composer require paulfitz/daff-php  # php
    install.packages('daff')            # R wrapper by Edwin de Jonge
    bower install daff                  # web/javascript

Other translations are available here:
Or use the library to view csv diffs on github via a chrome extension:
The diff format used by `daff` is specified here:
This library is a stripped down version of the coopy toolbox (see http://share.find.coop). To compare tables from different origins, or with automatically generated IDs, or other complications, check out the coopy toolbox.
You can run `daff`/`daff.py`/`daff.rb` as a utility program:

    $ daff
    daff can produce and apply tabular diffs.
    Call as:
      daff a.csv b.csv
      daff [--color] [--no-color] [--output OUTPUT.csv] a.csv b.csv
      daff [--output OUTPUT.html] a.csv b.csv
      daff [--www] a.csv b.csv
      daff parent.csv a.csv b.csv
      daff --input-format sqlite a.db b.db
      daff patch [--inplace] a.csv patch.csv
      daff merge [--inplace] parent.csv a.csv b.csv
      daff trim [--output OUTPUT.csv] source.csv
      daff render [--output OUTPUT.html] diff.csv
      daff copy in.csv out.tsv
      daff in.csv
      daff git
      daff version

    The --inplace option to patch and merge will result in modification of a.csv.

    If you need more control, here is the full list of flags:
      daff diff [--output OUTPUT.csv] [--context NUM] [--all] [--act ACT] a.csv b.csv
         --act ACT:     show only a certain kind of change (update, insert, delete, column)
         --all:         do not prune unchanged rows or columns
         --all-rows:    do not prune unchanged rows
         --all-columns: do not prune unchanged columns
         --color:       highlight changes with terminal colors (default in terminals)
         --context NUM: show NUM rows of context (0=none)
         --context-columns NUM: show NUM columns of context (0=none)
         --fail-if-diff: return status is 0 if equal, 1 if different, 2 if problem
         --id:          specify column to use as primary key (repeat for multi-column key)
         --ignore:      specify column to ignore completely (can repeat)
         --index:       include row/columns numbers from original tables
         --input-format [csv|tsv|ssv|psv|json|sqlite]: set format to expect for input
         --eol [crlf|lf|cr|auto]: separator between rows of csv output.
         --no-color:    make sure terminal colors are not used
         --ordered:     assume row order is meaningful (default for CSV)
         --output-format [csv|tsv|ssv|psv|json|copy|html]: set format for output
         --padding [dense|sparse|smart]: set padding method for aligning columns
         --table NAME:  compare the named table, used with SQL sources. If name changes, use 'n1:n2'
         --unordered:   assume row order is meaningless (default for json formats)
         -w / --ignore-whitespace: ignore changes in leading/trailing whitespace
         -i / --ignore-case: ignore differences in case

      daff render [--output OUTPUT.html] [--css CSS.css] [--fragment] [--plain] diff.csv
         --css CSS.css: generate a suitable css file to go with the html
         --fragment:    generate just a html fragment rather than a page
         --plain:       do not use fancy utf8 characters to make arrows prettier
         --unquote:     do not quote html characters in html diffs
         --www:         send output to a browser

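The `--fail-if-diff` flag is handy in scripts, since the help text documents three return statuses (0 if equal, 1 if different, 2 if problem). As a minimal sketch, the mapping could be wrapped like this. The helper name `describeDaffStatus` is hypothetical and not part of daff, and the commented-out invocation assumes `daff` is installed and on your `PATH`.

```javascript
// Map the documented --fail-if-diff return status to a label.
// describeDaffStatus is a hypothetical helper, not part of daff.
function describeDaffStatus(status) {
  switch (status) {
    case 0: return 'equal';      // tables are equal
    case 1: return 'different';  // tables differ
    case 2: return 'problem';    // something went wrong
    default: return 'unknown';   // status not covered by the help text
  }
}

// Possible wiring with Node's child_process (assumes daff is on PATH):
//   var spawnSync = require('child_process').spawnSync;
//   var result = spawnSync('daff', ['diff', '--fail-if-diff', 'a.csv', 'b.csv']);
//   console.log(describeDaffStatus(result.status));

console.log(describeDaffStatus(1));
```

Keeping the mapping in one place avoids scattering bare status numbers through a build script.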
Formats supported are CSV, TSV, Sqlite (with `--input-format sqlite` or the `.sqlite` extension), and ndjson.
Run `daff git csv` to install daff as a diff and merge handler for `*.csv` files in your repository. Run `daff git` for instructions on doing this manually. Your CSV diffs and merges will get smarter, since git will suddenly understand about rows and columns, not just lines.
You can use `daff` as a library from any supported language. We take here the example of Javascript. To use `daff` on a webpage, first include `daff.js`:

    <script src="daff.js"></script>

Or if using node outside the browser:

    var daff = require('daff');

For concreteness, assume we have two versions of a table, `data1` and `data2`:

    var data1 = [
        ['Country', 'Capital'],
        ['Ireland', 'Dublin'],
        ['France', 'Paris'],
        ['Spain', 'Barcelona']
    ];
    var data2 = [
        ['Country', 'Code', 'Capital'],
        ['Ireland', 'ie', 'Dublin'],
        ['France', 'fr', 'Paris'],
        ['Spain', 'es', 'Madrid'],
        ['Germany', 'de', 'Berlin']
    ];

To make those tables accessible to the library, we wrap them in `daff.TableView`:

    var table1 = new daff.TableView(data1);
    var table2 = new daff.TableView(data2);

We can now compute the alignment between the rows and columns in the two tables:

    var alignment = daff.compareTables(table1, table2).align();

To produce a diff from the alignment, we first need a table for the output:

    var data_diff = [];
    var table_diff = new daff.TableView(data_diff);

Using default options for the diff:

    var flags = new daff.CompareFlags();
    var highlighter = new daff.TableDiff(alignment, flags);
    highlighter.hilite(table_diff);

The diff is now in `data_diff` in highlighter format, see specification here:

    [ [ '!', '', '+++', '' ],
      [ '@@', 'Country', 'Code', 'Capital' ],
      [ '+', 'Ireland', 'ie', 'Dublin' ],
      [ '+', 'France', 'fr', 'Paris' ],
      [ '->', 'Spain', 'es', 'Barcelona->Madrid' ],
      [ '+++', 'Germany', 'de', 'Berlin' ] ]

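To make the structure of that output concrete, here is a small sketch that reads the example diff without using daff at all. It assumes that the first cell of each row is an action tag, that the `!` row carries per-column schema tags, and that the `@@` row carries the column names, which matches the example above. The helpers `summarizeDiff` and `addedColumns` are hypothetical names introduced for illustration, not part of the daff API.

```javascript
// The example diff from this README, copied verbatim.
var exampleDiff = [
  ['!', '', '+++', ''],
  ['@@', 'Country', 'Code', 'Capital'],
  ['+', 'Ireland', 'ie', 'Dublin'],
  ['+', 'France', 'fr', 'Paris'],
  ['->', 'Spain', 'es', 'Barcelona->Madrid'],
  ['+++', 'Germany', 'de', 'Berlin']
];

// Count data rows by the action tag in their first cell.
// The schema row ('!') and header row ('@@') are skipped.
function summarizeDiff(diff) {
  var counts = {};
  diff.forEach(function (row) {
    var action = row[0];
    if (action === '!' || action === '@@') return;
    counts[action] = (counts[action] || 0) + 1;
  });
  return counts;
}

// List column names whose schema cell is '+++' (an added column).
function addedColumns(diff) {
  var schema = diff.filter(function (row) { return row[0] === '!'; })[0];
  var header = diff.filter(function (row) { return row[0] === '@@'; })[0];
  if (!schema || !header) return [];
  return header.filter(function (name, i) {
    return i > 0 && schema[i] === '+++';
  });
}

console.log(summarizeDiff(exampleDiff));
console.log(addedColumns(exampleDiff));
```

For the example, this reports two `+` rows, one `->` row, and one `+++` row, with `Code` as the only added column.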
For visualization, you may want to convert this to a HTML table with appropriate classes on cells so you can color-code inserts, deletes, updates, etc. You can do this with:

    var diff2html = new daff.DiffRender();
    diff2html.render(table_diff);
    var table_diff_html = diff2html.html();

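To illustrate the idea of color-coding by action, the following is a rough, self-contained sketch of turning a highlighter-format diff into an HTML table. It is not how `daff.DiffRender` is implemented: the function `renderDiffSketch` and the class names (`row-insert`, `row-update`, and so on) are invented for this example, and classes are put on rows rather than cells to keep it short.

```javascript
// Escape characters that are special in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical class names keyed by action tag; daff's own names may differ.
var ROW_CLASSES = {
  '!': 'row-schema',
  '@@': 'row-header',
  '+++': 'row-insert',
  '---': 'row-delete',
  '->': 'row-update'
};

// Render each diff row as a <tr> whose class reflects its action tag.
function renderDiffSketch(diff) {
  var rows = diff.map(function (row) {
    var cls = ROW_CLASSES[row[0]] || 'row-other';
    var cells = row.map(function (cell) {
      return '<td>' + escapeHtml(cell) + '</td>';
    }).join('');
    return '<tr class="' + cls + '">' + cells + '</tr>';
  });
  return '<table>\n' + rows.join('\n') + '\n</table>';
}

console.log(renderDiffSketch([
  ['@@', 'Country', 'Capital'],
  ['->', 'Spain', 'Barcelona->Madrid']
]));
```

A stylesheet can then target the row classes, for example giving `row-insert` a green background and `row-delete` a red one.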
For 3-way differences (that is, comparing two tables given knowledge of a common ancestor) use `daff.compareTables3` (give ancestor table as the first argument).
Here is how to apply that difference as a patch:

    var patcher = new daff.HighlightPatch(table1, table_diff);
    patcher.apply();
    // table1 should now equal table2

For other languages, you should find sample code in the packages on the Releases page.
The `daff` library is written in Haxe, which can be translated reasonably well into at least the following languages:
- Javascript
- Python
- Java
- C#
- C++
- Ruby (using an unofficial haxe target developed for `daff`)
- PHP
Some translations are done for you on the Releases page. To make another translation, or to compile from source, first follow the Haxe language introduction for the language you care about. At the time of writing, if you are on OSX, you should install haxe using `brew install haxe`. Then do one of:

    make js
    make php
    make py
    make java
    make cs
    make cpp

For each language, the `daff` library expects to be handed an interface to tables you create, rather than creating them itself. This is to avoid inefficient copies from one format to another. You'll find a `SimpleTable` class you can use if you find this awkward.
Other possibilities:
- There's a daff wrapper for R written by Edwin de Jonge, see https://github.com/edwindj/daff and http://cran.r-project.org/web/packages/daff
- There's a hand-written ruby port by James Smith, see https://github.com/theodi/coopy-ruby
- You can browse the `daff` classes at http://paulfitz.github.io/daff-doc/
- https://specs.frictionlessdata.io/tabular-diff : a specification of the diff format we use.
- http://theodi.org/blog/csvhub-github-diffs-for-csv-files : using this library with github.
- ropensci/unconf15#19 : a thread about diffing data in which daff shows up in at least four guises (see if you can spot them all).
- http://theodi.org/blog/adapting-git-simple-data : using this library with gitlab.
- http://okfnlabs.org/blog/2013/08/08/diffing-and-patching-data.html : a summary of where the library came from.
- http://blog.okfn.org/2013/07/02/git-and-github-for-data/ : a post about storing small data in git/github.
- http://blog.ouseful.info/2013/08/27/diff-or-chop-github-csv-data-files-and-openrefine/ : counterpoint - a post discussing tracked-changes rather than diffs.
- http://blog.byronjsmith.com/makefile-shortcuts.html : a tutorial on using `make` for data, with daff in the mix. "Since git considers changes on a per-line basis, looking at diffs of comma-delimited and tab-delimited files can get obnoxious. The program daff fixes this problem."
daff is distributed under the MIT License.