I've taken a crack at implementingSimple Diff in Rebol (versions 2 and 3). Simple Diff works by finding the longest common sequence in two series, then recursively applies itself either side of this sequence until there is nothing further in common. It returns a block containing additions (+), omissions (-) and matches (=). Example:
>> diff "The kuick brown fix." "The quick brown fox."== [= "The " - "k" + "q" = "uick brown f" - "i" + "o" = "x."]Originally this function was designed to work only on blocks, but since comparisons are done using the series API, I've permitted the use of strings and it seems to work.
I'm looking for feedback on:
The algorithm: is there a more efficient way to build the index (although the index is only as large as the number of unique components of the series), and to perform the comparisons?
The result: is the result block in a useful form?
Any bugs?
Note: the use of FUNC (not FUNCTION) is deliberate in development. This function is currently designed for use in both Rebol 2 and Rebol 3.
Simple Diff in Rebol
Rebol [ Title: "SimpleDiff" Date: 22-May-2014 Author: "Christopher Ross-Gill" Type: 'module Name: 'simplediff Exports: [diff] Comment: { Based on Simple Diff for Python, CoffeeScript v0.1 (C) Paul Butler 2008 <http://www.paulbutler.org/> https://github.com/paulgb/simplediff }]diff: func [ { Find the differences between two blocks. Returns a block of pairs, where the first value is in [+ - =] and represents an insertion, deletion, or no change for that list. The second value of the pair is the element. } before [block! string!] after [block! string!] /local items-before starts-before starts-after run this-run tests limit][ assert [equal? type? before type? after] ; Build a block with elements from 'before as keys, and ; each position starting with each element as values. items-before: make block! 100 ; arbitrary block memory allocation forall before [ append/only any [ select/case items-before first before last repend items-before [first before make block! 100] ] before ] ; Find the largest substring common to before and after forall after [ if tests: select/case items-before first after [ limit: length? after run: any [run 0] foreach before tests [ repeat offset min limit length? before [ this-run: :offset unless before/:offset == after/:offset [ this-run: offset - 1 break ] ] if this-run > run [ run: :this-run starts-before: :before starts-after: :after ] ] ] ] collect [ either run [ ; The common substring is considered to have no change, and we recurse ; on the text before and after the substring keep diff copy/part before starts-before copy/part after starts-after keep reduce ['= copy/part starts-after run] keep diff copy skip starts-before run copy skip starts-after run ][ ; Otherwise if no common substring is found, assume that an ; insert and delete has taken place unless tail? before [keep reduce ['- before]] unless tail? after [keep reduce ['+ after]] ] ]]- \$\begingroup\$I notice
diff [[=] [+] [-]] [[-] [+] [=]]produces[- [[=] [+] [-]] + [[-] [+] [=]]]. Offhand I can't tell if[- [[=] [+]] = [[-]] + [[+] [=]]]would be more useful...? It would seem to paralleldiff [= + -] [- + =]producing[- [= +] = [-] + [+ =]].(I guess this example would be less confusing if I didn't use +, -, and =...but I was just testing with them to see what happened. :-P)\$\endgroup\$HostileFork says dont trust SE– HostileFork says dont trust SE2014-06-17 19:06:36 +00:00CommentedJun 17, 2014 at 19:06 - \$\begingroup\$@HostileFork Yikes!\$\endgroup\$rgchris– rgchris2014-06-17 19:12:30 +00:00CommentedJun 17, 2014 at 19:12
- \$\begingroup\$@HostileFork I'd need to look to figure out why it's different—I'm not certain it should be.\$\endgroup\$rgchris– rgchris2014-06-17 19:17:19 +00:00CommentedJun 17, 2014 at 19:17
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.
