I've taken a crack at implementing Simple Diff in Rebol (versions 2 and 3). Simple Diff works by finding the longest common sequence in two series, then recursively applies itself either side of this sequence until there is nothing further in common. It returns a block containing additions (+
), omissions (-
) and matches (=
). Example:
>> diff "The kuick brown fix." "The quick brown fox."
== [= "The " - "k" + "q" = "uick brown f" - "i" + "o" = "x."]
Originally this function was designed to work only on blocks, but since comparisons are done using the series API, I've permitted the use of strings and it seems to work.
I'm looking for feedback on:
The algorithm: is there a more efficient way to build the index (although the index is only as large as the number of unique components of the series), and to perform the comparisons?
The result: is the result block in a useful form?
Any bugs?
Note: the use of FUNC (not FUNCTION) is deliberate in development. This function is currently designed for use in both Rebol 2 and Rebol 3.
Simple Diff in Rebol
Rebol [
Title: "SimpleDiff"
Date: 22-May-2014
Author: "Christopher Ross-Gill"
Type: 'module
Name: 'simplediff
Exports: [diff]
Comment: {
Based on Simple Diff for Python, CoffeeScript v0.1
(C) Paul Butler 2008 <http://www.paulbutler.org/>
https://github.com/paulgb/simplediff
}
]
diff: func [
{
Find the differences between two blocks. Returns a block of pairs, where the first value
is in [+ - =] and represents an insertion, deletion, or no change for that list.
The second value of the pair is the element.
}
before [block! string!] after [block! string!]
/local items-before starts-before starts-after run this-run tests limit
][
assert [equal? type? before type? after]
; Build a block with elements from 'before as keys, and
; each position starting with each element as values.
items-before: make block! 100 ; arbitrary block memory allocation
forall before [
append/only any [
select/case items-before first before
last repend items-before [first before make block! 100]
] before
]
; Find the largest substring common to before and after
forall after [
if tests: select/case items-before first after [
limit: length? after
run: any [run 0]
foreach before tests [
repeat offset min limit length? before [
this-run: :offset
unless before/:offset == after/:offset [
this-run: offset - 1
break
]
]
if this-run > run [
run: :this-run
starts-before: :before
starts-after: :after
]
]
]
]
collect [
either run [
; The common substring is considered to have no change, and we recurse
; on the text before and after the substring
keep diff copy/part before starts-before copy/part after starts-after
keep reduce ['= copy/part starts-after run]
keep diff copy skip starts-before run copy skip starts-after run
][
; Otherwise if no common substring is found, assume that an
; insert and delete has taken place
unless tail? before [keep reduce ['- before]]
unless tail? after [keep reduce ['+ after]]
]
]
]
diff [[=] [+] [-]] [[-] [+] [=]]
produces[- [[=] [+] [-]] + [[-] [+] [=]]]
. Offhand I can't tell if[- [[=] [+]] = [[-]] + [[+] [=]]]
would be more useful...? It would seem to paralleldiff [= + -] [- + =]
producing[- [= +] = [-] + [+ =]]
. (I guess this example would be less confusing if I didn't use +, -, and =...but I was just testing with them to see what happened. :-P) \$\endgroup\$