Implementing Simple Diff in Rebol

Ask Question

Asked 11 years, 1 month ago

Modified 11 years, 1 month ago

Viewed 184 times

I've taken a crack at implementing Simple Diff in Rebol (versions 2 and 3). Simple Diff works by finding the longest common sequence in two series, then recursively applies itself either side of this sequence until there is nothing further in common. It returns a block containing additions (+), omissions (-) and matches (=). Example:

>> diff "The kuick brown fix." "The quick brown fox."
== [= "The " - "k" + "q" = "uick brown f" - "i" + "o" = "x."]

Originally this function was designed to work only on blocks, but since comparisons are done using the series API, I've permitted the use of strings and it seems to work.

I'm looking for feedback on:

The algorithm: is there a more efficient way to build the index (although the index is only as large as the number of unique components of the series), and to perform the comparisons?
The result: is the result block in a useful form?
Any bugs?

Note: the use of FUNC (not FUNCTION) is deliberate in development. This function is currently designed for use in both Rebol 2 and Rebol 3.

Simple Diff in Rebol

Rebol [
    Title: "SimpleDiff"
    Date: 22-May-2014
    Author: "Christopher Ross-Gill"
    Type: 'module
    Name: 'simplediff
    Exports: [diff]
    Comment: {
        Based on Simple Diff for Python, CoffeeScript v0.1
        (C) Paul Butler 2008 <http://www.paulbutler.org/>
        https://github.com/paulgb/simplediff
    }
]

diff: func [
    {
        Find the differences between two blocks. Returns a block of pairs, where the first value
        is in [+ - =] and represents an insertion, deletion, or no change for that list.
        The second value of the pair is the element.
    }
    before [block! string!] after [block! string!]
    /local items-before starts-before starts-after run this-run tests limit
][
    assert [equal? type? before type? after]

    ; Build a block with elements from 'before as keys, and
    ; each position starting with each element as values.
    items-before: make block! 100 ; arbitrary block memory allocation
    forall before [
        append/only any [
            select/case items-before first before
            last repend items-before [first before make block! 100]
        ] before
    ]

    ; Find the largest substring common to before and after
    forall after [
        if tests: select/case items-before first after [
            limit: length? after
            run: any [run 0]

            foreach before tests [
                repeat offset min limit length? before [
                    this-run: :offset
                    unless before/:offset == after/:offset [
                        this-run: offset - 1
                        break
                    ]
                ]

                if this-run > run [
                    run: :this-run
                    starts-before: :before
                    starts-after: :after
                ]
            ]
        ]
    ]

    collect [
        either run [
            ; The common substring is considered to have no change, and we recurse
            ; on the text before and after the substring

            keep diff copy/part before starts-before copy/part after starts-after
            keep reduce ['= copy/part starts-after run]
            keep diff copy skip starts-before run copy skip starts-after run
        ][
            ; Otherwise if no common substring is found, assume that an
            ; insert and delete has taken place

            unless tail? before [keep reduce ['- before]]
            unless tail? after [keep reduce ['+ after]]
        ]
    ]
]

edited Jun 17, 2014 at 18:41

asked Jun 17, 2014 at 18:33

rgchris

4892 silver badges9 bronze badges

\$\begingroup\$ I notice diff [[=] [+] [-]] [[-] [+] [=]] produces [- [[=] [+] [-]] + [[-] [+] [=]]]. Offhand I can't tell if [- [[=] [+]] = [[-]] + [[+] [=]]] would be more useful...? It would seem to parallel diff [= + -] [- + =] producing [- [= +] = [-] + [+ =]]. (I guess this example would be less confusing if I didn't use +, -, and =...but I was just testing with them to see what happened. :-P) \$\endgroup\$
– HostileFork says dont trust SE
Commented Jun 17, 2014 at 19:06
\$\begingroup\$ @HostileFork Yikes! \$\endgroup\$
– rgchris
Commented Jun 17, 2014 at 19:12
\$\begingroup\$ @HostileFork I'd need to look to figure out why it's different—I'm not certain it should be. \$\endgroup\$
– rgchris
Commented Jun 17, 2014 at 19:17

Add a comment |

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Stack Exchange Network

Implementing Simple Diff in Rebol

Simple Diff in Rebol

0

Your Answer

Hot Network Questions

Implementing Simple Diff in Rebol

Simple Diff in Rebol

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions