benchmarkstt.diff.core module

Class diagram: RatcliffObershelp (a, b, **kwargs; +get_opcodes()) inherits from Differ.

Core Diff algorithms

class benchmarkstt.diff.core.RatcliffObershelp(a, b, **kwargs)

Bases: benchmarkstt.diff.Differ

Diff according to Ratcliff and Obershelp (Gestalt) matching algorithm.

From difflib.SequenceMatcher (Copyright 2001-2020, Python Software Foundation.)

SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching". The basic idea is to find the longest contiguous matching subsequence that contains no "junk" elements (R-O doesn't address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that "look right" to people.
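The recursive idea described above can be sketched in a few lines. This is a simplified illustration only, not the code used by difflib or benchmarkstt: it ignores junk handling entirely, and the helper names longest_common_block and matching_blocks are hypothetical.

```python
def longest_common_block(a, b):
    """Return (i, j, size) of the longest contiguous run shared by a and b.

    Ties are resolved in favour of the block found first (earliest in a).
    """
    best_i, best_j, best_size = 0, 0, 0
    # lengths[j] = length of the common run ending at a[i - 1] and b[j]
    lengths = {}
    for i, x in enumerate(a):
        new_lengths = {}
        for j, y in enumerate(b):
            if x == y:
                k = new_lengths[j] = lengths.get(j - 1, 0) + 1
                if k > best_size:
                    best_i, best_j, best_size = i - k + 1, j - k + 1, k
        lengths = new_lengths
    return best_i, best_j, best_size


def matching_blocks(a, b):
    """Recursively collect matching blocks as (i, j, size) triples."""
    i, j, size = longest_common_block(a, b)
    if size == 0:
        return []
    # Apply the same idea to the pieces left and right of the match.
    left = matching_blocks(a[:i], b[:j])
    right = [
        (i + size + ri, j + size + rj, rs)
        for ri, rj, rs in matching_blocks(a[i + size:], b[j + size:])
    ]
    return left + [(i, j, size)] + right


blocks = matching_blocks("abxcd", "abcd")
print(blocks)
```

The function works on any pair of indexable sequences whose elements can be compared, so lists of words are handled the same way as strings.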

get_opcodes()

Return list of 5-tuples describing how to turn a into b.

Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple has i1 == j1 == 0, and each remaining tuple has i1 equal to the i2 of the preceding tuple, and likewise j1 equal to the preceding j2.
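The contiguity property can be checked mechanically. The sketch below uses difflib.SequenceMatcher from the standard library, on the assumption that RatcliffObershelp.get_opcodes() returns tuples of the same shape; the helper opcodes_are_contiguous and the sample word lists are made up for illustration.

```python
from difflib import SequenceMatcher


def opcodes_are_contiguous(opcodes):
    """Check that every opcode starts where the previous one ended."""
    prev_i2 = prev_j2 = 0  # the first tuple must start at 0 in both sequences
    for _tag, i1, i2, j1, j2 in opcodes:
        if i1 != prev_i2 or j1 != prev_j2:
            return False
        prev_i2, prev_j2 = i2, j2
    return True


a = "the quick brown fox".split()
b = "the quick red fox jumps".split()
opcodes = SequenceMatcher(None, a, b).get_opcodes()

for opcode in opcodes:
    print(opcode)
print(opcodes_are_contiguous(opcodes))
```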

The tags are strings, with these meanings:

  • 'replace': a[i1:i2] should be replaced by b[j1:j2]

  • 'delete': a[i1:i2] should be deleted. Note that j1==j2 in this case.

  • 'insert': b[j1:j2] should be inserted at a[i1:i1]. Note that i1==i2 in this case.

  • 'equal': a[i1:i2] == b[j1:j2]
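To make the four tags concrete, the following sketch replays a list of opcodes to rebuild b from a. It again relies on difflib.SequenceMatcher as a stand-in, assuming the opcodes produced by RatcliffObershelp have the same meaning; apply_opcodes is a hypothetical helper, not part of benchmarkstt.

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b, opcodes):
    """Rebuild b by replaying the opcodes against a."""
    result = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            result.extend(a[i1:i2])   # unchanged run, copied from a
        elif tag in ("replace", "insert"):
            result.extend(b[j1:j2])   # new material comes from b
        elif tag == "delete":
            continue                  # a[i1:i2] is dropped
        else:
            raise ValueError(f"unknown opcode tag: {tag!r}")
    return result


a = "the quick brown fox".split()
b = "the quick red fox jumps".split()
opcodes = SequenceMatcher(None, a, b).get_opcodes()
rebuilt = apply_opcodes(a, b, opcodes)
print(rebuilt == b)
```

Because 'replace' and 'insert' both take their output from b[j1:j2], they differ only in whether a[i1:i2] is empty.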