difflib --- Helpers for computing deltas

Source code: Lib/difflib.py


This module provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce information about file differences in various formats, including HTML and context and unified diffs. For comparing directories and files, see also the filecmp module.

class difflib.SequenceMatcher

This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to find the longest contiguous matching subsequence that contains no "junk" elements; these "junk" elements are ones that are uninteresting in some sense, such as blank lines or whitespace. (Handling junk is an extension to the Ratcliff and Obershelp algorithm.) The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that "look right" to people.

Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst case and quadratic time in the expected case. SequenceMatcher is quadratic time for the worst case and has expected-case behavior dependent in a complicated way on how many elements the sequences have in common; best case time is linear.

Automatic junk heuristic: SequenceMatcher supports a heuristic that automatically treats certain sequence items as junk. The heuristic counts how many times each individual item appears in the sequence. If an item's duplicates (after the first one) account for more than 1% of the sequence and the sequence is at least 200 items long, this item is marked as "popular" and is treated as junk for the purpose of sequence matching. This heuristic can be turned off by setting the autojunk argument to False when creating the SequenceMatcher.
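As a rough illustration of the heuristic described above, the sketch below compares two 300-character strings built from only two distinct characters. The inputs are invented for this example. Every element of the second sequence counts as popular, so with the default autojunk=True no match is found, while disabling the heuristic finds the long shared run.

```python
from difflib import SequenceMatcher

# Hypothetical inputs: 300 items each, built from just two distinct elements,
# so each element's duplicates far exceed 1% of the sequence.
a = "ba" * 150
b = "ab" * 150

with_heuristic = SequenceMatcher(None, a, b)                  # autojunk=True
without_heuristic = SequenceMatcher(None, a, b, autojunk=False)

print(sorted(with_heuristic.bpopular))     # elements treated as junk
print(with_heuristic.ratio())              # no usable match points remain
print(without_heuristic.ratio() > 0.99)    # nearly identical once disabled
```

If a comparison of long, repetitive sequences reports surprisingly little similarity, checking bpopular in this way is one quick diagnostic.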

Changed in version 3.2: Added the autojunk parameter.

class difflib.Differ

This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.

Each line of a Differ delta begins with a two-letter code:

Code      Meaning
'- '      line unique to sequence 1
'+ '      line unique to sequence 2
'  '      line common to both sequences
'? '      line not present in either input sequence

Lines beginning with '?' attempt to guide the eye to intraline differences, and were not present in either input sequence. These lines can be confusing if the sequences contain whitespace characters, such as spaces, tabs or line breaks.
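The two-letter codes can be read straight off the front of each delta line. The following sketch tallies them for a small made-up input, using the module's ndiff() function to produce a Differ-style delta.

```python
from collections import Counter
from difflib import ndiff

a = 'one\ntwo\nthree\n'.splitlines(keepends=True)
b = 'ore\ntree\nemu\n'.splitlines(keepends=True)

# The first two characters of every delta line are its code.
codes = Counter(line[:2] for line in ndiff(a, b))
for code in ('- ', '+ ', '  ', '? '):
    print(repr(code), codes[code])
```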

class difflib.HtmlDiff

This class can be used to create an HTML table (or a complete HTML file containing the table) showing a side by side, line by line comparison of text with inter-line and intra-line change highlights. The table can be generated in either full or contextual difference mode.

The constructor for this class is:

__init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

Initializes an instance of HtmlDiff.

tabsize is an optional keyword argument to specify tab stop spacing and defaults to 8.

wrapcolumn is an optional keyword argument to specify the column number where lines are broken and wrapped. It defaults to None, in which case lines are not wrapped.

linejunk and charjunk are optional keyword arguments passed into ndiff() (used by HtmlDiff to generate the side by side HTML differences). See the ndiff() documentation for argument default values and descriptions.

The following methods are public:

make_file(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5, *, charset='utf-8')

Compares fromlines and tolines (lists of strings) and returns a string which is a complete HTML file containing a table showing line by line differences with inter-line and intra-line changes highlighted.

fromdesc and todesc are optional keyword arguments to specify from/to file column header strings (both default to an empty string).

context and numlines are both optional keyword arguments. Set context to True when contextual differences are to be shown, else the default is False to show the full files. numlines defaults to 5. When context is True, numlines controls the number of context lines which surround the difference highlights. When context is False, numlines controls the number of lines which are shown before a difference highlight when using the "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place the next difference highlight at the top of the browser without any leading context).

Note

fromdesc and todesc are interpreted as unescaped HTML and should be properly escaped when they come from untrusted sources.

Changed in version 3.5: The charset keyword-only argument was added. The default charset of the HTML document changed from 'ISO-8859-1' to 'utf-8'.

make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

Compares fromlines and tolines (lists of strings) and returns a string which is a complete HTML table showing line by line differences with inter-line and intra-line changes highlighted.

The arguments for this method are the same as those for the make_file() method.
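A minimal sketch of both methods follows. The input lines and descriptions are invented for illustration, and passing the column headers through html.escape() is one possible way to handle descriptions that come from untrusted sources.

```python
import html
from difflib import HtmlDiff

fromlines = ['one\n', 'two\n', 'three\n']
tolines = ['one\n', 'too\n', 'three\n']

# Pretend these descriptions came from an untrusted source.
fromdesc = html.escape('<b>old</b>')
todesc = html.escape('<b>new</b>')

differ = HtmlDiff(tabsize=4, wrapcolumn=60)

# Just the table, for embedding in an existing page.
table = differ.make_table(fromlines, tolines, fromdesc, todesc)

# A complete HTML document, showing one line of context around changes.
page = differ.make_file(fromlines, tolines, fromdesc, todesc,
                        context=True, numlines=1)

print('<table' in table)
print('&lt;b&gt;old&lt;/b&gt;' in page)
```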

difflib.context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

Compare a and b (lists of strings); return a delta (a generator generating the delta lines) in context diff format.

Context diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in a before/after style. The number of context lines is set by n, which defaults to three.

By default, the diff control lines (those with *** or ---) are created with a trailing newline. This is helpful so that inputs created from io.IOBase.readlines() result in diffs that are suitable for use with io.IOBase.writelines(), since both the inputs and outputs have trailing newlines.

For inputs that do not have trailing newlines, set the lineterm argument to "" so that the output will be uniformly newline free.

The context diff format normally has a header for filenames and modification times. Any or all of these may be specified using strings for fromfile, tofile, fromfiledate, and tofiledate. The modification times are normally expressed in the ISO 8601 format. If not specified, the strings default to blanks.

>>> import sys
>>> from difflib import *
>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
>>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py',
...                        tofile='after.py'))
*** before.py
--- after.py
***************
*** 1,4 ****
! bacon
! eggs
! ham
  guido
--- 1,4 ----
! python
! eggy
! hamster
  guido

See A command-line interface to difflib for a more detailed example.
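The lineterm advice above can be sketched as follows. The inputs are hypothetical lists without trailing newlines, such as those returned by str.splitlines(), and passing lineterm='' keeps the control lines newline-free as well.

```python
from difflib import context_diff

# Hypothetical inputs without trailing newlines.
before = 'one\ntwo\nthree'.splitlines()
after = 'one\ntoo\nthree'.splitlines()

delta = list(context_diff(before, after, fromfile='before.txt',
                          tofile='after.txt', lineterm=''))

# No line carries its own newline, so join them explicitly for display.
print('\n'.join(delta))
```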

difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)

Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.

The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.

>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']
>>> import keyword
>>> get_close_matches('wheel', keyword.kwlist)
['while']
>>> get_close_matches('pineapple', keyword.kwlist)
[]
>>> get_close_matches('accept', keyword.kwlist)
['except']

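The effect of the two optional arguments can be sketched with the same word list as above. The variable names are chosen only for this illustration.

```python
from difflib import get_close_matches

words = ['ape', 'apple', 'peach', 'puppy']

# n caps how many matches are returned (best first).
best_only = get_close_matches('appel', words, n=1)

# A stricter cutoff rejects candidates that pass the default of 0.6.
strict = get_close_matches('appel', words, cutoff=0.9)

print(best_only)
print(strict)
```
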
difflib.ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

Compare a and b (lists of strings); return a Differ-style delta (a generator generating the delta lines).

Optional keyword parameters linejunk and charjunk are filtering functions (or None):

linejunk: A function that accepts a single string argument, and returns true if the string is junk, or false if not. The default is None. There is also a module-level function IS_LINE_JUNK(), which filters out lines without visible characters, except for at most one pound character ('#'). However, the underlying SequenceMatcher class does a dynamic analysis of which lines are so frequent as to constitute noise, and this usually works better than using this function.

charjunk: A function that accepts a character (a string of length 1), and returns true if the character is junk, or false if not. The default is the module-level function IS_CHARACTER_JUNK(), which filters out whitespace characters (a blank or tab; it's a bad idea to include newline in this!).

>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
...              'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> print(''.join(diff), end="")
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu

difflib.restore(sequence, which)

Return one of the two sequences that generated a delta.

Given a sequence produced by Differ.compare() or ndiff(), extract lines originating from file 1 or 2 (parameter which), stripping off line prefixes.

Example:

>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
...              'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> diff = list(diff)  # materialize the generated delta into a list
>>> print(''.join(restore(diff, 1)), end="")
one
two
three
>>> print(''.join(restore(diff, 2)), end="")
ore
tree
emu

difflib.unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

Compare a and b (lists of strings); return a delta (a generator generating the delta lines) in unified diff format.

Unified diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in an inline style (instead of separate before/after blocks). The number of context lines is set by n, which defaults to three.

By default, the diff control lines (those with ---, +++, or @@) are created with a trailing newline. This is helpful so that inputs created from io.IOBase.readlines() result in diffs that are suitable for use with io.IOBase.writelines(), since both the inputs and outputs have trailing newlines.

For inputs that do not have trailing newlines, set the lineterm argument to "" so that the output will be uniformly newline free.

The unified diff format normally has a header for filenames and modification times. Any or all of these may be specified using strings for fromfile, tofile, fromfiledate, and tofiledate. The modification times are normally expressed in the ISO 8601 format. If not specified, the strings default to blanks.

>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
>>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
--- before.py
+++ after.py
@@ -1,4 +1,4 @@
-bacon
-eggs
-ham
+python
+eggy
+hamster
 guido

See A command-line interface to difflib for a more detailed example.
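The header fields and the context size can be combined as in the sketch below. The file names and timestamps are placeholders invented for the example, and n=0 suppresses context lines entirely.

```python
from difflib import unified_diff

a = ['one\n', 'two\n', 'three\n']
b = ['one\n', 'too\n', 'three\n']

# Placeholder names and modification times; n=0 shows changed lines only.
delta = list(unified_diff(a, b, fromfile='a.txt', tofile='b.txt',
                          fromfiledate='2005-01-26 23:30:50',
                          tofiledate='2010-04-02 10:20:52', n=0))

print(''.join(delta), end='')
```

Each date is appended to its file name after a tab character, which is how the two header lines carry both pieces of information.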

difflib.diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n')

Compare a and b (lists of bytes objects) using dfunc; yield a sequence of delta lines (also bytes) in the format returned by dfunc. dfunc must be a callable, typically either unified_diff() or context_diff().

Allows you to compare data with unknown or inconsistent encoding. All inputs except n must be bytes objects, not str. Works by losslessly converting all inputs (except n) to str, and calling dfunc(a, b, fromfile, tofile, fromfiledate, tofiledate, n, lineterm). The output of dfunc is then converted back to bytes, so the delta lines that you receive have the same unknown/inconsistent encodings as a and b.

Added in version 3.5.
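As a small sketch of the behavior described above, the inputs below hold the same word in two different encodings (an invented scenario). Every line of the resulting delta is a bytes object, with the original byte sequences preserved.

```python
from difflib import diff_bytes, unified_diff

# The same word, once encoded as Latin-1 and once as UTF-8.
a = [b'caf\xe9\n']
b = [b'caf\xc3\xa9\n']

delta = list(diff_bytes(unified_diff, a, b, fromfile=b'old', tofile=b'new'))
for line in delta:
    print(line)
```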

difflib.IS_LINE_JUNK(line)

Return True for ignorable lines. The line line is ignorable if line is blank or contains a single '#', otherwise it is not ignorable. Used as a default for parameter linejunk in ndiff() in older versions.

difflib.IS_CHARACTER_JUNK(ch)

Return True for ignorable characters. The character ch is ignorable if ch is a space or tab, otherwise it is not ignorable. Used as a default for parameter charjunk in ndiff().
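The two predicates can be tried directly. The sample lines and characters below are arbitrary.

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

# Blank lines and lines holding only a single '#' are ignorable.
for line in ['\n', '   \n', '  #  \n', 'code  # comment\n']:
    print(repr(line), IS_LINE_JUNK(line))

# Only a space or a tab counts as a junk character; a newline does not.
for ch in [' ', '\t', '\n', 'x']:
    print(repr(ch), IS_CHARACTER_JUNK(ch))
```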

See also

Pattern Matching: The Gestalt Approach

Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This was published in Dr. Dobb's Journal in July, 1988.

SequenceMatcher Objects

The SequenceMatcher class has this constructor:

class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

Optional argument isjunk must be None (the default) or a one-argument function that takes a sequence element and returns true if and only if the element is "junk" and should be ignored. Passing None for isjunk is equivalent to passing lambda x: False; in other words, no elements are ignored. For example, pass:

lambda x: x in " \t"

if you're comparing lines as sequences of characters, and don't want to synch up on blanks or hard tabs.

The optional arguments a and b are sequences to be compared; both default to empty strings. The elements of both sequences must be hashable.

The optional argument autojunk can be used to disable the automatic junk heuristic.

Changed in version 3.2: Added the autojunk parameter.

SequenceMatcher objects get three data attributes: bjunk is the set of elements of b for which isjunk is True; bpopular is the set of non-junk elements considered popular by the heuristic (if it is not disabled); b2j is a dict mapping the remaining elements of b to a list of positions where they occur. All three are reset whenever b is reset with set_seqs() or set_seq2().

Added in version 3.2: The bjunk and bpopular attributes.
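The three attributes can be inspected on a small instance. This sketch uses a blank-and-tab junk function and short strings picked for illustration.

```python
from difflib import SequenceMatcher

s = SequenceMatcher(lambda x: x in " \t", " abcd", "abcd abcd")

print(s.bjunk)      # elements of b flagged by isjunk
print(s.bpopular)   # empty here: b is far shorter than 200 items
print(s.b2j)        # positions in b of each remaining element
```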

SequenceMatcher objects have the following methods:

set_seqs(a, b)

Set the two sequences to be compared.

SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence against many sequences, use set_seq2() to set the commonly used sequence once and call set_seq1() repeatedly, once for each of the other sequences.

set_seq1(a)

Set the first sequence to be compared. The second sequence to be compared is not changed.

set_seq2(b)

Set the second sequence to be compared. The first sequence to be compared is not changed.
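The one-against-many pattern described for set_seqs() might look like the following sketch. The target and candidate strings are invented for the example.

```python
from difflib import SequenceMatcher

target = 'apple'
candidates = ['ape', 'apple', 'peach', 'puppy']

s = SequenceMatcher()
s.set_seq2(target)          # analysed once and cached
scores = {}
for candidate in candidates:
    s.set_seq1(candidate)   # cheap: only the first sequence changes
    scores[candidate] = s.ratio()

best = max(scores, key=scores.get)
print(best, scores[best])
```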

find_longest_match(alo=0, ahi=None, blo=0, bhi=None)

Find longest matching block in a[alo:ahi] and b[blo:bhi].

If isjunk was omitted or None, find_longest_match() returns (i, j, k) such that a[i:i+k] is equal to b[j:j+k], where alo <= i <= i+k <= ahi and blo <= j <= j+k <= bhi. For all (i', j', k') meeting those conditions, the additional conditions k >= k', i <= i', and if i == i', j <= j' are also met. In other words, of all maximal matching blocks, return one that starts earliest in a, and of all those maximal matching blocks that start earliest in a, return the one that starts earliest in b.

>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=0, b=4, size=5)

If isjunk was provided, first the longest matching block is determined as above, but with the additional restriction that no junk element appears in the block. Then that block is extended as far as possible by matching (only) junk elements on both sides. So the resulting block never matches on junk except as identical junk happens to be adjacent to an interesting match.

Here's the same example as before, but considering blanks to be junk. That prevents ' abcd' from matching the ' abcd' at the tail end of the second sequence directly. Instead only the 'abcd' can match, and matches the leftmost 'abcd' in the second sequence:

>>> s = SequenceMatcher(lambda x: x == " ", " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=1, b=0, size=4)

If no blocks match, this returns (alo, blo, 0).

This method returns a named tuple Match(a, b, size).

Changed in version 3.9: Added default arguments.
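A short sketch of the default arguments and of the no-match case follows. It assumes Python 3.9 or later, and the second pair of strings is made up so that nothing matches.

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"
s = SequenceMatcher(None, a, b)

# With Python 3.9 or later the bounds may be omitted ...
whole = s.find_longest_match()
# ... which is the same as spelling out the full ranges.
explicit = s.find_longest_match(0, len(a), 0, len(b))
print(whole == explicit)

# When nothing matches, the result is (alo, blo, 0).
t = SequenceMatcher(None, "abc", "xyz")
none = t.find_longest_match(1, 3, 2, 3)
print(none)
```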

get_matching_blocks()

Return list of triples describing non-overlapping matching subsequences. Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and j.

The last triple is a dummy, and has the value (len(a), len(b), 0). It is the only triple with n == 0. If (i, j, n) and (i', j', n') are adjacent triples in the list, and the second is not the last triple in the list, then i+n < i' or j+n < j'; in other words, adjacent triples always describe non-adjacent equal blocks.

>>> s = SequenceMatcher(None, "abxcd", "abcd")
>>> s.get_matching_blocks()
[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]

get_opcodes()

Return list of 5-tuples describing how to turn a into b. Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple has i1 == j1 == 0, and remaining tuples have i1 equal to the i2 from the preceding tuple, and, likewise, j1 equal to the previous j2.

The tag values are strings, with these meanings:

Value        Meaning
'replace'    a[i1:i2] should be replaced by b[j1:j2].
'delete'     a[i1:i2] should be deleted. Note that j1 == j2 in this case.
'insert'     b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case.
'equal'      a[i1:i2] == b[j1:j2] (the sub-sequences are equal).

For example:

>>> a = "qabxcd"
>>> b = "abycdf"
>>> s = SequenceMatcher(None, a, b)
>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
delete    a[0:1] --> b[0:0]      'q' --> ''
equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
replace   a[3:4] --> b[2:3]      'x' --> 'y'
equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
insert    a[6:6] --> b[5:6]       '' --> 'f'

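To see that the opcodes really describe how to turn a into b, they can be replayed. The helper apply_opcodes() below is hypothetical, written only for this illustration, and is not part of difflib.

```python
from difflib import SequenceMatcher

def apply_opcodes(a, b, opcodes):
    """Rebuild b from a by following the opcodes (hypothetical helper)."""
    out = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == 'equal':
            out.append(a[i1:i2])        # keep this slice of a
        elif tag in ('replace', 'insert'):
            out.append(b[j1:j2])        # take the new material from b
        # 'delete' contributes nothing to the result
    return ''.join(out)

a = "qabxcd"
b = "abycdf"
s = SequenceMatcher(None, a, b)
rebuilt = apply_opcodes(a, b, s.get_opcodes())
print(rebuilt == b)
```
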
get_grouped_opcodes(n=3)

Return a generator of groups with up to n lines of context.

Starting with the groups returned by get_opcodes(), this method splits out smaller change clusters and eliminates intervening ranges which have no changes.

The groups are returned in the same format as get_opcodes().
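The grouping can be sketched with two changes that lie far apart. The sequences here are invented lists of number strings, and n=2 keeps two items of context on each side of a change.

```python
from difflib import SequenceMatcher

a = [str(i) for i in range(30)]
b = list(a)
b[5] = 'five'            # first change
b[25] = 'twenty-five'    # second change, far from the first

s = SequenceMatcher(None, a, b)
groups = list(s.get_grouped_opcodes(n=2))
for group in groups:
    print(group)
```

The long unchanged run between the two edits is dropped, so each change ends up in its own group, much like the hunks of a unified diff.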

ratio()

Return a measure of the sequences' similarity as a float in the range [0, 1].

Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.
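The formula can be checked by hand against get_matching_blocks(). This is only a sketch, using the names M and T from the text above.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
s = SequenceMatcher(None, a, b)

# M: total size of all matching blocks; T: combined length of both sequences.
M = sum(block.size for block in s.get_matching_blocks())
T = len(a) + len(b)
by_hand = 2.0 * M / T
print(by_hand, s.ratio())
```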

This is expensive to compute if get_matching_blocks() or get_opcodes() hasn't already been called, in which case you may want to try quick_ratio() or real_quick_ratio() first to get an upper bound.

Note

Caution: The result of a ratio() call may depend on the order of the arguments. For instance:

>>> SequenceMatcher(None, 'tide', 'diet').ratio()
0.25
>>> SequenceMatcher(None, 'diet', 'tide').ratio()
0.5

quick_ratio()

Return an upper bound on ratio() relatively quickly.

real_quick_ratio()

Return an upper bound on ratio() very quickly.

The three methods that return the ratio of matching to total characters can give different results due to differing levels of approximation, although quick_ratio() and real_quick_ratio() are always at least as large as ratio():

>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.quick_ratio()
0.75
>>> s.real_quick_ratio()
1.0
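Because both quick methods are upper bounds on ratio(), they can serve as cheap filters before the expensive computation. The helper is_close() below is hypothetical and only sketches that idea.

```python
from difflib import SequenceMatcher

def is_close(a, b, cutoff=0.6):
    """Hypothetical helper: True if ratio() is at least cutoff."""
    s = SequenceMatcher(None, a, b)
    # Each check is an upper bound on ratio(), so failing a cheap one
    # means the expensive one cannot succeed either.
    return (s.real_quick_ratio() >= cutoff
            and s.quick_ratio() >= cutoff
            and s.ratio() >= cutoff)

print(is_close('appel', 'apple'))
print(is_close('appel', 'xyz'))
```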

SequenceMatcher Examples

This example compares two strings, considering blanks to be "junk":

>>> s = SequenceMatcher(lambda x: x == " ",
...                     "private Thread currentThread;",
...                     "private volatile Thread currentThread;")

ratio() returns a float in [0, 1], measuring the similarity of the sequences. As a rule of thumb, a ratio() value over 0.6 means the sequences are close matches:

>>> print(round(s.ratio(), 3))
0.866

If you're only interested in where the sequences match, get_matching_blocks() is handy:

>>> for block in s.get_matching_blocks():
...     print("a[%d] and b[%d] match for %d elements" % block)
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements

Note that the last tuple returned by get_matching_blocks() is always a dummy, (len(a), len(b), 0), and this is the only case in which the last tuple element (number of elements matched) is 0.

If you want to know how to change the first sequence into the second, use get_opcodes():

>>> for opcode in s.get_opcodes():
...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
 equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
 equal a[8:29] b[17:38]

Differ Objects

Note that Differ-generated deltas make no claim to be minimal diffs. To the contrary, minimal diffs are often counter-intuitive, because they synch up anywhere possible, sometimes on accidental matches 100 pages apart. Restricting synch points to contiguous matches preserves some notion of locality, at the occasional cost of producing a longer diff.

The Differ class has this constructor:

class difflib.Differ(linejunk=None, charjunk=None)

Optional keyword parameters linejunk and charjunk are for filter functions (or None):

linejunk: A function that accepts a single string argument, and returns true if the string is junk. The default is None, meaning that no line is considered junk.

charjunk: A function that accepts a single character argument (a string of length 1), and returns true if the character is junk. The default is None, meaning that no character is considered junk.

These junk-filtering functions speed up matching to find differences and do not cause any differing lines or characters to be ignored. Read the description of the find_longest_match() method's isjunk parameter for an explanation.

Differ objects are used (deltas generated) via a single method:

compare(a, b)

Compare two sequences of lines, and generate the delta (a sequence of lines).

Each sequence must contain individual single-line strings ending with newlines. Such sequences can be obtained from the readlines() method of file-like objects. The delta generated also consists of newline-terminated strings, ready to be printed as-is via the writelines() method of a file-like object.

Differ Example

This example compares two texts. First we set up the texts, sequences of individual single-line strings ending with newlines (such sequences can also be obtained from the readlines() method of file-like objects):

>>> text1 = '''  1. Beautiful is better than ugly.
...   2. Explicit is better than implicit.
...   3. Simple is better than complex.
...   4. Complex is better than complicated.
... '''.splitlines(keepends=True)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = '''  1. Beautiful is better than ugly.
...   3.   Simple is better than complex.
...   4. Complicated is better than complex.
...   5. Flat is better than nested.
... '''.splitlines(keepends=True)

Next we instantiate a Differ object:

>>> d = Differ()

Note that when instantiating a Differ object we may pass functions to filter out line and character "junk." See the Differ() constructor for details.

Finally, we compare the two:

>>> result = list(d.compare(text1, text2))

result is a list of strings, so let's pretty-print it:

>>> from pprint import pprint
>>> pprint(result)
['    1. Beautiful is better than ugly.\n',
 '-   2. Explicit is better than implicit.\n',
 '-   3. Simple is better than complex.\n',
 '+   3.   Simple is better than complex.\n',
 '?     ++\n',
 '-   4. Complex is better than complicated.\n',
 '?            ^                     ---- ^\n',
 '+   4. Complicated is better than complex.\n',
 '?           ++++ ^                      ^\n',
 '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

>>> import sys
>>> sys.stdout.writelines(result)
    1. Beautiful is better than ugly.
-   2. Explicit is better than implicit.
-   3. Simple is better than complex.
+   3.   Simple is better than complex.
?     ++
-   4. Complex is better than complicated.
?            ^                     ---- ^
+   4. Complicated is better than complex.
?           ++++ ^                      ^
+   5. Flat is better than nested.

A command-line interface to difflib

This example shows how to use difflib to create a diff-like utility.

""" Command line interface to difflib.py providing diffs in four formats:

* ndiff:    lists every line and highlights interline changes.
* context:  highlights clusters of changes in a before/after format.
* unified:  highlights clusters of changes in an inline format.
* html:     generates side by side comparison with change highlights.

"""

import sys, os, difflib, argparse
from datetime import datetime, timezone

def file_mtime(path):
    t = datetime.fromtimestamp(os.stat(path).st_mtime,
                               timezone.utc)
    return t.astimezone().isoformat()

def main():

    parser = argparse.ArgumentParser()
    parser.add_argument('-c', action='store_true', default=False,
                        help='Produce a context format diff (default)')
    parser.add_argument('-u', action='store_true', default=False,
                        help='Produce a unified format diff')
    parser.add_argument('-m', action='store_true', default=False,
                        help='Produce HTML side by side diff '
                             '(can use -c and -l in conjunction)')
    parser.add_argument('-n', action='store_true', default=False,
                        help='Produce a ndiff format diff')
    parser.add_argument('-l', '--lines', type=int, default=3,
                        help='Set number of context lines (default 3)')
    parser.add_argument('fromfile')
    parser.add_argument('tofile')
    options = parser.parse_args()

    n = options.lines
    fromfile = options.fromfile
    tofile = options.tofile

    fromdate = file_mtime(fromfile)
    todate = file_mtime(tofile)
    with open(fromfile) as ff:
        fromlines = ff.readlines()
    with open(tofile) as tf:
        tolines = tf.readlines()

    if options.u:
        diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
                                    fromdate, todate, n=n)
    elif options.n:
        diff = difflib.ndiff(fromlines, tolines)
    elif options.m:
        diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                            tofile, context=options.c,
                                            numlines=n)
    else:
        diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
                                    fromdate, todate, n=n)

    sys.stdout.writelines(diff)

if __name__ == '__main__':
    main()

ndiff example

This example shows how to use difflib.ndiff().

"""ndiff [-q] file1 file2
    or
ndiff (-r1 | -r2) < ndiff_output > file1_or_file2

Print a human-friendly file difference report to stdout.  Both inter-
and intra-line differences are noted.  In the second form, recreate file1
(-r1) or file2 (-r2) on stdout, from an ndiff report on stdin.

In the first form, if -q ("quiet") is not specified, the first two lines
of output are

-: file1
+: file2

Each remaining line begins with a two-letter code:

    "- "    line unique to file1
    "+ "    line unique to file2
    "  "    line common to both files
    "? "    line not present in either input file

Lines beginning with "? " attempt to guide the eye to intraline
differences, and were not present in either input file.  These lines can be
confusing if the source files contain tab characters.

The first file can be recovered by retaining only lines that begin with
"  " or "- ", and deleting those 2-character prefixes; use ndiff with -r1.

The second file can be recovered similarly, but by retaining only "  " and
"+ " lines; use ndiff with -r2; or, on Unix, the second file can be
recovered by piping the output through

    sed -n '/^[+ ] /s/^..//p'
"""

__version__ = 1, 7, 0

import difflib, sys

def fail(msg):
    out = sys.stderr.write
    out(msg + "\n\n")
    out(__doc__)
    return 0

# open a file & return the file object; gripe and return 0 if it
# couldn't be opened
def fopen(fname):
    try:
        return open(fname)
    except IOError as detail:
        return fail("couldn't open " + fname + ": " + str(detail))

# open two files & spray the diff to stdout; return false iff a problem
def fcompare(f1name, f2name):
    f1 = fopen(f1name)
    f2 = fopen(f2name)
    if not f1 or not f2:
        return 0

    a = f1.readlines(); f1.close()
    b = f2.readlines(); f2.close()
    for line in difflib.ndiff(a, b):
        # each delta line already ends with a newline, so add nothing
        print(line, end='')

    return 1

# crack args (sys.argv[1:] is normal) & compare;
# return false iff a problem

def main(args):
    import getopt
    try:
        opts, args = getopt.getopt(args, "qr:")
    except getopt.error as detail:
        return fail(str(detail))
    noisy = 1
    qseen = rseen = 0
    for opt, val in opts:
        if opt == "-q":
            qseen = 1
            noisy = 0
        elif opt == "-r":
            rseen = 1
            whichfile = val
    if qseen and rseen:
        return fail("can't specify both -q and -r")
    if rseen:
        if args:
            return fail("no args allowed with -r option")
        if whichfile in ("1", "2"):
            restore(whichfile)
            return 1
        return fail("-r value must be 1 or 2")
    if len(args) != 2:
        return fail("need 2 filename args")
    f1name, f2name = args
    if noisy:
        print('-:', f1name)
        print('+:', f2name)
    return fcompare(f1name, f2name)

# read ndiff output from stdin, and print file1 (which=='1') or
# file2 (which=='2') to stdout

def restore(which):
    restored = difflib.restore(sys.stdin.readlines(), which)
    sys.stdout.writelines(restored)

if __name__ == '__main__':
    main(sys.argv[1:])