File content comparison is handled by the difflib module. As part of Python's standard library, difflib does not need to be installed. It compares sequences of text lines and can output the differences in several formats, including a readable HTML report, much like the diff command on Linux. We can use difflib to compare versions of code and configuration files, which is very useful alongside version control. See the official difflib documentation for details.
This example uses the difflib module to compare two multi-line strings and then prints the differences in a version-control style. The sample code is as follows:
import difflib
from pprint import pprint

text1_lines = ''' 1. Beautiful is better than ugly.
 2. Explicit is better than implicit.
 3. Simple is better than complex.
 4. Complex is better than complicated.'''.splitlines(keepends=True)  # Split by rows
text2_lines = ''' 1. Beautiful is better than ugly.
 3. Simple is better than complex.
 4. Complicated is better than complex.
 5. Flat is better than nested.'''.splitlines(keepends=True)

d = difflib.Differ()  # Create a Differ() object
result = list(d.compare(text1_lines, text2_lines))  # Use the "compare" method to compare strings
pprint(result)
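The example uses the Differ() class to compare two lists of lines. In addition, the SequenceMatcher() class of difflib supports the comparison of any type of sequence, as long as the elements are hashable, and the HtmlDiff() class outputs the comparison result in HTML format. Running the example prints a list of delta lines. Each line of a Differ delta begins with a two-letter code:

'- '  line unique to sequence 1
'+ '  line unique to sequence 2
'  '  line common to both sequences
'? '  line not present in either input sequence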
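To make the two-letter codes concrete, the sketch below groups the lines of a Differ delta by their prefix. The input lists and the helper name split_delta are made up for illustration; only difflib.Differ and its compare method come from the standard library.

```python
import difflib


def split_delta(a_lines, b_lines):
    """Group Differ output by its two-character prefix (hypothetical helper)."""
    groups = {"- ": [], "+ ": [], "  ": [], "? ": []}
    for line in difflib.Differ().compare(a_lines, b_lines):
        # The first two characters are the code, the rest is the original line.
        groups[line[:2]].append(line[2:])
    return groups


# Illustrative inputs, not taken from the example above.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "gamma\n", "delta\n"]

groups = split_delta(old, new)
print("removed:  ", groups["- "])
print("added:    ", groups["+ "])
print("unchanged:", groups["  "])
```

Lines starting with '? ' only appear when Differ decides that two lines are similar enough to mark intraline changes, so that group stays empty for these inputs.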
To produce an HTML report instead, use the HtmlDiff() class. The sample code is as follows:
import difflib

text1_lines = ''' 1. Beautiful is better than ugly.
 2. Explicit is better than implicit.
 3. Simple is better than complex.
 4. Complex is better than complicated.'''.splitlines(keepends=True)  # Split by rows
text2_lines = ''' 1. Beautiful is better than ugly.
 3. Simple is better than complex.
 4. Complicated is better than complex.
 5. Flat is better than nested.'''.splitlines(keepends=True)

d = difflib.HtmlDiff()  # Create an HtmlDiff() object
# Use the make_file method to compare the strings and write the result into the HTML file
with open("test.html", "w") as file:
    file.write(d.make_file(text1_lines, text2_lines))
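The make_file method compares the two inputs and returns a complete HTML document, which the example writes to test.html. Opening that file in a browser shows the two texts side by side with the changes highlighted.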
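make_file also accepts a few optional arguments that are useful for longer files. The sketch below is one possible variation, not part of the original example: the inputs, the column headings, and the output file name are assumptions chosen for illustration.

```python
import difflib
import os
import tempfile

# Illustrative inputs, not taken from the example above.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "gamma\n", "delta\n"]

html = difflib.HtmlDiff().make_file(
    old,
    new,
    fromdesc="old_version",  # heading shown above the first input
    todesc="new_version",    # heading shown above the second input
    context=True,            # show only changed lines plus some context
    numlines=1,              # number of context lines around each change
)

# Write the report into the system temp directory (path is an assumption).
out_path = os.path.join(tempfile.gettempdir(), "diff_with_headers.html")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(html)

print("report written to", out_path)
```

With context=True the report skips long unchanged stretches, which keeps the HTML small when only a few lines differ.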
When we perform code audits or verify backup results, we often need to check that the original and target files are consistent. Python's standard library includes the filecmp module for this purpose. filecmp can compare single files, lists of files, and whole directories, including their subdirectories. For example, a report can list files or subdirectories that exist in the target but not in the original, and files with the same name are checked to determine whether they are actually the same file (content-level comparison). Python 2.3 and later ship with the filecmp module by default, so no additional installation is required. See the official filecmp documentation for details. filecmp provides three operations. cmp (single file comparison) is as follows:
filecmp.cmp(f1, f2, shallow=True) Compare the files named f1 and f2, returning True if they seem equal, False otherwise.
cmpfiles (multi-file comparison) is as follows:
filecmp.cmpfiles(dir1, dir2, common, shallow=True) Compare the files in the two directories dir1 and dir2 whose names are given by common. Returns three lists of file names: match, mismatch, errors. For example, cmpfiles('a', 'b', ['c', 'd/e']) will compare a/c with b/c and a/d/e with b/d/e. 'c' and 'd/e' will each be in one of the three returned lists.
dircmp (directory comparison) is as follows:
class filecmp.dircmp(a, b, ignore=None, hide=None) Construct a new directory comparison object, to compare the directories a and b. ignore is a list of names to ignore, and defaults to filecmp.DEFAULT_IGNORES. hide is a list of names to hide, and defaults to [os.curdir, os.pardir].
Single file comparison: Use filecmp.cmp(f1, f2, shallow=True) to compare the files named f1 and f2. It returns True if they appear to be the same and False otherwise. shallow defaults to True, which means that two files whose os.stat() signatures (file type, size, and modification time) are identical are taken to be equal without their contents being read; if the signatures differ, the sizes and contents are still compared. When shallow is False, the file contents are always compared. The example below compares three files, test1.txt, test2.txt, and test3.txt:
import filecmp

print(filecmp.cmp("test1.txt", "test2.txt"))  # False
print(filecmp.cmp("test2.txt", "test3.txt"))  # True
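The effect of shallow is easiest to see with two files that have the same size and modification time but different contents. The following sketch builds such a pair in a temporary directory; the file names, the contents, and the fixed timestamp are assumptions made for the demonstration.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")

    # Same length, different content.
    with open(a, "w") as f:
        f.write("hello\n")
    with open(b, "w") as f:
        f.write("world\n")

    # Force identical access/modification times so the stat signatures match.
    for path in (a, b):
        os.utime(path, (1_000_000_000, 1_000_000_000))

    shallow_result = filecmp.cmp(a, b)                # trusts the stat signature
    deep_result = filecmp.cmp(a, b, shallow=False)    # reads the contents

print("shallow=True :", shallow_result)
print("shallow=False:", deep_result)
```

When the signatures happen to match, the shallow comparison reports the files as equal even though their contents differ, so shallow=False is the safer choice for verifying backups.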
Multi-file comparison: Use filecmp.cmpfiles(dir1, dir2, common, shallow=True) to compare the files named in common in the dir1 and dir2 directories. This method returns three lists of file names: match, mismatch, and errors. match contains the files that are the same in both directories, mismatch contains the files that differ, and errors contains the files that could not be compared because they are missing from a directory, cannot be read, or for other reasons. The example compares five files in the directories one and two. The complete sample code is as follows:
import filecmp

print(filecmp.cmpfiles(
    'one', 'two',
    ['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt', 'test5.txt']))
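Since the directory listing is not reproduced here, the following self-contained sketch shows how a file ends up in each of the three lists. The directory names, file names, contents, and the write helper are all invented for illustration.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create a small text file (helper for this sketch only)."""
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    os.mkdir(left)
    os.mkdir(right)

    write(os.path.join(left, "same.txt"), "identical\n")
    write(os.path.join(right, "same.txt"), "identical\n")
    write(os.path.join(left, "changed.txt"), "version 1\n")
    write(os.path.join(right, "changed.txt"), "version 2 with more text\n")
    write(os.path.join(left, "only_left.txt"), "no counterpart\n")

    match, mismatch, errors = filecmp.cmpfiles(
        left,
        right,
        ["same.txt", "changed.txt", "only_left.txt"],
        shallow=False,  # compare contents, not just stat signatures
    )

print("match:   ", match)
print("mismatch:", mismatch)
print("errors:  ", errors)
```

only_left.txt lands in errors because it cannot be opened in the second directory, which is how a missing file shows up in the result.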
Directory comparison: Create a directory comparison object with the filecmp.dircmp(a, b, ignore=None, hide=None) class, where a and b are the names of the directories to be compared. ignore is the list of file names to ignore, and defaults to filecmp.DEFAULT_IGNORES. hide is the list of names to hide, and defaults to [os.curdir, os.pardir].
The dircmp class provides detailed information about a directory comparison, such as the files found only in directory a, the subdirectories that exist in both a and b, and the matching files. It also supports recursion. dircmp provides three methods for outputting reports:
report(): Print (to sys.stdout) a comparison between a and b.
report_partial_closure(): Print a comparison between a and b and common immediate subdirectories.
report_full_closure(): Print a comparison between a and b and common subdirectories (recursively).
The dircmp class offers a number of interesting attributes that may be used to get various bits of information about the directory trees being compared.
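As a rough illustration, the sketch below reads a handful of these attributes (left_only, right_only, common_dirs, common_files, same_files, and diff_files) from a dircmp object. The directory layout, file names, and the write helper are invented for the example.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create a small text file (helper for this sketch only)."""
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    for root in (left, right):
        os.makedirs(os.path.join(root, "sub"))  # "sub" exists on both sides
        write(os.path.join(root, "same.txt"), "identical\n")

    # Different sizes, so the default shallow comparison detects the change.
    write(os.path.join(left, "changed.txt"), "short\n")
    write(os.path.join(right, "changed.txt"), "a noticeably longer line\n")
    write(os.path.join(left, "only_left.txt"), "left\n")
    write(os.path.join(right, "only_right.txt"), "right\n")

    dcmp = filecmp.dircmp(left, right)
    # Attributes are computed lazily, so read them while the directories exist.
    summary = {
        "left_only": sorted(dcmp.left_only),
        "right_only": sorted(dcmp.right_only),
        "common_dirs": sorted(dcmp.common_dirs),
        "common_files": sorted(dcmp.common_files),
        "same_files": sorted(dcmp.same_files),
        "diff_files": sorted(dcmp.diff_files),
    }

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that dircmp compares files the same way cmpfiles does with its default shallow setting, so files with matching stat signatures are reported as the same without their contents being read.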
Example: Compare the directories one and two. A dircmp object is created to perform the directory comparison, and its report() method prints the result. The code is as follows:
import filecmp

dcmp = filecmp.dircmp("one", "two")
dcmp.report()  # report() prints to sys.stdout and returns None, so print() is not needed
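When the differences are needed as data rather than as printed text, the subdirs attribute of a dircmp object can be walked recursively. The sketch below is one way to do that; the helper name collect_diffs and the sample directory tree are assumptions made for illustration.

```python
import filecmp
import os
import tempfile


def collect_diffs(dcmp, prefix=""):
    """Return relative paths of differing files, descending into common subdirectories."""
    found = [os.path.join(prefix, name) for name in dcmp.diff_files]
    # subdirs maps each common subdirectory name to its own dircmp object.
    for name, sub in dcmp.subdirs.items():
        found.extend(collect_diffs(sub, os.path.join(prefix, name)))
    return found


def write(path, text):
    """Create a small text file (helper for this sketch only)."""
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    for root in (left, right):
        os.makedirs(os.path.join(root, "sub"))

    # Different sizes on each side, so the default comparison sees the change.
    write(os.path.join(left, "top.txt"), "v1\n")
    write(os.path.join(right, "top.txt"), "version two\n")
    write(os.path.join(left, "sub", "deep.txt"), "v1\n")
    write(os.path.join(right, "sub", "deep.txt"), "version two\n")

    diffs = sorted(collect_diffs(filecmp.dircmp(left, right)))

print(diffs)
```

The same pattern works for left_only or right_only if the goal is to list files that are missing on one side.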