1.1 Hunks
=========
When comparing two files, 'diff' finds sequences of lines common to both
files, interspersed with groups of differing lines called "hunks".
Comparing two identical files yields one sequence of common lines and no
hunks, because no lines differ. Comparing two entirely different files
yields no common lines and one large hunk that contains all lines of
both files. In general, there are many ways to match up lines between
two given files. 'diff' tries to minimize the total hunk size by
finding large sequences of common lines interspersed with small hunks of
differing lines.
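As a rough illustration of the idea above, the following Python sketch finds a longest set of common lines with a textbook dynamic-programming table and collects everything between the matches into hunks. It is not the algorithm 'diff' itself uses, and the names 'common_pairs' and 'hunks' are made up for this example.

```python
def common_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # lcs[i][j] holds the length of the longest common subsequence
    # of a[i:] and b[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top-left corner to recover one matching.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def hunks(a, b):
    """Return the hunks as (lines_only_in_a, lines_only_in_b) tuples."""
    result = []
    prev_i = prev_j = 0
    # A sentinel pair just past both ends flushes the final hunk.
    for i, j in common_pairs(a, b) + [(len(a), len(b))]:
        if i > prev_i or j > prev_j:
            result.append((a[prev_i:i], b[prev_j:j]))
        prev_i, prev_j = i + 1, j + 1
    return result


if __name__ == "__main__":
    print(hunks(["x", "y"], ["x", "y"]))   # identical files
    print(hunks(["a", "b"], ["c", "d"]))   # entirely different files
```

Two identical lists yield no hunks, and two lists with nothing in common yield a single hunk holding every line of both.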
For example, suppose the file 'F' contains the three lines 'a', 'b',
'c', and the file 'G' contains the same three lines in reverse order
'c', 'b', 'a'. If 'diff' finds the line 'c' as common, then the command
'diff F G' produces this output:
     1,2d0
     < a
     < b
     3a2,3
     > b
     > a
But if 'diff' notices the common line 'b' instead, it produces this
output:
     1c1
     < a
     ---
     > c
     3c3
     < c
     ---
     > a
It is also possible to find 'a' as the common line. 'diff' does not
always find an optimal matching between the files; it takes shortcuts to
run faster. But its output is usually close to the shortest possible.
You can adjust this tradeoff with the '--minimal' ('-d') option (see
diff Performance).
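The three matchings for 'F' and 'G' can be checked with a short Python sketch that takes a chosen set of common lines and prints the resulting hunks in the normal output format shown above. The helper names 'line_range' and 'normal_diff' are hypothetical, and the matching is supplied by hand rather than found the way 'diff' finds it.

```python
def line_range(start, count):
    """Format a 1-based line range as it appears in a normal-format hunk header."""
    if count == 0:
        # An empty range is written as the line just before the gap.
        return str(start - 1)
    if count == 1:
        return str(start)
    return f"{start},{start + count - 1}"


def normal_diff(f, g, matches):
    """Return normal-format output lines for a given list of common lines.

    `matches` is an increasing list of 0-based (index_in_f, index_in_g) pairs.
    """
    out = []
    prev_i = prev_j = 0
    # A sentinel pair just past both ends flushes the final hunk.
    for i, j in list(matches) + [(len(f), len(g))]:
        old, new = f[prev_i:i], g[prev_j:j]
        if old or new:
            if old and new:
                kind = "c"      # change
            elif old:
                kind = "d"      # delete
            else:
                kind = "a"      # add
            out.append(line_range(prev_i + 1, len(old)) + kind
                       + line_range(prev_j + 1, len(new)))
            out += ["< " + line for line in old]
            if old and new:
                out.append("---")
            out += ["> " + line for line in new]
        prev_i, prev_j = i + 1, j + 1
    return out


F = ["a", "b", "c"]
G = ["c", "b", "a"]

if __name__ == "__main__":
    for common in F:
        print(f"common line {common!r}:")
        print("\n".join(normal_diff(F, G, [(F.index(common), G.index(common))])))
```

Each of the three choices leaves hunks totaling four lines, so for this pair of files no matching is shorter than the others.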