Artima Weblogs: Guido van Rossum's Weblog
I couldn't find a thorough spec for the format called "unified diff," so I decided to research it. Here are my findings.
I haven't found a satisfactory specification of the unified diff format (the one on the GNU website is hopelessly incomplete). Here's what I've discovered by experimenting with diff(1) on Red Hat Linux, where it identifies itself as 'diff (GNU diffutils) 2.8.1'. Hopefully this is useful for someone who needs to generate unified diffs or who needs to parse them. (I had both needs recently. :-)
The header lines look like this:
indicator ' ' filename '\t' date ' ' time ' ' timezone
- indicator is '---' for the old file and '+++' for the new
- date has the form YYYY-MM-DD
- time has the form hh:mm:ss.nnnnnnnnn on a 24-hour clock
- timezone has the form ('+'|'-') hhmm, where hhmm is hours and minutes east (if the sign is +) or west (if the sign is -) of GMT/UTC
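For parsing, the header layout above maps fairly directly onto a regular expression. The following Python sketch is one way to do it; the names `HEADER_RE`, `Header` and `parse_header_line` are made up for illustration. It assumes filenames never contain a tab, keeps the time as text (Python's `datetime` only resolves microseconds, so the nine fractional digits would not survive), and, as a deliberate loosening of the format described above, accepts a fractional-seconds part of any length or none at all.

```python
import re
from typing import NamedTuple

# indicator ' ' filename '\t' date ' ' time ' ' timezone
# Assumption: the filename contains no tab characters.
HEADER_RE = re.compile(
    r"^(?P<indicator>---|\+\+\+) "
    r"(?P<filename>[^\t]*)\t"
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}(?:\.\d+)?) "  # fraction optional (lenient)
    r"(?P<tz>[+-]\d{4})$"
)


class Header(NamedTuple):
    indicator: str           # '---' for the old file, '+++' for the new
    filename: str
    date: str                # YYYY-MM-DD
    time: str                # hh:mm:ss.nnnnnnnnn, kept as text
    utc_offset_minutes: int  # positive = east of UTC, negative = west


def parse_header_line(line: str) -> Header:
    m = HEADER_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a unified diff header line: {line!r}")
    tz = m.group("tz")
    sign = 1 if tz[0] == "+" else -1
    # hhmm -> total minutes, signed by direction from UTC
    offset = sign * (int(tz[1:3]) * 60 + int(tz[3:5]))
    return Header(m.group("indicator"), m.group("filename"),
                  m.group("date"), m.group("time"), offset)
```

For example, `parse_header_line("--- lao\t2002-02-21 23:30:39.942229878 -0800")` yields a `Header` whose `utc_offset_minutes` is -480.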
Each chunk starts with a line that looks like this:
'@@ -' range ' +' range ' @@'
where range is either one unsigned decimal number or two separated by a comma. The first number is the start line of the chunk in the old or new file. The second number is the chunk size in that file; it and the comma are omitted if the chunk size is 1. (Email from a reader suggests that this omission is optional and may be phased out.) If the chunk size is 0, the first number is one lower than one would expect: it is the line number after which the chunk should be inserted or deleted. In all other cases it gives the first line number of the replaced range of lines.
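The range rules, especially the omitted ",1" and the off-by-one start for empty chunks, are easy to get wrong. Here is a hypothetical Python sketch of both directions (`Hunk`, `parse_hunk_header` and `format_hunk_header` are illustrative names). So that parsing and formatting round-trip, it stores each side as `first`, the number the first line of the range has (or would have, for an empty range), plus a count, and converts to and from the header numbers at the edges. It accepts an explicit ",1" when parsing but leaves it out when formatting, and it ignores any text after the closing '@@' (GNU diff's -p option puts a function name there).

```python
import re
from typing import NamedTuple, Optional, Tuple

# '@@ -' range ' +' range ' @@', where range is "start" or "start,count".
# Anything after the closing '@@' is not examined.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class Hunk(NamedTuple):
    # 'first' is the number of the first line in the range or, for an empty
    # range, the number that line would have had (one more than the header shows).
    old_first: int
    old_count: int
    new_first: int
    new_count: int


def _parse_range(start: str, count: Optional[str]) -> Tuple[int, int]:
    n = 1 if count is None else int(count)  # omitted count means size 1
    first = int(start)
    if n == 0:
        first += 1  # header gave the line *after which* the change applies
    return first, n


def parse_hunk_header(line: str) -> Hunk:
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a chunk header: {line!r}")
    old_first, old_count = _parse_range(m.group(1), m.group(2))
    new_first, new_count = _parse_range(m.group(3), m.group(4))
    return Hunk(old_first, old_count, new_first, new_count)


def _format_range(first: int, count: int) -> str:
    if count == 1:
        return str(first)              # ",1" is left out
    if count == 0:
        return f"{first - 1},0"        # line after which the change applies
    return f"{first},{count}"


def format_hunk_header(h: Hunk) -> str:
    old = _format_range(h.old_first, h.old_count)
    new = _format_range(h.new_first, h.new_count)
    return f"@@ -{old} +{new} @@"
```

With this convention, inserting one line before what would be line 2 is `Hunk(2, 0, 2, 1)`, which formats as `@@ -1,0 +2 @@`, and a diff against an empty old file starts with `@@ -0,0 +1,...`.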
A chunk then continues with lines starting with ' ' (common line), '-' (only in old file), or '+' (only in new file). If the last line of a file doesn't end in a newline character, it is displayed with a newline character, and the following line in the chunk has the literal text (starting in the first column):
'\ No newline at end of file'
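Reading the body of a chunk means counting lines against the sizes from the '@@' line and watching for the no-newline marker, which can follow a '-' line in the middle of a chunk as well as the chunk's last line. A minimal sketch, assuming the whole diff has already been split with `str.splitlines(keepends=True)`; `read_chunk_body` is a hypothetical name, and the marker is recognized by its leading backslash alone rather than by the full text.

```python
from typing import List, Tuple


def read_chunk_body(
    lines: List[str], pos: int, old_count: int, new_count: int
) -> Tuple[List[str], List[str], int]:
    """Collect the old-file and new-file text of one chunk.

    lines: the diff split with str.splitlines(keepends=True)
    pos: index of the first line after the '@@' header
    old_count, new_count: chunk sizes taken from that header
    Returns (old_lines, new_lines, index of the first line after the chunk).
    """
    old: List[str] = []
    new: List[str] = []
    touched: Tuple[List[str], ...] = ()  # side(s) the previous body line went to
    while pos < len(lines):
        line = lines[pos]
        if line.startswith("\\"):
            # '\ No newline at end of file': the previous line's newline was
            # only added for display, so drop it on the affected side(s).
            for side in touched:
                if side[-1].endswith("\n"):
                    side[-1] = side[-1][:-1]
            pos += 1
            continue
        if len(old) >= old_count and len(new) >= new_count:
            break  # chunk complete; this line starts the next chunk or file
        tag, text = line[:1], line[1:]
        if tag == " ":
            touched = (old, new)  # common line
        elif tag == "-":
            touched = (old,)      # only in old file
        elif tag == "+":
            touched = (new,)      # only in new file
        else:
            raise ValueError(f"unexpected line in chunk: {line!r}")
        for side in touched:
            side.append(text)
        pos += 1
    return old, new, pos
```

Joining the returned lists reproduces the text each file has over the chunk's range, with the final newline removed wherever the marker said it was missing.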
Guido van Rossum is the creator of Python, one of the major programming languages on and off the web. The Python community refers to him as the BDFL (Benevolent Dictator For Life), a title straight from a Monty Python skit. He moved from the Netherlands to the USA in 1995, where he met his wife. Until July 2003 they lived in the northern Virginia suburbs of Washington, DC with their son Orlijn, who was born in 2001. They then moved to Silicon Valley where Guido now works for Google (spending 50% of his time on Python!).