Weblogs Forum - Unified Diff Format


5 replies. Most recent reply: Apr 16, 2008 12:16 PM by Guido van Rossum


Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Unified Diff Format

Posted: Jun 14, 2006 7:16 AM

Summary
I couldn't find a thorough spec for the format called "unified diff" so I decided to research it. Here are my findings.

I haven't found a satisfactory specification of the unified diff format (the one on the GNU website is hopelessly incomplete). Here's what I've discovered by experimenting with diff(1) on Red Hat Linux; this identifies itself as 'diff (GNU diffutils) 2.8.1'. Hopefully this is useful for someone who needs to generate unified diffs or who needs to parse them. (I had both needs recently. :-)

The header lines look like this:

indicator ' ' filename '\t' date ' ' time ' ' timezone

where:

indicator is '---' for the old file and '+++' for the new

date has the form YYYY-MM-DD

time has the form hh:mm:ss.nnnnnnnnn on a 24-hour clock

timezone has the form ('+'|'-') hhmm where hhmm is hours and minutes east (if the sign is +) or west (if the sign is -) of GMT/UTC
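A minimal parsing sketch for this header layout, assuming the tab separator and space-separated date, time and zone exactly as listed above. The regex, the FileHeader type and parse_file_header are hypothetical names; accepting a missing or shorter fractional-seconds part is an extra leniency, not something observed in diff 2.8.1 output.

```python
import re
from typing import NamedTuple, Optional

# Layout from the description above (names here are hypothetical):
#   indicator ' ' filename '\t' date ' ' time ' ' timezone
# Accepting a missing or shorter fractional part is an assumption.
HEADER_RE = re.compile(
    r"^(?P<indicator>---|\+\+\+) "
    r"(?P<filename>[^\t]*)\t"
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}(?:\.\d+)?) "
    r"(?P<tz>[+-]\d{4})$"
)


class FileHeader(NamedTuple):
    is_old: bool             # True for '---', False for '+++'
    filename: str
    date: str                # YYYY-MM-DD
    time: str                # hh:mm:ss.nnnnnnnnn
    utc_offset_minutes: int  # positive = east of UTC, negative = west


def parse_file_header(line: str) -> Optional[FileHeader]:
    """Parse a GNU-style '---'/'+++' line, or return None if it doesn't match."""
    m = HEADER_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    tz = m.group("tz")
    offset = int(tz[1:3]) * 60 + int(tz[3:5])  # hhmm -> minutes
    if tz[0] == "-":
        offset = -offset
    return FileHeader(
        is_old=m.group("indicator") == "---",
        filename=m.group("filename"),
        date=m.group("date"),
        time=m.group("time"),
        utc_offset_minutes=offset,
    )
```

Returning None instead of raising leaves room for headers that don't follow this layout. Filenames that themselves contain a tab are not handled.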

Each chunk starts with a line that looks like this:

'@@ -' range ' +' range ' @@'

where range is either one unsigned decimal number or two separated by a comma. The first number is the start line of the chunk in the old or new file. The second number is the chunk size in that file; it and the comma are omitted if the chunk size is 1. (Email from a reader suggests that this omission is optional and may be phased out.) If the chunk size is 0, the first number is one lower than one would expect (it is the line number after which the chunk should be inserted or deleted; in all other cases it gives the first line number of the replaced range of lines).
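The range rules above can be sketched as follows, assuming the header starts exactly with '@@ -' as described. A missing ",size" is read as 1, and first_index() turns the size-0 convention into a plain 0-based position. CHUNK_RE, Range and parse_chunk_header are hypothetical names. Text after the closing '@@' is ignored, because GNU diff's -p option puts the enclosing function name there.

```python
import re
from typing import NamedTuple, Tuple

# '@@ -' range ' +' range ' @@'; range is "start" or "start,size".
# Anything after the closing '@@' is ignored.
CHUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class Range(NamedTuple):
    start: int  # the number exactly as written in the header
    size: int   # chunk size on this side

    def first_index(self) -> int:
        """0-based index in the file where this side of the chunk begins.

        With size > 0, start is the 1-based first line, so subtract 1.
        With size == 0, start is the line *after which* lines are inserted
        or deleted, which already equals the 0-based insertion point.
        """
        return self.start - 1 if self.size > 0 else self.start


def parse_chunk_header(line: str) -> Tuple[Range, Range]:
    m = CHUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a chunk header: {line!r}")
    old_start, old_size, new_start, new_size = m.groups()
    # A missing ",size" means the chunk size is 1.
    old = Range(int(old_start), 1 if old_size is None else int(old_size))
    new = Range(int(new_start), 1 if new_size is None else int(new_size))
    return old, new
```

For example, "@@ -5,0 +6,2 @@" gives an old range whose first_index() is 5 (insert after old line 5) and a new range whose first_index() is also 5 (new lines 6 and 7).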

A chunk then continues with lines starting with ' ' (common line), '-' (only in old file), or '+' (only in new file). If the last line of a file doesn't end in a newline character, it is displayed with a newline character, and the following line in the chunk has the literal text (starting in the first column):

'\ No newline at end of file'
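A sketch of reading one chunk body under these rules, assuming the caller has already parsed the header sizes and passes the file as a list of lines with their newlines kept (for example from str.splitlines(keepends=True)). read_chunk_body is a hypothetical helper. When it sees the backslash marker, it removes the display-only newline from whichever side or sides the previous line went to.

```python
from typing import List, Tuple

NO_NEWLINE_MARKER = "\\"  # '\ No newline at end of file' starts with a backslash


def read_chunk_body(
    lines: List[str], pos: int, old_size: int, new_size: int
) -> Tuple[List[str], List[str], int]:
    """Collect the old-side and new-side text of one chunk.

    lines[pos] is the first body line after the '@@' header; old_size and
    new_size come from that header. Returns (old_text, new_text, next_pos).
    """
    old: List[str] = []
    new: List[str] = []
    touched: List[List[str]] = []  # side(s) the previous body line went to
    while pos < len(lines):
        line = lines[pos]
        if line.startswith(NO_NEWLINE_MARKER):
            # The newline on the previous line was only added for display.
            for side in touched:
                if side[-1].endswith("\n"):
                    side[-1] = side[-1][:-1]
            pos += 1
            continue
        if len(old) >= old_size and len(new) >= new_size:
            break  # chunk complete; lines[pos] belongs to whatever follows
        tag, text = line[:1], line[1:]
        if tag == " ":
            old.append(text)
            new.append(text)
            touched = [old, new]
        elif tag == "-":
            old.append(text)
            touched = [old]
        elif tag == "+":
            new.append(text)
            touched = [new]
        else:
            raise ValueError(f"unexpected chunk line: {line!r}")
        pos += 1
    if len(old) != old_size or len(new) != new_size:
        raise ValueError("chunk is shorter than its header says")
    return old, new, pos
```

The marker is checked before the "chunk complete" test because it can follow the chunk's final line, after both sizes are already satisfied.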

Aaron Bentley

Posts: 1
Nickname: abentley
Registered: Jun, 2006

Re: Unified Diff Format

Posted: Jun 14, 2006 2:10 PM

I had the same need to parse unified diffs a while ago. Your description of the format looks accurate, though there are some other details about deleted and created files:

When a file is deleted (rather than just made empty), the +++ date is set to the epoch. Similarly, when a file is created, the --- date is set to the epoch.

This can be shown by doing a recursive directory diff with the -N option on:

TZ='GMT' diff old new -r -N -u
diff -r -N -u old/goodbye new/goodbye
--- old/goodbye 2006-06-14 20:52:08.000000000 +0000
+++ new/goodbye 1970-01-01 00:00:00.000000000 +0000
@@ -1 +0,0 @@
-Goodbye
diff -r -N -u old/hello new/hello
--- old/hello 1970-01-01 00:00:00.000000000 +0000
+++ new/hello 2006-06-14 20:52:16.000000000 +0000
@@ -0,0 +1 @@
+Hello
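The epoch check can be sketched as below. This run set TZ='GMT', so the epoch printed as 1970-01-01 00:00:00 +0000. Under another TZ the same instant would presumably be printed in local time with its offset, so this sketch converts to UTC before comparing. is_epoch_header and file_change_kind are hypothetical names, and a header is assumed to follow the tab/space layout from the original post.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def is_epoch_header(line: str) -> bool:
    """True if a GNU-style '---'/'+++' line carries the Unix epoch as its date."""
    parts = line.rstrip("\n").split("\t", 1)
    if len(parts) != 2:
        return False
    fields = parts[1].split(" ")
    if len(fields) != 3:
        return False  # not "date time zone", e.g. an SVN-style header
    date, clock, tz = fields
    whole, _, frac = clock.partition(".")
    if frac.strip("0"):
        return False  # non-zero fractional seconds can't be the epoch
    try:
        stamp = datetime.strptime(f"{date} {whole} {tz}", "%Y-%m-%d %H:%M:%S %z")
    except ValueError:
        return False  # malformed date, time or zone
    return stamp == EPOCH  # aware datetimes compare in UTC


def file_change_kind(old_header: str, new_header: str) -> str:
    if is_epoch_header(old_header):
        return "created"
    if is_epoch_header(new_header):
        return "deleted"
    return "modified"
```

A file whose modification time really is the epoch would be misreported. Headers without a timestamp, like the SVN variant mentioned later in this thread, always come back as "modified".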

I found the option of omitting the comma in the range description wasn't very useful, and the patchutils maintainer agrees, so that will probably be phased out.
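A generator can make the ",1" omission a switch so it is easy to drop later. (Python's difflib.unified_diff omits it.) This sketch assumes the caller describes each side as a 0-based, half-open slice of the file's lines. It applies both the size-1 rule and the size-0 "one lower" rule from the original post. format_range is a hypothetical helper.

```python
def format_range(start: int, stop: int, omit_single: bool = True) -> str:
    """Format one side of a chunk header from a 0-based half-open slice.

    start/stop index the file's lines the way Python slices do, so
    lines[start:stop] are the chunk's lines on this side.
    """
    size = stop - start
    first = start + 1      # 1-based number of the first line
    if size == 0:
        first -= 1         # empty range: the line after which to insert/delete
    if size == 1 and omit_single:
        return str(first)
    return f"{first},{size}"


# Rebuilds the chunk header from the deleted-file example above.
header = f"@@ -{format_range(0, 1)} +{format_range(0, 0)} @@"
```

Passing omit_single=False always writes the comma form, for example "1,1" instead of "1".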

You can find my code for parsing and applying unified diffs here:
http://panoramicfeedback.com/opensource/bzrtools/patches.py

Paul Boddie

Posts: 26
Nickname: pboddie
Registered: Jan, 2006

Re: Unified Diff Format

Posted: Jun 14, 2006 3:18 PM

I guess the description you found was the one located here:

http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Unified%20Format

Perhaps not complete, but it makes for a reasonable starting point for investigation, I suppose.

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Re: Unified Diff Format

Posted: Aug 11, 2006 9:07 AM

A colleague just thanked me for writing this, and added:

"Just fyi, there seem to be some small variations in the header ---/+++ lines. SVN replaces the timestamps with repository paths and revision info, which invalidates Aaron's removed/created logic."

Dan Lenski

Posts: 1
Nickname: moxfyre
Registered: Jun, 2007

Re: Unified Diff Format

Posted: Jun 19, 2007 12:32 PM

Hi Guido, I know this is an old post, but I just found it and here's a (hopefully) useful addition:

The other way to tell if a file has been created/deleted is to look at the range of lines affected in the first chunk.
-0,0 means that it's a created file.
+0,0 means that it's a deleted file.

0,0 will never appear in the ranges except in these cases.
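A sketch of this check on the first chunk header. created_or_deleted is a hypothetical name. Applied to Aaron's examples, "@@ -1 +0,0 @@" is the deleted goodbye and "@@ -0,0 +1 @@" is the created hello. One caveat: an existing empty file that gains lines, or a file made empty, produces the same ranges. That is the "rather than just made empty" distinction Aaron draws, so the result is best treated as a hint.

```python
import re
from typing import Optional

FIRST_CHUNK_RE = re.compile(r"^@@ -(\d+(?:,\d+)?) \+(\d+(?:,\d+)?) @@")


def created_or_deleted(first_chunk_header: str) -> Optional[str]:
    """Classify a file from its first chunk header; None if neither rule applies."""
    m = FIRST_CHUNK_RE.match(first_chunk_header)
    if m is None:
        raise ValueError(f"not a chunk header: {first_chunk_header!r}")
    old_range, new_range = m.groups()
    if old_range == "0,0":
        return "created"   # nothing on the old side
    if new_range == "0,0":
        return "deleted"   # nothing on the new side
    return None
```

Unlike the epoch timestamp, this check does not depend on the header format, so it still works on diffs whose ---/+++ lines carry no date.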

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Re: Unified Diff Format

Posted: Apr 16, 2008 12:16 PM

In addition, svn sets the revision to 0 for an added file.
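A sketch of using this on SVN-style headers. It assumes svn writes the revision after the tab as "(revision N)". That exact wording, and whatever appears on the '+++' side, are assumptions not stated in this thread. svn_revision and svn_file_added are hypothetical names.

```python
import re
from typing import Optional

# Assumed SVN-style header: indicator ' ' path '\t' '(revision N)'
SVN_HEADER_RE = re.compile(r"^(---|\+\+\+) (?P<path>[^\t]*)\t\(revision (?P<rev>\d+)\)")


def svn_revision(line: str) -> Optional[int]:
    """Return the revision number from an SVN-style header, or None."""
    m = SVN_HEADER_RE.match(line)
    return int(m.group("rev")) if m else None


def svn_file_added(old_header: str) -> bool:
    """An added file shows revision 0 on its '---' line."""
    return svn_revision(old_header) == 0
```

When neither a timestamp nor a revision is present, the first chunk's 0,0 range is still available as a fallback.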
