Unified Diff Format

5 replies on 1 page. Most recent reply: Apr 16, 2008 12:16 PM by Guido van Rossum

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Unified Diff Format
Posted: Jun 14, 2006 7:16 AM
Summary
I couldn't find a thorough spec for the format called "unified diff" so I decided to research it. Here are my findings.

I haven't found a satisfactory specification of the unified diff format (the one on the GNU website is hopelessly incomplete). Here's what I've discovered by experimenting with diff(1) on Red Hat Linux; this identifies itself as 'diff (GNU diffutils) 2.8.1'. Hopefully this is useful for someone who needs to generate unified diffs or who needs to parse them. (I had both needs recently. :-)

The header lines look like this:

indicator ' ' filename '\t' date ' ' time ' ' timezone

where:

  • indicator is '---' for the old file and '+++' for the new
  • date has the form YYYY-MM-DD
  • time has the form hh:mm:ss.nnnnnnnnn on a 24-hour clock
  • timezone has the form ('+'|'-') hhmm where hhmm is hours and minutes east (if the sign is +) or west (if the sign is -) of GMT/UTC
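
For anyone parsing these header lines, here is a minimal Python sketch of the layout described above. The names HEADER_RE and parse_header are made up for illustration, and the pattern assumes exactly the GNU diff layout listed here (a tab before the date, nine fractional digits). Since datetime only holds microseconds, the nanosecond field is returned separately.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for the header layout described above:
# indicator, space, filename, tab, date, space, time, space, timezone.
HEADER_RE = re.compile(
    r"^(?P<indicator>---|\+\+\+) "
    r"(?P<filename>[^\t]+)\t"
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})\.(?P<nanos>\d{9}) "
    r"(?P<sign>[+-])(?P<tzh>\d{2})(?P<tzm>\d{2})$"
)


def parse_header(line):
    """Return (indicator, filename, aware datetime, nanoseconds), or None."""
    m = HEADER_RE.match(line.rstrip("\n"))
    if m is None:
        return None  # not a header in this exact layout
    # '+' means east of GMT/UTC, '-' means west.
    offset = timedelta(hours=int(m["tzh"]), minutes=int(m["tzm"]))
    if m["sign"] == "-":
        offset = -offset
    stamp = datetime.strptime(m["date"] + " " + m["time"], "%Y-%m-%d %H:%M:%S")
    stamp = stamp.replace(tzinfo=timezone(offset))
    return m["indicator"], m["filename"], stamp, int(m["nanos"])
```

Lines that do not follow this layout simply yield None, so a caller can fall back to some other handling.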

Each chunk starts with a line that looks like this:

'@@ -' range ' +' range ' @@'

where range is either one unsigned decimal number or two separated by a comma. The first number is the start line of the chunk in the old or new file. The second number is the chunk size in that file; it and the comma are omitted if the chunk size is 1. (Email from a reader suggests that this omission is optional and may be phased out.) If the chunk size is 0, the first number is one lower than one would expect: it is the line number after which the chunk should be inserted or deleted. In all other cases it gives the first line number of the replaced range of lines.
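
The range rules above can be sketched in Python as follows. HUNK_RE, parse_hunk_header and covered_lines are hypothetical names, not part of any existing tool. The pattern is deliberately not anchored at the end of the line, so any text after the closing '@@' is ignored.

```python
import re

# Hypothetical pattern: '@@ -' range ' +' range ' @@', where a range is
# one number, or two numbers separated by a comma.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_size, new_start, new_size) as integers."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError("not a chunk header: %r" % line)
    old_start, old_size, new_start, new_size = m.groups()
    # A missing size (and comma) means the chunk size is 1.
    return (
        int(old_start), 1 if old_size is None else int(old_size),
        int(new_start), 1 if new_size is None else int(new_size),
    )


def covered_lines(start, size):
    """Return the 1-based line numbers that a range covers in its file."""
    if size == 0:
        # start is the line *after which* the change applies,
        # so the range itself covers no lines.
        return range(0)
    return range(start, start + size)
```

For example, parse_hunk_header("@@ -1 +0,0 @@") gives (1, 1, 0, 0): one line on the old side, and an empty range on the new side.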

A chunk then continues with lines starting with ' ' (common line), '-' (only in old file), or '+' (only in new file). If the last line of a file doesn't end in a newline character, it is displayed with a newline character, and the following line in the chunk has the literal text (starting in the first column):

'\ No newline at end of file'
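
Here is a rough Python sketch of how a parser might handle the chunk body, including the marker above. The names NO_NEWLINE_MARKER and split_hunk are invented for this example, and it assumes every body line starts with one of the three prefix characters; anything else raises ValueError.

```python
NO_NEWLINE_MARKER = "\\ No newline at end of file"


def split_hunk(body):
    """Rebuild the old-side and new-side lines from the body of one chunk."""
    old, new = [], []
    touched = []  # the lists that received the previous body line
    for raw in body:
        text = raw.rstrip("\n")
        if text == NO_NEWLINE_MARKER:
            # The previous line was shown with a newline it does not
            # really have; drop that newline again.
            for side in touched:
                side[-1] = side[-1][:-1]
            continue
        tag, content = text[:1], text[1:] + "\n"
        if tag == " ":
            touched = [old, new]  # common line
        elif tag == "-":
            touched = [old]       # only in the old file
        elif tag == "+":
            touched = [new]       # only in the new file
        else:
            raise ValueError("unexpected chunk line: %r" % raw)
        for side in touched:
            side.append(content)
    return old, new
```

Tracking which side received the previous line matters because the marker applies only to the line directly before it, which may belong to the old file, the new file, or both.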


Aaron Bentley

Posts: 1
Nickname: abentley
Registered: Jun, 2006

Re: Unified Diff Format Posted: Jun 14, 2006 2:10 PM
I had the same need to parse unified diffs a while ago. Your description of the format looks accurate, though there are some other details about deleted and created files:

When a file is deleted (rather than just made empty), the +++ date is set to the epoch. Similarly, when a file is created, the --- date is set to the epoch.

This can be shown by doing a recursive directory diff with the -N option on:

TZ='GMT' diff old new -r -N -u
diff -r -N -u old/goodbye new/goodbye
--- old/goodbye 2006-06-14 20:52:08.000000000 +0000
+++ new/goodbye 1970-01-01 00:00:00.000000000 +0000
@@ -1 +0,0 @@
-Goodbye
diff -r -N -u old/hello new/hello
--- old/hello 1970-01-01 00:00:00.000000000 +0000
+++ new/hello 2006-06-14 20:52:16.000000000 +0000
@@ -0,0 +1 @@
+Hello
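
A small Python sketch of this check, for illustration only: header_timestamp and classify are hypothetical names, and the code assumes the GNU diff header layout with a tab before the date. It compares instants rather than strings, on the assumption that diff may print the epoch in local time when TZ is not GMT. Headers from tools that replace the timestamp with something else will raise ValueError here.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def header_timestamp(line):
    """Parse the timestamp that follows the tab in a '---'/'+++' line."""
    _, _, stamp = line.rstrip("\n").partition("\t")
    date, time, zone = stamp.split(" ")
    # Whole seconds only; the fractional part is ignored in this sketch.
    seconds = time.split(".")[0]
    return datetime.strptime(
        date + " " + seconds + " " + zone, "%Y-%m-%d %H:%M:%S %z"
    )


def classify(old_header, new_header):
    """Return 'created', 'deleted' or 'modified' from the two header lines."""
    if header_timestamp(old_header) == EPOCH:
        return "created"  # the --- date is the epoch
    if header_timestamp(new_header) == EPOCH:
        return "deleted"  # the +++ date is the epoch
    return "modified"
```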

I found the option of omitting the comma in the range description wasn't very useful, and the patchutils maintainer agrees, so that will probably be phased out.

You can find my code for parsing and applying unified diffs here:
http://panoramicfeedback.com/opensource/bzrtools/patches.py

Paul Boddie

Posts: 26
Nickname: pboddie
Registered: Jan, 2006

Re: Unified Diff Format Posted: Jun 14, 2006 3:18 PM
I guess the description you found was the one located here:

http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Unified%20Format

Perhaps not complete, but it makes for a reasonable starting point for investigation, I suppose.

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Re: Unified Diff Format Posted: Aug 11, 2006 9:07 AM
A colleague just thanked me for writing this, and added:

"Just fyi, there seems to be some small variations in the header ---/+++ lines. SVN replaces the timestamps with repository paths and revision info which invalidates Aaron's removed/created logic."

Dan Lenski

Posts: 1
Nickname: moxfyre
Registered: Jun, 2007

Re: Unified Diff Format Posted: Jun 19, 2007 12:32 PM
Hi Guido, I know this is an old post, but I just found it and here's a (hopefully) useful addition:

The other way to tell if a file has been created/deleted is to look at the range of lines affected in the first chunk.
-0,0 means that it's a created file.
+0,0 means that it's a deleted file.

0,0 will never appear in the ranges except in these cases.
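
In Python, that check on the first chunk header might look like the sketch below. RANGES_RE and classify_by_first_hunk are names invented for this example. One caveat, as far as I can tell: a file that was merely emptied also produces a +0,0 range, so this test alone cannot separate "deleted" from "made empty".

```python
import re

# Hypothetical pattern for '@@ -' range ' +' range ' @@'.
RANGES_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def classify_by_first_hunk(hunk_header):
    """Return 'created', 'deleted' or 'modified' from the first chunk header."""
    m = RANGES_RE.match(hunk_header)
    if m is None:
        raise ValueError("not a chunk header: %r" % hunk_header)
    # A size that was omitted stays None; it can never be part of a 0,0 range.
    old_start, old_size, new_start, new_size = (
        None if g is None else int(g) for g in m.groups()
    )
    if (old_start, old_size) == (0, 0):
        return "created"  # -0,0: nothing on the old side
    if (new_start, new_size) == (0, 0):
        return "deleted"  # +0,0: nothing on the new side
    return "modified"
```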

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Re: Unified Diff Format Posted: Apr 16, 2008 12:16 PM
In addition, svn sets the revision to 0 for an added file.
