The Artima Developer Community
Weblogs Forum
F'ing Microsoft

13 replies on 1 page. Most recent reply: Jun 8, 2008 2:44 PM by Jeff Ratcliff

Rick Kitts

Posts: 48
Nickname: rkitts
Registered: Jan, 2003

F'ing Microsoft
Posted: May 14, 2008 9:28 AM
Summary
A pointless rant
In my upcoming new job I have to write C# code on Windows. This was a choice of mine and the reasons for the choice are outside of the context of public disclosure. I am happy to say I haven't used a Microsoft product for much of anything for about 10 years or so. I write code for a living and my position remains that developing the sorts of plumbing code I write on Windows is like going to a gunfight with a knife. It's just the wrong tool.

Anyway, when I get some time at night I've been futzing about with a little program to add an XML element to Visual Studio project files. I have to use Visual Studio, it's doing something I don't like, and to tell it not to do that something I'd have to modify 57 project files using a GUI. Which is stupid in many dimensions. I'm OK with stupid at some level. Wildcards with Java generics, for example. But I digress.

These project files have an extension of .csproj. If you happen to be on Windows and have one of these lying around open it up in some random editor. I used both notepad and textpad. Looks like normal XML right? As you'd expect. I mean it's either XML or some obviously non-XML thing. Right? How else would you do it? One or the other. Either, or.

Now do something like this (warning: C# code ahead):

XmlDocument document = new XmlDocument();
document.LoadXml("SomeProject.csproj");

Watch in surprise as LoadXml() throws an exception telling you that the content is illegal at line 1, position 1. Look at the file in the editors. See XML? Yup. Go back and futz with the program. Try e.g.

document.Load(someStreamYouCreated);

Same exception. Be confused and frustrated.

In a fit of frustrated inspiration at 5:30AM fire up the machine, open CMD.EXE (truly an abortion of a CLI) and do this:

more SomeProject.csproj

Marvel at the 3 squiggly little characters at the beginning of the file. Indeed at line 1, position 1. WTF? In your program read past those 3 bytes and then do

document.Load(someStreamAfterReadingThreeBytes);

and see your program start to function.

Of the many reasons I truly detest Microsoft this is probably one of the largest. What absolute arrogance, in my mind, to jam 3 bytes at the front of this XML-like file. Because it's not XML, is it? So why make it look like it is? Is this an example of the vaunted Microsoft "innovation"? There may be some better way of loading these files. Some Microsoft-approved way like VisualStudioProjectFileParser or whatever. I didn't find one when looking casually and, honestly, I don't think I should have to.

Hey, fellas! There's a whole world out here. You're not special or cool anymore. At all. Believe it.

C# is a rocking language though. I think I might prefer it to Java.
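For readers wondering what the "little program" described above might look like, here is a minimal sketch. The post never says which element was being added, so the element name and value are parameters, and wrapping the new element in a PropertyGroup is an assumption about the project file layout, not something stated in the post. ProjectPatcher, AddElement and PatchAll are hypothetical names.

```csharp
using System.IO;
using System.Xml;

static class ProjectPatcher
{
    // Adds <elementName>value</elementName> inside a new PropertyGroup
    // at the end of one project file. The PropertyGroup wrapper is an
    // assumption; adjust it to wherever the element really belongs.
    public static void AddElement(string path, string elementName, string value)
    {
        var document = new XmlDocument();
        document.Load(path); // Load takes a file name, unlike LoadXml

        XmlElement root = document.DocumentElement;

        // Reuse the root's namespace so the new nodes match the rest of the file.
        XmlElement group = document.CreateElement("PropertyGroup", root.NamespaceURI);
        XmlElement element = document.CreateElement(elementName, root.NamespaceURI);
        element.InnerText = value;

        group.AppendChild(element);
        root.AppendChild(group);
        document.Save(path);
    }

    // Applies the same change to every .csproj file under a directory.
    public static void PatchAll(string directory, string elementName, string value)
    {
        foreach (string path in Directory.GetFiles(directory, "*.csproj", SearchOption.AllDirectories))
        {
            AddElement(path, elementName, value);
        }
    }
}
```

Something like ProjectPatcher.PatchAll(solutionDirectory, "SomeSetting", "false") would then cover all 57 files in one run; both the setting name and the value here are placeholders.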


Alpha Chen

Posts: 1
Nickname: kejadlen
Registered: May, 2008

Re: F'ing Microsoft Posted: May 14, 2008 10:27 AM
Er... that's just the byte order mark, which isn't uncommon for specifying UTF-16 XML files. As for why it doesn't load via XmlDocument, I have no clue.

Don McCaughey

Posts: 7
Nickname: donmcc
Registered: Feb, 2006

Re: F'ing Microsoft Posted: May 14, 2008 11:00 AM
Those three bytes are the UTF-8 byte order mark. See:

http://unicode.org/faq/utf_bom.html#BOM

for more info.

Sounds like you're opening the file as binary instead of using the appropriate text input stream and encoding.
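A rough sketch of what "the appropriate text input stream and encoding" could mean in C#: open the file through a StreamReader with byte order mark detection switched on and hand that reader to XmlDocument.Load. ProjectLoader and LoadAsText are hypothetical names, and UTF-8 as the fallback encoding is an assumption.

```csharp
using System.IO;
using System.Text;
using System.Xml;

static class ProjectLoader
{
    // Reads the file as text rather than raw bytes. With the third
    // argument set to true, StreamReader looks for a byte order mark
    // and consumes it, so the parser only sees the characters after it.
    public static XmlDocument LoadAsText(string path)
    {
        var document = new XmlDocument();
        using (var reader = new StreamReader(path, Encoding.UTF8, true))
        {
            document.Load(reader);
        }
        return document;
    }
}
```

With this approach there is no need to skip three bytes by hand before parsing.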

Mike Ivanov

Posts: 23
Nickname: mikeivanov
Registered: Jul, 2007

Re: F'ing Microsoft Posted: May 14, 2008 12:14 PM
The truth, though, is that nobody except Microsoft seems to be using BOM marks.

Darrell Wright

Posts: 2
Nickname: beached
Registered: May, 2008

Re: F'ing Microsoft Posted: May 14, 2008 1:12 PM
It is probably the Unicode BOM stuff that identifies the file as UTF-8. The first 3 bytes were probably EF BB BF. I haven't used the XML load stuff yet but I wonder if there is a method or parameter for Unicode files vs ANSI/ASCII.
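One way to check that guess instead of squinting at the squiggles in CMD.EXE is to dump the first few bytes of the file as hex. This is only a sketch; HexPeek and FirstBytesAsHex are hypothetical names.

```csharp
using System;
using System.IO;

static class HexPeek
{
    // Returns up to `count` leading bytes of the file as hex pairs
    // separated by hyphens, e.g. "EF-BB-BF" for a UTF-8 byte order mark.
    public static string FirstBytesAsHex(string path, int count)
    {
        byte[] head = new byte[count];
        int total = 0;
        using (var stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the file ends.
            while (total < head.Length)
            {
                int n = stream.Read(head, total, head.Length - total);
                if (n == 0) break;
                total += n;
            }
        }
        return total == 0 ? string.Empty : BitConverter.ToString(head, 0, total);
    }
}
```

Calling HexPeek.FirstBytesAsHex("SomeProject.csproj", 3) on a file that starts with the UTF-8 byte order mark gives "EF-BB-BF".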

Rick Kitts

Posts: 48
Nickname: rkitts
Registered: Jan, 2003

Re: F'ing Microsoft Posted: May 14, 2008 2:50 PM
Well I've learned something. I still think it's stupid. But I'll look for the UTF stuff.

Text and binary files. How fun is that? I totally forgot about them.

Sigh.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: F'ing Microsoft Posted: May 17, 2008 5:53 PM
> Er... that's just the byte order mark, which isn't
> uncommon for specifying UTF-16 XML files. As for why it
> doesn't load via XmlDocument, I have no clue.

Doesn't XML already have a place to specify the encoding of the file?

http://en.wikipedia.org/wiki/XML#International_use

Rinie Kervel

Posts: 26
Nickname: rinie
Registered: Oct, 2005

Re: F'ing Microsoft Posted: May 19, 2008 1:58 AM
> Doesn't XML already have a place to specify the encoding
> of the file?
>
> http://en.wikipedia.org/wiki/XML#International_use
Nope, you have to be able to read that first 'encoding' line.
If that line is UTF-16 you need a byte order mark for correct processing.
So for multi-byte character sets you need some Byte Order Mark.
You only need a BOM in UTF-8 to be able to distinguish this stream from UTF-16.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: F'ing Microsoft Posted: May 19, 2008 5:43 PM
> > Doesn't XML already have a place to specify the
> encoding
> > of the file?
> >
> > http://en.wikipedia.org/wiki/XML#International_use
> Nope, you have to be able to read that first 'encoding'
> line.
> If that line is UTF-16 you need a byte order mark for
> correct processing.

I've never needed it. I find your argument hard to believe for that reason.

Nemanja Trifunovic

Posts: 172
Nickname: ntrif
Registered: Jun, 2004

Re: F'ing Microsoft Posted: May 21, 2008 8:16 PM
Heh, don't you love it when you can swear at a big company when your code is not working? :)

Anyway, the answers are in the XML standard: http://www.w3.org/TR/2006/REC-xml11-20060816/#charencoding

"Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin with the Byte Order Mark described in ISO/IEC 10646 [ISO/IEC 10646] or Unicode [Unicode] (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors MUST be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents."
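To make the quoted rule concrete: the same character, #xFEFF, comes out as different bytes in each encoding, which is what lets a processor tell them apart. The sketch below only covers the UTF-8 and UTF-16 signatures mentioned in the quote and ignores UTF-32 and everything else. BomSniffer and Describe are hypothetical names.

```csharp
static class BomSniffer
{
    // Names the encoding signalled by the leading bytes, if any.
    public static string Describe(byte[] head)
    {
        if (StartsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(head, 0xFE, 0xFF)) return "UTF-16 (big-endian)";
        if (StartsWith(head, 0xFF, 0xFE)) return "UTF-16 (little-endian)";
        return "no BOM";
    }

    // True when `data` begins with exactly the bytes in `prefix`.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A file with no signature at all falls through to "no BOM", which is the case where the encoding declaration inside the XML has to do the work.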

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: F'ing Microsoft Posted: May 23, 2008 7:26 AM
> "Entities encoded in UTF-16 MUST and entities encoded in
> UTF-8 MAY begin with the Byte Order Mark described in
> ISO/IEC 10646 [ISO/IEC 10646] or Unicode [Unicode] (the
> ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an
> encoding signature, not part of either the markup or the
> character data of the XML document. XML processors MUST be
> able to use this character to differentiate between UTF-8
> and UTF-16 encoded documents."

That would mean the .NET parsing API being used has a bug, right?

Gregor Zeitlinger

Posts: 108
Nickname: gregor
Registered: Aug, 2005

Re: F'ing Microsoft Posted: May 25, 2008 1:36 AM
> That would mean the .NET parsing API being used has a bug,
> right?
Correct. The Crimson parser (in Java) behaves the same way, though.

Besides, there are good reasons to use the BOM, even if you're using UTF-8 only: some editors (IntelliJ IDEA, for example) won't understand the <?xml .. UTF8> and will screw up your file unless the BOM is present.

Then again, some CVS interfaces will ignore the BOM and not show it in the history. (maybe it's in the CVS server, too...)

Jeff Ratcliff

Posts: 10
Nickname: inhgtpoly1
Registered: Feb, 2006

Re: F'ing Microsoft Posted: Jun 8, 2008 2:23 PM
> That would mean the .NET parsing API being used has a bug,
> right?

No. It means the programmer didn't read the documentation and just assumed he knew what the method was designed to do. The ReadXML function reads from a string and converts it to XML:

public virtual void LoadXml(
string xml
)


Since the first "character" in the file isn't a valid string character, an exception is thrown (as it should be).

(This part isn't addressed to James specifically)
I think a general principle that all of us as developers should keep in mind is that there is a danger in using tools or languages that we are prejudiced against. We can waste a lot of time blaming our own bugs on others.
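The distinction can be shown side by side. This is a small sketch with hypothetical wrapper names (LoadVersusLoadXml, FromFile, FromMarkup); the two XmlDocument methods it calls are the real ones.

```csharp
using System.Xml;

static class LoadVersusLoadXml
{
    // Load treats its string argument as a file name (or URL) to open.
    public static XmlDocument FromFile(string path)
    {
        var document = new XmlDocument();
        document.Load(path);
        return document;
    }

    // LoadXml treats its string argument as the XML markup itself.
    public static XmlDocument FromMarkup(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);
        return document;
    }
}
```

FromMarkup("<Project />") parses fine, while FromMarkup("SomeProject.csproj") throws an XmlException, because a file name is not markup.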

Jeff Ratcliff

Posts: 10
Nickname: inhgtpoly1
Registered: Feb, 2006

Re: F'ing Microsoft Posted: Jun 8, 2008 2:44 PM
> > That would mean the .NET parsing API being used has a
> bug,
> > right?
>
> No. It means the programmer didn't read the documentation
> and just assumed he knew what the method was designed to
> do. The ReadXML function reads from a string and converts
> it to XML:
>
> public virtual void LoadXml(
> string xml
> )
>
>
> Since the first "character" in the file isn't a valid
> string character, an exception is thrown (as it should
> be).
>
> (This part isn't addressed to James specifically)
> I think a general principle that all of us as developers
> should keep in mind is that there is a danger in using
> tools or languages that we are prejudiced against. We can
> waste a lot of time blaming our own bugs on others.

OOPS, I wrote "ReadXML" instead of "LoadXML". That damned Microsoft!


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use