The Artima Developer Community

Python Buzz Forum
Connection negotiation

Andrew Dalke


Andrew Dalke is a consultant and software developer in computational chemistry and biology.
Connection negotiation Posted: Jul 26, 2007 8:46 AM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: Connection negotiation
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.

Catching up on the LSID wars I saw a post by Eric Jain at the Swiss Institute of Bioinformatics. He wrote:

http://purl.uniprot.org/uniprot/P12345 is linked to a machine-readable representation [http://beta.uniprot.org/uniprot/P12345.rdf] via two mechanisms: 1. there is a link-rel=alternate in the header of the web page, and 2. you can set an Accept header if you want to skip directly to that.
In other words, the server implements content-negotiation (also called "conneg" for short).
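From the client side, the second mechanism is just a matter of setting one request header. A minimal sketch using the standard library's urllib.request; the helper name build_request is made up for illustration, and the request is only constructed here, not sent:

```python
import urllib.request


def build_request(url, media_type):
    """Build (but do not send) a GET request asking for one media type."""
    return urllib.request.Request(url, headers={"Accept": media_type})


req = build_request("http://purl.uniprot.org/uniprot/P12345",
                    "application/rdf+xml")

# Passing req to urllib.request.urlopen() would send it and follow any
# redirects; that call is left out so the sketch runs without a network.
print(req.full_url, req.get_header("Accept"))
```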

This is something I advocated in DAS/2. The summary of my experience with conneg is in the DAS/2 mail archive.

I don't closely track what's going on in bioinformatics these days so Eric's comment is the first real public example I've seen of conneg in use in the life sciences. Affymetrix uses it for some internal work, and while the DAS/2 spec talks about it I don't know of anyone publicly supporting it.

I'm curious to know about people's experiences with conneg. It hasn't had wide uptake in the general software world, so I haven't learned much about the practicalities of using it in real systems. But I don't have a comment system so, hmm, well, email me about it or send me a link to a page describing your experiences.

Getting a specific representation

One of the longer term problems in conneg is that it only dispatches on the requested format, and not the meaning. If you request an "image/png" for a chemical compound do you get the 2D or 3D depiction of that compound? But those are theoretical problems that are best solved only after running into problems in practice, I think.

A more immediate problem I had with conneg was trying to link to a specific format for a resource. For example, if you want to link to specifically the RDF version in HTML you have to do something like

Take a look at the <a href="wherever" type="application/rdf+xml">RDF</a>
and in email you want to say
Why are the oxygens colored yellow in the PNG version at http://example.com/image ?
(I haven't tried that to see if the 'type' attribute actually works like this in modern browsers. I just don't have real world experience in using conneg.)

Eric solved that by using several redirections, with a different URL for each final representation. That is:

  • The first request to 'http://purl.uniprot.org/uniprot/P12345' does a 303 "See Other" redirect to ...
  • 'http://beta.uniprot.org/?query=purl:uniprot/P12345', which understands the "Accept" header and does a 302 "Found" redirect to either of:
    • 'http://beta.uniprot.org/uniprot/P12345' for html, or
    • 'http://beta.uniprot.org/uniprot/P12345.rdf' for RDF
I'm beginning to think that multiple final URLs (with an intermediate redirect) is the right solution for this. Though I would like some way for user agents to get from each final representation to the other, or to the "main" URL. For example, I can't "Accept: application/rdf+xml" on the HTML page and be redirected to the other. Shouldn't there be an HTTP header for that?
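The redirect chain described above can be modelled in a few lines. This is only a sketch of the observable behavior, not UniProt's code: the function names are invented, and the Accept handling is reduced to "is RDF named as a media type or not".

```python
PURL = "http://purl.uniprot.org/uniprot/P12345"
QUERY = "http://beta.uniprot.org/?query=purl:uniprot/P12345"
HTML_URL = "http://beta.uniprot.org/uniprot/P12345"
RDF_URL = "http://beta.uniprot.org/uniprot/P12345.rdf"


def accepts_rdf(accept):
    """True if application/rdf+xml is named as a whole media type."""
    types = [item.split(";")[0].strip().lower() for item in accept.split(",")]
    return "application/rdf+xml" in types


def next_hop(url, accept):
    """Return (status, location); location is None for a final URL."""
    if url == PURL:
        return 303, QUERY  # "See Other"
    if url == QUERY:
        # "Found": this is the only step that looks at the Accept header.
        return 302, (RDF_URL if accepts_rdf(accept) else HTML_URL)
    return 200, None  # one of the final representations


def follow(url, accept):
    """Follow the redirects and return the list of URLs visited."""
    visited = [url]
    status, location = next_hop(url, accept)
    while location is not None:
        visited.append(location)
        status, location = next_hop(location, accept)
    return visited


print(follow(PURL, "application/rdf+xml"))
```

Because each final representation has its own URL, either one can be linked to directly, in HTML or in email, without any Accept header at all.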

The conneg spec has a section describing "alternatives", which looks like

     Alternates: {"paper.html.en" 0.9 {type text/html} {language en}},
                 {"paper.html.fr" 0.7 {type text/html} {language fr}},
                 {"paper.ps.en"   1.0 {type application/postscript}
                     {language en}}
It's meant so the server can inform the client (the "user agent") that alternate forms are available, and let the client decide which is the best form. It might be nice for the UniProt server to support this as well, but that's definitely something to hold off doing until there are clients that might actually use that data.
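If a server did want to emit that header, building the value is mostly string formatting. A small sketch that reproduces the layout of the example above; the function names are hypothetical, and it makes no attempt to cover the rest of the spec's grammar:

```python
def format_variant(uri, quality, **attributes):
    """Format one variant, e.g. {"paper.html.en" 0.9 {type text/html}}."""
    attrs = "".join(" {%s %s}" % (name, value)
                    for name, value in attributes.items())
    return '{"%s" %s%s}' % (uri, quality, attrs)


def alternates_value(variants):
    """Join (uri, quality, attributes) triples into one header value."""
    return ", ".join(format_variant(uri, quality, **attrs)
                     for uri, quality, attrs in variants)


variants = [
    ("paper.html.en", 0.9, {"type": "text/html", "language": "en"}),
    ("paper.html.fr", 0.7, {"type": "text/html", "language": "fr"}),
    ("paper.ps.en", 1.0, {"type": "application/postscript", "language": "en"}),
]
print("Alternates: " + alternates_value(variants))
```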

This is where the "chicken and egg" meets YAGNI. And I'm on the YAGNI side of the balance. Except abstractly.

Quality

For me, the hardest thing in conneg was supporting the quality factor in the request. A quality of "q=1.0" is best and "q=0.0" means "do not want." If two content types are requested then the one with the highest quality (after applying the scoring algorithm) wins. As long as the best is not 0.
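To make the rule concrete, here is a minimal sketch of quality-based selection. It is a simplification, not the full scoring algorithm: it only matches exact media types (no wildcards such as */*, no server-side quality values), and the function names are made up for illustration.

```python
def parse_accept(header):
    """Return a list of (media_type, q) pairs from an Accept header value."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # the quality defaults to 1.0 when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "do not want"
        prefs.append((media_type, q))
    return prefs


def choose_format(header, available):
    """Pick the available media type with the highest q, or None.

    A q of 0 never wins, and the order of the header does not matter
    except to break ties (the first of two equal scores is kept).
    """
    best_type, best_q = None, 0.0
    for media_type, q in parse_accept(header):
        if media_type in available and q > best_q:
            best_type, best_q = media_type, q
    return best_type


AVAILABLE = ("text/html", "application/rdf+xml")
print(choose_format("text/html;q=0.1, application/rdf+xml;q=0.2", AVAILABLE))
```

With this sketch "application/rdf+xml; q=0" selects nothing, and "text/html;q=0.1, application/rdf+xml;q=0.2" selects RDF even though HTML is listed first.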

It looks like UniProt ignores the "q" field. I think that's reasonable for now. It's annoying to get right and currently no one uses it.

% curl -H "Accept: application/rdf+xml; q=0" -D - 'http://beta.uniprot.org/?query=purl:uniprot/P12345'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 26 Jul 2007 13:51:18 GMT
Server: Apache-Coyote/1.1
Location: http://beta.uniprot.org/uniprot/P12345.rdf
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 0
The "q=0" means "do not send me RDF" and you can see I'm being pointed to the .rdf file.

Here's a request where the RDF should be returned instead of the HTML:

% curl -H "Accept: text/html;q=0.1, application/rdf+xml;q=0.2" -D - 'http://beta.uniprot.org/?query=purl:uniprot/P12345'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 26 Jul 2007 14:00:51 GMT
Server: Apache-Coyote/1.1
Location: http://beta.uniprot.org/uniprot/P12345
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 0
Most likely the internal code is using the first format given rather than the best format given. So remember boys and girls, always send your preferred format first, even if the spec says there shouldn't be a difference.

Hmm, and it does look like they do a substring test on the Accept string. This should not (probably, debatably) return RDF.

% curl -H "Accept: application/rdf+xml2" -D - 'http://beta.uniprot.org/?query=purl:uniprot/P12345'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 26 Jul 2007 13:50:36 GMT
Server: Apache-Coyote/1.1
Location: http://beta.uniprot.org/uniprot/P12345.rdf
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 0
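
The difference between a substring test and a media type comparison is easy to show. Both functions below are hypothetical; the first is only a guess at the kind of check that would produce the response above, not UniProt's actual code, and neither one looks at q values.

```python
def wants_rdf_substring(accept):
    """Guess at a substring-style check: matches anywhere in the header."""
    return "application/rdf+xml" in accept


def wants_rdf_exact(accept):
    """Compare whole media types, ignoring parameters such as q."""
    for item in accept.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type == "application/rdf+xml":
            return True
    return False


header = "application/rdf+xml2"
print(wants_rdf_substring(header), wants_rdf_exact(header))
```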

All of these problems I just listed? They should be low on the list of things to worry about when doing conneg. It's a bunch of details that aren't needed for the first pass at getting conneg working. They are only needed once there are multiple format variants, and clients which expect to get the different format types, and which can handle multiple formats.

Accept: */*

Eric also wrote:

The main reason why I don't default to the machine-readable representation (no doubt that would be useful for people writing semweb applications) is that the large majority of resources does not have a machine readable representation, and web pages happen to be the greatest common denominator.
Perfectly cromulent justification. Especially since semantic web apps can be expected to send the correct content type, while browsers (Safari; grrr!) do things like send "Accept: */*".
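
That default policy is simple to sketch: a specifically named, supported type wins, and anything else ("*/*", an empty header, unknown types) falls back to HTML. The names here are invented for illustration, and q values are deliberately ignored to keep the example short.

```python
DEFAULT = "text/html"
AVAILABLE = ("text/html", "application/rdf+xml")


def choose(accept):
    """Return the first specifically named supported type, else DEFAULT."""
    named = [item.split(";")[0].strip().lower() for item in accept.split(",")]
    for media_type in named:
        if media_type in AVAILABLE:
            return media_type
    # Nothing specific matched, so serve the greatest common denominator.
    return DEFAULT


print(choose("*/*"))
```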
