Not Invented Here: 2009

Friday, November 20, 2009

Defining a new URI or URN scheme properly turns out to be really difficult. I've been sponsoring drafts for such schemes for almost four years, and the same problems come up again and again. The major themes:

Comparisons
Uniqueness
Stability

Comparisons

Comparing one URI to another turns out to be a commonly desired feature. Browsers look up cached pages based on URI comparison. If I click a link to bookmark a page in delicious.com, I'd like it to be bookmarked only once, and if I've already bookmarked it, I'd like delicious.com to bring up that page so I can see what I tagged it with and when. Outside of http URIs, I'd still like to know if I'm already subscribed to an XMPP user, etc.

Comparison is harder than it sounds, but you already know that if you've dealt with any code requiring canonicalization, conversion or encoding. If a link in a Web page contains a space, at some point my browser has to convert that space to %20 to use in an HTTP request. Should the browser do that conversion before or after looking up the URL in the cache? Should delicious.com bookmark the URL with the space or the one that's used in HTTP requests? This is the tip of a very large iceberg that potentially includes all of internationalization and Unicode. Be very clear on what character set is used by each part of a URI, and if it's all ASCII, say so.
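One answer to the space-vs-%20 question is to canonicalize before comparing, so that the cache lookup and the bookmark both use the same encoded form. Here is a minimal Python sketch of that idea. The function name cache_key is hypothetical, and encoding non-ASCII characters as UTF-8 is an assumption (it is the default of Python's quote); a real scheme has to state its character set explicitly.

```python
import re
from urllib.parse import quote


def cache_key(url):
    """Return one canonical spelling of a URL for comparison purposes."""
    # Percent-encode characters that cannot appear literally in a URI
    # (such as a space). '%' and the reserved characters are marked safe,
    # so existing escapes are not double-encoded and delimiters keep
    # their meaning. Non-ASCII text is encoded as UTF-8 (quote's default).
    encoded = quote(url, safe=":/?#[]@!$&'()*+,;=%")
    # RFC 3986 treats %2f and %2F as equivalent; normalize to uppercase hex.
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), encoded)
```

With this, "http://example.org/my page.html" and "http://example.org/my%20page.html" produce the same key, so the cache and the bookmark service would treat them as one resource.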

Case sensitivity is a frequent issue. Be very clear on which parts of the URI are case sensitive.
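For the generic syntax, RFC 3986 says the scheme and host are case-insensitive, while the other components are treated as case-sensitive unless the scheme says otherwise. A minimal Python sketch of applying just that rule before comparison; normalize_case is a hypothetical name, and the sketch assumes the authority contains no userinfo ("user@"), which would be case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri):
    parts = urlsplit(uri)
    # urlsplit already lowercases the scheme. The host is case-insensitive
    # too, so lowercase the whole authority. Assumes no userinfo part.
    netloc = parts.netloc.lower()
    # Path, query and fragment are left untouched: the generic syntax treats
    # them as case-sensitive unless the scheme defines otherwise.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

So "HTTP://Example.ORG/FAQ" becomes "http://example.org/FAQ", but "/FAQ" and "/faq" still compare as different paths.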

For new URI schemes, giving options makes the comparison job much harder. Let's say a new URI scheme needed to include a country designation: it seems nice to let users put a two-letter country code, a three-letter country code, a TLD or an OID in there. But now one needs a horrid table to convert and compare these, and string comparison is no longer enough.
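To make the cost concrete, here is a Python sketch of the table such a hypothetical scheme would force on every implementation. The table is illustrative only: two countries are shown, and the OID spellings assume the joint ISO/ITU-T country arc 2.16 followed by the ISO 3166 numeric code. The names COUNTRY_ALIASES and same_country are made up for this example.

```python
# Illustrative alias table: every spelling the scheme accepts must map to
# one canonical value. A real table would need every country, in every
# accepted notation, kept up to date by every implementation.
COUNTRY_ALIASES = {
    "ca": "CA", "can": "CA", ".ca": "CA", "2.16.124": "CA",
    "fr": "FR", "fra": "FR", ".fr": "FR", "2.16.250": "FR",
}


def same_country(a, b):
    canon_a = COUNTRY_ALIASES.get(a.lower())
    canon_b = COUNTRY_ALIASES.get(b.lower())
    # A spelling missing from the table cannot be compared at all.
    return canon_a is not None and canon_a == canon_b
```

"CAN" and ".ca" are different strings, yet same_country says they match; any implementation with a stale table will disagree with one that has a current table.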

Optional syntaxes are similarly difficult; even allowing both '/' and '\' as separators can lead to errors.

When the URI form is an alternate form for an identifier that already exists, now the URI may have to be comparable to something that's not a URI. For example, both IRI and URN forms exist for ISO OIDs. Don't they need to be compared to each other?

Can the URI form have query syntax? Is that part of the comparison or is that stripped off first? In HTTP URIs if I stripped off the query syntax I'd retrieve quite a different resource, but in some URI forms, the query syntax is used to carry information other than resource-identifying information. For example, can I compare two mailto URIs that have the same mail address, even if one of them has a query part with "?subject=Hey%20There"?
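For mailto, one possible comparison rule is to drop the query (the header fields) and compare only the address. A minimal Python sketch of that rule, assuming a single address per URI; mailto_address and same_mailbox are hypothetical names, and treating only the domain as case-insensitive is a policy choice this sketch makes, not something every scheme would.

```python
from urllib.parse import urlsplit, unquote


def mailto_address(uri):
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URI: " + uri)
    # The address is in the path; header fields such as "subject=" are in
    # the query, which is deliberately dropped here.
    address = unquote(parts.path)
    # Domain names are case-insensitive; the local part is left as-is.
    local, _, domain = address.rpartition("@")
    return local + "@" + domain.lower()


def same_mailbox(a, b):
    return mailto_address(a) == mailto_address(b)
```

Under this rule "mailto:lisa@example.org" and "mailto:lisa@example.org?subject=Hey%20There" compare equal. An HTTP cache could never use the same rule, because there the query selects a different resource.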

Can the URI form contain multiple values? The SMS URI definition had to include text on comparison when multiple SMS addresses were packed into the same URI. Does order matter to comparison?
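Whether order matters is a decision the scheme definition has to make explicitly, because the two policies give different answers for the same pair of URIs. A Python sketch of both, assuming comma-separated recipients before any '?' in the style of the sms URI; the phone numbers and function names are hypothetical, and no phone-number normalization (dashes, spaces) is attempted.

```python
from urllib.parse import urlsplit


def sms_recipients(uri):
    parts = urlsplit(uri)
    # Recipients come before any '?', separated by commas, e.g.
    # "sms:+15105550101,+15105550102?body=hello".
    return [r for r in parts.path.split(",") if r]


def same_sms_ordered(a, b):
    # Policy 1: order is significant.
    return sms_recipients(a) == sms_recipients(b)


def same_sms_unordered(a, b):
    # Policy 2: the same recipients in any order are equal.
    return sorted(sms_recipients(a)) == sorted(sms_recipients(b))
```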

Uniqueness

Frequently URI schemes need to avoid collisions, so that there isn't an attempt to give two different things the same identifier. The problem here is delegating the ability to create new URIs, while still avoiding collisions. The major fallback here is the DNS: URIs that contain a domain name, where the resource being named belongs to that domain, effectively delegate the uniqueness concern to that domain holder.

For example, we don't need to worry that "xmpp:lisa@jabber.org" will conflict with other resources, because the domain 'jabber.org' assigns usernames uniquely within that domain and prevents collisions.

The other main option is to use registries. For example, all OIDs, defined by ISO, use the ISO process to register numerical values and string values for use in the parts of an OID. Other times, IANA is the registry (e.g. for port numbers in HTTP URIs). If the URI scheme needs a new registry, that is more work and one more thing to get right.

Some processes for ensuring uniqueness are quite heavyweight. Many IANA registries have processes which can take weeks or months to resolve. If the registrar is not IANA, who is going to actually run the registration process and under which rules? The OGC URN defined in RFC 5165 includes sub-namespaces issued by OGC itself. The first consequence of this is that the OGC organization must be referred to for any new OGC URN unless it explicitly delegates that part of the namespace. To reduce the burden of being a registrar in the case of non-permanent, test or experimental OGC URNs, the URN definition mentions the possibility of an experimental sub-namespace and the possibility of collisions within that namespace. Now implementors have to consider the possibility of leaked experimental names and deal with collisions. The approval discussion of RFC 5165 was lengthy because of these nuances.

Stability

Some URIs need to refer to the same thing over only a short time, but typically the desired period of stability is long or even indefinite. Domain names can be a problem here. Initially it might seem great to use HTTP URIs as XML namespaces, but consider whether the holder of the "example.org" domain will change over time, and whether the new holder will have the same policies regarding use and allocation of URIs in that namespace.

If a registry is used to achieve unique assignment, and the registrar is not IANA, then the stability of the registry must be considered. How long is the organization going to exist and maintain the registry publicly? We look for a public commitment, existing Web pages, a long-lived organization and so on. An explanation of the process and deciding factors for how names are assigned, and of how the organization ensures they are not reassigned, shows that they've thought about this commitment. See RFC 5328 for an example.

Random List of Gotchas

Does your URI scheme include use of fragment identifiers (like the #iri part of http://example.org/faq#iri)? Forget it; fragment identifiers relate to the media type of the *resource*, not the URI scheme. So if the URI "foo:bar:baz" retrieves an HTML page, then a fragment identifier on that URI would act like an HTML document fragment identifier.
ABNF is hard to get right. Get it reviewed by an expert. Use an ABNF validator or parser generator to test your instincts. Refer to existing productions where possible. One common issue is to use a separator like "=" between two constructs, and then define one of those constructs in a way that it includes the separator character itself.
Another ABNF/syntax issue is to accidentally use a character that has an obscure meaning in URI syntax, or is simply reserved.
URIs that contain phone numbers include a whole barrel of troublesome monkeys. It's so hard to get telephone numbers right, with variable length and special encodings, '+' prefixes, dashes or spaces, and extension numbers, that it's worth trying very hard to use an existing phone-based URI instead of defining a new one.
If query parameters are used, can the set of keys be extended? Can anybody define new keys? Do they have global meaning (like mailto URIs) or purely local meaning (like HTTP URIs)?
The "community considerations" section required for URN registrations is frequently misunderstood. What the IETF looks for in this section is an indication that the work done to standardize this scheme and allocate a new scheme or URN type will add value; that there is benefit to the Internet community and not only to a private consortium or private company.
If any reviewer blithely says "whyever invent a new URI scheme, use HTTP for everything!" just ignore them until they provide actual reasoning for this proposal.
Embedding URIs within URIs, or any syntax that is infinitely extensible, is asking for trouble.
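The separator problem from the ABNF gotcha is easy to demonstrate. Below is a Python sketch using a hypothetical buggy grammar (shown in the comment) in which a key may contain the "=" that also separates key from value; a string like "a=b=c" then has two legal parses, and two implementations can legitimately disagree. Restricting keys to letters leaves exactly one parse. The function names are made up for the illustration.

```python
# Hypothetical buggy grammar, in ABNF:
#   param = key "=" value
#   key   = 1*( ALPHA / "=" )   ; bug: key may contain the separator
#   value = 1*( ALPHA / "=" )


def all_parses(param):
    """Every way to split param at an '=' with non-empty key and value."""
    return [
        (param[:i], param[i + 1:])
        for i, ch in enumerate(param)
        if ch == "=" and 0 < i < len(param) - 1
    ]


def parses_with_alpha_keys(param):
    """Fixed grammar: key = 1*ALPHA, so only one split survives."""
    return [(k, v) for k, v in all_parses(param) if k.isalpha()]
```

all_parses("a=b=c") yields both ("a", "b=c") and ("a=b", "c"); with the fixed key rule only ("a", "b=c") remains.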

Final advice: provide more examples of actual URIs or URNs than you think people will need. Along with an example, explain how that example would be assigned, derived, and if applicable, dereferenced.

References

Here are the documents and registries that govern the registration and syntax of new URI schemes and new URNs.

RFC 3406: How to define a new URN namespace, or "NID" ("Namespace Identifier").
RFC 2141: URN syntax, or the syntax of URIs that begin with 'urn:'.
RFC 3986: URI syntax, how to parse all URIs and URNs regardless of scheme.
RFC 4395: Guidelines and Registration Procedures for new URI schemes.
IANA Scheme Registry: Existing registered URI schemes
IANA URN-NID registry: Existing URN Namespace registrations

Monday, June 22, 2009

I remember reading Neal Stephenson's Snow Crash; his description of the Metaverse, his conception of virtual reality and online communication, thrilled me. I knew in many ways it was more realistic than Gibson's cyberspace. For instance, Stephenson described how people can choose their own avatars, and how it's a sign of a newbie or at least a non-programmer to have an "off-the-shelf" avatar. Indeed we see this in places from static online forums all the way to Second Life.

One thing nagged at me back then: Stephenson realized that there's no reason not to teleport in virtual reality, but explained that the programming rules forbid it.

You can't just materialize anywhere in the Metaverse, like Captain Kirk beaming down from on high. This would be confusing and irritating to the people around you. It would break the metaphor... Once you have materialized in a Port, you can walk down the Street or hop on the monorail or whatever.

This is unrealistic in a virtual reality which is supposed to be the predominant way hackers like Hiro interact with each other online. Today, online gamers tolerate some limitations on teleporting in game environments like World of Warcraft or Puzzle Pirates, but even there, friction caused by do-nothing travel time is minimized. And in a more general communication milieu -- Web forums, Facebook, Twitter -- there isn't a single, limiting place. I can "be" on two forums at the same time on Ravelry, open two or more Facebook windows and chat with multiple people and I'm "there" with them all for some value of "there". Not only is there the ability to go immediately where I want to be in most online fora, but it doesn't even involve leaving the other "places" I already am.

Ok, here's another piece of the picture that didn't bother me in 1993 but does today:

Most avatars nowadays are anatomically correct, and naked as a babe when they are first created, so in any case, you have to make yourself decent before you emerge onto the Street... [Hiro sees] A liberal sprinkling of black-and-white people -- persons who are accessing the Metaverse through cheap public terminals, and who are rendered in jerky, grainy black-and-white.

This assumes an architecture where the client renders their own avatar. Even in that architecture, a proxy for a public terminal could render a classier avatar. Low-res displays would more likely affect the receiver than the sender -- somebody accessing the online universe through a poor public terminal might see every other avatar equally low-res, but their own avatar could still appear fantastic to people on good computers. It's complicated.

I guess the lessons are that today's online fora are less like the real world than we could imagine fifteen years ago, and future online fora will be less like the real world than we are yet capable of imagining. We're still sending messages that look like paper mail and have envelope icons, and we still think of "bulletin boards" as a real model. We haven't integrated IM or Twitter-like experiences fully into other experiences. Today, I'm downloading the Adium beta to see how Twitter "what I'm doing" messages and community are integrated with IM and whether that improves on the old IM concept of presence in a significant way. Trivial interface changes in these sites and software can be significant in how people use them.

To borrow Ted's analogy when we touched on this over coffee today, we're in the same phase cinema was in when a movie camera was pointed at a stage, and a stage play acted upon it: the unique affordances of cinema weren't discovered immediately and are still being discovered even today. With online interaction, we're only beginning to discover how different it is from experience in the physically-limited real world.

Wednesday, June 17, 2009

I have a bunch of baking books, or cookbooks that include serious sections on baking. The Joy of Cooking and The New Best Recipe are my favourites by a long shot, and often I enjoy making the "best" scone even if the recipe is basically white sugar, white flour and a pound of butter.

However, sometimes I'm looking for a healthier scone, muffin or coffee cake -- something that I can eat for breakfast without too much guilt, or offer to health-conscious friends -- and I don't have resources that are just right for me. Ideally, a book on healthy baking would balance out a number of factors without being fanatical on any one of them:

How is the whole grain content? Can some of the white flour be replaced with whole wheat flour, or can the recipe handle an optional addition of wheat germ, ground flax seeds, oats or so on?
Can the sugar be cut down and/or replaced with honey or maple syrup?
Can the fat be cut down without sacrificing moistness, shelf life, texture and flavour?
Is the protein ratio good? Is substituting soy flour an option? Adding nuts?
Are the ingredients readily available or can rare ingredients be optional?
Is the taste pumped up? I eat less of pure, dark chocolate or tongue-tingling ginger sweets because my palate is satisfied earlier.

I understand some people get fanatical about one thing, just the sugar or fat or whole wheat content of a recipe, but I rather think balance is important, and certainly taste is.

Along these lines, here's an adapted recipe for Mango Chutney coffee cake, derived from Light and Easy Baking. That book focuses only on reducing fat content; I increased the fat a little again, but the pumped-up taste is there and the hot pepper is surprisingly good.

2-1/2 cups all-purpose flour
3 t. baking powder
1 t. salt
1/2 c. sugar
1/2 c. brown sugar
1 c. milk
1/3 c. canola oil
1 egg, slightly beaten
2 T. orange marmalade
1/3 c. raisins
3/4 c. chopped mango chutney
Additional pepper, cinnamon or cardamom, particularly if chutney is mild

Mix the dry ingredients together, then mix the rest in. Bake in a loaf pan at 350°F for 65 minutes.

Wednesday, May 20, 2009

Quick mommy blogging, just to say I'm still here.

A couple days ago I hear a scream and "Get it off me!" from the two year old in the next room. I go running. It's a piece of sticky fluff on his index finger from him poking under the furniture. Sarcasm kicks in but doesn't work:

Me: "Oh noes! It's a disaster!"

Him: "I has a zaster on my finger!!"

Monday, March 23, 2009

We had a really great IETF Apps Area meeting today. We invited a whole bunch of people to talk about their topics and moved through the presentations quickly. These topics were:
- HTTP Resource Discovery
- Service/server Discovery
- Timezone publication
- SCRAM (Salted Challenge Response Authentication in SASL)
- Bayeux and cometd: JSON pubsub over HTTP
- BOSH for tunneling XMPP over HTTP
- rHTTP: reverse REST
- Analysis of all these server-initiated HTTP schemes plus Web Sockets
- Massive Multiparticipant Online Experience: the Overview

The slides are all already available here. Here's my favorite, a slide from Mark Lentczner's deck, for tying a whole bunch of things together, even though, as Mark said, Tufte would probably cry.

Thursday, February 05, 2009

I'm trying to read some code in Objective-C. This is hard, but it's solidifying my abstract understanding of programming languages. I'm not that hardcore a programmer, but I guess I've picked up a few things over the years (gawd that makes me sound old).

One of the neat things about Objective-C is that using a class or instance method involves sending a message. C++, in contrast, calls those methods. A C++ object has a fixed set of methods that can be called, all known at compile time. An Objective-C object might be able to handle arbitrary messages. This makes some things harder and some things easier: polymorphism is easier, but using the wrong type of object or a null object is harder to detect (it must be caught at runtime, not compile time).
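Python happens to work the same way, so here is a rough Python analogy of message sending (not Objective-C itself). The method is looked up by name when the message is sent, an object can choose to answer messages it never declared, and a nil receiver quietly returns nothing. Logger, Forwarder and send are hypothetical names for the illustration.

```python
class Logger:
    def log(self, text):
        return "logged: " + text


class Forwarder:
    """Answers any message, loosely like an Objective-C object that
    forwards messages it does not implement itself."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. decided at runtime.
        if hasattr(self._target, name):
            return getattr(self._target, name)
        return lambda *args: "ignored message " + name


def send(receiver, selector, *args):
    """Look the method up by name when the message is sent."""
    if receiver is None:
        # Messaging nil in Objective-C is legal and returns nil,
        # so a null receiver goes unnoticed.
        return None
    method = getattr(receiver, selector, None)
    if method is None:
        # Objective-C also reports an unrecognized selector only at runtime.
        raise AttributeError(type(receiver).__name__ + " does not respond to " + selector)
    return method(*args)
```

Nothing here fails until the message is actually sent: send(Logger(), "frobnicate") raises only when it runs, and send(None, "log", "hi") silently returns None.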

The thing is, this is very familiar to me because this is how wire protocols work. In fact Objective-C has "protocols" which are interfaces, or a set of messages, that an object claims to be able to handle, so the terminology overlaps quite a bit. Anyway, in a wire protocol the client sends a message, and because anything can happen to that message, the client has to be able to handle a large number of outcomes. Polymorphism? You bet; a server that appears to be an HTTP server (implements the HTTP protocol) can also be a WebDAV server, a CalDAV server and an FTP server.
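HTTP builds this expectation into the protocol: RFC 2616 (section 6.1.1) requires clients to understand the class of any status code and to treat an unrecognized code like the x00 code of its class. A minimal Python sketch of a client following that rule; the KNOWN table and the plan_for name are hypothetical, standing in for whatever actions a real client takes.

```python
# What this hypothetical client does for each status code it knows.
KNOWN = {
    100: "continue",
    200: "use the body",
    201: "note the new resource",
    300: "pick a representation",
    301: "follow the redirect and update links",
    304: "use the cached copy",
    400: "fix the request",
    401: "ask for credentials",
    404: "report not found",
    500: "report server failure",
    503: "retry later",
}


def plan_for(status):
    if status in KNOWN:
        return KNOWN[status]
    # An unrecognized code is handled like the x00 code of its class,
    # so a 207 from a WebDAV server is treated like a 200.
    fallback = (status // 100) * 100
    if fallback in KNOWN:
        return KNOWN[fallback]
    # Not even a known class: fail safely rather than crash.
    return "treat as an error"
```

A plain HTTP client that has never heard of WebDAV's 207 or 422 still does something sensible with them, which is exactly the polymorphism described above.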

Designing protocols can be hard for people who think in terms of fixed interfaces à la C++. RPC-style protocols embody this thinking, making Remote Procedure Calls and expecting predictable, limited results. It makes more sense to me now why RPC-style protocols are so brittle: designers and implementors act as if there's compile-time checking of the remote interface, but since the remote interface is on somebody else's computer, which may have been upgraded or may just have a different implementation, of course there's no compile-time checking.
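The difference shows up in how a client reads a reply. Here is a Python sketch contrasting the two mindsets on a hypothetical message with "id" and "status" fields: the strict parser assumes the peer matches the interface it was built against, while the tolerant one keeps what it understands, ignores the rest and defaults what is missing. The field names and functions are illustrative assumptions, not any particular RPC system.

```python
EXPECTED_FIELDS = {"id", "status"}


def parse_strict(msg):
    # RPC-style: assume the remote side matches the interface we were built
    # against. Extra fields raise ValueError; missing ones raise KeyError.
    extra = set(msg) - EXPECTED_FIELDS
    if extra:
        raise ValueError("unexpected fields: " + ", ".join(sorted(extra)))
    return {name: msg[name] for name in EXPECTED_FIELDS}


def parse_tolerant(msg):
    # Protocol-style: keep what we understand, ignore what we don't,
    # and use a default when an older peer omits something.
    return {"id": msg.get("id"), "status": msg.get("status", "unknown")}
```

When the server is upgraded and starts adding an "eta" field, the strict client breaks immediately while the tolerant one keeps working.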

Sunday, February 01, 2009

Mommy blogging today: Natasha said I should post about toddler sleeping stuff.

I have a kid that naps and goes to bed willingly and easily. Clearly there is a huge part of sheer luck in this because I've seen little correlation between loving, wise, firm parents and perfect kids, and I'm not always wise and firm. But we did luck out on a few things that have added to his personality to make for the easiest bedtimes ever.

We introduced an attachment toy early on. This toy, known here as "sleepy bear" because his eyes appear closed, is a blanket-with-head style minky toy.
We taught the sign for bear early on (bear hug yourself with arms crossed, and scratch your upper arms with each opposite hand) so he could ask for the toy pre-speech.

We attached a pacifier to sleepy bear with a folded strip of fabric. The fabric loop goes through the pacifier loop and around the pacifier, so it can come off for easy machine laundering of the toy.
We bought a second identical sleepy bear (and attached another fabric loop) when it became clear this was the favourite toy. Usually one remains hidden, to be brought out very conveniently when there are contamination events or the bear has simply gotten too dirty from grubby hands.
Since sleepy bear is always "sleepy", he has to stay in the kid's bedroom most of the time. We make exceptions for when he's sick or at difficult times like coming home in the car after bedtime.

So now, when we say "It's bedtime" his response is (whining) "Nooooo...." but the next phase is "Let's go find sleepy bear" and he responds "OK" and follows us to the bedroom. Sleepy bear is closely associated with sleeping and triggers him to lie down and relax.

He is too young to recognize fatigue and say "I'm tired, I'm ready for a little nap" but he's easily old enough to ask for bear. If we're at home and his hands are clean, we ask him to go into his bedroom and cuddle with bear until he's ready to come out without bear (and if he comes out with bear we bring him back to the bedroom and ask him to say bye to bear before he can go out and play again). So yesterday morning he did this and actually fell asleep, getting a bonus morning nap which he doesn't usually need any more.

When we want him to fall asleep in a new place (traveling or spending an evening at friends') we just bring sleepy bear. We pull out the toy and any old blanket, and that's enough for the kid to sleep in a new room fairly easily.

Finally, this somewhat limits pacifier use without ruling it out entirely. I don't care too deeply, but there's something annoying about a kid that talks through a pacifier all the time. Having the main pacifier attached to a toy limited to the bedroom means that most of the time when he's playing he doesn't have one. We do have a couple extras on leashes for carseat or stroller travel where it appears to seriously improve patience levels.

I don't mean to give advice because all parents are different and all kids are different, but I did agree it was worth explaining how this works for us. Good luck!

Tuesday, January 27, 2009

Messaging Architects put out a very nice press release about my joining the company. I used it as a bit of a soapbox to talk about what makes a fully Open Standard: free to read, free to implement and free to participate in.

Thursday, January 22, 2009

I now work for Messaging Architects. I started a couple weeks ago but it's been busy; I traveled to Montreal last week to visit HQ and meet the management team.

It's going to be a fun job. The company is smallish (small enough for everybody to be on IM and see each other) but growing and building its product line. The M+Guardian product does spam control and other policy enforcement on email in transit, and I can definitely get behind spam control. The M+Archive is a bread-and-butter product for any company that has to follow regulations on email retention, which is a growing number, and I like the focus on swift retrieval. The M+NetMail email server was acquired from Novell a year ago and the company is now putting its stamp on the product (the team in Utah, who I met last October). In addition there's calendar integration, which you know I'm interested in, and possibly some file-sharing technology.

There's a lot of attention to customer needs at Messaging Architects, and a lot of enthusiasm and dedication. It's not hard for that to rub off on me even working from my own home! I'm in the midst of establishing a more fixed and attractive working spot at home, getting on IM with all my co-workers, joining regular meetings and getting the products running myself. Of course, I'm context swapping this new stuff with ongoing IETF work as I continue to handle the Applications Area Director responsibilities.

Thank-you to everybody who was looking out for me during the job hunt and if you're hunting, may you be as lucky as I.

Friday, January 09, 2009

My fellowship at CommerceNet just came to its expected end -- CommerceNet hires fellows for limited periods, to encourage them, rightly, to get cracking. I figure the fellowship was 75% successful.

Half my work was to continue as Applications Area Director for the IETF, which I continue to do. I got more efficient, I continue to learn a lot, and I even managed to publish a few documents (on consensus processes, interoperability testing and reporting and HTTP Email) that might lead somewhere. The HTTPBIS, IDNABIS and ALTO WGs all got launched. I helped 40 documents become RFCs. It's hard to measure success because there's always more one could do.

The other half of my time was even more nebulously defined: I was to build a new venture or otherwise do stuff that would either reflect very well on CommerceNet or be a good investment. I wrote a paper with Rachna Dhamija, another Fellow, before she moved on to launch Usable. I worked with Phil Crosby to build a prototype of a sleep tracker Web application, and with Jeff Lindsay to build a prototype Web site for making public health information more accessible.

By the end of 2007, I focused on the public health information accessibility site. I built a short-term development plan and budget (or actually, several) and a long-term vision. The vision included supporting research by letting scientists share information with each other, benefiting from the same visualizations and data transformations that are required to make data accessible to ordinary people. The site was also intended to be highly contributory in the long-run, so that data collectors (whether government departments, academic or industry researchers or private organizations) could publish their data in this accessible forum.

In early 2008 I still had very little budget so I decided to build the live site myself. I dusted off my Python, taught myself Django, and wrangled with databases, graphing packages and CSS stylesheets, then with running a Linux server. I even did a logo myself. In the summer I launched openfindings.org.

During all this time, I presented and demonstrated to everybody I could meet: at over sixty people, that was more than one demo a week. I was looking for partners (I actually hate building stuff alone), investment, grants or a home for the project. This is where I signally failed; although I often saw enthusiasm and expressions of support, I didn't manage to get enough concrete support to keep the project going.

I hope the ideas make it out there, though. There's no excuse for taking public money to create vast collections of public data, and then making the public interface as bad as this, or this, or this. Even the CDC has daunting Web forms and codes to know. Users need to be able to discover data in an exploratory way, learning about it as they go, rather than be forced to know all about the data in order to know how to query it before they can even see any of it. Can you do that? Yes. Users can browse topics and see thumbnails of data visualizations, filtering as they go, never having to fill out a Web form or learn an ICD code. Faceted browsing and rich interlinking of related topics/graphs (more than I was able to implement on openfindings.org) would make data browsing even easier and richer.