Friday, February 29, 2008

Nearly two years ago, I made a prediction that Atom would replace WebDAV. At the time I was even working on revising WebDAV and just beginning my involvement with Atom standardization. I told this prediction to Cullen Jennings and documented it in a note dated 5/9/06. "Replace" is a fuzzy term, because they're not equivalents, but here's what I see today.
  • Google turns Atom, RSS and AtomPub into GData and uses that for blogs, calendars and task lists for starters. Google does not use WebDAV as far as I know.
  • Microsoft is unifying its developer platform protocols on Atom and AtomPub, and using those protocols for unstructured application storage, e.g. photo albums to blogs. (h/t James Snell. Hey, I used to work with David Treadwell, haven't talked to him in years.) WebDAV support in projects like Exchange is downplayed.
  • IBM has a bunch of projects using Atom, but it has a bunch of projects, period, and I haven't seen strategy announcements about either standard.
  • Apple uses Atom and RSS in quite a few applications, although it's also using WebDAV and CalDAV on .mac and in its calendar server.
  • Blog service sites are ubiquitous and all use Atom or at least RSS. File sharing sites or other WebDAV public services are rare. Photo sharing sites are more likely to use Atom than WebDAV by a long shot.
Were I to propose CalDAV today it would probably be CalAtom -- some things would be easier, some harder, but it would catch a wave instead of drifting in the tail of something that was never much of a popular wave. Oh well, we needed something then, and WebDAV gave the most leverage at the time.

The Django stuff was going so well, I switched over to learning R a couple of days ago just to depress myself. It's tricky to get into. The documentation is very scattered (kudos to those who are improving it at the wiki, but that's a work in progress). I haven't yet found a way to get documentation for arbitrary libraries. The cute name 'R' makes it very hard to Google for R libraries. E.g. I haven't yet found a core graph function or a library function that creates horizontal bar graphs (though maybe somebody will tell me to do vertical bar graphs with all the labels turned, then turn the whole image).
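
On the horizontal bar graph question, one option that may fit: base R's barplot() accepts a horiz argument. The sketch below is only an illustration, using a few of the Crude Rate values from the table further down; the variable names (rates, mids) are made up for the example.

```r
# Illustrative values taken from the sample table in this post.
rates <- c("A00-B99" = 22.0, "C00-D48" = 193.2, "E00-E88" = 33.7, "F01-F99" = 21.0)

# horiz = TRUE draws the bars left to right; las = 1 keeps the labels upright.
mids <- barplot(rates, horiz = TRUE, las = 1, xlab = "Crude rate")
```

barplot() returns the bar midpoints, which can be handy for adding text labels next to each bar afterwards.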

One headache in particular was trying to work with display strings that needed to be converted to numbers. I had a table containing this tiny fraction of data (out of 8 columns and 15000 rows):

ICD.Chapter    Crude.Rate
A00-B99 22.0
C00-D48 193.2
D50-D89 3.2 (Unreliable)
E00-E88 33.7
F01-F99 21.0

To graph the Crude Rate without the occasional "Unreliable" strings, I needed to break the information about which measurements were unreliable into a separate column (not losing the information), reduce the Crude Rate to a number, and convert it into an R 'numeric' type. I spent about 8 hours on it. In the first four I managed to solve the first problem: I looked for a "contains(string, substring)" style function but there was none; "grep" produced errors that really hung me up whenever the input cell did not have the string I was looking for. Finally I wrote my own little contains function using regexpr:

contains <- function(pattern, string) {
  regexpr(pattern, string) != -1
}

It looks simple now, but it was tricky to get a function that worked properly when converting 15000 cells to 15000 new cells.

The next four hours were spent trying to remove the "(Unreliable)" string from the cells that had it. Again poring through documentation, I tried various approaches and got the closest with this one:

first.word <- function(x) {
  substr(x, 0, regexpr(" ", x)[1]-1)
}

The problem was that although this one worked fine on single strings when I tested it, it didn't work on turning a whole column into a whole new column.
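
One likely cause, offered as a guess rather than a diagnosis: regexpr returns one match position per input string, and the [1] keeps only the first of those, so every cell in the column gets cut at wherever the first cell's space happens to be. A minimal sketch of a variant that works position by position follows; the name first.word.vec and the sample values are made up for illustration.

```r
# Hypothetical vectorized variant of first.word (the name is illustrative).
first.word.vec <- function(x) {
  x <- as.character(x)                   # accept factor columns too
  pos <- regexpr(" ", x, fixed = TRUE)   # one position per element, -1 if no space
  hit <- pos > 0                         # which elements actually contain a space
  out <- x                               # strings without a space are kept whole
  out[hit] <- substr(x[hit], 1, pos[hit] - 1)
  out
}

rates <- c("22.0", "193.2", "3.2 (Unreliable)")
first.word.vec(rates)
# [1] "22.0"  "193.2" "3.2"
```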

Today at 11:00 am I turned to Excel to see how quickly I could do it there. It's been a long time since I've used Excel formulas, but I knew this kind of thing could be done. In many ways the Excel formula documentation is worse, but it covers a smaller set of controls (or at least a clear subset) and it was comprehensive and organized.

In Excel the two new columns are created like this from the H column:

=NOT(ISERROR(SEARCH("Unreliable", H2)))
=IF(ISTEXT(H2), VALUE(REPLACE(H2, SEARCH(" ",H2), 13, "")), H2)

I'm sure this could still be done in R but making those two formulas work for this particular table took me 20 min in Excel instead of four hours.
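
For comparison, here is a rough sketch of how the same two columns might be built in current versions of R, using grepl, sub and as.numeric from base R. The data frame name (deaths) and the new column names (Unreliable, Rate) are assumptions for the example; the rows are the sample ones shown above.

```r
# Sample rows copied from the table above; the data frame and the new
# column names are illustrative assumptions, not the original code.
deaths <- data.frame(
  ICD.Chapter = c("A00-B99", "C00-D48", "D50-D89", "E00-E88", "F01-F99"),
  Crude.Rate  = c("22.0", "193.2", "3.2 (Unreliable)", "33.7", "21.0"),
  stringsAsFactors = FALSE
)

# Flag the unreliable measurements (fixed = TRUE matches the literal text).
deaths$Unreliable <- grepl("Unreliable", deaths$Crude.Rate, fixed = TRUE)

# Strip the " (Unreliable)" suffix where present, then convert to numeric.
deaths$Rate <- as.numeric(sub(" (Unreliable)", "", deaths$Crude.Rate, fixed = TRUE))

print(deaths[, c("ICD.Chapter", "Rate", "Unreliable")])
```

Both grepl and sub work on a whole column at once, which lines up with the per-row Excel formulas: the first plays the role of the SEARCH test and the second the role of REPLACE plus VALUE.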

Tuesday, February 26, 2008

Development work went much smoother today. I set aside installation of Apache and MovableType and decided I could do something easier at least for now. Believe it or not it was easier -- for me at least -- to implement a minimalist blog in Django, from the database on up to the display pages, than it was to install and hook together software.

I also wrote some of the core data model for the site using Django's standard model tools. It's a site that will improve the accessibility of health-related statistics (costs, health surveys, incidence registries and other epidemiological data) by offering good search and lookup and simple clean visualizations. So I started by creating data models for data sources, report names and conditions that are tracked, and hooking those data models primitively into the Web pages where I want them to show up.

Of course development involves writing new code, but that's not nearly all of it. Even when writing new code, part of development is debugging: when you know what you want to do and have chosen a way to do it, but it's not working. Another chunk of new development time is decision-making, e.g. deciding how objects will relate or how interfaces will look. It's interesting how fast this style of development goes from decision-making to debugging and back, without a lot of time spent writing new code.

Any quick pointers on
- Unit testing Django sites?
- Integrating C libraries into Django projects?
- Departing from Django DB models -- when to create one's own tables and how?

Monday, February 25, 2008

Since Ravelry ate my blog, I'm going to see if I can't restart this with a different idea. I originally started a blog to try to say clever (and subtly snarky) things about news, particularly science news. Then this turned into a lame knitting blog. Now I feel like I don't know what to post here. So how about I post what I like, eh?

I am trying to learn how to use a camera and a flash better. I bought a Canon Speedlite 480ex so that I could point my flash in different directions, dial up or down the light intensity, and use a diffuser. I also tried to buy a hot shoe adapter and cord so that I can use the flash at a distance from the camera, but it looks like I might have bought the wrong model of something. The cord has a little plug just like a single earphone plug, and there's a dock space on the Speedlite, but it's empty. Behind the hole there's no socket, just plastic.

Still, the Speedlite is better than nothing. I used the diffuser for some of the shots taken of people in a big, poorly lit conference room at the Ravelry meetup.

I think it turned out better than with a standard Rebel built-in flash. I had fun anyway.

At work, I'm immersing myself in Django and oh-my-god real setup of Web, database and other server software. This intimidates me. When I get something working (like an index.html page with working stylesheet and logo links in Django) I take a little "success" break to let the feeling last.

I hate how easily this kind of work can go wrong. There have got to be common ways of doing things that just work. But I don't know what those are, and software for developers errs on the side of flexibility and power, rather than on the side of making choices for you so that it just works. And some choices work but are awkward. Where do I install Apache? Where do I locate the static files served by Apache? Where do I install MySQL so that it can be used by MovableType as well as Django? Do I upgrade to Python 2.5 in place or have it co-exist with Python 2.3? Invariably on the Mac there are permissions issues: do I change the permission of the file so I can do what I'm trying to do with my login account, or do I switch to 'sudo' the command? I normally get things working eventually but it's sooo slow.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.