Friday, March 28, 2008

I miss working closely with other developers; that's part of why I'm blogging right now.
Python exacerbates that, because there are so many ways to do things, particularly the kinds of things I'm doing right now, like parsing text files with tables in human-readable but not-particularly-computer-readable formats.

Here's a tiny example: I needed to remove commas from strings like "1,207" before converting them to integers to graph them. I thought of using slicing, which is a powerful Python feature on lists, strings and more:
>>> string="1,207"
>>> string[:string.find(',')] + string[string.find(',')+1:]
'1207'
It got ugly fast so I looked for something else:
>>> string="1,207"
>>> string.replace(',','')
'1207'
Splits are also powerful:
>>> string = "1,207"
>>> ''.join(string.split(','))
'1207'
Of course I could also define a simple "removefrom(char, string)" function:
>>> string="1,207"
>>> def removefrom(char, string):
...     return string.replace(char, '')
...
>>> removefrom(',', string)
'1207'
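
One difference between the approaches above is easy to miss: the slicing version removes only the first comma, and it misbehaves when there is no comma at all, because find() returns -1. Here is a minimal sketch of that, written for Python 3; the function names parse_grouped_int and remove_first_by_slicing are made up for illustration.

```python
def parse_grouped_int(text, separator=","):
    """Convert a string such as "1,207" to an int by dropping every separator."""
    return int(text.replace(separator, ""))


def remove_first_by_slicing(text, char):
    """The slicing approach, guarded against the case where char is absent."""
    index = text.find(char)
    if index == -1:  # find() returns -1 when char does not occur
        return text
    return text[:index] + text[index + 1:]


print(parse_grouped_int("1,207"))                 # 1207
print(parse_grouped_int("1,234,567"))             # 1234567
print(remove_first_by_slicing("1,234,567", ","))  # 1234,567 (only the first comma goes)
```

For numbers with more than one thousands separator, replace() or split()/join() is the safer choice.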



To find good ways of doing things, I end up browsing a lot of Python code online. That's frustrating because some sites have started hosting "sample code" mostly as a way to put advertisements on-screen and in pop-ups. And I'm sure the most trivial code review would identify plenty of areas where my code could be much more Python-clever.

Thursday, March 20, 2008

Internships for CommerceNet this summer: I'll be the one hiring and managing. If we get a great proposal for a project from the right intern candidate, CommerceNet might simply hire that intern to work on that project. Otherwise, here's what I have in mind.

CommerceNet is an entrepreneurial research institute, dedicated to fulfilling the promise of the Internet. We are currently seeking Software Engineer interns to implement a data visualization Web application for public health information. The work involves JavaScript and Python, covering both data access and graphics. CommerceNet may also accept proposals for internships to work on well-specified projects of the intern's own design.

What you'll do
  • Develop open source libraries or widgets for graphing and data visualization
  • Build a public-service, community-oriented Web site
  • Be part of a small team or work nearly independently
  • Develop with minimal guidance, using rapid iteration and feedback loops, with leeway in the choice of tools
  • Borrow, create or collaborate on visual design and visual elements
Required Skills:
  • Web Applications development, including CSS and JavaScript
  • Python or demonstrated ability to pick up languages
  • MySQL or similar data management experience
  • Great ability to extrapolate from raw ideas to realistic implementations.
  • Demonstrated initiative pulling a project forward
  • Some experience using graphics libraries
  • Familiarity with Cleveland or Tufte principles would be a bonus
Email ldusseault@commerce.net with questions or cover letter and resume.

Tuesday, March 18, 2008

There is so much IETF work on email these days. There are five separate WGs (one in the Security area), a Research Group and a couple informal efforts. I tend to have to summarize the work to pull it together in my head. Here's the post-71st-IETF sitrep. Also see Barry's post for more detail on SIEVE, DKIM, LEMONADE and ASRG. Document links collected at bottom.
  • The IMAPEXT WG is so close to shutting down that it did not meet last week. One of its work items was to internationalize parts of IMAP (including mailbox names, and how to sort strings like subjects); those documents were delayed but finally got approved.
  • The LEMONADE WG met, and seems to be winding down. Although its extensions are all linked by being useful to mobile email clients, there are some extensions there of general interest.
  • The SIEVE WG just finished publishing a whack of documents around its new core spec, RFC5228. At its meeting, the group discussed whether to recharter to do another round on the core SIEVE documents and standardize some more filtering extensions.
  • EAI WG has requested publication for most of its documents. These are Experimental Standards for using non-ASCII characters in email addresses, which affects IMAP, POP3, SMTP in interconnected and complicated ways.
  • A design team is nearly done updating SMTP (RFC2821) and the Internet Message Format standard (RFC2822). They're handling "last call" issues on the list.
  • DKIM -- having previously published its core signatures doc (RFC4871) and requirements for signing practices (RFC5016) -- is now working on the signing practices standard itself.
  • ASRG, the Anti-Spam Research Group, is documenting various anti-spam techniques.
  • In informal discussions, several of us keep talking about rearchitecting email access. However, nobody's ready to predict, let alone commit, that their company will implement something new. Does that mean that there's not really enough pain around using IMAP? Or that the pain is the user's and not the software vendor's?
Documents:

Monday, March 17, 2008

One of my favourite tools for keeping on track is doing a regular report to my management.
Ever since Max Dunn, a former co-worker, told me about the "four P's" (progress, plans, problems, personnel) mnemonic, that's been a useful concept. Writing down some notes in each of those categories makes me remember stuff better, catch up, and plan better. I get almost all the value even without sending the report.

Right now I send a somewhat different monthly activity report to the Apps Area Discuss mailing list and to the Working Group chairs of the WGs I advise. It serves some of the same purpose for me, and other people have found it useful. I indicate what I think is the status of each document I'm sponsoring to become an RFC, and if the authors think the status is different (e.g. one of those common "I was waiting for you while you were waiting for me" traps) they tell me. I mention a couple of notable things at the top of the report, and try to summarize what state each WG I advise is in.

I try to send these in the first week of the month, which means I'm two weeks late for March due to being sick and having the IETF meet last week. Maybe later today??

Tuesday, March 11, 2008

Although I'm mostly doing IETF stuff at IETF this week, I took advantage of Andrew's expertise while he's here too, to help me get a Python library running and working. I had gotten blocked last week and we worked through several more problems this afternoon. Some notes follow for the record, to add to the couple blog posts out there that help one out on these problems.

I was trying to install pycha and play with its simple charting. To do this, I needed to have py-cairo and cairo version 1.4.12 installed on Python 2.5.

The easiest way to get cairo installed is using MacPorts. This is fine, although it installs cairo into the MacPorts version of Python, not the Mac OS X version of Python. The Mac OS X version of Python already has cairo installed, but it's apparently an older version. I didn't know how to upgrade the system cairo library so I focused on doing this with the Python installed by MacPorts. This led to later complications.

After apparently successfully installing cairo version 1.4.14, attempts to install py-cairo failed. It kept on insisting that only cairo 1.4.10 was available. Andrew tracked this down, and we're pretty convinced that the information the MacPorts cairo package supplies to pkg-config is wrong: it says version 1.4.10 right inside the 1.4.14 cairo.pc file. We edited this file directly and went on.
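
For anyone chasing the same kind of mismatch, here is a rough sketch (Python 3, standard library only) of the two checks involved: reading the "Version:" field out of a .pc file, and asking pkg-config what it reports. The helper names and the sample file contents are made up for illustration; they are not the real MacPorts cairo.pc.

```python
import re
import shutil
import subprocess


def pc_file_version(pc_text):
    """Return the value of the 'Version:' field from the text of a .pc file."""
    match = re.search(r"^Version:\s*(\S+)", pc_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def pkg_config_version(package):
    """Ask pkg-config which version it reports, or None if it cannot say."""
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--modversion", package],
        capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


# Hypothetical .pc contents that mimic the mismatch described above.
sample_pc = """\
prefix=/opt/local
Name: cairo
Description: Multi-platform 2D graphics library
Version: 1.4.10
"""

print(pc_file_version(sample_pc))   # 1.4.10
print(pkg_config_version("cairo"))  # depends on the local installation
```

If the two answers agree with each other but not with the version you installed, the .pc file itself is the likely culprit.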

Next we tried to install py-cairo using the regular "python setup.py install", but that failed -- it is not set up to work on Mac OS X and there were numerous errors. MacPorts rescued me again, thanks to a hint we ran across from KenKeiter -- this port package has a bunch of hacks (string replacements in directory paths, it looks like) to make py-cairo work on Mac.

Finally, back to installing pycha itself. This has a nice handy Python Egg available, so I installed "easy_install" to install that. I goofed and installed that in the default System version of Python, so while installing easy_install worked, installing the pycha egg failed. It turns out there is a MacPorts port of the same thing, called py-setuptools. Only you don't want that one, you want py25-setuptools if you have Python 2.5. I installed that.
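
A quick way to catch this kind of mix-up is to ask the interpreter which binary it is and which modules it can see. This is only a sketch, written for Python 3 (importlib.util did not exist in Python 2.5), and describe_interpreter is a made-up helper name.

```python
import importlib.util
import sys


def describe_interpreter(module_names):
    """Report which Python is running and which top-level modules it can find."""
    report = {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
    }
    for name in module_names:
        # find_spec returns None when a top-level module is not importable
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in describe_interpreter(["json", "pycha"]).items():
    print(key, value)
```

Running the same script under each installed Python shows at a glance which one actually has the package.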

OK, back to installing pycha for real again and it finally worked. One gotcha in following the main documentation page example, which is otherwise very nice -- I needed to "import pycha.bar" for the sample code to work (Update: Lorenzo Gil Sànchez fixed this already). And it did work.

Proof:

Tuesday, March 04, 2008

Mike from work pointed me to EpiSpider, which does dynamic graphs and uses in part an in-browser library for faceted browsing called Exhibit, from the SIMILE project. This project also has a beautiful Timeplot library, again in-browser.

The basic approach here is minimalist: all a site developer needs is a place to host Web pages, some of which are data files. No database, no content management system. Still, those have their places too, and I can already see how to make great use of Django features as well as Exhibit and Timeplot features, and I know databases. To tie these together, one just needs a bit of AJAX code to call a server-side process to pull the appropriate subset of data from the database in JSON or other simple format, and I know that exists too.
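
The server-side piece could look roughly like the sketch below. It uses sqlite3 from the Python 3 standard library as a stand-in for whatever database and framework would really be used, and the table, columns and numbers are all invented for illustration.

```python
import json
import sqlite3


def fetch_series_as_json(connection, region, start_year, end_year):
    """Return the matching rows as a JSON array of {year, cases} objects."""
    rows = connection.execute(
        "SELECT year, cases FROM observations "
        "WHERE region = ? AND year BETWEEN ? AND ? ORDER BY year",
        (region, start_year, end_year),
    ).fetchall()
    return json.dumps([{"year": year, "cases": cases} for year, cases in rows])


# Made-up table and data, held in memory so the example is self-contained.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE observations (region TEXT, year INTEGER, cases INTEGER)"
)
connection.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [("north", 2005, 12), ("north", 2006, 1207),
     ("south", 2006, 40), ("north", 2007, 9)],
)

print(fetch_series_as_json(connection, "north", 2006, 2007))
# [{"year": 2006, "cases": 1207}, {"year": 2007, "cases": 9}]
```

A view function in a Web framework would wrap something like this and return the string with a JSON content type; the in-browser code then only has to request the subset it wants to plot.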

I had been ready to try generation of static chart images in order to save time -- that was why I was trying to use R (also the data import, manipulation, analysis and export features could be useful). But ultimately good online charts need features like mousing over to see point values, scrolling, and zooming; static charts can't provide that but in-browser canvas code like that used in Timeplot can.

Sample here.

Update: removed annoying iframe

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.