Monday, May 05, 2008

I'm getting into geographic distributions and shapefiles now for the health data accessibility project. Ekr and Cullen got the basic R and maptools extension commands working for me. I tweaked them a bit and now have thumbnails and detail views (shown here, deaths attributed to HIV in white males):



  



One thing I've learned about freely available shapefiles is that they have a lot of crap in them. Crap is, of course, in the eye of the beholder. In my case, I want only the shapes of the regions, not the counts and rates attached to every region: population, households, whites, blacks, males, females, density, age distributions, divorced, married, never married, single-parent households, household units, vacancies, mobile homes, farms, crop acres and more.

I can probably get a shapefile editor and remove that stuff, but it will only slim down the files slightly. Even bigger than all that attribute data are the latitude and longitude of every vertex required to draw a state outline. Think coastline and all those islands in Alaska, Puget Sound, etc. Since my map sizes are limited anyway, I wonder:
  • how many vertices I could afford to lose without losing any visible detail at the larger (detail view) size, and how I could find a shapefile appropriate for that
  • how much smaller the resulting shapefiles would be
  • whether that would be any faster anyway
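
To get a feel for the first two questions, here is a minimal sketch of one common way to drop vertices, the Douglas-Peucker line simplification. It is not something from maptools or from the code I'm running; the function names and the sample outline are made up for illustration, and it assumes longitude/latitude can be treated as plain x/y, which is only reasonable for small tolerances on small maps.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, treating lon/lat as planar x/y."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment (e.g. a closed ring whose ends coincide).
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Douglas-Peucker: keep the endpoints, and keep the farthest interior
    point (recursing on both halves) only if it is farther than tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point

# Hypothetical outline: small wiggles plus one real feature at (5, 3).
outline = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 0), (5, 3), (6, 0)]
thinned = simplify(outline, 0.1)
print(len(outline), "->", len(thinned), "vertices")
```

The tolerance would be picked from the map size: anything smaller than roughly one pixel at the detail-view scale should be invisible. The recursion can get deep on very long coastlines, so a real version might need an iterative rewrite.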
Performance is a problem right now! Caching the drawn maps is trivial, but drawing is slow enough to hold up development at the times when I don't want cached output.

1 comment:

Anonymous said...

I don't know what format you have to work with, but gpsbabel can apply filters to 'track' files that remove proximal points and even just thin / prune them. You might be able to use it to reduce the complexity of your shapefiles.
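
As a rough illustration of the "remove proximal points" idea, and not of how gpsbabel itself implements it, a sketch might look like the following. The function name and the threshold are hypothetical, and distances are measured in raw degrees as if they were planar units, which is an assumption that only holds loosely.

```python
import math

def thin_by_distance(points, min_distance):
    """Keep a vertex only if it lies at least min_distance away from the
    last vertex that was kept. The first and last vertices always survive."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:-1]:
        last = kept[-1]
        if math.hypot(p[0] - last[0], p[1] - last[1]) >= min_distance:
            kept.append(p)
    if len(points) > 1:
        kept.append(points[-1])  # keep the final vertex so closed rings stay closed
    return kept

# Hypothetical track with two clusters of nearly coincident points.
track = [(0, 0), (0.1, 0), (0.2, 0), (1, 0), (1.05, 0), (2, 0)]
print(thin_by_distance(track, 0.5))
```

This is cruder than a shape-aware simplification because it ignores how much each dropped point contributed to the outline, but it is a single pass and easy to reason about.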

This work is licensed under a Creative Commons Attribution 3.0 Unported License.