Wednesday, October 30, 2013

Using DynamoDB, work in progress

At work we're using Amazon Web Services' DynamoDB for a backend.  This is early days and a work in progress, but I thought I'd post about what we're doing so far because I've seen so little elsewhere about it.

Our Web framework is Ruby on Rails.  Rails is a system that favours convention over configuration.  Most RoR developers use ActiveRecord, Rails' built-in system for object modeling and abstracting away SQL database access.  If you stay on the rails, this works fantastically.  Rails automates or partially automates many tasks and systems, from migrating your data when the model changes, to setting up unit tests that conveniently setup and instantiate the things you want to test.  Building on top of this, many Ruby gems extend your Rails functionality in powerful ways (Web UI test automation, authentication and user management, access to social network sites).

As soon as a project starts to diverge from Rails conventions, trouble begins.  Trouble may be contained if the difference can be contained and made as conformant as possible to the default components.  For example, when writing a API that serves RESTful JSON resources instead of HTML, it's best to figure out how to use views to serve the JSON in the same way that HTML views are generated (a topic of a few posts I did a couple years ago).

Which brings me to Dynamoid.  Amazon's Ruby gem for access to DynamoDB is very basic and exposes DynamoDB architecture directly.  That can be useful but it doesn't behave anything like ActiveRecord, and in order to use Rails' powerful tools and extensions, we need something that behaves as much like ActiveRecord as possible.  The only ActiveRecord replacement for DynamoDB that I could find, that was at all active, was Dynamoid.  So I'm pinning my hopes on it.  AFAICT so far, it is incomplete but has "good bones".  I've already fixed one tiny thing and submitted a pull request, and intend to continue contributing.

Next post will be about testing in this setup.

Monday, October 14, 2013

Correctness impedes expression

In kindergarten and grade one these days, teachers encourage kids to get their thoughts onto paper any old way.  They don't explain how to spell every word and they certainly don't stop kids and correct their spelling.  For a beginning writer, it will likely be all caps, and the teacher may not even suggest spaces between words.  Here's my first grader's recent work:

wanda-yi wan-t toa school 
and i wat to room 
four it was nis 
iasst the tishr wat 
harnamwas martha She 
was nisdat war day sH- 

This means "One day I went to a school and I went to room four.  It was nice.  I asked the teacher what her name was, Martha.  She was nice that(?) were day she (unfinished?)"   Don't you love the phonetic "wandayi" for "One day I" ?  I do.   Note that the letter "I" is used for long sounds like in "nice", because that makes sense before one learns that that sound can be spelled many ways including "ie", "i?e", "aye", or "y".

Okay, cuteness aside, I flashed to thinking about XCode while Martha explained why they teach writing this way:  it's hard enough for a kid, writing slowly and awkwardly, to get three words out onto paper, let alone a whole page of writing.  Many kids get intimidated by corrections and worry of mistakes.  Instead of answers, she gives them a whole bunch of resources: try sounding it out, think of a similar word, try looking somewhere else in your own writing, see if the word is somewhere else in the room.  Above all, she encourages practice and resourcefulness rather than perfection.

Unlike Martha, XCode is like the stereotypical teacher from 60 years ago who would stand over you constantly and warn if she even thinks you're about to make a mistake.  "That's wrong."  "No, it's still wrong".  "That's somewhat better, but still not good."  "Now that's right, but now this is wrong."

Maybe that's why I still use TextMate for Ruby.  If the code doesn't have the right syntax, I'll learn about it later.  (I write tests.)  But for getting an algorithm out of my head and onto the screen, I much prefer not to be corrected and warned constantly while I'm doing it.

Friday, October 04, 2013

AWS Persistence for Core Data

I like DynamoDB, and I like architecture that reduces the amount of backend engineering one needs to do in a company whose product is an app.  So I was quite interested to investigate AWS Persistence for Core Data (APCD, for lack of a better short name) in practice.

APCD Overview according to me

APCD is a framework you can install on an iOS app such that when the app wants to save an object to the cloud, it can set APCD to do so silently in the background.  Not only does it save the object to the cloud, but changes made in the cloud can be magically synched back to the app's Core Data.  There's a parallel framework for Android which is promising for supporting that platform with the same architecture.

On the server end, if the server needs to do some logic based on client data, the server can access the DynamoDB tables and view or modify objects created by the applications.  In theory one doesn't have to design a REST/other interface for synchronizing client data to the server or to other clients. That's a significant savings and acceleration of development, so we read up on APCD around the Web and implemented it.

While there were a bunch of minor problems that we could have overcome, the primarily one was: nowhere does Amazon seem to document how to architect to use AWS Persistence, or explain what is it for. In the main article linked above, the sample code and objects are "Checkin" and "Location".  But where's the context?  Are these Checkin and Location objects in the same table?  Is there one giant table for all data?  Does each client have its own private table for a total of N tables?  Or are there two tables? Or 2N?   It really helps if new technology documentation  includes some fully fleshed out applications to give context.  Full source code isn't even what I'm talking about, but at least tell us what the application does, why it's architected the way it is use the new technology, and some other examples of what the new technology is for.

What I think APCD is for

Well we recently put together a couple facts which suggest what APCD is for.

  • You can't have more than 256 tables in a DynamoDB for an account, even when using APCD.  This limitation is very relevant to architectural choices made with APCD.*
  • If an installed app has the key to access any part of a table, the app can access the whole table, all objects.  There's no object-level permissions yet, and because the app access the data on DynamoDB through APCD, the server can't intercede to add permissions checking.
All right, so that tells us we can't architect the application so that each app instance saves its own table separate from other apps' tables.  We run out of table space at 256 installed users if not sooner.  It also tells us that if apps are going to share larger tables, the information in those tables has to be public information.  

So that suggests to me that APCD is for apps to synchronize shared public data.  For example, an application that crowd-sources information on safe, clean public bathrooms.

How my sample app would work

The crowd-sourced bathroom app could have all the bathrooms' data objects in one big table, and each instance of the application can contribute a new bathroom data object or modify an existing one.  A server can access the bathrooms data too, so Web developers could build a Web front-end that interoperates smoothly as long as the data model is stable.  

Now to use the service, even if the whole dataset is too large to download and keep, an app could query for bathrooms within a range of X kilometers or in a city limit, and even synchronize local data for offline use.  When the app boots up it doesn't have to download local bathroom data if it has done so before, instead APCD is supposed to fill in new data objects matching the query, and update the client with changes. 

For security, we have to trust each app to identify users so we can identify and block simple bad actors (somebody using the app interface to insert false information), and we have to have some backup for dealing with the contingency where the app is completely hacked, its key is used to access the bathroom data, and somebody quite malicious destroys all the useful bathroom data.  

What we did

We ended up not using APCD because what we're building does not involve a shared public database. We have semi-private data objects shared among a small set of trusted users.  Doing that with APCDs limitations seemed too far off APCD's garden path of easy progress.

Is there a better way to use APCD? 

*   Yes, you can have the 256 table count lifted, but not by much.  Not, say, to 1 million. That's not how DynamoDB is architected to work well.

Blog Archive

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.