Monday, 3 October 2011

Mapping associations with CouchDB, Scala and Lift

After a successful start with CouchDB and Lift (see my previous post), I started to implement a richer domain model for Couch persistence, including associations. In this post I will share the approaches that I tried and some pitfalls and ideas I gathered in the process.
We will start with the simple User object we introduced in the previous post. To remind ourselves, here is the source code for the User object in Scala:
Listing 1: Simple Scala object representing the User
import net.liftweb.record.field.{DateTimeField, PasswordField, StringField}
import net.liftweb.couchdb.CouchRecord
import com.opencredo.nicheplatform.couchdb.CouchMetaRecord

class User extends CouchRecord[User] {

def meta = User
object username extends StringField(this, 20)
object fullName extends StringField(this, 100)
object birthDate extends DateTimeField(this)
}
object User extends User with CouchMetaRecord[User] {
}


Let's make our User object from the previous post more realistic by adding address details to it. There are different approaches we can take in modelling the User/Address association. We will take a look at three common approaches in this post, and evaluate their applicability to a document database and Lift's Record framework.

I) Single document approach
The first and simplest option is to add the address fields to the User object/document directly, and use them just like any other User field. To make the model cleaner, we can extract the address fields to a separate trait:
Listing 2: Address trait for modelling single document Couch mapping
trait Address[T <: CouchRecord[T]]  {
  this: T=>

  object addressLine1 extends StringField(this.asInstanceOf[T],100)
  object addressLine2 extends StringField(this.asInstanceOf[T],100)
  object addressLine3 extends StringField(this.asInstanceOf[T],100)
  object postcode extends StringField(this.asInstanceOf[T],10)
  object city extends StringField(this.asInstanceOf[T],100)
  object country extends StringField(this.asInstanceOf[T],100)
}

class User extends CouchRecord[User] with Address[User] {
  def meta = User

  object username extends StringField(this, 20)
  object fullName extends StringField(this,100)
  object birthDate extends DateTimeField(this)
}
object User extends User with CouchMetaRecord[User] {}

When we instantiate and persist this document, the resulting JSON document will look like this in CouchDB:
Listing 3: JSON representation of User and Address objects in the single document
{
   "_id": "706a63a5ae0adc3325347c8a1d039682",
   "_rev": "1-ee182844a6d03be7c547d0b57d5363f9",
   "addressLine1": "The Whitehall",
   "addressLine2": "220 Yew Tree Road",
   "addressLine3": "",
   "birthDate": "Wed, 7 Sep 2011 16:47:31 UTC",
   "city": "London",
   "country": "UK",
   "fullName": "Bob Bobic",
   "postcode": "WC1 4EQ",
   "type": "User",
   "username": "mrbob"
}

Our model is abstracted so that Address is modelled as a separate trait, but we still persist user data (name, username, birthDate) in the same Couch document as address data. No join operation is required, as all fields are available directly from the document.
The drawback of this approach is that there is no separation of data at the storage level, so if you would like to get all addresses stored in the database from this structure, you may have to do quite a lot of work yourself. In addition, you cannot easily model multiple addresses for a user using this approach - you would have to define the fields for all addresses you plan to store within the document.
Note that the property names (keys) in the document are sorted alphabetically, and there is no differentiation between user data and address data in the stored data structure. So the addressLine fields (address data) are followed by the birthDate field (user data), followed by city and country (address data again), which makes the context of the data in the document difficult to understand.
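To give a feel for the "work yourself" part, here is a minimal sketch in plain Scala of pulling the address fields back out of flat user documents. The documents are modelled as simple Maps, which is an assumption made to keep the example self-contained; it is not the CouchRecord API, and the names FlatAddressExtraction, addressOf and allAddresses are hypothetical. In practice the same filtering could probably be done on the server by a CouchDB view.

```scala
object FlatAddressExtraction {
  // Field names taken from the Address trait in Listing 2.
  val addressKeys: Set[String] =
    Set("addressLine1", "addressLine2", "addressLine3", "postcode", "city", "country")

  // A flat document is modelled as a plain Map (a stand-in, not CouchRecord).
  type Doc = Map[String, String]

  // Keep only the address-related keys of one document.
  def addressOf(doc: Doc): Doc =
    doc.filter { case (key, _) => addressKeys.contains(key) }

  // Collect the address part of every document whose type is "User".
  def allAddresses(docs: Seq[Doc]): Seq[Doc] =
    docs.filter(_.get("type").contains("User")).map(addressOf)
}
```

The list of address keys has to be maintained by hand here, because nothing in the stored document says which fields belong to the address.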

II) Using foreign key - relational approach
What about the relational approach? Can we store the address as a separate CouchDB document, and keep a "foreign key" reference in the User document? Of course we can. We won't have the built-in referential integrity checks of an RDBMS though, so the foreign keys will have to be managed within our application code.
Although this approach is more suitable for relational databases, sometimes it's required to keep data structures that have different lifecycles as separate documents.
The Address CouchRecord implementation would look like this:
Listing 4: Address object as separate CouchRecord
class Address extends CouchRecord[Address] {
def meta = Address

object addressLine1 extends StringField(this,100)
object addressLine2 extends StringField(this,100)
object addressLine3 extends StringField(this,100)
object postcode extends StringField(this,10)
object city extends StringField(this,100)
object country extends StringField(this,100)
}
object Address extends Address with CouchMetaRecord[Address] {
}

The reference to the address will be stored as a String key in the User class. We can add a function that will fetch the address based on the stored key, when required:
Listing 5: User object with addressKey field for storing the id of the referenced object
class User extends CouchRecord[User] {
  def meta = User

  object username extends StringField(this, 20)
  object fullName extends StringField(this, 100)
  object birthDate extends DateTimeField(this)

  // id of the referenced Address document
  object addressKey extends StringField(this, 100)

  def getAddress: Address = {
    val address = Address.fetch(addressKey.get).open_! // #1
    address
  }
}
object User extends User with CouchMetaRecord[User] {
}

We will have 2 CouchDB documents representing the User and Address objects in the database:
Listing 6: User and Address as separate JSON documents in CouchDB
{
"_id": "706a63a5ae0adc3325347c8a1d03927e",
"_rev": "1-652d878ac30d6f6bd0b90737135eb783",
"addressLine1": "The Whitehall",
"addressLine2": "220 Yew Tree Road",
"addressLine3": "",
"city": "London",
"country": "UK",
"postcode": "WC1 4EQ",
"type": "Address"
}
{
"_id": "706a63a5ae0adc3325347c8a1d039682",
"_rev": "1-ee182844a6d03be7c547d0b57d5363f9",
"addressKey": "706a63a5ae0adc3325347c8a1d03927e",
"birthDate": "Wed, 7 Sep 2011 16:47:31 UTC",
"fullName": "Bob Bobic",
"type": "User",
"username": "mrbob"
}

Note that the addressKey field value of the User document matches the _id of the Address document.
And that's how we can have a "foreign key" association in CouchDB with the Record framework. The approach is simple and clean.
Fetching the referenced document is not performed by the framework itself; instead, the developer needs to fetch it programmatically, as a separate CouchDB operation (line #1 in Listing 5). When using foreign key association mapping in a relational database (using Lift's Mapper framework or Java's JPA, for example), the framework automatically manages the associated object, so that developers do not have to think about it. Some may say that this is actually the more CouchDB way - considering only Documents and Views as your data structure - leaving relationships to relational databases!
The code in the last listing assumes that every user has an associated address document - the code can be improved to handle users without an address, incorrect reference keys, and even many-to-one relationships - but I'll leave that for you to practice.
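As a starting point for that exercise, here is a minimal sketch in plain Scala that separates the three outcomes: no address key at all, a key that resolves, and a key that points at nothing. It uses an in-memory Map as a stand-in for the database and standard Option/Either instead of Lift's Box, so UserDoc, AddressDoc, addresses and findAddress are all hypothetical names, not part of CouchRecord.

```scala
object SafeAddressLookup {
  case class AddressDoc(id: String, city: String)
  // The key is optional: a user may have no address at all.
  case class UserDoc(username: String, addressKey: Option[String])

  // In-memory stand-in for the Address documents stored in the database.
  val addresses: Map[String, AddressDoc] =
    Map("a1" -> AddressDoc("a1", "London"))

  // Right(None)    : the user has no address key
  // Right(Some(a)) : the key resolved to an address document
  // Left(message)  : the key is set but no such document exists
  def findAddress(user: UserDoc): Either[String, Option[AddressDoc]] =
    user.addressKey match {
      case None => Right(None)
      case Some(key) =>
        addresses.get(key) match {
          case Some(address) => Right(Some(address))
          case None          => Left("No address document with id " + key)
        }
    }
}
```

Returning a value for the dangling-key case, rather than throwing as open_! does when nothing was fetched, leaves it to the caller to decide whether a broken reference is an error or something to repair.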
Storing the foreign keys of the associated objects can sometimes be a desirable solution for modelling associations in a document database. For example, when the associated entities are managed separately (and can be queried separately), it makes sense to store them as separate documents to keep duplication to a minimum. One use case could be modelling orders and products. Another potential use case is when you need to model many-to-many relationships - although in that case you may be better off using a relational database to store your data.

III) Associations as Inline JSON Documents
While the referential approach may be good for an orders and products model, when we talk about users and addresses we're typically talking about a relationship owned by the user object, so we would like to keep the address data closer to the user (ideally in the same document).
We tried to have the address data fields as part of a single User document (in section I) - the challenge was to store multiple addresses for a user, and still keep the rich data structure, so that addresses "belong to" users.
The obvious solution is to store addresses as (an array of) JSON documents within the user document. That way we can have a separate data structure for addresses, and still store them within the parent user document. This approach is sometimes called inline document associations.
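To make the target document shape concrete, here is a minimal sketch in plain Scala of a user that owns a list of addresses and is rendered as one JSON document with an inline array. The JSON is assembled by hand only to keep the example dependency-free; the InlineAddresses object and its methods are hypothetical, and the escaping covers just backslashes and double quotes.

```scala
object InlineAddresses {
  case class Address(line1: String, postcode: String, city: String, country: String)
  // The user owns its addresses: they live inside the user value.
  case class User(username: String, addresses: List[Address])

  // Minimal escaping for this sketch: backslashes and double quotes only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Render one address as an inline JSON object.
  def addressJson(a: Address): String =
    List("line1" -> a.line1, "postcode" -> a.postcode, "city" -> a.city, "country" -> a.country)
      .map { case (key, value) => quote(key) + ":" + quote(value) }
      .mkString("{", ",", "}")

  // Render the user with its addresses as a JSON array inside the same document.
  def userJson(u: User): String =
    "{" + quote("username") + ":" + quote(u.username) + "," +
      quote("addresses") + ":" + u.addresses.map(addressJson).mkString("[", ",", "]") + "}"
}
```

A user with no addresses simply gets an empty array, so no fields need to be reserved in advance for a fixed number of addresses.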
The only problem with inline JSON documents is that they are not yet supported by Lift's CouchDB-Record framework. After having a look at the implementation available in the MongoDB-Record component, I decided to take inspiration from it and implement a CouchDB-Record version.
The first place to look was the JsonObject trait available in the MongoDB-Record component. This trait, together with the JsonObjectMeta class, provides convenient methods for extracting Scala objects from Lift's JObject instances and, conversely, for creating JObjects from existing Scala objects:
Listing 7: JsonObject trait for transformation between Scala objects and Lift's JObjects
trait JsonObject[BaseDocument] {
self: BaseDocument =>

def meta: JsonObjectMeta[BaseDocument]

// convert class to a json value
def asJObject()(implicit formats: Formats): JObject = meta.toJObject(this)

}

class JsonObjectMeta[BaseDocument](implicit mf: Manifest[BaseDocument]) {

import net.liftweb.json.Extraction._

// create an instance of BaseDocument from a JObject
def create(in: JObject)(implicit formats: Formats): BaseDocument =
extract(in)(formats, mf)

// convert class to a JObject
def toJObject(in: BaseDocument)(implicit formats: Formats): JObject =
decompose(in)(formats).asInstanceOf[JObject]
}

To avoid a MongoDB-Record dependency in a project that works with CouchDB (!), I simply repackaged this trait within my project code.
The next step was to implement JsonObjectField for CouchDB-Record, based on the Mongo implementation of the same class (net.liftweb.mongodb.record.field.JsonObjectField). By removing the Mongo-specific code, I had a new CouchDB-Record field in no time:
Listing 8: Implementation of JsonObjectField for JSON support in CouchRecord
abstract class JsonObjectField[OwnerType <: Record[OwnerType], JObjectType <: JsonObject[JObjectType]]
  (rec: OwnerType, valueMeta: JsonObjectMeta[JObjectType])
  extends Field[JObjectType, OwnerType] with MandatoryTypedField[JObjectType] {

  def owner = rec

  implicit val formats = DefaultFormats.lossless

  /**
   * Convert the field value to an XHTML representation
   */
  override def toForm: Box[NodeSeq] = Empty // TODO still missing

  /** Encode the field value into a JValue */
  def asJValue: JValue = {
    if (value == null) {
      return JNothing
    }
    value.asJObject
  }

  /*
   * Decode the JValue and set the field to the decoded value.
   * Returns Empty or Failure if the value could not be set
   */
  def setFromJValue(jvalue: JValue): Box[JObjectType] = jvalue match {
    case JNothing|JNull if optional_? => setBox(Empty)
    case o: JObject => setBox(tryo(valueMeta.create(o)))
    case other => setBox(FieldHelpers.expectedA("JObject", other))
  }

  def setFromAny(in: Any): Box[JObjectType] = in match {
    case value: JObjectType => setBox(Full(value))
    case Some(value: JObjectType) => setBox(Full(value))
    case Full(value: JObjectType) => setBox(Full(value))
    case (value: JObjectType) :: _ => setBox(Full(value))
    case s: String => setFromString(s)
    case Some(s: String) => setFromString(s)
    case Full(s: String) => setFromString(s)
    case null|None|Empty => setBox(defaultValueBox)
    case f: Failure => setBox(f)
    case o => setFromString(o.toString)
  }

  // parse String into a JObject
  def setFromString(in: String): Box[JObjectType] = tryo(JsonParser.parse(in)) match {
    case Full(jv: JValue) => setFromJValue(jv)
    case f: Failure => setBox(f)
    case other => setBox(Failure("Error parsing String into a JValue: "+in))
  }

  def asJs = asJValue match {
    case JNothing => JsNull
    case jv => JsRaw(Printer.compact(render(jv)))
  }
}

The custom JsonObjectField class is composed of traits from the Record framework, and the methods it defines are responsible for parsing and formatting the JSON value to and from various formats (Strings, HTML, JavaScript). It uses the JsonObject trait and the JsonObjectMeta class to convert the JSON document to and from Scala objects.
We can now use a Scala class that extends JsonObject to hold the address details, together with a companion JsonObjectMeta object:
Listing 9: Modelling Address as JsonObjectField
case class Address(line1: String, line2: String, line3: String, postcode: String, city: String, country: String) extends JsonObject[Address] {
def meta = Address
}
object Address extends JsonObjectMeta[Address]

The User class will now have an address as a JsonObjectField field:
class User extends CouchRecord[User] {
def meta = User

object username extends StringField(this, 20)
object fullName extends StringField(this,100)
object birthDate extends DateTimeField(this)

object homeAddress extends JsonObjectField(this, Address) {
def defaultValue = null.asInstanceOf[Address]
}

}
object User extends User with CouchMetaRecord[User] {}

The JSON representation of the persisted User will now look like this:
Listing 10: JSON representation of inline Couch document representing user
{
"_id": "706a63a5ae0adc3325347c8a1d039682",
"_rev": "1-ee182844a6d03be7c547d0b57d5363f9",
"birthDate": "Wed, 7 Sep 2011 16:47:31 UTC",
"fullName": "Bob Bobic",
"homeAddress": {
"line1": "The Whitehall",
"line2": "220 Yew Tree Road",
"line3": "",
"postcode": "WC1 4EQ",
"city": "London",
"country": "UK"
},
"type": "User",
"username": "mrbob"
}

The User and Address are now modelled as separate objects in CouchRecord, but they are stored together as one CouchDB document. The single CouchDB document represents the relationship between the user and addresses (user owns address), and the data is contextually split within the document by using the inline JSON representation of the address field.
This required some additional work, due to missing support in the CouchRecord framework - but it gave us probably the best representation of associated objects in both Lift and CouchDB.

Each of the approaches for mapping object associations with CouchDB and Lift that I described in this post can be the best option in different scenarios. Because of the specifics of the user/address object model, the inline JSON document solution was the best approach for my requirements. Support for JSON object fields is missing from the CouchRecord framework, so I had to implement it myself - which I eventually succeeded in doing, after a few hurdles.
Hopefully this post can help you model associations persisted in any document database in your code. And if you're using CouchDB and Scala/Lift, you can save yourself some time and learn from my experience.
