Monday, 3 October 2011

Mapping associations with CouchDB, Scala and Lift

After a successful start with CouchDB and Lift (see my previous post), I started to implement a richer domain model for Couch persistence, including associations. In this post I will share the approaches that I tried and some pitfalls and ideas I gathered in the process.
We will start with the simple User object we introduced in the previous post. To remind ourselves, here is the source code for the User object in Scala:
Listing 1: Simple Scala object representing the User
import net.liftweb.record.field.{DateTimeField, PasswordField, StringField}
import net.liftweb.couchdb.CouchRecord
import com.opencredo.nicheplatform.couchdb.CouchMetaRecord

class User extends CouchRecord[User] {

  def meta = User
  object username extends StringField(this, 20)
  object fullName extends StringField(this, 100)
  object birthDate extends DateTimeField(this)
}
object User extends User with CouchMetaRecord[User] {
}


Let's make our User object from the previous post more realistic by adding address details to it. There are different approaches we can take in modelling the User/Address association. We will take a look at three common approaches in this post, and evaluate their applicability to a document database and Lift's Record framework.

I) Single document approach
The first and simplest option would be to add address fields to the User object/document directly, and use them just like any other User field. To make the model cleaner, we can extract the address fields to a separate trait:
Listing 2: Address trait for modelling single document Couch mapping
trait Address[T <: CouchRecord[T]]  {
  this: T=>

  object addressLine1 extends StringField(this.asInstanceOf[T],100)
  object addressLine2 extends StringField(this.asInstanceOf[T],100)
  object addressLine3 extends StringField(this.asInstanceOf[T],100)
  object postcode extends StringField(this.asInstanceOf[T],10)
  object city extends StringField(this.asInstanceOf[T],100)
  object country extends StringField(this.asInstanceOf[T],100)
}

class User extends CouchRecord[User] with Address[User] {
  def meta = User

  object username extends StringField(this, 20)
  object fullName extends StringField(this,100)
  object birthDate extends DateTimeField(this)
}
object User extends User with CouchMetaRecord[User] {}

When we instantiate and persist this document, the resulting JSON document will look like this in CouchDB:
Listing 3: JSON representation of User and Address objects in the single document
{
   "_id": "706a63a5ae0adc3325347c8a1d039682",
   "_rev": "1-ee182844a6d03be7c547d0b57d5363f9",
   "addressLine1": "The Whitehall",
   "addressLine2": "220 Yew Tree Road",
   "addressLine3": "",
   "birthDate": "Wed, 7 Sep 2011 16:47:31 UTC",
   "city": "London",
   "country": "UK",
   "fullName": "Bob Bobic",
   "postcode": "WC1 4EQ",
   "type": "User",
   "username": "mrbob"
}

Our model is abstracted so that Address is modelled as a separate trait, but we still persist user data (fullName, username, birthDate) in the same Couch document as address data. No join operation of any kind is required, as all fields are available directly from the document.
The drawback of this approach is that there is no separation of data at the storage level, so if you would like to get all addresses stored in the database from this structure, you may have to do quite a lot of work yourself. In addition, you cannot easily model multiple addresses for a user using this approach - you would have to define the fields for all addresses you plan to store within the document.
Note that the property names (keys) in the document are sorted alphabetically and there is no differentiation between user data and address data in the stored data structure. So the addressLine fields (address data) are followed by the birthDate field (user data), followed by city and country (address data again), making the context of the data in the document difficult to understand.

II) Using foreign key - relational approach
What about the relational approach? Can we store the address as a separate CouchDB document, and keep a "foreign key" reference in the User document? Of course we can. We won't have an RDBMS's built-in referential integrity checks though, so the foreign keys will have to be managed within our application code.
Although this approach is more suitable for relational databases, sometimes it's required to keep data structures that have different lifecycles as separate documents.
The Address CouchRecord implementation would look like this:
Listing 4: Address object as separate CouchRecord
class Address extends CouchRecord[Address] {
  def meta = Address

  object addressLine1 extends StringField(this, 100)
  object addressLine2 extends StringField(this, 100)
  object addressLine3 extends StringField(this, 100)
  object postcode extends StringField(this, 10)
  object city extends StringField(this, 100)
  object country extends StringField(this, 100)
}
object Address extends Address with CouchMetaRecord[Address] {
}

The reference to the address will be stored as a String key in the User class. We can add a function that fetches the address based on the stored key when required:
Listing 5: User object with an addressKey field for storing the id of the referenced object
class User extends CouchRecord[User] {
  def meta = User

  object username extends StringField(this, 20)
  object fullName extends StringField(this, 100)
  object birthDate extends DateTimeField(this)

  // id of the referenced Address document
  object addressKey extends StringField(this, 100)

  def getAddress: Address = {
    val address = Address.fetch(addressKey.get).open_! // #1
    address
  }
}
object User extends User with CouchMetaRecord[User] {
}

We will have two CouchDB documents representing the User and Address objects in the database:
Listing 6: User and Address as separate JSON documents in CouchDB
{
   "_id": "706a63a5ae0adc3325347c8a1d03927e",
   "_rev": "1-652d878ac30d6f6bd0b90737135eb783",
   "addressLine1": "The Whitehall",
   "addressLine2": "220 Yew Tree Road",
   "addressLine3": "",
   "city": "London",
   "country": "UK",
   "postcode": "WC1 4EQ",
   "type": "Address"
}
{
   "_id": "706a63a5ae0adc3325347c8a1d039682",
   "_rev": "1-ee182844a6d03be7c547d0b57d5363f9",
   "addressKey": "706a63a5ae0adc3325347c8a1d03927e",
   "birthDate": "Wed, 7 Sep 2011 16:47:31 UTC",
   "fullName": "Bob Bobic",
   "type": "User",
   "username": "mrbob"
}

Note that the addressKey field value of the User document matches the _id of the Address document.
And that's how we can have a "foreign key" association in CouchDB with the Record framework. The approach is simple and clean.
The framework does not fetch the referenced document itself; the developer needs to fetch it programmatically, as a separate CouchDB operation (line #1 in Listing 5). When using foreign key association mapping in a relational database (using Lift's Mapper framework or Java's JPA for example), the framework will automatically manage the association object, so that developers do not have to think about it. Some may say that this is actually more the CouchDB way - considering only Documents and Views as your data structure - leaving relationships to relational databases!
The code in the last listing assumes that every user must have an associated address document - the code can be improved to handle users without an address, incorrect reference keys, and even many-to-one relationships - but I'll leave that for you to practice.
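As a starting point for that exercise, here is a minimal sketch of a lookup that tolerates a missing or dangling key. It is plain Scala with no Lift dependency: the simplified User and Address case classes and the in-memory addresses map are hypothetical stand-ins for the CouchRecord classes and the database. In real Lift code the fetch call returns a Box, so the same idea would presumably be expressed by mapping over the Box rather than opening it.

```scala
object AddressLookup {
  // Simplified stand-ins for the CouchRecord classes; only the fields needed here.
  final case class Address(id: String, city: String)
  final case class User(username: String, addressKey: Option[String])

  // Hypothetical in-memory replacement for the CouchDB database, keyed by _id.
  val addresses: Map[String, Address] = Map(
    "706a63a5ae0adc3325347c8a1d03927e" ->
      Address("706a63a5ae0adc3325347c8a1d03927e", "London")
  )

  // Returns None when the user has no key, the key is blank,
  // or no document is stored under that key.
  def findAddress(user: User): Option[Address] =
    user.addressKey.map(_.trim).filter(_.nonEmpty).flatMap(addresses.get)
}
```

The caller then decides what a missing address means (show nothing, log a broken reference, and so on) instead of getting an exception from open_!.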
Storing the foreign keys of the associated objects can sometimes be a desirable solution for modelling associations in a document database. For example, when the associated entities are managed separately (and can be queried separately), it makes sense to store them as separate documents to keep the amount of duplication to a minimum. One use case could be modelling orders and products. Another potential use case is when you need to model many-to-many relationships - although in that case you may be better off using a relational database to store your data.

III) Associations as Inline JSON Documents
While the referential approach may be good for an orders and products model, when we talk about users and addresses we're typically talking about a relationship owned by the user object, so we would like to keep the address data closer to the user (ideally in the same document).
We tried to have address data fields as part of single User documents (in section I) - the challenge was to store multiple addresses for a user, and still keep the rich data structure, so that addresses "belong to" users.
The obvious solution is to store addresses as (an array of) JSON documents within the user document. That way we can have a separate data structure for addresses, and still store them within the parent user document. This approach is sometimes called inline document associations.
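To make the target shape concrete, here is a minimal sketch in plain Scala of a user that owns a list of addresses and is rendered as one JSON document. The case classes, the reduced set of fields and the hand-written rendering are illustration-only assumptions; real code would use a JSON library such as lift-json rather than string concatenation.

```scala
object InlineAddresses {
  // Reduced, hypothetical versions of the model: just enough fields to show nesting.
  final case class Address(line1: String, postcode: String, city: String)
  final case class User(username: String, addresses: List[Address])

  // Minimal escaping for this sketch: backslashes and double quotes only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def addressJson(a: Address): String =
    s"""{"line1":${quote(a.line1)},"postcode":${quote(a.postcode)},"city":${quote(a.city)}}"""

  // Addresses are embedded as a JSON array inside the parent user document.
  def userJson(u: User): String = {
    val addresses = u.addresses.map(addressJson).mkString("[", ",", "]")
    s"""{"username":${quote(u.username)},"addresses":$addresses}"""
  }
}
```

A user with no addresses simply gets an empty array, so the "belongs to" relationship is visible in the document structure itself.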
The only problem with inline JSON documents is that they are not yet supported by Lift's CouchDB-Record framework. After having a look at the implementation available in the MongoDB-Record component, I decided to take inspiration from it and implement a CouchDB-Record version.
The first place to look was the JsonObject trait available in the MongoDB-Record component. Together with the JsonObjectMeta class, it provides convenient methods for extracting Scala objects from Lift's JObject instances and, vice versa, for creating JObjects from existing Scala objects:
Listing 7: JsonObject trait for transformation between JSON documents and Scala's JObjects
trait JsonObject[BaseDocument] {
  self: BaseDocument =>

  def meta: JsonObjectMeta[BaseDocument]

  // convert class to a json value
  def asJObject()(implicit formats: Formats): JObject = meta.toJObject(this)

}

class JsonObjectMeta[BaseDocument](implicit mf: Manifest[BaseDocument]) {

  import net.liftweb.json.Extraction._

  // create an instance of BaseDocument from a JObject
  def create(in: JObject)(implicit formats: Formats): BaseDocument =
    extract(in)(formats, mf)

  // convert class to a JObject
  def toJObject(in: BaseDocument)(implicit formats: Formats): JObject =
    decompose(in)(formats).asInstanceOf[JObject]
}

To avoid a MongoDB-Record dependency in a project that works with CouchDB (!), I simply repackaged this trait within my project code.
Next step was to implement JsonObjectField for CouchDB-Record, based on the Mongo implementation of the same class (net.liftweb.mongo.record.field.JsonObjectField). By removing the Mongo-specific code, I had a new CouchDB-Record field in no time:
Listing 8: Implementation of JsonObjectField for JSON support in CouchRecord
abstract class JsonObjectField[OwnerType <: Record[OwnerType], JObjectType <: JsonObject[JObjectType]](
    rec: OwnerType, valueMeta: JsonObjectMeta[JObjectType])
  extends Field[JObjectType, OwnerType] with MandatoryTypedField[JObjectType] {

  def owner = rec

  implicit val formats = DefaultFormats.lossless

  /**
   * Convert the field value to an XHTML representation
   */
  override def toForm: Box[NodeSeq] = Empty // TODO still missing

  /** Encode the field value into a JValue */
  def asJValue: JValue = {
    if (value == null) {
      return JNothing
    }
    value.asJObject
  }

  /*
   * Decode the JValue and set the field to the decoded value.
   * Returns Empty or Failure if the value could not be set
   */
  def setFromJValue(jvalue: JValue): Box[JObjectType] = jvalue match {
    case JNothing|JNull if optional_? => setBox(Empty)
    case o: JObject => setBox(tryo(valueMeta.create(o)))
    case other => setBox(FieldHelpers.expectedA("JObject", other))
  }

  def setFromAny(in: Any): Box[JObjectType] = in match {
    case value: JObjectType => setBox(Full(value))
    case Some(value: JObjectType) => setBox(Full(value))
    case Full(value: JObjectType) => setBox(Full(value))
    case (value: JObjectType) :: _ => setBox(Full(value))
    case s: String => setFromString(s)
    case Some(s: String) => setFromString(s)
    case Full(s: String) => setFromString(s)
    case null|None|Empty => setBox(defaultValueBox)
    case f: Failure => setBox(f)
    case o => setFromString(o.toString)
  }

  // parse String into a JObject
  def setFromString(in: String): Box[JObjectType] = tryo(JsonParser.parse(in)) match {
    case Full(jv: JValue) => setFromJValue(jv)
    case f: Failure => setBox(f)
    case other => setBox(Failure("Error parsing String into a JValue: "+in))
  }

  def asJs = asJValue match {
    case JNothing => JsNull
    case jv => JsRaw(Printer.compact(render(jv)))
  }
}

The custom JsonObjectField class is composed from traits from the Record framework, and the methods defined are responsible for parsing and formatting the JSON value to and from various formats (Strings, HTML, JavaScript). It uses the JsonObject trait and the JsonObjectMeta class to convert the JSON document to and from Scala objects.
We can now use a Scala class that extends JsonObject to hold the address details, with a companion JsonObjectMeta object:
Listing 9: Modelling Address as JsonObjectField
case class Address(line1: String, line2: String, line3: String, postcode: String, city: String, country: String) extends JsonObject[Address] {
def meta = Address
}
object Address extends JsonObjectMeta[Address]

The User class will now have an address as a JsonObjectField field:
class User extends CouchRecord[User] {
  def meta = User

  object username extends StringField(this, 20)
  object fullName extends StringField(this, 100)
  object birthDate extends DateTimeField(this)

  object homeAddress extends JsonObjectField(this, Address) {
    def defaultValue = null.asInstanceOf[Address]
  }

}
object User extends User with CouchMetaRecord[User] {}

The JSON representation of the persisted User will now look like this:
Listing 10: JSON representation of inline Couch document representing user
{
   "_id": "706a63a5ae0adc3325347c8a1d039682",
   "_rev": "1-ee182844a6d03be7c547d0b57d5363f9",
   "birthDate": "Wed, 7 Sep 2011 16:47:31 UTC",
   "fullName": "Bob Bobic",
   "homeAddress": {
      "line1": "The Whitehall",
      "line2": "220 Yew Tree Road",
      "line3": "",
      "postcode": "WC1 4EQ",
      "city": "London",
      "country": "UK"
   },
   "type": "User",
   "username": "mrbob"
}

The User and Address are now modelled as separate objects in CouchRecord, but they are stored together as one CouchDB document. The single CouchDB document represents the relationship between the user and addresses (user owns address), and the data is contextually split within the document by using the inline JSON representation of the address field.
This required some additional work, due to missing support in the CouchRecord framework - but it gave us probably the best representation of associated objects in both Lift and CouchDB.

Each of the approaches for mapping object associations on CouchDB and Lift that I described in this post can be the best option in different scenarios. Because of the specifics of the user/address object model, the inline JSON document solution was the best approach for my requirements. Support for JSON object fields is missing from the CouchRecord framework, so I had to implement it myself - which I eventually succeeded in doing, after a few hurdles.
Hopefully this post can help you model associations persisted in any document database in your code. And if you're using CouchDB and Scala/Lift, you can save yourself some time and learn from my experience.

Thursday, 8 September 2011

Starting with Lift and CouchDB: Resolving Some CouchDB-Record Inconsistencies

In recent months I have been introduced to the Lift web framework by my Scala-enthusiastic colleagues @ OpenCredo. Scala's expressiveness and conciseness, along with the view-first web approach, looked very interesting from the start, especially since a great part of my previous experience comes from verbose Java frameworks with traditional MVC architecture.
After initial interest and research I did my first project, using a relational database and the well-known Lift-Mapper framework. Much of the data we needed to store represented documents, so I started experimenting with CouchDB. While the REST interface to Documents and Views in Couch does not require a rich mapping layer (as opposed to the richness of SQL ORM frameworks), I still found it cumbersome to work directly with HTTP all the time in order to store/query my model objects to/from CouchDB.
One of the nice features of Lift's Record framework is its agnostic approach to the underlying persistence mechanism, which means that it is relatively easy to plug in any storage solution. Currently the Lift-Record framework has a Record implementation for relational databases (Squeryl), as well as Mongo-Record and CouchDB-Record implementations, which gained prominence with the recent increase in interest in NoSQL storage. While the CouchDB-Record implementation is not complete, it has enough features to make it useful when working with CouchDB and Lift. Using CouchDB-Record I could easily map my domain model to the underlying database, and abstract the HTTP layer when I wanted to. The CouchDB-Record implementation did not offer everything that you might expect from a typical O(R)M framework - specifically, features such as association mapping, lazy loading and caching are missing. However, these features weren't required for this specific project, and some would argue that they are not even features that a non-relational store should offer (there is no R in ORM if using CouchDB).
The first thing to do when using CouchDB with Lift is to add the database configuration to Lift's Boot class. To do this, you need only three lines of code, as Listing 1 illustrates:
Listing 1:
val couch = new Database("127.0.0.1",5984, "lift-test")
couch.createIfNotCreated(new Http())
CouchDB.defaultDatabase = couch

To map a model object to CouchDB persistence, you'll need to extend the CouchRecord class, which is part of the couchdb-record framework. In addition, you define its companion object with the same name, which mixes in the CouchMetaRecord trait.
Listing 2 shows the sample User class mapped to the CouchDB database using Lift's Record framework:
Listing 2:
import net.liftweb.record.field.{DateTimeField, PasswordField, StringField}
import net.liftweb.couchdb.CouchRecord
import com.opencredo.nicheplatform.couchdb.CouchMetaRecord
class User extends CouchRecord[User] {
  def meta = User

  object username extends StringField(this, 20)
  object fullName extends StringField(this, 100)
  object birthDate extends DateTimeField(this)
}
object User extends User with CouchMetaRecord[User] {
}

We demonstrated two different field types here: StringField and DateTimeField. For a full list of supported field types in the couchdb-record implementation, you can check the online documentation at http://exploring.liftweb.net/master/index-8.html.
OK, let's persist a user now. To check that the user is actually persisted we're going to write a test, as illustrated in Listing 3:
Listing 3:
@RunWith(classOf[JUnitRunner])
class UserTest extends FunSuite with LazyLoggable {
  // reset database and boot lift
  if (!Boot.initialized_?) {
    try{
      Http(new Database("127.0.0.1", 5984, "lift-test").delete)
    }catch {
      case e : Exception => Console.print(e)
    }
    val boot: Boot = new Boot()
    boot.boot
  }

 test("Create a new User") {
   val userRecord = User.createRecord // #1
   userRecord.username.set("mrbob") // #2
   userRecord.fullName.set("Bob Bobic") // #3
   val user: Box[User] = userRecord.save // #4
   assert(user.isDefined, user.compoundFailMsg("Failure")) // #5
 }
}


To create a persistable CouchDB-backed document, we simply call the createRecord function on our User object (#1). After we set the properties we require (#2, #3), we save the created object (#4).
The save function persists the object (by issuing a POST request to the CouchDB server) and returns a Box. The Box will be full if the operation succeeded, in which case it will contain the persisted object - with the id and rev fields assigned by the CouchDB server.
If the POST operation fails to create the new document, the Box will not be full; instead it will be a Failure carrying the error message (#5).

The code looks nice, so we run the test, excited about our first CouchDB-persisted object - but the test FAILS miserably!!! What could be wrong?
The test fails at the assertion point (so at least we did the configuration right), and the message is as follows:
Failure(User not defined,Empty,Full(Failure(ok not present in reply or not true: JObject(List(JField(ok,JBool(true)), JField(id,JString(706a63a5ae0adc3325347c8a1d020160)), JField(rev,JString(1-8eaacb2ca383a965037251676b25dbd6)))),Empty,Empty)))

What does this mean? It says "ok not present in reply or not true", but the JSON object referenced in the rest of the message clearly has the ok field set to true (JField(ok,JBool(true))).
I found this confusing.
What was even more confusing was that the object was actually saved to CouchDB successfully, as I could find it using the Futon web interface.
The fact that the object was actually persisted, together with the error message, suggested that there was a bug in the response parsing code within Lift. I wasn't that familiar with the Lift code base, so only after numerous tries did I find a thread that suggested someone else was having a similar problem (http://groups.google.com/group/liftweb/browse_thread/thread/15d17b9348d4d88e#).
What was the problem? Unfortunately, the CouchDB-Record implementation didn't keep up with the work done on other parts of the Lift framework. The culprit is the following code in the CouchDB-Record implementation (net.liftweb.couchdb.Database.scala file):
Listing 4:
private[couchdb] object DatabaseHelpers {
  /** Handles the JSON result of an update action by parsing out id and rev, updating the given original object with the values and returning it */
  def handleUpdateResult(original: JObject)(json: JValue): Box[JObject] =
    for {
      obj <- Full(json).asA[JObject] ?~ ("update result is not a JObject: " + json)
      ok  <- Full(json \ "ok" ).asA[JField].map(_.value).asA[JBool].filter(_.value) ?~ ("ok not present in reply or not true: "+json)
      id  <- Full(json \ "id" ).asA[JField].map(_.value).asA[JString].map(_.s)    ?~ ("id not present or not a string: " + json)
      rev <- Full(json \ "rev").asA[JField].map(_.value).asA[JString].map(_.s)    ?~ ("rev not present or not a string: " + json)
    } yield updateIdAndRev(original, id, rev)
}

The result of json \ "ok" (and likewise of the "id" and "rev" lookups) used to be an instance of JField, which the code above unwraps to the concrete JBool/JString instance in order to check the response of the CouchDB update operation. This method is called whenever a document is created or updated in CouchDB.
However, changes to JSON parsing since liftweb 2.2 mean that the \ operator on a JValue object now returns the JBool/JString objects directly. The asA[JField] step therefore no longer matches, and the check fails even though the reply contains ok set to true.
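The mismatch can be reproduced with a deliberately tiny model. The sketch below is plain Scala and does not use lift-json: the JValue classes and the lookupField/lookupValue helpers are hypothetical simplifications that only mimic the old behaviour (lookup returns the field) and the new behaviour (lookup returns the value), as described above.

```scala
object LookupModel {
  // A deliberately tiny JSON model; these are NOT lift-json's real classes.
  sealed trait JValue
  case object JNothing extends JValue
  final case class JBool(value: Boolean) extends JValue
  final case class JString(s: String) extends JValue
  final case class JField(name: String, value: JValue) extends JValue
  final case class JObject(fields: List[JField]) extends JValue

  // Old-style lookup: hands back the whole field (name and value).
  def lookupField(json: JValue, name: String): JValue = json match {
    case JObject(fields) => fields.find(_.name == name).getOrElse(JNothing)
    case _               => JNothing
  }

  // New-style lookup: hands back only the value of the field.
  def lookupValue(json: JValue, name: String): JValue = json match {
    case JObject(fields) => fields.find(_.name == name).map(_.value).getOrElse(JNothing)
    case _               => JNothing
  }

  // The check written for the old behaviour: unwrap a JField, then expect a true JBool.
  def okExpectingField(v: JValue): Boolean = v match {
    case JField(_, JBool(true)) => true
    case _                      => false
  }

  // The corrected check: expect the JBool directly.
  def okExpectingValue(v: JValue): Boolean = v match {
    case JBool(true) => true
    case _           => false
  }

  // A reply shaped like the one in the failure message.
  val reply: JValue = JObject(List(
    JField("ok", JBool(true)),
    JField("id", JString("706a63a5ae0adc3325347c8a1d020160"))
  ))
}
```

In this model, the old check combined with the new lookup rejects a perfectly good reply, which is the same symptom as the "ok not present in reply or not true" failure.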
The correct code should look like this:
Listing 5:
private[couchdb] object DatabaseHelpers {
  /** Handles the JSON result of an update action by parsing out id and rev, updating the given original object with the values and returning it */
  /** This is a fix as the original code still depends on old JSON library -- Aleksa**/
  def handleUpdateResult(original: JObject)(json: JValue): Box[JObject] =
    for {
      obj <- Full(json).asA[JObject] ?~ ("update result is not a JObject: " + json)
      ok  <- Full(json \ "ok" ).asA[JBool].filter(_.value) ?~ ("ok not present in reply or not true: "+json)
      id  <- Full(json \ "id" ).asA[JString].map(_.s)    ?~ ("id not present or not a string: " + json)
      rev <- Full(json \ "rev").asA[JString].map(_.s)    ?~ ("rev not present or not a string: " + json)
    } yield updateIdAndRev(original, id, rev)
}

Unfortunately, this change hasn't propagated to the CouchDB-Record code, and at the time of writing there is an open ticket for this bug (http://www.assembla.com/spaces/liftweb/tickets/961-saving-to-couchdb-doesn-t-fetch-the-document--id--property).
Since DatabaseHelpers is a private Scala object, it cannot be extended - so you have two options:
1. Check out liftweb, fix the source, build it and use the custom-built artefacts in your project. Of course, you'll lose the ability to follow future releases of liftweb until the bug is fixed - but hopefully that won't be for too long.
2. Override the methods that depend on it, by injecting the correct implementation.
You can easily override the post method on the Database class, which is used by CouchRecord, which all of your CouchDB-mapped objects extend:
Listing 6:
val couch = new Database("127.0.0.1",5984, "lift-test") {
      override def post(doc: JObject): Handler[Box[JObject]] = {
        (JSONRequest(this) <<# doc) ># handleUpdateResultCorrected(doc) _
      }
      def handleUpdateResultCorrected(original: JObject)(json: JValue): Box[JObject] = {
        //bug-free implementation, as above
      }
}
couch.createIfNotCreated(new Http())
CouchDB.defaultDatabase = couch

This solves the problem of storing new documents to CouchDB - however updates are not handled by this overridden method.
Updates are defined in the Document trait buried within Database.scala, and this trait is then extended and used for composition quite a few times - not easy to follow, let alone to figure out a clean way to override its behaviour.
You could argue that this is not a significant problem for updates: as you know the document's id and revision values up front, you do not need to confirm them from the response when it succeeds. However, you still have incorrect behaviour, and that is something I cannot live with :)
Rather than spending too much time figuring out how to cleanly override the post method in the Document trait, I went down the hard route: I checked out the code, fixed the bug, and from there on used my own custom build, lift-couchdb_2.9.0-2.4M3-Aleksa.jar

Now that we have basic tests passing, let's see how we can load the persisted object from CouchDB using its id.
We will continue working on the same test:
Listing 7:
test("Create a new User") {
   val userRecord = User.createRecord
   userRecord.username.set("mrbob")
   userRecord.fullName.set("Bob Bobic")
   val user: Box[User] = userRecord.save

   assert(user.isDefined, user.compoundFailMsg("User not defined"))

   expect("mrbob"){
      user.open_!.username.get
   }
   assert(!user.open_!.id.get.get.isEmpty) // #1
   assert(!user.open_!.rev.get.get.isEmpty) // #2

   val loadedUser: Box[User] = User.fetch(user.open_!.id.get.get) // #3
   assert(loadedUser.isDefined, loadedUser.compoundFailMsg("User not defined"))
   expect("mrbob"){
      loadedUser.open_!.username.get
   }
   expect("Bob Bobic"){
      loadedUser.open_!.fullName.get
   }
   expect(user.open_!.rev.get){
      loadedUser.open_!.rev.get
   }
}


First, we check that the id and revision fields are now populated as expected (#1, #2).
And then we load the user by its existing id using the fetch method (#3) - it's that simple!
The assertions that follow prove that we loaded all user properties, and that we loaded the document with the latest (expected) revision.
NOTE: Because the CouchRecord's id field is optional, the get function returns the Option object, so we need another get to get to the actual id.
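Outside of tests, chaining get calls is risky, because calling get on an empty Option throws an exception. Here is a minimal sketch of reading both optional values safely, in plain Scala; the SavedUser case class is a hypothetical stand-in that models the optional id and rev fields as plain Options and is not part of Lift.

```scala
object IdAccess {
  // Simplified stand-in for a saved record: the optional fields are plain Options here.
  final case class SavedUser(id: Option[String], rev: Option[String])

  // Combine both optional fields without calling get on either of them.
  // The result is None as soon as one of the two is missing.
  def idAndRev(user: SavedUser): Option[(String, String)] =
    for {
      id  <- user.id
      rev <- user.rev
    } yield (id, rev)
}
```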

You can use the save function to update the existing record as well. The document to be updated must have id and rev fields set.
Listing 8:
val u2: User = loadedUser.open_!
u2.username.set("bobby")
val updatedUser: Box[User] = u2.save
expect("bobby") {
  loadedUser.open_!.username.get
}
expect(loadedUser.open_!.id.get) {
  updatedUser.open_!.id.get
}
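The reason the document must carry its rev is that CouchDB only accepts an update made against the current revision and rejects a stale one as a conflict. The sketch below mimics that rule with a hypothetical in-memory Store in plain Scala; revisions are simplified to integers (real CouchDB revisions are strings such as the _rev values shown earlier), and none of these names come from Lift.

```scala
object RevisionCheck {
  // Simplified document: rev is an Int here purely for illustration.
  final case class Doc(id: String, rev: Int, username: String)

  // Hypothetical in-memory store that mimics the rule:
  // an update must carry the revision that is currently stored.
  final class Store {
    private var docs = Map.empty[String, Doc]

    def create(id: String, username: String): Doc = {
      val doc = Doc(id, 1, username)
      docs = docs.updated(id, doc)
      doc
    }

    def update(doc: Doc): Either[String, Doc] =
      docs.get(doc.id) match {
        case Some(current) if current.rev == doc.rev =>
          // Revision matches: store the change under the next revision.
          val next = doc.copy(rev = doc.rev + 1)
          docs = docs.updated(doc.id, next)
          Right(next)
        case Some(_) => Left("conflict")
        case None    => Left("not found")
      }
  }
}
```

This is also why it is worth keeping the record returned by save: it carries the new revision needed for the next update.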


And that's it. Even with a few challenges from Lift's CouchDB implementation, persisting and fetching Scala objects to/from CouchDB worked nicely using Lift's Record framework.

I will be talking about my experiences on Lift/Scala-CouchDB integration at the NoSQL Exchange at SkillsMatter in London on November 2nd 2011.


Friday, 2 September 2011

NoSQL Exchange - SkillsMatter, London


This November SkillsMatter are organizing the NoSQL Exchange conference in London.
It is a full day of in-depth talks, with industry experts discussing the latest developments in the most popular NoSQL storage solutions, and community members sharing their practical NoSQL experiences.
Topics cover all the main non-relational storage solutions - Neo4j, Mongo, Riak, Couch, Cassandra - and polyglot persistence for mixing everything together.
I will be delivering a talk about recent experiences in migrating a web-based application written in Scala/Lift from an RDBMS to CouchDB, using Lift's Record framework.
WHERE: The Skills Matter eXchange, London, EC1V 7DP 
WHEN: Nov 2 2011 , 9:30AM