April 24, 2005

...Learn TDD with Codemanship

Object-Relational Mapping Patterns

One aspect of software architecture that almost invariably comes up when we're building business applications is how we store our nice, clean objects in horrible, dirty relational databases. Relational technology has a stranglehold on business applications that isn't about to be relinquished any time soon, and with all the useful data-related tools (like reporting frameworks and data mining tools) focused on SQL databases, businesses are going to need their object data stored in relational tables.

While we usually end up having to map our objects to tables, it's odd that we always seem to have to faff about on every project figuring out how we're going to do it. Surely there's some standard pattern for object-relational mapping that we can apply "out of the box" without the need to constantly reinvent the wheel?

Well, yes there is, and this is a very brief guide on how I (and many others) do it.

Mapping classes to tables

The first thing we need to figure out is how we will store objects of one specific class. The pattern for this is very simple: just create a table for that class, and a column for every attribute that has a multiplicity equal to 0..1 (we'll talk about mapping collections later).



There are a couple of things we need to think about to make this simple mapping work. First of all, in object oriented programming, every instance has a unique identity denoted by its memory address or hashcode - so to reference an object we need only point to its location in memory (hence the term "object pointer"). To make an application work in memory, we don't need to worry about the identity of each object - it's all taken care of by the runtime. But when we're referencing objects in another application's memory space - or in a file - that won't work because we don't have access to object's internal memory addresses, and therefore their identities. We need a persistent identity that is unique throughout all the applications (and files) that object can be referenced in. So, we need to add an extra attribute to store the persistent object identity. In a relational database, the most obvious equivalent of object identity is a primary key, which is the column where the identifier is stored.

The second thing we must note is the correlation between classes and tables, and objects and rows (or tuples, if we're being scientific about it):

  • Class = Table
  • Object = Row
  • Identity = Primary Key



A quick note about object identities. In this example, I've been literal and mapped the identity shown in the object diagram to the primary key column in the table. I don't recommend doing this. It's much safer to use an artificially generated identifier value - like an incremented integer value or GUID - to be sure of uniqueness. Also, if you're concermned about performance, indexed integer columns will yield faster queries, espeically when used in SQL joins. But I digress...

Mapping Simple Associations To Foreign Keys

How do we store a reference to another object? The simple pattern for this is to add a foreign key column that stores the identifier to the referenced object:



Again, note that I've used the identity from the object diagram. You would probably prefer to use an artifically generated value to ensure uniqueness.

Note also that I've used the role name homeAddress as the name of the foreign key column - a good practice is to keep your database schema as close to the object schema as you can, so it's easier to see what the mapping is. It may also be helpful in automating some of the mapping process - perhaps using reflection.

If the muliplicity of homeAddress was 0..1, then we could allow nulls in that column. (Just as we can allow null references in our code). As it's exactly 1, we should not allow null values in the foreign key column.

But what about the actual code needed to store and retrieve these related objects? We'll look at object-relational persistence code in an upcoming Blog entry :-)

Mapping simple bi-directional associations is easy using the same pattern. You just add a foreign key column to the other table, too, so table Address would get a column called customer that stores the identifier of the Customer at the other end of the association.

Mapping Un-Ordered Collections

This is where it starts to get a bit more complicated. We can't store a collection of objects in a single column and row (well, we could - but it would be a lot of hassle). How do we map the relationship between Customer and Account?



We add a foreign key column to the table that will store the objects in the collection - even if the association is unidirectional towards that class. Note the same mapping would work for Customer[1] <-> Account
  • because we already have the foreign key value to navigate back from an Account to its Customer.

    Mapping Ordered Collections

    Often, we are concerned that objects in a collection appear in a specific order. One aspect of object-relational mapping that's often overlooked is that we need to be able to store and retrieve collections in the right order.



    By adding an extra column to store the position in which the object appears in the collection, we can easily retrieve the collection in the original order.

    Mapping Inheritence & Polymorphic Collections

    Apart from the different way in which SQL databases handle object identity, another way in which they differ is their lack of built-in support for inheritence and polymorphism. Again, this is easily mapped:



    We can create a column for every attribute in the inheritence heirarchy, meaning that some rows will have redundant columns - but that's fine because if we know which concrete type we're wroking with, we can ignore the values in those columns. We then add an extra column to store information about the type of object stored in each row, which we can use when retrieving objects to know which type of object we should instantiate, and which attributes that specific type should have.

    Another strategy for doing this is to have a single table for the highest-level class in the inheritence heirarchy, and then adjoining tables for each subtype. You can then use SQL joins to piece each object back together - though this adds extra complexity to the code needed to manage polymorphic collections - collections where different subtypes can appear in the same collection (as in this example).

    Mapping Qualified Associations

    In code terms, a qualified association may be implemented using a HashTable or Dictionary. To store these kids of collections in a SQL database, you simply need to add a column for the key value (eg, if a Library has a HashTable of Book objects, we might use the ISBN as the key). You will often find that the key value is actually the value of an attribute of the objects in the collection (eg, a Book is very likely to have an ISBN attribute already), so it may be unnecessary to physically add this key column.

    Mapping Association Classes

    An association class is simply mapped as a separate table which points to the objects at either end of the association (in SQL, this is similar to how many-to-many relationships are implemented using linking tables - sometimes called "resolvers").

    Combinations Of Mapping

    It's common to see these different mapping strategies combined to implement more sophisticated relationships (eg, an ordered bi-directional polymorphic association!!!) Experiment and you should soon get the hang of applying these simple mapping patterns in wierd and wonderful combinations.

    Coming Soon - Object-Relational Coding Patterns

  • Posted 14 years, 8 months ago on April 24, 2005