Continuing on in my series on MongoDB and Python, this article delves into the object-document mapper (ODM) included with the Python MongoDB toolkit Ming. If you're just getting started with MongoDB, or with Ming, you might want to read the previous articles in the series first:
- Getting Started with MongoDB and Python
- Moving Along With PyMongo
- GridFS: The MongoDB Filesystem
- Aggregation in MongoDB (Part 1)
- MongoDB's New Aggregation Framework
- Schema Maintenance with Ming and MongoDB
- Declarative Schemas for MongoDB in Python Using Ming
And now that you're all caught up, let's take a look at what makes up an object document mapper, anyway...
But wait! Didn't I already describe an object-document mapper in previous Ming posts? Well, yes and no. While it's true that the 'base' level of Ming allows you to map documents from MongoDB into instances of Python classes, those classes are really just slightly glorified dicts. Some of the things that I expect in a real ODM are missing, however:
- Automatic persistence of updated documents - If you change an instance of ming.Document, it's up to you to make sure that change gets persisted back to MongoDB. This leads to more verbose code and generally more errors, as it's easy to forget to .m.save() a document when you're done.
- An Identity Map - If you're working with several documents at once, you can get into a situation where you have two documents in memory that both represent the same document in MongoDB. This can cause consistency problems, particularly if the documents are both modified, but in different ways.
- A Unit of Work - When you're doing several updates, it's nice to be able to "batch up" the updates and flush them to MongoDB all at once, particularly in a NoSQL database like MongoDB which has no multi-document transactions. You don't get true transactional behavior with a unit of work, but you can at least choose to skip the flush step and the database doesn't change.
- Support for Relationships Between Documents - Maybe this is just my relational background showing through, but I like to be able to construct object graphs in RAM that aren't necessarily represented by embedded documents in MongoDB.
So, continuing in Ming's homage to SQLAlchemy, there is a second layer to Ming that provides all these features: the ming.odm package.
Defining an ODM model
Since everyone seems to like building either a blog or a wiki for their examples, we'll take a similar approach here, but with a twist. Suppose we're building a blogging platform that supports multiple sites, each of which can contain multiple blogs. We might model our data as follows:
Note in particular the 1:N relationships between Site and Blog, and between Blog and Post. Here's an example set of documents that we might store:
    // Site
    { _id: ObjectId(1234....1), domain: 'blogs-r-us.com' }
    // Blog
    { _id: ObjectId(1234...2), site_id: ObjectId(1234...1), name: 'My Awesome Blog' }
    // Post
    { _id: ObjectId(1234...3), blog_id: ObjectId(1234...2), title: 'Frsit Psot', text: 'This is my blog. There are many...' }
Note that we've modeled everything "relationally"; a Post has a "foreign key" into Blog, which has a "foreign key" into Site. In order to actually model this in the Ming ODM, we first need a bit of setup:
    from ming import Session
    from ming.odm import ODMSession

    doc_session = Session()
    odm_session = ODMSession(doc_session=doc_session)
Because the ODM is doing so much more than the schema enforcement layer, we need a new session to tie things together. The ODMSession provides the identity map and unit of work mentioned above, as well as a connection to the regular Session (which in turn ties everything to a particular MongoDB database).
Once we have this set up, we can start defining our model. Let's look at the definition of a Site first:
    from ming import schema as S
    from ming.odm.declarative import MappedClass
    from ming.odm.property import FieldProperty, RelationProperty

    class Site(MappedClass):
        class __mongometa__:
            session = odm_session
            name = 'site'

        _id = FieldProperty(S.ObjectId)
        domain = FieldProperty(str)
        blogs = RelationProperty('Blog')
There are a couple of things to note here:
- The Field from the schema level is replaced by a FieldProperty for ODM-level things.
- The one-to-many relation between Site and Blog is represented by a RelationProperty. Since we haven't defined Blog yet, we'll use a string to reference Blog rather than the class itself. There's some magic involved in figuring out what to do with RelationPropertys; there will be more on that later.
Moving along, we can define Blog as follows:
    from ming.odm.property import ForeignIdProperty

    class Blog(MappedClass):
        class __mongometa__:
            session = odm_session
            name = 'blog'

        _id = FieldProperty(S.ObjectId)
        name = FieldProperty(str)
        site_id = ForeignIdProperty(Site)
        site = RelationProperty(Site)
        posts = RelationProperty('Post')
Here, we've introduced the ForeignIdProperty to represent our "foreign key" construct. Ming actually uses declared ForeignIdPropertys to guess what to do with RelationPropertys. A couple of things to note here:
- We can use the Site class rather than the string "Site" since Site has been declared.
- We don't need to specify a schema for the site_id field. Because it is a ForeignIdProperty, Ming knows to use the same validation for it that it uses for Site._id.
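To make the "guessing" a little more concrete, here is a minimal pure-Python sketch of how a relation could be inferred from declared foreign ids. It is not Ming's source code: ForeignId, infer_join, and the toy Site and Blog classes below are hypothetical stand-ins, and the rule applied (look for exactly one foreign id linking the two classes, in either direction) is an assumption about the general approach.

```python
class ForeignId(object):
    """Hypothetical marker: this attribute stores the _id of `related`."""

    def __init__(self, related):
        self.related = related


def infer_join(cls, related):
    """Find the single foreign id that links cls and related.

    Returns (owner_class, field_name), where owner_class is whichever of
    the two classes declares the foreign id.
    """
    for owner, target in ((cls, related), (related, cls)):
        fields = sorted(
            name for name, value in vars(owner).items()
            if isinstance(value, ForeignId) and value.related is target)
        if len(fields) == 1:
            return owner, fields[0]
        if len(fields) > 1:
            raise ValueError('ambiguous relation: %r' % (fields,))
    raise ValueError('no foreign id links %s and %s'
                     % (cls.__name__, related.__name__))


# Toy classes, unrelated to the MappedClass definitions in this article.
class Site(object):
    pass


class Blog(object):
    site_id = ForeignId(Site)


# Both sides of the relation resolve to the same declared foreign id.
blog_side = infer_join(Blog, Site)
site_side = infer_join(Site, Blog)
```

In this sketch, a class with two foreign ids pointing at the same target raises an error instead of picking one, since there would be no safe way to guess which field the relation should follow.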
To finish out our model, the Post class looks similar:
    class Post(MappedClass):
        class __mongometa__:
            session = odm_session
            name = 'post'  # collection name; the examples below query db.post

        _id = FieldProperty(S.ObjectId)
        title = FieldProperty(str)
        text = FieldProperty(str)
        blog_id = ForeignIdProperty(Blog)
        blog = RelationProperty(Blog)
And we're done!
Using the Model to Manipulate Data
Once everything's defined, we can create some data as follows:
    >>> import blog as B
    >>> from ming.datastore import DataStore
    >>> B.doc_session.bind = DataStore(
    ...     'mongodb://localhost:27017',
    ...     database='blog')
    >>> site = B.Site(domain='blogs-r-us.com')
    >>> blog = B.Blog(name='My Awesome Blog', site=site)
    >>> post = B.Post(title='Frsit Psot', text='This is my blog.', blog=blog)
    >>> print blog
    <Blog _id=ObjectId(...) site_id=ObjectId(...) name='My Awesome Blog'>
Notice how Ming has helpfully filled in the ForeignIdProperty values based on the objects passed into the constructors. Now let's look at the database:
    >>> list(B.doc_session.db.blog.find())
    []
Recall that one of the features of the ODM is the unit of work. To flush all our changes to the database, we need to explicitly tell the ODMSession to do so. Before we do, let's take a look at it:
    >>> print B.odm_session
    <session>
      <UnitOfWork>
        <new>
          <Site _id=ObjectId(...) domain='blogs-r-us.com'>
          <Blog _id=ObjectId(...) site_id=ObjectId(...) name='My Awesome Blog'>
          <Post text='This is my blog.' blog_id=ObjectId(...) _id=ObjectId(...) title='Frsit Psot'>
        <clean>
        <dirty>
        <deleted>
      <imap (3)>
        Site : ... => <Site _id=ObjectId(...) domain='blogs-r-us.com'>
        Blog : ... => <Blog _id=ObjectId(...) site_id=ObjectId(...) name='My Awesome Blog'>
        Post : ... => <Post text='This is my blog.' blog_id=ObjectId(...) _id=ObjectId(...) title='Frsit Psot'>
There are a couple of things to note here:
- The ODMSession is tracking the new Site, Blog, and Post objects we've created in its unit of work. Since each of these objects is in the new state, they will be insert()ed when the unit of work is flush()ed.
- The ODMSession is also tracking the objects in its identity map. The purpose of the identity map is to make sure that if you perform two queries from MongoDB that return the same document, they will be represented by the same object in memory. More on this later.
Now let's go ahead and flush() and take a look at the database:
    >>> B.odm_session.flush()
    >>> list(B.doc_session.db.blog.find())
    [{u'_id': ObjectId(...), u'site_id': ObjectId(...), u'name': u'My Awesome Blog'}]
And we see that the blog has been stored back into MongoDB. We can look at the ODMSession once again and see that all the objects have now moved into the clean state:
    >>> print B.odm_session
    <session>
      <UnitOfWork>
        <new>
        <clean>
          <Site _id...
          <Blog _id...
          <Post text...
        <dirty>
        <deleted>
      <imap (3)>...
Now, let's try modifying the post title and looking at the session again:
    >>> post.title = 'First Post'
    >>> print B.odm_session
    <session>
      <UnitOfWork>
        <new>
        <clean>
          <Site _id=...
          <Blog _id=...
        <dirty>
          <Post text...
        <deleted>
      <imap (3)>...
Note how the Post is now dirty. On the next flush(), it will be .save()d back to MongoDB:
    >>> B.odm_session.flush()
    >>> list(B.doc_session.db.post.find())
    [{u'text': u'This is my blog.', u'blog_id': ObjectId(...), u'_id': ObjectId(...), u'title': u'First Post'}]
    >>> print B.odm_session
    <session>
      <UnitOfWork>
        <new>
        <clean>
          <Site _id=...
          <Blog _id=...
          <Post text=...
        <dirty>
        <deleted>
      <imap (3)>...
... and it's clean again. Similarly we can .delete() an object (e.g. post.delete()) to mark it as deleted, causing it to be remove()d on the next flush().
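The new, clean, dirty, and deleted states can be illustrated with a small pure-Python sketch. This is not Ming's implementation: the UnitOfWork class below is a hypothetical toy, a plain dict stands in for the MongoDB collection, and objects are marked dirty by hand (Ming does that tracking for you when you assign to a property).

```python
class UnitOfWork(object):
    """Toy unit of work; `store` is a dict standing in for a collection."""

    def __init__(self, store):
        self.store = store
        self.new, self.clean, self.dirty, self.deleted = {}, {}, {}, {}

    def add_new(self, doc):
        self.new[doc['_id']] = doc

    def mark_dirty(self, doc):
        # Only clean objects become dirty; new ones are inserted whole anyway.
        if doc['_id'] in self.clean:
            self.dirty[doc['_id']] = self.clean.pop(doc['_id'])

    def mark_deleted(self, doc):
        for state in (self.new, self.clean, self.dirty):
            state.pop(doc['_id'], None)
        self.deleted[doc['_id']] = doc

    def flush(self):
        # Write pending inserts and updates, then apply deletions.
        for state in (self.new, self.dirty):
            for _id, doc in state.items():
                self.store[_id] = dict(doc)
                self.clean[_id] = doc
            state.clear()
        for _id in self.deleted:
            self.store.pop(_id, None)
        self.deleted.clear()


store = {}
uow = UnitOfWork(store)
post = {'_id': 1, 'title': 'Frsit Psot'}

uow.add_new(post)
written_before_flush = dict(store)   # still empty: nothing flushed yet

uow.flush()                          # post moves from new to clean
post['title'] = 'First Post'
uow.mark_dirty(post)                 # post moves from clean to dirty
uow.flush()                          # post is saved and clean again
```

Skipping flush() in this sketch leaves the store untouched, which is the "poor man's rollback" that a unit of work offers in place of real transactions.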
Querying using the ODM
Putting data into the database is all well and good, but to be useful we should be able to retrieve it as well. Doing so requires the use of the .query property of our classes, which serves the same purpose as the .m property in "base" Ming, or the .objects property in Django. First, let's clear out the ODM session and retrieve all the posts:
    >>> B.odm_session.clear()
    >>> posts = B.Post.query.find().all()
    >>> posts
    [<Post text=u'This is my blog.' blog_id=ObjectId(...) _id=ObjectId(...) title=u'First Post'>]
    >>> B.odm_session
    <session>
      <UnitOfWork>
        <new>
        <clean>
          <Post text=...
        <dirty>
        <deleted>
      <imap (1)>
        Post : ...
Now let's query again to get another posts list to see the identity map in action:
    >>> posts1 = B.Post.query.find().all()
    >>> posts[0]
    <Post text=u'This is my blog.' blog_id=ObjectId(...) _id=ObjectId(...) title=u'First Post'>
    >>> posts1[0]
    <Post text=u'This is my blog.' blog_id=ObjectId(...) _id=ObjectId(...) title=u'First Post'>
    >>> posts[0] is posts1[0]
    True
Interesting - if we perform a query that returns the same document, we get back the same Python object. This is important for maintaining consistency in cases where you might arrive at the same object via two different query paths. The precise guarantee that the identity map provides is as follows:
Any two references to an instance of the same class in the same session with the same `_id` value are the *same instance*.
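That guarantee boils down to a lookup keyed on class and _id. Here is a minimal sketch of the idea in plain Python; IdentityMap, load_post, and the toy Post class are hypothetical names for illustration and are not Ming's API.

```python
class IdentityMap(object):
    """Toy identity map: at most one instance per (class, _id) pair."""

    def __init__(self):
        self._objects = {}

    def get(self, cls, _id):
        return self._objects.get((cls, _id))

    def save(self, obj):
        self._objects[(type(obj), obj._id)] = obj

    def clear(self):
        self._objects.clear()


class Post(object):
    """Plain stand-in for a mapped class."""

    def __init__(self, _id, title):
        self._id = _id
        self.title = title


def load_post(imap, raw):
    """Turn a raw document into a Post, reusing any instance already mapped."""
    existing = imap.get(Post, raw['_id'])
    if existing is not None:
        return existing
    obj = Post(**raw)
    imap.save(obj)
    return obj


imap = IdentityMap()
raw = {'_id': 1, 'title': 'First Post'}

# Two "queries" returning the same document yield the same Python object.
first = load_post(imap, raw)
second = load_post(imap, dict(raw))
```

Note that the guarantee only holds within one session: clearing the map, as in the sketch's clear() method, means the next load builds a fresh instance.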
Now let's see what Blog this post is a part of:
    >>> post.blog
    <Blog _id=ObjectId(...) site_id=ObjectId(...) name=u'My Awesome Blog'>
    >>> print B.odm_session
    <session>
      <UnitOfWork>
        <new>
        <clean>
          <Post text...
          <Blog _id...
        <dirty>
        <deleted>
      <imap (2)>...
Behind the scenes, Ming has queried the database to retrieve the Blog whose _id value matches the blog_id ForeignIdProperty value from the Post. Likewise, the blog's posts property contains a list including the post we created:
    >>> blog = post.blog
    >>> blog.posts
    [<Post text=u'This is my blog.' blog_id=ObjectId(...) _id=ObjectId(...) title=u'First Post'>]
    >>> blog.posts[0] is post
    True
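Both directions of the relationship are just lookups on the foreign id. The sketch below shows the two lookups over plain lists of dicts; many_to_one and one_to_many are hypothetical helper names, the sample data is made up, and the list scans stand in for the MongoDB queries that Ming would issue.

```python
# Made-up sample documents; integer _ids keep the example short.
blogs = [{'_id': 2, 'site_id': 1, 'name': 'My Awesome Blog'}]
posts = [
    {'_id': 3, 'blog_id': 2, 'title': 'First Post'},
    {'_id': 4, 'blog_id': 2, 'title': 'Second Post'},
    {'_id': 5, 'blog_id': 9, 'title': 'Orphaned Post'},
]


def many_to_one(child, fk_name, parents):
    """Follow child[fk_name] to the parent with that _id, or None."""
    for parent in parents:
        if parent['_id'] == child[fk_name]:
            return parent
    return None


def one_to_many(parent, fk_name, children):
    """Collect every child whose foreign id points at parent."""
    return [child for child in children if child[fk_name] == parent['_id']]


blog_of_first = many_to_one(posts[0], 'blog_id', blogs)   # like post.blog
posts_of_blog = one_to_many(blogs[0], 'blog_id', posts)   # like blog.posts
```

Because the real ODM routes the loaded documents through the identity map, following a relation back and forth returns the objects you already hold, which is why blog.posts[0] is post above.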
Automating the Session
So one of the things that I wanted in an ODM was automatic persistence of documents. In all the above examples, we actually have to manually call flush() each time we want to persist changes. To rectify this, at least in the context of web applications, Ming provides WSGI middleware that will flush at the end of each web request if everything went through fine, or clear the session if there is an error. To use the middleware, simply wrap your WSGI application in MingMiddleware:
    from ming.odm.middleware import MingMiddleware

    app = some_application_factory()
    app = MingMiddleware(app)
To have everything work well, you should also use a ThreadLocalODMSession provided by Ming when defining your classes. The signature is the same as the ODMSession:
    from ming import Session
    from ming.odm import ThreadLocalODMSession

    doc_session = Session()
    odm_session = ThreadLocalODMSession(doc_session=doc_session)
By default, MingMiddleware will clear the session and not flush if any exception is raised during your request handling, except for a webob.exc.HTTPRedirection, since this is typically not considered an error. You can modify this behavior by passing the flush_on_errors keyword argument to the MingMiddleware constructor.
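The flush-or-clear behavior is easy to picture as a few lines of WSGI. The sketch below is a simplified stand-in written for this article, not MingMiddleware itself: FlushOrClearMiddleware, RecordingSession, Redirect, and the toy apps are all hypothetical names, the session is passed in explicitly, and response bodies that are generated lazily are ignored.

```python
class FlushOrClearMiddleware(object):
    """Flush the session after a successful request, clear it on errors."""

    def __init__(self, app, session, flush_on_errors=()):
        self.app = app
        self.session = session
        # Exceptions listed here still count as "success" (e.g. redirects).
        self.flush_on_errors = tuple(flush_on_errors)

    def __call__(self, environ, start_response):
        try:
            result = self.app(environ, start_response)
        except self.flush_on_errors:
            self.session.flush()
            raise
        except Exception:
            self.session.clear()
            raise
        else:
            self.session.flush()
            return result


class RecordingSession(object):
    """Fake session that records which method the middleware called."""

    def __init__(self):
        self.calls = []

    def flush(self):
        self.calls.append('flush')

    def clear(self):
        self.calls.append('clear')


class Redirect(Exception):
    """Stand-in for an HTTP redirect exception."""


def ok_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']


def broken_app(environ, start_response):
    raise ValueError('boom')


def redirect_app(environ, start_response):
    raise Redirect()


session = RecordingSession()
wrapped = FlushOrClearMiddleware(ok_app, session)
body = wrapped({}, lambda status, headers: None)
```

Wrapping broken_app the same way records a clear instead of a flush, and passing flush_on_errors=(Redirect,) makes redirect_app flush before the exception propagates.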
Feedback Encouraged
There are certainly still some rough edges around Ming's ODM, but it's still being actively developed and maintained, so we're always interested in feedback. So what do you think? Does the ODM add enough to Ming's functionality to convince you to try it out? Let me know in the comments below!