Just a little Python: Getting Started with MongoDB and Python

Wednesday, January 18, 2012

Getting Started with MongoDB and Python

If you've been following this blog for a while, you've seen me mention MongoDB more than once. One exciting thing for me is that I'll be co-teaching a tutorial at PyCon this year on Python and MongoDB that will cover MongoDB, PyMongo, and Ming. So to hopefully whet your appetite for learning more at the tutorial, I thought I'd write a few posts covering MongoDB, PyMongo, and Ming from a beginner's perspective.

What is MongoDB?

From MongoDB.org:

MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database.

Well, that's not all that enlightening, so I'll expand a bit here on MongoDB's features...

MongoDB is a document database

MongoDB is a document database, which means that instead of storing "rows" in "tables" like you do in a relational database, you store "documents" in "collections." Documents are basically JSON objects (technically BSON). This distinguishes MongoDB from other NoSQL-type databases such as key-value stores (e.g. Tokyo Cabinet), column family stores (e.g. Cassandra) or column stores (e.g. MonetDB).
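To make the "documents in collections" idea a little more concrete, here is a minimal sketch using plain Python dicts. The field names and values are made up for illustration (they are not from any real schema), and the "collection" here is just a Python list standing in for the real thing.

```python
import json

# A hypothetical blog-post document; field names are illustrative only.
post = {
    "title": "Getting Started with MongoDB and Python",
    "state": "published",
    "tags": ["mongodb", "python"],           # arrays can be stored directly
    "author": {"name": "Rick", "site": "SourceForge"},  # so can nested objects
}

# Documents in the same collection do not have to share the same fields.
draft = {"title": "Untitled", "state": "draft"}

# A plain list standing in for a collection of documents.
collection = [post, draft]

# The JSON view of the first document (BSON adds extra types on top of this).
as_json = json.dumps(post, sort_keys=True)
```

The point is only that a document nests lists and sub-objects naturally, where a relational design would usually need extra tables and joins.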

MongoDB has a flexible query language

This is one thing that makes MongoDB a pleasure to work with, particularly if you come from another NoSQL database where querying is either restrictive (key-value stores which can only be queried by key) or cumbersome (something like CouchDB that requires you to write a map-reduce query). MongoDB has a BSON-based query language that's a bit more restrictive than SQL, but that you can still use to get a lot done.

Here's an example of a simple MongoDB query that we use at SourceForge to find all the blog posts for a project:

blog_post.find({'state':'published','app_config_id':{'$in':app_config_ids}})

There are also several other operators like '$lt', '$nin', '$not', and '$or' that allow you to construct quite complex queries, though you are somewhat restricted compared with what you can do in SQL (even with a single table).
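If the operator syntax looks odd, it may help to see what a query document means. The following is a toy, pure-Python matcher for a small subset of operators ('$in', '$nin', '$lt', '$gt', '$or'). It is only a sketch of the semantics, not how MongoDB evaluates queries; the `matches` function and the sample data are hypothetical, and it simplifies by treating a missing field as never matching an operator.

```python
import operator

# Comparison operators supported by this toy matcher (a small subset).
_COMPARISONS = {
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$in": lambda value, choices: value in choices,
    "$nin": lambda value, choices: value not in choices,
}


def matches(doc, query):
    """Return True if `doc` satisfies `query` (toy semantics only)."""
    for key, condition in query.items():
        if key == "$or":
            # $or takes a list of sub-queries; at least one must match.
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # An operator document such as {'$in': [...]}.
            # Simplification: a missing field never matches an operator.
            if key not in doc:
                return False
            for op, operand in condition.items():
                if not _COMPARISONS[op](doc[key], operand):
                    return False
        elif doc.get(key) != condition:
            # A plain value means exact equality.
            return False
    return True


# Made-up sample data shaped like the blog_post query above.
posts = [
    {"state": "published", "app_config_id": 1, "votes": 12},
    {"state": "published", "app_config_id": 7, "votes": 3},
    {"state": "draft", "app_config_id": 1, "votes": 0},
]
app_config_ids = [1, 2, 3]

query = {"state": "published", "app_config_id": {"$in": app_config_ids}}
published = [p for p in posts if matches(p, query)]

either_query = {"$or": [{"state": "draft"}, {"votes": {"$lt": 5}}]}
either = [p for p in posts if matches(p, either_query)]
```

Note that '$in' tests equality against each listed value, so a partial string such as 'Hadoop' would not match a longer title; partial matches call for a regular expression instead.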

MongoDB is fast and scalable

A single MongoDB node is able to comfortably serve thousands of requests per second on cheap hardware. When you need to scale beyond that, you can use either replication (keeping several copies of the data on different servers) or sharding (partitioning the data across servers). MongoDB even includes logic to automatically load-balance your shards as your database and load increase.
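As a rough illustration of what "partitioning the data across servers" means, here is a toy sketch that routes documents to one of several in-memory "shards" based on a hash of a key. This is an assumption-laden illustration of the general idea, not MongoDB's actual sharding or balancing logic; `shard_for`, `partition`, and the `user_id` field are all hypothetical names.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash (toy routing only)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(docs, shard_key, num_shards):
    """Distribute docs into `num_shards` lists according to `shard_key`."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards


# Made-up documents, spread over three pretend servers.
docs = [{"user_id": i, "name": "user%d" % i} for i in range(100)]
shards = partition(docs, "user_id", 3)
```

Because the routing depends only on the key, a lookup by `user_id` needs to visit just one shard, which is what lets the load spread out as servers are added.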

Getting Started with MongoDB

While MongoDB is fairly straightforward to install on (64-bit) systems, a couple of companies, MongoLab and MongoHQ, also provide a free tier of MongoDB hosting that is great for getting started. I've been using MongoLab for my own things, for no particular reason, and I can recommend them. Since it's what I have experience with, that's what I'll cover here.

Let's assume you sign up for a MongoLab account. Once you've done this, you can create a database using their web-based control panel. When you click on the new database, you'll see its connection info at the top of the page.

(Your server name and port number may be different.) At this point, most tutorials would tell you to install and launch the 'mongo' command-line tool to begin exploring your database. We'll skip that here and use the Python driver, PyMongo, directly. I like to use virtualenv and ipython myself, so that's the approach I'll take here:

$ virtualenv mongo
... install messages ...
$ source mongo/bin/activate
(mongo) $ pip install pymongo ipython
... install messages ...
(mongo) $ ipython
... banner message ...

Now that we're in ipython, we'll go ahead and connect to the database and create a document.

In [1]: import pymongo

In [2]: conn = pymongo.Connection('mongodb://tutorial-test:u3ZYh136@ds029187.mongolab.com:29187/tutorial-test')

In [3]: db = conn['tutorial-test']

In [4]: db.test_collection.insert({})
Out[4]: ObjectId('4f16f5c7eb03306a92000000')

In [5]: db.test_collection.find()
Out[5]: <pymongo.cursor.Cursor at 0x7fbb9006f350>

In [6]: list(db.test_collection.find())
Out[6]: 
[{u'_id': ObjectId('4f16f5c7eb03306a92000000')}]
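The connection string passed to PyMongo above packs several pieces of information into one URI. As a small sketch, here is how it breaks down using only the Python standard library. This assumes a simple single-host URI like the one in the session, and the `connection_info` name is just for illustration.

```python
from urllib.parse import urlsplit

# The same URI used in the session above (the account has since been deleted).
uri = "mongodb://tutorial-test:u3ZYh136@ds029187.mongolab.com:29187/tutorial-test"

parts = urlsplit(uri)

# Pull out the pieces: credentials, server, port, and database name.
connection_info = {
    "username": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "database": parts.path.lstrip("/"),
}
```

If you are following along with a recent PyMongo release, note that `pymongo.Connection` has since been replaced by `pymongo.MongoClient`, which accepts the same style of URI.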

Well, that's it for now. I'll be posting several followup articles in this series that will go into more detail on how to do various queries and updates using PyMongo, the MongoDB Python driver, as well as how to effectively use Ming, so stay tuned!

11 comments:

Rick Copeland 1:15 PM
FYI I deleted the user and DB I used in the post, so don't go trying any funny business ;-).
BAdjao B And B 9:20 AM
Warning to others

blog_post.find({'state':'published','app_config_id':{'$in':app_config_ids}})

I just wasted an hour to find this syntax does not work.

I have a collection "articles" with a field "title" and one document with title
"Hadoop Development Environment OS X"

these work

temp = articles.find({"title": "Hadoop Development Environment OS X"});

temp = articles.find({"title":{"$in":["Hadoop Development Environment OS X"]}})

This returns nothing.
keys = ["Hadoop"]
temp = articles.find({"title":{"$in":keys}})

i.e. all I can get back is a perfect match, not a partial match.

Either I am misunderstanding (but I worked from an internet example the author said works fine), or there is a problem with the driver.

I am fairly confident I did not misunderstand the mongo docs.
BAdjao B And B 1:20 PM
Ah, I worked out the "perfect match" just after posting this. My fault for trying to do things in a rush. Then I spent a while looking for how to use regexes.

that did the job and got me a bit further on.

Many Thanks.
Bob 11:25 AM
Cool rundown, thanks Rick! In case anyone who is learning MongoDB finds it useful, I just launched a free tool called querymongo.com that translates MySQL syntax into MongoDB syntax. Hope someone can use it to get up to speed faster!
Anonymous 10:27 PM
Sorry for such a real basic question here, but I'm trying to find some real world examples of how people actually get a pile of documents (physical documents like Word docs or Excel sheets) into a MongoDB collection. I've read a lot of articles that demonstrate how you can manually code information with JSON syntax using the doc ID, first name field and value, last name field and value, etc. But, if I've got a folder full of say 10,000 Word docs with customer info in each one and I want to be able to query that pile of docs and pull up say a result set that contains all customers from Iowa, how would I do that? How are all those documents parsed into JSON and then dumped into document objects in the collection? Is there some ETL type of program that does that? (and if so, what would it be?) I've googled like crazy trying to find an answer to that, but come up with zilch.
Anonymous 1:21 PM
Rick, thanks for replying so quickly. Would you mind sharing, what method have you used in your real world projects to get the information into MongoDB? What physical form did the original info that you had to deal with come in and how did you dump it in the collection?

Useful Resources

Interested in practical MongoDB programming?

MongoDB Applied Design Patterns is available now, both in ebook and dead-tree form. In it, you'll see how to use MongoDB effectively in fields from real-time analytics to content management systems and more. The examples are all in Python, so readers of this blog should have no problem picking it all up.

Want to learn MongoDB using Python?

I just released an 84-page ebook MongoDB with Python and Ming to help you get started. In it, I cover everything from installing MongoDB for the first time, basic pymongo usage, MongoDB aggregation including MapReduce and the new aggregation framework, and GridFS. You'll also learn about Ming, the object-document mapper we built at SourceForge to accelerate our development beyond what we could do with PyMongo.

Want more personalized training?

I'm available for customized onsite Python and MongoDB training classes. You can sign up here for more information on this and other classes I'll be offering in the future including online and public training.
