Failing with MongoDB

Update: Sorry this isn’t my best piece of writing and there seems to be some confusion. The dataset in question was first in a 1.8 master/slave pair and then migrated to sharded replica sets and 2.0.0.

For a bit of history of my dealings with MongoDB at Urban Airship, I gave a couple of versions of a Scaling with MongoDB talk.

My coworker Adam Lowry even gave a follow-up talk of sorts at Postgres Open 2011 (slides) about migrating one of our datasets off of MongoDB and (back) on to PostgreSQL.

After reading through those slides you’re probably wondering why we’re still dealing with MongoDB at all. We fully intended to migrate our data out of it by now, but priorities change, deadlines slip, and we never expected one of our last uses of MongoDB to experience a surge in growth.

The dataset in question seemed ideal for MongoDB:

  • Ephemeral – if we lose it we experience service degradation for a short while, but nothing catastrophic
  • Small – easily fits into memory (~15 GB)
  • Secondary index – In a key/value store we would have had to manage a secondary index manually

So this dataset dodged a lot of the previous problems we had with MongoDB and seemed safe to migrate at our leisure.

Global Write Lock

MongoDB has a global write lock. This means that while applying an insert or update, a single mongod instance can’t respond to other queries.

Our dataset may be small but it has a heavy read and write load. When the service it backed experienced a surge in usage, MongoDB quickly became CPU bound. This was especially frustrating considering mongod was running in a simple master/slave setup on two servers: each with 16 cores and enough memory to hold all the data a few times over again.

Because of the global write lock and heavy write load, operations are effectively serialized and executed on a single core. As a result our servers didn’t even look loaded, since just 1 core would be 100% utilized by mongod.
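
To put a number on that, here is a back-of-the-envelope sketch in Python. It is only a model, not a measurement: it assumes a purely write-bound workload in which each independent write lock can keep exactly one core busy, and the function names are made up for illustration.

```python
def busy_cores(cores: int, write_locks: int) -> int:
    """Cores that can apply writes at the same time: one per independent write lock.

    Assumption (not measured): the workload is write-bound and a lock holder
    saturates exactly one core.
    """
    return min(cores, write_locks)


def cpu_utilization(cores: int, write_locks: int) -> float:
    """Fraction of the whole machine that is busy under the assumption above."""
    return busy_cores(cores, write_locks) / cores


# One mongod (one write lock) on a 16 core server.
single = cpu_utilization(cores=16, write_locks=1)
print(f"1 mongod on 16 cores: {single:.2%} of the machine busy")
```

Under those assumptions a single mongod leaves 15 of the 16 cores idle no matter how heavy the write load gets, which is why the box as a whole never looked busy.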

Let the Sharding Begin

So we need to utilize multiple cores…
To do that we need multiple write locks…
There’s 1 write lock per mongod. So…
…multiple mongods per server?

We’d been avoiding sharding after having no luck getting it working in the 1.5.x dev series, but it’s our only choice now to get multiple mongods. I ran some tests and it seemed like we could turn our master/slave setup into a 2 shard setup with 2 mongods and 1 arbiter per shard with downtime in the seconds or low minutes.

The operational complexity of configuring a MongoDB cluster is daunting with each component bringing its own caveats:

mongod config servers

  • You need exactly 3 config mongods (1 is fine for testing, which makes things appear simpler than they really are).
  • There are lots of caveats with the config servers, so read Changing Config Servers carefully before configuring your cluster.
  • Otherwise these mongod instances are fairly blackboxish to me. Despite being mongod processes you administer them completely differently.

mongos routers

  • 1 per app server. This wouldn’t be a big deal except that our mongoses often start failing and require flushRouterConfig to be run on them. 2.0.1 supposedly fixes this, but we haven’t tested that yet (and trading known problems for new unknown ones is always scary).
  • mongos instances can use a lot of CPU and seem to have random spikes where they fully utilize every core very briefly. Keep this in mind if your application servers are already CPU bound.
  • On the bright side mongos balanced our data rather quickly. Our shard key is a uuid, and it properly set up reasonable ranges in very short order without having to preconfigure them.
  • “mongos” is a terribly confusing name. It sounds like multiple mongo instances. We’ve taken to calling them mongooses internally due to frequent typos and confusion.
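
For anyone unfamiliar with range-based sharding, the idea behind the balancing mentioned above can be sketched in a few lines of Python. This is a simplified model and not how mongos is implemented: the split points, the chunk-to-shard table, and the route function are all hypothetical, and it assumes the uuid shard key is stored as a lowercase hex string.

```python
import bisect
import uuid
from collections import Counter

# Hypothetical split points dividing the hex keyspace into 4 chunks:
# [min, "4"), ["4", "8"), ["8", "c"), ["c", max)
SPLIT_POINTS = ["4", "8", "c"]

# Hypothetical assignment of those 4 chunks to 2 shards.
CHUNK_TO_SHARD = ["shard0", "shard1", "shard0", "shard1"]


def route(key: str) -> str:
    """Return the shard owning the chunk whose range contains key."""
    chunk = bisect.bisect_right(SPLIT_POINTS, key)
    return CHUNK_TO_SHARD[chunk]


# Random uuid4 keys have a uniformly random leading hex digit,
# so they should spread roughly evenly over the chunks.
counts = Counter(route(uuid.uuid4().hex) for _ in range(10_000))
print(counts)
```

Because a random uuid has no ordering bias, evenly spaced ranges get roughly even traffic, which is presumably why the automatically chosen ranges worked well for us without any preconfiguration.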

arbiters

  • You need at least 3 members in a replica set in order to complete an election if 1 member goes down.
  • We haven’t had any issues with arbiters… not sure what we’d do if one broke somehow but since they have no persistent data they’re safe to restart at any time.
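
The reason for the third member is simple majority arithmetic: a replica set can only elect a primary while a strict majority of its voting members is reachable. A minimal Python sketch of that rule (the function names are mine, not part of any MongoDB API):

```python
def majority(voting_members: int) -> int:
    """Votes needed to win an election: a strict majority of all voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, members_down: int) -> bool:
    """True if the members still up can form a majority on their own."""
    return voting_members - members_down >= majority(voting_members)


# Two data-bearing mongods alone: losing one leaves 1 of 2 votes, not a majority.
print(can_elect_primary(voting_members=2, members_down=1))

# Two mongods plus one arbiter: losing one leaves 2 of 3 votes, a majority.
print(can_elect_primary(voting_members=3, members_down=1))
```

So the arbiter holds no data, but its vote is what lets the surviving mongod take over as primary when its partner goes down.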

mongods

  • Early on we ran into a problem where changing replica set member entries wasn’t propagated to the config servers’ shard configuration. Restarting every mongos fixed it.
  • As far as I can tell a new replica set member will never leave the initial RECOVERING state until all operations to that set are stopped. Even 40 updates per second was enough of a trickle to prevent a new set member from leaving RECOVERING and becoming a SECONDARY. We had to shut down mongoses to cut off all traffic to bring up a new member. (The replication log gave every indication of being caught up and our usual update load is thousands per second.)
  • Setting rest in the config file doesn’t seem to work. Put --rest in your command line options.
  • Sending an HTTP request to a mongod’s main port (instead of the HTTP endpoint) seems to be capable of crashing the mongod.

Client Drivers

While a single replica set member was in a RECOVERING state, our Java services couldn’t complete any operations, while our Python service was happily working away.

Summary

Right now we’re getting by with 2 shards on 2 dedicated servers, with mongoses and config servers spread throughout other servers. There appears to be some data loss occurring, though due to the ephemeral, fast-changing nature of this dataset it’s very difficult to determine definitively or reproduce independently.

So we’re trying to migrate off of MongoDB to a custom service better suited for this dataset ASAP.

  • Ben Sabrin

    How about creating a ticket on the public forums so people from 10gen can look at your issues? Feel free to email me directly if you want. There is no reason MongoDB can’t solve the problem you have.

  • Eliot

    Some comments:
    – If you think there is data loss, can you please send some more details to me? That would be very strange and we would want to understand asap.
    – The mongos name is a bit unfortunate. We may try to change to mongor if we can do it without driving too many people crazy.
    – The RECOVERING issue was found and fixed, see https://jira.mongodb.org/browse/SERVER-3899
    – I cannot reproduce crashing a mongod by sending http requests to the main port. In fact we special-cased this a long time ago so you actually get an http response.

  • Sean McQuillan

    As I’m sure 10gen is going to reach out to you shortly to do damage control, can you please post a followup post documenting your experience interacting with 10gen?

    I’m evaluating adopting mongodb, but at the moment 10gen is an unknown entity to me and also the only way to get bugs fixed in mongodb. I would love to know more about their reaction to these issues that you have appropriately raised for the community to evaluate.

    Thanks,
    Sean

  • http://michael.susens-schurter.com/blog/ Michael Schurter

    @Eliot: Thanks for the fast reply! Glad to see the RECOVERING issue is fixed. Despite happily using JIRA at Urban Airship as well, I still find it nearly impossible to dig up relevant tickets.

    Here’s the backtrace from when the mongod crashed due to monitoring: http://paste.pocoo.org/show/503530/

    We were evaluating this tool:
    https://chrome.google.com/webstore/detail/apgglicbkgjcfnohdcgbcobengnkcjef

    It was working, but at some point while using it (perhaps after typing an incorrect port number? that was our best guess) that backtrace happened.

    It could be pure coincidence that it crashed while attempting to monitor it, but considering this is the first production mongod I’ve ever crashed, I find that hard to believe.

  • Eliot

    Do you have more of the log from the crash?
    There was a race in the http layer itself that’s fixed in 2.0.1. The tool uses that heavily, so that’s probably the culprit: https://jira.mongodb.org/browse/SERVER-3907

  • ENC

    It sounded like you were using MongoDB < 1.5. If that is true, have you tried upgrading to 1.8.x or 2.0?

  • http://michael.susens-schurter.com/blog/ Michael Schurter

    @ENC: Added an update. tl;dr Everything in this post related to replica sets and sharding uses 2.0.0.

  • Ryan

    @michael – What does your schema look like and what are the common operations you are performing? I have found that in high volume systems, tuning your collection schema can make an order of magnitude difference in system performance.

  • Parth

    @Michael Schurter,

    Please do update this thread on whether you continued with Mongo or moved to PostgreSQL. Your issue looks genuine and we might face the same issue. So before going all-in on Mongo I’d love to know how you resolved it. And thanks for posting this.

  • Eric

    Why does it seem like your only options are Postgres or Mongo? Why not move to something like Riak or Cassandra where you at least won’t lose data or have to worry about sharding?

  • Andreas

    Eric: With a 15 GB database they should normally not have to worry about sharding with any database engine. It is only the R/W lock that forced them to use it for MongoDB.

  • Howard

    Have you considered using MySQL?

    We have been using MySQL for storing schema-less data, as described in http://bret.appspot.com/entry/how-friendfeed-uses-mysql.

    We have around 80GB of data, and InnoDB is superb under heavy read/write workload (not to mention it is ACID compliant). We have been using InnoDB for many years and never lost a single bit of data…

  • lenzai

    Is there any reliable NoSQL store that provides high write throughput?

    I am quite surprised to learn of this global lock issue with MongoDB writes.

  • ypcxx

    @Howard: I coded a web spider that was pounding a MySQL instance with inserts and updates all day, and at the end of each day, MySQL simply crashed (actually it did not crash, it froze). This was on good, fast hardware, and after rewriting the insert logic many times.
    Also, if you ever tried to design an ACID system, you know it’s a total waste of resources. Of course that’s why SQL DB’s don’t have _real_ ACID, but even so.
    MongoDB I can pound with thousands of incremental updates without locking and at the end of day, I get a consistent result.
    With MySQL, compared to the paid SQL db’s, as a developer you are going to hit many walls that you will have to work around. As an example – the inability to specify a variable for the LIMIT N clause (yeah this is fixed in the latest, so in 10 or 20 years a WordPress plugin developer /or similar/ can start using it).
    MongoDB is far from complete or ideal, but after my first project, I will choose it over MySQL any day. I guess if you want to suggest an SQL salvation, PostgreSQL is much better positioned than MySQL.

  • http://bdarfler.com Benjamin Darfler

    Did you experiment with multiple shards per physical host?

  • http://www.discursive.com Tim O’Brien

    I run MongoDB in a production system that has heavy write activity with little read activity. I noticed significant performance problems until I tweaked the collections and added just the right indexes.

    I agree that replica set configuration is a bit confounding at first, but once I set up replica sets a few times it’s easy to configure.

  • Anonymous

    They’re not our only options. We also use Cassandra and HBase internally. For this dataset Mongo’s lack of durability was considered an optimization because of the ephemeral nature of the data. Also as I mentioned: being able to utilize secondary indexes without manually managing them was appealing. Cassandra and other k/v stores have added secondary indexes of a sort since we initially wrote this service.

  • Anonymous

    We have two indexes: _id and a secondary. We use _id extensively (and shard on it) and the secondary slightly less. The only further optimization we could do to our dataset (without drastically re-engineering our application) would be short key names.

  • Anonymous

    That is precisely what we did to utilize more cores per server. We’re at 2 mongods per server right now (1 mongod for each shard), but frankly could probably fit a 3rd or 4th. However, I think our alternate solution will be complete soon enough that we’d rather not experiment further with MongoDB.

  • Anonymous

    I’m glad MongoDB is working for your use case.

  • Anonymous

    This is not a question with a simple easy answer. The usual suspects are: HBase, Cassandra, Riak, Redis, etc. Understand your workload as best you can, then pick the database best suited for that use case.

  • Anonymous

    While many organizations happily use MySQL as a NoSQL database, we have not pursued that avenue internally at this point. Currently our efforts have mostly focused on PostgreSQL, Cassandra, HBase, and, obviously, MongoDB (throw in some Redis, Memcached, and S3 as well).

    There’s only so much time you can spend experimenting with databases and migrating datasets and application code.

  • Anonymous

    Just wanted to point out this is precisely the case. That being said we’ve had issues with other distributed databases with a similar sized dataset. Distributed systems are hard.

  • Anonymous

    See the last line of my post: “So we’re trying to migrate off of MongoDB to a custom service better suited for this dataset ASAP.”

    I’m not involved in this development effort, so you probably won’t hear much more from me on the migration effort.

  • Anonymous

    The schema is half a dozen simple fields with an index on _id and another field. No lists or sub-objects/documents.

    The vast majority of our operations are to get or update a specific _id with periodic updates to the secondary index. There are no full collection scans or index misses.

    The entire dataset and indexes fit in memory. We have journaling and fsyncing turned off as replication is sufficient durability for this ephemeral dataset.

    The update load in conjunction with the global write lock just means we’re limited to around 3k updates/s per core. Since the write lock is per mongod, that means we’ve had to shard just to take advantage of multiple cores on a single server.

  • http://www.discursive.com Tim O’Brien

    Right, so I certainly had a few tough weeks during which I felt like it was time to throw MongoDB into the East River with a pair of concrete shoes, but with patience and an effort to divine hidden meaning from the, at times, sparse documentation I was able to see it through to a working solution. The critical point not to be lost is that a lot of these features are just hitting the stable builds now. This stuff is newish, it isn’t like we’re using features that have been around for a few years.

    It would be interesting to hear about any interaction you have with 10gen going forward. I’ve concluded that MongoDB is a good solution for my particular application, but I’ve also come around to the idea that a support contract isn’t such a bad idea.

  • Anonymous

    If I have to “divine hidden meaning” from my database, it has failed me. This is a failure condition. Nothing is perfect, and I fully expect to have to do hard work to learn a system and use it properly. I feel like I have done these things and applied best practices. “Divining hidden meaning” is not something I ever want to have to do with a data system. It implies a level of unpredictability, complexity, and lack of documentation that’s unacceptable for my use case.

    I’m not advocating everyone throw MongoDB under a bus and migrate all of their apps and data to something else. I’m relaying my experiences in hopes of helping others make educated decisions. No one should have to “divine hidden meaning.”

    If the answer to my issues is “just wait until release N+1”, that’s great advice for future generations but doesn’t really do a thing to solve my current woes.

    Support contracts do seem like a great idea. I appreciate 10gen’s eagerness to help. That’s very commendable. I’m sure bringing them in earlier would have helped. That being said, I’m confident I’m applying best practices. This use case was not at all complex. If I’m wrong and it takes a support contract to adequately fulfill this straightforward use case, I don’t think this is a product I want to be using. YMMV

  • BarryR

    Are you using a custom _id type (i.e. not the default ObjectID format that starts off with a timestamp)? If not, then sharding on _id will be incredibly bad for writes. Whenever you insert a new document (or update a somewhat new document) it will end up going to a single mongod, so that process will get all the write requests, producing a big bottleneck.

    In such a case, either adding a hash (like digit sum modulo, say, 100) on the ObjectID and indexing on that, or replacing the _id field with “hash”+“ObjectId” (assuming that you don’t use the time-ordered nature of ObjectIds) would work fine and distribute your writes across the cluster.

  • Anonymous

    Yes, we are using a custom _id value: a uuid4 encoded as a hex string. It’s random and therefore distributes evenly. We let mongos figure out the partitions and it seems quite evenly balanced.

  • Lakalaka

    Call me stupid, but: if your dataset is only 15GB (fits in a server’s ram), why all this fuss about mongodb? Why not stick it in a good old relational database like postgres?

  • Anonymous

    Because we did not need SQL, transactions, relations, a write ahead log, and many of the other features provided by relational databases. We thought we were picking the right tool for the job. Had we known then what we know now we would have picked a different solution, but I doubt it would ever be a relational database simply because it offers a completely different feature set than we required.

  • Jonathan Ultis

    What type of storage subsystem were you using on the Mongo boxes?

  • Michael Schurter

    4x RAID 0+1 SAS drives for the data files, but I think you’re missing the point: our dataset fit in memory, we had syncing turned off, and we didn’t use journaling – the disks should never be touched.

  • Jonathan Ultis

    The linux VM flushes the dirty pages in RAM to disk. It starts out flushing lazily. Eventually, if you get too high a percentage of dirty pages, it will start blocking all writes trying to get the deferred IO onto disks – so you don’t lose hours of data in the case of a crash. Is it possible that your IO subsystem just couldn’t keep up with the sustained write rate? And, you didn’t raise your dirty_ratio limit as high as possible to make it act like a pure in-memory DB?

    http://www.westnet.com/~gsmith/content/linux-pdflush.htm

    Mongo definitely wouldn’t do a good job of putting back pressure on writes without blocking readers, if the IO subsystem is saturated and you hit the dirty_ratio limit.

  • http://blog.schmichael.com/ Michael Schurter

    Great tip. We didn’t adjust the dirty_ratio as far as I know, so that probably caused unneeded IO. That being said I don’t remember iowait being an issue; most of our issues were with the global write lock and complexities of sharding.

  • gabrielsamfira

    Version 2.2 has removed the Global Lock.

  • Beartamer

    It’s still a database-level lock.