During the second quarter of 2015, my team—one of the Platform teams within Engineering—updated the MongoDB dependency for notification-services as part of an effort to responsibly consolidate MongoDB replica sets across the organization.
In updating from MongoDB 2.4.6 to 3.0.2, we learned much about MongoDB and notification-services, and one particular lesson makes for a good mystery story. 1 I call this “The Case of the 297 Extra Documents.”
notification-services is an application responsible for providing notification data across ControlCenter. Notification documents are created in MongoDB by notification-services when certain resources in its REST API receive a POST request. While a notification document may be read repeatedly, it is rarely written to again. That is, the documents are practically immutable. 2
Each document has properties indicating whether its notification should be counted as “unread” in the ControlCenter user interface and whether it has been distributed (via email or a smartphone push notification system).
Aside from automatically adding indexes, notification-services only ever creates, reads, and (in a limited way) updates these documents. It never deletes them.
Two servers run notification-services in production, and both read from and write to this collection of documents. So, with a new MongoDB 3.0.2 replica set standing at the ready, we prepared the following release plan for our application instances:
1. Update the application’s configurable bindings to reference the new MongoDB replica set.
2. Export the collections from MongoDB 2.4.6 and import them to MongoDB 3.0.2.
3. Bounce (stop-and-start) the notification-services instances to apply the bindings from step one.
4. Export the collections from MongoDB 2.4.6 and import them to MongoDB 3.0.2 again. 3
While the mongoexport command can limit the data exported (it accepts a query parameter), we decided to simply export all of the data each time, even though, for the second export/import, most of that data would already be in MongoDB 3.0.2. Likewise, while the mongoimport command is capable of upserting (updating existing documents as well as inserting new ones), we decided to only insert new documents. 4
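For reference, the commands were along these lines. The host, database, and collection names here are illustrative, not the ones we actually used:

```shell
# Export every document, one JSON object per line.
mongoexport --host old-mongo.example.com:27017 \
    --db notifications --collection notifications \
    --out notifications.json

# Insert-only import (the default mode): documents whose _id already
# exists on the target produce duplicate-key errors and are skipped.
mongoimport --host new-mongo.example.com:27017 \
    --db notifications --collection notifications \
    --file notifications.json
```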
Our first export/import went smoothly. At 9:50pm, we started exporting 1,038,547 documents from MongoDB 2.4.6. At 10:02pm, the export finished and we immediately began importing those documents to MongoDB 3.0.2. At 10:19pm, the import finished, and we had 1,038,547 documents in MongoDB 3.0.2.
In order to automatically create the indexes in MongoDB 3.0.2, we pointed one of our development machines at the MongoDB 3.0.2 replica set and posted a test notification. We confirmed the indexes were created and the document count increased to 1,038,548. We then shut down that local environment.
Although we had tested our export/import process in other environments, we hesitated on step three of our release plan. What if we missed something? We decided to run our export/import process one more time before restarting the servers.
And it was a good thing we did because we uncovered a mystery.
At 10:28pm, we started exporting 1,038,881 documents from MongoDB 2.4.6. At 10:37pm, the export finished, and we immediately began importing those documents to MongoDB 3.0.2.
Our expectation was that, after the import finished, MongoDB 3.0.2 would have 1,038,882 documents: everything exported from MongoDB 2.4.6, plus the one test notification we had posted.
After the import finished, MongoDB 3.0.2 had 1,039,179 documents—297 extra documents we could not explain.
Go on, take a moment to speculate and hypothesize. We certainly did.
Did the “simple” import process create duplicate documents? Maybe because of the documents that had been updated between exports?
When we briefly connected our local environment to create the indexes, did we somehow write additional documents, perhaps from another, internal environment?
Did one of the notification-services instances somehow get the new bindings and start creating notifications? Was some other application writing to our new MongoDB replica set?
Just what kind of data was in those 297 documents anyway?
After some investigating, we determined the 297 documents were in the first export, but not the second export. They had been deleted from MongoDB 2.4.6 between 9:50pm and 10:28pm, but still existed in MongoDB 3.0.2. So now we had a new question: what caused their deletion?
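Isolating those documents amounted to comparing the `_id` sets of the two export files, since mongoexport writes one JSON document per line. A minimal sketch of that comparison (the helper and the toy data are ours, not our production tooling):

```python
import json

def extra_ids(first_export, second_export):
    """Return the _ids present in the first export but absent from the second.

    Each argument is an iterable of lines, one JSON document per line,
    which is the format mongoexport produces.
    """
    def ids(lines):
        return {json.loads(line)["_id"] for line in lines if line.strip()}
    return ids(first_export) - ids(second_export)

# Toy data standing in for the two export files:
first = ['{"_id": "a"}', '{"_id": "b"}', '{"_id": "c"}']
second = ['{"_id": "a"}', '{"_id": "c"}']
print(extra_ids(first, second))  # -> {'b'}
```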
If you have worked with MongoDB before, the answer might not surprise you, but it surprised us: the documents were deleted by a MongoDB index.
MySQL has no built-in row expiration; you might use its Event Scheduler to periodically remove rows that exceed their TTL, or “time-to-live.” MongoDB, by contrast, can enforce a TTL with an ordinary-looking index: a background task periodically deletes documents whose indexed date field is older than a configured number of seconds.
And, sure enough: our collection had an undocumented, manually-added index functioning as a TTL.
We had even seen the extra index in MongoDB 2.4.6 earlier, but we assumed it was cruft: we thought indexes existed only to inform query strategies, and this particular index was neither documented nor on a property our application queried by. And we did not want to blindly copy such cruft into our pristine new MongoDB 3.0.2.
You can read more about Mongo and TTLs at http://docs.mongodb.org/manual/tutorial/expire-data/.
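In mongo shell terms, a TTL index looks like this. The collection and field names here are hypothetical, as is the 30-day window:

```javascript
// Documents whose "createdAt" value is more than 30 days in the past
// are removed by a background thread that runs roughly once a minute.
db.notifications.createIndex(
    { createdAt: 1 },
    { expireAfterSeconds: 60 * 60 * 24 * 30 }
)
```

To a casual observer, this is indistinguishable from a plain secondary index; only the `expireAfterSeconds` option (visible via `db.notifications.getIndexes()`) gives it away.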
We shared this story within our company to tell other engineers about the index-as-TTL and to help the next team that encounters a strange index in MongoDB or documents that seem to vanish without application code to explain it.
Now, we are sharing the story with you. If you enjoyed this behind-the-scenes tale, please let us know in the comments below.
- “Engineering Noir.” It’s a very niche genre. ↩
- “practically immutable”—so, yes: notification documents are mutable. However, I want to stress that, via notification-services application calls, there are significantly more reads than writes. ↩
- We planned to export/import twice because notification-services instances would continue writing to MongoDB 2.4.6 in the time period between the first export and the restart. The second export/import would move over any documents created in that narrow time-frame (testing indicated about 20 minutes). ↩
- We felt the more powerful tools, mongodump and mongorestore, would be overkill for handling our small (about 1.14 GB and simply structured) data set. ↩