When should you NOT use MongoDB?
August 9, 2022
This article covers the problems you are likely to face when you use MongoDB for typical relational tasks.
TLDR: If you don’t know which database to choose, choose a relational database (MySQL or PostgreSQL). In the majority of cases that will be the right decision, and in the remaining few you can migrate to MongoDB later.
MongoDB is one of the most popular databases among Node.js developers, and some developers, especially newcomers, believe that if you choose Node.js then MongoDB is the default choice. Unfortunately, MongoDB is the worst choice for juniors: getting started with it is very easy, but problems arise quickly as the project grows.
Common myths & misconceptions about MongoDB
Myth N1: If you work with Node.js, you need to choose MongoDB. It is true that Mongoose is one of the best data-modeling libraries for Node.js (strictly speaking it is an ODM, an object-document mapper, rather than an ORM), but Node.js also has very good relational ORMs.
Myth N2: MongoDB is faster than relational databases. This is one of the biggest misconceptions. If you put all your data in a single table in a relational database such as MySQL, the way you would in a single MongoDB collection, you may achieve the same or even higher performance.
Myth N3: Developing with MongoDB is faster because you don’t have a strict schema. Again, this is not a valid point: if you don’t design your schema and migrations properly, you will face real problems later, and even more so once you have production data. The claim only holds for rapid prototyping.
Myth N4: MongoDB is free and open source, whereas relational databases are paid. MongoDB Community Server is free to use, but there is also a paid Enterprise edition, and since late 2018 the Community Server has been licensed under the Server Side Public License (SSPL), which imposes conditions if you modify the source code or offer MongoDB as a service. At the same time, there are great free relational databases such as MariaDB and PostgreSQL.
If you have read this far, you may think I am writing this post against MongoDB. Not at all: MongoDB is one of the most powerful databases currently available. The problem is that most of the data typically stored in databases is relational by nature: lists of users, users’ orders, order details, stores, products, and product categories. And while MongoDB has some relational features, problems start to appear over time when you use it for purely relational purposes.
Possible problems you should be aware of
I am not going to compare relational databases with NoSQL databases feature by feature, as you can find many such posts on the web. Instead, I want to focus on the problems we faced with MongoDB in one of our “typical relational” projects, which we have been developing for 5+ years.
N1. Designing relations is not straightforward in MongoDB
In relational databases, the design process is really straightforward. There are well-defined normal forms (1NF through 6NF, plus Boyce-Codd normal form), and if you follow at least the first three, your database is in a pretty manageable state. Following the first three forms is not a challenge even for a junior developer.
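As a rough illustration of what such a normalized design looks like, here is a minimal sketch of a users/groups membership relation using Python's built-in sqlite3 module. The table and column names (`users`, `chat_groups`, `group_members`) are assumptions made for this example, not taken from our project, and SQLite stands in for MySQL or PostgreSQL only because it runs without a server.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE users       (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE chat_groups (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Junction table: one row per membership, no duplicated user or group data.
    CREATE TABLE group_members (
        user_id  INTEGER NOT NULL REFERENCES users(id)       ON DELETE CASCADE,
        group_id INTEGER NOT NULL REFERENCES chat_groups(id) ON DELETE CASCADE,
        PRIMARY KEY (user_id, group_id)
    );
    INSERT INTO users       VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO chat_groups VALUES (10, 'admins');
    INSERT INTO group_members VALUES (1, 10), (2, 10);
""")

def members_of(group_id):
    """Names of all users in a group, resolved through the junction table."""
    rows = conn.execute(
        "SELECT u.name FROM group_members gm "
        "JOIN users u ON u.id = gm.user_id "
        "WHERE gm.group_id = ? ORDER BY u.name",
        (group_id,),
    )
    return [row[0] for row in rows]

def add_membership(user_id, group_id):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO group_members VALUES (?, ?)", (user_id, group_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(members_of(10))          # ['alice', 'bob']
print(add_membership(99, 10))  # False: user 99 does not exist, the foreign key rejects it
```

The point of the sketch is that the database itself refuses a membership row that points at a missing user, and deleting a user removes that user's memberships through the cascade.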
In MongoDB, by contrast, schema design offers two choices for every piece of data. You can either embed the data directly in a document or keep it in a separate collection and reference it, resolving the reference with the $lookup aggregation stage (similar to a JOIN). Both options have their advantages and limitations.
A simple group membership relation (users, groups, and which user belongs to which group), which every beginner can model in a relational design, is not as simple in MongoDB.
If you decide to go with referencing, as in the relational design, then simply fetching group members requires two lookups, and you still have problems with data integrity, since MongoDB doesn’t support foreign keys.
If you decide to go with embedding and store a user’s groups as an array in the user document, then getting a user’s groups will be fast, but getting a group’s users will be very slow.
The last option is to store group ids in the user document and user ids in the group document. As you can see, this duplicates data, and we need to deal with update and delete anomalies.
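To make the anomaly in the last option concrete, here is a small in-memory sketch. Plain Python dictionaries stand in for MongoDB documents, and the field names `group_ids` and `user_ids` are assumptions made for this example. It shows how a single code path that updates only one side leaves the two copies disagreeing.

```python
# Plain dictionaries stand in for MongoDB documents; field names are assumptions.
users = {
    "u1": {"name": "alice", "group_ids": ["g1"]},
    "u2": {"name": "bob", "group_ids": ["g1"]},
}
groups = {"g1": {"name": "admins", "user_ids": ["u1", "u2"]}}

def remove_member_one_sided(user_id, group_id):
    """Buggy on purpose: updates only the group document."""
    groups[group_id]["user_ids"].remove(user_id)

def remove_member(user_id, group_id):
    """Correct version: every write has to touch both documents."""
    groups[group_id]["user_ids"].remove(user_id)
    users[user_id]["group_ids"].remove(group_id)

def find_inconsistencies():
    """Return (user_id, group_id) pairs recorded on one side but not the other."""
    from_users = {(u, g) for u, doc in users.items() for g in doc["group_ids"]}
    from_groups = {(u, g) for g, doc in groups.items() for u in doc["user_ids"]}
    return sorted(from_users ^ from_groups)

remove_member_one_sided("u2", "g1")
print(find_inconsistencies())  # [('u2', 'g1')]: bob still lists g1, g1 no longer lists bob
```

Nothing in the data store prevents the one-sided update, so a check like `find_inconsistencies` (or a transaction around both writes) has to live in application code.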
N2. Joining collections in MongoDB
The next “relational” operation is joining multiple collections, which is one of the easiest and most common operations in relational databases.
For “joining” two collections, MongoDB has the $lookup aggregation stage. But MongoDB schema design advice is to avoid joins: joining collections is a heavy operation, and there are no foreign keys to back it. Instead, it is considered best practice to store manual id references in one collection’s documents and to index the field you join on in the other collection to speed up the lookup.
While single lookups are more or less acceptable given the considerations above, joining with conditions and subqueries on the joined collection brings a number of restrictions and adds complexity to both query execution and schema design.
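For readers who have not seen it, a basic equality $lookup stage looks roughly like the pipeline below. The collection and field names (`orders`, `users`, `user_id`) are hypothetical, and the pipeline is only defined here, not sent to a server. The `emulate_lookup` helper is my own simplified in-memory model of what the stage does, not MongoDB's implementation.

```python
# Hypothetical collections: `orders` documents carry a manual `user_id` reference.
pipeline = [
    {"$lookup": {
        "from": "users",          # collection to join with
        "localField": "user_id",  # field on the orders documents
        "foreignField": "_id",    # field on the users documents (_id is indexed by default)
        "as": "user",             # output array field that receives the matches
    }},
]
# With PyMongo this would be passed to db.orders.aggregate(pipeline); not run here.

def emulate_lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Simplified model of an equality $lookup: a left outer join whose matches
    are attached as an array, so unmatched documents get an empty list."""
    result = []
    for doc in local_docs:
        matches = [f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)]
        result.append({**doc, as_field: matches})
    return result

orders = [{"_id": 1, "user_id": "u1"}, {"_id": 2, "user_id": "u9"}]
users = [{"_id": "u1", "name": "alice"}]
spec = pipeline[0]["$lookup"]
joined = emulate_lookup(orders, users, spec["localField"], spec["foreignField"], spec["as"])
print(joined[0]["user"])  # [{'_id': 'u1', 'name': 'alice'}]
print(joined[1]["user"])  # []: order 2 references a user that does not exist
```

The nested scan in the helper also hints at why the joined field needs an index: without one, every input document triggers a pass over the other collection. Note that the dangling reference in order 2 is silently accepted, which is the missing foreign key problem again.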
While writing this blog post, I found another great article fully dedicated to this topic. So, instead of reinventing the wheel, I will reuse the data from that article. Its example was very similar to ours: employees, departments, and employee_departments tables. The goal was to measure the performance of two queries that compute the total salary of each department, one including and one excluding departments with no employees.
Not surprisingly, MongoDB was 50–130 times slower. This doesn’t mean that MongoDB is a slower database in general. It means that if you use MongoDB, you need to design your database accordingly, and when you run into performance issues you can’t just add indexes to fix them: the fix will involve database design and code changes.
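For reference, the relational side of such a comparison could look roughly like the sketch below. These are not the queries from the referenced article: the column names, the sample rows, and the use of SQLite through Python's sqlite3 module are all assumptions made so that the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee_departments (employee_id INTEGER, department_id INTEGER);
    INSERT INTO employees   VALUES (1, 'ann', 100), (2, 'ben', 200);
    INSERT INTO departments VALUES (1, 'dev'), (2, 'ops'), (3, 'empty');
    INSERT INTO employee_departments VALUES (1, 1), (2, 1), (2, 2);
""")

# Total salary per department, skipping departments without employees (inner joins).
with_employees = conn.execute("""
    SELECT d.name, SUM(e.salary)
    FROM departments d
    JOIN employee_departments ed ON ed.department_id = d.id
    JOIN employees e ON e.id = ed.employee_id
    GROUP BY d.id, d.name
    ORDER BY d.id
""").fetchall()

# Same total, but keeping empty departments (left joins, missing sums become 0).
all_departments = conn.execute("""
    SELECT d.name, COALESCE(SUM(e.salary), 0)
    FROM departments d
    LEFT JOIN employee_departments ed ON ed.department_id = d.id
    LEFT JOIN employees e ON e.id = ed.employee_id
    GROUP BY d.id, d.name
    ORDER BY d.id
""").fetchall()

print(with_employees)   # [('dev', 300), ('ops', 200)]
print(all_departments)  # [('dev', 300), ('ops', 200), ('empty', 0)]
```

In SQL the difference between the two variants is a single keyword per join, whereas in MongoDB each variant needs its own aggregation pipeline built from lookup and grouping stages.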
N3. Data Integrity — ACID transactions
If you don’t know what ACID is, don’t worry: it stands for Atomicity, Consistency, Isolation, and Durability, and at a very high level it ensures that your data never ends up in an inconsistent state because an operation only partially completed.
One major drawback of non-relational databases is the lack of data integrity guarantees. Relational databases typically provide ACID transactions spanning multiple operations because, by their nature, an update to a relational database often means updating multiple related tables.
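A minimal sketch of what that means in practice, again using Python's sqlite3 module as a stand-in for a relational database. The `products` and `orders` tables and the `place_order` helper are hypothetical. Placing an order writes to two tables, and if the second write fails, the first one is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        stock INTEGER NOT NULL CHECK (stock >= 0)  -- stock may never go negative
    );
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL
    );
    INSERT INTO products VALUES (1, 5);
""")

def place_order(product_id, quantity):
    """Insert the order and reduce the stock; both writes apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "INSERT INTO orders (product_id, quantity) VALUES (?, ?)",
                (product_id, quantity),
            )
            conn.execute(
                "UPDATE products SET stock = stock - ? WHERE id = ?",
                (quantity, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def snapshot():
    """Current (number of orders, stock of product 1)."""
    order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    stock = conn.execute("SELECT stock FROM products WHERE id = 1").fetchone()[0]
    return order_count, stock

print(place_order(1, 10), snapshot())  # False (0, 5): the order row was rolled back too
print(place_order(1, 3), snapshot())   # True (1, 2)
```

The failed attempt leaves no half-written order behind, which is exactly the "never partially completes" guarantee described above.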
MongoDB, like the majority of databases, has always supported atomic operations on a single document. MongoDB 4.0 later added multi-document ACID transactions. Still, it is up to the application to start and manage these transactions explicitly, and they are not something you can lean on for every routine write.
To sum up: MongoDB does not guarantee referential integrity on its own, since it has no enforced relations. You can add some level of consistency by using multi-document transactions and application-level checks.
N4. Pagination in MongoDB doesn’t work as you would expect
Here is a quick comparison of the performance of MongoDB’s skip() with PostgreSQL’s OFFSET/LIMIT. In all cases, we request the same number of users from our user collection, but query execution time keeps increasing as we move further through the pages.
The reason for the slowdown is that to skip 10,000 records and return 1,000, the database still examines all 11,000 records and returns only the last 1,000. You can read more about how skip works, and how to make pagination faster using a different approach, here.
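One commonly used alternative is keyset (range-based) pagination, where each page starts from the last key seen instead of from a row count. Whether this is the approach the linked article describes is not something I can confirm here, so treat the sketch below as one possible technique. It uses Python's sqlite3 module and a hypothetical `users` table so that it runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 101)],
)

def page_by_offset(offset, limit):
    """Offset pagination: the engine walks past `offset` rows before returning any."""
    rows = conn.execute(
        "SELECT id FROM users ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
    )
    return [row[0] for row in rows]

def page_after(last_id, limit):
    """Keyset pagination: seek directly to the first id greater than the last one seen."""
    rows = conn.execute(
        "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?", (last_id, limit)
    )
    return [row[0] for row in rows]

first_page = page_after(0, 10)
second_page = page_after(first_page[-1], 10)  # continue from the last id of page one
print(second_page == page_by_offset(10, 10))  # True: same rows, without skipping
```

The same idea carries over to MongoDB by filtering on an indexed field such as `_id` (greater than the last value seen), sorting on it, and applying a limit instead of calling skip(). The trade-off is that you can only move to the next page from a known position, not jump to an arbitrary page number.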
Note: for testing, we used cloud instances with 2 GB of RAM. If you can afford a machine with the latest CPU and 64 GB of RAM, then both MongoDB and PostgreSQL will execute these queries in under 1 second, but in real life we deploy our applications on cloud machines with 2 GB of RAM.
So, when should MongoDB be used?
While MongoDB describes itself as a general-purpose database, in our experience MongoDB is really great in the following scenarios:
- For fast prototyping — when it is only a prototype and you don’t want to worry about DB design, table creation, and indexing. There is no real data in the database and performance won’t be an issue.
- For typical document storage — like downloading and storing HTML pages, semi-structured documents, and other types of data that really have no defined structure.
- For high-speed, real-time logging, caching, and similar workloads.
MongoDB is one of the best databases in the world, but you should only use it when you really have non-relational data and you know exactly why you need a NoSQL database. If you don’t have an answer to that question, simply go with a relational database (MariaDB and PostgreSQL are excellent and free).
While MongoDB is known as a free, high-performance database that speeds up development, using it in the wrong place has the opposite effect: your system becomes slow and hard to manage, and fixing performance issues requires changes to both the data structure and the code.