Author: Karsten Silz
Oct 29, 2021   |  updated Dec 7, 2021 9 min read

Permalink: https://betterprojectsfaster.com/guide/java-full-stack-index-2021-11-db/

Java Full-Stack Index Nov 2021: Databases


This is an old version! Click below for the current one.

See Current Version


Summary

  • On your current project, keep your existing database unless that database is absolutely, irrevocably, really not working out for you.
  • If you need to switch databases or are on a new project:
    • If you know that you’ll need the NoSQL features and/or scalability, and you can’t get this with MySQL, then use MongoDB.
    • Otherwise, use MySQL.


Table Of Contents



Applications

These recommendations are for building enterprise applications on PCs and mobile devices - forms, data grids, reports. They are not for games or media applications.


Choices

Here are the choices in alphabetical order:

I’m only listing open-source products here. I don’t include commercial databases (like Oracle or Microsoft SQL Server) or cloud-proprietay databases (like Amazon DynamoDB or Google Cloud Spanner). Why?

For commercial and cloud products, there are many other factors that play a role in deciding for or against a product - money, legal considerations, existing contracts, vendor relations, etc. Restricting myself to open-source products means restricting myself to technical considerations only - or so I hope at least.

I also have no production experience with NoSQL databases. So I looked at the databases that Spring Data supports and picked what I think are the general applicable NoSQL database. Please tell me if I picked the wrong ones here!


Popularity

Why Popularity - and How?

Picking a popular technology makes our developer life easier: Easier to learn, easier to build, debug & deploy, easier to hire, and easier to convince teammates & bosses. Now I look at technology popularity as a funnel from interest to learning, application, and finally to skill:

Technology popularity funnel
Technology popularity funnel
  • Quantity decreases in the funnel - we’re interested in many technologies, but few end up on our resumes.
  • Time increases in the funnel - it takes many months, often years, for technology to move from “interest” to “skill”.

We’re interested in the trend of the ratio between competing technologies. So we use Google searches to measure interest, Udemy course buyers to measure learning, Stack Overflow questions to measure learning & application, and mentions in Indeed job ads to measure skills.


Google Searches

Google Trends demonstrates the initial interest in a technology over time.

Google Trends for Cassandra, MongoDB, MySQL, Neo4j, PostgreSQL since 2004
Google Trends for Cassandra, MongoDB, MySQL, Neo4j, PostgreSQL since 2004

This link produces the chart. Google Trends only allows five choices, so here’s the link with Couchbase instead of Neo4j - it doesn’t make a difference to the overall picture.

Just like Java, MySQL still leads the pack at 1/8 of its peak popularity from 2004. Unlike Java, MySQL’s competitors caugt up. So let’s zoom in into the last five years:

Google Trends for Cassandra, MongoDB, MySQL, Neo4j, PostgreSQL for the last 5 years
Google Trends for Cassandra, MongoDB, MySQL, Neo4j, PostgreSQL for the last 5 years

It’s a three-horse race here: MySQL still wins, PostgreSQL is second, and MongoDB third. MySQL is slighlty trending upwards recenlty, while both PostgreSQL and MongoDB have remained flat over the last year. This link produces the chart above.

Here’s the link with Couchbase instead of Neo4j - it also hovers barely above zero, just like Neo4j.


Students at Udemy

Udemy is one of the biggest online learning sites. They publish the number of courses and the number of students (if it goes beyond a certain threshold). This shows how many people evaluate a technology. PostgreSQL is the baseline here:

Students at Udemy for Cassandra, Couchbase, MongoDB, MySQL, Neo4j, PostgreSQL
Students at Udemy for Cassandra, Couchbase, MongoDB, MySQL, Neo4j, PostgreSQL

MySQL wins, but now MongoDB places second while PostgreSQL is third. Cassandra, Couchbase, and Neo4j don’t have enough students the cross Udemy’s reporting threshold.

Here are the links that show the courses for all and the number of students for some:


Questions at Stack Overflow

Stack Overflow Trends shows which percentage of questions at Stack Overflow has a particular technology tag. It is a proxy for using a technology during evaluation and productive use. “More questions = better” to me.

Questions at Stack Overflow for Angular, JSF, React, Vaadin, and Vue.js
Questions at Stack Overflow for Angular, JSF, React, Vaadin, and Vue.js

MySQL wins while MongoDB and PostgreSQL share the second place.. If the current trajectories continue, MongoDB and PostgreSQL will overtake MySQL in about 2 years here. Neo4j and Cassandra decrease in lockstep, while Couchbase shrinks from an even smaller starting point. This link produces the chart above.


Job Ads at Indeed

The Indeed job search is active in 63 countries representing 92% of the worldwide GDP in 2020. It demonstrates the willingness of organizations to pay for a technology - the strongest indicator of popularity in my mind. PostgreSQL is the baseline.

Job ad mentions at Indeed for Angular, React, and Vue.js
Job ad mentions at Indeed for Angular, React, and Vue.js

MySQL wins, PostgreSQL is second, MongoDB third. Cassandra musters about a quarter of PostgreSQL’s job ads, while Couchbase and Neo4j ares just a fraction.

Please see here for details, caveats, and adjustments of the job ad mentions.

You can find the detailed search results with links here. They include breakdowns by continents:


Analysis

SQL or NoSQL?

When picking a database, the first decision we have to make is whether we want to use a relational database (SQL) or one that foregoes said relations (NoSQL). Relational databases have been around for 40 years. They are trusted workhorses. We know their strengths and weaknesses. NoSQL is the new kid on the block, so what do they offer that relational databases don’t?

I’ve never used NoSQL in production. So here’s my outside view of the NoSQL advantages. Please correct me if I’m wrong!

Schemaless
We don’t define tables ahead of time. We just start saving our objects. If we add new properties to our classes, we just save them which changes our database.

No joins
Database normalization is a best practice in relational databases: We keep our tables separate, link them with foreign key relationships, and query them with joins. An example is an order table in an online shop with a customer_id column that links it to our customer table. So far, so good. But when our tables have many records, table joins can be slow or sometimes even impossible to run. It’s my understanding that with NoSQL, there are no joins, and you embed duplicate data from the joined table into your original table. In other words: Denormalization.

Better scaling
When your relational database gets too big for a single server, or that server isn’t fast enough, or you want redundancy, then things get hairy in the relational world: You use sharding to split your data across servers, or you set up database clusters for fail-over and redundancy. None of this is part of the relational model, so it differs from database to database. And running a database cluster with multiple main nodes that our applications can write to is tricky to set up. Not so with NoSQL databases: They were designed to scale, or so they say, and all of this is a lot easier.


Again, without any production experience in NoSQL, I’ve got issues with two of these advantages.

Schemaless
When adding new columns to tables, the ALTER TABLE statement is the least of our problems in my experience. The bigger issue is: How do I fill these new columns meaningfully for my existing data? Let’s say we add a date_of_birth column to a customer table. The default value is probably null. So how do I get the date of birth for all my existing customers? I don’t see how this is getting easier with NoSQL: I still only have one default for my new dateOfBirth property, and I similarly need to somehow get the date of birth for the existing customers.

Still, if we want schemaless in our relational databases, we can also get (most of) that: MySQL has a JSON datatype, as does Postgres.

No joins
Joins don’t work well - or not at all - if we have a lot of records. This means that if we don’t have many records, then joins work just fine. And it seems to me that we can denormalize data just as well in our relational database: We can duplicate the customer columns in the order table if we want to.

The data duplication in NoSQL databases and in denormalized relational ones has side effects, too: If we have properties/columns with customer data in a couple of classes/tables, then we need to update all of them in addition to our main customer class/table when customer data changes.

I’ve also heard that with NoSQL databases, we need to know how we’ll use the data and how we’ll query it before we start storing data. I suppose that’s because of the “no joins” duplication of data. But that doesn’t make much sense to me: Can’t we simply duplicate customer data into our order class after we created the order class? Isn’t the schemaless nature of NoSQL making that especially easy?

To me, the one clear advantage of NoSQL databases over relational databases is better scalability. So if we have a ton of data, then NoSQL databases work better because they were designed for that use case.

Now not all is lost for the relational database fans among us. If you want to host your own database server, then a product like CockroachDB is right for you: A NoSQL database that acts like a relational database on the outside. In the case of CockroachDB, it behaves like Postgres, minus features like stored procedures, triggers, and such. If you need your database in the cloud, then vendors like Amazon, Google and Microsoft are happy to scale relational databases for you.

Assessment

So here’s my assessment of the candidates:

  • There was a time when MySQL was synonymous with “database on the internet”. I mean, MySQL was the “M” in “LAMP stack”, the dominating architecture for hosting websites in the early 2000s. It’s still the most popular database out there, but not by much. Being bought by Sun first and then Oracle didn’t do MySQL any favors. Neither did the free fork MariaDB, created by one of the MySQL founders.
  • Postgres has always played second fiddle to MySQL. And that for a database that defines itself as the “the World’s Most Advanced Open Source Relational Database”! But Postgres does have the image of being more sophisticated than MySQL. What it does have for sure is the concept of extensions that can safely add new functionality.
  • MongoDB is the most popular NoSQL database on this list.
  • I don’t know much about Cassandra, Couchbase, and Neo4j, except that they are much less popular than MongoDB (and decreasing) and, besides Cassandra, are barely mentioned in job ads.

This is my recommendation:

  • On your current project, keep your existing database unless that database is absolutely, irrevocably, really not working out for you.
  • If you need to switch databases or are on a new project:
    • If you know that you’ll need the NoSQL features and/or scalability, and you can’t get this with MySQL, then use MongoDB.
    • Otherwise, use MySQL.

comments powered by Disqus