Polyglot persistence is a term that comes up often in articles. Simply put, it
means being aware of the different types of persistence models and
technologies, and using the most suitable persistence model for the data that
needs to be stored. In this definition there’s a distinction between
traditional relational database models, object-based database models, and the
more intangible NoSQL concept.
I have a lot of experience in relational data storage, as
do many others, from using Oracle, MySQL and similar databases, and one could
argue that this is still the most common method for storing data. However, not
so long ago the NoSQL concept received a lot of attention and I have long been
interested in it, so I’ll devote a post to the topic.
NoSQL is not as simple to define as RDBMS, which is
essentially SQL. When I first read about it I didn’t get it, and I’m still not
sure I see the big picture. I say this so you know not to expect a write-up
with much detail. This post is for me to condense and gather some of my early
insights into NoSQL, from a bird’s-eye view.
One thing I realized quickly is that NoSQL is tied
to large clusters of database nodes, in contrast to relational databases.
That is not to say RDBMS technologies can’t be clustered, but such systems are
often based on central data storage, and growing them into a cluster of
relational databases is very problematic. NoSQL is supposed to handle this
much better.
NoSQL comes in three major forms, and each form
closely matches a type of data you need to store. Choosing the right form
is very important and is the whole point of polyglot persistence: finding the
better storage model. The three models are:
· Key/value pairs
· Graph
· Document
They are each different and all of them fall under the
umbrella of NoSQL. All of them are very different from RDBMS, and that’s why
NoSQL has had such a strong impact on new technology. When designing enterprise
apps or systems today, it would not be fair to assume that your persistence
should be relational. It may be the preferred option, but awareness of and
familiarity with the other options will reinforce the reasons for choosing
relational persistence after considering NoSQL. Choosing the correct one may
reduce development time and result in a better application, so due diligence
on the different options should be a priority.
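To make the three models a little more concrete, here is a minimal sketch in plain Python (no database involved) of how the same small set of facts might be shaped in each of them. The users, products and key names are invented for illustration only.

```python
# The same facts ("alice bought a kettle; alice and bob are friends")
# expressed in the three NoSQL shapes. All names here are made up.

# Key/value: an opaque value looked up by a single key.
key_value = {
    "user:alice:name": "Alice",
    "user:alice:cart": "kettle",
}

# Document: one self-contained, nested record per entity.
document = {
    "_id": "alice",
    "name": "Alice",
    "purchases": [{"product": "kettle", "price": 25}],
}

# Graph: nodes plus labelled edges between them.
nodes = {"alice", "bob", "kettle"}
edges = [
    ("alice", "FRIEND_OF", "bob"),
    ("alice", "BOUGHT", "kettle"),
]


def neighbours(node, label):
    """Return the targets of all edges with the given label leaving node."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


print(key_value["user:alice:cart"])         # lookup by key
print(document["purchases"][0]["product"])  # navigate inside a document
print(neighbours("alice", "FRIEND_OF"))     # follow a relationship
```

The point is only the shape of the data: a flat lookup, a nested record, or a set of relationships to traverse.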
Data being stored | Persistence technology | Applications
Financial data | Oracle, MySQL, SQL Server etc. | Transactional updates, ACID
Reporting | Oracle, MySQL, SQL Server etc. |
User sessions | Redis | Rapid read/write access
Shopping cart | Riak, DynamoDB | High availability in multiple locations
Recommendations | Neo4j | Traverse links between friends, products and purchases
Product catalog | MongoDB | Many reads, infrequent writes
User activity logs | Cassandra | Large cluster, many writes on many nodes
The table is based on findings in a post by
Martin Fowler that I read recently, and it suggests a scenario in which each
persistence technology may be applicable. For example, a global e-commerce
store may use Riak or DynamoDB to serve customers in different parts of the
planet. This is obviously not enough to decide which technology to use for a
specific application or scenario, so I wanted to go just a little deeper. I
wanted to find out more about the characteristics of the different
technologies and the scenarios in which they can be considered.
Redis (Remote
Dictionary Server) is basically a key-value store held in RAM with built-in
persistence. Since it’s in RAM it’s extremely fast and suitable for quick
reads and writes. Redis supports data types such as strings, hashes,
lists, and sets. The string is the most basic type and the other types are
actually containers of strings, therefore the characteristics of the string
are important. A string may be up to 512MB, can store images or serialized
objects, and is binary safe. Keys are always strings, while values may be of
any of the types; there’s a recommendation not to make the key too large.
I’ve read or heard of several applications using Redis in
one way or another, for example a Twitter-like feed, an authentication store,
a leaderboard, a roster with online/offline status, and a note-keeping app.
Another thing about Redis is that it is cross-platform and has clients
in numerous languages, so it seems to be a good choice if the application is
intended for multiple platforms and different programming languages are used.
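The Redis value types described above can be illustrated with a toy, in-memory imitation in Python. This is a sketch only: the `ToyStore` class is made up for this post, it is not a Redis client, and it borrows just the names of a few well-known Redis commands (SET/GET, HSET/HGETALL, LPUSH/LRANGE, SADD/SMEMBERS) to show what each type is good for.

```python
class ToyStore:
    """A toy, in-memory imitation of a few Redis value types (not a Redis client)."""

    def __init__(self):
        self._data = {}  # key (str) -> value (str, dict, list or set)

    # Strings: the basic type.
    def set(self, key, value):
        self._data[key] = str(value)

    def get(self, key):
        return self._data.get(key)

    # Hashes: field -> string maps, handy for objects.
    def hset(self, key, field, value):
        self._data.setdefault(key, {})[field] = str(value)

    def hgetall(self, key):
        return dict(self._data.get(key, {}))

    # Lists: ordered strings, handy for feeds.
    def lpush(self, key, value):
        self._data.setdefault(key, []).insert(0, str(value))

    def lrange(self, key, start, stop):
        items = self._data.get(key, [])
        # Ranges are inclusive; a stop of -1 means "up to the last element".
        return items[start:] if stop == -1 else items[start:stop + 1]

    # Sets: unique strings, handy for an online/offline roster.
    def sadd(self, key, member):
        self._data.setdefault(key, set()).add(str(member))

    def smembers(self, key):
        return set(self._data.get(key, set()))


store = ToyStore()
store.set("session:42", "alice")
store.hset("user:alice", "email", "alice@example.com")
store.lpush("feed:alice", "first post")
store.lpush("feed:alice", "second post")
store.sadd("online", "alice")
store.sadd("online", "alice")  # adding twice has no effect in a set

print(store.get("session:42"))
print(store.lrange("feed:alice", 0, -1))
print(store.smembers("online"))
```

Note how every key is a plain string, and how the newest post ends up first in the feed because it is pushed onto the head of the list.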
Riak is also a
key/value store. According to their website, the technology is used by many
well-known internet brands, for example the online retailer Best Buy. Its main
purpose is real-time systems where availability and scalability are high
priorities. It has a full-text search engine and some advanced indexing
features which help keep latency low. Another use case is mobile apps, for
example the find-a-taxi app Flywheel is said to use Riak.
DynamoDB is a
database from Amazon and is part of the AWS (Amazon Web Services) suite. Being
part of AWS means it’s hosted in the Amazon cloud and integrates well
with the many other services in the AWS infrastructure. It also means that
data is stored on SSDs and is easily replicated across the zones of an AWS
region. DynamoDB is a table-based data store, but the tables have no schema
except for the fact that each table has a primary key. The three concepts of
tables, items and attributes are central to the data model.
To read data, a Query or a Scan is used: a Query looks items up by primary
key, while a Scan searches the entire table. Queries and results are in JSON
format.
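The difference between a Query and a Scan can be sketched with a toy table in plain Python. The `ToyTable` class and the shopping-cart items are invented for illustration and do not use the AWS SDK; real DynamoDB keys and queries have more options than this simplified model shows.

```python
class ToyTable:
    """Toy model of a schemaless table whose items share only a primary key."""

    def __init__(self, primary_key):
        self.primary_key = primary_key
        self._items = {}  # primary key value -> item (a dict of attributes)

    def put_item(self, item):
        # Items may carry any attributes, but the primary key is required.
        self._items[item[self.primary_key]] = dict(item)

    def query(self, key_value):
        """Look up by primary key: touches only the matching item."""
        item = self._items.get(key_value)
        return [item] if item is not None else []

    def scan(self, predicate=lambda item: True):
        """Examine every item in the table and keep those that match."""
        return [item for item in self._items.values() if predicate(item)]


carts = ToyTable(primary_key="user_id")
carts.put_item({"user_id": "alice", "items": ["kettle"], "country": "SE"})
carts.put_item({"user_id": "bob", "items": ["toaster", "kettle"]})  # no country

print(carts.query("alice"))
print(carts.scan(lambda item: "kettle" in item["items"]))
```

The query goes straight to one item, while the scan has to visit every item, which is why a scan becomes expensive as the table grows.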
The scalability of DynamoDB makes it suitable for online
portals with a large number of users. With it comes the full AWS infrastructure,
including the powerful AWS Management Console.
Neo4j is a graph
database. It is based on connections between nodes in a web, where the
connections, or edges, of the graph can hold data. It supports transactions
and is ACID compliant. Conceptually it may be thought of as a web of
relations between people in the form of a graph, and therefore I would expect
it to see heavy use in social media. Recommendations based on your social
network would be a suitable application where a graph query is efficient.
Neo4j uses the Cypher query language, which borrows some keywords from SQL but
in general looks and is used quite differently.
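As a rough idea of what such a recommendation traversal does, here is a small sketch in plain Python over adjacency dictionaries. It is not Cypher and does not use Neo4j; the people, products and the `recommend` function are made up for illustration.

```python
from collections import Counter

# FRIEND_OF edges (person -> friends) and BOUGHT edges (person -> products).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice"},
}
purchases = {
    "alice": {"kettle"},
    "bob": {"kettle", "toaster"},
    "carol": {"toaster", "blender"},
}


def recommend(user):
    """Rank products bought by the user's friends that the user does not own."""
    owned = purchases.get(user, set())
    counts = Counter()
    for friend in friends.get(user, set()):          # hop 1: user -> friend
        for product in purchases.get(friend, set()):  # hop 2: friend -> product
            if product not in owned:
                counts[product] += 1
    # Most frequently bought first; ties broken alphabetically for stable output.
    return sorted(counts, key=lambda p: (-counts[p], p))


print(recommend("alice"))  # ['toaster', 'blender']
```

A graph database is built to make exactly this kind of hop-by-hop traversal cheap, whereas in a relational model each hop would typically be a join.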
MongoDB is a document
database. Documents are stored in a JSON-like format, and it is rumored to be
the most popular NoSQL database. Perhaps this is due to its early start:
development began in 2007 and the first release followed in 2009, which is
early in the history of NoSQL technology. Search queries can be made on
fields, ranges or regular expressions. Master-slave replication is one of the
features of MongoDB. Suitable applications for MongoDB and document databases
are those where the concept of a document is central, such as in a news
agency; I believe that many well-known news providers use MongoDB or other
document databases.
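The three kinds of search mentioned above (by field, by range and by regular expression) can be sketched over a list of JSON-like documents in plain Python. This is only an illustration of the idea, not MongoDB's query syntax or driver; the articles and the `find` helper are invented.

```python
import re

# JSON-like documents; note that they need not share the same fields.
articles = [
    {"title": "Markets rally", "section": "finance", "words": 420},
    {"title": "Local team wins", "section": "sport", "words": 310},
    {"title": "Markets dip", "section": "finance", "words": 980, "tags": ["stocks"]},
]


def find(docs, predicate):
    """Return every document for which the predicate is truthy."""
    return [d for d in docs if predicate(d)]


# Match on a field value.
by_field = find(articles, lambda d: d.get("section") == "finance")
# Match on a numeric range.
by_range = find(articles, lambda d: 300 <= d.get("words", 0) < 500)
# Match on a regular expression.
by_regex = find(articles, lambda d: re.search(r"^Markets", d.get("title", "")))

print([d["title"] for d in by_field])
print([d["title"] for d in by_range])
print([d["title"] for d in by_regex])
```

A real document database would use indexes rather than checking every document, but the query model is the same: conditions on the fields inside each document.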
Cassandra is a
distributed database system with a key-value style storage model (more
precisely, a wide-column store). It is used for storing large amounts of data,
as it scales horizontally with little effort. It has been tested with huge
data volumes, up to hundreds of terabytes over hundreds of machines. If you
are storing and consuming large volumes of data, then Cassandra may be the
model of choice.
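One reason this kind of horizontal scaling works is that keys are spread over the nodes with consistent hashing, so adding a node only moves a fraction of the data. The sketch below shows the general technique in plain Python; it is a simplified assumption-laden illustration (the `Ring` class, the node names and the use of MD5 are my own choices), not Cassandra's actual partitioner.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (an integer derived from MD5)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A minimal consistent-hash ring: each node owns the arc ending at its token."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, key):
        """Return the first node clockwise from the key's token."""
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]


keys = [f"user:{n}" for n in range(1000)]
small = Ring(["node-a", "node-b", "node-c"])
large = Ring(["node-a", "node-b", "node-c", "node-d"])

moved = [k for k in keys if small.node_for(k) != large.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

Every key that changes owner moves to the new node; the rest stay where they were, which is what keeps growing the cluster cheap compared with rehashing everything.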
There are many more implementations of NoSQL databases
available, and if one is pondering using one, the next step after deciding
whether to use key/value, graph or document storage would be to try as many
as possible. Things were a lot easier when RDBMS was the only option, weren’t
they?
Finally, I’d like to acknowledge Scott Leberknight, who is
believed by many to be the first person to use the term polyglot persistence,
in this article.