Column oriented DBMS

I’ve been investigating Column Oriented DBMS systems today. These are designed for data warehousing applications (as opposed to transactional processing). They use a data storage mechanism which is orthogonal to standard row oriented DBMSes.

“Normal” DBMSes are row based: they store each row of a table as a contiguous record. Column oriented databases store each column as a contiguous record. For data warehousing applications this can yield an order of magnitude increase in query times as only the data for the specifically queried columns needs to be read from disk. They can also provide a similar decrease in storage requirements; since the columnar data will likely consist of very similar entries, data compression will work far better.

The disadvantage is that this storage method is optimised for reading, and not for writing. You wouldn’t replace a standard transactional DB system with one, but they are beneficial for analytics as this application primarily reads data.

There are various companies which provide CODBMS solutions, such as Vertica, or SAND.

I thought about how such a system might be implemented; I ended up realising you would store each column of a table in a seperate file, and index into the file to retrieve data. This could easily be simulated in a conventional row oriented DBMS by splitting a single multi-column table up into multiple sub-tables, one per column, with a view on top joining them all together. I would imagine the performance will be absolutely terrible as a row-oriented DBMS is not optimised to work in this way (as a CODBMS would be), but it sounds like an interesting experiment to try one afternoon at work. 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: