Column oriented DBMS

30 October 2007

I’ve been investigating Column Oriented DBMS systems today. These are designed for data warehousing applications (as opposed to transactional processing). They use a data storage mechanism which is orthogonal to standard row oriented DBMSes.

“Normal” DBMSes are row based: they store each row of a table as a contiguous record. Column oriented databases store each column as a contiguous record. For data warehousing applications this can yield an order of magnitude increase in query times as only the data for the specifically queried columns needs to be read from disk. They can also provide a similar decrease in storage requirements; since the columnar data will likely consist of very similar entries, data compression will work far better.

The disadvantage is that this storage method is optimised for reading, and not for writing. You wouldn’t replace a standard transactional DB system with one, but they are beneficial for analytics as this application primarily reads data.

There are various companies which provide CODBMS solutions, such as Vertica, or SAND.

I thought about how such a system might be implemented; I ended up realising you would store each column of a table in a seperate file, and index into the file to retrieve data. This could easily be simulated in a conventional row oriented DBMS by splitting a single multi-column table up into multiple sub-tables, one per column, with a view on top joining them all together. I would imagine the performance will be absolutely terrible as a row-oriented DBMS is not optimised to work in this way (as a CODBMS would be), but it sounds like an interesting experiment to try one afternoon at work. :)

Comments are moderated, so yours won't appear immediately. Your email is never published.

Leave a comment