Friday, March 23, 2007

Google is using Hibernate ORM? That's surprising

Update: Explanation of database partitioning techniques updated and enhanced. See towards the end of the post...

Update: Lots of people are coming here from the google-code blog. You might want to check out my posts on Simplicity, Management, Software Development, Career, or the main blog. If you like the content, you can subscribe to the feed. Thanks.


Learning that Google internally uses the Hibernate Object-Relational Mapping (ORM) framework brought my opinion of Google down by quite a few notches.


I strongly believe that in performance-critical applications, database transactions and SQL queries are best written and managed by the developer rather than by an over-generic ORM framework. If you have a low-usage application that will be used simultaneously by at most 300 users, then go ahead, use an ORM layer and knock yourself out. But if you have written a good app, it is likely to become more and more popular, and you will soon start hitting performance problems. Any Google consumer application would be at the high end of performance-criticality requirements.

The reasons that have shaped my views are as follows -
-- Typically any application spends 25% of the time executing application code, and the other 75% executing database queries (Rough estimates based on my experience)
-- So if you want a high-performance app, optimize the data access layer
-- An ORM tool is very generic. It is meant to support multiple RDBMSs and a huge variety of usage scenarios, so it is optimized for flexibility and not performance
-- Optimizations on the data access layer are best achieved by optimizing the SQL. You know your domain model and data schema best. So you are the best person to write the SQL
-- ORM tools like Hibernate do not allow you to write your own SQL. Instead you give definitions of your database tables and map them to your business objects. Hibernate automatically generates SQL from these relationships

Many people take issue with this and say that to manage transactions and write optimized SQL, you need very good developers, who are expensive and difficult to attract and retain.
My response: Of course, if you need to create outstanding apps, you need good developers. If you can't attract and retain them, that's your management problem. As for the cost aspect, hire a great developer, fire 5 average ones. You will get more work done, and save money.


Anyway, all of the above deviates from the main topic, which is that Google has open-sourced Hibernate Shards, an extension to Hibernate that enables horizontal partitioning of data.

What is Horizontal Data Partitioning? Well, if your app requires access to very large amounts of data (hundreds of millions of records), performance is critical, and you have a large number of users, then storing all your data in one database will cause problems, because too many reads and writes will hit the same database storage. One solution is to split the data in a big database table into "horizontal partitions" based on some particular criterion, and store the partitions in separate databases. For example, you could group your user profile table by the user's location (US, Europe, Asia) and store each group of profile data in a separate database. This is called horizontal data partitioning because you are splitting the data at the "row level" within a database table.
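To make the idea concrete, here is a minimal Python sketch of the location-based example above. It is not how Hibernate Shards works internally. The table layout, the function names, and the use of in-memory SQLite databases as stand-ins for separate database servers are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical partition keys; one in-memory SQLite database stands in
# for each separate database server.
REGIONS = ("US", "Europe", "Asia")


def make_shards():
    """Create one database per region, all with the same table schema."""
    shards = {}
    for region in REGIONS:
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE user_profile "
            "(id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
        )
        shards[region] = conn
    return shards


def save_profile(shards, user_id, name, region):
    """Route the row to the database that owns the user's region."""
    conn = shards[region]
    conn.execute(
        "INSERT INTO user_profile (id, name, region) VALUES (?, ?, ?)",
        (user_id, name, region),
    )
    conn.commit()


def find_profile(shards, user_id, region):
    """Read from a single partition; the caller must supply the region."""
    cursor = shards[region].execute(
        "SELECT id, name, region FROM user_profile WHERE id = ?", (user_id,)
    )
    return cursor.fetchone()


def count_all(shards):
    """A query over all rows has to visit every partition and combine results."""
    return sum(
        conn.execute("SELECT COUNT(*) FROM user_profile").fetchone()[0]
        for conn in shards.values()
    )
```

Every partition has the same schema, and only the rows differ. The cost shows up in `count_all`: anything that spans partitions has to be assembled in application code.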

Another option is Vertical Data Partitioning. Here, you partition the data in a database by putting different high-volume database tables into separate databases. For example, if you have lots of User Profiles, Messages, Transactions, etc. in separate tables, then with vertical partitioning you will store each of these tables in a separate database. This is called vertical data partitioning because you are splitting your data at the "table level" within a database.
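A similarly rough Python sketch of vertical partitioning follows. The two tables, their columns, and the helper names are hypothetical, and in-memory SQLite databases again stand in for separate database servers.

```python
import sqlite3

# Hypothetical table definitions; each table lives in its own database.
TABLE_SCHEMAS = {
    "user_profile": "CREATE TABLE user_profile (id INTEGER PRIMARY KEY, name TEXT)",
    "message": (
        "CREATE TABLE message "
        "(id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
    ),
}


def make_table_databases():
    """Create one database per table and return them keyed by table name."""
    databases = {}
    for table, ddl in TABLE_SCHEMAS.items():
        conn = sqlite3.connect(":memory:")
        conn.execute(ddl)
        databases[table] = conn
    return databases


def messages_with_author(databases, user_id):
    """Combine data from two databases in application code.

    In this sketch the tables sit behind separate connections, so a SQL
    JOIN between them is not available and the join is done by hand.
    """
    profile = databases["user_profile"].execute(
        "SELECT name FROM user_profile WHERE id = ?", (user_id,)
    ).fetchone()
    if profile is None:
        return []
    rows = databases["message"].execute(
        "SELECT body FROM message WHERE user_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    return [(profile[0], body) for (body,) in rows]
```

Here each database holds a whole table, so the load is spread by table rather than by row. The price is that relationships between tables, such as messages and their authors, have to be resolved outside the database.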

4 comments:

dolan said...

Actually, for quite a while now Hibernate has allowed you to write your own queries. I've used this functionality many times. That said, you're then more often than not tying yourself to a specific version of SQL (i.e. Oracle SQL), but the option is there.

Arun said...

dolan
Actually, tying yourself to a specific Oracle version of SQL is not a bad thing, because if you really want database performance, you will need to take advantage of a lot of proprietary features of the database.

But I was under the impression that this was "HQL" (Hibernate Query Language) that you could write...

Nor said...

You can easily embed direct SQL using the sql-query facility.

Unknown said...

Please note that a 20% project does not imply that it is used in the Google production system. Also, not every system at Google is on the critical path of search; there are systems for internal HR etc. as well.

A 20% project is something any engineer can choose to work on, on any topic. If it has wings it may become a product offering, but not every 20% project is a product or core infrastructure of Google.

So you may be reading too much into something that is offered with effort on 20% time by some really bright engineers.