The Oracle relational database management system (RDBMS) has been around for a very long time. In the early days it came with a number of manuals. Version 4 had three small binders that fit in a nice box that was popular with software back in the 1980's. Version 5 replaced the binders with books that came with their own boxes. Version 6 seemed to come with its own bookshelf. I would have to say that my favorite version of the documentation came with the beta release of version 7 as there was an entire book devoted to performance tuning. All of the tips and tricks to tune the seemingly complex database software was in one volume. When the production documentation was released, that very important volume was embedded into one of the three larger books on database administration. I was sad to see it disappear and kept copious notes in my beta manual that I used with a fair number of production systems.
Performance tuning is a topic that applies to all database management system and PostgreSQL is no exception. There are a number of topics related to performance tuning that include:
- Performance Monitoring
- SQL Statement Optimization
- Proper Indexes
- Server Configuration
- Hardware Upgrades
- Data Partitioning
Amazingly, if you were to go back and look at the 5 major topics covered in that old Oracle7 Performance Tuning manual, the topics would be relatively similar to the first 5 here even though we are now dealing with PostgreSQL. Here is a quick summary of each topic.
Performance Monitoring
I have already covered
monitoring PostgreSQL, but there are more things you can do when you want to monitor for performance. PostgreSQL allows you to turn on
monitoring statistics to collect information about your database cluster. Normally statistics are left turned off so as not to impede performance. Turning them on adds some overhead and so the best course of action is to turn them on during the testing phase of your application development, run real-world tests, analyze the information they provide, make appropriate changes, and turn statistics off before putting your application in production. Understanding what information is collected and how to use it is a critical tool in helping you optimize PostgreSQL for your environment.
SQL Statement Optimization
PostgreSQL uses a cost-based optimizer to come up with the fastest way to retrieve data for a query. Understanding how the optimizer works and
how to influence it will help performance. You can also see how PostgreSQL will execute your query by prefacing it with
EXPLAIN or EXPLAIN ANALYZE. Generally you will not need to tune your SQL queries but there are some exceptions.
Proper Indexes
Indexes provide a shortcut to the data stored in a table. A query can use an index that is generally much faster than doing a full-table scan provided an index exists for a specific column and that column is used without modification in the query's
WHERE clause. There are situations where an index will slow a query down. They also increase the time it takes to add data. Making sure right mix indexes is important for performance.
Server Configuration
PostgreSQL provides the
postgresql.conf file that allows you to configure it for your environment. There are a number of internal buffers and caches that can be made larger and increase performance so long as you have the hardware memory for them. Making them too large can decrease performance and limit the number of connections to your database. As PostgreSQL generally comes configured to run on the smallest of servers, there are a lot of tweaks that can be made to the
postgresql.conf file to help increase performance.
Hardware Upgrades
If you are planning a new database server installation, then you may want to start with hardware. Using more than one disk, increasing RAM, or multiple CPU's can all help increase performance. However it is listed last as you will best understand what upgrades are most beneficial after seeing how your application works with PostgreSQL.
Data Partitioning
One of the reasons that Oracle 7's beta documentation didn't include the topic of data partitioning is that it wasn't made available until version 8. It is something that has been available in PostgreSQL for a while and allows you to break your table into smaller chunks based on the values in a particular column or group of columns. The idea is that certain queries that leverage natural data boundaries will run faster because you only need to look at a small set of data. The database only needs to look in a specific partition instead of the entire table.
An Artificial Benchmark
PostgreSQL comes with an artificial benchmark tool called
pgbench and it can be used to help you fine tune your database installation while you develop your application. It is most helpful as you modify your server configuration or make hardware changes but can be used with any aspect of performance tuning. Spend some time learning how to use it and feel free to play around with it so it mimics your own application. Those hours you spend tuning at the beginning will be well worth it.