cd $HOME
How to optimize a JOIN?
A SQL JOIN is a complex mechanism. A whole bunch of algorithms can be used to perform the operation, which makes the task of optimizing the process difficult (should I add some indexes? Is the Buffer size too small? ...).
I was working at Cornell University on Minibase (a tiny DBMS, Database Management System, for educational purpose): it provides four types of join algorithms (Simple Tuple Loop Join, Block Nested Loop Join, Index Nested Loop Join and Sort Merge Join). I have benchmarked all of them to see the influence of several parameters, such as the size of the Buffer, position of the relations (R join S vs. S join R), ...
You can download the result and analysis of the study here: http://www.mouraf.org/documents/joins_analysis.pdf
The goal was to understand the general mechanisms that a DBMS can provide to join relations and how much they cost (number of I/O). It is not a HOWTO explaining how to tweak MySQL. Not all DBMS provide all the algorithms mentioned before anyways. It however gives some hints of what to optimize when your queries take too much time to run.





Comments