Tuesday, September 15, 2015

Hekaton Part 7 - Hash Collisions & Hash Buckets

As explained in the earlier post, a hash index places pointers to the rows in hash buckets, depending upon the value returned by hash function. Hash bucket count for a index is fixed at the time of creation and doesn't grow as data grows. So, what happens when the index column contains duplicates and they hash to the same hash bucket? Sometimes, two values, even though they are different, they may hash to the same hash bucket. Such a scenario is termed as hash collision.

Hash Indexes in "Hekaton in memory tables" handles the hash collision in the following way
1) When two values hash to the same hash bucket(h1), the pointer to the latest row (R2) inserted is stored in the hash bucket
2) The latest inserted row(R2) would contain a pointer to the row(R1) that was present previously  in hash bucket(h1).

Refer to the pic below


Hash Index is created on column c1. Row with value "Alice" for Column c1 resides at address 2000. Hash("Alice") maps to hash bucket 1 and hence hash bucket 1 contains the address 2000.

Refer to the pic below, to see what happens when another row with value "Alice" for c1 is inserted.



Lets say the new row inserted is at address 3000. Hash bucket "h1" now contains the address 3000. The new row's meta data ( index pointer)  at address 3000, contains a pointer to the previous row at Address 2000.
 
Please note 2 important points here.
  1. In "in memory" tables, rows contain index pointers unlike disk tables where indexes contain pointers / row locators to table.
  2. These index pointers actually link the table together. This is the reason for atleast one index rule in any in memory table
In the next post, we will see a few demos to take a closer look at hash collision and index chains

No comments: