Revisit the English version (#1835)

* Review the English version using Claude-4.5.

* Update mkdocs.yml

* Align the section titles.

* Bug fixes
This commit is contained in:
Yudong Jin
2025-12-30 17:54:01 +08:00
committed by GitHub
parent 091afd38b4
commit 45e1295241
106 changed files with 4195 additions and 3398 deletions

View File

@@ -1,12 +1,12 @@
# Hash algorithms
# Hash algorithm
The previous two sections introduced the working principle of hash tables and the methods to handle hash collisions. However, both open addressing and chaining can **only ensure that the hash table functions normally when collisions occur, but cannot reduce the frequency of hash collisions**.
The previous two sections introduced the working principle of hash tables and the methods to handle hash collisions. However, both open addressing and separate chaining **can only ensure that the hash table functions normally when hash collisions occur, but cannot reduce the frequency of hash collisions**.
If hash collisions occur too frequently, the performance of the hash table will deteriorate drastically. As shown in the figure below, for a chaining hash table, in the ideal case, the key-value pairs are evenly distributed across the buckets, achieving optimal query efficiency; in the worst case, all key-value pairs are stored in the same bucket, degrading the time complexity to $O(n)$.
If hash collisions occur too frequently, the performance of the hash table will deteriorate drastically. As shown in the figure below, for a separate chaining hash table, in the ideal case, the key-value pairs are evenly distributed across the buckets, achieving optimal query efficiency; in the worst case, all key-value pairs are stored in the same bucket, degrading the time complexity to $O(n)$.
![Ideal and worst cases of hash collisions](hash_algorithm.assets/hash_collision_best_worst_condition.png)
**The distribution of key-value pairs is determined by the hash function**. Recalling the steps of calculating a hash function, first compute the hash value, then modulo it by the array length:
**The distribution of key-value pairs is determined by the hash function**. Recalling the calculation steps of the hash function, first compute the hash value, then take the modulo by the array length:
```shell
index = hash(key) % capacity
@@ -35,7 +35,7 @@ For cryptographic applications, to prevent reverse engineering such as deducing
- **Collision resistance**: It should be extremely difficult to find two different inputs that produce the same hash value.
- **Avalanche effect**: Minor changes in the input should lead to significant and unpredictable changes in the output.
Note that **"Uniform Distribution" and "Collision Resistance" are two separate concepts**. Satisfying uniform distribution does not necessarily mean collision resistance. For example, under random input `key`, the hash function `key % 100` can produce a uniformly distributed output. However, this hash algorithm is too simple, and all `key` with the same last two digits will have the same output, making it easy to deduce a usable `key` from the hash value, thereby cracking the password.
Note that **"uniform distribution" and "collision resistance" are two independent concepts**. Satisfying uniform distribution does not necessarily mean collision resistance. For example, under random input `key`, the hash function `key % 100` can produce a uniformly distributed output. However, this hash algorithm is too simple, and all `key` with the same last two digits will have the same output, making it easy to deduce a usable `key` from the hash value, thereby cracking the password.
## Design of hash algorithms
@@ -346,7 +346,57 @@ We know that the keys in a hash table can be of various data types such as integ
=== "Kotlin"
```kotlin title="built_in_hash.kt"
val num = 3
val hashNum = num.hashCode()
// Hash value of integer 3 is 3
val bol = true
val hashBol = bol.hashCode()
// Hash value of boolean true is 1231
val dec = 3.14159
val hashDec = dec.hashCode()
// Hash value of decimal 3.14159 is -1340954729
val str = "Hello 算法"
val hashStr = str.hashCode()
// Hash value of string "Hello 算法" is -727081396
val arr = arrayOf<Any>(12836, "小哈")
val hashTup = arr.hashCode()
// Hash value of array [12836, 小哈] is 189568618
val obj = ListNode(0)
val hashObj = obj.hashCode()
// Hash value of ListNode object utils.ListNode@1d81eb93 is 495053715
```
=== "Ruby"
```ruby title="built_in_hash.rb"
num = 3
hash_num = num.hash
# Hash value of integer 3 is -4385856518450339636
bol = true
hash_bol = bol.hash
# Hash value of boolean true is -1617938112149317027
dec = 3.14159
hash_dec = dec.hash
# Hash value of decimal 3.14159 is -1479186995943067893
str = "Hello 算法"
hash_str = str.hash
# Hash value of string "Hello 算法" is -4075943250025831763
tup = [12836, '小哈']
hash_tup = tup.hash
# Hash value of tuple (12836, '小哈') is 1999544809202288822
obj = ListNode.new(0)
hash_obj = obj.hash
# Hash value of ListNode object #<ListNode:0x000078133140ab70> is 4302940560806366381
```
=== "Zig"
@@ -355,7 +405,7 @@ We know that the keys in a hash table can be of various data types such as integ
```
??? pythontutor "Code Visualization"
??? pythontutor "Visualized Execution"
https://pythontutor.com/render.html#code=class%20ListNode%3A%0A%20%20%20%20%22%22%22%E9%93%BE%E8%A1%A8%E8%8A%82%E7%82%B9%E7%B1%BB%22%22%22%0A%20%20%20%20def%20__init__%28self,%20val%3A%20int%29%3A%0A%20%20%20%20%20%20%20%20self.val%3A%20int%20%3D%20val%20%20%23%20%E8%8A%82%E7%82%B9%E5%80%BC%0A%20%20%20%20%20%20%20%20self.next%3A%20ListNode%20%7C%20None%20%3D%20None%20%20%23%20%E5%90%8E%E7%BB%A7%E8%8A%82%E7%82%B9%E5%BC%95%E7%94%A8%0A%0A%22%22%22Driver%20Code%22%22%22%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20num%20%3D%203%0A%20%20%20%20hash_num%20%3D%20hash%28num%29%0A%20%20%20%20%23%20%E6%95%B4%E6%95%B0%203%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%203%0A%0A%20%20%20%20bol%20%3D%20True%0A%20%20%20%20hash_bol%20%3D%20hash%28bol%29%0A%20%20%20%20%23%20%E5%B8%83%E5%B0%94%E9%87%8F%20True%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%201%0A%0A%20%20%20%20dec%20%3D%203.14159%0A%20%20%20%20hash_dec%20%3D%20hash%28dec%29%0A%20%20%20%20%23%20%E5%B0%8F%E6%95%B0%203.14159%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%20326484311674566659%0A%0A%20%20%20%20str%20%3D%20%22Hello%20%E7%AE%97%E6%B3%95%22%0A%20%20%20%20hash_str%20%3D%20hash%28str%29%0A%20%20%20%20%23%20%E5%AD%97%E7%AC%A6%E4%B8%B2%E2%80%9CHello%20%E7%AE%97%E6%B3%95%E2%80%9D%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%204617003410720528961%0A%0A%20%20%20%20tup%20%3D%20%2812836,%20%22%E5%B0%8F%E5%93%88%22%29%0A%20%20%20%20hash_tup%20%3D%20hash%28tup%29%0A%20%20%20%20%23%20%E5%85%83%E7%BB%84%20%2812836,%20'%E5%B0%8F%E5%93%88'%29%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%201029005403108185979%0A%0A%20%20%20%20obj%20%3D%20ListNode%280%29%0A%20%20%20%20hash_obj%20%3D%20hash%28obj%29%0A%20%20%20%20%23%20%E8%8A%82%E7%82%B9%E5%AF%B9%E8%B1%A1%20%3CListNode%20object%20at%200x1058fd810%3E%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%20274267521&cumulative=false&curInstr=19&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false

View File

@@ -1,25 +1,25 @@
# Hash collision
The previous section mentioned that, **in most cases, the input space of a hash function is much larger than the output space**, so theoretically, hash collisions are inevitable. For example, if the input space is all integers and the output space is the size of the array capacity, then multiple integers will inevitably be mapped to the same bucket index.
The previous section mentioned that, **in most cases, the input space of a hash function is much larger than the output space**, so theoretically, hash collisions are inevitable. For example, if the input space is all integers and the output space is the array capacity size, then multiple integers will inevitably be mapped to the same bucket index.
Hash collisions can lead to incorrect query results, severely impacting the usability of the hash table. To address this issue, whenever a hash collision occurs, we perform hash table resizing until the collision disappears. This approach is pretty simple, straightforward, and working well. However, it appears to be pretty inefficient as the table expansion involves a lot of data migration as well as recalculation of hash code, which are expansive. To improve efficiency, we can adopt the following strategies:
Hash collisions can lead to incorrect query results, severely impacting the usability of the hash table. To address this issue, whenever a hash collision occurs, we can perform hash table expansion until the collision disappears. This approach is simple, straightforward, and effective, but it is very inefficient because hash table expansion involves a large amount of data migration and hash value recalculation. To improve efficiency, we can adopt the following strategies:
1. Improve the hash table data structure in a way that **locating target element is still functioning well in the event of a hash collision**.
2. Expansion is the last resort before it becomes necessary, when severe collisions are observed.
1. Improve the hash table data structure so that **the hash table can function normally when hash collisions occur**.
2. Only expand when necessary, that is, only when hash collisions are severe.
There are mainly two methods for improving the structure of hash tables: "Separate Chaining" and "Open Addressing".
The main methods for improving the structure of hash tables include "separate chaining" and "open addressing".
## Separate chaining
In the original hash table, each bucket can store only one key-value pair. <u>Separate chaining</u> converts a single element into a linked list, treating key-value pairs as list nodes, storing all colliding key-value pairs in the same linked list. The figure below shows an example of a hash table with separate chaining.
In the original hash table, each bucket can store only one key-value pair. <u>Separate chaining</u> converts a single element into a linked list, treating key-value pairs as linked list nodes and storing all colliding key-value pairs in the same linked list. The figure below shows an example of a separate chaining hash table.
![Separate chaining hash table](hash_collision.assets/hash_table_chaining.png)
The operations of a hash table implemented with separate chaining have changed as follows:
- **Querying Elements**: Input `key`, obtain the bucket index through the hash function, then access the head node of the linked list. Traverse the linked list and compare key to find the target key-value pair.
- **Adding Elements**: Access the head node of the linked list via the hash function, then append the node (key-value pair) to the list.
- **Deleting Elements**: Access the head of the linked list based on the result of the hash function, then traverse the linked list to find the target node and delete it.
- **Querying elements**: Input `key`, obtain the bucket index through the hash function, then access the head node of the linked list, then traverse the linked list and compare `key` to find the target key-value pair.
- **Adding elements**: First access the linked list head node through the hash function, then append the node (key-value pair) to the linked list.
- **Deleting elements**: Access the head of the linked list based on the result of the hash function, then traverse the linked list to find the target node and delete it.
Separate chaining has the following limitations:
@@ -28,8 +28,8 @@ Separate chaining has the following limitations:
The code below provides a simple implementation of a separate chaining hash table, with two things to note:
- Lists (dynamic arrays) are used instead of linked lists for simplicity. In this setup, the hash table (array) contains multiple buckets, each of which is a list.
- This implementation includes a hash table resizing method. When the load factor exceeds $\frac{2}{3}$, we expand the hash table to twice its original size.
- Lists (dynamic arrays) are used instead of linked lists to simplify the code. In this setup, the hash table (array) contains multiple buckets, each of which is a list.
- This implementation includes a hash table expansion method. When the load factor exceeds $\frac{2}{3}$, we expand the hash table to $2$ times its original size.
```src
[file]{hash_map_chaining}-[class]{hash_map_chaining}-[func]{}
@@ -39,34 +39,34 @@ It's worth noting that when the linked list is very long, the query efficiency $
## Open addressing
<u>Open addressing</u> does not introduce additional data structures but instead handles hash collisions through "multiple probing". The probing methods mainly include linear probing, quadratic probing, and double hashing.
<u>Open addressing</u> does not introduce additional data structures but instead handles hash collisions through "multiple probes". The probing methods mainly include linear probing, quadratic probing, and double hashing.
Let's use linear probing as an example to introduce the mechanism of open addressing hash tables.
### Linear probing
Linear probing uses a fixed-step linear search for probing, differing from ordinary hash tables.
Linear probing uses a fixed-step linear search for probing, and its operation method differs from ordinary hash tables.
- **Inserting Elements**: Calculate the bucket index using the hash function. If the bucket already contains an element, linearly traverse forward from the conflict position (usually with a step size of $1$) until an empty bucket is found, then insert the element.
- **Searching for Elements**: If a hash collision is encountered, use the same step size to linearly traverse forward until the corresponding element is found and return `value`; if an empty bucket is encountered, it means the target element is not in the hash table, so return `None`.
- **Inserting elements**: Calculate the bucket index using the hash function. If the bucket already contains an element, linearly traverse forward from the conflict position (usually with a step size of $1$) until an empty bucket is found, then insert the element.
- **Searching for elements**: If a hash collision is encountered, use the same step size to linearly traverse forward until the corresponding element is found and return `value`; if an empty bucket is encountered, it means the target element is not in the hash table, so return `None`.
The figure below shows the distribution of key-value pairs in an open addressing (linear probing) hash table. According to this hash function, keys with the same last two digits will be mapped to the same bucket. Through linear probing, they are stored sequentially in that bucket and the buckets below it.
![Distribution of key-value pairs in open addressing (linear probing) hash table](hash_collision.assets/hash_table_linear_probing.png)
However, **linear probing is prone to create "clustering"**. Specifically, the longer the continuously occupied positions in the array, the greater the probability of hash collisions occurring in these continuous positions, further promoting the growth of clustering at that position, forming a vicious cycle, and ultimately leading to degraded efficiency of insertion, deletion, query, and update operations.
However, **linear probing is prone to create "clustering"**. Specifically, the longer the continuously occupied positions in the array, the greater the probability of hash collisions occurring in these continuous positions, further promoting clustering growth at that position, forming a vicious cycle, and ultimately leading to degraded efficiency of insertion, deletion, query, and update operations.
It's important to note that **we cannot directly delete elements in an open addressing hash table**. Deleting an element creates an empty bucket `None` in the array. When searching for elements, if linear probing encounters this empty bucket, it will return, making the elements below this bucket inaccessible. The program may incorrectly assume these elements do not exist, as shown in the figure below.
It's important to note that **we cannot directly delete elements in an open addressing hash table**. Deleting an element creates an empty bucket `None` in the array. When searching for elements, if linear probing encounters this empty bucket, it will return, making the elements below this empty bucket inaccessible. The program may incorrectly assume these elements do not exist, as shown in the figure below.
![Query issues caused by deletion in open addressing](hash_collision.assets/hash_table_open_addressing_deletion.png)
To solve this problem, we can adopt the <u>lazy deletion</u> mechanism: instead of directly removing elements from the hash table, **use a constant `TOMBSTONE` to mark the bucket**. In this mechanism, both `None` and `TOMBSTONE` represent empty buckets and can hold key-value pairs. However, when linear probing encounters `TOMBSTONE`, it should continue traversing since there may still be key-value pairs below it.
However, **lazy deletion may accelerate the performance degradation of the hash table**. Every deletion operation produces a delete mark, and as `TOMBSTONE` increases, the search time will also increase because linear probing may need to skip multiple `TOMBSTONE` to find the target element.
However, **lazy deletion may accelerate the performance degradation of the hash table**. Every deletion operation produces a deletion mark, and as `TOMBSTONE` increases, the search time will also increase because linear probing may need to skip multiple `TOMBSTONE` to find the target element.
To address this, consider recording the index of the first encountered `TOMBSTONE` during linear probing and swapping the positions of the searched target element with that `TOMBSTONE`. The benefit of doing this is that each time an element is queried or added, the element will be moved to a bucket closer to its ideal position (the starting point of probing), thereby optimizing query efficiency.
To address this, consider recording the index of the first encountered `TOMBSTONE` during linear probing and swapping the searched target element with that `TOMBSTONE`. The benefit of doing this is that each time an element is queried or added, the element will be moved to a bucket closer to its ideal position (the starting point of probing), thereby optimizing query efficiency.
The code below implements an open addressing (linear probing) hash table with lazy deletion. To make better use of the hash table space, we treat the hash table as a "circular array,". When going beyond the end of the array, we return to the beginning and continue traversing.
The code below implements an open addressing (linear probing) hash table with lazy deletion. To make better use of the hash table space, we treat the hash table as a "circular array". When going beyond the end of the array, we return to the beginning and continue traversing.
```src
[file]{hash_map_open_addressing}-[class]{hash_map_open_addressing}-[func]{}
@@ -74,11 +74,11 @@ The code below implements an open addressing (linear probing) hash table with la
### Quadratic probing
Quadratic probing is similar to linear probing and is one of the common strategies of open addressing. When a collision occurs, quadratic probing does not simply skip a fixed number of steps but skips a number of steps equal to the "square of the number of probes", i.e., $1, 4, 9, \dots$ steps.
Quadratic probing is similar to linear probing and is one of the common strategies for open addressing. When a collision occurs, quadratic probing does not simply skip a fixed number of steps but skips a number of steps equal to the "square of the number of probes", i.e., $1, 4, 9, \dots$ steps.
Quadratic probing has the following advantages:
- Quadratic probing attempts to alleviate the clustering effect of linear probing by skipping the distance of the square of the number of probes.
- Quadratic probing attempts to alleviate the clustering effect of linear probing by skipping distances equal to the square of the probe count.
- Quadratic probing skips larger distances to find empty positions, which helps to distribute data more evenly.
However, quadratic probing is not perfect:
@@ -90,14 +90,14 @@ However, quadratic probing is not perfect:
As the name suggests, the double hashing method uses multiple hash functions $f_1(x)$, $f_2(x)$, $f_3(x)$, $\dots$ for probing.
- **Inserting Elements**: If hash function $f_1(x)$ encounters a conflict, it tries $f_2(x)$, and so on, until an empty position is found and the element is inserted.
- **Searching for Elements**: Search in the same order of hash functions until the target element is found and returned; if an empty position is encountered or all hash functions have been tried, it indicates the element is not in the hash table, then return `None`.
- **Inserting elements**: If hash function $f_1(x)$ encounters a conflict, try $f_2(x)$, and so on, until an empty position is found and the element is inserted.
- **Searching for elements**: Search in the same order of hash functions until the target element is found and return it; if an empty position is encountered or all hash functions have been tried, it indicates the element is not in the hash table, then return `None`.
Compared to linear probing, the double hashing method is less prone to clustering, but multiple hash functions introduce additional computational overhead.
!!! tip
Please note that open addressing (linear probing, quadratic probing, and double hashing) hash tables all have the problem of "can not directly delete elements."
Please note that open addressing (linear probing, quadratic probing, and double hashing) hash tables all have the problem of "cannot directly delete elements".
## Choice of programming languages
@@ -105,4 +105,4 @@ Different programming languages adopt different hash table implementation strate
- Python uses open addressing. The `dict` dictionary uses pseudo-random numbers for probing.
- Java uses separate chaining. Since JDK 1.8, when the array length in `HashMap` reaches 64 and the length of a linked list reaches 8, the linked list is converted to a red-black tree to improve search performance.
- Go uses separate chaining. Go stipulates that each bucket can store up to 8 key-value pairs, and if the capacity is exceeded, an overflow bucket is linked; when there are too many overflow buckets, a special equal-capacity resizing operation is performed to ensure performance.
- Go uses separate chaining. Go stipulates that each bucket can store up to 8 key-value pairs, and if the capacity is exceeded, an overflow bucket is linked; when there are too many overflow buckets, a special equal-capacity expansion operation is performed to ensure performance.

View File

@@ -1,30 +1,30 @@
# Hash table
A <u>hash table</u>, also known as a <u>hash map</u>, is a data structure that establishes a mapping between keys and values, enabling efficient element retrieval. Specifically, when we input a `key` into the hash table, we can retrieve the corresponding `value` in $O(1)$ time complexity.
A <u>hash table</u>, also known as a <u>hash map</u>, establishes a mapping between keys `key` and values `value`, enabling efficient element retrieval. Specifically, when we input a key `key` into a hash table, we can retrieve the corresponding value `value` in $O(1)$ time.
As shown in the figure below, given $n$ students, each student has two data fields: "Name" and "Student ID". If we want to implement a query function that takes a student ID as input and returns the corresponding name, we can use the hash table shown in the figure below.
As shown in the figure below, given $n$ students, each with two pieces of data: "name" and "student ID". If we want to implement a query function that "inputs a student ID and returns the corresponding name", we can use the hash table shown below.
![Abstract representation of a hash table](hash_map.assets/hash_table_lookup.png)
In addition to hash tables, arrays and linked lists can also be used to implement query functionality, but the time complexity is different. Their efficiency is compared in the table below:
In addition to hash tables, arrays and linked lists can also implement query functionality. Their efficiency comparison is shown in the following table.
- **Inserting an element**: Simply append the element to the tail of the array (or linked list). The time complexity of this operation is $O(1)$.
- **Searching for an element**: As the array (or linked list) is unsorted, searching for an element requires traversing through all of the elements. The time complexity of this operation is $O(n)$.
- **Deleting an element**: To remove an element, we first need to locate it. Then, we delete it from the array (or linked list). The time complexity of this operation is $O(n)$.
- **Adding elements**: Simply add elements to the end of the array (linked list), using $O(1)$ time.
- **Querying elements**: Since the array (linked list) is unordered, all elements need to be traversed, using $O(n)$ time.
- **Deleting elements**: The element must first be located, then deleted from the array (linked list), using $O(n)$ time.
<p align="center"> Table <id> &nbsp; Comparison of time efficiency for common operations </p>
<p align="center"> Table <id> &nbsp; Comparison of element query efficiency </p>
| | Array | Linked List | Hash Table |
| -------------- | ------ | ----------- | ---------- |
| Search Elements | $O(n)$ | $O(n)$ | $O(1)$ |
| Insert Elements | $O(1)$ | $O(1)$ | $O(1)$ |
| Delete Elements | $O(n)$ | $O(n)$ | $O(1)$ |
| | Array | Linked List | Hash Table |
| --------------- | ------ | ----------- | ---------- |
| Find element | $O(n)$ | $O(n)$ | $O(1)$ |
| Add element | $O(1)$ | $O(1)$ | $O(1)$ |
| Delete element | $O(n)$ | $O(n)$ | $O(1)$ |
As observed, **the time complexity for operations (insertion, deletion, searching, and modification) in a hash table is $O(1)$**, which is highly efficient.
As observed, **the time complexity for insertion, deletion, search, and modification operations in a hash table is $O(1)$**, which is very efficient.
## Common operations of hash table
## Common hash table operations
Common operations of a hash table include: initialization, querying, adding key-value pairs, and deleting key-value pairs. Here is an example code:
Common operations on hash tables include: initialization, query operations, adding key-value pairs, and deleting key-value pairs. Example code is as follows:
=== "Python"
@@ -33,15 +33,15 @@ Common operations of a hash table include: initialization, querying, adding key-
hmap: dict = {}
# Add operation
# Add key-value pair (key, value) to the hash table
hmap[12836] = "Xiao Ha"
hmap[15937] = "Xiao Luo"
hmap[16750] = "Xiao Suan"
hmap[13276] = "Xiao Fa"
hmap[10583] = "Xiao Ya"
# Add key-value pair (key, value) to hash table
hmap[12836] = "小哈"
hmap[15937] = "小啰"
hmap[16750] = "小算"
hmap[13276] = "小法"
hmap[10583] = "小鸭"
# Query operation
# Input key into hash table, get value
# Input key into hash table to get value
name: str = hmap[15937]
# Delete operation
@@ -57,14 +57,14 @@ Common operations of a hash table include: initialization, querying, adding key-
/* Add operation */
// Add key-value pair (key, value) to hash table
map[12836] = "Xiao Ha";
map[15937] = "Xiao Luo";
map[16750] = "Xiao Suan";
map[13276] = "Xiao Fa";
map[10583] = "Xiao Ya";
map[12836] = "小哈";
map[15937] = "小啰";
map[16750] = "小算";
map[13276] = "小法";
map[10583] = "小鸭";
/* Query operation */
// Input key into hash table, get value
// Input key into hash table to get value
string name = map[15937];
/* Delete operation */
@@ -80,14 +80,14 @@ Common operations of a hash table include: initialization, querying, adding key-
/* Add operation */
// Add key-value pair (key, value) to hash table
map.put(12836, "Xiao Ha");
map.put(15937, "Xiao Luo");
map.put(16750, "Xiao Suan");
map.put(13276, "Xiao Fa");
map.put(10583, "Xiao Ya");
map.put(12836, "小哈");
map.put(15937, "小啰");
map.put(16750, "小算");
map.put(13276, "小法");
map.put(10583, "小鸭");
/* Query operation */
// Input key into hash table, get value
// Input key into hash table to get value
String name = map.get(15937);
/* Delete operation */
@@ -102,15 +102,15 @@ Common operations of a hash table include: initialization, querying, adding key-
Dictionary<int, string> map = new() {
/* Add operation */
// Add key-value pair (key, value) to hash table
{ 12836, "Xiao Ha" },
{ 15937, "Xiao Luo" },
{ 16750, "Xiao Suan" },
{ 13276, "Xiao Fa" },
{ 10583, "Xiao Ya" }
{ 12836, "小哈" },
{ 15937, "小啰" },
{ 16750, "小算" },
{ 13276, "小法" },
{ 10583, "小鸭" }
};
/* Query operation */
// Input key into hash table, get value
// Input key into hash table to get value
string name = map[15937];
/* Delete operation */
@@ -126,14 +126,14 @@ Common operations of a hash table include: initialization, querying, adding key-
/* Add operation */
// Add key-value pair (key, value) to hash table
hmap[12836] = "Xiao Ha"
hmap[15937] = "Xiao Luo"
hmap[16750] = "Xiao Suan"
hmap[13276] = "Xiao Fa"
hmap[10583] = "Xiao Ya"
hmap[12836] = "小哈"
hmap[15937] = "小啰"
hmap[16750] = "小算"
hmap[13276] = "小法"
hmap[10583] = "小鸭"
/* Query operation */
// Input key into hash table, get value
// Input key into hash table to get value
name := hmap[15937]
/* Delete operation */
@@ -149,14 +149,14 @@ Common operations of a hash table include: initialization, querying, adding key-
/* Add operation */
// Add key-value pair (key, value) to hash table
map[12836] = "Xiao Ha"
map[15937] = "Xiao Luo"
map[16750] = "Xiao Suan"
map[13276] = "Xiao Fa"
map[10583] = "Xiao Ya"
map[12836] = "小哈"
map[15937] = "小啰"
map[16750] = "小算"
map[13276] = "小法"
map[10583] = "小鸭"
/* Query operation */
// Input key into hash table, get value
// Input key into hash table to get value
let name = map[15937]!
/* Delete operation */
@@ -170,15 +170,15 @@ Common operations of a hash table include: initialization, querying, adding key-
/* Initialize hash table */
const map = new Map();
/* Add operation */
// Add key-value pair (key, value) to the hash table
map.set(12836, 'Xiao Ha');
map.set(15937, 'Xiao Luo');
map.set(16750, 'Xiao Suan');
map.set(13276, 'Xiao Fa');
map.set(10583, 'Xiao Ya');
// Add key-value pair (key, value) to hash table
map.set(12836, '小哈');
map.set(15937, '小啰');
map.set(16750, '小算');
map.set(13276, '小法');
map.set(10583, '小鸭');
/* Query operation */
// Input key into hash table, get value
// Input key into hash table to get value
let name = map.get(15937);
/* Delete operation */
@@ -193,23 +193,23 @@ Common operations of a hash table include: initialization, querying, adding key-
const map = new Map<number, string>();
/* Add operation */
// Add key-value pair (key, value) to hash table
map.set(12836, 'Xiao Ha');
map.set(15937, 'Xiao Luo');
map.set(16750, 'Xiao Suan');
map.set(13276, 'Xiao Fa');
map.set(10583, 'Xiao Ya');
console.info('\nAfter adding, the hash table is\nKey -> Value');
map.set(12836, '小哈');
map.set(15937, '小啰');
map.set(16750, '小算');
map.set(13276, '小法');
map.set(10583, '小鸭');
console.info('\nAfter adding, hash table is\nKey -> Value');
console.info(map);
/* Query operation */
// Input key into hash table, get value
// Input key into hash table to get value
let name = map.get(15937);
console.info('\nInput student number 15937, query name ' + name);
console.info('\nInput student ID 15937, queried name ' + name);
/* Delete operation */
// Delete key-value pair (key, value) from hash table
map.delete(10583);
console.info('\nAfter deleting 10583, the hash table is\nKey -> Value');
console.info('\nAfter deleting 10583, hash table is\nKey -> Value');
console.info(map);
```
@@ -221,14 +221,14 @@ Common operations of a hash table include: initialization, querying, adding key-
/* Add operation */
// Add key-value pair (key, value) to hash table
map[12836] = "Xiao Ha";
map[15937] = "Xiao Luo";
map[16750] = "Xiao Suan";
map[13276] = "Xiao Fa";
map[10583] = "Xiao Ya";
map[12836] = "小哈";
map[15937] = "小啰";
map[16750] = "小算";
map[13276] = "小法";
map[10583] = "小鸭";
/* Query operation */
// Input key into hash table, get value
// Input key into hash table to get value
String name = map[15937];
/* Delete operation */
@@ -246,14 +246,14 @@ Common operations of a hash table include: initialization, querying, adding key-
/* Add operation */
// Add key-value pair (key, value) to hash table
map.insert(12836, "Xiao Ha".to_string());
map.insert(15937, "Xiao Luo".to_string());
map.insert(16750, "Xiao Suan".to_string());
map.insert(13279, "Xiao Fa".to_string());
map.insert(10583, "Xiao Ya".to_string());
map.insert(12836, "小哈".to_string());
map.insert(15937, "小啰".to_string());
map.insert(16750, "小算".to_string());
map.insert(13279, "小法".to_string());
map.insert(10583, "小鸭".to_string());
/* Query operation */
// Input key into hash table, get value
// Input key into hash table to get value
let _name: Option<&String> = map.get(&15937);
/* Delete operation */
@@ -270,7 +270,47 @@ Common operations of a hash table include: initialization, querying, adding key-
=== "Kotlin"
```kotlin title="hash_map.kt"
/* Initialize hash table */
val map = HashMap<Int,String>()
/* Add operation */
// Add key-value pair (key, value) to hash table
map[12836] = "小哈"
map[15937] = "小啰"
map[16750] = "小算"
map[13276] = "小法"
map[10583] = "小鸭"
/* Query operation */
// Input key into hash table to get value
val name = map[15937]
/* Delete operation */
// Delete key-value pair (key, value) from hash table
map.remove(10583)
```
=== "Ruby"
```ruby title="hash_map.rb"
# Initialize hash table
hmap = {}
# Add operation
# Add key-value pair (key, value) to hash table
hmap[12836] = "小哈"
hmap[15937] = "小啰"
hmap[16750] = "小算"
hmap[13276] = "小法"
hmap[10583] = "小鸭"
# Query operation
# Input key into hash table to get value
name = hmap[15937]
# Delete operation
# Delete key-value pair (key, value) from hash table
hmap.delete(10583)
```
=== "Zig"
@@ -279,11 +319,11 @@ Common operations of a hash table include: initialization, querying, adding key-
```
??? pythontutor "Code Visualization"
??? pythontutor "Visualized Execution"
https://pythontutor.com/render.html#code=%22%22%22Driver%20Code%22%22%22%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20%23%20%E5%88%9D%E5%A7%8B%E5%8C%96%E5%93%88%E5%B8%8C%E8%A1%A8%0A%20%20%20%20hmap%20%3D%20%7B%7D%0A%20%20%20%20%0A%20%20%20%20%23%20%E6%B7%BB%E5%8A%A0%E6%93%8D%E4%BD%9C%0A%20%20%20%20%23%20%E5%9C%A8%E5%93%88%E5%B8%8C%E8%A1%A8%E4%B8%AD%E6%B7%BB%E5%8A%A0%E9%94%AE%E5%80%BC%E5%AF%B9%20%28key,%20value%29%0A%20%20%20%20hmap%5B12836%5D%20%3D%20%22%E5%B0%8F%E5%93%88%22%0A%20%20%20%20hmap%5B15937%5D%20%3D%20%22%E5%B0%8F%E5%95%B0%22%0A%20%20%20%20hmap%5B16750%5D%20%3D%20%22%E5%B0%8F%E7%AE%97%22%0A%20%20%20%20hmap%5B13276%5D%20%3D%20%22%E5%B0%8F%E6%B3%95%22%0A%20%20%20%20hmap%5B10583%5D%20%3D%20%22%E5%B0%8F%E9%B8%AD%22%0A%20%20%20%20%0A%20%20%20%20%23%20%E6%9F%A5%E8%AF%A2%E6%93%8D%E4%BD%9C%0A%20%20%20%20%23%20%E5%90%91%E5%93%88%E5%B8%8C%E8%A1%A8%E4%B8%AD%E8%BE%93%E5%85%A5%E9%94%AE%20key%20%EF%BC%8C%E5%BE%97%E5%88%B0%E5%80%BC%20value%0A%20%20%20%20name%20%3D%20hmap%5B15937%5D%0A%20%20%20%20%0A%20%20%20%20%23%20%E5%88%A0%E9%99%A4%E6%93%8D%E4%BD%9C%0A%20%20%20%20%23%20%E5%9C%A8%E5%93%88%E5%B8%8C%E8%A1%A8%E4%B8%AD%E5%88%A0%E9%99%A4%E9%94%AE%E5%80%BC%E5%AF%B9%20%28key,%20value%29%0A%20%20%20%20hmap.pop%2810583%29&cumulative=false&curInstr=2&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false
There are three common ways to traverse a hash table: traversing key-value pairs, traversing keys, and traversing values. Here is an example code:
There are three common ways to traverse a hash table: traversing key-value pairs, traversing keys, and traversing values. Example code is as follows:
=== "Python"
@@ -428,17 +468,17 @@ There are three common ways to traverse a hash table: traversing key-value pairs
/* Traverse hash table */
// Traverse key-value pairs Key->Value
map.forEach((key, value) {
print('$key -> $value');
print('$key -> $value');
});
// Traverse keys only Key
// Traverse keys only
map.keys.forEach((key) {
print(key);
print(key);
});
// Traverse values only Value
// Traverse values only
map.values.forEach((value) {
print(value);
print(value);
});
```
@@ -451,12 +491,12 @@ There are three common ways to traverse a hash table: traversing key-value pairs
println!("{key} -> {value}");
}
// Traverse keys only Key
// Traverse keys only
for key in map.keys() {
println!("{key}");
println!("{key}");
}
// Traverse values only Value
// Traverse values only
for value in map.values() {
println!("{value}");
}
@@ -471,41 +511,67 @@ There are three common ways to traverse a hash table: traversing key-value pairs
=== "Kotlin"
```kotlin title="hash_map.kt"
/* Traverse hash table */
// Traverse key-value pairs key->value
for ((key, value) in map) {
println("$key -> $value")
}
// Traverse keys only
for (key in map.keys) {
println(key)
}
// Traverse values only
for (_val in map.values) {
println(_val)
}
```
=== "Ruby"
```ruby title="hash_map.rb"
# Traverse hash table
# Traverse key-value pairs key->value
hmap.entries.each { |key, value| puts "#{key} -> #{value}" }
# Traverse keys only
hmap.keys.each { |key| puts key }
# Traverse values only
hmap.values.each { |val| puts val }
```
=== "Zig"
```zig title="hash_map.zig"
// Zig example is not provided
```
??? pythontutor "Code Visualization"
??? pythontutor "Visualized Execution"
https://pythontutor.com/render.html#code=%22%22%22Driver%20Code%22%22%22%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20%23%20%E5%88%9D%E5%A7%8B%E5%8C%96%E5%93%88%E5%B8%8C%E8%A1%A8%0A%20%20%20%20hmap%20%3D%20%7B%7D%0A%20%20%20%20%0A%20%20%20%20%23%20%E6%B7%BB%E5%8A%A0%E6%93%8D%E4%BD%9C%0A%20%20%20%20%23%20%E5%9C%A8%E5%93%88%E5%B8%8C%E8%A1%A8%E4%B8%AD%E6%B7%BB%E5%8A%A0%E9%94%AE%E5%80%BC%E5%AF%B9%20%28key,%20value%29%0A%20%20%20%20hmap%5B12836%5D%20%3D%20%22%E5%B0%8F%E5%93%88%22%0A%20%20%20%20hmap%5B15937%5D%20%3D%20%22%E5%B0%8F%E5%95%B0%22%0A%20%20%20%20hmap%5B16750%5D%20%3D%20%22%E5%B0%8F%E7%AE%97%22%0A%20%20%20%20hmap%5B13276%5D%20%3D%20%22%E5%B0%8F%E6%B3%95%22%0A%20%20%20%20hmap%5B10583%5D%20%3D%20%22%E5%B0%8F%E9%B8%AD%22%0A%20%20%20%20%0A%20%20%20%20%23%20%E9%81%8D%E5%8E%86%E5%93%88%E5%B8%8C%E8%A1%A8%0A%20%20%20%20%23%20%E9%81%8D%E5%8E%86%E9%94%AE%E5%80%BC%E5%AF%B9%20key-%3Evalue%0A%20%20%20%20for%20key,%20value%20in%20hmap.items%28%29%3A%0A%20%20%20%20%20%20%20%20print%28key,%20%22-%3E%22,%20value%29%0A%20%20%20%20%23%20%E5%8D%95%E7%8B%AC%E9%81%8D%E5%8E%86%E9%94%AE%20key%0A%20%20%20%20for%20key%20in%20hmap.keys%28%29%3A%0A%20%20%20%20%20%20%20%20print%28key%29%0A%20%20%20%20%23%20%E5%8D%95%E7%8B%AC%E9%81%8D%E5%8E%86%E5%80%BC%20value%0A%20%20%20%20for%20value%20in%20hmap.values%28%29%3A%0A%20%20%20%20%20%20%20%20print%28value%29&cumulative=false&curInstr=8&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false
## Simple implementation of a hash table
## Simple hash table implementation
First, let's consider the simplest case: **implementing a hash table using only one array**. In the hash table, each empty slot in the array is called a <u>bucket</u>, and each bucket can store a key-value pair. Therefore, the query operation involves finding the bucket corresponding to the `key` and retrieving the `value` from it.
Let's first consider the simplest case: **implementing a hash table using only an array**. In a hash table, each empty position in the array is called a <u>bucket</u>, and each bucket can store a key-value pair. Therefore, the query operation is to find the bucket corresponding to `key` and retrieve the `value` from the bucket.
So, how do we locate the corresponding bucket based on the `key`? This is achieved through a <u>hash function</u>. The role of the hash function is to map a larger input space to a smaller output space. In a hash table, the input space consists of all the keys, and the output space consists of all the buckets (array indices). In other words, given a `key`, **we can use the hash function to determine the storage location of the corresponding key-value pair in the array**.
So how do we locate the corresponding bucket based on `key`? This is achieved through a <u>hash function</u>. The role of the hash function is to map a larger input space to a smaller output space. In a hash table, the input space is all `key`s, and the output space is all buckets (array indices). In other words, given a `key`, **we can use the hash function to obtain the storage location of the key-value pair corresponding to that `key` in the array**.
With a given `key`, the calculation of the hash function consists of two steps:
When inputting a `key`, the hash function's calculation process consists of the following two steps:
1. Calculate the hash value by using a certain hash algorithm `hash()`.
2. Take the modulus of the hash value with the bucket count (array length) `capacity` to obtain the array `index` corresponding to the key.
1. Calculate the hash value through a hash algorithm `hash()`.
2. Take the modulo of the hash value by the number of buckets (array length) `capacity` to obtain the bucket (array index) `index` corresponding to that `key`.
```shell
index = hash(key) % capacity
```
Afterward, we can use the `index` to access the corresponding bucket in the hash table and thereby retrieve the `value`.
Subsequently, we can use `index` to access the corresponding bucket in the hash table and retrieve the `value`.
Let's assume that the array length is `capacity = 100`, and the hash algorithm is defined as `hash(key) = key`. Therefore, the hash function can be expressed as `key % 100`. The following figure illustrates the working principle of the hash function using `key` as student ID and `value` as name.
Assuming the array length is `capacity = 100` and the hash algorithm is `hash(key) = key`, the hash function becomes `key % 100`. The figure below shows the working principle of the hash function using `key` as student ID and `value` as name.
![Working principle of hash function](hash_map.assets/hash_function.png)
The following code implements a simple hash table. Here, we encapsulate `key` and `value` into a class `Pair` to represent the key-value pair.
The following code implements a simple hash table. Here, we encapsulate `key` and `value` into a class `Pair` to represent a key-value pair.
```src
[file]{array_hash_map}-[class]{array_hash_map}-[func]{}
@@ -513,25 +579,25 @@ The following code implements a simple hash table. Here, we encapsulate `key` an
## Hash collision and resizing
Essentially, the role of the hash function is to map the entire input space of all keys to the output space of all array indices. However, the input space is often much larger than the output space. Therefore, **theoretically, there will always be cases where "multiple inputs correspond to the same output"**.
Fundamentally, the role of a hash function is to map the input space consisting of all `key`s to the output space consisting of all array indices, and the input space is often much larger than the output space. Therefore, **theoretically there must be cases where "multiple inputs correspond to the same output"**.
In the example above, with the given hash function, when the last two digits of the input `key` are the same, the hash function produces the same output. For instance, when querying two students with student IDs 12836 and 20336, we find:
For the hash function in the above example, when the input `key`s have the same last two digits, the hash function produces the same output. For example, when querying two students with IDs 12836 and 20336, we get:
```shell
12836 % 100 = 36
20336 % 100 = 36
```
As shown in the figure below, both student IDs point to the same name, which is obviously incorrect. This situation where multiple inputs correspond to the same output is called <u>hash collision</u>.
As shown in the figure below, two student IDs point to the same name, which is obviously incorrect. We call this situation where multiple inputs correspond to the same output a <u>hash collision</u>.
![Example of hash collision](hash_map.assets/hash_collision.png)
![Hash collision example](hash_map.assets/hash_collision.png)
It is easy to understand that as the capacity $n$ of the hash table increases, the probability of multiple keys being assigned to the same bucket decreases, resulting in fewer collisions. Therefore, **we can reduce hash collisions by resizing the hash table**.
It's easy to see that the larger the hash table capacity $n$, the lower the probability that multiple `key`s will be assigned to the same bucket, and the fewer collisions. Therefore, **we can reduce hash collisions by expanding the hash table**.
As shown in the figure below, before resizing, the key-value pairs `(136, A)` and `(236, D)` collide. However, after resizing, the collision is resolved.
As shown in the figure below, before expansion, the key-value pairs `(136, A)` and `(236, D)` collided, but after expansion, the collision disappears.
![Hash table resizing](hash_map.assets/hash_table_reshash.png)
Similar to array expansion, resizing a hash table requires migrating all key-value pairs from the original hash table to the new one, which is time-consuming. Furthermore, since the `capacity` of the hash table changes, we need to recalculate the storage positions of all key-value pairs using the hash function, further increasing the computational overhead of the resizing process. Therefore, programming languages often allocate a sufficiently large capacity for the hash table to prevent frequent resizing.
Similar to array expansion, hash table expansion requires migrating all key-value pairs from the original hash table to the new hash table, which is very time-consuming. Moreover, since the hash table capacity `capacity` changes, we need to recalculate the storage locations of all key-value pairs through the hash function, further increasing the computational overhead of the expansion process. For this reason, programming languages typically reserve a sufficiently large hash table capacity to prevent frequent expansion.
The <u>load factor</u> is an important concept in hash tables. It is defined as the ratio of the number of elements in the hash table to the number of buckets. It is used to measure the severity of hash collisions and **often serves as a trigger for hash table resizing**. For example, in Java, when the load factor exceeds $0.75$, the system will resize the hash table to twice its original size.
The <u>load factor</u> is an important concept for hash tables. It is defined as the number of elements in the hash table divided by the number of buckets, and is used to measure the severity of hash collisions. **It is also commonly used as a trigger condition for hash table expansion**. For example, in Java, when the load factor exceeds $0.75$, the system will expand the hash table to $2$ times its original size.

View File

@@ -1,9 +1,9 @@
# Hash table
# Hashing
![Hash table](../assets/covers/chapter_hashing.jpg)
![Hashing](../assets/covers/chapter_hashing.jpg)
!!! abstract
In the world of computing, a hash table is akin to an intelligent librarian.
It understands how to compute index numbers, enabling swift retrieval of the desired book.
In the world of computing, a hash table is like a clever librarian.
They know how to calculate call numbers, enabling them to quickly locate the target book.

View File

@@ -6,13 +6,13 @@
- Common hash table operations include querying, adding key-value pairs, deleting key-value pairs, and traversing the hash table.
- The hash function maps a `key` to an array index, allowing access to the corresponding bucket and retrieval of the `value`.
- Two different keys may end up with the same array index after hashing, leading to erroneous query results. This phenomenon is known as hash collision.
- The larger the capacity of the hash table, the lower the probability of hash collisions. Therefore, hash table resizing can mitigate hash collisions. Similar to array resizing, hash table resizing is costly.
- The load factor, defined as the number of elements divided by the number of buckets, reflects the severity of hash collisions and is often used as a condition to trigger hash table resizing.
- Chaining addresses hash collisions by converting each element into a linked list, storing all colliding elements in the same list. However, excessively long lists can reduce query efficiency, which can be improved by converting the lists into red-black trees.
- Open addressing handles hash collisions through multiple probes. Linear probing uses a fixed step size but it cannot delete elements and is prone to clustering. Multiple hashing uses several hash functions for probing which reduces clustering compared to linear probing but increases computational overhead.
- Different programming languages adopt various hash table implementations. For example, Java's `HashMap` uses chaining, while Python's `dict` employs open addressing.
- The larger the capacity of the hash table, the lower the probability of hash collisions. Therefore, hash table expansion can mitigate hash collisions. Similar to array expansion, hash table expansion is costly.
- The load factor, defined as the number of elements divided by the number of buckets, reflects the severity of hash collisions and is often used as a condition to trigger hash table expansion.
- Separate chaining addresses hash collisions by converting each element into a linked list, storing all colliding elements in the same linked list. However, excessively long linked lists can reduce query efficiency, which can be improved by converting the linked lists into red-black trees.
- Open addressing handles hash collisions through multiple probing. Linear probing uses a fixed step size but cannot delete elements and is prone to clustering. Double hashing uses multiple hash functions for probing, which reduces clustering compared to linear probing but increases computational overhead.
- Different programming languages adopt various hash table implementations. For example, Java's `HashMap` uses separate chaining, while Python's `dict` employs open addressing.
- In hash tables, we desire hash algorithms with determinism, high efficiency, and uniform distribution. In cryptography, hash algorithms should also possess collision resistance and the avalanche effect.
- Hash algorithms typically use large prime numbers as moduli to ensure uniform distribution of hash values and reduce hash collisions.
- Hash algorithms typically use large prime numbers as moduli to maximize the uniform distribution of hash values and reduce hash collisions.
- Common hash algorithms include MD5, SHA-1, SHA-2, and SHA-3. MD5 is often used for file integrity checks, while SHA-2 is commonly used in secure applications and protocols.
- Programming languages usually provide built-in hash algorithms for data types to calculate bucket indices in hash tables. Generally, only immutable objects are hashable.
@@ -32,16 +32,16 @@ Firstly, hash tables have higher time efficiency but lower space efficiency. A s
Secondly, hash tables are only more time-efficient in specific use cases. If a feature can be implemented with the same time complexity using an array or a linked list, it's usually faster than using a hash table. This is because the computation of the hash function incurs overhead, making the constant factor in the time complexity larger.
Lastly, the time complexity of hash tables can degrade. For example, in chaining, we perform search operations in a linked list or red-black tree, which still risks degrading to $O(n)$ time.
Lastly, the time complexity of hash tables can degrade. For example, in separate chaining, we perform search operations in a linked list or red-black tree, which still risks degrading to $O(n)$ time.
**Q**: Does multiple hashing also have the flaw of not being able to delete elements directly? Can space marked as deleted be reused?
**Q**: Does double hashing also have the flaw of not being able to delete elements directly? Can space marked as deleted be reused?
Multiple hashing is a form of open addressing, and all open addressing methods have the drawback of not being able to delete elements directly; they require marking elements as deleted. Marked spaces can be reused. When inserting new elements into the hash table, and the hash function points to a position marked as deleted, that position can be used by the new element. This maintains the probing sequence of the hash table while ensuring efficient use of space.
Double hashing is a form of open addressing, and all open addressing methods have the drawback of not being able to delete elements directly; they require marking elements as deleted. Marked spaces can be reused. When inserting new elements into the hash table, and the hash function points to a position marked as deleted, that position can be used by the new element. This maintains the probing sequence of the hash table while ensuring efficient use of space.
**Q**: Why do hash collisions occur during the search process in linear probing?
During the search process, the hash function points to the corresponding bucket and key-value pair. If the `key` doesn't match, it indicates a hash collision. Therefore, linear probing will search downwards at a predetermined step size until the correct key-value pair is found or the search fails.
During the search process, the hash function points to the corresponding bucket and key-value pair. If the `key` doesn't match, it indicates a hash collision. Therefore, linear probing will search downward at a predetermined step size until the correct key-value pair is found or the search fails.
**Q**: Why can resizing a hash table alleviate hash collisions?
**Q**: Why can expanding a hash table alleviate hash collisions?
The last step of a hash function often involves taking the modulo of the array length $n$, to keep the output within the array index range. When resizing, the array length $n$ changes, and the indices corresponding to the keys may also change. Keys that were previously mapped to the same bucket might be distributed across multiple buckets after resizing, thereby mitigating hash collisions.
The last step of a hash function often involves taking the modulo of the array length $n$, to keep the output within the array index range. When expanding, the array length $n$ changes, and the indices corresponding to the keys may also change. Keys that were previously mapped to the same bucket might be distributed across multiple buckets after expansion, thereby mitigating hash collisions.