Revisit the English version (#1835)

* Review the English version using Claude-4.5.

* Update mkdocs.yml

* Align the section titles.

* Bug fixes
This commit is contained in:
Yudong Jin
2025-12-30 17:54:01 +08:00
committed by GitHub
parent 091afd38b4
commit 45e1295241
106 changed files with 4195 additions and 3398 deletions

View File

@@ -1,12 +1,12 @@
# Hash algorithms
# Hash algorithm
The previous two sections introduced the working principle of hash tables and the methods to handle hash collisions. However, both open addressing and chaining can **only ensure that the hash table functions normally when collisions occur, but cannot reduce the frequency of hash collisions**.
The previous two sections introduced the working principle of hash tables and the methods to handle hash collisions. However, both open addressing and separate chaining **can only ensure that the hash table functions normally when hash collisions occur, but cannot reduce the frequency of hash collisions**.
If hash collisions occur too frequently, the performance of the hash table will deteriorate drastically. As shown in the figure below, for a chaining hash table, in the ideal case, the key-value pairs are evenly distributed across the buckets, achieving optimal query efficiency; in the worst case, all key-value pairs are stored in the same bucket, degrading the time complexity to $O(n)$.
If hash collisions occur too frequently, the performance of the hash table will deteriorate drastically. As shown in the figure below, for a separate chaining hash table, in the ideal case, the key-value pairs are evenly distributed across the buckets, achieving optimal query efficiency; in the worst case, all key-value pairs are stored in the same bucket, degrading the time complexity to $O(n)$.
![Ideal and worst cases of hash collisions](hash_algorithm.assets/hash_collision_best_worst_condition.png)
**The distribution of key-value pairs is determined by the hash function**. Recalling the steps of calculating a hash function, first compute the hash value, then modulo it by the array length:
**The distribution of key-value pairs is determined by the hash function**. Recalling the calculation steps of the hash function, first compute the hash value, then take the modulo by the array length:
```shell
index = hash(key) % capacity
@@ -35,7 +35,7 @@ For cryptographic applications, to prevent reverse engineering such as deducing
- **Collision resistance**: It should be extremely difficult to find two different inputs that produce the same hash value.
- **Avalanche effect**: Minor changes in the input should lead to significant and unpredictable changes in the output.
Note that **"Uniform Distribution" and "Collision Resistance" are two separate concepts**. Satisfying uniform distribution does not necessarily mean collision resistance. For example, under random input `key`, the hash function `key % 100` can produce a uniformly distributed output. However, this hash algorithm is too simple, and all `key` with the same last two digits will have the same output, making it easy to deduce a usable `key` from the hash value, thereby cracking the password.
Note that **"uniform distribution" and "collision resistance" are two independent concepts**. Satisfying uniform distribution does not necessarily mean collision resistance. For example, under random input `key`, the hash function `key % 100` can produce a uniformly distributed output. However, this hash algorithm is too simple, and all `key` with the same last two digits will have the same output, making it easy to deduce a usable `key` from the hash value, thereby cracking the password.
## Design of hash algorithms
@@ -346,7 +346,57 @@ We know that the keys in a hash table can be of various data types such as integ
=== "Kotlin"
```kotlin title="built_in_hash.kt"
val num = 3
val hashNum = num.hashCode()
// Hash value of integer 3 is 3
val bol = true
val hashBol = bol.hashCode()
// Hash value of boolean true is 1231
val dec = 3.14159
val hashDec = dec.hashCode()
// Hash value of decimal 3.14159 is -1340954729
val str = "Hello 算法"
val hashStr = str.hashCode()
// Hash value of string "Hello 算法" is -727081396
val arr = arrayOf<Any>(12836, "小哈")
val hashTup = arr.hashCode()
// Hash value of array [12836, 小哈] is 189568618
val obj = ListNode(0)
val hashObj = obj.hashCode()
// Hash value of ListNode object utils.ListNode@1d81eb93 is 495053715
```
=== "Ruby"
```ruby title="built_in_hash.rb"
num = 3
hash_num = num.hash
# Hash value of integer 3 is -4385856518450339636
bol = true
hash_bol = bol.hash
# Hash value of boolean true is -1617938112149317027
dec = 3.14159
hash_dec = dec.hash
# Hash value of decimal 3.14159 is -1479186995943067893
str = "Hello 算法"
hash_str = str.hash
# Hash value of string "Hello 算法" is -4075943250025831763
tup = [12836, '小哈']
hash_tup = tup.hash
# Hash value of tuple (12836, '小哈') is 1999544809202288822
obj = ListNode.new(0)
hash_obj = obj.hash
# Hash value of ListNode object #<ListNode:0x000078133140ab70> is 4302940560806366381
```
=== "Zig"
@@ -355,7 +405,7 @@ We know that the keys in a hash table can be of various data types such as integ
```
??? pythontutor "Code Visualization"
??? pythontutor "Visualized Execution"
https://pythontutor.com/render.html#code=class%20ListNode%3A%0A%20%20%20%20%22%22%22%E9%93%BE%E8%A1%A8%E8%8A%82%E7%82%B9%E7%B1%BB%22%22%22%0A%20%20%20%20def%20__init__%28self,%20val%3A%20int%29%3A%0A%20%20%20%20%20%20%20%20self.val%3A%20int%20%3D%20val%20%20%23%20%E8%8A%82%E7%82%B9%E5%80%BC%0A%20%20%20%20%20%20%20%20self.next%3A%20ListNode%20%7C%20None%20%3D%20None%20%20%23%20%E5%90%8E%E7%BB%A7%E8%8A%82%E7%82%B9%E5%BC%95%E7%94%A8%0A%0A%22%22%22Driver%20Code%22%22%22%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20num%20%3D%203%0A%20%20%20%20hash_num%20%3D%20hash%28num%29%0A%20%20%20%20%23%20%E6%95%B4%E6%95%B0%203%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%203%0A%0A%20%20%20%20bol%20%3D%20True%0A%20%20%20%20hash_bol%20%3D%20hash%28bol%29%0A%20%20%20%20%23%20%E5%B8%83%E5%B0%94%E9%87%8F%20True%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%201%0A%0A%20%20%20%20dec%20%3D%203.14159%0A%20%20%20%20hash_dec%20%3D%20hash%28dec%29%0A%20%20%20%20%23%20%E5%B0%8F%E6%95%B0%203.14159%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%20326484311674566659%0A%0A%20%20%20%20str%20%3D%20%22Hello%20%E7%AE%97%E6%B3%95%22%0A%20%20%20%20hash_str%20%3D%20hash%28str%29%0A%20%20%20%20%23%20%E5%AD%97%E7%AC%A6%E4%B8%B2%E2%80%9CHello%20%E7%AE%97%E6%B3%95%E2%80%9D%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%204617003410720528961%0A%0A%20%20%20%20tup%20%3D%20%2812836,%20%22%E5%B0%8F%E5%93%88%22%29%0A%20%20%20%20hash_tup%20%3D%20hash%28tup%29%0A%20%20%20%20%23%20%E5%85%83%E7%BB%84%20%2812836,%20'%E5%B0%8F%E5%93%88'%29%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%201029005403108185979%0A%0A%20%20%20%20obj%20%3D%20ListNode%280%29%0A%20%20%20%20hash_obj%20%3D%20hash%28obj%29%0A%20%20%20%20%23%20%E8%8A%82%E7%82%B9%E5%AF%B9%E8%B1%A1%20%3CListNode%20object%20at%200x1058fd810%3E%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%20274267521&cumulative=false&curInstr=19&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false