談?wù)刯ava中幾種常見的散列算法及解決哈希碰撞的方式

沈建明發(fā)布于2019-08-16 10:53 / 2068人閱讀

摘要：接下來分析幾個(gè)常見的實(shí)現(xiàn)方式。再哈希法再哈希法，就是出現(xiàn)沖突后采用其他的哈希函數(shù)計(jì)算，直到不再?zèng)_突為止。，其中為不同的哈希函數(shù)。

由表及里，循序漸進(jìn)，請(qǐng)往下看。隨手點(diǎn)贊是對(duì)作者最大的鼓勵(lì)！^0^。

什么是哈希表

引用：嚴(yán)蔚敏 《數(shù)據(jù)結(jié)構(gòu)（C語言版）》中的內(nèi)容

哈希表就是 依據(jù)關(guān)鍵字可以根據(jù)一定的算法（哈希函數(shù)）映射到表中的特定位置 的思想建立的表。因此哈希表最大的特點(diǎn)就是可以根據(jù)f(K)函數(shù)得到其在數(shù)組中的索引。

接下來來看看Java中Object對(duì)hashCode()方法的說明，當(dāng)然此方法和equals(Object obj)方法是相輔相成的。

Object類中的equals和hashCode方法（文章內(nèi)源碼均基于JDK8） equals方法官方文檔：

public boolean equals(Object obj)
Indicates whether some other object is "equal to" this one. 
The equals method implements an equivalence relation on non-null
object references: 
· It is reflexive: for any non-null reference value x, x.equals(x) should return true.  
· It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.  
· It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
· It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.  
· For any non-null reference value x, x.equals(null) should return false.  
The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true). 
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
Parameters: 
obj the reference object with which to compare. 
Returns:
true if this object is the same as the obj argument; false otherwise.
See Also: 
hashCode(),java.util.HashMap

在官方說明中，指明了equals方法具有自反性、對(duì)稱性、傳遞性、一致性，同時(shí)也提醒在在繼承Object的時(shí)候，如果要重寫hashCode方法，通常都需要重寫該方法，因?yàn)閔ashCode要求（下面也有提及）：如果兩個(gè)對(duì)象執(zhí)行equals方法結(jié)果為true，則兩對(duì)象的哈希碼應(yīng)該是相等的。

hashCode方法官方文檔：

public native int hashCode();
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by java.util.HashMap. 
The general contract of hashCode is: 
· Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application. 
· If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. 
· It is not required that if two objects are unequal according to the java.lang.Object.equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables. 
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java? programming language.)
Returns:
a hash code value for this object.
See Also:
java.lang.Object.equals(java.lang.Object),java.lang.System.identityHashCode

該方法返回對(duì)象的經(jīng)過處理后的內(nèi)存地址，由于每個(gè)對(duì)象的內(nèi)存地址都不一樣，所以哈希碼也不一樣。此方法為native方法，取決于JVM的內(nèi)部設(shè)計(jì)，一般是某種C地址的偏移。

文檔中給出了三條規(guī)定：

在對(duì)象沒有被修改的前提下，執(zhí)行多次調(diào)用，該hashCode方法必須始終返回相同的整數(shù)。

如果兩個(gè)對(duì)象執(zhí)行equals方法結(jié)果為true，則分別調(diào)用hashCode方法產(chǎn)生的整數(shù)結(jié)果是相等的。

非必要要求：兩個(gè)對(duì)象執(zhí)行equals方法結(jié)果為false，則分別調(diào)用hashCode方法產(chǎn)生的整數(shù)結(jié)果是不相等的。

第三個(gè)要求雖然為非必需，但如果實(shí)現(xiàn)，則可以提高散列表的性能。

接下來分析幾個(gè)常見的實(shí)現(xiàn)方式。

String的equals和hashCode方法

hashCode方法源碼：

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

該函數(shù)很簡(jiǎn)單，以31為權(quán)，每一位為字符的ASCII值進(jìn)行運(yùn)算，用自然溢出來等效取模，達(dá)到了目的——只要字符串的內(nèi)容相同，返回的哈希碼也相同。但是乘子31在此需要解釋一下。選31作為乘子，是因?yàn)椋?/p>

31是一個(gè)奇質(zhì)數(shù)，如果選擇一個(gè)偶數(shù)會(huì)在乘法運(yùn)算中產(chǎn)生溢出，導(dǎo)致數(shù)值信息丟失，因?yàn)槌硕喈?dāng)于移位運(yùn)算。選擇質(zhì)數(shù)的優(yōu)勢(shì)并不是特別的明顯，但這是一個(gè)傳統(tǒng)。

31可以被JVM優(yōu)化：31 * i = (i << 5) - i。

equals方法源碼：

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

此equals方法包含了"=="，雙等號(hào)比較的是地址，存儲(chǔ)地址相同，內(nèi)容則相同。當(dāng)?shù)刂凡煌臅r(shí)候，先驗(yàn)證了比較對(duì)象是否為String，接著比較了兩個(gè)字符串的長度，最后才循環(huán)比較每個(gè)字符是否相等。

Integer的equals和hashCode方法

hashCode方法源碼：

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
    public static int hashCode(int value) {
        return value;
    }

equals方法源碼：

     public boolean equals(Object obj) {
        if (obj instanceof Integer) {
            return value == ((Integer)obj).intValue();
        }
        return false;
    }

由此可見，Integer哈希碼就是Integer對(duì)象里所包含的那個(gè)整數(shù)的數(shù)值，且equals方法比較的也是兩者的整數(shù)數(shù)值，即兩個(gè)數(shù)值大小的Integer對(duì)象，計(jì)算出的哈希碼是相等的。

最后，像int，char這樣的基礎(chǔ)類，它們不需要hashCode，如果需要存儲(chǔ)時(shí)，將進(jìn)行自動(dòng)裝箱操作，計(jì)算方法同Integer。

哈希碰撞（hash沖突）

在計(jì)算hash地址的過程中會(huì)出現(xiàn)對(duì)于不同的關(guān)鍵字出現(xiàn)相同的哈希地址的情況，即key1 ≠ key2，但是f(key1) = f(key2)，這種情況就是Hash 沖突。具有相同關(guān)鍵字的key1和key2稱之為同義詞。
通過優(yōu)化哈希函數(shù)可以減少這種沖突的情況（如：均衡哈希函數(shù)），但是在通用條件下，考慮到于表格的長度有限及關(guān)鍵值（數(shù)據(jù)）的無限，這種沖突是不可避免的，所以就需要處理沖突。

沖突處理

沖突處理分為以下四種方式：

開放地址

再哈希

鏈地址

建立公共溢出區(qū)

其中開放地址又分為：

線性探測(cè)再散列
二次探測(cè)再散列
偽隨機(jī)探測(cè)再散列

下面談?wù)剮追N方法的原理：

開放地址

開放地址法處理沖突的基本原則就是出現(xiàn)沖突后按照一定算法查找一個(gè)空位置存放。公式：

Hi為計(jì)算出的地址，H(key)為哈希函數(shù)，di為增量。其中di的三種獲取方式既是上面提到的開放地址法的三種分類（線性探測(cè)再散列、二次探測(cè)再散列、偽隨機(jī)探測(cè)再散列）。

線性探測(cè)再散列

，即依次向后查找。

二次探測(cè)再散列
，即依次向前后查找，增量為1、2、3的二次方。

偽隨機(jī)探測(cè)再散列
偽隨機(jī)，顧名思義就是隨機(jī)產(chǎn)生一個(gè)增量位移。

再哈希法

再哈希法，就是出現(xiàn)沖突后采用其他的哈希函數(shù)計(jì)算，直到不再?zèng)_突為止。

，其中RHi為不同的哈希函數(shù)。

鏈地址法

鏈接地址法不同與前兩種方法，他是在出現(xiàn)沖突的地方存儲(chǔ)一個(gè)鏈表，所有的同義詞記錄都存在其中。形象點(diǎn)說就行像是在出現(xiàn)沖突的地方直接把后續(xù)的值摞上去。例如HashMap，如下圖。

建立公共溢出區(qū)

建立公共溢出區(qū)的基本思想是：假設(shè)哈希函數(shù)的值域是[1,m-1]，則設(shè)向量HashTable[0...m-1]為基本表，每個(gè)分量存放一個(gè)記錄，另外設(shè)向量OverTable[0...v]為溢出表，所有關(guān)鍵字和基本表中關(guān)鍵字為同義詞的記錄，不管它們由哈希函數(shù)得到的哈希地址是什么，一旦發(fā)生沖突，都填入溢出表。