Hashcode/Equals

hashcode涉及到集合HashMap等集合，此篇侧重于了解hashcode和equals方法的作用的原理。有助于下一篇HashMap的理解。

一、Hash是什么

Hash是散列的意思，就是把任意长度的输入，通过散列算法变换成固定长度的输出，该输出就是散列值。这个玩意还可以做加密。

不同关键字经过散列算法变换后可能得到同一个散列地址，这种现象称为碰撞。
如果两个Hash值不同（前提是同一Hash算法），那么这两个Hash值对应的原始输入必定不同

二、什么是hashcode

HashCode的存在主要是为了查找的快捷性，HashCode是用来在散列存储结构中确定对象的存储地址的。

如果两个对象的HashCode相同，不代表两个对象就相同，只能说明这两个对象在散列存储结构中，存放于同一个位置。

如果对象的equals方法被重写，那么对象的HashCode方法也尽量重写。为什么呢？下文会说。

三、HashCode有什么用

比方说Set里面已经有1000个元素了，那么第1001个元素进来的时候，最多可能调用1000次equals方法，如果equals方法写得复杂，对比的东西特别多，那么效率会大大降低。

使用HashCode就不一样了，比方说HashSet，底层是基于HashMap实现的，先通过HashCode取一个模，这样一下子就固定到某个位置了，如果这个位置上没有元素，那么就可以肯定HashSet中必定没有和新添加的元素equals的元素，就可以直接存放了，都不需要比较；

如果这个位置上有元素了，逐一比较，比较的时候先比较HashCode，HashCode都不同接下去都不用比了，肯定不一样，HashCode相等，再equals比较，没有相同的元素就存，有相同的元素就不存。

如果原来的Set里面有相同的元素，只要HashCode的生成方式定义得好（不重复），不管Set里面原来有多少元素，只需要执行一次的equals就可以了。这样一来，实际调用equals方法的次数大大降低，提高了效率。

当俩个对象的`hashCode`值相同的时候，`Hashset`会将对象保存在同一个位置，但是他们`equals`返回`false`，所以实际上这个位置采用链式结构来保存多个对象。

四、为什么重写Object的equals()方法尽量要重写Object的hashCode()方法

面临问题：若两个对象equals相等，但由于不在一个区间，因为hashCode的值在重写之前是对内存地址计算得出，所以根本没有机会进行比较，会被认为是不同的对象(这就是为什么还要重写hashcode方法了)。所以Java对于eqauls方法和hashCode方法是这样规定的：

1 如果两个对象相同(equals为true)，那么它们的hashCode值一定要相同。也告诉我们重写equals方法，一定要重写hashCode方法，也就是说hashCode值要和类中的成员变量挂上钩，对象相同–>成员变量相同—->hashCode值一定相同。

2 如果两个对象的hashCode相同(只是映射到同一个位置而已)，它们并不一定相同，这里的对象相同指的是用eqauls方法比较。

简单来说，如果只重写equals方法而不重写hashcode方法，会导致重复元素的产生。具体通过下面的例子进行说明。

五、举例

6.1 Student类

很简单，定义了id和name两个字段，无参和有参构造函数，toString方法。

public class Student {
	private int id;
	private String name;
	
	get(),set()略...
	
	public Student(){}
	
	public Student(int id, String name) {
		super();
		this.id = id;
		this.name = name;
	}
	
	@Override
	public String toString() {
		return "Student [id=" + id + ", name=" + name + "]";
	}

}

6.2 main方法

public static void main(String[] args) {
	Student student1 = new Student(1,"hh");
	Student student2 = new Student(1,"hh");
	Student student3 = new Student(2,"gg");
	
	
	HashSet<Student> set = new HashSet<Student>(); 
	set.add(student1);
	set.add(student2);
	set.add(student3);
	set.add(student1);//重复添加了student1
	
	System.out.println("set集合容量为:  "+set.size());
	
	Iterator<Student> iterator = set.iterator();  
    while (iterator.hasNext()) {  
        Student student = iterator.next();  
        System.out.println(student+"---"+student.hashCode());  
    } 
}

执行结果：

set集合容量为:  3
Student [id=1, name=hh]---1735600054
Student [id=1, name=hh]---356573597
Student [id=2, name=gg]---21685669

我们可以看到，只要是new的对象，他们的hashcode是不一样的。所以，就会认为他们是不一样的对象。所以，集合里面数量为3.

6.3 只重写equals()方法，而不重写HashCode()方法

输出：

set集合容量为:  3
Student [id=2, name=gg]---2018699554
Student [id=1, name=hh]---366712642
Student [id=1, name=hh]---1829164700

结论：覆盖equals（Object obj）但不覆盖hashCode(),导致数据不唯一性。

在这里，其实我们可以看到，student1和student2其实是一个对象，但是由于都是new并且没有重写hashcode导致他们变成了两个不一样的对象。

分析：

（1）当执行set.add(student1)时，集合为空，直接存入集合；

（2）当执行set.add(student2)时，首先判断该对象（student2）的hashCode值所在的存储区域是否有相同的hashCode，因为没有覆盖hashCode方法，所以jdk使用默认Object的hashCode方法，返回内存地址转换后的整数，因为不同对象的地址值不同，所以这里不存在与student2相同hashCode值的对象，因此jdk默认不同hashCode值，equals一定返回false，所以直接存入集合。

（3）当执行set.add(student3)时,与2同理。

（4）当最后执行set.add(student1)时，因为student1已经存入集合，同一对象返回的hashCode值是一样的，继续判断equals是否返回true，因为是同一对象所以返回true。此时jdk认为该对象已经存在于集合中，所以舍弃。

6.4 只重写HashCode()方法，equals()方法直接返回false

set集合容量为:  3
Student [id=1, name=hh]---4320
Student [id=1, name=hh]---4320
Student [id=2, name=gg]---4319

按照上面的分析，可能会觉得里面应该装4个，因为两次add的student1，虽然他们的hashcode一样，但是equals直接返回false，那么应该判定为两个不同的对象。但是结果确跟我们预想的不一样。

分析：

首先student1和student2的对象比较hashCode，因为重写了HashCode方法，所以hashcode相等,然后比较他们两的equals方法，因为equals方法始终返回false,所以student1和student2也是不相等的，所以student2也被放进了set

首先student1(student2)和student3的对象比较hashCode，不相等，所以student3放进set中

最后再看最后重复添加的student1,与第一个student1的hashCode是相等的，在比较equals方法，因为equals返回false,所以student1和student4不相等;同样，student2和student4也是不相等的;student3和student4的hashcode都不相等，所以肯定不相等的，所以最后一个重复的student1应该可以放到set集合中，那么结果应该是size:4,那为什么会是3呢？

这时候我们就需要查看HashSet的源码了，下面是HashSet中的add方法的源码：

1
2
3

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

这里我们可以看到其实HashSet是基于HashMap实现的，我们在点击HashMap的put方法，源码如下：

1
2
3

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

首先是判断hashCode是否相等，不相等的话，直接跳过，相等的话，然后再来比较这两个对象是否相等或者这两个对象的equals方法，因为是进行的或操作，所以只要有一个成立即可，那这里我们就可以解释了，其实上面的那个集合的大小是3,因为最后的一个r1没有放进去，以为r1==r1返回true的，所以没有放进去了。所以集合的大小是3，如果我们将hashCode方法设置成始终返回false的话，这个集合就是4了。

6.5 同时重写

我的写法是：

@Override
public int hashCode() {
    int result = 17;
    result = result * 31 + name.hashCode();
    result = result * 31 + id;

    return result;
}

@Override
public boolean equals(Object obj) {
    if(obj == this)
        return true;
    if(!(obj instanceof Student))
        return false;

    Student o = (Student)obj;
    return o.name.equals(name) && o.id == id;
}

结果：

1
2
3

set集合容量为:  2
Student [id=2, name=gg]---118515
Student [id=1, name=hh]---119506

达到我们预期的效果。

六、内存泄露

我们上面实验了重写equals和hashcode方法，执行main，执行结果是：

1
2
3

set集合容量为:  2
Student [id=1, name=hh]---4320
Student [id=2, name=gg]---4319

将main方法改为：

public static void main(String[] args) {
	Student student1 = new Student(1,"hh");
	Student student2 = new Student(1,"hh");
	Student student3 = new Student(2,"gg");
	
	
	HashSet<Student> set = new HashSet<Student>(); 
	set.add(student1);
	set.add(student2);
	set.add(student3);
	set.add(student1);//重复添加了student1
	
	System.out.println("set集合容量为:  "+set.size());
	
	//------新增的开始-------
	student3.setId(11);
	
	set.remove(student3);
	
	System.out.println("set集合容量为:  "+set.size());
	
	//------新增的结束-------
	
	Iterator<Student> iterator = set.iterator();  
    while (iterator.hasNext()) {  
        Student student = iterator.next();  
        System.out.println(student+"---"+student.hashCode());  
    } 
}

运行结果是：

set集合容量为:  2
set集合容量为:  2
Student [id=1, name=hh]---4320
Student [id=11, name=gg]---4598

我们调用了remove删除student3对象，以为删除了student3,但事实上并没有删除，这就叫做内存泄露，就是不用的对象但是他还在内存中。所以我们多次这样操作之后，内存就爆了。

原因：

在调用remove方法的时候，会先使用对象的hashCode值去找到这个对象，然后进行删除，这种问题就是因为我们在修改了对象student3的id属性的值，又因为RectObject对象的hashCode方法中有id值参与运算,所以student3对象的hashCode就发生改变了，所以remove方法中并没有找到student3了，所以删除失败。即student3的hashCode变了，但是他存储的位置没有更新，仍然在原来的位置上，所以当我们用他的新的hashCode去找肯定是找不到了。

总结：

上面的这个内存泄露告诉我一个信息：如果我们将对象的属性值参与了hashCode的运算中，在进行删除的时候，就不能对其属性值进行修改，否则会出现严重的问题。

七、总结

hashCode是为了提高在散列结构存储中查找的效率，在线性表中没有作用。
equals和hashCode需要同时覆盖。
若两个对象equals返回true，则hashCode有必要也返回相同的int数。
若两个对象equals返回false，则hashCode不一定返回不同的int数,但为不相等的对象生成不同hashCode值可以提高哈希表的性能。
若两个对象hashCode返回相同int数，则equals不一定返回true。
同一对象在执行期间若已经存储在集合中，则不能修改影响hashCode值的相关信息，否则会导致内存泄露问题。

整理自：