HashMap 源码分析

JDK 1.8

源码注释

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

基于 Hash table 的 Map 接口的实现。HashMap 类基本上与 HashTable 相同,除了 HashMap 是异步的、允许 null 值。这个类不对 map 的顺序做保证,特别是不保证顺序会随着时间的改变而保持不变。

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the “capacity” of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it’s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

这个实现提供基本操作( get 和 put )的常数时间,假设 hash 函数将元素合适地分配到 bucket 中。遍历集合需要和 HashMap 实例的容量( capacity )( bucket 的数量)+ TA 的大小( size )( key-value 映射的数量)成比例的时间。因此,如果遍历的性能很重要,不要设置太大的 capacity (或太小的 load factor)。

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

一个 HashMap 实例有两个参数影响 TA 的性能:初始容量(initial capavity)和负载因子(load factor)。initial capacity 是 hash table 创建时的 capacity。load factor 是在 capacity 自动增长前,允许 hash table 多满的一个量度。

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

作为一个普遍的规则,默认的 load factor (0.75)提供了一个在时间和空间消耗上的好的权衡。更高的值减少了空间的多余,但增加了查找消耗(反映在 HashMap 类的大多数操作上,包括 get 和 put)。期望的 map 中的 entry 的数量和 TA 的 load factor 应该在设置 initial capacity 纳入考虑,以尽可能减少重新 hash 操作的数量。如果 initial capacity 比被 load factor 分出的 entry 的最大数量还大,就不会发生 rehash 操作。

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

如果很多映射要被存到一个 HashMap 实例中,用足够大的 capacity 将会让映射存得更高效,与让 HashMap 自动按需要 rehash 来增大 table 相比。注意:使用 hashCode() 相同的 key 一定会降低性能。为减少影响,当 key 是 Comparable 的,HashMap 可能会使用 key 之间的比较顺序来打破束缚。

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be “wrapped” using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

1
Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by all of this class’s “collection view methods” are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

This class is a member of the Java Collections Framework.

field

serialVersionUID

序列化用。

DEFAULT_INITIAL_CAPACITY

默认初始容量 16。必须是 2 的幂。

MAXIMUM_CAPACITY

最大容量 1 << 30。

DEFAULT_LOAD_FACTOR

默认 load factor 0.75。

TREEIFY_THRESHOLD

红黑树化的阈值 8。必须比 2 大,应该至少是 8,以便与 UNTREEIFY_THRESHOLD 配合。

UNTREEIFY_THRESHOLD

红黑树变回链表的阈值 6。应该比 UNTREEIFY_THRESHOLD,必须至少是 6。

MIN_TREEIFY_CAPACITY

最小的树化的 capacity 64。至少 4 * TREEIFY_THRESHOLD 以避免 resize 时与 treeification threshold 的冲突。

table

桶。存放链表的头节点或树的根。

size

map 中 key-value 映射的数量。

modCount

被结构性修改的次数。

threshold

下一次扩容的阈值。 threshold = capacity * load factor。

loadFactor

负载因子。

构造函数

1
2
3
4
5
6
7
8
// 指定 initialCapacity 用于计算 threshold,指定 load factor
public HashMap(int initialCapacity, float loadFactor)
// 指定 initialCapacity,使用默认 load factor,调用上面一个方法
public HashMap(int initialCapacity) {
// 使用默认 load factor
public HashMap() {
// 根据一个已有的 map 创建,使用默认 load factor
public HashMap(Map<? extends K, ? extends V> m) {

put

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
/**
* Implements Map.put and related methods.
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
// table 还未创建时
if ((tab = table) == null || (n = tab.length) == 0)
// 通过 resize() 创建 table,并将长度赋值给 n
n = (tab = resize()).length;
// 根据 i = (n - 1) & hash 得到位置,并赋值给 p
// 如果位置上是 null
if ((p = tab[i = (n - 1) & hash]) == null)
// 就直接创建一个新的结点,并赋值给该位置
tab[i] = newNode(hash, key, value, null);
// 如果数组位置上已经有元素
else {
Node<K,V> e; K k;
// 如果 key 是和已有元素的 key 相同,先把 Node 记录在 e 中
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
// 如果 p 是 TreeNode
else if (p instanceof TreeNode)
// 新建 TreeNode 并记录在 e 中
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
// 如果 key 不同
else {
// 在链表上遍历
for (int binCount = 0; ; ++binCount) {
// 找到链表的结束
if ((e = p.next) == null) {
// 新建链表 Node 并连上去
p.next = newNode(hash, key, value, null);
// 如果在遍历的过程中,链表的长度达到了需要转换成红黑树的阈值时
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
// 转换成红黑树
treeifyBin(tab, hash);
break;
}
// 在遍历的过程中,找到了相同 key 的元素
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
// 根据 onlyIfAbsent 是否覆盖
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
// 增加 modCount
++modCount;
// 根据 size 是否扩容
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}

get

以 equals 为标准。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
/**
* Implements Map.get and related methods.
*
* @param hash hash for key
* @param key the key
* @return the node, or null if none
*/
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}

基本思想是先根据 hash 找到对应的桶,接着检查第一个节点(first)的 key,如果 equals 就返回。否则,判断节点类型,如果是 TreeNode,就调用 first.getTreeNode(hash, key) 来查找;如果是链表,就向下遍历找到 equals 的。否则没找到返回 null。

resize

因为我们使用 2 的幂来扩容,每个 bin 里的元素或者待在相同的索引,或者移动到新 table 的索引的 2 的幂偏移的位置。

新 capacity 是旧 capacity 的两倍。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
/**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
// table 已经初始化
if (oldCap > 0) {
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
// 如果没超过最大容量
// newCap 变成两倍
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
// 下一次 resize 的阈值也设为两倍
newThr = oldThr << 1; // double threshold
}
// threshold > 0,且桶数组未被初始化
// 调用 HashMap(int) 和 HashMap(int, float) 构造方法时
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
// 桶数组未被初始化,且 threshold 为 0
// 调用 HashMap() 构造方法
else { // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// 第一个条件分支未计算 newThr 或嵌套分支在计算过程中导致 newThr 溢出归零
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
// 创建新的 table
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if (oldTab != null) {
// 拷贝旧的 table
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
// 如果链表只有一个节点,直接计算 hash 作为头节点放入对应的桶中
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
// 根据 e.hash & oldCap 是否等于 0 将原链表分为两部分,即分成两个链表
// 在 j 桶的位置的部分
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
// 在 j + oldCap 的部分
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
// 将链表头节点放入对应的桶
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}

split

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
/**
* Splits nodes in a tree bin into lower and upper tree bins,
* or untreeifies if now too small. Called only from resize;
* see above discussion about split bits and indices.
*
* @param map the map
* @param tab the table for recording bin heads
* @param index the index of the table being split
* @param bit the bit of hash to split on
*/
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
TreeNode<K,V> b = this;
// Relink into lo and hi lists, preserving order
// 将节点分为两个 list
TreeNode<K,V> loHead = null, loTail = null;
TreeNode<K,V> hiHead = null, hiTail = null;
int lc = 0, hc = 0;
for (TreeNode<K,V> e = b, next; e != null; e = next) {
next = (TreeNode<K,V>)e.next;
e.next = null;
// (e.hash & bit) == 0 划分根据
// bit 是 oldCap
if ((e.hash & bit) == 0) {
if ((e.prev = loTail) == null)
loHead = e;
else
loTail.next = e;
loTail = e;
// 记录节点数量
++lc;
}
else {
if ((e.prev = hiTail) == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
++hc;
}
}

if (loHead != null) {
// 根据节点数量是否 treeify
if (lc <= UNTREEIFY_THRESHOLD)
tab[index] = loHead.untreeify(map);
else {
tab[index] = loHead;
if (hiHead != null) // (else is already treeified)
loHead.treeify(tab);
// else hiHead == null 时,表明扩容后,
// 所有节点仍在原位置,树结构不变,无需重新树化
}
}
if (hiHead != null) {
if (hc <= UNTREEIFY_THRESHOLD)
tab[index + bit] = hiHead.untreeify(map);
else {
tab[index + bit] = hiHead;
if (loHead != null)
hiHead.treeify(tab);
}
}
}

treeifyBin

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
/**
* Replaces all linked nodes in bin at index for given hash unless
* table is too small, in which case resizes instead.
*/
final void treeifyBin(Node<K,V>[] tab, int hash) {
int n, index; Node<K,V> e;
// 当 capacity < MIN_TREEIFY_CAPACITY 时,进行 resize
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
resize();
else if ((e = tab[index = (n - 1) & hash]) != null) {
TreeNode<K,V> hd = null, tl = null;
do {
// 将普通节点替换成树形节点
TreeNode<K,V> p = replacementTreeNode(e, null);
// 将普通链表转成由树形节点链表
if (tl == null)
hd = p;
else {
p.prev = tl;
tl.next = p;
}
tl = p;
} while ((e = e.next) != null);
if ((tab[index] = hd) != null)
// 将树形链表转换成红黑树
hd.treeify(tab);
}
}

问题

被 transient 所修饰 table 变量

如果大家细心阅读 HashMap 的源码,会发现桶数组 table 被申明为 transient。transient 表示易变的意思,在 Java 中,被该关键字修饰的变量不会被默认的序列化机制序列化。我们再回到源码中,考虑一个问题:桶数组 table 是 HashMap 底层重要的数据结构,不序列化的话,别人还怎么还原呢?

这里简单说明一下吧,HashMap 并没有使用默认的序列化机制,而是通过实现readObject/writeObject两个方法自定义了序列化的内容。这样做是有原因的,试问一句,HashMap 中存储的内容是什么?不用说,大家也知道是键值对。所以只要我们把键值对序列化了,我们就可以根据键值对数据重建 HashMap。有的朋友可能会想,序列化 table 不是可以一步到位,后面直接还原不就行了吗?这样一想,倒也是合理。但序列化 talbe 存在着两个问题:

  1. table 多数情况下是无法被存满的,序列化未使用的部分,浪费空间
  2. 同一个键值对在不同 JVM 下,所处的桶位置可能是不同的,在不同的 JVM 下反序列化 table 可能会发生错误。

以上两个问题中,第一个问题比较好理解,第二个问题解释一下。HashMap 的get/put/remove等方法第一步就是根据 hash 找到键所在的桶位置,但如果键没有覆写 hashCode 方法,计算 hash 时最终调用 Object 中的 hashCode 方法。但 Object 中的 hashCode 方法是 native 型的,不同的 JVM 下,可能会有不同的实现,产生的 hash 可能也是不一样的。也就是说同一个键在不同平台下可能会产生不同的 hash,此时再对在同一个 table 继续操作,就会出现问题。

综上所述,大家应该能明白 HashMap 不序列化 table 的原因了。

HashMap 链表插入方式 → 头插为何改成尾插 ?

HashMap 是有序的吗?

HashMap 是无序的,LinkedHashMap 是有序的。

参考

源码

HashMap 源码详细分析(JDK1.8) | 田小波的技术博客

HashMap源码分析 · Leo’s Studio

HashMap 链表插入方式 → 头插为何改成尾插 ? - 青石路 - 博客园