arraylist
是平時相當常用的list實現, 其中boolean add(e e)
的實現比較直接:
1
2
3
4
5
6
7
8
9
10
11
|
/** * appends the specified element to the end of this list. * * @param e element to be appended to this list * @return <tt>true</tt> (as specified by {@link collection#add}) */ public boolean add(e e) { ensurecapacityinternal(size + 1 ); // increments modcount!! elementdata[size++] = e; return true ; } |
有時候也使用 void add(int index, e element)
把元素插入到指定的index
上. 在jdk中的實現是:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
/** * inserts the specified element at the specified position in this * list. shifts the element currently at that position (if any) and * any subsequent elements to the right (adds one to their indices). * * @param index index at which the specified element is to be inserted * @param element element to be inserted * @throws indexoutofboundsexception {@inheritdoc} */ public void add( int index, e element) { rangecheckforadd(index); ensurecapacityinternal(size + 1 ); // increments modcount!! system.arraycopy(elementdata, index, elementdata, index + 1 , size - index); elementdata[index] = element; size++; } |
略有差別, 需要保證當前elementdata
數組容量夠用, 然后把從index
處一直到尾部的數組元素都向后挪一位. 最后把要插入的元素賦給數組的index
處.
一直以來, 我都認為 system.arraycopy
這個native方法, 它的c++實現是調用底層的memcpy
, 直接方便, 效率也沒問題.
但今天看了openjdk的源碼發現并非如此.
以openjdk8u60 為例, 在objarrayklass.cpp 中:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
|
void objarrayklass::copy_array(arrayoop s, int src_pos, arrayoop d, int dst_pos, int length, traps) { assert (s->is_objarray(), "must be obj array" ); if (!d->is_objarray()) { throw (vmsymbols::java_lang_arraystoreexception()); } // check is all offsets and lengths are non negative if (src_pos < 0 || dst_pos < 0 || length < 0 ) { throw (vmsymbols::java_lang_arrayindexoutofboundsexception()); } // check if the ranges are valid if ( (((unsigned int ) length + (unsigned int ) src_pos) > (unsigned int ) s->length()) || (((unsigned int ) length + (unsigned int ) dst_pos) > (unsigned int ) d->length()) ) { throw (vmsymbols::java_lang_arrayindexoutofboundsexception()); } // special case. boundary cases must be checked first // this allows the following call: copy_array(s, s.length(), d.length(), 0). // this is correct, since the position is supposed to be an 'in between point', i.e., s.length(), // points to the right of the last element. if (length== 0 ) { return ; } if (usecompressedoops) { narrowoop* const src = objarrayoop(s)->obj_at_addr<narrowoop>(src_pos); narrowoop* const dst = objarrayoop(d)->obj_at_addr<narrowoop>(dst_pos); do_copy<narrowoop>(s, src, d, dst, length, check); } else { oop* const src = objarrayoop(s)->obj_at_addr<oop>(src_pos); oop* const dst = objarrayoop(d)->obj_at_addr<oop>(dst_pos); do_copy<oop> (s, src, d, dst, length, check); } } |
可以看到copy_array在做了各種檢查之后, 最終copy的部分在do_copy方法中, 而這個方法實現如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
|
// either oop or narrowoop depending on usecompressedoops. template < class t> void objarrayklass::do_copy(arrayoop s, t* src, arrayoop d, t* dst, int length, traps) { barrierset* bs = universe::heap()->barrier_set(); // for performance reasons, we assume we are that the write barrier we // are using has optimized modes for arrays of references. at least one // of the asserts below will fail if this is not the case. assert (bs->has_write_ref_array_opt(), "barrier set must have ref array opt" ); assert (bs->has_write_ref_array_pre_opt(), "for pre-barrier as well." ); if (s == d) { // since source and destination are equal we do not need conversion checks. assert (length > 0 , "sanity check" ); bs->write_ref_array_pre(dst, length); copy::conjoint_oops_atomic(src, dst, length); } else { // we have to make sure all elements conform to the destination array klass* bound = objarrayklass::cast(d->klass())->element_klass(); klass* stype = objarrayklass::cast(s->klass())->element_klass(); if (stype == bound || stype->is_subtype_of(bound)) { // elements are guaranteed to be subtypes, so no check necessary bs->write_ref_array_pre(dst, length); copy::conjoint_oops_atomic(src, dst, length); } else { // slow case: need individual subtype checks // note: don't use obj_at_put below because it includes a redundant store check t* from = src; t* end = from + length; for (t* p = dst; from < end; from++, p++) { // xxx this is going to be slow. t element = *from; // even slower now bool element_is_null = oopdesc::is_null(element); oop new_val = element_is_null ? oop( null ) : oopdesc::decode_heap_oop_not_null(element); if (element_is_null || (new_val->klass())->is_subtype_of(bound)) { bs->write_ref_field_pre(p, new_val); *p = element; } else { // we must do a barrier to cover the partial copy. const size_t pd = pointer_delta(p, dst, (size_t)heapoopsize); // pointer delta is scaled to number of elements (length field in // objarrayoop) which we assume is 32 bit. assert (pd == (size_t)( int )pd, "length field overflow" ); bs->write_ref_array((heapword*)dst, pd); throw (vmsymbols::java_lang_arraystoreexception()); return ; } } } } bs->write_ref_array((heapword*)dst, length); } |
可以看到, 在設定了heap barrier之后, 元素是在for循環中被一個個挪動的. 做的工作比我想象的要多.
如果有m個元素, 按照給定位置, 使用arraylist.add(int,e)逐個插入到一個長度為n的arraylist中, 復雜度應當是o(m*n), 或者o(m*(m+n)), 所以, 如果m和n都不小的話, 效率確實是不高的.
效率高一些的方法是, 建立m+n長度的數組或arraylist, 在給定位置賦值該m個要插入的元素, 其他位置依次賦值原n長度list的元素. 這樣時間復雜度應當是o(m+n).
還有, 在前面的實現中, 我們可以看到有對ensurecapacityinternal(int) 的調用. 這個保證數組容量的實現主要在:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
/** * increases the capacity to ensure that it can hold at least the * number of elements specified by the minimum capacity argument. * * @param mincapacity the desired minimum capacity */ private void grow( int mincapacity) { // overflow-conscious code int oldcapacity = elementdata.length; int newcapacity = oldcapacity + (oldcapacity >> 1 ); if (newcapacity - mincapacity < 0 ) newcapacity = mincapacity; if (newcapacity - max_array_size > 0 ) newcapacity = hugecapacity(mincapacity); // mincapacity is usually close to size, so this is a win: elementdata = arrays.copyof(elementdata, newcapacity); } |
大家知道由于效率原因, arraylist容量增長不是正好按照要求的容量mincapacity來設計的, 新容量計算的主要邏輯是: 如果要求容量比當前容量的1.5倍大, 就按照要求容量重新分配空間; 否則按當前容量1.5倍增加. 當然不能超出integer.max_value了. oldcapacity + (oldcapacity >> 1) 實際就是當前容量1.5倍, 等同于(int) (oldcapacity * 1.5), 但因這段不涉及浮點運算只是移位, 顯然效率高不少.
所以如果arraylist一個一個add元素的話, 容量是在不夠的時候1.5倍增長的. 關于1.5這個數字, 或許是覺得2倍增長太快了吧. 也或許有實驗數據的驗證支撐.
關于這段代碼中出現的arrays.copyof這個方法, 實現的是重新分配一段數組, 把elementdata賦值給新分配的空間, 如果新分配的空間大, 則后面賦值null, 如果分配空間比當前數組小則截斷. 底層還是調用的system.arraycopy.
以上就是本文的全部內容,希望對大家的學習有所幫助,也希望大家多多支持服務器之家。
原文鏈接:https://segmentfault.com/a/1190000016910760