LockFreeList
sequential inconsistency (expected, B).LockFreeList
([Sec. 9.8]
[1] and companion code) with a known bug, to see if it finds the
bug. LockFreeList
manages a concurrent linked list
using AtomicMarkableReference
s. We find the SC
violation, indicated in the errata for [1] and previously found via model
checking [4], in the remove
method. This violation can occur
between two remove
operations, because
remove
sets the mark without first checking whether it is already set.
This means that both remove
s can return true even if only one
element is present to be removed. This was corrected in
[2] by replacing attemptMark
with
compareAndSet
; in the corrected version we do not find any violations.
SynchronousDualQueue
sequential inconsistency (unexpected, B).deq
call operation should block
until it rendezvouses with an enq
operation by another thread, and
when it returns should return the item added by the
corresponding enq
. However, we have found it is possible for a
thread to return from deq
a
NULL
value in a particular scenario, with three threads running the
following operations: t_1: enq(0)
, t_2: deq()
, t_3:
deq()
.
The following sequence of events would reproduce this problem, starting with an
empty SynchronousDualQueue
:
t_1: enters enq
, creates a new Node(0, NodeType.ITEM)
called
offer
, inserts it into the queue by
adjusting tail.next
(and thus
head.next
) to point to offer
(line 26, Figure 10.20, pg. 247), and
sets tail
to refer to offer
(line 27), but yields before entering
the busy-wait loop on line 28.
t_3: enters deq()
, creates a reservation
offer Node
, finds that the head
and tail
are not the same, that tail
is an
ITEM
, and that its snapshot is consistent, and so gets the item
from head.next
before successfully setting head.next
's item
to null and yielding (line 100, SynchronousDualQueue.java).
t_2: enters deq()
, creates a reservation
offer Node
, and like
t_3 finds that the head
and tail
are not the same, that
tail
is an ITEM
, and that its snapshot is consistent, and so gets
the item
from head.next
and again sets head.next
to
null, It finally sets head
to head->next
and
returns the
item
, which is null, since t_3 previously set
head.next
's item to null (line 103, SynchronousDualQueue.java).
We have found that this problem is easily correctable by having
the deq()
method check for null items. Inserting if
(!item) continue;
immediately after line 99 (getting the item) corrects
this bug.
RefinableHashSet
null pointer dereference
(unexpected, B).acquire
method called by one thread trying to add to the hash set
while another is resizing. As written, the code to resize the hash set and the
lock array used to control access to the buckets in the hash set controls access
to write to the locks array using an AtomicMarkableReference
to a
Thread
called owner
.
The problem we encounter is shown by the following sequence of events:
A thread t_1 in the acquire method checks the owner
, finds
that it indicates no one is resizing, but then yields before proceeding (line
21, Fig. 13.9, pg. 313).
A resize operation in another thread t_2 then sets owner
to
itself, waits for quiescence which will immediately succeed because no locks are
held, and then carries out reallocation of the hash set at a new size, reaching
line 50 before yielding. This makes the locks field of the hash set a valid
allocation of memory, but initialized to null pointers
to ReentrantLock
(line 50, Fig. 13.10, pg. 314).
Finally, t_1 records references to the locks array and particular lock to
acquire (oldLock
), which will be a null pointer as the contents of
locks have not been initialized, and encounters a null pointer exception when
trying to call oldLock.lock()
when oldLock
is null (line 24, Fig. 13.9, pg. 313). This also applies
to RefinableCuckooHashSet
.
LockFreeHashSet
sequential inconsistency
(unexpected, B).BucketList
, which provides
most of the functionality used in LockFreeHashSet
, which is not
written out in the book but appears in the companion code. At line 81,
the remove
method sets the mark of the node to be removed
to true
, marking it logically to be removed, but does not check
first to ensure the node hasn't already been marked for removal. This means two
concurrent remove
methods can both return true
,
indicating they both have removed an item from the hash set, when in fact only
one item has been removed. We are able to fix this bug easily by changing the
call pred.next.attemptMark(curr, true)
to
pred.next.compareAndSet(curr, curr, false, true)
on line 81 in
BucketList
. This is very similar to the first violation above.
StripedCuckooHashSet
null pointer dereference
(unexpected, B).relocate
[Fig. 13.27] [3], called
by add
[Fig. 13.26] [3]. Method relocate
carries out a
``shuffling'' process, moving items between two hash tables to ensure no table
entry has too many items. It acquires the locks it needs before moving any
items; however, it determines the value of the item to be relocated
(iSet.get(0)
) without holding any locks, meaning up until that
point another thread could remove all items from the probe set or even
call resize
[Fig. 13.30] [3] and invalidate any references held by
the calling process before it can call get(0)
.
Modifying relocate
to obtain the needed lock
before get(0)
is not sufficient---a deadlock is then possible if
another thread is in the middle of acquiring all locks in
resize
.
StripedCuckooHashSet
infinite state (unexpected, C).resize
[Fig. 13.30] [3] and increase the size of the data
structure without bound. This violation depends on specific hash function
characteristics and thus cannot be found with the identity hash model. In the
example found by COLLECT, the hash function maps all items to 0. In this case,
regardless of the size of the set, when enough items are added to trigger a
resize, resize
will continue to recursively
call resize
(through
add
[Fig. 13.26] [3]) forever. This also
applies to RefinableCuckooHashSet
.
FineGrainedHeap
non-termination (unexpected, A).add
method of FineGrainedHeap
requires the use of
fair locks. The code for add
can be seen in [Figure 15.11] [3]. In
the simplest case, with two threads both trying to add to an
empty FineGrainedHeap
, it is possible for one thread to be blocked
waiting for a ReentrantLock
that the other thread will repeatedly
attempt to lock and then unlock, not able to make progress until the blocked
thread progresses and sets the status
field to
AVAILABLE
. Because the Java ReentrantLock
by default does not
guarantee favoring the longest-waiting thread, one thread can be stuck waiting
for its lock while the other thread repeatedly locks and unlocks the same lock
indefinitely, a non-termination violation. Correcting this non-termination
requires the use of fair locks, modelled by our FairReentrantLock
utility
class. With weak fairness in thread scheduling enabled in COLLECT as well,
the above violation does not occur.
FineGrainedHeap
deadlock (unexpected, B).FineGrainedHeap
has a second bug. A deadlock can occur with two threads
when the heap contains one item. At the start, the heap contains one node u
with score 1. Thread 0 calls add
[Fig. 15.11] [3] with an item of score 0 while
thread 1 calls removeMin
[Fig. 15.10] [3],
and the following sequence of events occurs:
add
, acquires the heapLock
,
determines that its child node (call it v) will be the second node in
the tree, locks v, releases heapLock
,
initializes v with score 0, and unlocks v.removeMin
,
acquires heapLock
, determines that its bottom node (to be
swapped with ROOT
) is the second node v, and
locks v.add
, enters the while
loop
to move v up to the appropriate position in the tree, determines
that v's parent is u, and locks u. It then attempts to
lock v, its child, but blocks because thread
1 holds the v lock.ROOT
node,
but blocks because thread 0 holds the u lock.SkipQueue
non-termination (expected, A).SkipQueue
implements a priority queue using a list of nodes, each
containing AtomicMarkableReference
s to successor nodes. In the original
version ([Fig. 15.5] [1] and companion code) method
findAndMarkMin
, called by removeMin
, incorrectly
traverses the skip queue: it fails to advance curr
when it
encounters a marked node, preventing the while
loop from
terminating, causing a non-termination violation. The errata provides a fix,
adjusting the else
condition to ensure curr
advances.
However, both [2] and [3] apply the errata's fix incorrectly, leaving an extra
syntactically-invalid closing brace near the end of the method [Fig. 15.13]
[3]. Correcting for this typo, we do not find a non-termination violation.
SkipQueue
sequential inconsistency (expected, E).SkipQueue
is
not LZ. It meets QC, a weaker correctness
condition. Figure 15.15 of [3] illustrates an LZ violation; the
example is also an SC violation. We detect a similar, but
simpler, SC violation by using E. The violation occurs
for a schedule consisting of 3 threads: thread 0 executes two add
s
of items with scores 0 and 2, resp., while threads 1 and 2 execute
one removeMin
each, on a SkipQueue
starting with one
item with score 1.
[1] Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann, San Francisco, CA (2008), https://www.elsevier.com/books/the-art-of-multiprocessor-programming/herlihy/978-0-12-370591-4
[2] Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann, San Francisco, CA, revised reprint edn. (2012), https://shop.elsevier.com/books/the-art-of-multiprocessor-programming-revised-reprint/herlihy/978-0-12-397337-5
[3] Herlihy, M., Shavit, N., Luchangco, V., Spear, M.: The Art of Multiprocessor Programming. Morgan Kaufmann, Cambridge, MA, second edn. (2020), https://www.elsevier.com/books/the-art-of-multiprocessor-programming/herlihy/978-0-12-415950-1
[4] Černý, P., Radhakrishna, A., Zufferey, D., Chaudhuri, S., Alur, R.: Model checking of linearizability of concurrent list implementations. In: Proceedings of the 22nd International Conference on Computer Aided Verification. p. 465–479. CAV’10, Springer-Verlag, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_41