LockFreeList sequential inconsistency (expected, B).LockFreeList ([Sec. 9.8]
[1] and companion code) with a known bug, to see if it finds the
bug. LockFreeList manages a concurrent linked list
using AtomicMarkableReferences. We find the SC
violation, indicated in the errata for [1] and previously found via model
checking [4], in the remove method. This violation can occur
between two remove operations, because
remove sets the mark without first checking whether it is already set.
This means that both removes can return true even if only one
element is present to be removed. This was corrected in
[2] by replacing attemptMark with
compareAndSet; in the corrected version we do not find any violations.
SynchronousDualQueue sequential inconsistency (unexpected, B).deq call operation should block
until it rendezvouses with an enq operation by another thread, and
when it returns should return the item added by the
corresponding enq. However, we have found it is possible for a
thread to return from deq a
NULL value in a particular scenario, with three threads running the
following operations: t_1: enq(0), t_2: deq(), t_3:
deq().
The following sequence of events would reproduce this problem, starting with an
empty SynchronousDualQueue:
t_1: enters enq, creates a new Node(0, NodeType.ITEM) called
offer, inserts it into the queue by
adjusting tail.next (and thus
head.next) to point to offer (line 26, Figure 10.20, pg. 247), and
sets tail to refer to offer (line 27), but yields before entering
the busy-wait loop on line 28.
t_3: enters deq(), creates a reservation
offer Node, finds that the head and tail
are not the same, that tail is an
ITEM, and that its snapshot is consistent, and so gets the item
from head.next before successfully setting head.next's item
to null and yielding (line 100, SynchronousDualQueue.java).
t_2: enters deq(), creates a reservation
offer Node, and like
t_3 finds that the head and tail are not the same, that
tail is an ITEM, and that its snapshot is consistent, and so gets
the item from head.next and again sets head.next to
null, It finally sets head to head->next and
returns the
item, which is null, since t_3 previously set
head.next's item to null (line 103, SynchronousDualQueue.java).
We have found that this problem is easily correctable by having
the deq() method check for null items. Inserting if
(!item) continue; immediately after line 99 (getting the item) corrects
this bug.
RefinableHashSet null pointer dereference
(unexpected, B).acquire method called by one thread trying to add to the hash set
while another is resizing. As written, the code to resize the hash set and the
lock array used to control access to the buckets in the hash set controls access
to write to the locks array using an AtomicMarkableReference to a
Thread called owner.
The problem we encounter is shown by the following sequence of events:
A thread t_1 in the acquire method checks the owner, finds
that it indicates no one is resizing, but then yields before proceeding (line
21, Fig. 13.9, pg. 313).
A resize operation in another thread t_2 then sets owner to
itself, waits for quiescence which will immediately succeed because no locks are
held, and then carries out reallocation of the hash set at a new size, reaching
line 50 before yielding. This makes the locks field of the hash set a valid
allocation of memory, but initialized to null pointers
to ReentrantLock (line 50, Fig. 13.10, pg. 314).
Finally, t_1 records references to the locks array and particular lock to
acquire (oldLock), which will be a null pointer as the contents of
locks have not been initialized, and encounters a null pointer exception when
trying to call oldLock.lock() when oldLock
is null (line 24, Fig. 13.9, pg. 313). This also applies
to RefinableCuckooHashSet.
LockFreeHashSet sequential inconsistency
(unexpected, B).BucketList, which provides
most of the functionality used in LockFreeHashSet, which is not
written out in the book but appears in the companion code. At line 81,
the remove method sets the mark of the node to be removed
to true, marking it logically to be removed, but does not check
first to ensure the node hasn't already been marked for removal. This means two
concurrent remove methods can both return true,
indicating they both have removed an item from the hash set, when in fact only
one item has been removed. We are able to fix this bug easily by changing the
call pred.next.attemptMark(curr, true) to
pred.next.compareAndSet(curr, curr, false, true) on line 81 in
BucketList. This is very similar to the first violation above.
StripedCuckooHashSet null pointer dereference
(unexpected, B).relocate [Fig. 13.27] [3], called
by add [Fig. 13.26] [3]. Method relocate carries out a
``shuffling'' process, moving items between two hash tables to ensure no table
entry has too many items. It acquires the locks it needs before moving any
items; however, it determines the value of the item to be relocated
(iSet.get(0)) without holding any locks, meaning up until that
point another thread could remove all items from the probe set or even
call resize [Fig. 13.30] [3] and invalidate any references held by
the calling process before it can call get(0).
Modifying relocate to obtain the needed lock
before get(0) is not sufficient---a deadlock is then possible if
another thread is in the middle of acquiring all locks in
resize.
StripedCuckooHashSet infinite state (unexpected, C).resize [Fig. 13.30] [3] and increase the size of the data
structure without bound. This violation depends on specific hash function
characteristics and thus cannot be found with the identity hash model. In the
example found by COLLECT, the hash function maps all items to 0. In this case,
regardless of the size of the set, when enough items are added to trigger a
resize, resize will continue to recursively
call resize (through
add [Fig. 13.26] [3]) forever. This also
applies to RefinableCuckooHashSet.
FineGrainedHeap non-termination (unexpected, A).add method of FineGrainedHeap requires the use of
fair locks. The code for add can be seen in [Figure 15.11] [3]. In
the simplest case, with two threads both trying to add to an
empty FineGrainedHeap, it is possible for one thread to be blocked
waiting for a ReentrantLock that the other thread will repeatedly
attempt to lock and then unlock, not able to make progress until the blocked
thread progresses and sets the status field to
AVAILABLE. Because the Java ReentrantLock by default does not
guarantee favoring the longest-waiting thread, one thread can be stuck waiting
for its lock while the other thread repeatedly locks and unlocks the same lock
indefinitely, a non-termination violation. Correcting this non-termination
requires the use of fair locks, modelled by our FairReentrantLock utility
class. With weak fairness in thread scheduling enabled in COLLECT as well,
the above violation does not occur.
FineGrainedHeap deadlock (unexpected, B).FineGrainedHeap has a second bug. A deadlock can occur with two threads
when the heap contains one item. At the start, the heap contains one node u
with score 1. Thread 0 calls add
[Fig. 15.11] [3] with an item of score 0 while
thread 1 calls removeMin [Fig. 15.10] [3],
and the following sequence of events occurs:
add, acquires the heapLock,
determines that its child node (call it v) will be the second node in
the tree, locks v, releases heapLock,
initializes v with score 0, and unlocks v.removeMin,
acquires heapLock, determines that its bottom node (to be
swapped with ROOT) is the second node v, and
locks v.add, enters the while loop
to move v up to the appropriate position in the tree, determines
that v's parent is u, and locks u. It then attempts to
lock v, its child, but blocks because thread
1 holds the v lock.ROOT node,
but blocks because thread 0 holds the u lock.SkipQueue non-termination (expected, A).SkipQueue implements a priority queue using a list of nodes, each
containing AtomicMarkableReferences to successor nodes. In the original
version ([Fig. 15.5] [1] and companion code) method
findAndMarkMin, called by removeMin, incorrectly
traverses the skip queue: it fails to advance curr when it
encounters a marked node, preventing the while loop from
terminating, causing a non-termination violation. The errata provides a fix,
adjusting the else condition to ensure curr advances.
However, both [2] and [3] apply the errata's fix incorrectly, leaving an extra
syntactically-invalid closing brace near the end of the method [Fig. 15.13]
[3]. Correcting for this typo, we do not find a non-termination violation.
SkipQueue sequential inconsistency (expected, E).SkipQueue is
not LZ. It meets QC, a weaker correctness
condition. Figure 15.15 of [3] illustrates an LZ violation; the
example is also an SC violation. We detect a similar, but
simpler, SC violation by using E. The violation occurs
for a schedule consisting of 3 threads: thread 0 executes two adds
of items with scores 0 and 2, resp., while threads 1 and 2 execute
one removeMin each, on a SkipQueue starting with one
item with score 1.
[1] Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann, San Francisco, CA (2008), https://www.elsevier.com/books/the-art-of-multiprocessor-programming/herlihy/978-0-12-370591-4
[2] Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann, San Francisco, CA, revised reprint edn. (2012), https://shop.elsevier.com/books/the-art-of-multiprocessor-programming-revised-reprint/herlihy/978-0-12-397337-5
[3] Herlihy, M., Shavit, N., Luchangco, V., Spear, M.: The Art of Multiprocessor Programming. Morgan Kaufmann, Cambridge, MA, second edn. (2020), https://www.elsevier.com/books/the-art-of-multiprocessor-programming/herlihy/978-0-12-415950-1
[4] Černý, P., Radhakrishna, A., Zufferey, D., Chaudhuri, S., Alur, R.: Model checking of linearizability of concurrent list implementations. In: Proceedings of the 22nd International Conference on Computer Aided Verification. p. 465–479. CAV’10, Springer-Verlag, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_41