Violations Found in the COLLECT Case Study

Below we provide extended descriptions of each violation found.

Original LockFreeList sequential inconsistency (expected, B).

We applied COLLECT to an old version of LockFreeList ([Sec. 9.8] [1] and companion code) with a known bug, to see if it finds the bug. LockFreeList manages a concurrent linked list using AtomicMarkableReferences. We find the SC violation, indicated in the errata for [1] and previously found via model checking [4], in the remove method. This violation can occur between two remove operations, because remove sets the mark without first checking whether it is already set. This means that both removes can return true even if only one element is present to be removed. This was corrected in [2] by replacing attemptMark with compareAndSet; in the corrected version we do not find any violations.

SynchronousDualQueue sequential inconsistency (unexpected, B).

As this is a synchronous queue, any deq call operation should block until it rendezvouses with an enq operation by another thread, and when it returns should return the item added by the corresponding enq. However, we have found it is possible for a thread to return from deq a NULL value in a particular scenario, with three threads running the following operations: t_1: enq(0), t_2: deq(), t_3: deq(). The following sequence of events would reproduce this problem, starting with an empty SynchronousDualQueue: t_1: enters enq, creates a new Node(0, NodeType.ITEM) called offer, inserts it into the queue by adjusting tail.next (and thus head.next) to point to offer (line 26, Figure 10.20, pg. 247), and sets tail to refer to offer (line 27), but yields before entering the busy-wait loop on line 28. t_3: enters deq(), creates a reservation offer Node, finds that the head and tail are not the same, that tail is an ITEM, and that its snapshot is consistent, and so gets the item from head.next before successfully setting head.next's item to null and yielding (line 100, SynchronousDualQueue.java). t_2: enters deq(), creates a reservation offer Node, and like t_3 finds that the head and tail are not the same, that tail is an ITEM, and that its snapshot is consistent, and so gets the item from head.next and again sets head.next to null, It finally sets head to head->next and returns the item, which is null, since t_3 previously set head.next's item to null (line 103, SynchronousDualQueue.java). We have found that this problem is easily correctable by having the deq() method check for null items. Inserting if (!item) continue; immediately after line 99 (getting the item) corrects this bug.

RefinableHashSet null pointer dereference (unexpected, B).

We find that is possible to encounter a null pointer exception in the acquire method called by one thread trying to add to the hash set while another is resizing. As written, the code to resize the hash set and the lock array used to control access to the buckets in the hash set controls access to write to the locks array using an AtomicMarkableReference to a Thread called owner. The problem we encounter is shown by the following sequence of events: A thread t_1 in the acquire method checks the owner, finds that it indicates no one is resizing, but then yields before proceeding (line 21, Fig. 13.9, pg. 313). A resize operation in another thread t_2 then sets owner to itself, waits for quiescence which will immediately succeed because no locks are held, and then carries out reallocation of the hash set at a new size, reaching line 50 before yielding. This makes the locks field of the hash set a valid allocation of memory, but initialized to null pointers to ReentrantLock (line 50, Fig. 13.10, pg. 314). Finally, t_1 records references to the locks array and particular lock to acquire (oldLock), which will be a null pointer as the contents of locks have not been initialized, and encounters a null pointer exception when trying to call oldLock.lock() when oldLock is null (line 24, Fig. 13.9, pg. 313). This also applies to RefinableCuckooHashSet.

LockFreeHashSet sequential inconsistency (unexpected, B).

This bug arises in the remove method of BucketList, which provides most of the functionality used in LockFreeHashSet, which is not written out in the book but appears in the companion code. At line 81, the remove method sets the mark of the node to be removed to true, marking it logically to be removed, but does not check first to ensure the node hasn't already been marked for removal. This means two concurrent remove methods can both return true, indicating they both have removed an item from the hash set, when in fact only one item has been removed. We are able to fix this bug easily by changing the call pred.next.attemptMark(curr, true) to pred.next.compareAndSet(curr, curr, false, true) on line 81 in BucketList. This is very similar to the first violation above.

StripedCuckooHashSet null pointer dereference (unexpected, B).

There is a bug in method relocate [Fig. 13.27] [3], called by add [Fig. 13.26] [3]. Method relocate carries out a ``shuffling'' process, moving items between two hash tables to ensure no table entry has too many items. It acquires the locks it needs before moving any items; however, it determines the value of the item to be relocated (iSet.get(0)) without holding any locks, meaning up until that point another thread could remove all items from the probe set or even call resize [Fig. 13.30] [3] and invalidate any references held by the calling process before it can call get(0). Modifying relocate to obtain the needed lock before get(0) is not sufficient---a deadlock is then possible if another thread is in the middle of acquiring all locks in resize.

StripedCuckooHashSet infinite state (unexpected, C).

This implementation can enter a state in which it will repeatedly call resize [Fig. 13.30] [3] and increase the size of the data structure without bound. This violation depends on specific hash function characteristics and thus cannot be found with the identity hash model. In the example found by COLLECT, the hash function maps all items to 0. In this case, regardless of the size of the set, when enough items are added to trigger a resize, resize will continue to recursively call resize (through add [Fig. 13.26] [3]) forever. This also applies to RefinableCuckooHashSet.

FineGrainedHeap non-termination (unexpected, A).

The add method of FineGrainedHeap requires the use of fair locks. The code for add can be seen in [Figure 15.11] [3]. In the simplest case, with two threads both trying to add to an empty FineGrainedHeap, it is possible for one thread to be blocked waiting for a ReentrantLock that the other thread will repeatedly attempt to lock and then unlock, not able to make progress until the blocked thread progresses and sets the status field to AVAILABLE. Because the Java ReentrantLock by default does not guarantee favoring the longest-waiting thread, one thread can be stuck waiting for its lock while the other thread repeatedly locks and unlocks the same lock indefinitely, a non-termination violation. Correcting this non-termination requires the use of fair locks, modelled by our FairReentrantLock utility class. With weak fairness in thread scheduling enabled in COLLECT as well, the above violation does not occur.

FineGrainedHeap deadlock (unexpected, B).

FineGrainedHeap has a second bug. A deadlock can occur with two threads when the heap contains one item. At the start, the heap contains one node u with score 1. Thread 0 calls add [Fig. 15.11] [3] with an item of score 0 while thread 1 calls removeMin [Fig. 15.10] [3], and the following sequence of events occurs:
  1. Thread 0, in add, acquires the heapLock, determines that its child node (call it v) will be the second node in the tree, locks v, releases heapLock, initializes v with score 0, and unlocks v.
  2. Thread 1, in removeMin, acquires heapLock, determines that its bottom node (to be swapped with ROOT) is the second node v, and locks v.
  3. Thread 0, in add, enters the while loop to move v up to the appropriate position in the tree, determines that v's parent is u, and locks u. It then attempts to lock v, its child, but blocks because thread 1 holds the v lock.
  4. Thread 1 attempts to lock u, the ROOT node, but blocks because thread 0 holds the u lock.
At this point each thread is holding the lock that the other needs to proceed.

Original SkipQueue non-termination (expected, A).

SkipQueue implements a priority queue using a list of nodes, each containing AtomicMarkableReferences to successor nodes. In the original version ([Fig. 15.5] [1] and companion code) method findAndMarkMin, called by removeMin, incorrectly traverses the skip queue: it fails to advance curr when it encounters a marked node, preventing the while loop from terminating, causing a non-termination violation. The errata provides a fix, adjusting the else condition to ensure curr advances. However, both [2] and [3] apply the errata's fix incorrectly, leaving an extra syntactically-invalid closing brace near the end of the method [Fig. 15.13] [3]. Correcting for this typo, we do not find a non-termination violation.

SkipQueue sequential inconsistency (expected, E).

As described in [Sec. 15.5] [3], SkipQueue is not LZ. It meets QC, a weaker correctness condition. Figure 15.15 of [3] illustrates an LZ violation; the example is also an SC violation. We detect a similar, but simpler, SC violation by using E. The violation occurs for a schedule consisting of 3 threads: thread 0 executes two adds of items with scores 0 and 2, resp., while threads 1 and 2 execute one removeMin each, on a SkipQueue starting with one item with score 1.

References

[1] Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann, San Francisco, CA (2008), https://www.elsevier.com/books/the-art-of-multiprocessor-programming/herlihy/978-0-12-370591-4

[2] Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann, San Francisco, CA, revised reprint edn. (2012), https://shop.elsevier.com/books/the-art-of-multiprocessor-programming-revised-reprint/herlihy/978-0-12-397337-5

[3] Herlihy, M., Shavit, N., Luchangco, V., Spear, M.: The Art of Multiprocessor Programming. Morgan Kaufmann, Cambridge, MA, second edn. (2020), https://www.elsevier.com/books/the-art-of-multiprocessor-programming/herlihy/978-0-12-415950-1

[4] Černý, P., Radhakrishna, A., Zufferey, D., Chaudhuri, S., Alur, R.: Model checking of linearizability of concurrent list implementations. In: Proceedings of the 22nd International Conference on Computer Aided Verification. p. 465–479. CAV’10, Springer-Verlag, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_41


Last modified: Fri Jan 31 13:58:28 EST 2025