Summary:
* Add NUM_WRITERS to ledger_cleanup to enable multiple writers.
(Note that insert_shreds() is still effectively single-threaded,
because it takes a lock that allows only one writer at a time.)
* Make pre-generated slots more performant by inserting them
directly into the shared queue. Otherwise, the main thread, which
prepares the slots, would be slower than the writers.
* Correct the shred insertion time measurement. Before this diff,
the timer stopped without waiting for all writer threads to join.
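
For illustration, the three points above can be sketched roughly as follows. This is a simplified sketch, not the actual ledger_cleanup code: the value of NUM_WRITERS, the `Batch` type, the `run_writers` helper, and the `Mutex<Vec<_>>` standing in for the blockstore and its insert lock are all assumptions made for the example.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical value; the real constant lives in ledger_cleanup.
const NUM_WRITERS: usize = 4;

// Stand-in for the shreds of one pre-generated slot (assumption).
type Batch = Vec<u8>;

// Returns (number of batches inserted, elapsed insertion time).
fn run_writers(num_slots: usize) -> (usize, Duration) {
    // Pre-generated batches go directly into the shared queue, so the
    // main thread does not become the bottleneck while writers run.
    let queue: Arc<Mutex<VecDeque<Batch>>> = Arc::new(Mutex::new(
        (0..num_slots).map(|i| vec![i as u8; 8]).collect(),
    ));
    // Stand-in for the store: the mutex models the insert lock that
    // lets only one writer insert at a time.
    let store: Arc<Mutex<Vec<Batch>>> = Arc::new(Mutex::new(Vec::new()));

    let start = Instant::now();
    let handles: Vec<_> = (0..NUM_WRITERS)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let store = Arc::clone(&store);
            thread::spawn(move || loop {
                // The queue lock is released at the end of this statement.
                let batch = queue.lock().unwrap().pop_front();
                match batch {
                    Some(b) => store.lock().unwrap().push(b),
                    None => break, // queue drained, writer exits
                }
            })
        })
        .collect();

    // Join every writer BEFORE reading the clock; stopping the timer
    // earlier would under-report the insertion time.
    for handle in handles {
        handle.join().unwrap();
    }
    let elapsed = start.elapsed();

    let inserted = store.lock().unwrap().len();
    (inserted, elapsed)
}
```

Calling `run_writers(100)` should report 100 inserted batches, with an elapsed time that covers the work of all writers rather than only the time needed to spawn them.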