multithreading - Issues with parallelizing the creation of key-value pairs in a Ruby hash
I am working with Ruby and wrote the following code, using Parallel and JRuby 1.7.19, to speed up the creation of a hash from an array with many values:

    require 'parallel'

    # Defined before the parallel loop so the method exists when it is called.
    def keep_item?(item)
      item["value"] > 0
    end

    hash = {}
    array = [
      {"id" => "a001", "value" => 1},
      {"id" => "b002", "value" => 0},
      {"id" => "c003", "value" => 3},
      {"id" => "d004", "value" => 0}
    ]

    # Note: all five threads write to the same hash without synchronization.
    Parallel.each(array, in_threads: 5) do |item|
      if keep_item?(item)
        hash[item["id"]] = item
      end
    end

It was brought to my attention that there are issues with adding keys to hashes in parallel in Ruby. Are there any risks with this code (thread safety, loss of data, strange locks I'm unaware of, etc.) such that I should have left it as a regular serial #each call?
Hash isn't thread safe. If keep_item? reads the hash, there is a race condition. Even if it doesn't, there are still concurrent updates to the hash, which is error prone.
If there's no lock or other synchronization, then in theory there's no guarantee that updates made to a non-thread-safe hash on one thread are visible on another thread. Concurrent updates to a hash without synchronization may lose data or cause other strange issues. It depends on the implementation of Ruby's Hash.
Your data is simple enough that you can just process it with a normal each. If you use Parallel and add a mutex/lock for thread-safe access, the synchronization overhead adds to the time cost of the overall process, so a thread-safe parallel version may well take more time than the serial one.
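For illustration, here is a minimal sketch of what the mutex-guarded version could look like. It uses plain Thread objects from the standard library in place of Parallel.each so that it runs without the gem; the method name build_hash_with_mutex and the slicing of the array are assumptions made for this example, not part of the original code.

```ruby
# Sketch only: plain threads stand in for Parallel.each(array, in_threads: 5).
def keep_item?(item)
  item["value"] > 0
end

# Hypothetical helper: every write to the shared hash goes through one Mutex.
def build_hash_with_mutex(items, thread_count = 5)
  hash = {}
  lock = Mutex.new
  # Give each thread a contiguous slice; slice size is at least 1.
  slice_size = [(items.size / thread_count.to_f).ceil, 1].max

  threads = items.each_slice(slice_size).map do |slice|
    Thread.new do
      slice.each do |item|
        next unless keep_item?(item)
        # Only one thread at a time may modify the hash.
        lock.synchronize { hash[item["id"]] = item }
      end
    end
  end

  threads.each(&:join) # wait for every worker before reading the hash
  hash
end

array = [
  {"id" => "a001", "value" => 1},
  {"id" => "b002", "value" => 0},
  {"id" => "c003", "value" => 3},
  {"id" => "d004", "value" => 0}
]

result = build_hash_with_mutex(array)
```

The lock makes the writes safe, but every insertion now waits its turn, which is the overhead described above. For a cheap check like keep_item? that cost can easily outweigh any gain from the threads.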
Parallel is useful when your tasks are IO bound, or when they are CPU bound, as long as you have free cores and the tasks don't need to exchange data with each other.
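To show what "tasks that don't exchange data" could look like here, the following sketch lets each thread filter its own slice and return its own array, then builds the hash on a single thread afterwards. Again, plain Thread objects are used instead of the Parallel gem, and the name build_hash_without_sharing is made up for this example.

```ruby
# Sketch only: no shared mutable state, so no lock is needed.
def keep_item?(item)
  item["value"] > 0
end

# Hypothetical helper: threads only read their slice and return a new array.
def build_hash_without_sharing(items, thread_count = 5)
  slice_size = [(items.size / thread_count.to_f).ceil, 1].max

  threads = items.each_slice(slice_size).map do |slice|
    Thread.new { slice.select { |item| keep_item?(item) } }
  end

  # Thread#value joins the thread and returns the result of its block.
  kept = threads.flat_map(&:value)

  # The hash is built serially, on the calling thread only.
  kept.each_with_object({}) { |item, hash| hash[item["id"]] = item }
end

items = [
  {"id" => "a001", "value" => 1},
  {"id" => "b002", "value" => 0},
  {"id" => "c003", "value" => 3},
  {"id" => "d004", "value" => 0}
]

filtered = build_hash_without_sharing(items)
```

Whether this is any faster than a plain each depends on how expensive keep_item? is; for a comparison as simple as the one in the question, the serial version is probably still the better choice.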