Joined: 1 Jul 05
Regarding the recent increasing graphic related errors, we suspect it is a problem of thread synchronization. Basically Rosetta working thread does the simulation which changes all the atom coordinates ( which are saved in shared memory) while the graphic thread tries to read data from that place to draw the graphic or screensaver. Currently there is no locking mechanism to ensure the shared memory is accessed by one thread at a time and this could generate some conflicts or memory corruption and then trigger an error. On one of our local computers, when screensaver or graphic is turned on, it caught errors at a rate of at least one per day on average and without any graphics, it ran flawlessly. The errors which have been observed include crashing(0xc0000005), hung-up (0x40010004) and being stuck( watchdog ending). All the errors were not reproducable with same random number seeds and we think that is due to the radomness in graphic process. Another side proof was that showing sidechains requires accessing shared memory more often and intensively, and after turning off sidechains and rotating, the graphic error rates drop but the problem is not solved completely. There seems to be an correlation between two. We are currently working on adding the new locking mechanism and we will post an update when it is ready to be tested out. Meanwhile, if you have experienced freqeuent client-errors, please temporarily disable the boinc_graphics/screensaver to reduce the problem. Thanks for everybody's support.
©2020 University of Washington