Message boards : Number crunching : Today 3 wu's failed with Unhandled Exception Detected...
Author | Message |
---|---|
alex Joined: 21 Dec 14 Posts: 8 Credit: 2,668,966 RAC: 0 |
Hi, Rosetta normally runs fine on my PCs, but today 3 WUs failed with very similar errors: "Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x0127C0E8 write attempt to address 0x00ACE12C Engaging BOINC Windows Runtime Debugger..." https://boinc.bakerlab.org/result.php?resultid=1050398575 https://boinc.bakerlab.org/result.php?resultid=1050397973 https://boinc.bakerlab.org/result.php?resultid=1050392563 Looking through my results, I found another PC with errors of the same kind. It looks like a systemic problem, not a problem with one PC. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Hi, you are running into the same problem I had when I tried my new Ryzen 2600 on Windows 10. It did terribly; about 7 out of 8 work units failed. I have now switched it to Ubuntu 18.04.1, and the first work unit was OK. I will get a full load of results today (I run the 24-hour work units). In general, my Ryzen 1700 and 2700 also did well on Ubuntu once I updated to the latest Linux kernel. I think they have some serious fixing to do with their Windows compiler, or whatever it is. |
alex Joined: 21 Dec 14 Posts: 8 Credit: 2,668,966 RAC: 0 |
It's not that clear. One Ryzen has 0 errors, another has 100%: https://boinc.bakerlab.org/rosetta/results.php?hostid=3544884 https://boinc.bakerlab.org/rosetta/results.php?hostid=3258387 But yes, Intel is error-free. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"It's not that clear. One Ryzen has 0 errors, another 100%" Chances are, it was some security update to Win 10 that is borking one machine. But the possibilities are endless. Yes, Intel is error-free on both Windows (at least Win7) and Ubuntu, as far as I have seen. But the credit output is inconsistent. i7-4790 (Ubuntu 16.04.5): https://boinc.bakerlab.org/rosetta/results.php?hostid=3573441&offset=0&show_names=0&state=4&appid= i7-8700 (Ubuntu 18.04.1): https://boinc.bakerlab.org/rosetta/results.php?hostid=3493841&offset=0&show_names=0&state=4&appid= As you can see, the faster machine is doing worse, although for a time I have seen it do very well, with around 1200 points per work unit. And the Ryzen 2600 (Ubuntu 18.04.1) continues to do well: https://boinc.bakerlab.org/rosetta/results.php?hostid=3576251&offset=0&show_names=0&state=4&appid= It is not even clear whether the variation comes from our machines or from their scoring functions. But the errors are another matter. The Rosetta people need to fix that, whatever it is. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,561,131 RAC: 6,603 |
"The Rosetta people need to fix that, whatever it is." Problems with Win10 + Rosetta are not new. The 4.07 app was released in February 2018, so it seems they are in no hurry to debug it. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Problems of Win10+Rosetta are not new." It is curious that it works on some systems but not others. I think another possible source of the problem is memory. Getting reliable memory operation on Ryzen motherboards is well known to be difficult. Over the past few days I have found that even under Ubuntu, my Ryzen 2600 system freezes about once a day when running Rosetta on all cores. I can run that memory on WCG for months at a time with no problem; Rosetta seems to bring out the worst in it. Reducing the memory speed from 2666 MHz to 2400 MHz may help, but to make sure, I am ordering more memory that is actually on the motherboard's QVL (Qualified Vendor List). On Win 10, the same instability might show up as work unit errors rather than freezes. |
San-Fernando-Valley Joined: 16 Mar 16 Posts: 12 Credit: 143,229 RAC: 0 |
NO, Intel is not error-free! After approx. 22 hours of crunching, 12 out of 29 tasks on three different Intel rigs got the following error (or similar ones): Name: RK190110-A_HT_DHD_59_B_HT_DHD_51.pdb-fnd_SAVE_ALL_OUT_711702_3297_0; Workunit: 947259516; Created: 14 Jan 2019, 14:36:48 UTC; Sent: 14 Jan 2019, 15:54:08 UTC; Report deadline: 22 Jan 2019, 15:54:08 UTC; Received: 15 Jan 2019, 14:46:28 UTC; Server state: Over; Outcome: Computation error; Client state: Cancelled by server; Exit status: 202 (0x000000CA) EXIT_ABORTED_BY_PROJECT; Computer ID: 3584822; Run time: 22 hours 51 min 13 sec; CPU time: 22 hours 50 min 17 sec; Validate state: Invalid; Credit: 0.00; Device peak FLOPS: 5.30 GFLOPS; Application version: Rosetta Mini v3.78 windows_intelx86; Peak working set size: 301.50 MB; Peak swap size: 283.98 MB; Peak disk usage: 454.88 MB .... "Unhandled Exception Detected... - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x75F1338D Engaging BOINC Windows Runtime Debugger..." Intel HT is off. Plenty of RAM. Plenty of disk space. Plenty of performance. Only Rosetta is running. Both Win10 and Win7 are installed. Unless we hear from the project staff what the problem is and what we can do to avoid wasting more than 22 hours of crunching on each of 12 tasks, we will stop crunching. Maybe we are just making a "simple dumb" mistake on our side? Looking forward to a serious reply! |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
EXIT_ABORTED_BY_PROJECT is an indication that the R@h Project Team decided to cancel work that had already been sent to hosts. It relates to the batch of work, not to the host processing it. Rosetta Moderator: Mod.Sense |
©2024 University of Washington
https://www.bakerlab.org