Message boards : Number crunching : Problems with Rosetta version 5.41
Author | Message |
---|---|
martino.corti Send message Joined: 23 Nov 06 Posts: 1 Credit: 180 RAC: 0 |
Hi all, I am running Rosetta 5.41 and have had a few "Computation error" results in recent weeks. In one case I noticed that the error occurred immediately after I issued "Activity / Suspend" through the standard "BOINC Manager" (from time to time I need to have the whole PC available), but I wasn't sure of the evidence. Today I had some spare time to test the possible causal link: -) the PC was dedicated to BOINC; -) Rosetta 5.41 was running on a WU; -) I issued "Activity / Suspend" using "BOINC Manager" and the WU was immediately reported as "Computational Error". I suspect that the same happens when BOINC preempts a WU: in one case the CPU time was close to the configured 60' (59'54"), but this is harder to verify explicitly, since a WU could be left anywhere when the PC is shut down (which does not cause the problem). I hope this observation is of help. Martino |
Thomas F. Bates IV Send message Joined: 10 May 06 Posts: 5 Credit: 2,853,254 RAC: 0 |
FYI - Two recent WUs caused 5.41 to consume 100% CPU and it refused to suspend on user activity; I had to kill the process. Win2kPro, BOINC 5.4.9 |
Conan Send message Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 71 |
> Have had 8 WUs fail with the same error code and all on the same machine, Host 264297, a P4 2.53 GHz @ 2.75 GHz, not running Boinc screensaver just a standard Windows screensaver. Has been fine till 2 days ago. https://boinc.bakerlab.org/rosetta/result.php?resultid=50937552 https://boinc.bakerlab.org/rosetta/result.php?resultid=50895262 https://boinc.bakerlab.org/rosetta/result.php?resultid=50854943 https://boinc.bakerlab.org/rosetta/result.php?resultid=50734311 https://boinc.bakerlab.org/rosetta/result.php?resultid=50414918 https://boinc.bakerlab.org/rosetta/result.php?resultid=50327688 (debug info) https://boinc.bakerlab.org/rosetta/result.php?resultid=50288762 All have "exit code -1073741819". A few WUs have processed in between the failed ones, but so far none have run on 9/12/06. I turned off the Boinc graphics 3 weeks ago and all has been running error free till now. |
Faust Send message Joined: 7 Sep 06 Posts: 14 Credit: 49,559 RAC: 0 |
I don't know if that's a problem - but I think it's very unlikely for my fastest machine to only get 3.29 credits for a completed WU :) (claimed credit 42.04) I don't see any errors and it also didn't crash. Screensaver was also off. Result I'm also having the same problem Feet1st described in a dedicated thread - 'RAC dropping, BOINC dropping comms'. It happens a lot lately, but I guess that's a Boinc issue. Faust. |
Conan Send message Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 71 |
> Have had 8 WUs fail with the same error code and all on the same machine, Host 264297, a P4 2.53 GHz @ 2.75 GHz, not running Boinc screensaver just a standard Windows screensaver. Has been fine till 2 days ago. Another two on the same machine: https://boinc.bakerlab.org/rosetta/result.php?resultid=50984487 (This one hung for hours so I aborted it; Boinc Manager showed nothing happening). https://boinc.bakerlab.org/rosetta/result.php?resultid=51029592 (Same error as before, "Exit code -1073741819") Have now updated Boinc from 5.5.0 to 5.4.11 to see what happens. |
Conan Send message Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 71 |
> Have had 8 WUs fail with the same error code and all on the same machine, Host 264297, a P4 2.53 GHz @ 2.75 GHz, not running Boinc screensaver just a standard Windows screensaver. Has been fine till 2 days ago. > Well that was a great success, first 2 WUs with new Boinc version and had 2 more failures, still with the same error (exit code -1073741819) https://boinc.bakerlab.org/rosetta/result.php?resultid=51110957 https://boinc.bakerlab.org/rosetta/result.php?resultid=51130389 Also had one lock up for hours on one of my Linux machines with the cpu time not moving nor the % done or time left, so had to abort https://boinc.bakerlab.org/rosetta/result.php?resultid=50256485. The only thing I have done with the Pentium 4 machine is change that host from always running to run as per preferences so as to give the cpu a break. |
Jnargus Send message Joined: 4 Oct 06 Posts: 5 Credit: 7,935,482 RAC: 190 |
"Nice" to see that others are having the same problem I am. My linux box 366920 still seems to be getting Rosetta credit but most of the time the WU just stops updating with no messages. I have now suspended Rosetta on this machine as it "wastes" one of my cores when the WU is running but nothing is happening. My windows boxes don't seem to be having this problem so Rosetta will still run there. |
[AF>Slappyto] popolito Send message Joined: 8 Mar 06 Posts: 13 Credit: 1,041,105 RAC: 5 |
The latest applications use a lot of memory (I have only 256 MB of RAM, so I can't crunch for Rosetta). |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Futura Sciences The project added the Docking work units and found they took considerably more memory than other work units, and so they changed things so that these work units would only be sent to computers with more memory. That was around mid October. The system requirements still show 256MB should work fine. Most work units only use around 110MB. Those that are known to use more will only be sent to machines with more than 256MB of memory. So, unless you are having the screensaver problems, you should be crunching fine with 256MB. And if you are having screensaver problems, please just turn the screensaver to (none) until the problems are resolved. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Chu Send message Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
There are plenty of jobs in the queue that do not have a memory requirement, and you should be able to receive jobs to crunch. What messages did you get? The latest applications use a lot of memory (I have only 256 MB of RAM, so I can't crunch for Rosetta). |
Chu Send message Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Exit code -1073741819 is graphics-related. Not sure about the failed one on the Linux machine; it may just have gotten stuck. Does it happen very often? > Have had 8 WUs fail with the same error code and all on the same machine, Host 264297, a P4 2.53 GHz @ 2.75 GHz, not running Boinc screensaver just a standard Windows screensaver. Has been fine till 2 days ago. |
Chu Send message Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Hi, in your results from the Linux host, I saw "segmentation violation" in those "client error" WUs. I assume these are the WUs you are reporting here. Can you describe a little more of what you have seen? Did those jobs get stuck as well? Did you manually abort those WUs? This will help us understand those stderr.txt files better and decide whether your reported problems are the same as the ones Conan has reported in his post below. Thanks. "Nice" to see that others are having the same problem I am. My linux box 366920 still seems to be getting Rosetta credit but most of the time the WU just stops updating with no messages. I have now suspended Rosetta on this machine as it "wastes" one of my cores when the WU is running but nothing is happening. My windows boxes don't seem to be having this problem so Rosetta will still run there. |
Jnargus Send message Joined: 4 Oct 06 Posts: 5 Credit: 7,935,482 RAC: 190 |
Sun 10 Dec 2006 09:25:27 AM EST|rosetta@home|Pausing task BENCH_ABRELAX_SAVE_ALL_OUT_1ctf__BARCODE_R7_R40_filters_1441_553_0 (removed from memory) Sun 10 Dec 2006 09:25:27 AM EST|rosetta@home|Pausing task BENCH_ABRELAX_SAVE_ALL_OUT_1ctf__BARCODE_R7_R41_filters_1441_553_0 (removed from memory) These are the two WUs I currently have. I just resumed the Rosetta project and BOINC nicely paused the two Einstein WUs that were going but never gave me a message saying that the Rosetta WUs had been restarted. The BOINC manager task pane shows the two Rosetta WUs as "Running" but the time and percentages are not changing, and from what I've seen in the past the processes that are running in memory are not actually doing anything. I have reset the Rosetta project several times to clean out the WUs that are in the state these two are in now. After resetting the project it seems to work fine for a while, and then I get some WUs that error out and the rest just seem to hang. I have had them hang at around 1 hour of CPU time, which is where the two I have now are, and sometimes it gets over two hours before it hangs. All the WUs that have not been reported were ones that just hung, and I have since cleared them out by resetting the project. If you need any more info, let me know. 2006-12-10 08:23:53 [rosetta@home] Unrecoverable error for result BENCH_ABRELAX_SAVE_ALL_OUT_1ctf__BARCODE_R10_R13_filters_1441_512_0 (process exited with code 131 (0x83)) 2006-12-10 08:23:55 [rosetta@home] Unrecoverable error for result BENCH_ABRELAX_SAVE_ALL_OUT_4ubpA_BARCODE_R2_R77_filters_1441_493_0 (process got signal 11) These are the last two lines from the stderrdae.txt file. (I would have included the previous post but I'm still too new at this to know how ;-) (Now I see the Reply to this Post button!)(Doh) |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one clapped out after 6hrs 40min not long after it restarted. On 10hr run time. No graphics or screensaver. https://boinc.bakerlab.org/rosetta/result.php?resultid=51329026 |
Conan Send message Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 71 |
> Thanks Chu, no it does not happen often, in fact hardly at all since I turned the Boinc screensaver off 3-4 weeks ago. But if you say that error is graphics-related then there must be a problem with running ANY screensaver with Rosetta, as I only run a standard Windows one and have since even turned that off. The machine has started to process again ok now. Spoke too soon: came home today and found I had a power glitch of some sort that turned off all my computers bar 1 (on a UPS). On turning them back on, the Intel that has been creating all these reports, and was just working again, has fried its power supply, so further testing will have to wait (this may have been the reason for the errors? a failing PSU?). The workunits that lock up occur now and then and if caught early don't cause a problem, but I have had a couple go for days before I found them and lost heaps of processing because of it. Exit code -1073741819 is graphics-related. Not sure about the failed one on the Linux machine; it may just have gotten stuck. Does it happen very often? |
Jnargus Send message Joined: 4 Oct 06 Posts: 5 Credit: 7,935,482 RAC: 190 |
https://boinc.bakerlab.org/result.php?resultid=51444398 https://boinc.bakerlab.org/result.php?resultid=51444399 I just aborted these two WUs for failing to do anything. My system was still happily "Running" these WUs but nothing was happening. I also started Rosetta on my other Linux (Debian) box and I will let you know if it has problems. Hope this helps. Hi, in your results from the Linux host, I saw "segmentation violation" in those "client error" WUs. I assume these are the WUs you are reporting here. Can you describe a little more of what you have seen? Did those jobs get stuck as well? Did you manually abort those WUs? This will help us understand those stderr.txt files better and decide whether your reported problems are the same as the ones Conan has reported in his post below. Thanks. |
googloo Send message Joined: 15 Sep 06 Posts: 133 Credit: 22,815,413 RAC: 423 |
Getting quite a few of the messages "rosetta@home|rosetta not responding to screensaver, exiting" roughly one a day. I have reset the project with no change in frequency. Was happening occasionally with 5.40 but higher frequency with 5.41. I can't tell if it's graphics related. I have screensaver set to start after 2 minutes idle and goes to blank screen after 3 minutes. Processor: 2 GenuineIntel Intel(R) Pentium(R) D CPU 3.40GHz Memory: 2.00 GB physical, 3.85 GB virtual Disk: 222.65 GB total, 180.39 GB free Windows XP BOINC runs all the time, projects left in memory. |
[AF>Slappyto] popolito Send message Joined: 8 Mar 06 Posts: 13 Credit: 1,041,105 RAC: 5 |
There are plenty of jobs in the queue that do not have a memory requirement, and you should be able to receive jobs to crunch. What messages did you get? No, what I meant was that I can't crunch for Rosetta because the application takes too much memory, and that is annoying when I am using the computer (the application takes more than 100 MB). |
The_Bad_Penguin Send message Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
Problem? My end? Rosetta's end? Result ID 51580518 CPU time 17330.796875 stderr out <core_client_version>5.4.11</core_client_version> <stderr_txt> # random seed: 1782254 # cpu_run_time_pref: 21600 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score 3.29644 for 3600 seconds ********************************************************************** GZIP SILENT FILE: .cc1tig.out </stderr_txt> Validate state Valid Claimed credit 62.638971651533 Granted credit 20 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Are these WUs bad? I had another error last night; it was the same type as the other one I reported before, FRA_t369 something. I was the second to run it, if anyone is interested; it ran for 1 hr 20-something. https://boinc.bakerlab.org/rosetta/result.php?resultid=51542218 |
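
For anyone trying to read the exit codes quoted in this thread ("exit code -1073741819" on Windows, "process got signal 11" on Linux), here is a minimal sketch of how they can be decoded. It is not part of BOINC or Rosetta; the function names `to_unsigned_hex` and `signal_name` and the small lookup table are made up for illustration. It only relies on two general facts: Windows reports status codes as 32-bit values that BOINC prints as signed integers, and signal numbers map to names through the operating system.

```python
import signal

# Hypothetical lookup table, limited to the one status code seen in this thread.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def describe_windows_exit(exit_code: int) -> str:
    """Return the hex form plus a name, if the code is in the small table above."""
    unsigned = exit_code & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(unsigned, "not in this sketch's table")
    return f"{to_unsigned_hex(exit_code)} ({name})"


def signal_name(signum: int) -> str:
    """Map a signal number to its symbolic name on the current platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


if __name__ == "__main__":
    print(describe_windows_exit(-1073741819))
    print(signal_name(11))
```

Read this way, -1073741819 is 0xC0000005, the generic Windows access violation status, and signal 11 is SIGSEGV, which matches the "segmentation violation" text Chu mentions. Both only say that the process touched memory it should not have; whether graphics code was the trigger, as suggested above, cannot be told from the number alone.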