Message boards : Number crunching : Report Problems with Rosetta Version 5.07
Author | Message |
---|---|
Bin Qian Joined: 13 Jul 05 Posts: 33 Credit: 36,897 RAC: 0 |
Thanks for reporting. I think Moderator9 is right: it's likely a file transfer error and probably just an isolated case. I just noticed that in my previous message the Rosetta version is shown as 5.01. |
Bin Qian Joined: 13 Jul 05 Posts: 33 Credit: 36,897 RAC: 0 |
Hi Jose, ".fragments.cc line:722" says that Rosetta thinks the "fragment file" it is reading has the wrong format. Since all the work units named HBLR_1.0_1dtj_ROT_TRIALS_TRIE_462_xxxxx_x read in the same "fragment file" during the Rosetta initialization stage, it probably indicates that the file for the WU you reported was corrupted or truncated during file transfer. We have received successful results for this batch, so this is very likely an isolated case. But we will keep an eye on it. Thanks. A new type of error has shown up. (Meaning a "non 107 Type".) |
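A truncated or corrupted input file of this kind can often be caught before parsing by comparing it with a known size and checksum. The sketch below is illustrative only: it assumes you already know the expected byte count and MD5 digest of the file, the function name `file_looks_intact` is hypothetical, and it does not describe how Rosetta or BOINC actually validate their inputs.

```python
import hashlib
import os


def file_looks_intact(path: str, expected_size: int, expected_md5: str) -> bool:
    """Return True if the file has the expected size and MD5 digest.

    Hypothetical helper. A truncated download fails the cheap size check;
    a corrupted one (right size, wrong bytes) fails the digest check.
    """
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large input files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

Checking the size first is a design choice: it is essentially free and already catches the "truncated during transfer" case without reading the whole file.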
Joined: 17 Sep 05 Posts: 116 Credit: 41,315 RAC: 0 |
I am running two WUs with HT on my P4, HBLR_xx and AB_CASP6.xx. Memory usage is 300MB, but my Task Manager displays 911MB of RAM in use in total, so there seems to be a memory leak of about 300MB somewhere (allowing +300MB for XP)? |
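The arithmetic behind the suspected leak, using the figures from the post (the ~300MB for XP is the poster's own estimate, and the function name is hypothetical):

```python
def unaccounted_mb(total_in_use, *known_consumers):
    """Memory in use that is not explained by the listed consumers (hypothetical helper)."""
    return total_in_use - sum(known_consumers)


# Figures from the post: 911MB total shown by Task Manager,
# ~300MB for the two work units, ~300MB estimated for Windows XP.
print(unaccounted_mb(911, 300, 300))  # 311
```

A gap like this does not prove a leak on its own: every other running program also counts toward Task Manager's total.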
Nightbird Joined: 17 Sep 05 Posts: 70 Credit: 32,418 RAC: 0 |
And a problem: the WU stopped at 33.22% done. I took a screenshot with this WU "not working" (1di2) and another WU working (2tif). |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
Now I got a "client error". Does this mean that the data I produced was not received by you?
https://boinc.bakerlab.org/rosetta/result.php?resultid=18820316
Result ID: 18820316
Name: JUMP_ALLBARCODE03_1tul__468_770_0
Workunit: 15562277
Created: 1 May 2006 11:58:34 UTC
Sent: 1 May 2006 16:07:20 UTC
Received: 2 May 2006 9:46:12 UTC
Server state: Over
Outcome: Client error
Client state: Done
Exit status: -1073741819 (0xc0000005)
Report deadline: 15 May 2006 16:07:20 UTC
CPU time: 8928.046875
stderr out:
<core_client_version>5.2.13</core_client_version>
<message>
 - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 1732251
# random seed: 1732251
# cpu_run_time_pref: 14400
</stderr_txt>
Validate state: Invalid
Claimed credit: 31.1240952250415
Granted credit: 0
Application version: 5.07
This and no other is the root from which a Tyrant springs; when he first appears he is a protector. (Plato) |
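The negative exit status and the hex value in the report are the same 32-bit number: -1073741819 read as an unsigned value is 0xC0000005, which Windows defines as STATUS_ACCESS_VIOLATION (the process touched memory it was not allowed to). A small sketch of that conversion; the lookup table is illustrative and far from complete, and the function name is hypothetical:

```python
# A few well-known Windows NTSTATUS codes (illustrative, not exhaustive).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
}


def decode_exit_status(code: int) -> str:
    """Map a signed 32-bit exit status to its hex form and, if known, its name."""
    unsigned = code & 0xFFFFFFFF  # reinterpret the signed value as unsigned 32-bit
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(decode_exit_status(-1073741819))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```

An access violation only says the program read or wrote an address it should not have; by itself it does not tell a software bug apart from flaky hardware such as bad RAM.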
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
The CPU efficiency is a "guess" from Boincview and not necessarily true. If the WU is really stuck (which happens rarely), Rosetta will auto-terminate it after an hour and return the result. |
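The idea of a watchdog that gives up on a stuck work unit can be sketched as a background thread that expects regular progress "pings" and fires a callback when they stop. This is an assumption-laden illustration: the class and method names are hypothetical, nothing here reflects Rosetta's actual code, and only the roughly one-hour limit comes from the post above.

```python
import threading
import time


class Watchdog:
    """Calls on_timeout once if touch() is not called for timeout_s seconds.

    Hypothetical sketch, not Rosetta's code. Because the monitor is a thread
    inside the watched process, it stops whenever that process stops.
    """

    def __init__(self, timeout_s, on_timeout, poll_s=0.05):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout
        self.poll_s = poll_s
        self._last_progress = time.monotonic()
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def touch(self):
        """Record that the worker made progress."""
        with self._lock:
            self._last_progress = time.monotonic()

    def stop(self):
        """Shut the monitor down (call only after start())."""
        self._stopped.set()
        self._thread.join()

    def _run(self):
        # Wake up every poll_s seconds until stop() is called.
        while not self._stopped.wait(self.poll_s):
            with self._lock:
                idle = time.monotonic() - self._last_progress
            if idle > self.timeout_s:
                self.on_timeout()  # e.g. abandon the stuck work and report
                return


# Demo with a short timeout; the post describes roughly one hour (3600 s).
fired = threading.Event()
dog = Watchdog(timeout_s=0.2, on_timeout=fired.set)
dog.start()
print(fired.wait(timeout=3))  # no touch() calls, so the watchdog fires
dog.stop()
```

Running the monitor in-process keeps the sketch simple, but it also means a suspended process takes its watchdog down with it.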
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
This just happened: 7 units lost to computation errors in less than 8 minutes. This is a verbatim copy of the message log recorded in the BOINC Manager. I am confused and searching for the reason why this keeps happening. To the rhythm of "And the beat goes on"...
5/2/2006 7:28:48 AM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE04_1tul__468_770_0 ( - exit code -1073741819 (0xc0000005))
5/2/2006 7:28:48 AM||request_reschedule_cpus: process exited
5/2/2006 7:28:48 AM|rosetta@home|Computation for result JUMP_ALLBARCODE04_1tul__468_770_0 finished
5/2/2006 7:28:48 AM|rosetta@home|Starting result HBLR_1.0_1dtj_ROT_TRIALS_TRIE_462_13053_0 using rosetta version 507
5/2/2006 7:29:37 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1dtj_ROT_TRIALS_TRIE_462_13053_0 ( - exit code -1073741819 (0xc0000005))
5/2/2006 7:29:37 AM||request_reschedule_cpus: process exited
5/2/2006 7:29:37 AM|rosetta@home|Computation for result HBLR_1.0_1dtj_ROT_TRIALS_TRIE_462_13053_0 finished
5/2/2006 7:29:37 AM|rosetta@home|Starting result HBLR_1.0_1dtj_ROT_TRIALS_TRIE_461_13052_0 using rosetta version 507
5/2/2006 7:30:10 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1dtj_ROT_TRIALS_TRIE_461_13052_0 ( - exit code -1073741819 (0xc0000005))
5/2/2006 7:30:10 AM||request_reschedule_cpus: process exited
5/2/2006 7:30:10 AM|rosetta@home|Computation for result HBLR_1.0_1dtj_ROT_TRIALS_TRIE_461_13052_0 finished
5/2/2006 7:30:10 AM|rosetta@home|Starting result JUMP_ALLBARCODE07_1tul__468_2204_0 using rosetta version 507
5/2/2006 7:31:08 AM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE07_1tul__468_2204_0 ( - exit code -1073741819 (0xc0000005))
5/2/2006 7:31:08 AM||request_reschedule_cpus: process exited
5/2/2006 7:31:08 AM|rosetta@home|Computation for result JUMP_ALLBARCODE07_1tul__468_2204_0 finished
5/2/2006 7:31:09 AM|rosetta@home|Starting result HBLR_1.0_1n0u_ROT_TRIALS_TRIE_462_14487_0 using rosetta version 507
5/2/2006 7:31:14 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1n0u_ROT_TRIALS_TRIE_462_14487_0 ( - exit code -1073741819 (0xc0000005))
5/2/2006 7:31:14 AM||request_reschedule_cpus: process exited
5/2/2006 7:31:14 AM|rosetta@home|Computation for result HBLR_1.0_1n0u_ROT_TRIALS_TRIE_462_14487_0 finished
5/2/2006 7:31:14 AM|rosetta@home|Starting result HBLR_1.0_1mky_ROT_TRIALS_TRIE_462_14706_0 using rosetta version 507
5/2/2006 7:31:44 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_ROT_TRIALS_TRIE_462_14706_0 ( - exit code -1073741819 (0xc0000005))
5/2/2006 7:31:44 AM||request_reschedule_cpus: process exited
5/2/2006 7:31:44 AM|rosetta@home|Computation for result HBLR_1.0_1mky_ROT_TRIALS_TRIE_462_14706_0 finished
5/2/2006 7:31:44 AM|rosetta@home|Starting result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_15256_0 using rosetta version 507
5/2/2006 7:31:47 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_15256_0 ( - exit code -1073741819 (0xc0000005))
5/2/2006 7:31:47 AM||request_reschedule_cpus: process exited
5/2/2006 7:31:47 AM|rosetta@home|Computation for result HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_15256_0 finished |
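To count failures in a pasted BOINC Manager log and see how close together they came, a short script can scan for the "Unrecoverable error" lines. This is a sketch that assumes the `date time AM|project|message` layout shown in this post (one entry per line); other BOINC versions may format the log differently, and `summarize_errors` is a hypothetical name.

```python
import re
from datetime import datetime

# Matches lines such as:
# 5/2/2006 7:28:48 AM|rosetta@home|Unrecoverable error for result NAME ( - exit code ...)
ERROR_LINE = re.compile(
    r"^(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\|"
    r"(?P<project>[^|]*)\|Unrecoverable error for result (?P<result>\S+)"
)


def summarize_errors(lines):
    """Return (error_count, first_time, last_time, result_names) for failed results."""
    hits = []
    for line in lines:
        m = ERROR_LINE.match(line.strip())
        if m:
            when = datetime.strptime(m["stamp"], "%m/%d/%Y %I:%M:%S %p")
            hits.append((when, m["result"]))
    if not hits:
        return 0, None, None, []
    times = [t for t, _ in hits]
    return len(hits), min(times), max(times), [r for _, r in hits]


sample = [
    "5/2/2006 7:28:48 AM|rosetta@home|Unrecoverable error for result "
    "JUMP_ALLBARCODE04_1tul__468_770_0 ( - exit code -1073741819 (0xc0000005))",
    "5/2/2006 7:28:48 AM||request_reschedule_cpus: process exited",
    "5/2/2006 7:31:47 AM|rosetta@home|Unrecoverable error for result "
    "HBLR_1.0_1di2_ROT_TRIALS_TRIE_461_15256_0 ( - exit code -1073741819 (0xc0000005))",
]
count, first, last, names = summarize_errors(sample)
print(count, last - first)  # 2 0:02:59
```

Only the error lines are counted, so the "process exited" and "Starting result" entries in between are ignored.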
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Jose, download and run memtest86+ for several loops (a few hours). See if it finds a faulty memory module. Open your case and look for dust bunnies which could cause overheating. You might also run Speedfan and see what temps your system is at. tony |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
Tony and the rest: it is clear now that everything is futile. Another WU JUST FAILED. I am going to download one more unit. Should that unit fail, I will detach. I have reached the end of my tolerance for frustration. |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
I have only one machine. And, dear Lord, my machine has only one processor. If more than one machine appears, it is because of quirks in the BOINC system when one has had to reattach to solve problems, and the absence of merge functions that would give the real picture. As to the 4 processors... I really don't know what to say, but I doubt that something as obvious as a processor could have been hidden when I inspected my motherboard. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Jose, running a BOINC project (any) can be a "fortune teller" for your system. Since it runs the CPU at high levels for long periods, it will "test" your system. When errors appear in BOINC projects, it can be a signal that it's time to maintain/service your machine. Let's face it, if there is an issue, you'll have to face/find it eventually anyway. Stopping a project will only delay the inevitable. Now, I don't know if your puter is having a problem or not. What I do see is that you're reporting an error that others are not. Given that it seems to be just you, it is reasonable to think that it might be your system that needs attention. By running those tests, and maybe GIMPS-Prime95, you'll be able to either find the issue or rule it out as a cause. Calm down, my friend, no need to get an ulcer from this stuff. LOL tony |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
I am calm. Right now, detaching and removing BOINC is becoming the most rational of the options. I will have my machine checked. But I need the frustration this is causing like I need a callus on my butt. I am sad. I thought I could do something useful, but alas, all I have been able to do is waste my time and yours. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Well, Jose, you must do what you must do. Remember, BOINC takes advantage of otherwise "unused" cycles. So in effect, you're choosing to waste those cycles rather than allowing them to come to some benefit. You need to do what's best for you. Good luck in whatever you choose. tony- |
Whl. Joined: 29 Dec 05 Posts: 203 Credit: 275,802 RAC: 0 |
Pardon the intrusion, guys, but doesn't the 4 just mean it is a Pentium 4? |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
Tony, the cycles are being wasted: most of the errors produce only waste. And yes, I will do what I must do. Take care. Jose |
Nightbird Joined: 17 Sep 05 Posts: 70 Credit: 32,418 RAC: 0 |
The problem is that WU 1di2 has been in this state for 2 days now. Perhaps I should abort it? |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Jose, how many puters do you really have? I see six IDENTICAL puters in your account, and the benchmarks are all over the map:
1) Measured floating point speed 2009.88 million ops/sec; measured integer speed 4014.11 million ops/sec
2) Measured floating point speed 2012.98 million ops/sec; measured integer speed 4045.58 million ops/sec
3) Measured floating point speed 545.31 million ops/sec; measured integer speed 3966.71 million ops/sec
4) Measured floating point speed 1276.07 million ops/sec; measured integer speed 5114.47 million ops/sec
5) Measured floating point speed 1986.21 million ops/sec; measured integer speed 3371.27 million ops/sec
6) Measured floating point speed 1154.1 million ops/sec; measured integer speed 235.34 million ops/sec
If you have just one machine that is continuously being attached/detached, then you have an issue here. Note: none of this conversation belongs in this thread; maybe a mod could move it. |
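To put a number on "all over the map", the six benchmark pairs can be compared directly. A minimal sketch using only the figures quoted in the post (the helper name `spread` is hypothetical):

```python
# (floating point, integer) benchmark pairs quoted above, in million ops/sec.
benchmarks = [
    (2009.88, 4014.11),
    (2012.98, 4045.58),
    (545.31, 3966.71),
    (1276.07, 5114.47),
    (1986.21, 3371.27),
    (1154.10, 235.34),
]


def spread(values):
    """Return (min, max, max/min ratio) for a list of positive numbers."""
    lo, hi = min(values), max(values)
    return lo, hi, hi / lo


fp_speeds = [fp for fp, _ in benchmarks]
int_speeds = [i for _, i in benchmarks]
print("floating point: min %.2f, max %.2f, ratio %.1fx" % spread(fp_speeds))
print("integer:        min %.2f, max %.2f, ratio %.1fx" % spread(int_speeds))
```

The fastest floating-point figure is more than three times the slowest, and the integer figures differ by more than a factor of twenty.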
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
Are you saying that the CPU time is not increasing, even though it's "running"? Is the idle process getting all the CPU time when this WU is "running"? I saw something like that months ago (but not recently). It happened when BOINC stopped the WU and ran the benchmark. For some reason the Rosetta client didn't restart, even though BOINC said it was "running". I was able to see this by looking through the "messages". Restarting BOINC got the WU going again. This would be serious, because if the Rosetta client isn't actually running, then the watchdog won't be running either. Is there anything in the messages around the time that this WU stopped? |
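The check asked about here (is CPU time actually increasing while the task claims to be "running"?) can be made mechanical if you note the task's CPU time at intervals, for example by reading it off BOINC Manager yourself. A minimal sketch; how the samples are collected is left open, and the function name and thresholds are hypothetical:

```python
def is_stalled(samples, window_s, min_cpu_gain_s=1.0):
    """Decide whether a task's CPU time has stopped advancing.

    samples: list of (wall_clock_s, cpu_time_s) pairs, oldest first.
    Returns True if, over at least the last window_s seconds of wall-clock
    time, CPU time grew by less than min_cpu_gain_s.
    """
    if len(samples) < 2:
        return False
    latest_wall, latest_cpu = samples[-1]
    # Find the newest sample that is at least window_s older than the latest.
    baseline_cpu = None
    for wall, cpu in reversed(samples):
        if latest_wall - wall >= window_s:
            baseline_cpu = cpu
            break
    if baseline_cpu is None:
        return False  # not enough history yet
    return latest_cpu - baseline_cpu < min_cpu_gain_s


# CPU time frozen while wall-clock time advances: looks stuck.
print(is_stalled([(0, 4321.0), (600, 4321.0), (1200, 4321.0)], window_s=900))  # True
# CPU time advancing normally.
print(is_stalled([(0, 100.0), (600, 690.0), (1200, 1280.0)], window_s=900))  # False
```

Because this observes the task from outside, it keeps working even when the client process itself is not running, which is exactly the case an in-process watchdog cannot see.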
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
First of all I would exit BOINC and restart and see if the WU "revives". If that isn't the case I'd abort it. |
©2025 University of Washington
https://www.bakerlab.org