Problems with Rosetta version 5.81

M.L.

Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 48640 - Posted: 14 Nov 2007, 1:11:05 UTC

14-Nov-2007 00:58:52 [rosetta@home] Computation for task MFR_SYMM_FOLD_AND_DOCK_RELAX_GB1_mutant_2286_15962_0 finished
14-Nov-2007 00:58:52 [rosetta@home] Output file MFR_SYMM_FOLD_AND_DOCK_RELAX_GB1_mutant_2286_15962_0_0 for task MFR_SYMM_FOLD_AND_DOCK_RELAX_GB1_mutant_2286_15962_0 absent

And now another!!

rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 48652 - Posted: 14 Nov 2007, 13:04:48 UTC

still getting errors!!!!!!!!

rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 48658 - Posted: 14 Nov 2007, 18:25:29 UTC - in response to Message 48652.  

still getting errors!!!!!!!!

Can someone tell me if the problem is my computer or on Rosetta's end?

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 48661 - Posted: 14 Nov 2007, 18:31:31 UTC - in response to Message 48658.  

still getting errors!!!!!!!!

Can someone tell me if the problem is my computer or on Rosetta's end?

Try removing the project from your BOINC, remove all Rosetta files from your hard drive, then reinstall and add the project to BOINC again. I had it happen once too that all my WUs failed, and this helped for me.

rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 48665 - Posted: 14 Nov 2007, 19:50:09 UTC - in response to Message 48661.  

still getting errors!!!!!!!!

Can someone tell me if the problem is my computer or on Rosetta's end?

Try removing the project from your BOINC, remove all Rosetta files from your hard drive, then reinstall and add the project to BOINC again. I had it happen once too that all my WUs failed, and this helped for me.

OK, I have a work unit that is half done. If that errors out, I'll try that.

Dingo
Joined: 17 Sep 05
Posts: 14
Credit: 4,569,964
RAC: 0
Message 48666 - Posted: 14 Nov 2007, 19:55:21 UTC
Last modified: 14 Nov 2007, 19:57:17 UTC

I am getting multiple WUs with errors on this PC, but not all WUs, just a few. But it is wasted time if I do not get credit? I don't seem to be having any problems with my other PCs.


655532 SWAN 313.95 4,418.22 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [x86 Family 6 Model 15 Stepping 11] Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)


Examples of the WUs:

119655219
119534960
119524831


David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 48677 - Posted: 14 Nov 2007, 22:00:08 UTC - in response to Message 48666.  

{...}
Examples of the WUs:

119655219
119534960
119524831


Every one of the above WUs is a MFR_SYMM_FOLD_ etc.

That series of workunits seems to have an extraordinarily high failure rate.
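
One way to check a hunch like this is to group failed task names by batch and count them. The sketch below is only an illustration: the sample names are task names quoted elsewhere in this thread, and the rule for finding the batch (strip the trailing numeric fields) is my own assumption about the naming scheme, not something the project has documented here. `batch_of` and `failures_by_batch` are hypothetical helper names.

```python
import re
from collections import Counter

# Task names quoted in this thread, all reported as failed.
FAILED_TASKS = [
    "MFR_SYMM_FOLD_AND_DOCK_RELAX_GB1_mutant_2286_15962_0",
    "MFR_SYMM_FOLD_AND_DOCK_RELAX_GB1_mutant_2286_6276_1",
    "1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_3419_1",
]


def batch_of(task_name: str) -> str:
    """Drop the trailing run of _<digits> fields (ASSUMED to be per-task counters)."""
    return re.sub(r"(_\d+)+$", "", task_name)


def failures_by_batch(task_names):
    """Count how many failed tasks fall into each batch prefix."""
    return Counter(batch_of(name) for name in task_names)


if __name__ == "__main__":
    for batch, count in failures_by_batch(FAILED_TASKS).most_common():
        print(count, batch)
```

A count like this only shows which batch the failures cluster in. To call it a failure rate you would also need the number of tasks from that batch that succeeded.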

Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

mhhall
Joined: 28 Mar 06
Posts: 7
Credit: 10,188,899
RAC: 3
Message 48681 - Posted: 15 Nov 2007, 1:46:23 UTC - in response to Message 48578.  

1) I note that the "explain" item does not document what a "Compute Error" is....

2) Work unit 106606679
1n0u__TREEJUMP_ABRELAX_NOTOR-1n0u_-_BARCODE__2241_1083
Appears to have failed on two different machines.

Seems like 2nd time that this has happened to me recently... other job was WU 106970621:
2reb__TREEJUMP_ABRELAX_TOR_EQ_-5_PROB_.5_SAVE_ALL_OUT-2reb_-_BARCODE__2243_7638_0

This looks like a programming issue on both counts, in the same routine (ERROR:: Exit from: .pose.cc line: 769).
Or there is an issue with software running processes on my machine.


Seems that my machine has generated another "Compute Error" on work unit id
108644649 --- This time without the previously noted error. Since my previous
post, I have upgraded the BOINC Manager on my machine to 5.10.28. Was worried that my problem above might be due to out of date BOINC Manager.



Dingo
Joined: 17 Sep 05
Posts: 14
Credit: 4,569,964
RAC: 0
Message 48684 - Posted: 15 Nov 2007, 4:57:10 UTC

Yes, you are correct. I have the same error on at least one other PC, on a task with the same kind of name, MFR_SYMM_FOLD_ etc.

https://boinc.bakerlab.org/rosetta/result.php?resultid=120355804




Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 48698 - Posted: 15 Nov 2007, 15:20:59 UTC

Does anyone ever get this kind of result? Six hours and no decoys...and no credit. Not the first time this has happened to me. I get, er..um, annoyed, and suspend the project until I calm down a few days later.

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 48699 - Posted: 15 Nov 2007, 16:02:07 UTC - in response to Message 48698.  

Does anyone ever get this kind of result? Six hours and no decoys...and no credit. Not the first time this has happened to me. I get, er..um, annoyed, and suspend the project until I calm down a few days later.

That happens when the computation returned is deemed "invalid". As to why it's invalid, that is really up to you to sort out. I'd ask myself, "Does this happen often?" If it's one in a thousand, I'd ignore it. If it's one in 100, I'd start paying attention and prepare to look at maintenance. If it's one per results page, I'd start looking for the problem. Generally, I'd check (in order):
1) The message boards, to see if it's a widely reported issue
2) For dust buildup, and that the fans function
3) If I overclocked, I'd turn it down a smidge
4) That the UPS is working and not dropping power
5) Memory, with memtest86+
6) The hard drive, with chkdsk or another program (perhaps from the manufacturer)
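
The rule of thumb above can be written down as a tiny function. This is a rough sketch, not anything the project provides: `triage` is a hypothetical name, the page size of 20 results is an assumption, and treating everything below one in 100 as "ignore" is my reading of the gap between the stated thresholds.

```python
def triage(failed: int, total: int, page_size: int = 20) -> str:
    """Map a failure count to one of the three reactions described above.

    page_size is an ASSUMED number of results per results page.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    rate = failed / total
    if rate >= 1 / page_size:
        # Roughly one failure per results page or worse.
        return "look for the problem"
    if rate >= 1 / 100:
        # About one in 100: watch it and plan maintenance.
        return "pay attention"
    # Rarer than one in 100 (ASSUMED to be treated like one in a thousand).
    return "ignore"


if __name__ == "__main__":
    print(triage(1, 1000))
    print(triage(1, 100))
    print(triage(1, 20))
```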



Greg_BE
Joined: 30 May 06
Posts: 5661
Credit: 5,697,389
RAC: 1,919
Message 48700 - Posted: 15 Nov 2007, 16:19:48 UTC
Last modified: 15 Nov 2007, 16:21:31 UTC

treejump error number 100 or something like that.
https://boinc.bakerlab.org/result.php?resultid=118040131
<core_client_version>5.10.20</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 1488900
ERROR:: Exit from: .pose.cc line: 769

</stderr_txt>
]]>

this gives me compute and client errors
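
If you want to compare reports like this across several results, the two useful facts (the exit code and the "Exit from" location) can be pulled out with a couple of regular expressions. This is a minimal sketch based only on the text pasted above. The patterns are guesses from this one sample, and `summarize` is a hypothetical helper, not part of BOINC or Rosetta.

```python
import re

# Sample copied from the report pasted above.
SAMPLE = """<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 1488900
ERROR:: Exit from: .pose.cc line: 769

</stderr_txt>"""

# Patterns inferred from this single sample (ASSUMPTION: other reports look alike).
EXIT_CODE = re.compile(r"exit code (-?\d+)")
EXIT_FROM = re.compile(r"ERROR:: Exit from: (\S+) line: (\d+)")


def summarize(report: str) -> dict:
    """Return the exit code and error location, or None where not found."""
    code = EXIT_CODE.search(report)
    where = EXIT_FROM.search(report)
    return {
        "exit_code": int(code.group(1)) if code else None,
        "source_file": where.group(1) if where else None,
        "line": int(where.group(2)) if where else None,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Two results that report the same source file and line, as with the `.pose.cc line: 769` reports in this thread, are more likely to share a cause in the application than in one person's hardware.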

Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 48707 - Posted: 15 Nov 2007, 19:13:35 UTC - in response to Message 48699.  

Does anyone ever get this kind of result? Six hours and no decoys...and no credit. Not the first time this has happened to me. I get, er..um, annoyed, and suspend the project until I calm down a few days later.

That happens when the computation returned is deemed "invalid". As to why it's invalid, that is really up to you to sort out. I'd ask myself, "Does this happen often?" If it's one in a thousand, I'd ignore it. If it's one in 100, I'd start paying attention and prepare to look at maintenance. If it's one per results page, I'd start looking for the problem. Generally, I'd check (in order):
1) The message boards, to see if it's a widely reported issue
2) For dust buildup, and that the fans function
3) If I overclocked, I'd turn it down a smidge
4) That the UPS is working and not dropping power
5) Memory, with memtest86+
6) The hard drive, with chkdsk or another program (perhaps from the manufacturer)

Well, what about "exit with no finished file" or "no heartbeat from client for 31 seconds"? I think I saw similar verbiage in the message log. Tasks restarted, but I question whether this function works properly after encountering these conditions.

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 48708 - Posted: 15 Nov 2007, 19:24:35 UTC - in response to Message 48707.  
Last modified: 15 Nov 2007, 19:27:06 UTC

Well, what about "exit with no finished file" or "no heartbeat from client for 31 seconds"? I think I saw similar verbiage in the message log. Tasks restarted, but I question whether this function works properly after encountering these conditions.

That one's even harder to figure out, and I don't know how. You get that message when the application can't talk to BOINC. They communicate via "shared memory", so something is using or blocking shared memory. I don't know how to check that, but I hope someone does. The general recommendation is to ignore it. Dr. Anderson even changed the code to "just not display the message" under some conditions. Anyway, if it becomes severely frequent they recommend a "project reset", but I don't know how or why that would fix it.
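
For anyone wondering what a "no heartbeat" check looks like in general, here is a generic watchdog sketch. It is not BOINC's code and does not use shared memory. It only shows the idea of one side recording when it last heard from the other and giving up after a timeout. The 30 second default is an assumption, guessed from the "31 seconds" in the quoted message, and `HeartbeatWatchdog` is a made-up name.

```python
import time


class HeartbeatWatchdog:
    """Track the time since the last heartbeat and report when it is overdue."""

    def __init__(self, timeout_s: float = 30.0, clock=time.monotonic):
        # timeout_s is an ASSUMED value, not taken from BOINC.
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_beat = clock()

    def beat(self) -> None:
        """Record that a heartbeat just arrived."""
        self._last_beat = self._clock()

    def seconds_since_beat(self) -> float:
        return self._clock() - self._last_beat

    def expired(self) -> bool:
        """True once more than timeout_s has passed without a heartbeat."""
        return self.seconds_since_beat() > self._timeout_s


if __name__ == "__main__":
    # A fake clock lets us show the behaviour without waiting.
    now = [0.0]
    dog = HeartbeatWatchdog(timeout_s=30.0, clock=lambda: now[0])
    now[0] = 31.0
    if dog.expired():
        print(f"no heartbeat for {dog.seconds_since_beat():.0f} seconds")
```

In a scheme like this, anything that stops the heartbeats from arriving in time would trip the check even though the computation itself is healthy.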

Trey
Joined: 3 Oct 06
Posts: 11
Credit: 110,142
RAC: 0
Message 48717 - Posted: 16 Nov 2007, 11:47:47 UTC

I have received validate errors on a couple of WUs -- apparently they have also produced errors for someone else as well:



Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 48722 - Posted: 16 Nov 2007, 17:47:03 UTC

Result ID: 120030734
Name: 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_3419_1
Workunit: 106820764
Jmarks

Trey
Joined: 3 Oct 06
Posts: 11
Credit: 110,142
RAC: 0
Message 48730 - Posted: 16 Nov 2007, 22:39:09 UTC

I have yet another MFR_SYMM_FOLD_AND_DOCK_RELAX error:

MFR_SYMM_FOLD_AND_DOCK_RELAX_GB1_mutant_2286_6276_1

Why are we still seeing these? These WUs have been reported to have high failure rates for several days now.

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 48731 - Posted: 16 Nov 2007, 23:34:15 UTC - in response to Message 48730.  
Last modified: 16 Nov 2007, 23:35:14 UTC

{...}
Why are we still seeing these? These WUs have been reported to have high failure rates for several days now.


The alert about MFR_SYMM_FOLD_AND_DOCK etc. workunits was posted on 11 November.

The last time a moderator/administrator posted in this thread was 10 November...
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 48747 - Posted: 17 Nov 2007, 14:39:50 UTC

2 more
120030734 106820764
120014410 107277392
Jmarks

googloo
Joined: 15 Sep 06
Posts: 133
Credit: 21,676,514
RAC: 4,938
Message 48750 - Posted: 17 Nov 2007, 18:10:06 UTC

Here's one that had a validate error.