Problems with Rosetta version 5.80

Message boards : Number crunching : Problems with Rosetta version 5.80

P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 48365 - Posted: 5 Nov 2007, 3:33:15 UTC

This one failed after 34 seconds, the first failure in a while.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=107003644

11/5/2007 2:12:36 PM|rosetta@home|Reason: Unrecoverable error for result 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0 (Incorrect function. (0x1) - exit code 1 (0x1))

11/5/2007 2:12:36 PM|rosetta@home|Computation for task 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0 finished

11/5/2007 2:12:36 PM|rosetta@home|Output file 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0_0 for task 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0 absent
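For anyone collecting these failures across several machines, here is a minimal Python sketch that pulls the failed result names out of BOINC client log lines like the ones above. The `timestamp|project|message` layout and the "Reason: Unrecoverable error for result ..." wording are assumed from these excerpts only. `parse_failures` and `target_code` are hypothetical helper names. Treating the part of the result name before the first `__` as the target (e.g. `1n0u`, `2reb`) is also an assumption.

```python
import re
from collections import Counter

# Layout assumed from the log excerpts in this thread:
#   <timestamp>|<project>|Reason: Unrecoverable error for result <name> (<message>)
FAILURE_RE = re.compile(
    r"^(?P<when>[^|]+)\|(?P<project>[^|]+)\|"
    r"Reason: Unrecoverable error for result (?P<result>\S+) \((?P<message>.*)\)$"
)

def parse_failures(lines):
    """Return (timestamp, result name, error message) for each unrecoverable-error line."""
    failures = []
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            failures.append((match["when"], match["result"], match["message"]))
    return failures

def target_code(result_name):
    """Hypothetical helper: the part of the result name before the first '__'."""
    return result_name.split("__", 1)[0]

SAMPLE_LOG = [
    "11/5/2007 2:12:36 PM|rosetta@home|Reason: Unrecoverable error for result "
    "1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0 "
    "(Incorrect function. (0x1) - exit code 1 (0x1))",
    "11/5/2007 2:12:36 PM|rosetta@home|Computation for task "
    "1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0 finished",
    "11/5/2007 7:25:53 PM|rosetta@home|Reason: Unrecoverable error for result "
    "2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13080_0 "
    "(Incorrect function. (0x1) - exit code 1 (0x1))",
]

if __name__ == "__main__":
    failures = parse_failures(SAMPLE_LOG)
    # Count failures per target to see whether one job type dominates.
    print(Counter(target_code(result) for _, result, _ in failures))
```

Only the "Reason:" lines are counted, so the "Computation ... finished" and "Output file ... absent" lines that follow each failure do not inflate the totals.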

Pete.

rsubler

Joined: 24 Jun 07
Posts: 8
Credit: 172,618
RAC: 0
Message 48378 - Posted: 5 Nov 2007, 17:22:05 UTC

After 19 hours of crunching on work unit 106367190 this error condition has suddenly appeared.

I tried limiting BOINC to this work unit, rebooting my computer, and running only BOINC, all to no avail. The error condition persists.

The computer is an AMD X2 3800+. Both cores are available to BOINC, as are 90% of the 1 GB of physical memory and 75% of a 3 GB swap file.

Is there anything that I can do besides purging the WU?

Ron
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 48380 - Posted: 5 Nov 2007, 17:35:11 UTC

rsubler, if you open Windows Task Manager and go to the Processes tab, how much memory does it show for that process? (It will have Rosetta in the name.)
Rosetta Moderator: Mod.Sense
rsubler

Joined: 24 Jun 07
Posts: 8
Credit: 172,618
RAC: 0
Message 48381 - Posted: 5 Nov 2007, 17:43:33 UTC

Mod.Sense

When I activate the next Rosetta WU (also 5.80), Task Manager shows 130,808k.

When I try to activate the problem WU, Task Manager does not have a Rosetta entry.

This is with all other BOINC WUs suspended.

Ron
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 48383 - Posted: 5 Nov 2007, 18:04:40 UTC

rsubler, you say "after 19 hrs of crunching"... has it recorded that much CPU time? Or has it been "waiting for memory" that long? If it has recorded that much CPU time, then it should mean it has been in a "running" status for that long. Has the amount of time spent increased in the last several hours?

Looks like your runtime preference must be 24hrs. What is shown for the % completed on it now?

Have you exited and restarted BOINC since you noticed this one?
Rosetta Moderator: Mod.Sense
rsubler

Joined: 24 Jun 07
Posts: 8
Credit: 172,618
RAC: 0
Message 48384 - Posted: 5 Nov 2007, 18:11:04 UTC
Last modified: 5 Nov 2007, 18:25:34 UTC

Mod.Sense

1. That is the amount of CPU time shown by BOINC. It has not changed for the last hour or two, since I noticed the problem.

2. Yes, I have been running 24 hour Rosetta WUs for some months. The problem WU is now showing 79.646% complete.

3. Maybe. I restricted BOINC to the problem WU by suspending all the others and then rebooted my system, but I did not explicitly exit BOINC. I will try that next.
I just exited BOINC and restarted it. The problem persists.

Thanks,
Ron
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 48395 - Posted: 5 Nov 2007, 20:35:08 UTC

I had another task of the same type fail last night, this time on a different computer.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=107168558

11/5/2007 7:25:53 PM|rosetta@home|Reason: Unrecoverable error for result 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13080_0 (Incorrect function. (0x1) - exit code 1 (0x1))

11/5/2007 7:25:53 PM|rosetta@home|Computation for task 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13080_0 finished

11/5/2007 7:25:53 PM|rosetta@home|Output file 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13080_0_0 for task 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13080_0 absent

Is someone looking at this problem?

Pete.


rsubler

Joined: 24 Jun 07
Posts: 8
Credit: 172,618
RAC: 0
Message 48403 - Posted: 5 Nov 2007, 23:55:44 UTC

To Mod.Sense

Re: Waiting for memory, rsubler.

The problem has cured itself and the WU is now running.

I wish anyone good luck duplicating the fault and the cure. The WU resumed running while I was playing an ancient game in DOSBox. I thought DOSBox was a resource hog, but that is the only variable I am aware of.

Cheers,
Ron
Conan

Joined: 11 Oct 05
Posts: 150
Credit: 3,824,453
RAC: 965
Message 48440 - Posted: 7 Nov 2007, 12:08:33 UTC
Last modified: 7 Nov 2007, 12:09:48 UTC

I had one of these errors on the 6th and another two on the 7th.
Two were on Windows machines and one on a Linux machine, all with the same error code on the same '1n0u'-type WU.

This WU
This WU
This WU

<core_client_version>5.8.15</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1408543
ERROR:: Exit from: .pose.cc line: 769

Also on core client version 5.10.21
Luuklag

Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 48470 - Posted: 8 Nov 2007, 19:09:10 UTC

A failed one: https://boinc.bakerlab.org/rosetta/result.php?resultid=117906669

ERROR:: Exit from: .pose.cc line: 769
M.L.

Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 48476 - Posted: 8 Nov 2007, 21:26:43 UTC

Result ID 118032106
Name 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13258_0
Workunit 107265106
Created 5 Nov 2007 8:09:11 UTC
Sent 5 Nov 2007 11:10:58 UTC
Received 8 Nov 2007 16:17:30 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 510574
Report deadline 15 Nov 2007 11:10:58 UTC
CPU time 5042.90625
stderr out <core_client_version>5.10.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1419133
ERROR:: Exit from: .pose.cc line: 769

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 20.725275001017
Granted credit 0
application version 5.80
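Several of the stderr blocks posted in this thread end with the same `ERROR:: Exit from: .pose.cc line: 769` line. A quick way to check whether a new failure is the same problem is to pull that exit location out of the stderr text. This is a minimal Python sketch, assuming stderr text formatted like the excerpts above. `summarize_stderr` is a hypothetical helper. Reading `cpu_run_time_pref` as seconds (21600 s = 6 h) is an assumption based on the value shown.

```python
import re

def summarize_stderr(text):
    """Extract the run-time preference, random seed and exit location from a stderr block."""
    pref = re.search(r"# cpu_run_time_pref: (\d+)", text)
    seed = re.search(r"# random seed: (\d+)", text)
    exit_at = re.search(r"ERROR:: Exit from: (\S+) line: (\d+)", text)
    return {
        # Assumed to be in seconds: 21600 / 3600 = 6 hours.
        "run_time_pref_hours": int(pref.group(1)) / 3600 if pref else None,
        "random_seed": int(seed.group(1)) if seed else None,
        "exit_location": (exit_at.group(1), int(exit_at.group(2))) if exit_at else None,
    }

SAMPLE_STDERR = """<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1419133
ERROR:: Exit from: .pose.cc line: 769

</stderr_txt>"""

if __name__ == "__main__":
    summary = summarize_stderr(SAMPLE_STDERR)
    print(summary["exit_location"])  # ('.pose.cc', 769)
    print(summary["run_time_pref_hours"])  # 6.0
```

Comparing `exit_location` across results shows whether a new failure matches the `.pose.cc` line 769 exits reported here. The random seed differs from run to run, so a shared exit location across different seeds points at the job type rather than a single unlucky trajectory.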

M.L.

Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 48477 - Posted: 8 Nov 2007, 21:28:10 UTC

To Luuklag.

SNAP.
M.L.

Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 48486 - Posted: 9 Nov 2007, 7:34:20 UTC

Result ID 118192540
Name 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-5_PROB_.5_SAVE_ALL_OUT-1n0u_-_BARCODE__2243_13643_1
Workunit 107277272
Created 5 Nov 2007 22:07:48 UTC
Sent 5 Nov 2007 22:08:22 UTC
Received 8 Nov 2007 23:31:01 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 510574
Report deadline 15 Nov 2007 22:08:22 UTC
CPU time 3377.8125
stderr out <core_client_version>5.10.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1488748
ERROR:: Exit from: .pose.cc line: 769

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 13.8820928833196
Granted credit 0
-- also failed as WU 118046654 on 5 Nov.
rochester new york

Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 48495 - Posted: 9 Nov 2007, 12:30:30 UTC

Could someone tell me why I got all these errors?



©2024 University of Washington
https://www.bakerlab.org