Problems with Rosetta version 5.98

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 54181 - Posted: 5 Jul 2008, 4:45:46 UTC

FRA_t449_CASP8_MANUAL_1_IGNORE_THE_RESTt449_1_ttxxxxT0449_1CHIM_0001_0001_0001_4126_1926_1

errors Too many error results

CPU time 36.42623
stderr out <core_client_version>5.10.13</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 2406548
ERROR:: Exit from: .loop_relax.cc line: 1745

</stderr_txt>
]]>

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 54183 - Posted: 5 Jul 2008, 4:53:29 UTC

FRA_t449_CASP8_MANUAL_1_IGNORE_THE_RESTt449_1_ttxxxxT0449_1CHIM_0001_0001_0001_4126_3868_1

errors Too many error results

stderr out <core_client_version>5.10.13</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 2404606
ERROR:: Exit from: .loop_relax.cc line: 1745

</stderr_txt>
]]>


Validate state Invalid

kb7rzf
Joined: 7 Oct 05
Posts: 16
Credit: 35,427
RAC: 0
Message 54185 - Posted: 5 Jul 2008, 8:48:39 UTC - in response to Message 54164.  

The 2nd result on this work unit also got the same error.

Well, had 1 compute error since I started crunching again here. The wu is FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_1226

The info from the STDERR OUT:

stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2143013
======================================================
DONE :: 1 starting structures 20322.7 cpu seconds
This process generated 7 decoys from 7 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_1226_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


Greg_BE
Joined: 30 May 06
Posts: 5659
Credit: 5,691,837
RAC: 1,806
Message 54186 - Posted: 5 Jul 2008, 10:03:47 UTC

rosetta@home|Task t443_FULL_h001__CASP8_LONGRANGE_JUMP_SAVE_ALL_OUT_BARCODE__4133_145188_0 exited with a DLL initialization error.
|rosetta@home|If this happens repeatedly you may need to reboot your computer.
|rosetta@home|Restarting task t443_FULL_h001__CASP8_LONGRANGE_JUMP_SAVE_ALL_OUT_BARCODE__4133_145188_0 using rosetta_beta version 598

It was at 100% after 5 hrs, and when I opened the graphics window it just went blank.
When I tried to close the window, it went into "not responding".
I finally got the window to close, and the task reset itself to 42%.

BrnmccO1
Joined: 26 Jun 07
Posts: 17
Credit: 578,825
RAC: 0
Message 54188 - Posted: 5 Jul 2008, 16:27:27 UTC

159639723

Compute error after a full run; it also failed on someone else's host. Output file missing.

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 2136320
======================================================
DONE :: 1 starting structures 10289.6 cpu seconds
This process generated 1 decoys from 1 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_2909_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>



AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 54189 - Posted: 5 Jul 2008, 16:27:42 UTC

This FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_78 WU seemed to crunch correctly, then it bombed out with:

<message><file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_78_1_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>

This WU did the same for the other cruncher as well.

BrnmccO1
Joined: 26 Jun 07
Posts: 17
Credit: 578,825
RAC: 0
Message 54190 - Posted: 5 Jul 2008, 16:32:26 UTC - in response to Message 54189.  

This FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_78 WU seemed to crunch correctly, then it bombed out with:

<message><file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_78_1_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>

This WU did the same for the other cruncher as well.


I got the same -161 Output file missing error from one of my t453's as well.
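
For anyone collecting these reports, here is a minimal Python sketch that pulls the file name and error code out of a pasted stderr block like the ones in this thread. The regular expressions and the function name find_xfer_errors are my own illustration and not part of BOINC. "Output file missing" is how -161 is described in this thread, not a lookup from BOINC's sources.

```python
import re

# Sample copied from the stderr output quoted in this thread.
SAMPLE = """<message><file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_78_1_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>"""

# Hypothetical patterns for illustration; the pasted logs are not always
# well-formed XML, so plain regexes are used instead of an XML parser.
BLOCK_RE = re.compile(r"<file_xfer_error>(.*?)</file_xfer_error>", re.DOTALL)
NAME_RE = re.compile(r"<file_name>(.*?)</file_name>")
CODE_RE = re.compile(r"<error_code>(-?\d+)</error_code>")


def find_xfer_errors(text):
    """Return a list of (file_name, error_code) pairs found in text."""
    results = []
    for block in BLOCK_RE.findall(text):
        name = NAME_RE.search(block)
        code = CODE_RE.search(block)
        if name and code:
            results.append((name.group(1), int(code.group(1))))
    return results


errors = find_xfer_errors(SAMPLE)
for file_name, code in errors:
    print(code, file_name)
```

Running it over several pasted reports makes it easy to see whether the same error code shows up across different results of one work unit.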

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 54199 - Posted: 6 Jul 2008, 1:38:10 UTC

Another 8 hrs wasted. Is this a BOINC, app, server or workunit problem? Someone! Anyone!

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=159608923

7/6/2008 11:19:02 AM|rosetta@home|Output file for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_823_1 absent

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2143416
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 29142 cpu seconds
This process generated 10 decoys from 10 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_823_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>

pete.
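
To put a number on how much finished work is lost when the output file goes missing, here is a small Python sketch that reads the "DONE" summary from a stderr block and reports CPU seconds per decoy. The patterns and the function name seconds_per_decoy are assumptions of mine, written against the log lines quoted in this thread only.

```python
import re

# Summary lines copied from the stderr output quoted in this thread.
SAMPLE = """DONE :: 1 starting structures 29142 cpu seconds
This process generated 10 decoys from 10 attempts"""

# Hypothetical patterns matching the two summary lines above.
DONE_RE = re.compile(r"DONE ::\s+\d+ starting structures\s+([\d.]+) cpu seconds")
DECOY_RE = re.compile(r"generated\s+(\d+) decoys from\s+(\d+) attempts")


def seconds_per_decoy(text):
    """Return CPU seconds per decoy, or None if the summary is absent."""
    done = DONE_RE.search(text)
    decoys = DECOY_RE.search(text)
    if not done or not decoys:
        return None
    count = int(decoys.group(1))
    if count == 0:
        return None  # avoid dividing by zero when no decoys were produced
    return float(done.group(1)) / count


rate = seconds_per_decoy(SAMPLE)
print(rate)
```

A log that stops before the "DONE" line, like some of the others posted here, simply yields None.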


AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 54200 - Posted: 6 Jul 2008, 3:09:29 UTC

Here are two more that crunched to completion, then bombed out with the -161 error for both me and the other cruncher:

FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_1501
FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_1279

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 54245 - Posted: 7 Jul 2008, 16:17:39 UTC

t451_M4_grishin_IGNORE_THE_REST_renumbered_4150_1393_0

Outcome Validate error

CPU time 10275.77
stderr out <core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
#
</stderr_txt>
]]>


Validate state Invalid
Claimed credit 42.7165467457025
Granted credit 0
application version 5.98

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 54250 - Posted: 7 Jul 2008, 18:17:11 UTC

The Server Status page currently says that the validator is not running.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 54264 - Posted: 7 Jul 2008, 23:51:47 UTC - in response to Message 54250.  

The Server Status page currently says that the validator is not running.


I've emailed the Project Team about this. Thanks for pointing it out.
Rosetta Moderator: Mod.Sense

Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 54267 - Posted: 8 Jul 2008, 0:49:33 UTC - in response to Message 54264.  

Thanks... we changed the validator code earlier today after testing on RALPH, but there's clearly still an issue! I've contacted DK to revert to the old code.

The Server Status page currently says that the validator is not running.


I've emailed the Project Team about this. Thanks for pointing it out.



TeAm Enterprise
Joined: 28 Sep 05
Posts: 18
Credit: 27,904,257
RAC: 4
Message 54268 - Posted: 8 Jul 2008, 2:07:13 UTC

You have bigger problems than the validator code.

Since 5.96 I have never had more problems with errors.

I have just aborted all the T484 WUs since these don't work on my machine. Are you folks getting any science, or just problems?

Jim
Crunch with friends - TeAm Anandtech

Alan Roberts
Joined: 7 Jun 06
Posts: 61
Credit: 6,901,926
RAC: 0
Message 54456 - Posted: 12 Jul 2008, 14:25:13 UTC

I've had so many problems with Mini (see this post) that I've had to resort to filtering it off of quite a few of my dual-core/dual-CPU machines.

This morning I walked into my home's listening room to find my recycled laptop, a low-power music server (that to date has happily consumed anything Rosetta sent its way), making excessive noise. Checking, I found this 5.98 WU stuck at 100% CPU, even though the machine's preferences were set for a max of 70% of CPU (BOINC 5.10.45, and the BOINC CPU setting has been honored in the past). Within BOINC, CPU time used and progress were not advancing; the job was sitting at 20-something percent progress.

Suspending the project did not suspend the job. Shutting down the BOINC service did. Ran a round of Windows updates and rebooted. The work unit restarted and ran with CPU throttling for about 10 minutes, then locked up at 100% again. This time I aborted the task ... I believe the first time across any of the machines on my team that I've had to abandon a 5.98 work unit.

The worst news for me is that the long (possibly better part of two days) non-cycling fan run seems to have put the fan into a permanent high-noise mode. I've got a spare fan assembly, but won't really enjoy the time to tear down and reassemble the unit this weekend.

I guess I'll reinstall and set up Threadmaster, since the BOINC/Rosetta combination seems to be trending towards less operational reliability.

I know everyone is busy with CASP, but I have to emphasize that this is important to me, and I assume to others who are trying to contribute with machines that are not dedicated crunchers. Most of the machines on my team are there because I committed to not loading the machine during business hours (time-of-day and when needed manual suspends) and not overheating the machine (CPU limits). If I can't reliably do this with minimal ongoing effort I'll end up having to pull machines off the project.

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 54467 - Posted: 13 Jul 2008, 3:58:27 UTC

errors Too many error results

FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_2851_0


CPU time 8831.965
stderr out <core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 2136378
======================================================
DONE :: 1 starting structures 8831.64 cpu seconds
This process generated 4 decoys from 4 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_2851_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


Validate state Invalid

Harwood
Joined: 15 Nov 05
Posts: 1
Credit: 1,789,800
RAC: 0
Message 54562 - Posted: 18 Jul 2008, 1:14:00 UTC

I am running on an AMD Athlon in Win Server 2k8 with rosetta_beta_5.98_windows_x86_64.exe and I noticed in the task manager that it is running in 32-bit compatibility mode. Could it be that a flag is thrown and this is a 64-bit app? Regardless, we are not getting the performance for the project. It's running, but it could be better.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 54566 - Posted: 18 Jul 2008, 13:05:57 UTC - in response to Message 54562.  

I am running on an AMD Athlon in Win Server 2k8 with rosetta_beta_5.98_windows_x86_64.exe and I noticed in the task manager that it is running in 32-bit compatibility mode. Could it be that a flag is thrown and this is a 64-bit app? Regardless, we are not getting the performance for the project. It's running, but it could be better.


At this point, there is no true 64bit application. The Project Team is aware of the performance implications of that fact.
Rosetta Moderator: Mod.Sense

Azurrio
Joined: 20 Feb 06
Posts: 8
Credit: 237,979
RAC: 0
Message 54591 - Posted: 21 Jul 2008, 11:12:06 UTC

Compute/validate error on this

Alberthuang
Joined: 5 Dec 05
Posts: 6
Credit: 171,257
RAC: 0
Message 54624 - Posted: 23 Jul 2008, 7:31:22 UTC

My computer's OS is Windows XP SP3, running BOINC manager version 5.10.45. It computed the workunit n004__BOINC_SYMMETRY_C4SYMM_FOLD_AND_DOCK_RELAX-n004_-t484__4207_923 with Rosetta beta version 5.98 and showed a compute error after a full run. At the same time, a Windows message reported a C++ runtime error, and the output file n004__BOINC_SYMMETRY_C4SYMM_FOLD_AND_DOCK_RELAX-n004_-t484__4207_923_0_0 for this task was missing. The task details follow:

Task ID 176331854
Name n004__BOINC_SYMMETRY_C4SYMM_FOLD_AND_DOCK_RELAX-n004_-t484__4207_923_0
Workunit 160936951
Created 9 Jul 2008 2:51:35 UTC
Sent 9 Jul 2008 2:52:15 UTC
Received 18 Jul 2008 10:17:47 UTC
Server state Over
Outcome Client error
Client state Done
Exit status 3 (0x3)
Computer ID 224205
Report deadline 19 Jul 2008 2:52:15 UTC
CPU time 14199.22
stderr out <core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 1315613

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 27.2519995630333
Granted credit 0
application version 5.98

Before this workunit crashed, the BOINC manager downloaded a workunit with Rosetta beta version 5.98. At the same time, two previous Rosetta@home files were deleted when the BOINC manager received a server request from Rosetta@home! I wonder whether this workunit's crash was connected to the deletion of those two files.

