Problems with Rosetta version 5.98

Message boards : Number crunching : Problems with Rosetta version 5.98

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 55917 - Posted: 21 Sep 2008, 2:50:55 UTC

Can someone from the lab have a look at this one? It has problems!


t040_1_NMRREF_1_t040_1_S_00001_0000482_0IGNORE_THE_REST_core_4463_863_2


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=174466679


stderr out <core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 3420912
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 20508 cpu seconds
This process generated 7 decoys from 7 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>


Validate state Workunit error - check skipped

pete.

ID: 55917 · Rating: 0
R.L. Casey
Joined: 7 Jun 06
Posts: 91
Credit: 2,727,582
RAC: 0
Message 55918 - Posted: 21 Sep 2008, 4:01:18 UTC - in response to Message 55917.  

Pete,
Looking at the workunit, it seems that the validator rejected your results because the maximum number of results, two, had already been received by the time your result was returned. It looks as if the project was in error to send the WU to you, since it had already sent it out twice on September 10, the second time because of an error returned by the first cruncher. Then this second cruncher took over ten days to return a result, and it just happened to arrive between the time the WU was sent to you on the 20th and the time you returned your result! Based upon the time the WU was issued to you, it appears that the project had assumed no result would be returned by the second cruncher after ten days had elapsed. If I were in charge of the world, I'd validate your WU and grant you credit, and purge the (slightly earlier) result based upon the fact that it was so late that it was assumed to be lost, but... :-)

Thanks for crunching!!
ID: 55918 · Rating: 0
Jack
Joined: 19 Feb 07
Posts: 11
Credit: 513,346
RAC: 48
Message 55965 - Posted: 23 Sep 2008, 1:02:21 UTC

Rosetta Beta 5.98 does not share CPU fairly with other BOINC projects. My preferences were set to the default 60 minutes and I tried changing it to 70 minutes just to force an update, but Rosetta keeps running long past the time it should pause and let another project have some time.

This task: 193947855 started running at 2:45 this afternoon, used over six hours of CPU time and was still running when I suspended it so other projects could get some time. On previous recent tasks, Rosetta would run about 3 hours before pausing.
ID: 55965 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 55966 - Posted: 23 Sep 2008, 2:20:05 UTC

The BOINC Manager is making the fairness decisions, not Rosetta. It is also trying to optimize the use of your machine's crunch time by not losing work that is still in progress, but has not been checkpointed yet.

Either Rosetta is owed time from when other projects were crunching, or perhaps it has a long running model and has not taken a checkpoint yet. BOINC keeps track of it all. If Rosetta does run long before checkpointing, BOINC will pay back the other project(s) the time they are owed.

When you tell BOINC to "switch" every xx minutes, you aren't actually instructing it to switch. You are simply telling it how often to consider making a switch.

The manager will enforce the resource shares you have configured between your projects. But look for this to be true over the course of 100 hours, not 100 minutes.
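To make the point about "consider switching" concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions and is not BOINC's actual scheduling code: the function name `simulate`, the per-project `checkpoint_gap` values, and the simple "owed time" rule are all hypothetical. The idea is that every `switch_every` minutes the client wants to re-evaluate, but it waits until the running task has just checkpointed before it actually hands the CPU to whichever project is owed the most time.

```python
def simulate(shares, checkpoint_gap, switch_every=60, total_minutes=6000):
    """Toy model of share-based switching (hypothetical, not BOINC's code).

    shares:         resource share per project, e.g. {"rosetta": 1, "einstein": 1}
    checkpoint_gap: assumed minutes between checkpoints for each project
    Returns the fraction of CPU time each project received.
    """
    total_share = sum(shares.values())
    ran = dict.fromkeys(shares, 0)   # minutes of CPU time given to each project
    current = next(iter(shares))     # start with the first project listed
    unsaved = 0                      # minutes since the running task checkpointed
    pending = False                  # True once a switch should be considered

    for minute in range(1, total_minutes + 1):
        ran[current] += 1
        unsaved += 1
        if unsaved >= checkpoint_gap[current]:
            unsaved = 0              # the running task just checkpointed
        if minute % switch_every == 0:
            pending = True           # time to *consider* a switch
        if pending and unsaved == 0:
            # Nothing would be lost now, so run whichever project is owed most.
            owed = {p: shares[p] / total_share * minute - ran[p] for p in shares}
            current = max(owed, key=owed.get)
            pending = False

    return {p: ran[p] / total_minutes for p in shares}


gaps = {"rosetta": 180, "einstein": 10}
equal = {"rosetta": 1, "einstein": 1}
print(simulate(equal, gaps, total_minutes=100))    # short view: one project hogs the CPU
print(simulate(equal, gaps, total_minutes=6000))   # long view: close to 50/50
```

With an assumed three-hour gap between Rosetta checkpoints, the model keeps Rosetta on the CPU for three hours at a stretch even though the switch interval is 60 minutes, yet over 100 hours the two projects end up close to their equal shares.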
Rosetta Moderator: Mod.Sense
ID: 55966 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 4877
Credit: 4,570,325
RAC: 1,904
Message 56007 - Posted: 24 Sep 2008, 17:47:28 UTC
Last modified: 24 Sep 2008, 17:49:30 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=193810495
t042_1_NMRREF_1_t042_1_S_00001_0006461IGNORE_THE_REST_010000_4471_6183_0
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2556542

This pig locked up my system; it then moved on to another task, which locked up and also locked up Einstein.


also t042_1_NMRREF_1_t042_1_S_00002_0009608IGNORE_THE_REST_050000_4471_6695_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=193877090

died shortly afterwards. My system was locked up, and I had to use Task Manager to abort BOINC Manager, then Rosetta and Einstein, and then reboot to get control of my system again.
ID: 56007 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 4877
Credit: 4,570,325
RAC: 1,904
Message 56010 - Posted: 24 Sep 2008, 20:16:20 UTC

I looked up the error message; it seems that my graphics driver may have caused a conflict. I updated it and will see what happens.
ID: 56010 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 56013 - Posted: 25 Sep 2008, 4:22:10 UTC

This errored after 4 hrs 44 min; it had done 2 models just before it finished.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=177652466

9/25/2008 1:58:06 PM|rosetta@home|Output file AA2A_11_modeling_1_AA2A_1_AA2A_2VTA_align_4556_2991_0_0 for task AA2A_11_modeling_1_AA2A_1_AA2A_2VTA_align_4556_2991_0 absent

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2669370
ERROR:: Exit from: .refold.cc line: 338

</stderr_txt>

pete.

ID: 56013 · Rating: 0
AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 56014 - Posted: 25 Sep 2008, 4:25:22 UTC

ID: 56014 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 4877
Credit: 4,570,325
RAC: 1,904
Message 56022 - Posted: 25 Sep 2008, 9:54:32 UTC

AA2A_6_modeling_1_AA2A_1_AA2A_2RH1_align_4492_13278_0

This completed OK but gave this error message or warning:

CPU time 20691.98
stderr out <core_client_version>6.2.19</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 3360223
# cpu_run_time_pref: 21600
# random seed: 3360223
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 21600
# random seed: 3360223
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 21600
# random seed: 3360223
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 20691 cpu seconds
This process generated 7 decoys from 7 attempts
0 starting pdbs were skipped
======================================================


ID: 56022 · Rating: 0
AdeB
Joined: 12 Dec 06
Posts: 45
Credit: 4,308,072
RAC: 164
Message 56039 - Posted: 26 Sep 2008, 18:19:31 UTC

This workunit is valid but stderr out is enormous:

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 43200
# random seed: 2792818
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
.
/// This line is repeated 516 times ///
.
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
======================================================
DONE :: 1 starting structures 43239.7 cpu seconds
This process generated 45 decoys from 45 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>
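
For anyone wondering why that one line can repeat hundreds of times, here is a small Python sketch of how a NaN behaves. It is not Rosetta's code; the helper `in_sin_cos_range` is a hypothetical stand-in for whatever range check prints the message, and what produced the NaN in this workunit is unknown. It only shows two general floating-point facts: a NaN fails every comparison, so it is never "inside" [-1, +1], and any arithmetic on a NaN gives another NaN, so later checks on derived values keep failing too.

```python
import math


def in_sin_cos_range(x):
    """Hypothetical range check: True only for a real number in [-1, +1]."""
    return -1.0 <= x <= 1.0


nan = float("inf") - float("inf")   # one common way a NaN can arise
derived = nan * 0.5 + 1.0           # arithmetic on a NaN stays NaN

print(math.isnan(nan), math.isnan(derived))
print(in_sin_cos_range(0.5))        # an ordinary value passes the check
print(in_sin_cos_range(nan))        # NaN fails: both comparisons are false
print(in_sin_cos_range(derived))    # so does everything computed from it
```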

ID: 56039 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 4877
Credit: 4,570,325
RAC: 1,904
Message 56064 - Posted: 27 Sep 2008, 20:32:23 UTC
Last modified: 27 Sep 2008, 20:33:39 UTC

HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_4554_12169_1

This failed on my computer and another one: 888227
That person errors out with:
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00BCCCF2 write attempt to address 0x1F46E428


I error out with:
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00BCCCF2 write attempt to address 0x1F5063B8


Crappy credit for wasting 8541.453 seconds on this task: only a big fat goose egg.

Better quality control of your coding is needed.
ID: 56064 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1611
Credit: 28,967,205
RAC: 19,601
Message 56067 - Posted: 28 Sep 2008, 2:12:42 UTC

[url=https://boinc.bakerlab.org/rosetta/result.php?resultid=194793599[Task ID 194793599[/url] gave a Compute Error with the std err:

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 3424460
Can't acquire lockfile - exiting
[...]
Can't acquire lockfile - exiting

I get this frequently under MiniRosetta 1.34 but this is the first time under Beta 5.98. All other WUs have run fine though.
ID: 56067 · Rating: 0
Speedy
Joined: 25 Sep 05
Posts: 162
Credit: 703,854
RAC: 0
Message 56068 - Posted: 28 Sep 2008, 3:42:30 UTC - in response to Message 56067.  
Last modified: 28 Sep 2008, 3:46:25 UTC

Task ID 194793599 gave a Compute Error with the std err:

Sid Celery, I've fixed up the link to your result for you.
Have a crunching good day!!
ID: 56068 · Rating: 0
Otto
Joined: 6 Apr 07
Posts: 27
Credit: 3,567,665
RAC: 0
Message 56080 - Posted: 29 Sep 2008, 13:55:39 UTC
Last modified: 29 Sep 2008, 13:56:38 UTC

(I'm using Boinc 6.2.19 on Windows XP)

When I press the graphics button in the main view, no graphics will show up, but the button itself disappears, so that I'm unable to see any graphics at all. Strange bug/issue.
ID: 56080 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 4877
Credit: 4,570,325
RAC: 1,904
Message 56089 - Posted: 29 Sep 2008, 20:19:12 UTC - in response to Message 56080.  

I asked this question a while back, and I believe the answer was that the new BOINC Managers do not support the graphics application of 5.98.

ID: 56089 · Rating: 0
Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 56091 - Posted: 29 Sep 2008, 21:16:32 UTC - in response to Message 56089.  

Nearly. They still do, but not when BOINC is installed in the protected mode (as a service).

But still, it is strange that the button was initially available for a click.

Peter
ID: 56091 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 4877
Credit: 4,570,325
RAC: 1,904
Message 56109 - Posted: 30 Sep 2008, 13:00:48 UTC - in response to Message 56091.  

You say 'protected mode', but I installed BOINC Manager in its standard form and did not select anything different from the defaults. So is 'protected mode' the default? If so, how do you change it so that 5.98 will show the graphics?
ID: 56109 · Rating: 0
Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 56112 - Posted: 30 Sep 2008, 14:19:17 UTC - in response to Message 56109.  

Yes, the default form for 6.x is a service. In this mode, science apps run under the newly added boinc_project account and have no access to your (the logged-in user's) desktop. (See also "How to install BOINC as a service (BOINC 6 series) on Windows?".)

Run the installation again and, when on the BOINC Configuration page, press the "Advanced" button and then switch off the "Protected application execution" (a.k.a. "Service") mode checkbox. The client and applications will then run under your account, started directly as the Manager's child processes (the "good old way").

Peter
ID: 56112 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 4877
Credit: 4,570,325
RAC: 1,904
Message 56129 - Posted: 30 Sep 2008, 20:08:52 UTC - in response to Message 56112.  

Thanks, I will look at that this weekend. I don't have a lot of time during the week to do much here on the computer except read/write and run.
ID: 56129 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 4877
Credit: 4,570,325
RAC: 1,904
Message 56160 - Posted: 1 Oct 2008, 21:45:05 UTC
Last modified: 1 Oct 2008, 21:46:17 UTC

I just watched this one die on me: HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_4539_39952_0. After 1:48 of computation time it even popped up a Windows error box.

Error msg is this:
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 871217
Report deadline 8 Oct 2008 18:59:56 UTC
CPU time 6482.563
stderr out

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2791509


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00BCCCF2 write attempt to address 0x1F4B486C

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.1.5


Dump Timestamp : 10/01/08 23:39:03
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
Debugger Engine : 4.0.5.0
Symbol Search Path: E:boincprojectsslots;E:boincprojectsprojectsboinc.bakerlab.org_rosetta;srv*C:WINDOWSTEMPsymbols*http://msdl.microsoft.com/download/symbols;srv*C:WINDOWSTEMPsymbols*https://boinc.bakerlab.org/rosetta/symstore


SymGetModuleInfo(): GetLastError = 87
ModLoad: 00000000 00000000 ( Symbols Loaded) repeats 23 times in total
*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 4016, Write: 0, Other 6432

- I/O Transfers Counters -
Read: 0, Write: 66417, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 51956, QuotaPeakPagedPoolUsage: 51956
QuotaNonPagedPoolUsage: 4200, QuotaPeakNonPagedPoolUsage: 5032

- Virtual Memory Usage -
VirtualSize: 252223488, PeakVirtualSize: 256581632

- Pagefile Usage -
PagefileUsage: 206942208, PeakPagefileUsage: 225918976

- Working Set Size -
WorkingSetSize: 120836096, PeakWorkingSetSize: 139542528, PageFaultCount: 2228047

*** Dump of thread ID 2180 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, ,

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00C4FCF3 read attempt to address 0xC0000000

Engaging BOINC Windows Runtime Debugger...


</stderr_txt>
]]>

NO CREDIT! My RAC is already low enough, now thanks to this it goes lower yet. GEES
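
As a side note on reading the log above: the exit status -1073741819 and the code 0xc0000005 are the same 32-bit value, shown once as a signed decimal and once as unsigned hex. 0xC0000005 is the Windows access-violation status, which matches the "Access Violation" line in the dump. A short Python sketch of the conversion (the helper names `to_unsigned32` and `to_signed32` are just illustrative):

```python
def to_unsigned32(code):
    """Reinterpret a (possibly negative) exit status as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed exit status."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(to_unsigned32(-1073741819)))   # 0xc0000005
print(to_signed32(0xC0000005))           # -1073741819
```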
ID: 56160 · Rating: 0




©2021 University of Washington
https://www.bakerlab.org