Message boards : Number crunching : Problems with Rosetta version 5.98
Author | Message |
---|---|
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Can someone from the lab have a look at this one? It has problems! t040_1_NMRREF_1_t040_1_S_00001_0000482_0IGNORE_THE_REST_core_4463_863_2 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=174466679 stderr out <core_client_version>5.10.30</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 3420912 # cpu_run_time_pref: 21600 ====================================================== DONE :: 1 starting structures 20508 cpu seconds This process generated 7 decoys from 7 attempts 0 starting pdbs were skipped ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> ]]> Validate state Workunit error - check skipped pete. |
R.L. Casey Joined: 7 Jun 06 Posts: 91 Credit: 2,728,885 RAC: 0 |
Can someone from the lab have a look at this one? It has problems! Pete, looking at the Workunit, it seems that the validator rejected your results because the maximum number of results, two, had already been received by the time your result was returned. It looks as if the project was in error to send the WU to you, since it had already sent it out twice on September 10, the second time because of an error returned by the first cruncher. Then this second cruncher took over ten days to return a result, which just happened to arrive between the time the WU was sent to you on the 20th and the time you returned your result! Based upon the time the WU was issued to you, it appears that the project had assumed no result would be returned by the second cruncher after ten days had elapsed. If I were in charge of the world, I'd validate your WU and grant you credit, and purge the (slightly earlier) result based upon the fact that it was so late that it was assumed to be lost, but... :-) Thanks for crunching!! |
Jack Joined: 19 Feb 07 Posts: 11 Credit: 521,099 RAC: 0 |
Rosetta Beta 5.98 does not share CPU fairly with other BOINC projects. My preferences were set to the default 60 minutes and I tried changing it to 70 minutes just to force an update, but Rosetta keeps running long past the time it should pause and let another project have some time. This task: 193947855 started running at 2:45 this afternoon, used over six hours of CPU time and was still running when I suspended it so other projects could get some time. On previous recent tasks, Rosetta would run about 3 hours before pausing. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The BOINC Manager is making the fairness decisions, not Rosetta. It is also trying to optimize the use of your machine's crunch time by not losing work that is still in progress, but has not been checkpointed yet. Either Rosetta is owed time from when other projects were crunching, or perhaps it has a long running model and has not taken a checkpoint yet. BOINC keeps track of it all. If Rosetta does run long before checkpointing, BOINC will pay back the other project(s) the time they are owed. When you tell BOINC to "switch" every xx minutes, you aren't actually instructing it to switch. You are simply telling it how often to consider making a switch. The manager will enforce the resource shares you have configured between your projects. But look for this to be true over the course of 100 hours, not 100 minutes. Rosetta Moderator: Mod.Sense |
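To illustrate the difference between "considering a switch" and actually switching, here is a minimal toy model in Python. It is not BOINC's real scheduler: the `Project` class, the `simulate` function, the per-minute "debt" bookkeeping, and the 180-minute checkpoint interval are all assumptions made up for this sketch. It only shows how a task that checkpoints rarely can run well past the switch interval while the shares still even out over a long run.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    share: float            # resource share (relative weight); hypothetical field
    cpu_used: float = 0.0   # minutes of CPU received so far
    debt: float = 0.0       # minutes the project is owed (negative = overpaid)


def simulate(projects, total_minutes, switch_every=60, checkpoint_every=None):
    """Toy model only: every `switch_every` minutes a switch is *considered*,
    but the running task is not preempted until it reaches a checkpoint."""
    checkpoint_every = checkpoint_every or {}   # minutes between checkpoints, per project
    total_share = sum(p.share for p in projects)
    running = projects[0]
    since_checkpoint = 0
    switch_due = False
    for minute in range(total_minutes):
        # Every project earns its fair fraction of this minute...
        for p in projects:
            p.debt += p.share / total_share
        # ...and the running project spends the whole minute.
        running.debt -= 1.0
        running.cpu_used += 1.0
        since_checkpoint += 1
        at_checkpoint = since_checkpoint >= checkpoint_every.get(running.name, 1)
        if at_checkpoint:
            since_checkpoint = 0
        if (minute + 1) % switch_every == 0:
            switch_due = True               # time to consider a switch
        if switch_due and at_checkpoint:    # only act once no work would be lost
            running = max(projects, key=lambda p: p.debt)
            since_checkpoint = 0
            switch_due = False
    return {p.name: p.cpu_used / total_minutes for p in projects}


def fresh():
    return [Project("rosetta", 100), Project("einstein", 100)]


# Over 100 minutes one project can hold the CPU the whole time;
# over 100 hours the split comes out close to the configured shares.
print(simulate(fresh(), 100, checkpoint_every={"rosetta": 180}))
print(simulate(fresh(), 6000, checkpoint_every={"rosetta": 180}))
```

In this sketch the long-running project is paid back automatically, because the other project stays "most owed" at each later switch point until the debts balance.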
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=193810495 t042_1_NMRREF_1_t042_1_S_00001_0006461IGNORE_THE_REST_010000_4471_6183_0 <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 2556542 This pig locked up my system, and it moved on to another task which locked up and also locked up Einstein. Also, t042_1_NMRREF_1_t042_1_S_00002_0009608IGNORE_THE_REST_050000_4471_6695_0 https://boinc.bakerlab.org/rosetta/result.php?resultid=193877090 died shortly afterwards. My system was locked up and I had to use Task Manager to abort boinc mgr, then Rosetta and Einstein, and then reboot to get control of my system again. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
looked up the error message, seems that perhaps my graphics driver caused a conflict? I updated it and will see what happens. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This errored after 4 hrs 44 min; it had done 2 models just before it finished. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=177652466 9/25/2008 1:58:06 PM|rosetta@home|Output file AA2A_11_modeling_1_AA2A_1_AA2A_2VTA_align_4556_2991_0_0 for task AA2A_11_modeling_1_AA2A_1_AA2A_2VTA_align_4556_2991_0 absent <core_client_version>5.10.30</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 2669370 ERROR:: Exit from: .refold.cc line: 338 </stderr_txt> pete. |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
Had a couple of WUs exit from refold.cc AA2A_7_modeling_1_AA2A_1_AA2A_2RH1_align_4493_43027_0 AA2A_7_modeling_1_AA2A_1_AA2A_2RH1_align_4493_40790_0 |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
AA2A_6_modeling_1_AA2A_1_AA2A_2RH1_align_4492_13278_0 this completed ok but gave this error message or warning CPU time 20691.98 stderr out <core_client_version>6.2.19</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 3360223 # cpu_run_time_pref: 21600 # random seed: 3360223 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 21600 # random seed: 3360223 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 21600 # random seed: 3360223 # cpu_run_time_pref: 21600 ====================================================== DONE :: 1 starting structures 20691 cpu seconds This process generated 7 decoys from 7 attempts 0 starting pdbs were skipped ====================================================== |
AdeB Joined: 12 Dec 06 Posts: 45 Credit: 4,428,086 RAC: 0 |
This workunit is valid but stderr out is enormous: <core_client_version>5.10.45</core_client_version> <![CDATA[ <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 43200 # random seed: 2792818 sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range . /// This line is repeated 516 times /// . sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range ====================================================== DONE :: 1 starting structures 43239.7 cpu seconds This process generated 45 decoys from 45 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> ]]> |
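The repeated message suggests that a NaN reached a range check on a sine or cosine value. As a hedged sketch of what such a guard might look like (the function name `check_sin_cos_range` and the tolerance are hypothetical and not taken from the Rosetta source), the Python below rejects NaN explicitly, since a NaN fails every ordinary comparison, and clamps values that are only slightly out of range through rounding.

```python
import math


def check_sin_cos_range(x: float, tol: float = 1e-6) -> float:
    """Hypothetical guard: return x clamped to [-1, +1], or raise if it is
    NaN or further outside the legal range than the tolerance allows."""
    if math.isnan(x) or not (-1.0 - tol <= x <= 1.0 + tol):
        raise ValueError(
            f"{x} is outside of [-1,+1] sin and cos value legal range"
        )
    # Small overshoots from rounding are pulled back into range.
    return max(-1.0, min(1.0, x))


# A NaN typically comes from an earlier undefined operation, e.g. inf - inf.
bad_value = float("inf") - float("inf")
try:
    check_sin_cos_range(bad_value)
except ValueError as err:
    print("sin_cos_range ERROR:", err)
```

A guard that only logs and carries on would repeat the same line for every affected calculation, which would be consistent with the hundreds of identical lines in the output above, though that is a guess about the cause.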
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_4554_12169_1 This failed on my computer and another one: 888227 That person errors out with: Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00BCCCF2 write attempt to address 0x1F46E428 I error out with: Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00BCCCF2 write attempt to address 0x1F5063B8 get crappy credit for wasting 8541.453 seconds on this task only a big fat goose egg Better quality control of your coding is needed. |
Sid Celery Joined: 11 Feb 08 Posts: 2128 Credit: 41,304,151 RAC: 10,377 |
[url=https://boinc.bakerlab.org/rosetta/result.php?resultid=194793599[Task ID 194793599[/url] gave a Compute Error with the std err: <core_client_version>6.2.18</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> # cpu_run_time_pref: 10800 # random seed: 3424460 Can't acquire lockfile - exiting [...] Can't acquire lockfile - exiting I get this frequently under MiniRosetta 1.34 but this is the first time under Beta 5.98. All other WUs have run fine though. |
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 808,098 RAC: 0 |
Task ID 194793599 gave a Compute Error with the std err: Sid Celery, I've fixed up the link to your result for you. Have a crunching good day!! |
Otto Joined: 6 Apr 07 Posts: 27 Credit: 3,567,665 RAC: 0 |
(I'm using Boinc 6.2.19 on Windows XP) When I press the graphics button in the main view, no graphics will show up, but the button itself disappears, so that I'm unable to see any graphics at all. Strange bug/issue. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I asked this question awhile back, and I believe the answer was that the new boinc managers do not support the graphics application of 5.98. (I'm using Boinc 6.2.19 on Windows XP) |
Pepo Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
I asked this question awhile back, and I believe the answer was that the new boinc managers do not support the graphics application of 5.98. Nearly. They still do, but not when BOINC is installed in the protected mode (as a service). But still, it is strange that the button was initially available for a click. Peter |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I asked this question awhile back, and I believe the answer was that the new boinc managers do not support the graphics application of 5.98. You say 'protected mode', thing is I installed Boinc mgr in its standard form, I did not select anything different than the defaults. So is 'protected mode' the default? If so, how do you change it so that 5.98 will show the graphics? |
Pepo Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
not when BOINC is installed in the protected mode (as a service). Yes, the default form for 6.x is a service. In this mode, science apps run under the newly added [i]boinc_project[/i] account and have no access to your (the logged-in user's) desktop. (See also How to install BOINC as a service (BOINC 6 series) on Windows?.) Run the installation again and, when on the BOINC Configuration page, press the "Advanced" button and then switch off the "Protected application execution" (a.k.a. "Service") mode checkbox. The client and applications will then be run under your account, started directly as the Manager's child processes (the "good old way"). Peter |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
not when BOINC is installed in the protected mode (as a service). Thanks, I will look at that this weekend. I don't have a lot of time during the week to do much here on the computer except read/write and run. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I just watched this HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_4539_39952_0 one die on me. 1:48 computation time and then it even pops up a windows error box. Error msg is this: Outcome Client error Client state Compute error Exit status -1073741819 (0xc0000005) Computer ID 871217 Report deadline 8 Oct 2008 18:59:56 UTC CPU time 6482.563 stderr out <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 2791509 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00BCCCF2 write attempt to address 0x1F4B486C Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 6.1.5 Dump Timestamp : 10/01/08 23:39:03 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 Debugger Engine : 4.0.5.0 Symbol Search Path: E:boincprojectsslots ;E:boincprojectsprojectsboinc.bakerlab.org_rosetta;srv*C:WINDOWSTEMPsymbols*http://msdl.microsoft.com/download/symbols;srv*C:WINDOWSTEMPsymbols*https://boinc.bakerlab.org/rosetta/symstore SymGetModuleInfo(): GetLastError = 87 ModLoad: 00000000 00000000 ( Symbols Loaded) repeats 23 times in total *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 4016, Write: 0, Other 6432 - I/O Transfers Counters - Read: 0, Write: 66417, Other 0 - Paged Pool Usage - QuotaPagedPoolUsage: 51956, QuotaPeakPagedPoolUsage: 51956 QuotaNonPagedPoolUsage: 4200, QuotaPeakNonPagedPoolUsage: 5032 - Virtual Memory Usage - VirtualSize: 252223488, PeakVirtualSize: 256581632 - Pagefile Usage - PagefileUsage: 206942208, PeakPagefileUsage: 225918976 - Working Set Size - WorkingSetSize: 120836096, PeakWorkingSetSize: 139542528, PageFaultCount: 2228047 *** Dump of thread ID 2180 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Unhandled Exception Detected... 
- Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00C4FCF3 read attempt to address 0xC0000000 Engaging BOINC Windows Runtime Debugger... </stderr_txt> ]]> NO CREDIT! My RAC is already low enough, now thanks to this it goes lower yet. GEES |
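For anyone puzzled by why the exit status appears both as -1073741819 and as 0xc0000005: they are the same 32-bit value, printed once as a signed integer and once as unsigned hex. A small Python sketch of the conversion follows; the helper name `describe_exit_status` and the one-entry lookup table are made up for illustration, while 0xC0000005 being the Windows access-violation status matches the "Access Violation" text in the log above.

```python
# Illustrative lookup table; only the status seen in this thread is listed.
KNOWN_STATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def describe_exit_status(exit_status: int) -> str:
    """Reinterpret a signed 32-bit exit status as unsigned hex (hypothetical helper)."""
    code = exit_status & 0xFFFFFFFF          # keep the low 32 bits as unsigned
    name = KNOWN_STATUS.get(code, "unknown status")
    return f"0x{code:08x} ({name})"


print(describe_exit_status(-1073741819))   # 0xc0000005 (STATUS_ACCESS_VIOLATION)
```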
©2024 University of Washington
https://www.bakerlab.org