Report Problems with Rosetta Version 5.16 II

Author	Message
rdickjune Joined: 15 May 06 Posts: 5 Credit: 5,529 RAC: 0	Message 17219 - Posted: 27 May 2006, 5:26:27 UTC
I'm getting errors occasionally and need to know where to report them for my version of Rosetta. The error messages have been as follows (the type is in red):
rosetta@home 5/26/2006 9:19:51 PM rosetta not responding to screensaver, exiting
rosetta@home 5/26/2006 9:19:51 PM Unrecoverable error.....,etc... (-exit code_ 1 (0xcffffffff)
After more dialog, the end result states that the application is terminated. I have had this same error a number of times since I started running Rosetta less than two weeks ago. I think I read somewhere on the site that credit is earned for all work done, so I am not concerned in that regard. I see that there are links provided to report errors for specific versions of Rosetta, but I don't see a link for reporting bugs in the newer clients, which is why I'm using this thread. If you move this to another thread, please let me know where it has been moved, for future reference. Thank you.

Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0	Message 17252 - Posted: 27 May 2006, 19:19:11 UTC
See the last post in "Report Problems with Rosetta Version 5.16 I"; this is my second error since turning the screensaver back on. It's wuid=18193778.
Result ID: 21719438
Name: T0283_CONTACTS_MAP_FROM_hom006_535_21537_0
Workunit: 18193778
Created: 27 May 2006 4:39:53 UTC
Sent: 27 May 2006 6:37:21 UTC
Received: 27 May 2006 19:14:59 UTC
Server state: Over
Outcome: Client error
Client state: Computing
Exit status: -1073741811 (0xc000000d)
Computer ID: 212252
Report deadline: 3 Jun 2006 6:37:21 UTC
CPU time: 1213.703125
stderr out:
<core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 3172964
</stderr_txt>
Validate state: Invalid
Claimed credit: 4.80583329460571
Granted credit: 0
Application version: 5.16
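
The exit status in these reports is shown twice, once as a signed decimal and once in hex. Both are the same 32-bit value: -1073741811 read as an unsigned 32-bit number is 0xC000000D, which, as far as I know, is the Windows NTSTATUS code STATUS_INVALID_PARAMETER. A minimal sketch of the conversion follows; the function names are made up for illustration and are not part of BOINC or Rosetta.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit exit code."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


def describe_exit_status(code: int) -> str:
    """Format an exit code the way the result pages above show it."""
    return f"{to_signed32(code)} (0x{to_unsigned32(code):08x})"


print(describe_exit_status(-1073741811))
```

Searching for the hex form is usually more productive than searching for the negative decimal number.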

Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0	Message 17255 - Posted: 27 May 2006, 20:35:54 UTC Last modified: 27 May 2006, 20:39:22 UTC
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18032663
The computing error came after 13,500+ seconds of processing and 6 models (it was working on number 7).
"This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato

Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0	Message 17257 - Posted: 27 May 2006, 22:19:40 UTC
OK, now there are three fatal Windows errors since turning the screensaver back on: wuid=18250663. I'm now going to turn the screensaver back off. tony
Result ID: 21780793
Name: T0283_CONTACTS_CONSERVATIVE_MAP_FROM_hom006_547_8549_0
Workunit: 18250663
Created: 27 May 2006 17:01:15 UTC
Sent: 27 May 2006 19:15:00 UTC
Received: 27 May 2006 22:16:39 UTC
Server state: Over
Outcome: Client error
Client state: Computing
Exit status: -1073741811 (0xc000000d)
Computer ID: 212252
Report deadline: 3 Jun 2006 19:15:00 UTC
CPU time: 9206.765625
stderr out:
<core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 2943352
# cpu_run_time_pref: 28800
</stderr_txt>
Validate state: Invalid
Claimed credit: 36.4555218363274
Granted credit: 0
Application version: 5.16

David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 17260 - Posted: 28 May 2006, 0:07:32 UTC - in response to Message 17257.
Quote: OK, now there are three fatal Windows errors since turning the screensaver back on: wuid=18250663. I'm now going to turn the screensaver back off. tony Result ID 21780793 Name T0283_CONTACTS_CONSERVATIVE_MAP_FROM_hom006_547_8549_0 Workunit 18250663 Created 27 May 2006 17:01:15 UTC Sent 27 May 2006 19:15:00 UTC Received 27 May 2006 22:16:39 UTC Server state Over Outcome Client error Client state Computing Exit status -1073741811 (0xc000000d) Computer ID 212252 Report deadline 3 Jun 2006 19:15:00 UTC CPU time 9206.765625 stderr out <core_client_version>5.4.9</core_client_version> <message> - exit code -1073741811 (0xc000000d) </message> <stderr_txt> # random seed: 2943352 # cpu_run_time_pref: 28800 </stderr_txt> Validate state Invalid Claimed credit 36.4555218363274 Granted credit 0 application version 5.16
Yes. Rom, in analyzing the current error breakdown, thinks that most errors are associated with the graphics failing. He is testing a solution in which Rosetta keeps going and results get returned even if there is a problem with the graphics.

Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0	Message 17262 - Posted: 28 May 2006, 0:56:03 UTC - in response to Message 17260.
Quote: Yes. Rom, in analyzing the current error breakdown, thinks that most errors are associated with the graphics failing. He is testing a solution in which Rosetta keeps going and results get returned even if there is a problem with the graphics.
If you don't mind, I'll email Rom directly to see if there's anything I can do. I'm one of his Alpha testers anyway. tony

senatoralex85 Joined: 27 Sep 05 Posts: 66 Credit: 169,644 RAC: 0	Message 17265 - Posted: 28 May 2006, 5:34:53 UTC Last modified: 28 May 2006, 5:38:55 UTC
1/8/2005 3:36:18 PM||request_reschedule_cpus: process exited
1/8/2005 3:36:18 PM|rosetta@home|Computation for result T0283_CONTACTS_MAP_FROM_hom006_535_22929_0 finished
1/8/2005 3:36:19 PM|rosetta@home|Started upload of T0283_CONTACTS_MAP_FROM_hom006_535_22929_0_0
1/8/2005 3:36:24 PM|rosetta@home|Finished upload of T0283_CONTACTS_MAP_FROM_hom006_535_22929_0_0
1/8/2005 3:36:24 PM|rosetta@home|Throughput 31466 bytes/sec
1/8/2005 3:54:13 PM|rosetta@home|Deferring communication with project for 1 days, 19 hours, 59 minutes, and 57 seconds
1/8/2005 4:01:56 PM||Insufficient work; requesting more
1/8/2005 4:01:56 PM|LHC@home|Deferring communication with project for 71 weeks, 5 days, 7 hours, 29 minutes, and 28 seconds
1/8/2005 4:54:14 PM|rosetta@home|Deferring communication with project for 1 days, 18 hours, 59 minutes, and 56 seconds
1/8/2005 11:02:00 PM||Insufficient work; requesting more
1/8/2005 11:02:00 PM|LHC@home|Deferring communication with project for 71 weeks, 5 days, 0 hours, 29 minutes, and 24 seconds
1/8/2005 11:54:18 PM|rosetta@home|Deferring communication with project for 1 days, 11 hours, 59 minutes, and 53 seconds
1/9/2005 12:02:00 AM||Insufficient work; requesting more
1/9/2005 12:02:00 AM|LHC@home|Deferring communication with project for 71 weeks, 4 days, 23 hours, 29 minutes, and 24 seconds
1/9/2005 12:30:39 AM||request_reschedule_cpus: project op
1/9/2005 12:30:40 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
1/9/2005 12:30:40 AM|rosetta@home|Requesting 0 seconds of work, returning 1 results
1/9/2005 12:30:42 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
1/9/2005 12:31:17 AM||request_reschedule_cpus: project op
1/9/2005 12:31:19 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
1/9/2005 12:31:19 AM|rosetta@home|Requesting 8640 seconds of work, returning 0 results
1/9/2005 12:31:20 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
1/9/2005 12:31:20 AM|rosetta@home|Message from server: Not sending work - last RPC too recent: 38 sec
1/9/2005 12:31:20 AM|rosetta@home|No work from project
1/9/2005 12:31:21 AM|rosetta@home|Deferring communication with project for 4 minutes and 1 seconds
It says on the homepage that there are 19,000 workunits in the queue, yet I cannot get any workunits. 6 hours of computing time wasted... argh. Is anyone else having this problem?
**Edit** Hmm, I just got work now? This is interesting. I noticed that for some reason workunits get stuck in the status "ready to report" under the Work tab but never actually get uploaded, even though BOINC has contacted the Rosetta servers. Only after I manually press the Update button will the workunit go through. I am running version 4.45. Any ideas?

Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0	Message 17270 - Posted: 28 May 2006, 10:04:18 UTC - in response to Message 17265.
Quote: This is interesting. I noticed that for some reason workunits get stuck in the status "ready to report" under the Work tab but never actually get uploaded, even though BOINC has contacted the Rosetta servers. Only after I manually press the Update button will the workunit go through. I am running version 4.45. Any ideas?
Reporting is done separately from uploading to reduce network comms on the server side. JM7 (the man who wrote the scheduler) says this: Results are reported any time the project is contacted for an update. Updates occur at the first of:
1) A result report is due within 24 hours.
2) It has been at least the connect interval since the result completed.
3) (5.4) It is less than the connect interval till the report deadline.
4) Work is needed.
5) A manual update.
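
The five conditions quoted from JM7 can be read as a simple "contact the project if any of these holds" check. The sketch below is only an illustration of that reading, not the actual BOINC scheduler code; the class, function, and parameter names are all hypothetical, and condition 3 is gated on a flag because the quote marks it as applying to 5.4 clients.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PendingResult:
    """A finished result that has been uploaded but not yet reported (hypothetical type)."""
    completed_at: datetime
    report_deadline: datetime


def update_reasons(results, now, connect_interval,
                   work_needed=False, manual_update=False, client_is_5_4=True):
    """Return the quoted conditions that currently hold; an empty list means no update yet."""
    reasons = set()
    for result in results:
        time_left = result.report_deadline - now
        if time_left <= timedelta(hours=24):
            reasons.add("1) report due within 24 hours")
        if now - result.completed_at >= connect_interval:
            reasons.add("2) connect interval elapsed since completion")
        if client_is_5_4 and time_left < connect_interval:
            reasons.add("3) less than connect interval until deadline")
    if work_needed:
        reasons.add("4) work is needed")
    if manual_update:
        reasons.add("5) manual update")
    return sorted(reasons)


now = datetime(2006, 5, 28, 10, 0)
fresh = PendingResult(completed_at=datetime(2006, 5, 28, 9, 0),
                      report_deadline=datetime(2006, 6, 3, 6, 37))
print(update_reasons([fresh], now, connect_interval=timedelta(days=1)))
print(update_reasons([fresh], now, connect_interval=timedelta(days=1), manual_update=True))
```

Under this reading, a result that finished an hour ago with a deadline days away simply waits in "ready to report" until one of the conditions becomes true, which matches what senatoralex85 saw before pressing Update.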

Robinski Joined: 7 Mar 06 Posts: 51 Credit: 85,383 RAC: 0	Message 17280 - Posted: 28 May 2006, 20:52:35 UTC
I just saw that I had an error today with r287__CONTACTEIGHT_SHORTRELAX_SAVE_ALL_OUT_hom001__563_711 (see: Result). Possibly this is because I manually stopped the BOINC service, but I am not sure whether that was around the same time. Otherwise it is just an error that occurred. It was an Invalid Function error:
<core_client_version>5.5.0</core_client_version>
<message>
Onjuiste functie. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2482790
# cpu_run_time_pref: 3600
ERROR:: Exit at: .dock_structure.cc line:401
</stderr_txt>
Member of the Dutch Power Cows
Trying to get the world on IPv6, do you have it? check here: IPv6.RHarmsen.nl

anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0	Message 17301 - Posted: 29 May 2006, 14:04:44 UTC
Hi. I'm crunching this WU now: https://boinc.bakerlab.org/rosetta/result.php?resultid=21976659
When I select print to screen in my accounting program, the WU halts. There is no movement whatsoever on the graphics afterwards. Anders n

Rollo Joined: 2 Jan 06 Posts: 21 Credit: 106,369 RAC: 0	Message 17317 - Posted: 29 May 2006, 18:39:25 UTC Last modified: 29 May 2006, 18:41:56 UTC
I am crunching on 21970571 right now. It stops after reaching 1.210% at time step 2833. If I stop BOINC and let it restart from the last checkpoint (here: from the beginning), it stops at the same step 2833 in model 1. For me this seems reproducible. Any suggestions on what I can do to produce a useful error report, other than aborting the workunit or waiting for the watchdog?

tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0	Message 17321 - Posted: 29 May 2006, 20:29:19 UTC - in response to Message 17317.
Quote: I am crunching on 21970571 right now. It stops after reaching 1.210% at time step 2833. If I stop BOINC and let it restart from the last checkpoint (here: from the beginning), it stops at the same step 2833 in model 1. For me this seems reproducible. Any suggestions on what I can do to produce a useful error report, other than aborting the workunit or waiting for the watchdog?
I'd wait at least an hour. In theory the watchdog should terminate it after an hour. This is a good opportunity to see if it really works as it should. In any case, let it run a few hours, and if it really stays stuck at step 2833, abort it; you get all the credits for the time crunched.
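
To make the watchdog idea concrete: the behaviour described here (terminate a workunit that has sat on the same step for an hour) can be sketched as a simple stall detector. This is an assumption about the general technique, not Rosetta's actual watchdog code; the class name, the one-hour default, and the use of the step counter as the progress signal are all hypothetical.

```python
class StallWatchdog:
    """Flags a task as stalled when its step counter has not changed for too long."""

    def __init__(self, timeout_seconds: float = 3600.0):
        self.timeout_seconds = timeout_seconds
        self.last_step = None      # last step value seen
        self.last_change = None    # time (in seconds) at which the step last changed

    def observe(self, step: int, now: float) -> bool:
        """Record the current step at time `now`; return True if the task looks stalled."""
        if step != self.last_step:
            # Progress was made, so restart the stall timer.
            self.last_step = step
            self.last_change = now
            return False
        return now - self.last_change >= self.timeout_seconds


watchdog = StallWatchdog()
for seconds, step in [(0, 2832), (60, 2833), (1800, 2833), (3660, 2833)]:
    print(seconds, step, watchdog.observe(step, seconds))
```

In this sketch the last observation reports a stall, because step 2833 was first seen at 60 seconds and is still unchanged 3600 seconds later.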

Aglarond Joined: 29 Jan 06 Posts: 26 Credit: 446,212 RAC: 0	Message 17325 - Posted: 29 May 2006, 23:29:58 UTC
I had another one of those nasty R@H screensaver crashes. It was result T0283_CONTACTS_CONSERVATIVE_HALFHB_MAP_FROM_hom006_575_8907_0. This time, however, I zipped the memory dump that Windows was going to send to Microsoft. If you think it will help you, you can download WERa78d.dir00.zip (16.1 MB). (I will leave it there for download for at least a month.) I was also wondering why this happens only on this particular computer. It is the only one of my computers that has a localized version of Windows (the Slovak language version). Do you think that could be the reason for the screensaver crash?

Winkle Joined: 22 May 06 Posts: 88 Credit: 1,354,930 RAC: 0	Message 17337 - Posted: 30 May 2006, 7:11:39 UTC Last modified: 30 May 2006, 7:23:30 UTC
I am currently crunching JUMP_RELAX_LONGRANGEPAIR_PARALLEL_t285__SAVE_ALL_OUT_548_11268_0 using Rosetta version 5.16. It is at 1% after 2.5 hrs. Boincview tells me that it has 5.25 hrs to complete. Normally it takes around 2.8 hrs per WU. It is running on a Dell P3 1G, #225837. Wait... that was weird. It just went straight to 100% at 2:43. Any ideas?
Edit: I think this is it:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18401795
https://boinc.bakerlab.org/rosetta/result.php?resultid=21942605
End edit. Thanks, Ian

tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0	Message 17340 - Posted: 30 May 2006, 7:38:35 UTC - in response to Message 17337.
Quote: I am currently crunching JUMP_RELAX_LONGRANGEPAIR_PARALLEL_t285__SAVE_ALL_OUT_548_11268_0 using Rosetta version 5.16. It is at 1% after 2.5 hrs. Boincview tells me that it has 5.25 hrs to complete. Normally it takes around 2.8 hrs per WU. It is running on a Dell P3 1G, #225837. Wait... that was weird. It just went straight to 100% at 2:43. Any ideas? Edit: I think this is it: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18401795 https://boinc.bakerlab.org/rosetta/result.php?resultid=21942605 End edit. Thanks, Ian
Quoting from the FAQ: Depending on how the WU is configured, some may have over 1,500,000 steps in the first model and still not reach 1%. This can take over 5 hours of CPU time. There are a few even larger ones.

Winkle Joined: 22 May 06 Posts: 88 Credit: 1,354,930 RAC: 0	Message 17346 - Posted: 30 May 2006, 9:10:00 UTC - in response to Message 17340.
Thanks, I will have a read.

Rollo Joined: 2 Jan 06 Posts: 21 Credit: 106,369 RAC: 0	Message 17350 - Posted: 30 May 2006, 11:25:14 UTC - in response to Message 17321.
Quote: I am crunching on 21970571 right now. It stops after reaching 1.210% at time step 2833. If I stop BOINC and let it restart from the last checkpoint (here: from the beginning), it stops at the same step 2833 in model 1. For me this seems reproducible. Any suggestions on what I can do to produce a useful error report, other than aborting the workunit or waiting for the watchdog?
Quote: I'd wait at least an hour. In theory the watchdog should terminate it after an hour. This is a good opportunity to see if it really works as it should. In any case, let it run a few hours, and if it really stays stuck at step 2833, abort it; you get all the credits for the time crunched.
The watchdog killed the workunit. I have made a backup, so I can rerun it if that is of any interest.

ebahapo Joined: 17 Sep 05 Posts: 29 Credit: 413,302 RAC: 0	Message 17367 - Posted: 30 May 2006, 14:55:07 UTC Last modified: 30 May 2006, 14:55:34 UTC
I have a runaway WU (here). It reports 100% done, but even after over 11h it keeps on running, even though I limited WU time to 1h. HTH

Tom Philippart Joined: 29 May 06 Posts: 183 Credit: 834,667 RAC: 0	Message 17372 - Posted: 30 May 2006, 15:37:06 UTC
My problem: the progress percentage jumps; it's not advancing smoothly (1%-24%-48%...). I can live with it, but it would still be cool to have this solved.

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 17375 - Posted: 30 May 2006, 16:07:04 UTC - in response to Message 17372.
Quote: My problem: the progress percentage jumps; it's not advancing smoothly (1%-24%-48%...). I can live with it, but it would still be cool to have this solved.
This is normal and is described in this FAQ about the runtime preference. The % complete for Rosetta is not as definite or as easy to compute as it is for some other projects. A given WU will run through as many complete models as possible. Given the percentages in your example, once it completed the first model (see the model number on the graphic), it estimated it would get about 3 more models completed before reaching your runtime preference. Each completed model is what the scientists need for their work. What happens within a model is not as important, but the additional updates to the % complete were added mainly to help diagnose any problems with a given set of WUs.
Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/
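
One way to picture the jumps: if progress is counted in whole models, the percentage can only move when a model finishes. The sketch below is a guess at that kind of estimate, not Rosetta's actual formula; the function name and the example timings (a 4-hour runtime preference and roughly 3,500 CPU seconds per model) are assumptions chosen to resemble the 1%-24%-48% pattern above.

```python
def estimate_progress(models_done: int, cpu_seconds_used: float,
                      runtime_pref_seconds: float) -> float:
    """Fraction complete, counted in whole models that fit in the runtime preference."""
    if models_done == 0:
        # No model has finished yet, so there is nothing to base an estimate on.
        return 0.0
    seconds_per_model = cpu_seconds_used / models_done
    seconds_left = max(0.0, runtime_pref_seconds - cpu_seconds_used)
    # Only models that can finish completely before the preference runs out count.
    models_still_expected = int(seconds_left // seconds_per_model)
    return models_done / (models_done + models_still_expected)


runtime_pref = 4 * 3600  # hypothetical 4-hour runtime preference
for done, used in [(0, 1000), (1, 3500), (2, 7000), (3, 10500), (4, 14000)]:
    print(done, f"{estimate_progress(done, used, runtime_pref):.0%}")
```

With these made-up numbers the estimate goes 0%, 25%, 50%, 75%, 100%: one finished model with about three more expected gives roughly a quarter, close to the 24% in the example. The small updates below 1% within the first model are not modelled here.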