Message boards : Number crunching : Problems with Rosetta version 5.93
Author | Message |
---|---|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
As for the latest problems: ...We have noted these problems and are working on solutions to avoid these problems in the future. ...Ingemar is referring to the "no new work" messages that people were seeing during the last 2 weeks or so of December and in early January. This was observed mostly on systems with only the project minimum of 256 MB of memory. Rosetta Moderator: Mod.Sense |
cnick6 Joined: 30 May 06 Posts: 29 Credit: 12,597,623 RAC: 0 |
Had a compute failure on Linux 5.93 https://boinc.bakerlab.org/rosetta/result.php?resultid=132057576 |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Yup, looks like the 1zpy's are getting their normal "watchdog" endings (but in this case valid and credited): resultid=131958175. This is the only one for me so far, i.e. no other errors of any kind except this one, so not so bad. |
Angus Joined: 17 Sep 05 Posts: 412 Credit: 321,053 RAC: 0 |
Perhaps you need to stop new development for a bit, and concentrate on FIXING the broken crap you have now. (See Thomas Liebold's post earlier in this thread) Every release is accompanied by an endless litany of failed WUs. In no case, for a project of this size, is a few hours of testing an application even remotely adequate. Fix the problems, and perhaps those who have left this project might consider coming back. In this case it was only 2 days between the boinc and ralph update, which I agree is on the short side. However, I just want to alert you to one issue: how long one tests the code on ralph before submitting to boinc depends on the nature of the update. In this case the latest two updates on ralph concerned only code in one scientific protocol. The rest of rosetta stayed the same and the bulk of the jobs sent out were running on an identical code base to the previous boinc version. As for the new code, we had tested it on ralph and locally without problems, so we were confident it would not mess things up. Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :) "You can't fix stupid" (Ron White) |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,860,059 RAC: 3,073 |
In this case it was only 2 days between the boinc and ralph update, which I agree is on the short side. However, I just want to alert you to one issue: how long one tests the code on ralph before submitting to boinc depends on the nature of the update. In this case the latest two updates on ralph concerned only code in one scientific protocol. The rest of rosetta stayed the same and the bulk of the jobs sent out were running on an identical code base to the previous boinc version. As for the new code, we had tested it on ralph and locally without problems, so we were confident it would not mess things up. Thanks for the response, Ingemar - it makes a big difference. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
My 4800 running Linux had a watchdog-ended WU, but it too was valid and credited. Here's the current scoreboard of 5.93 for my hosts. It seems consistent that it's the 1zpy WUs that I have issues with. Perhaps, since I get credit and the WU is considered valid, this is more of an "informational message" than an "error"? |
Robby1959 Joined: 10 May 07 Posts: 38 Credit: 9,298,741 RAC: 0 |
Can anyone tell me why my wife's laptop (Toshiba Sat.) keeps running work while it is in use? The machine is set not to run and the server is set not to run. I am stumped. It has a 1.6 Intel dual core w/ 1 gig of RAM. I think another laptop I set up is having the same problem, and it's bogging down the system. Any ideas? BTW, it will snooze if told to. Also, how long is the snooze timer? |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
My AMD64 X2 4800 running Windows had TWO "1zpy" WUs ended by the watchdog yesterday. At this point, I don't see the need to link to the results, as they must have PLENTY of samples to work with. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Can anyone tell me why my wife's laptop (Toshiba Sat.) keeps running work while it is in use? The machine is set not to run and the server is set not to run. I am stumped. It has a 1.6 Intel dual core w/ 1 gig of RAM. I think another laptop I set up is having the same problem, and it's bogging down the system. Any ideas? BTW, it will snooze if told to. Also, how long is the snooze timer? Robby, you want to review the "preferences" on that PC, which are a local override of your web-based preferences and pertain only to that specific machine. Make sure that the checkbox for customized preferences is checked (if that's what you wish to do), and then check what it says in "do work after idle for..." (the "advanced" pulldown, then preferences, then the processor usage tab in the advanced view). If the customized preferences box is not checked, it's just going to use the web-based preferences, which are defined for up to 4 different venues. You can see which venue this machine is considered to be at in the messages as BOINC starts. I believe the snooze is for 30 min. Rosetta Moderator: Mod.Sense |
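The precedence Mod.Sense describes (a local override on the machine wins; otherwise the web-based preferences for the host's venue apply) can be sketched roughly as follows. This is only an illustration of the lookup order, not BOINC's actual code: the class, the field names, and the `"default"` key are all hypothetical, and the fallback to account defaults when a venue has no separate preferences is an assumption based on the "no separate prefs for home; using your defaults" message that appears in a client log later in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ComputingPrefs:
    # Hypothetical field names, chosen for readability only.
    run_while_user_active: bool
    idle_minutes_before_work: float


def effective_prefs(
    local_override: Optional[ComputingPrefs],
    web_prefs_by_venue: Dict[str, ComputingPrefs],
    venue: str,
) -> ComputingPrefs:
    """Return the preferences a host would follow under the described precedence."""
    # A checked "customized preferences" box is modeled as a non-None override.
    if local_override is not None:
        return local_override
    # Otherwise use the web-based preferences for this host's venue,
    # falling back to the account defaults when the venue has none (assumption).
    return web_prefs_by_venue.get(venue, web_prefs_by_venue["default"])


# Web preferences that allow work while the machine is in use.
web = {"default": ComputingPrefs(run_while_user_active=True, idle_minutes_before_work=0.0)}
# A local override that only works after 3 idle minutes.
laptop_override = ComputingPrefs(run_while_user_active=False, idle_minutes_before_work=3.0)

print(effective_prefs(None, web, "home"))             # web defaults apply
print(effective_prefs(laptop_override, web, "home"))  # local override wins
```

If the laptop keeps crunching while in use, the first call is the case to suspect: no local override is active, so whatever the web-based preferences say is what the machine follows.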
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
I'm running a 5.93 task, BOINC twist rings etc. etc., but now when I look at the graphics there is no image in the "searching" box, nor in the "accepted" box, nor in "low energy"; the only image shown is that of "native". For the rest it is running normally. Dragging around in the boxes doesn't get the view back, and restarting the graphics also gives no result. For screens, mail me. [edit] OK, it triggered the debugger now, after more than 2 hours. error |
Karl Joined: 12 May 06 Posts: 11 Credit: 188,211 RAC: 0 |
What is happening with my work unit accounting? On Jan 6, my average was 462 and some decimal. Today it is 408.75. This is an enormous drop in work units. I haven't changed any of my preferences at all over the last week. |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 345 |
Getting a number of errors on my Windows machine; no problems on my Linux machines. I get the following error message: <core_client_version>5.8.15</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 1309647 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 21600 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 21600 # random seed: 1309647 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 21600 # random seed: 1309647 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 21600 # random seed: 1309647 No heartbeat from core client for 31 sec - exiting Too many restarts with no progress. Keep application in memory while preempted. ====================================================== DONE :: 1 starting structures 0 cpu seconds This process generated 0 decoys from 0 attempts 0 starting pdbs were skipped ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> <message> <file_xfer_error> <file_name>n003_1_NMRREF_n003_1_id_model_13_idl_Structural_Genomics_Target_2486_6284_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> This happened on WU 131203333, WU 131203968, WU 132195281, and WU 132270738. |
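For anyone comparing several failed results like the one above, a small script can tally the recurring markers in the `stderr_txt` text. The sketch below is only a convenience for reading such output: the function name and the returned keys are made up here, and the marker strings are copied from the output quoted in the post rather than from any BOINC or Rosetta documentation.

```python
import re

# Shortened excerpt of the stderr text quoted in the post above.
SAMPLE = """\
# cpu_run_time_pref: 21600
# random seed: 1309647
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 21600
No heartbeat from core client for 31 sec - exiting
Too many restarts with no progress. Keep application in memory while preempted.
BOINC :: Watchdog shutting down...
"""


def summarize_stderr(text: str) -> dict:
    """Count heartbeat exits and flag the restart/watchdog messages (hypothetical helper)."""
    # Each heartbeat exit reports how many seconds passed without a heartbeat.
    timeouts = [int(s) for s in re.findall(r"No heartbeat from core client for (\d+) sec", text)]
    return {
        "heartbeat_exits": len(timeouts),
        "max_timeout_sec": max(timeouts, default=0),
        "too_many_restarts": "Too many restarts with no progress" in text,
        "watchdog_shutdown": "Watchdog shutting down" in text,
    }


summary = summarize_stderr(SAMPLE)
print(summary)
```

Run over the full text from each of the four work units, this would show at a glance whether they all follow the same pattern of repeated heartbeat exits followed by the "Too many restarts" message.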
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
And another 2 errors. The second one didn't even start; it said maximum disk usage exceeded, but the next task started just fine, so I'm like WTF?! The first one errored out because of sin/cos out of range. In the last few days I had 6 errors and only 1 success, almost all on tasks like "mlt__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-2mlt_-crystal_foldanddock". Does anyone have an idea what's going on? I would like someone to explain this. |
Michael Matthews Joined: 12 Dec 05 Posts: 3 Credit: 37,852 RAC: 0 |
I am running BOINC Manager 5.10.30 on a Windows XP SP 2 operating system (with 1 GB of RAM). The entire computer has crashed and switched off whenever the Rosetta 5.93 Beta has been running. This only happens when the BOINC Manager runs Rosetta 5.93 Beta (why is a beta version being sent out?). When the BOINC Manager runs the SETI@home application there are never any crashes. I believe that the Rosetta work units that crashed are: 120779979 <https://boinc.bakerlab.org/rosetta/workunit.php?wuid=120779979> 120604480 <https://boinc.bakerlab.org/rosetta/workunit.php?wuid=120604480> I looked in the <C:\Program Files\BOINC> and <C:\Program Files\BOINC\projects\boinc.bakerlab.org_rosetta> directories for some kind of log file that might explain what the error was that caused the crash, but I did not find anything conclusive. I did see this part of a log in the <C:\Program Files\BOINC\stdoutdae.txt> log file, which corresponds to around the last time my system crashed and powered off last night (timestamps are in the PST timezone): 10-Jan-2008 10:52:04 [SETI@home] Sending scheduler request: To fetch work. Requesting 94 seconds of work, reporting 0 completed tasks 10-Jan-2008 10:52:39 [SETI@home] Scheduler request succeeded: got 0 new tasks 10-Jan-2008 10:52:44 [rosetta@home] Sending scheduler request: To fetch work. 
Requesting 2890 seconds of work, reporting 0 completed tasks 10-Jan-2008 10:52:49 [rosetta@home] Scheduler request succeeded: got 1 new tasks 10-Jan-2008 10:52:51 [rosetta@home] Started download of vf_1ail_.fasta.gz 10-Jan-2008 10:52:51 [rosetta@home] Started download of vf_1ail_.psipred_ss2.gz 10-Jan-2008 10:52:52 [rosetta@home] Finished download of vf_1ail_.fasta.gz 10-Jan-2008 10:52:52 [rosetta@home] Finished download of vf_1ail_.psipred_ss2.gz 10-Jan-2008 10:52:52 [rosetta@home] Started download of paths.vf2.17.5.txt.gz 10-Jan-2008 10:52:52 [rosetta@home] Started download of boinc_vf_aa1ail_03_05.200_v1_3.gz 10-Jan-2008 10:52:54 [rosetta@home] Finished download of paths.vf2.17.5.txt.gz 10-Jan-2008 10:52:54 [rosetta@home] Finished download of boinc_vf_aa1ail_03_05.200_v1_3.gz 10-Jan-2008 10:52:54 [rosetta@home] Started download of boinc_vf_aa1ail_09_05.200_v1_3.gz 10-Jan-2008 10:52:54 [rosetta@home] Started download of boinc_vf_aa1ail_05_05.200_v1_3.gz 10-Jan-2008 10:52:55 [rosetta@home] Finished download of boinc_vf_aa1ail_09_05.200_v1_3.gz 10-Jan-2008 10:52:55 [rosetta@home] Finished download of boinc_vf_aa1ail_05_05.200_v1_3.gz 10-Jan-2008 10:52:55 [rosetta@home] Started download of boinc_vf_aa1ail_17_05.200_v1_3.gz 10-Jan-2008 10:52:55 [rosetta@home] Started download of vf_1ail.pdb.gz 10-Jan-2008 10:52:56 [rosetta@home] Finished download of vf_1ail.pdb.gz 10-Jan-2008 10:52:56 [rosetta@home] Started download of abrelax_description.txt 10-Jan-2008 10:52:58 [rosetta@home] Finished download of boinc_vf_aa1ail_17_05.200_v1_3.gz 10-Jan-2008 10:52:58 [rosetta@home] Finished download of abrelax_description.txt 10-Jan-2008 10:53:28 [rosetta@home] Starting 1ail__BOINC_ABRELAX_VF_IGNORE_THE_REST-S25-17-S3-5--1ail_-vf__2534_625_0 10-Jan-2008 10:53:28 [rosetta@home] Starting task 1ail__BOINC_ABRELAX_VF_IGNORE_THE_REST-S25-17-S3-5--1ail_-vf__2534_625_0 using rosetta_beta version 593 10-Jan-2008 11:51:17 [---] Suspending computation - user is active 10-Jan-2008 11:51:17 
[---] Suspending network activity - user is active 10-Jan-2008 13:18:22 [---] Resuming computation 10-Jan-2008 13:18:22 [---] Resuming network activity 10-Jan-2008 13:22:39 [---] Suspending computation - user is active 10-Jan-2008 13:22:39 [---] Suspending network activity - user is active 10-Jan-2008 14:23:53 [---] Resuming computation 10-Jan-2008 14:23:53 [---] Resuming network activity 10-Jan-2008 14:34:18 [---] Suspending computation - user is active 10-Jan-2008 14:34:18 [---] Suspending network activity - user is active 10-Jan-2008 14:58:29 [---] Resuming computation 10-Jan-2008 14:58:29 [---] Resuming network activity 10-Jan-2008 15:20:35 [SETI@home] Restarting task 01mr07ag.12577.7025.16.6.28_2 using setiathome_enhanced version 527 10-Jan-2008 15:25:19 [---] Suspending computation - user is active 10-Jan-2008 15:25:19 [---] Suspending network activity - user is active 10-Jan-2008 15:45:19 [---] Resuming computation 10-Jan-2008 15:45:19 [---] Resuming network activity 10-Jan-2008 16:33:22 [SETI@home] Sending scheduler request: To fetch work. Requesting 82 seconds of work, reporting 0 completed tasks 10-Jan-2008 16:33:27 [SETI@home] Scheduler request succeeded: got 1 new tasks 10-Jan-2008 16:33:29 [SETI@home] Started download of 22fe07ah.17355.19704.14.6.137 10-Jan-2008 16:33:31 [SETI@home] Finished download of 22fe07ah.17355.19704.14.6.137 10-Jan-2008 17:59:38 [SETI@home] Computation for task 01mr07ag.12577.7025.16.6.28_2 finished 10-Jan-2008 17:59:38 [SETI@home] Starting 29no06ae.28122.22955.13.6.124_0 10-Jan-2008 17:59:38 [SETI@home] Starting task 29no06ae.28122.22955.13.6.124_0 using setiathome_enhanced version 527 10-Jan-2008 17:59:41 [SETI@home] Started upload of 01mr07ag.12577.7025.16.6.28_2_0 10-Jan-2008 17:59:43 [SETI@home] Finished upload of 01mr07ag.12577.7025.16.6.28_2_0 10-Jan-2008 18:21:41 [SETI@home] Sending scheduler request: To fetch work. 
Requesting 64 seconds of work, reporting 1 completed tasks 10-Jan-2008 18:21:46 [SETI@home] Scheduler request succeeded: got 1 new tasks 10-Jan-2008 18:21:48 [SETI@home] Started download of 02mr07ad.17282.483.5.6.18 10-Jan-2008 18:21:49 [SETI@home] Finished download of 02mr07ad.17282.483.5.6.18 11-Jan-2008 06:47:59 [---] Starting BOINC client version 5.10.30 for windows_intelx86 11-Jan-2008 06:47:59 [---] log flags: task, file_xfer, sched_ops 11-Jan-2008 06:47:59 [---] Libraries: libcurl/7.17.1 OpenSSL/0.9.8e zlib/1.2.3 11-Jan-2008 06:47:59 [---] Data directory: C:Program FilesBOINC 11-Jan-2008 06:47:59 [---] Processor: 1 GenuineIntel Intel(R) Pentium(R) 4 CPU 2.20GHz [x86 Family 15 Model 2 Stepping 4] 11-Jan-2008 06:47:59 [---] Processor features: fpu tsc sse sse2 mmx 11-Jan-2008 06:47:59 [---] OS: Microsoft Windows XP: Professional Edition, Service Pack 2, (05.01.2600.00) 11-Jan-2008 06:47:59 [---] Memory: 1023.48 MB physical, 2.40 GB virtual 11-Jan-2008 06:47:59 [---] Disk: 74.52 GB total, 27.12 GB free 11-Jan-2008 06:47:59 [---] Local time is UTC -8 hours 11-Jan-2008 06:47:59 [rosetta@home] URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 99273; location: home; project prefs: default 11-Jan-2008 06:47:59 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 33200; location: (none); project prefs: default 11-Jan-2008 06:47:59 [---] General prefs: from rosetta@home (last modified 12-Dec-2005 11:43:10) 11-Jan-2008 06:47:59 [---] Host location: home 11-Jan-2008 06:47:59 [---] General prefs: no separate prefs for home; using your defaults 11-Jan-2008 06:47:59 [---] Preferences limit memory usage when active to 511.74MB 11-Jan-2008 06:47:59 [---] Preferences limit memory usage when idle to 921.14MB 11-Jan-2008 06:47:59 [---] Preferences limit disk usage to 0.93GB This same behavior had also happened sometime ago (I don't remember when). 
I am seriously considering dropping Rosetta because they release BOINC applications that crash my system, and I cannot afford that. -Michael |
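When a machine powers off without warning, the client log normally just stops and then resumes with a "Starting BOINC client" line at the next boot. A short script can locate that boundary and report the last entry logged before it. The sketch below assumes the `DD-Mon-YYYY HH:MM:SS [project] message` layout seen in the log quoted above; the function names are invented for this example, and the regular expression is written to cope with entries that have been run together on one line, as they are in this thread.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout as it appears in the quoted log, e.g. "10-Jan-2008 18:21:49".
TS = r"\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}"
# One entry: timestamp, [project], then the message up to the next timestamp or the end.
ENTRY = re.compile(r"(" + TS + r") \[([^\]]+)\] (.*?)(?=\s*" + TS + r" \[|\s*\Z)", re.S)

# Excerpt taken from the log quoted in the post above.
SAMPLE_LOG = (
    "10-Jan-2008 18:21:48 [SETI@home] Started download of 02mr07ad.17282.483.5.6.18 "
    "10-Jan-2008 18:21:49 [SETI@home] Finished download of 02mr07ad.17282.483.5.6.18 "
    "11-Jan-2008 06:47:59 [---] Starting BOINC client version 5.10.30 for windows_intelx86 "
    "11-Jan-2008 06:47:59 [---] log flags: task, file_xfer, sched_ops"
)


def parse_entries(text):
    """Split log text into (timestamp, project, message) tuples."""
    return [
        (datetime.strptime(ts, "%d-%b-%Y %H:%M:%S"), project, message.strip())
        for ts, project, message in ENTRY.findall(text)
    ]


def last_event_before_restart(entries):
    """Return (entry before the first client start, time gap), or None if no start is found."""
    for previous, current in zip(entries, entries[1:]):
        if current[2].startswith("Starting BOINC client"):
            return previous, current[0] - previous[0]
    return None


entries = parse_entries(SAMPLE_LOG)
result = last_event_before_restart(entries)
print(result)
```

This only shows where the log went quiet; it cannot say which task was executing at the moment of the shutdown unless the log recorded it, so the task start and restart lines earlier in the log still need to be read alongside the result.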
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
The suspending because the user is active can be corrected by disabling this feature in your profile in RAH. Go to computing preferences, and on the second line from the top, 'suspend work while computer is in use', change that to NO. That should get rid of the "Suspending computation - user is active" / "Suspending network activity - user is active" problems. Since your two tasks have not reported back to the project yet, there is nothing to see online about why they may or may not have crashed. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
The entire computer has crashed and switched off whenever the Rosetta 5.93 Beta has been running. Michael, if your puter is shutting down, then some basic computer parameter is being triggered. Have you checked for dust build-up on your CPU heatsink, RAM, power supply unit, etc. lately? I believe what you are experiencing is heat related, or at least that's my "best guess". I can't think of anything in SETI or Rosetta (or any other BOINC science app) that would cause a system shutdown. When any application runs, it creates heat. Depending upon the application, some apps work your processor harder than others, but all try to get 100% use out of the processor. The more efficient an app becomes, the hotter the processor should get, as it's doing more in less time. Anyway, I'd check the temps and/or dust accumulation inside your puter. tony |
Michael Matthews Joined: 12 Dec 05 Posts: 3 Credit: 37,852 RAC: 0 |
The entire computer has crashed and switched off whenever the Rosetta 5.93 Beta has been running. The computer does not have any dust build-up or fan problems. The shutting down problem only occurs with the Rosetta Beta 5.93 application and no other software (even software with high CPU usage). The computer did not shut down until Rosetta Beta 5.93 was sent to my computer to run. As I stated before, the SETI@home application (version Enhanced 5.27) never causes this problem (it runs 80% of the time BOINC runs). The computer crashes only with Rosetta Beta 5.93. -Michael |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,860,059 RAC: 3,073 |
do you have graphics/screensaver enabled? |
©2024 University of Washington
https://www.bakerlab.org