Message boards : Number crunching : Problems with Rosetta version 5.93
Author | Message |
---|---|
Steve Dodd Joined: 13 Dec 05 Posts: 7 Credit: 3,811,680 RAC: 935 |
I've had a problem recently with wus going way past the allotment time (8 hrs for my preferences). I've had 2 get stuck in the 90% complete range and no further. Looking at the graphics showed the step for the model being tested as not incrementing. WU numbers are: 123352364 and 123338380. Crunch time was ~19 hrs. each. |
Steve Dodd Joined: 13 Dec 05 Posts: 7 Credit: 3,811,680 RAC: 935 |
Add wu 121455059 |
Ingemar Joined: 28 Feb 06 Posts: 20 Credit: 1,680 RAC: 0 |
The 2h4o**** jobs were for a very large protein with a very complicated architecture, so Rosetta gets stuck a lot during model generation. No more jobs of this variety will be sent out, given the problems you report. |
EdMulock Joined: 14 Mar 06 Posts: 30 Credit: 2,347,485 RAC: 0 |
Any clue? This happens on 8 different tasks. Reboots, BOINC upgrade to 5.10.30, reset project, abort task: nothing helps. 2/5/2008 1:30:49 PM|rosetta@home|Task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 exited with a DLL initialization error. 2/5/2008 1:30:49 PM|rosetta@home|If this happens repeatedly you may need to reboot your computer. 2/5/2008 1:30:49 PM|rosetta@home|Restarting task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 using rosetta_beta version 593 2/5/2008 1:30:55 PM|rosetta@home|Task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 exited with a DLL initialization error. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
1tit__BOINC_ABRELAX_VF_IGNORE_THE_REST-S25-11-S3-9--1tit_-vf__2731_81_0 died with client error and this message: <core_client_version>5.10.30</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 3223320 No credit granted. This happened on Feb 2. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Ed, is it just the one task? Or are you now having similar problems with other tasks as well? If it's just the one task, aborting it should clear up its problems. If it's happening on all of your tasks, I can only suggest detaching from the project and then attaching again. This will download a fresh copy of all of the DLLs. Rosetta Moderator: Mod.Sense |
EdMulock Joined: 14 Mar 06 Posts: 30 Credit: 2,347,485 RAC: 0 |
Now about 120 different tasks. I've done that ( reset project about 5 times ). ( As stated in the first post ) All finish with "compute error" as reported status; and all restart ( over and over ) after about 4 seconds. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Ed, I was not, and am not, clear on exactly what you've done. Did you "reset" the project? Or did you "detach", then "attach" again? I am suggesting a complete detach. Is it possible a virus scanner is consistently corrupting one of the files as they reload? You might try reinstalling BOINC to a new directory, and see if that triggers a message from an antivirus product that you may have overlooked originally. Rosetta Moderator: Mod.Sense |
stoneysilence Joined: 4 May 07 Posts: 13 Credit: 401,055 RAC: 0 |
Got my first Failed Task to my knowledge tonight. Been having problems with the MiniRosettas so at first I thought it was one of them. But after I researched it found it was a 5.93 task. Only ran for a bit over an hour before it apparently crashed. Most units run for 1.5/2.9 hours at least. Something obviously went haywire. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=127347713 https://boinc.bakerlab.org/rosetta/result.php?resultid=139825592 |
aeryise Joined: 5 Nov 07 Posts: 1 Credit: 47,149 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=140509732 https://boinc.bakerlab.org/rosetta/result.php?resultid=138228394 I've also had the strange problem of tasks restarting from zero although when I stopped BOINC and shut down my computer the day before, they were at 30+% or 70+% i.e. nonzero completion. Not sure if this is related to 5.93 in any way, but this restart has only been happening in the past 3 days. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
aeryise, anytime you exit BOINC, you will lose some amount of work. The program would burn up your disk drive (and a lot of valuable computer time) if it were constantly storing everything it has done so far. So, periodically, it does a "checkpoint" where it preserves the work done so far. Some types of tasks are able to checkpoint more frequently than others. The % completed is relative to your configured setting for your preferred runtime, so it doesn't tell you definitively how much work has been saved. In general, the project tries to checkpoint about every 15 minutes, but there are some types of tasks that cannot do so, and may go for an hour or more without taking a checkpoint. So if a checkpoint has not been reached when you exit BOINC, the task will restart at 0% complete. It should then proceed normally. Don't worry: if you do several restarts like this without reaching a checkpoint, the Rosetta "watch dog" will figure out that this particular task is not a good fit for your machine, purge it, and get another task which may be able to checkpoint more frequently. Rosetta Moderator: Mod.Sense |
Weasel Joined: 20 Nov 06 Posts: 1 Credit: 334,404 RAC: 0 |
Well, I don’t have the time to read all the posts here (especially since even at 1024 x 768 I have to scroll sideways to read them), so I’ll just state my problems. Even with “Leave applications in memory while suspended” set to NO, R@H still hangs around after suspending the project, which, with 350 MB for a WU, I have to do to get any work done. So – memory hog WUs require suspension, which refuses to give up memory. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Weasel, do you run Linux? Windows? Or Mac? Edit, I see now that all of your hosts are Windows. Rosetta Moderator: Mod.Sense |
KWSN THE Holy Hand Grenade! Joined: 3 May 07 Posts: 5 Credit: 2,542,452 RAC: 0 |
1tit__BOINC_ABRELAX_VF_IGNORE_THE_REST-S25-11-S3-9--1tit_-vf__2731_81_0 died with client error and this message: Greg, see my Message 50716, in this thread (on Jan 15) - I'm glad that I'm not the only one with the problem! (Note that it must be R@H 5.93, as this happened on two different OS's and two different builds of BOINC) |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
KWSN, when I click on the link it says no such task. I went looking at your number 2 computer and noticed all the compute errors for all the FRA_t847__2 work. You should post those errors so the team knows that a lot of that work crashes, unless you're running on RALPH. If that stuff crashes on your system, I've got to wonder if it's going to die on mine. |
Thomas Leibold Joined: 30 Jul 06 Posts: 55 Credit: 19,627,164 RAC: 0 |
Just checked on one of the servers whose performance was below par and found that it was still "running" on a 1zpy workunit. The workunit deadline expired over 1 month ago, confirming that short of manually aborting misbehaving workunits they will never stop on their own. OS: SuSE Linux 10.1. Boinc: 5.10.21. Rosetta: 5.93. Workunit: ? (no idea which number, long gone from the server!). stderr.txt: "Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -95.2845 for 900 seconds". This is (as usual!!!) followed by a SIGSEGV with the watchdog crashing and the client failing to terminate properly (and since the client process remains alive Boinc never finds out that there is anything wrong). I'm well aware that this is not specific to the 5.93 client since that issue has been around for a long time, just reporting that it is still an issue. Team Helix |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I've got a validate error on this one, first time ever. I don't know what happened; it ran normally and finished. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=131395160 pete. |
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
Hi, Possibly not a Rosetta 5.93 error but one with the Boinc Manager - apologies in advance if this question should go elsewhere. I have added Boinc to a friend's HP Vista laptop and registered with Rosetta as LizzieBarry. I noticed that the Boinc manager isn't given permission to run at bootup on the machine by Windows Defender. I'm able to give it permission to run after the computer hits the desktop, but the computer owner is a complete novice and would not be able to do so herself. I'm not a greatly technical person and only use XP at home, so I'm not familiar with Vista, but can someone advise how I can ensure Boinc manager starts on bootup without any user intervention? It's been suggested I go into Boinc Manager properties and choose 'Run as administrator' but this hasn't been successful either. Is it an issue with BM, Defender or the Vista OS? Has anyone else seen this and found a solution I can use, or can anyone provide a link? Also, if this question gets moved to a better topic, can someone mail me with its new location? Any help much appreciated. |
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,607,712 RAC: 9,914 |
I'm not running Vista myself, but maybe it will work if BOINC is not installed in the 'Program Files'-folder, but in the root of the C: partition, for example. Client Download Error. This could just be a bad WU. Task ID 146467347, WU ID 133460753. I gotta learn how to use links on this message board. Thx! Paul |
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,607,712 RAC: 9,914 |
More 5.93 errors. Here are a couple more errors with 5.93 (Task ID / WU ID, followed by the remaining columns as listed): 146382856 / 133443746, Client error, Downloading, 0.00, 0.00; 146381856 / 133443796, Client error, Compute error, 0.00, 0.00; 146381324 / 133443322, Client error, Downloading, 0.00, 0.00; 146380385 / 133402979, Client error, Compute error, 0.00, 0.00. Thx! Paul |
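
Mod.Sense's explanation of checkpointing in the thread above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Rosetta's or BOINC's actual code: the names `save_checkpoint`, `load_checkpoint`, and `run_task`, the JSON file format, and the use of simulated time are all hypothetical. The only figure taken from the thread is the "about every 15 minutes" interval.

```python
import json
import os
import tempfile

# "About every 15 minutes", per the moderator's post. Real tasks vary.
CHECKPOINT_EVERY = 15 * 60  # seconds of (simulated) work between checkpoints


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the old checkpoint,
    so an interrupted write cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state (0% complete) if none exists."""
    if not os.path.exists(path):
        return {"steps_done": 0}
    with open(path) as handle:
        return json.load(handle)


def run_task(path, total_steps, step_seconds, stop_after=None):
    """Simulate a task in which each step costs `step_seconds` of work.

    `stop_after` simulates the user exiting after that many steps in this
    session. Returns the number of steps held in memory at exit; anything
    done since the last checkpoint is lost on the next start.
    """
    state = load_checkpoint(path)
    since_checkpoint = 0
    steps_this_session = 0
    while state["steps_done"] < total_steps:
        if stop_after is not None and steps_this_session >= stop_after:
            break
        state["steps_done"] += 1
        steps_this_session += 1
        since_checkpoint += step_seconds
        if since_checkpoint >= CHECKPOINT_EVERY:
            save_checkpoint(path, state)
            since_checkpoint = 0
    return state["steps_done"]
```

With one-minute steps, exiting after 20 steps leaves only the first 15 on disk, and exiting after 10 steps leaves nothing, which matches the "restarts from zero" behavior aeryise describes.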
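
The watchdog message quoted by Thomas Leibold ("Stuck at score -95.2845 for 900 seconds") suggests a check along the following lines. This is a hypothetical sketch of the detection step only: the class name `ScoreWatchdog` and its interface are assumptions, the 900-second limit is taken from the quoted stderr text, and the actual Rosetta watchdog may work differently. Ending the process, which is the step reported as failing on Linux, is deliberately left out.

```python
# 900 seconds comes from the stderr message quoted in the thread.
STUCK_LIMIT = 900


class ScoreWatchdog:
    """Flags a run whose score has not changed for `stuck_limit` seconds."""

    def __init__(self, stuck_limit=STUCK_LIMIT):
        self.stuck_limit = stuck_limit
        self.last_score = None
        self.last_change = None

    def observe(self, score, now):
        """Record `score` seen at time `now` (in seconds).

        Returns True when the score has been unchanged for at least
        `stuck_limit` seconds, meaning the run should be ended.
        """
        if self.last_score is None or score != self.last_score:
            # First observation or a new score: restart the stuck timer.
            self.last_score = score
            self.last_change = now
            return False
        return now - self.last_change >= self.stuck_limit
```

A caller would poll `observe` with the current score and a clock reading, and act once it returns True. Passing the time in as an argument keeps the sketch easy to test without waiting.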
©2024 University of Washington
https://www.bakerlab.org