Problems with Rosetta version 5.93

Message boards : Number crunching : Problems with Rosetta version 5.93

Steve Dodd
Joined: 13 Dec 05
Posts: 7
Credit: 3,811,680
RAC: 935
Message 51098 - Posted: 30 Jan 2008, 20:29:51 UTC
Last modified: 30 Jan 2008, 20:31:29 UTC

I've recently had a problem with WUs running well past the allotted time (8 hrs in my preferences). Two got stuck in the 90% complete range and went no further. The graphics showed that the step for the model being tested was not incrementing. The WU numbers are 123352364 and 123338380. Crunch time was ~19 hrs each.
ID: 51098 · Rating: 0
Steve Dodd
Joined: 13 Dec 05
Posts: 7
Credit: 3,811,680
RAC: 935
Message 51108 - Posted: 1 Feb 2008, 5:07:28 UTC

Add WU 121455059.
ID: 51108 · Rating: 0
Ingemar
Joined: 28 Feb 06
Posts: 20
Credit: 1,680
RAC: 0
Message 51147 - Posted: 3 Feb 2008, 2:56:27 UTC
Last modified: 3 Feb 2008, 2:57:11 UTC

The 2h4o**** jobs were of a very large protein with a very complicated architecture, so Rosetta gets stuck a lot during model generation. No more jobs of this variety will be sent out, due to the problems you report.
ID: 51147 · Rating: 0
EdMulock
Joined: 14 Mar 06
Posts: 30
Credit: 2,347,485
RAC: 0
Message 51185 - Posted: 5 Feb 2008, 18:37:13 UTC


Any clue? This happens on 8 different tasks. Reboots, a BOINC upgrade to 5.10.30, resetting the project, aborting the task: nothing helps.


2/5/2008 1:30:49 PM|rosetta@home|Task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 exited with a DLL initialization error.
2/5/2008 1:30:49 PM|rosetta@home|If this happens repeatedly you may need to reboot your computer.
2/5/2008 1:30:49 PM|rosetta@home|Restarting task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 using rosetta_beta version 593
2/5/2008 1:30:55 PM|rosetta@home|Task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 exited with a DLL initialization error.
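When the same failure repeats across many tasks, it can help to tally the errors per task straight from the BOINC message log. The following is a minimal sketch, assuming the log has the pipe-separated `timestamp|project|message` layout seen in the lines above; the function name `count_dll_errors` and the constant names are hypothetical and not part of BOINC.

```python
from collections import Counter

# Sample taken from the log lines quoted in the post above.
SAMPLE_LOG = """\
2/5/2008 1:30:49 PM|rosetta@home|Task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 exited with a DLL initialization error.
2/5/2008 1:30:49 PM|rosetta@home|If this happens repeatedly you may need to reboot your computer.
2/5/2008 1:30:49 PM|rosetta@home|Restarting task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 using rosetta_beta version 593
2/5/2008 1:30:55 PM|rosetta@home|Task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 exited with a DLL initialization error.
"""

# Assumption: the error text always ends the message exactly like this.
ERROR_SUFFIX = " exited with a DLL initialization error."


def count_dll_errors(log_text):
    """Return a Counter mapping task name -> number of DLL initialization errors."""
    counts = Counter()
    for line in log_text.splitlines():
        # Split into timestamp, project, message; the message may contain anything.
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not a log line in the assumed layout
        message = parts[2]
        if message.startswith("Task ") and message.endswith(ERROR_SUFFIX):
            task_name = message[len("Task "):-len(ERROR_SUFFIX)]
            counts[task_name] += 1
    return counts


if __name__ == "__main__":
    for task, n in count_dll_errors(SAMPLE_LOG).items():
        print(f"{n} DLL initialization error(s): {task}")
```

A tally like this makes it easier to tell one bad task from a problem that affects every task on the host.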

ID: 51185 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 51221 - Posted: 7 Feb 2008, 14:35:04 UTC

1tit__BOINC_ABRELAX_VF_IGNORE_THE_REST-S25-11-S3-9--1tit_-vf__2731_81_0 died with a client error and this message:

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3223320

No credit granted.

This happened on Feb 2.
ID: 51221 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 51222 - Posted: 7 Feb 2008, 16:18:57 UTC - in response to Message 51185.  


Any clue? This happens on 8 different tasks. Reboots, a BOINC upgrade to 5.10.30, resetting the project, aborting the task: nothing helps.


Ed, is it just the one task? Or are you now having similar problems with other tasks as well?

If it is just the one task, aborting it should clear up its problems.

If it's happening on all of your tasks, I can only suggest detaching from the project and then attaching again. This will download a fresh copy of all of the DLLs.
Rosetta Moderator: Mod.Sense
ID: 51222 · Rating: 0
EdMulock
Joined: 14 Mar 06
Posts: 30
Credit: 2,347,485
RAC: 0
Message 51241 - Posted: 8 Feb 2008, 15:36:20 UTC - in response to Message 51222.  


Any clue? This happens on 8 different tasks. Reboots, a BOINC upgrade to 5.10.30, resetting the project, aborting the task: nothing helps.


Ed, is it just the one task? Or are you now having similar problems with other tasks as well?

If it is just the one task, aborting it should clear up its problems.

If it's happening on all of your tasks, I can only suggest detaching from the project and then attaching again. This will download a fresh copy of all of the DLLs.



It is now about 120 different tasks. I've done that (reset the project about 5 times), as stated in my first post.

All finish with "compute error" as the reported status, and all restart (over and over) after about 4 seconds.
ID: 51241 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 51242 - Posted: 8 Feb 2008, 16:06:01 UTC

Ed, I was not, and am not, clear on exactly what you've done. Did you "reset" the project? Or did you "detach", then "attach" again? I am suggesting a complete detach.

Is it possible a virus scanner is consistently corrupting one of the files as they reload? You might try reinstalling BOINC to a new directory, and see if that triggers a message from an antivirus product that you may have overlooked originally.
Rosetta Moderator: Mod.Sense
ID: 51242 · Rating: 0
stoneysilence
Joined: 4 May 07
Posts: 13
Credit: 401,055
RAC: 0
Message 51324 - Posted: 11 Feb 2008, 7:29:31 UTC

Tonight I got my first failed task, to my knowledge. I've been having problems with the MiniRosettas, so at first I thought it was one of them, but after researching it I found it was a 5.93 task. It only ran for a bit over an hour before it apparently crashed. Most units run for 1.5/2.9 hours at least. Something obviously went haywire.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=127347713
https://boinc.bakerlab.org/rosetta/result.php?resultid=139825592
ID: 51324 · Rating: 0
aeryise
Joined: 5 Nov 07
Posts: 1
Credit: 47,149
RAC: 0
Message 51412 - Posted: 15 Feb 2008, 8:15:27 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=140509732
https://boinc.bakerlab.org/rosetta/result.php?resultid=138228394

I've also had the strange problem of tasks restarting from zero, although when I stopped BOINC and shut down my computer the day before they were at 30+% or 70+%, i.e. nonzero completion. I'm not sure if this is related to 5.93 in any way, but this restart has only been happening in the past 3 days.
ID: 51412 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 51416 - Posted: 15 Feb 2008, 17:07:34 UTC

aeryise, any time you exit BOINC, you will lose some amount of work. The program would burn up your disk drive (and a lot of valuable computer time) if it were constantly storing everything it has done so far. So, periodically, it does a "checkpoint" where it preserves the work done so far. Some types of tasks are able to checkpoint more frequently than others.

The % completed is relative to your configured preferred runtime, so it doesn't tell you definitively how much work has been saved. In general, the project tries to checkpoint about every 15 minutes, but there are some types of tasks that cannot do so, and may go for an hour or more without taking a checkpoint.

So if a checkpoint has not been reached when you exit BOINC, it will restart at 0% complete. It should then proceed normally. Don't worry, if you do several restarts like this without reaching a checkpoint, the Rosetta "watch dog" will figure out that this particular task is not a good fit for your machine, and purge it and get another task which may be able to checkpoint more frequently.
Rosetta Moderator: Mod.Sense
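The checkpoint behaviour described in this post can be illustrated with a small sketch. This is not Rosetta's or BOINC's actual code: the names `run_task` and `load_checkpoint`, the JSON state file, and the use of step counts instead of the roughly 15-minute wall-clock interval are all assumptions made for illustration.

```python
import json
import os
import tempfile


def load_checkpoint(state_path):
    """Return the last checkpointed step, or 0 if no checkpoint exists yet."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as f:
        return json.load(f)["step"]


def run_task(total_steps, checkpoint_every, state_path, stop_after=None):
    """Run from the last checkpoint; return the step reached in memory.

    stop_after simulates the user exiting after that many steps in this run.
    Progress made since the last checkpoint is lost on exit.
    """
    step = load_checkpoint(state_path)
    executed = 0
    while step < total_steps:
        if stop_after is not None and executed >= stop_after:
            return step  # simulated exit; nothing is saved here
        step += 1  # stand-in for one unit of real work
        executed += 1
        if step % checkpoint_every == 0:
            # Write to a temp file and rename, so a crash mid-write
            # cannot leave a half-written checkpoint behind.
            tmp_path = state_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"step": step}, f)
            os.replace(tmp_path, state_path)
    return step


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "state.json")
    reached = run_task(20, 5, path, stop_after=7)
    print("reached", reached, "but saved", load_checkpoint(path))
    print("after restart, finished at", run_task(20, 5, path))
```

In this sketch, exiting before the first checkpoint leaves nothing saved, so the next run starts from step 0, which corresponds to the restart at 0% described above.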
ID: 51416 · Rating: 0
Weasel
Joined: 20 Nov 06
Posts: 1
Credit: 334,404
RAC: 0
Message 51551 - Posted: 22 Feb 2008, 0:35:22 UTC

Well, I don’t have the time to read all the posts here (especially since even at 1024 x 768 I have to scroll sideways to read them), so I’ll just state my problems.
Even with “Leave applications in memory while suspended” set to NO, R@H still hangs around after I suspend the project, which, with 350 MB for a WU, I have to do to get any work done.
So – memory hog WUs require suspension, which refuses to give up memory.

ID: 51551 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 51559 - Posted: 22 Feb 2008, 9:43:36 UTC
Last modified: 22 Feb 2008, 9:44:21 UTC

Weasel, do you run Linux? Windows? Or Mac?

Edit: I see now that all of your hosts are Windows.
Rosetta Moderator: Mod.Sense
ID: 51559 · Rating: 0
KWSN THE Holy Hand Grenade!
Joined: 3 May 07
Posts: 5
Credit: 2,542,452
RAC: 0
Message 51660 - Posted: 26 Feb 2008, 18:03:55 UTC - in response to Message 51221.  

1tit__BOINC_ABRELAX_VF_IGNORE_THE_REST-S25-11-S3-9--1tit_-vf__2731_81_0 died with a client error and this message:

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3223320

No credit granted.

This happened on Feb 2.


Greg, see my Message 50716 in this thread (on Jan 15). I'm glad that I'm not the only one with the problem! (Note that it must be R@H 5.93, as this happened on two different OSes and two different builds of BOINC.)
ID: 51660 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 51663 - Posted: 26 Feb 2008, 18:39:20 UTC

KWSN, when I click on the link it says no such task.
I went looking at your number 2 computer and noticed all the compute errors for all the FRA_t847__2 work. You should post those errors so the team knows that a lot of that work crashes, unless you're running on RALPH.

If that stuff crashes on your system, I've got to wonder if it's going to die on mine.
ID: 51663 · Rating: 0
Thomas Leibold
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 51700 - Posted: 27 Feb 2008, 23:11:57 UTC

Just checked on one of the servers whose performance was below par and found that it was still "running" on a 1zpy workunit. The workunit deadline expired over 1 month ago, confirming that short of manually aborting misbehaving workunits they will never stop on their own.

OS: SuSE Linux 10.1
Boinc: 5.10.21
Rosetta: 5.93
Workunit: ? no idea which number, long gone from the server!

stderr.txt:
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -95.2845 for 900 seconds

This is (as usual!!!) followed by a SIGSEGV with the watchdog crashing and the client failing to terminate properly (and since the client process remains alive Boinc never finds out that there is anything wrong).
I'm well aware that this is not specific to the 5.93 client since that issue has been around for a long time, just reporting that it is still an issue.
Team Helix
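The "stuck at score ... for 900 seconds" message suggests a check that fires when the score has not changed for a fixed period. The sketch below shows that idea only. It is not Rosetta's watchdog: the class name `StuckScoreWatchdog`, the exact-equality comparison of scores, and the explicit `now` argument are assumptions, and the 900-second limit is simply taken from the stderr output quoted above.

```python
class StuckScoreWatchdog:
    """Flag a run whose score has not changed for limit_seconds."""

    def __init__(self, limit_seconds=900.0):
        self.limit_seconds = limit_seconds
        self.last_score = None   # most recent distinct score seen
        self.last_change = None  # time at which that score first appeared

    def observe(self, score, now):
        """Record a score sample taken at time `now` (in seconds).

        Returns True once the same score has been seen for at least
        limit_seconds, and False otherwise.
        """
        if self.last_score is None or score != self.last_score:
            # The score moved (or this is the first sample): restart the clock.
            self.last_score = score
            self.last_change = now
            return False
        return (now - self.last_change) >= self.limit_seconds


if __name__ == "__main__":
    watchdog = StuckScoreWatchdog(limit_seconds=900.0)
    samples = [(0, -90.0), (100, -95.2845), (600, -95.2845), (1000, -95.2845)]
    for now, score in samples:
        if watchdog.observe(score, now):
            print(f"Stuck at score {score} for {now - watchdog.last_change:.0f} seconds")
```

Detecting the stall is only half of the job. As the post notes, the run still has to be ended cleanly afterwards so that BOINC learns the task has stopped, and this sketch does not address that part.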
ID: 51700 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 51728 - Posted: 29 Feb 2008, 21:15:02 UTC

I've got a validate error on this one, for the first time ever. I don't know what happened; it ran normally and finished.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=131395160

pete.

ID: 51728 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,518,559
RAC: 10,612
Message 51780 - Posted: 3 Mar 2008, 16:01:10 UTC

Hi,
This is possibly not a Rosetta 5.93 error but one with the BOINC Manager. Apologies in advance if this question should go elsewhere.

I have added BOINC to a friend's HP Vista laptop and registered with Rosetta as LizzieBarry. I noticed that Windows Defender doesn't give the BOINC Manager permission to run at bootup on the machine. I'm able to give it permission to run after the computer reaches the desktop, but the computer's owner is a complete novice and would not be able to do so herself. I'm not a greatly technical person and only use XP at home, so I'm not familiar with Vista. Can someone advise how I can ensure the BOINC Manager starts on bootup without any user intervention? It's been suggested that I go into the BOINC Manager properties and choose 'Run as administrator', but this hasn't been successful either.

Is it an issue with BM, Defender or the Vista OS? Has anyone else seen this and found a solution I can use, or can anyone provide a link?

Also, if this question gets moved to a better topic, can someone mail me with its new location? Any help much appreciated.
ID: 51780 · Rating: 0
Paul
Joined: 29 Oct 05
Posts: 193
Credit: 66,607,429
RAC: 9,920
Message 51861 - Posted: 9 Mar 2008, 11:59:40 UTC - in response to Message 51782.  

I'm not running Vista myself, but maybe it will work if BOINC is not installed in the 'Program Files' folder, but in the root of the C: partition, for example.


Client Download Error. This could just be a bad WU

Task ID 146467347
WU ID 133460753

I gotta learn how to use links on this message board.
Thx!

Paul

ID: 51861 · Rating: 0
Paul
Joined: 29 Oct 05
Posts: 193
Credit: 66,607,429
RAC: 9,920
Message 51862 - Posted: 9 Mar 2008, 12:05:06 UTC

More 5.93 errors

Here are a couple more errors with 5.93

Task ID     WU ID
146382856   133443746   Client error   Downloading     0.00   0.00
146381856   133443796   Client error   Compute error   0.00   0.00
146381324   133443322   Client error   Downloading     0.00   0.00
146380385   133402979   Client error   Compute error   0.00   0.00


Thx!

Paul

ID: 51862 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org