Problems with Rosetta version 5.93

Message boards : Number crunching : Problems with Rosetta version 5.93

MerePeer
Joined: 6 Nov 05
Posts: 3
Credit: 1,787,446
RAC: 0
Message 51086 - Posted: 29 Jan 2008, 23:41:11 UTC - in response to Message 51070.  

Same here. Same problem with 2h4o__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK* just hanging. Restarting boinc results in same problem 8 hours later. Linux box.

ID: 51086 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 51087 - Posted: 30 Jan 2008, 1:11:08 UTC
Last modified: 30 Jan 2008, 1:36:10 UTC

I'm not sure what to think. Complaints about the 2h4o WUs started at least 5 days ago. I ran a test on one of mine starting 5 days ago, which leaves 3 full business days and two weekend days for management to make a statement. I've seen or heard nothing. How often do they monitor these boards? Are they of any importance? I'm feeling a bit like any "beta" tests or any other tests are really a waste of our man-hours and CPU seconds. Perhaps I'll be considered impatient... hmmmm... How long must one wait before one isn't considered as such?

I don't know. I know I've stopped ALL Rosetta work. It really isn't what I wanted, but I don't wanna "Pi**" away my CPU time for nothing when it might be spent more wisely. (I.e., if my machines are just going to use electricity without scientific benefit, what's the point of leaving them on?)

tony

I started at 200K and was shooting for 600K before stopping, but I guess 350K is OK, if that's what they want. (Well, it would stay at 350K, but I loaned out a machine before I knew the score, so I have to await its return before I remove it.)
ID: 51087 · Rating: 0
j2satx
Joined: 17 Sep 05
Posts: 97
Credit: 3,670,592
RAC: 0
Message 51089 - Posted: 30 Jan 2008, 2:24:58 UTC - in response to Message 51039.  

The problems I was getting over at Ralph appear to have carried over to Rosetta.

The Wu's starting with "2h4o" were causing problems on Ralph so I was surprised to see them over here on Rosetta.



Were you "really" surprised?
ID: 51089 · Rating: 0
Conan
Joined: 11 Oct 05
Posts: 146
Credit: 3,250,729
RAC: 161
Message 51090 - Posted: 30 Jan 2008, 8:08:24 UTC - in response to Message 51089.  

The problems I was getting over at Ralph appear to have carried over to Rosetta.

The Wu's starting with "2h4o" were causing problems on Ralph so I was surprised to see them over here on Rosetta.



Were you "really" surprised?


G'Day j2satx,
No I guess I was not, considering no response over on Ralph either. A lot of wasted time when these things run to over 21 hours and then often error out.
It is a shame, I do like the project and its goals, it was one of the best monitored and responsive projects for a good while.
ID: 51090 · Rating: 0
j2satx
Joined: 17 Sep 05
Posts: 97
Credit: 3,670,592
RAC: 0
Message 51094 - Posted: 30 Jan 2008, 15:00:39 UTC - in response to Message 51090.  

The problems I was getting over at Ralph appear to have carried over to Rosetta.

The Wu's starting with "2h4o" were causing problems on Ralph so I was surprised to see them over here on Rosetta.



Were you "really" surprised?


G'Day j2satx,
No I guess I was not, considering no response over on Ralph either. A lot of wasted time when these things run to over 21 hours and then often error out.
It is a shame, I do like the project and its goals, it was one of the best monitored and responsive projects for a good while.


I know... I started crunching Ralph again when it looked like they were making a change with the "minis", but it seems that was short-lived also.
ID: 51094 · Rating: 0
hedera
Joined: 15 Jul 06
Posts: 70
Credit: 4,530,841
RAC: 597
Message 51096 - Posted: 30 Jan 2008, 18:30:29 UTC

The interesting thing with all this is that, after that one bad day a couple of weeks ago, I made a minor adjustment to the amount of memory (from 90% to 85% when computer is not in use) and CPU (from 100% to 90%) allowed, and since that time my WUs have been cranking happily away, finishing in the normal 2-4 hours of CPU time, and not overwhelming my Pentium IV. And no errors. Maybe I'm just lucky.
--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

ID: 51096 · Rating: 0
Steve Dodd
Joined: 13 Dec 05
Posts: 6
Credit: 2,757,027
RAC: 213
Message 51098 - Posted: 30 Jan 2008, 20:29:51 UTC
Last modified: 30 Jan 2008, 20:31:29 UTC

I've had a problem recently with WUs going way past the allotted time (8 hrs in my preferences). I've had 2 get stuck in the 90% complete range and go no further. Looking at the graphics showed the step for the model being tested as not incrementing. WU numbers are 123352364 and 123338380. Crunch time was ~19 hrs each.
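
For a sense of scale, here is a minimal sketch (not part of the original post) that compares the reported crunch time with the runtime preference. The function names and the 2x threshold are assumptions chosen for illustration; they are not BOINC or Rosetta settings.

```python
def overrun_factor(cpu_hours, preferred_hours):
    """How many times longer a task ran than the preferred runtime."""
    if preferred_hours <= 0:
        raise ValueError("preferred_hours must be positive")
    return cpu_hours / preferred_hours


def looks_stuck(cpu_hours, preferred_hours, threshold=2.0):
    """Flag a task whose CPU time is at least `threshold` times the preference.

    The default threshold of 2.0 is a hypothetical value, not a project rule.
    """
    return overrun_factor(cpu_hours, preferred_hours) >= threshold


if __name__ == "__main__":
    # Figures from the post: an 8-hour preference and roughly 19 hours of crunching.
    print(overrun_factor(19, 8))
    print(looks_stuck(19, 8))
```

With the figures reported here, the tasks ran for well over twice the requested time, which is the kind of overrun worth reporting along with the WU numbers.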
ID: 51098 · Rating: 0
Steve Dodd
Joined: 13 Dec 05
Posts: 6
Credit: 2,757,027
RAC: 213
Message 51108 - Posted: 1 Feb 2008, 5:07:28 UTC

Add wu 121455059
ID: 51108 · Rating: 0
Ingemar
Joined: 28 Feb 06
Posts: 20
Credit: 1,680
RAC: 0
Message 51147 - Posted: 3 Feb 2008, 2:56:27 UTC
Last modified: 3 Feb 2008, 2:57:11 UTC

The 2h4o**** jobs were of a very large protein with a very complicated architecture, so Rosetta gets stuck a lot during model generation. No more jobs of this variety will be sent out, due to the problems you report.
ID: 51147 · Rating: 0
EdMulock
Joined: 14 Mar 06
Posts: 30
Credit: 2,347,485
RAC: 0
Message 51185 - Posted: 5 Feb 2008, 18:37:13 UTC


Any clue? This happens on 8 different tasks. Reboots, a BOINC upgrade to 5.10.30, resetting the project, aborting the task: nothing helps.


2/5/2008 1:30:49 PM|rosetta@home|Task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 exited with a DLL initialization error.
2/5/2008 1:30:49 PM|rosetta@home|If this happens repeatedly you may need to reboot your computer.
2/5/2008 1:30:49 PM|rosetta@home|Restarting task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 using rosetta_beta version 593
2/5/2008 1:30:55 PM|rosetta@home|Task 1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0 exited with a DLL initialization error.
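
When a log like this runs to hundreds of lines, it can help to count how often each task hits the error. The sketch below is only an illustration: it assumes the "timestamp|project|message" layout seen in the lines above (inferred from this post, not from BOINC documentation), and the function name is hypothetical.

```python
import re
from collections import Counter

# Task name and log lines copied from the post above.
TASK = "1bm8__BOINC_CONTROLABRELAX_VF_IGNORE_THE_REST-S25-9-S3-3--1bm8_-vf__2547_10874_0"
LOG_LINES = [
    f"2/5/2008 1:30:49 PM|rosetta@home|Task {TASK} exited with a DLL initialization error.",
    "2/5/2008 1:30:49 PM|rosetta@home|If this happens repeatedly you may need to reboot your computer.",
    f"2/5/2008 1:30:49 PM|rosetta@home|Restarting task {TASK} using rosetta_beta version 593",
    f"2/5/2008 1:30:55 PM|rosetta@home|Task {TASK} exited with a DLL initialization error.",
]

# Matches only the error message text shown in the pasted log.
DLL_ERROR = re.compile(r"^Task (\S+) exited with a DLL initialization error\.$")


def count_dll_errors(lines):
    """Return a Counter mapping task name -> number of DLL initialization errors."""
    counts = Counter()
    for line in lines:
        # Split into at most three fields so a "|" inside the message is preserved.
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # skip lines that do not follow the assumed layout
        message = parts[2].strip()
        match = DLL_ERROR.match(message)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for task, n in count_dll_errors(LOG_LINES).items():
        print(f"{n} DLL initialization error(s): {task}")
```

A count that is high for every task, rather than for one, would point at the installation (for example a damaged DLL) rather than at a single bad work unit.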

ID: 51185 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 4872
Credit: 4,184,932
RAC: 1,644
Message 51221 - Posted: 7 Feb 2008, 14:35:04 UTC

1tit__BOINC_ABRELAX_VF_IGNORE_THE_REST-S25-11-S3-9--1tit_-vf__2731_81_0 died with client error and this message:

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3223320

no credit granted

this happened on feb 2
ID: 51221 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4015
Credit: 0
RAC: 0
Message 51222 - Posted: 7 Feb 2008, 16:18:57 UTC - in response to Message 51185.  


Any clue? This happens on 8 different tasks. Reboots, a BOINC upgrade to 5.10.30, resetting the project, aborting the task: nothing helps.


Ed, is it just the one task? Or are you now having a similar problem with other tasks as well?

If just the one task, obviously an abort of that one should clear up its problems.

If it's happening on all of your tasks, I can only suggest doing a detach of the project, and then attach again. This will download a fresh copy of all of the DLLs.
Rosetta Moderator: Mod.Sense
ID: 51222 · Rating: 0
EdMulock
Joined: 14 Mar 06
Posts: 30
Credit: 2,347,485
RAC: 0
Message 51241 - Posted: 8 Feb 2008, 15:36:20 UTC - in response to Message 51222.  


Any clue? This happens on 8 different tasks. Reboots, a BOINC upgrade to 5.10.30, resetting the project, aborting the task: nothing helps.


Ed, is it just the one task? Or are you now having a similar problem with other tasks as well?

If just the one task, obviously an abort of that one should clear up its problems.

If it's happening on all of your tasks, I can only suggest doing a detach of the project, and then attach again. This will download a fresh copy of all of the DLLs.



Now about 120 different tasks. I've done that (reset the project about 5 times), as stated in the first post.

All finish with "compute error" as the reported status, and all restart (over and over) after about 4 seconds.
ID: 51241 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4015
Credit: 0
RAC: 0
Message 51242 - Posted: 8 Feb 2008, 16:06:01 UTC

Ed, I was not, and am not, clear on exactly what you've done. Did you "reset" the project? Or did you "detach", then "attach" again? I am suggesting a complete detach.

Is it possible a virus scanner is consistently corrupting one of the files as they reload? You might try reinstalling BOINC to a new directory, and see if that triggers a message from an antivirus product that you may have overlooked originally.
Rosetta Moderator: Mod.Sense
ID: 51242 · Rating: 0
stoneysilence
Joined: 4 May 07
Posts: 13
Credit: 401,055
RAC: 0
Message 51324 - Posted: 11 Feb 2008, 7:29:31 UTC

Got my first failed task, to my knowledge, tonight. I've been having problems with the MiniRosettas, so at first I thought it was one of them, but after I researched it I found it was a 5.93 task. It only ran for a bit over an hour before it apparently crashed. Most units run for 1.5/2.9 hours at least. Something obviously went haywire.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=127347713
https://boinc.bakerlab.org/rosetta/result.php?resultid=139825592
ID: 51324 · Rating: 0
aeryise
Joined: 5 Nov 07
Posts: 1
Credit: 47,149
RAC: 0
Message 51412 - Posted: 15 Feb 2008, 8:15:27 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=140509732
https://boinc.bakerlab.org/rosetta/result.php?resultid=138228394

I've also had the strange problem of tasks restarting from zero although when I stopped BOINC and shut down my computer the day before, they were at 30+% or 70+% i.e. nonzero completion. Not sure if this is related to 5.93 in any way, but this restart has only been happening in the past 3 days.
ID: 51412 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4015
Credit: 0
RAC: 0
Message 51416 - Posted: 15 Feb 2008, 17:07:34 UTC

aeryise, anytime you exit BOINC, you will lose some amount of work. The program would burn up your disk drive (and a lot of valuable computer time) if it was constantly storing everything it has done so far. So, periodically, it does a "checkpoint" where it preserves the work done so far. Some types of tasks are able to checkpoint more frequently than others.

The % completed is relative to your configured setting for your preferred runtime, so it doesn't tell you definitively whether a checkpoint has been reached. In general, the project tries to checkpoint about every 15 minutes, but there are some types of tasks that cannot do so, and may go for an hour or more without taking a checkpoint.

So if a checkpoint has not been reached when you exit BOINC, it will restart at 0% complete. It should then proceed normally. Don't worry, if you do several restarts like this without reaching a checkpoint, the Rosetta "watch dog" will figure out that this particular task is not a good fit for your machine, and purge it and get another task which may be able to checkpoint more frequently.
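
The behaviour described above can be pictured with a small simulation. This is only a sketch of the general checkpoint-and-restart idea, not Rosetta's or BOINC's actual code: time is modelled in whole minutes, and the function names and parameters are hypothetical.

```python
def run_task(total_minutes, checkpoint_every, stop_at=None, resume_from=0):
    """Simulate a task that saves its progress every `checkpoint_every` minutes.

    Returns (minutes_done, last_checkpoint). If the client exits at `stop_at`,
    only the work up to `last_checkpoint` survives the restart.
    """
    last_checkpoint = resume_from
    minute = resume_from
    while minute < total_minutes:
        if stop_at is not None and minute >= stop_at:
            break  # client exited before the task finished
        minute += 1
        if minute % checkpoint_every == 0:
            last_checkpoint = minute  # progress is written to disk here
    return minute, last_checkpoint


def work_lost(minutes_done, last_checkpoint):
    """Minutes of computation that must be redone after a restart."""
    return minutes_done - last_checkpoint


if __name__ == "__main__":
    # Checkpoint about every 15 minutes, client exits after 50 minutes.
    done, saved = run_task(240, 15, stop_at=50)
    print(done, saved, work_lost(done, saved))

    # A task that can only checkpoint hourly, stopped after the same 50 minutes,
    # has nothing saved and restarts from zero.
    done, saved = run_task(240, 60, stop_at=50)
    print(done, saved, work_lost(done, saved))
```

In the second case the task restarts from the beginning even though it had been running for most of an hour, which matches the "restart at 0% complete" symptom reported above.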
Rosetta Moderator: Mod.Sense
ID: 51416 · Rating: 0
Weasel
Joined: 20 Nov 06
Posts: 1
Credit: 334,404
RAC: 0
Message 51551 - Posted: 22 Feb 2008, 0:35:22 UTC

Well, I don’t have the time to read all the posts here (especially since even at 1024 x 768 I have to scroll sideways to read them), so I’ll just state my problems.
Even with the “Leave applications in memory while suspended” set to NO, R@H still hangs around after suspending the project, which with 350 MB for a WU, I have to do to get any work done.
So – memory hog WUs require suspension, which refuses to give up memory.

ID: 51551 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4015
Credit: 0
RAC: 0
Message 51559 - Posted: 22 Feb 2008, 9:43:36 UTC
Last modified: 22 Feb 2008, 9:44:21 UTC

Weasel, do you run Linux? Windows? Or Mac?

Edit, I see now that all of your hosts are Windows.
Rosetta Moderator: Mod.Sense
ID: 51559 · Rating: 0
KWSN THE Holy Hand Grenade!
Joined: 3 May 07
Posts: 5
Credit: 2,542,452
RAC: 0
Message 51660 - Posted: 26 Feb 2008, 18:03:55 UTC - in response to Message 51221.  

1tit__BOINC_ABRELAX_VF_IGNORE_THE_REST-S25-11-S3-9--1tit_-vf__2731_81_0 died with client error and this message:

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3223320

no credit granted

this happened on feb 2


Greg, see my Message 50716, in this thread (on Jan 15) - I'm glad that I'm not the only one with the problem! (Note that it must be R@H 5.93, as this happened on two different OS's and two different builds of BOINC)
ID: 51660 · Rating: 0



©2020 University of Washington
https://www.bakerlab.org