Problems with Rosetta version 5.59

Message boards : Number crunching : Problems with Rosetta version 5.59

rechenknecht123
Joined: 15 Oct 06
Posts: 17
Credit: 2,022
RAC: 0
Message 39158 - Posted: 8 Apr 2007, 13:04:00 UTC - in response to Message 39152.  

Running time 00:24:00, 13.327% complete, time until ready 2281:34:26
Result ID 71817153


Time to complete dropping fast?


Yes, the time to completion is dropping very fast, as far as I can see:

running time 00:35:00 h
complete 19.624%
time until ready 1981:05:45 h

running time 00:54:18 h
complete 30.359%
time until ready 1482:56:34 h






What model did it restart at?

Now when I click on "Show graphics", the BOINC Manager turns grey and no graphics are shown.






ID: 39158 · Rating: 0
rechenknecht123
Joined: 15 Oct 06
Posts: 17
Credit: 2,022
RAC: 0
Message 39160 - Posted: 8 Apr 2007, 13:12:32 UTC - in response to Message 39157.  

I don't know what happened. After 11 h there should have been at least a few models made.

Next time you shut down the Mac, please check in the graphics how much work
it has done on the WU. Then check again when you restart it.

I will do this, but right now the BOINC Manager 5.8.17 turns grey and no graphics window opens.

running time 1:03:12 h
complete 35.456%
remaining time 1276:50:15 h
So I can't check the checkpoint at the moment.


If it happens again, please post here again.
I will.

Remember, for now 1 model has to be done before a checkpoint is made.
(They are working on adding more checkpoints to the WU.)
Anders n


Thanks, Anders n

rechenknecht

ID: 39160 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5661
Credit: 5,699,284
RAC: 2,079
Message 39164 - Posted: 8 Apr 2007, 15:56:00 UTC
Last modified: 8 Apr 2007, 15:56:16 UTC

This seems to be stuck in the ab initio stage. As far as I can tell the strand is stuck; it's on model 24, step 11,000 and counting higher. But the RMSD is stuck on 13.xx, with xx being the digits that vary. The accepted energy is not really stuck, but does not register on the graph. It appears stuck at the top. The progress keeps counting in BOINC Manager though, so according to it, it's not stuck in an endless loop. I will let it run its course, as it is now 6 hrs into the process.
I have one more WU of the same type to run still. Is this normal?
ID: 39164 · Rating: 0
Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 39179 - Posted: 8 Apr 2007, 21:46:26 UTC - in response to Message 39164.  

Hi greg_be ... I think the behavior is OK. Please leave it running! Fixing the scale on the graph of the energy is definitely on the "TO DO" list.




ID: 39179 · Rating: 0
rechenknecht123
Joined: 15 Oct 06
Posts: 17
Credit: 2,022
RAC: 0
Message 39180 - Posted: 8 Apr 2007, 22:12:52 UTC - in response to Message 39179.  
Last modified: 8 Apr 2007, 22:22:54 UTC

Hi Rhiju and anders n, this WU
Mo 9 Apr 00:02:04 2007|rosetta@home|Restarting task s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom001__1638_96906_0 using rosetta version 559

hung in its 13th hour at 98.745%, and no checkpoint was written.

Now it is running again from 0%, with 3104:34:17 h to go.

This is the 5th run over Easter.
What now? Kill it for good, or run it a 6th time?

rechenknecht









ID: 39180 · Rating: 0
Prime Lemur
Joined: 22 Feb 06
Posts: 1
Credit: 89,553
RAC: 0
Message 39181 - Posted: 9 Apr 2007, 0:12:20 UTC

Just a minor problem observed (may not be RAH's problem):

I'm currently crunching 1fkaA_BOINC_ZEROWATSONCRICK_RNA_ABINITIO-1fkaA-chunk006__1659_76_1. I changed my RAH preferences (resource share) while another project was running. A new WU downloaded from RAH. When BOINC switched back to my first RAH WU, the Progress % reset to zero. (Thankfully) CPU Time did not change, but the To Completion time grew to almost double (to 03:58:45) what it was before switching projects/changing prefs/downloading new WU.

Like I say, the CPU Time did not change, so there was no loss of work. I realise it could be a BOINC issue as much as RAH, but I thought I'd share this anyway.

Prime Lemur
ID: 39181 · Rating: 0
anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 39185 - Posted: 9 Apr 2007, 7:31:53 UTC - in response to Message 39180.  



Hi there
I tried to estimate how long a model should take on your Mac.
It should take 3.5-5.5 h with that kind of WU.
If you decide to let it run, check the graphics from time to time to make sure the steps are still counting up.

Anders n

ID: 39185 · Rating: 0
rechenknecht123
Joined: 15 Oct 06
Posts: 17
Credit: 2,022
RAC: 0
Message 39186 - Posted: 9 Apr 2007, 9:17:21 UTC - in response to Message 39185.  

Hello anders n,
this is another WU, on my other disk partition. It runs under Mac OS 10.4.9 in BOINC 5.8.15.

97.415% complete
CPU time 6:17:00 h
time until ready 00:09:54 h, and it has been standing there for 10 min.

The graphics check is OK:
stage: symmetric relax
model: 1
step: 69969
accepted energy: -311.4855

Now:


Mo 9 Apr 11:02:32 2007|rosetta@home|Resuming task s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom014__1638_298_1 using rosetta version 559







ID: 39186 · Rating: 0
rechenknecht123
Joined: 15 Oct 06
Posts: 17
Credit: 2,022
RAC: 0
Message 39187 - Posted: 9 Apr 2007, 9:32:08 UTC - in response to Message 39181.  

This might be a BOINC problem.
This WU:
Mo 9 Apr 11:02:32 2007|rosetta@home|Resuming task s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom014__1638_298_1 using rosetta version 559

ran up to 6:24:20 h of running time at 97.455% complete, with 0:09:21 h remaining, at model 1, step 69969.

When I stop all WUs (SETI, SIMAP, R@h) in BOINC and then press start to continue a single WU, everything runs fine.

But when I close BOINC via the Quit button in the menu, this happens.

rechenknecht







ID: 39187 · Rating: 0
(_KoDAk_)
Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 39197 - Posted: 9 Apr 2007, 19:55:21 UTC

http://img45.imageshack.us/img45/2572/berru4.gif
This is what I get after restarting Rosetta.

ID: 39197 · Rating: 0
Purple Rabbit
Joined: 24 Sep 05
Posts: 28
Credit: 3,892,197
RAC: 1,520
Message 39204 - Posted: 9 Apr 2007, 22:21:12 UTC
Last modified: 9 Apr 2007, 23:17:13 UTC

I have had occasional problems with V5.59 on Linux Suse 10.2. Not every result had a problem. The following results (as a sample) died:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=64664432
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=64280490

More on this machine have died, but not on other machines.

Host 395030 (the one referenced) is a Celeron 1.3 GHz CPU with 640 MB of RAM.

I received a segmentation fault on these (and other) results. Everything was working OK before 5.59.

My other computers are fine, both Windows and Linux. This seems strange to me. I'm running BOINC 5.8.17 (Windows) and BOINC 5.8.16 (Linux) on my machines. They ought to be current for now.
ID: 39204 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 39212 - Posted: 10 Apr 2007, 1:53:29 UTC

I just finished and returned this one. I don't know why it finished early, and the numbers are odd.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=64512157

Server state: Over · Outcome: Success · Client state: Done · CPU time (sec): 22,383.00 · Claimed credit: 46.47 · Granted credit: 52.78

cpu_run_time_pref: 36000
======================================================
DONE :: 1 starting structures built 77 (nstruct) times
This process generated 48 decoys from 48 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down

ID: 39212 · Rating: 0
Ty
Joined: 2 Mar 06
Posts: 2
Credit: 50,697
RAC: 0
Message 39214 - Posted: 10 Apr 2007, 2:23:29 UTC

BOINC 5.8.15 ~ Rosetta 5.59 ~ I have 4 active projects. Switch Apps 70 min.
Workunit: 1lz1_BOINC_POSEDISULF_SAVE_ALL_OUT_1643_3382_0 Result id 72045046
"CPU Time" looks like it increments properly each second.
"Progress" resets to 0 when restarting unit then looks like it increments.
"To Completion" will increment for 3~4 seconds, then decrement by about 12~18 seconds, usually 15 seconds. In a 60 second period, To Completion paused 11~16 times for 1 second while CPU Time incremented. Sometimes To Completion jumped back, sometimes it continued counting forward. The consistent pattern: CPU Time counted 5 seconds forward and To Completion jumped back 11~18 seconds (usually 15 sec).
ID: 39214 · Rating: 0
(_KoDAk_)
Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 39242 - Posted: 10 Apr 2007, 19:24:05 UTC

What the ***********
https://boinc.bakerlab.org/rosetta/result.php?resultid=71749731
https://boinc.bakerlab.org/rosetta/result.php?resultid=71748628
86,684.79
374.87
172.00 ??????????????????????
ID: 39242 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5661
Credit: 5,699,284
RAC: 2,079
Message 39249 - Posted: 10 Apr 2007, 21:49:45 UTC

I had to do a Windows security update and restart my system.
I was 2+ hrs into the crunch when I exited.
When I restarted the WU, the CPU time remained the same but the percent complete went back to 0. It's currently in model 7, where it left off in models and steps, but the percent complete seems low at 1.43% and counting, for just under 3 hrs into an 8 hr run. Is this correct, or is there a problem with the percent complete stats?
ID: 39249 · Rating: 0
anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 39254 - Posted: 11 Apr 2007, 4:12:24 UTC - in response to Message 39249.  
Last modified: 11 Apr 2007, 4:42:47 UTC



There is a problem with the % complete figure when you restart BOINC or when a WU
is preempted and not set to stay in memory.

You do not lose any more work now than you did before; only the % complete is off.
Anders n

[edit] Also, the estimated time to finish jumps high and then drops quickly again. [/edit]
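
To make the effect concrete, here is a minimal sketch of a naive linear extrapolation of remaining time from CPU time and fraction done. This is an assumption for illustration only; it is not necessarily how BOINC or Rosetta computes "To Completion", and the function names are hypothetical. It uses two readings reported in this thread: 24 minutes at about 13.327%, and roughly 3 hours of CPU time with only 1.43% shown after a restart.

```python
def linear_remaining_seconds(cpu_seconds: float, fraction_done: float) -> float:
    """Naive estimate: assume the rest of the work proceeds at the average rate so far.

    Hypothetical helper, not BOINC's actual estimator.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return cpu_seconds * (1.0 - fraction_done) / fraction_done


def hms(seconds: float) -> str:
    """Format a number of seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Reading from the start of this page: 00:24:00 of running time at about 13.327%.
normal = linear_remaining_seconds(24 * 60, 0.13327)

# Reading after a restart: CPU time kept (about 3 h) but progress reset, showing 1.43%.
after_reset = linear_remaining_seconds(3 * 3600, 0.0143)

print("no reset:   ", hms(normal))
print("after reset:", hms(after_reset))
```

With the progress figure reset but the CPU time retained, any estimate of this kind balloons to hundreds of hours and then falls quickly as the percentage climbs again, which matches the "goes high, then fast down" behaviour described above.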
ID: 39254 · Rating: 0
ramostol
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 39259 - Posted: 11 Apr 2007, 9:17:01 UTC

I am reporting this incident because (imperfect?) internet conditions seem to influence WU crunching:

(ibook G4 10.3.9)

Result

stderr out
<core_client_version>5.8.17</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 2300066
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
# random seed: 2300066
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
======================================================
DONE :: 1 starting structures built 7 (nstruct) times
This process generated 6 decoys from 6 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>

Local messages:

((The first 2 WUs completed successfully. Then I changed location and had to connect to a very unstable wireless network.))

Tir 10 Apr 17:00:40 2007|rosetta@home|Sending scheduler request: Requested by user
Tir 10 Apr 17:00:40 2007|rosetta@home|Reporting 1 tasks
Tir 10 Apr 17:04:46 2007|rosetta@home|Scheduler request failed: HTTP internal server error
Tir 10 Apr 17:04:46 2007|rosetta@home|Deferring communication for 1 min 0 sec
Tir 10 Apr 17:04:46 2007|rosetta@home|Reason: scheduler request failed
Tir 10 Apr 17:04:51 2007|ralph@home|Sending scheduler request: To fetch work
Tir 10 Apr 17:04:51 2007|ralph@home|Requesting 22067 seconds of new work
Tir 10 Apr 17:10:03 2007||Project communication failed: attempting access to reference site
Tir 10 Apr 17:10:03 2007|ralph@home|Scheduler request failed: a timeout was reached
Tir 10 Apr 17:10:03 2007|ralph@home|Deferring communication for 1 hr 8 min 5 sec
Tir 10 Apr 17:10:03 2007|ralph@home|Reason: scheduler request failed
Tir 10 Apr 17:10:19 2007|rosetta@home|Sending scheduler request: Requested by user
Tir 10 Apr 17:10:19 2007|rosetta@home|Reporting 1 tasks
Tir 10 Apr 17:11:04 2007|rosetta@home|Task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 exited with zero status but no 'finished' file
Tir 10 Apr 17:11:04 2007|rosetta@home|If this happens repeatedly you may need to reset the project.
Tir 10 Apr 17:11:04 2007|rosetta@home|Restarting task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 using rosetta version 559
((!!!!!!)) Tir 10 Apr 17:11:05 2007||Access to reference site succeeded - project servers may be temporarily down.

Tir 10 Apr 17:12:50 2007|rosetta@home|Scheduler RPC succeeded [server version 509]
Tir 10 Apr 17:12:50 2007|rosetta@home|Deferring communication for 4 min 2 sec
Tir 10 Apr 17:12:50 2007|rosetta@home|Reason: requested by project
Tir 10 Apr 18:18:09 2007|ralph@home|Sending scheduler request: To fetch work
Tir 10 Apr 18:18:09 2007|ralph@home|Requesting 33299 seconds of new work
Tir 10 Apr 18:18:55 2007|rosetta@home|Task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 exited with zero status but no 'finished' file
Tir 10 Apr 18:18:55 2007|rosetta@home|If this happens repeatedly you may need to reset the project.
Tir 10 Apr 18:18:55 2007|rosetta@home|Restarting task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 using rosetta version 559
((!!!!!!)) 2007-04-10 18:23:07 [ralph@home] Scheduler request failed: HTTP internal server error
2007-04-10 18:23:07 [ralph@home] Deferring communication for 41 min 36 sec
2007-04-10 18:23:07 [ralph@home] Reason: scheduler request failed

2007-04-10 18:41:38 [---] Suspending network activity - user request
((After suspending network activity no more problems.))
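
For anyone who wants to check their own message log for the same pattern, here is a small sketch that counts how often each task "exited with zero status but no 'finished' file". It assumes the pipe-separated `date|project|message` layout seen in the messages pasted above; the function name and regular expressions are hypothetical, and lines in the other (`[project]`) layout are simply skipped.

```python
import re
from collections import Counter

# Assumed layout, inferred from the pasted messages: "<date>|<project>|<message>"
PIPE_LINE = re.compile(r"^(?P<when>[^|]+)\|(?P<project>[^|]*)\|(?P<msg>.*)$")
NO_FINISHED = re.compile(
    r"^Task (?P<task>\S+) exited with zero status but no 'finished' file"
)


def count_unfinished_exits(lines):
    """Count 'no finished file' exits per (project, task). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        parsed = PIPE_LINE.match(line.strip())
        if not parsed:
            continue  # not in the pipe-separated layout
        hit = NO_FINISHED.match(parsed.group("msg"))
        if hit:
            counts[(parsed.group("project"), hit.group("task"))] += 1
    return counts


TASK = "1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0"
sample = [
    f"Tir 10 Apr 17:11:04 2007|rosetta@home|Task {TASK} exited with zero status but no 'finished' file",
    "Tir 10 Apr 17:11:04 2007|rosetta@home|If this happens repeatedly you may need to reset the project.",
    f"Tir 10 Apr 18:18:55 2007|rosetta@home|Task {TASK} exited with zero status but no 'finished' file",
    "2007-04-10 18:23:07 [ralph@home] Reason: scheduler request failed",
]

counts = count_unfinished_exits(sample)
for (project, task), n in counts.items():
    print(project, task, n)
```

A task that shows up more than once here has been restarted repeatedly, which is the situation the client's own hint ("you may need to reset the project") refers to.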

-- R. A. Mostol
ID: 39259 · Rating: 0
Superfluence
Joined: 11 Apr 07
Posts: 2
Credit: 141
RAC: 0
Message 39290 - Posted: 12 Apr 2007, 2:52:15 UTC
Last modified: 12 Apr 2007, 2:57:51 UTC

I'm on a Mac 10.4.8 iBook and version 5.59 has A LOT of bugs!

1. Most annoying of all: the crunching starts at 0% every time BOINC is shut down and restarted. Some of the time is still there, but it's at 0%. The percentage runs faster from this point, BUT if it is restarted again (a second time, third time and so on), the newly crunched data is lost and the time is reset to what it was at the first restart.

So I crunched the same s*** 10 times today - :/

2. When the Mac goes to sleep, the project doesn't start right. So sometimes a restart of BOINC is needed, and guess what: the data goes bye-bye...

3. Maybe this is a BOINC problem, but when Rosetta is crunching my iBook becomes very slow, especially the internet.

PLZ help me, or fix the version, because this really "sucks monkey ass!" ;)
ID: 39290 · Rating: 0
Lada JNet
Joined: 25 Mar 07
Posts: 2
Credit: 1,518
RAC: 0
Message 39297 - Posted: 12 Apr 2007, 8:27:26 UTC - in response to Message 38870.  

Hello,
I'm not sure if you are aware of this problem, but from time to time the computation stops in the middle of the work. The CPU stops computing and nothing hangs; restarting BOINC helps. However, I have some computers I am not constantly checking, and yesterday this glitch caused one of them to idle for ten hours before I found out and restarted BOINC... I'm afraid I cannot stay subscribed to the Rosetta project on unattended computers if this keeps happening occasionally.

Some friends of mine observed the same problem and had to do the same. We do not produce as significant an amount of credit as some other users, but I believe that losing any computing potential is a waste for Rosetta...

If you would let me know when this is resolved, I'd like to crunch more work for Rosetta again on unattended computers.
ID: 39297 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org