Can it really be so slow?

Padanian
Joined: 27 Sep 05
Posts: 14
Credit: 15,190
RAC: 0
Message 890 - Posted: 2 Oct 2005, 0:01:50 UTC - in response to Message 872.  

I had a WU stuck at 83.66% for 5 hours (2.8 GHz P4). It suddenly jumped to 91.66%, and then back to 83.66%, 5 or 6 times. I had to kill it.


Are you running more than one project on that computer, so that it is switching between projects? If so, under general preferences try increasing the time between project switches to 120 minutes instead of the default 60. From 83.33% on, the calculations are more complex and take longer. Another option is to enable "leave in memory". I had this same problem, and increasing the time between switches solved it.


I'll try this one out. Thanks.

Doug Worrall
Joined: 19 Sep 05
Posts: 60
Credit: 58,445
RAC: 0
Message 891 - Posted: 2 Oct 2005, 0:23:57 UTC

For sure, it should not be stuck for that long. I myself have put a W/U through the
wringer, and it had no errors. First I went into root (oops, that's OK), then, whoops,
rebooted during a Rosetta W/U due to swap issues. Then I tried aborting, and BOINC
Manager started doing the "chicken": flashing and frozen. I tried Ctrl+Alt+Backspace,
no go (doh), so I had to manually reset the PC. I signed back into my user account,
went to BOINC, and the W/U was still there. Thinking the W/U was corrupted, I paused it
and ran Predictor once I found it was back. 36 hours later I took Rosetta off pause and
let the W/U finish.
Result (Rosetta@home)

Result ID: 84726
Name: 1pvaA_abrelax_no_cst_15903_0
Workunit: 72423
Created: 26 Sep 2005 21:18:38
Sent: 30 Sep 2005 22:17:27
Received: 1 Oct 2005 23:12:02
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 3584
Report deadline: 28 Oct 2005 22:17:27
CPU time: 8660.681376
stderr out: 4.43
Validate state: Valid
Claimed credit: 10.5439759610782
Granted credit: 10.5439759610782
Application version: 4.77

Wow, this is great news: either the O.S. is sooo stable or that W/U was
a superduper W/U. If so, and Rosetta does not need to be handled with "kid gloves", then I will begin round robin again. I thought a Rosetta unit could not be
stopped and you could not quit BOINC? Apologies for getting off topic.
Doug Worrall
Boinc Synergy

Rebirther
Joined: 17 Sep 05
Posts: 116
Credit: 41,315
RAC: 0
Message 900 - Posted: 2 Oct 2005, 8:30:49 UTC

I also got a bad WU. After 2 hours at 1%, I aborted it.
WU 68187

[AF>france>pas-de-calais]symaski62
Joined: 19 Sep 05
Posts: 47
Credit: 33,871
RAC: 0
Message 906 - Posted: 2 Oct 2005, 12:56:29 UTC - in response to Message 900.  
Last modified: 2 Oct 2005, 12:57:46 UTC

WU 68187

Pconfig
Joined: 26 Sep 05
Posts: 6
Credit: 56,254
RAC: 0
Message 908 - Posted: 2 Oct 2005, 16:24:10 UTC

Note: I haven't had a hanging WU since the new type of WUs started being handed out...
Proud member of the Dutch Power Cows

Padanian
Joined: 27 Sep 05
Posts: 14
Credit: 15,190
RAC: 0
Message 909 - Posted: 2 Oct 2005, 16:24:57 UTC - in response to Message 890.  

It worked out

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=112718

Juerschi
Joined: 17 Sep 05
Posts: 8
Credit: 14,145
RAC: 0
Message 913 - Posted: 2 Oct 2005, 21:30:49 UTC
Last modified: 2 Oct 2005, 21:31:08 UTC

Today I had my first WU stuck at 1% for over an hour. A normal WU is crunched in about 2 hours on my host. I closed BOINC, started it again, and the WU finished in the normal time without any error.

STE\/E
Joined: 17 Sep 05
Posts: 125
Credit: 3,595,612
RAC: 22,796
Message 920 - Posted: 3 Oct 2005, 9:15:52 UTC
Last modified: 3 Oct 2005, 9:17:30 UTC

I get 4-6 WUs a day stuck at the 1% mark; all my PCs are P4 HT types. If I didn't monitor them fairly closely, I can easily envision all of them, after a few days, doing nothing but sitting there with WUs at the 1% completion mark.

This is most definitely a problem that needs to be addressed by the devs, as it wastes a lot of CPU time. Some of the stuck 1% WUs I don't catch until after 3 or 4 hours of running time. The WUs should have been done by then, but I either have to shut down BOINC and restart it, or abort the WU and try to get another WU to pass the 1% mark.

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 931 - Posted: 4 Oct 2005, 0:25:52 UTC

Can anyone who sees this error (stuck at 1%) send me the stdout.txt file while the WU is still stuck and running? The file is located in the BOINC client installation (for Windows, it is likely at c:/Program Files/BOINC/slots/0/). Please, only the first person who can do this should reply to this post; I will confirm, and then you can email me the file, so I get just one email rather than emails from all of you.

Thanks!

STE\/E
Joined: 17 Sep 05
Posts: 125
Credit: 3,595,612
RAC: 22,796
Message 932 - Posted: 4 Oct 2005, 0:33:54 UTC
Last modified: 4 Oct 2005, 1:33:25 UTC

I have 1 WU stuck @ 1% right now, showing 3:03:30 running time with 303:12:15 to completion. I can send the stdout.txt file to you if you still need it.

I'm going to just suspend the WU for now and start another one ...

PS: File sent, David ...
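
For readers wondering where a figure like 303 hours comes from: a naive linear extrapolation from the elapsed time and the reported fraction done gives a number in the same range. The sketch below is only an illustration under that assumption; it is not claimed to be the formula the BOINC client actually uses, and the function names are made up for this example.

```python
def parse_hms(text):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Convert a number of seconds back to 'H:MM:SS'."""
    hours, rest = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def estimate_remaining(elapsed_seconds, fraction_done):
    """Linear extrapolation: assume the rest runs at the rate seen so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return round(elapsed_seconds * (1 - fraction_done) / fraction_done)


if __name__ == "__main__":
    elapsed = parse_hms("3:03:30")  # running time reported in the post
    print(format_hms(estimate_remaining(elapsed, 0.01)))
```

With a WU stuck at 1%, every extra hour of running time adds roughly 99 hours to such an estimate, which is why the projected completion time keeps climbing while the percentage does not move.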

Shaktai
Joined: 21 Sep 05
Posts: 56
Credit: 575,419
RAC: 0
Message 934 - Posted: 4 Oct 2005, 2:34:17 UTC
Last modified: 4 Oct 2005, 2:34:44 UTC

I have seen them stuck at 1% on both Windows and Mac. This seems to occur more often on the dual-core boxes, but that could just be because they crunch more units. I had one stuck today for about 12 hours on my AMD 64 X2 4200. A simple quit and restart of BOINC has fixed the problem every time. It happens maybe 4-5 times a week for me.


Team MacNN - The best Macintosh team ever.

Padanian
Joined: 27 Sep 05
Posts: 14
Credit: 15,190
RAC: 0
Message 936 - Posted: 4 Oct 2005, 6:10:37 UTC

It happened to me with this last WU: 2 hours spent at 1%.
I had to abort the WU and resume; after that, a computation error showed up.
And no, I don't release Rosetta from memory. Maybe I should try assigning Rosetta to one logical CPU only, to avoid what Shaktai suggests above.
I think the devs must look seriously into this, because it is hurting overall BOINC efficiency very much.

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 945 - Posted: 4 Oct 2005, 13:23:03 UTC - in response to Message 936.  

I'm looking into it. Can everyone on this thread who is having the problem (no one else, as I don't want too many emails) send me the stdout.txt files in slot0 and slot1 (if you are running dual CPU) in the BOINC installation? dekim at u.washington.edu.

Thanks,

David K
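
For anyone unsure where to find these files, here is a minimal sketch that lists every stdout.txt under the numbered slot directories of a BOINC installation. The function name is hypothetical, the layout `<install dir>/slots/<number>/stdout.txt` is taken from the Windows path mentioned earlier in this thread, and your installation directory may differ.

```python
from pathlib import Path


def find_slot_stdout(boinc_dir):
    """Return stdout.txt paths under <boinc_dir>/slots/<n>/, ordered by slot number.

    Assumes the layout described in this thread (slots/0, slots/1, ...).
    Returns an empty list if the slots directory does not exist.
    """
    slots = Path(boinc_dir) / "slots"
    if not slots.is_dir():
        return []
    found = []
    for slot in slots.iterdir():
        candidate = slot / "stdout.txt"
        # Only numbered slot directories that actually contain the file.
        if slot.is_dir() and slot.name.isdigit() and candidate.is_file():
            found.append(candidate)
    return sorted(found, key=lambda path: int(path.parent.name))


if __name__ == "__main__":
    # Path as given in the thread for Windows; adjust for your machine.
    for path in find_slot_stdout("c:/Program Files/BOINC"):
        print(path)
```

Copy the files while the WU is still stuck and running, as requested, since a restart of BOINC may change what they contain.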

Shaktai
Joined: 21 Sep 05
Posts: 56
Credit: 575,419
RAC: 0
Message 966 - Posted: 5 Oct 2005, 0:54:43 UTC - in response to Message 945.  

You've got mail. 2 computers: one a Pentium D 840 dual core and the other an AMD 64 X2 4200 dual core. Each just happened to have one work unit frozen at 1%, one for 6 hours and one for 18 hours. Hope that helps.



Team MacNN - The best Macintosh team ever.

STE\/E
Joined: 17 Sep 05
Posts: 125
Credit: 3,595,612
RAC: 22,796
Message 977 - Posted: 5 Oct 2005, 11:31:46 UTC
Last modified: 5 Oct 2005, 12:01:20 UTC

David, I'm sending you 2 stdout.txt files because both WUs were stuck @ 91.67% this morning when I got up. Both WUs were between 5 and 6 hours of CPU running time, when the normal completion time for this P4 3.4 GHz HT PC is around 3 1/2 hours.

I suspended the project, shut down BOINC, and restarted it, and the WUs showed about 3 hours running time @ 91.67%. After they ran for a little while, the % for both WUs actually dropped back to 83.33%, with the completion time climbing to over 50 hours...

PS: Both of the WUs have now finished without any further complications ... Hopefully the files I sent can still be of some help in seeing why the WUs were stuck in the first place ...

Shaktai
Joined: 21 Sep 05
Posts: 56
Credit: 575,419
RAC: 0
Message 1081 - Posted: 7 Oct 2005, 22:14:03 UTC

I've been watching my Windows PCs more closely, and the only machines that seem to get stuck at 1% are the dual-core boxes, both the AMD and the Intel. My P4 3.4 GHz with HT hasn't had the problem yet, as far as I've caught, and none of the single-CPU boxes have had it recently. The dual-core boxes are experiencing it almost daily now, where one of the two units will get stuck at 1%. Each time, a simple restart of BOINC fixes it.


Team MacNN - The best Macintosh team ever.

Padanian
Joined: 27 Sep 05
Posts: 14
Credit: 15,190
RAC: 0
Message 1108 - Posted: 8 Oct 2005, 14:04:06 UTC - in response to Message 1081.  

No, I've just got one HT machine, and it got stuck this morning at 83.33%.

Shaktai
Joined: 21 Sep 05
Posts: 56
Credit: 575,419
RAC: 0
Message 1110 - Posted: 8 Oct 2005, 15:05:14 UTC - in response to Message 1108.  
Last modified: 8 Oct 2005, 15:07:43 UTC

There are two different problems: one where it gets stuck at 1%, and one where it gets stuck at 83.33%. For many folks the solution to the 83.33% issue was to "leave application in memory" and, if running more than one project, to extend the time between "switches" from the default of 60 minutes to 90-180 minutes (depending on the speed of the machine). At 83.33%, the calculations become much more complex and take longer. If you are switching between projects every 60 minutes, then it may not reach the next step (91.66%) before the switch, and it will then restart at 83.33% when it switches back. David Kim is looking at both of these issues for fixes.
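
The 83.33% behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: progress is saved only at step boundaries (83.33%, 91.66%, ...), an app that is not left in memory loses all work since the last saved step when BOINC switches projects, and the 90-minute step length is a made-up number for illustration. None of this is taken from the Rosetta or BOINC source.

```python
def turns_to_finish_step(step_minutes, switch_minutes, leave_in_memory, max_turns=10):
    """Return how many of this project's turns are needed to finish one step.

    Returns None if the step never completes within max_turns, which models
    a WU that keeps falling back to 83.33%.
    """
    done = 0  # minutes of work completed inside the current step
    for turn in range(1, max_turns + 1):
        done += switch_minutes  # the project runs until the next switch
        if done >= step_minutes:
            return turn  # reached the next step (e.g. 91.66%)
        if not leave_in_memory:
            done = 0  # app unloaded: work since the last saved step is lost
    return None


if __name__ == "__main__":
    # Hypothetical 90-minute step.
    print(turns_to_finish_step(90, 60, leave_in_memory=False))   # never finishes
    print(turns_to_finish_step(90, 120, leave_in_memory=False))  # finishes in one turn
    print(turns_to_finish_step(90, 60, leave_in_memory=True))    # finishes on the second turn
```

In this model either workaround is enough on its own: a switch interval longer than the step, or keeping the app in memory so partial work survives the switch. It does not explain cases where neither setting helps, which would point to a different cause.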


Team MacNN - The best Macintosh team ever.

Padanian
Joined: 27 Sep 05
Posts: 14
Credit: 15,190
RAC: 0
Message 1114 - Posted: 8 Oct 2005, 15:40:39 UTC - in response to Message 1110.  

I'm well aware of the workaround, though neither leaving the app in memory, nor increasing the switching time helped me.

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 1119 - Posted: 8 Oct 2005, 18:43:14 UTC - in response to Message 1110.  


Thanks, I changed my setting to 120 minutes. I have one of these 83.33% units at the moment and I'd hate to abort it! But if 120 min isn't enough, I'll change it to 180 min.

If it's still stuck after this, I'll abort it! But as I am the second one to get this WU, one could suspect that it's bad!


"I'm trying to maintain a shred of dignity in this world." - Me




©2024 University of Washington
https://www.bakerlab.org