What's with all the errors???

Sid Celery

Joined: 11 Feb 08
Posts: 2477
Credit: 46,506,558
RAC: 3,357
Message 65119 - Posted: 26 Jan 2010, 17:13:40 UTC

I know I made the claim, but it's an assumption until Roger confirms it's the same in the real world.

A couple of exception errors on one machine but otherwise still looking good from here.
FernValleyIT
Joined: 1 Dec 05
Posts: 7
Credit: 84,334
RAC: 0
Message 65128 - Posted: 27 Jan 2010, 14:07:00 UTC - in response to Message 65119.  

Not sure if those exceptions were from past or present. I'm showing 33 successes without error. Looking good to me. Thanks again to everyone.
Memory of Sampson Stein
Joined: 29 Dec 07
Posts: 3
Credit: 65,585,980
RAC: 0
Message 65134 - Posted: 28 Jan 2010, 3:38:40 UTC

I am not getting any errors, but since the new year I have been getting WUs that should take around 3 hours but run for 120 hours or more if I don't catch them and squash them.

I have a bunch of BOINC machines (different types, different operating systems, different BOINC versions), and they are all acting up this way.

Anyone else having this problem?
mikey
Joined: 5 Jan 06
Posts: 1898
Credit: 12,724,450
RAC: 581
Message 65135 - Posted: 28 Jan 2010, 9:27:14 UTC - in response to Message 65134.  

Wow, some of your machines have caches of many, many days on them! Anyway, do you have the setting checked to leave units in memory when swapping? It is under Your Account, Computing Preferences, in the top section, and says:
"Leave applications in memory while suspended?
(suspended applications will consume swap space if 'yes')"

If you have it set to No then change it to Yes and see if the problems clear up.
Memory of Sampson Stein
Joined: 29 Dec 07
Posts: 3
Credit: 65,585,980
RAC: 0
Message 65137 - Posted: 28 Jan 2010, 11:18:55 UTC - in response to Message 65135.  
Last modified: 28 Jan 2010, 11:22:41 UTC

I'll check that out, thanks.
The weird thing is that before the Rosetta servers went down over the holidays they all worked fine; the problem started after the Rosetta servers came back online.
Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 65139 - Posted: 28 Jan 2010, 16:04:16 UTC - in response to Message 65134.  

120 hours!? Are you sure you are not looking at elapsed (wall clock) time rather than CPU time? At some point BOINC Manager began displaying wall clock time in addition to CPU time, and I think in some configurations of the display only elapsed time is visible.

Snags
Memory of Sampson Stein
Joined: 29 Dec 07
Posts: 3
Credit: 65,585,980
RAC: 0
Message 65140 - Posted: 28 Jan 2010, 19:21:00 UTC - in response to Message 65139.  

Nope, actual CPU time.

Most of the time I just abort them, but I found that if I suspend the WU and then resume it, it starts running normally.

I have watched some of the ones that have stalled, and the progress is 0.001% every few minutes on a C2Q machine. At the same time, the other WUs on that machine are progressing at a normal rate.
Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 65141 - Posted: 28 Jan 2010, 20:13:18 UTC

nope, actual cpu time

I am running two of the cl1... work units and found that they had been going for over ten hours while the CPU time said 2 and 4 hours respectively. After restarting BOINC, they are running from their last checkpoint and, ironically, both are working on model 10.
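
The symptom described in this thread (elapsed time running far ahead of CPU time) can be checked mechanically. What follows is only a minimal sketch, assuming the CPU and elapsed times have been copied by hand from BOINC Manager. The `Task` record, the task names and the 0.5 threshold are hypothetical and are not anything BOINC itself provides.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str             # hypothetical label, not a real work unit name
    cpu_hours: float      # CPU time reported for the task
    elapsed_hours: float  # wall-clock time since the task started


def cpu_efficiency(task: Task) -> float:
    """Fraction of wall-clock time the task actually spent on the CPU."""
    if task.elapsed_hours <= 0:
        return 1.0  # nothing has run yet, so there is nothing to flag
    return task.cpu_hours / task.elapsed_hours


def stalled(tasks, threshold=0.5):
    """Return tasks whose CPU efficiency is below the assumed threshold."""
    return [t for t in tasks if cpu_efficiency(t) < threshold]


# Example figures: two tasks at about ten hours elapsed with 2 and 4 hours
# of CPU time (as reported above), plus one healthy task for comparison.
tasks = [
    Task("task_a", cpu_hours=2.0, elapsed_hours=10.0),
    Task("task_b", cpu_hours=4.0, elapsed_hours=10.0),
    Task("task_c", cpu_hours=2.9, elapsed_hours=3.0),
]

for t in stalled(tasks):
    print(f"{t.name}: {cpu_efficiency(t):.0%} CPU efficiency, possibly stalled")
```

A task flagged this way is only a candidate for the suspend/resume or restart workaround mentioned in this thread; a low ratio can also simply mean the machine was busy with other work.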

Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 65142 - Posted: 28 Jan 2010, 23:59:36 UTC

Having been restarted, these cl1... work units finished correctly at just over 6 hours.
©2025 University of Washington
https://www.bakerlab.org