Loads and loads of computing errors today

Message boards : Number crunching : Loads and loads of computing errors today

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 1635 - Posted: 23 Oct 2005, 6:07:57 UTC

I've spent the last two hours poring over the results from the last few days of runs--they are pretty amazing! I don't know what the origin of the computing errors is, but all the results that have been returned are fine. The "random_length_20" runs are forcing one residue in each 20 amino acid segment into one randomly selected conformation--the idea is to keep the different runs spread out by fixing several randomly selected residues into specific but randomly selected states. This is like sending different explorers to different (randomly selected) regions of the globe. (It is remotely possible that in some rare cases the residues are fixed into states that cause an error like the one you saw below, but this doesn't seem very likely.)

So far, of the different runs you all have done, the "random_length_20" runs are sampling the most broadly, and we are just about to start jobs where more residues are randomly fixed (they will have names like random_length_15, etc.).
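
To make the idea concrete, here is a minimal sketch in Python. It is not the Rosetta code: the function name, the three state labels, and the 100-residue chain are all made up for illustration. It only shows the bookkeeping of picking one residue per segment and giving it a random state, and why a shorter segment length (as in random_length_15) fixes more residues.

```python
import random

# Hypothetical labels for conformational states; the real set of states
# used by the project is not given here.
STATES = ["helix", "strand", "loop"]


def pick_fixed_residues(n_residues, segment_length=20, rng=None):
    """Return {residue_index: state} with one fixed residue per segment.

    The chain is cut into consecutive segments of `segment_length`
    residues (the last one may be shorter). In each segment one residue
    is chosen at random and assigned one randomly chosen state.
    """
    rng = rng or random.Random()
    fixed = {}
    for start in range(0, n_residues, segment_length):
        end = min(start + segment_length, n_residues)
        position = rng.randrange(start, end)  # one residue in this segment
        fixed[position] = rng.choice(STATES)  # one random state for it
    return fixed


# Two runs with different seeds get different constraints, so they tend
# to start exploring from different regions.
run_a = pick_fixed_residues(100, segment_length=20, rng=random.Random(1))
run_b = pick_fixed_residues(100, segment_length=20, rng=random.Random(2))

# A shorter segment length means more residues are fixed per chain.
run_c = pick_fixed_residues(100, segment_length=15, rng=random.Random(3))
```

With a 100-residue chain, segment length 20 fixes five residues and segment length 15 fixes seven.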

On a different note, many people put a lot of effort into reducing the memory footprint over the last few weeks--is this reducing some of the problems all of you were having earlier?
ID: 1635 · Rating: 0
AnRM
Joined: 18 Sep 05
Posts: 123
Credit: 1,355,486
RAC: 0
Message 1637 - Posted: 23 Oct 2005, 6:32:19 UTC - in response to Message 1635.  
Last modified: 23 Oct 2005, 6:34:01 UTC

"I've spent the last two hours poring over the results from the last few days of runs--they are pretty amazing! I don't know what the origin of the computing errors is, ......"

>Well, for me personally, all errors on four boxes were cleared when I upgraded to BOINC 5.2.2 from BOINC 4.19. Our other boxes were running BOINC 5.2.1 and were just fine and didn't miss a beat when R@H 4.78 was introduced. IMHO it was pretty obvious that R@H 4.78 was not very happy running on the older BOINC version. Cheers, Rog.
ID: 1637 · Rating: 0
Rebirther
Joined: 17 Sep 05
Posts: 116
Credit: 41,315
RAC: 0
Message 1638 - Posted: 23 Oct 2005, 7:03:03 UTC
Last modified: 23 Oct 2005, 7:04:22 UTC


ID: 1638 · Rating: 0
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 42
Message 1640 - Posted: 23 Oct 2005, 9:15:36 UTC
Last modified: 23 Oct 2005, 9:16:22 UTC

I have not seen any errors using the 4.78 application and the 4.25 core client. 3.2GHz P-IV HT Win XP SP2.

I can't go to version 5 yet as LHC@Home does not accept 5.x yet.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 1640 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 421
Message 1647 - Posted: 23 Oct 2005, 13:23:51 UTC
Last modified: 23 Oct 2005, 13:26:39 UTC

I've been running for almost a month on my Linux system with AMD's XP2600+ processor and don't believe I've had any errors that I have not caused myself. I'm running R@H full time. I'm using version 4.72 of the core client and I compiled it myself in order to optimize it. (Did that before I even knew of R@H.) Won't switch to 5.x until I can compile it myself once I get some libraries updated (libcurl, etc.) or one of the other people who made optimized 4.x clients available does so for 5.x, too.

Just another data point.

-Charlie
ID: 1647 · Rating: 0
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 42
Message 1650 - Posted: 23 Oct 2005, 15:07:22 UTC

Off topic, but I don't really see why people bother with the optimised core client, it uses so little CPU and runs for such a short time anyway. Client apps, that's different of course.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 1650 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 421
Message 1652 - Posted: 23 Oct 2005, 17:03:19 UTC - in response to Message 1650.  
Last modified: 23 Oct 2005, 17:05:25 UTC

Off topic, but I don't really see why people bother with the optimised core client, it uses so little CPU and runs for such a short time anyway. Client apps, that's different of course.


I run BOINC on two systems at home. One is my Linux server which has an AMD XP2600+ CPU. Runs at just over 2 GHz. The other is my old slow Windows box with 98SE. It's an old 300 MHz PII. With the stock core client on both, the Linux machine would claim about half the credit the Windows machine would for similar workunits. Didn't matter which project it was. It all had to do with the benchmarks run by the core client. The Linux client was simply not optimized the way the Windows client was (and still is with the latest 5.x client as far as my experiments can tell). Therefore, the benchmarks ran proportionally slower on the Linux box and since those are used to calculate the claimed credit, it would claim about half the credit the Windows box would.

If I compile my own Linux core client and optimize it, the credit claimed by the Linux box is just about the same as that claimed by my old slow Windows box for similar workunits. I know I'm not pumping workunits through the Linux box any faster.
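
A rough sketch of why the benchmarks matter, in Python. It assumes the benchmark-based credit rule as I understand it for BOINC clients of this era: claimed credit is CPU time multiplied by the average of the Whetstone and Dhrystone scores, scaled so that a machine scoring 1000 on both earns 100 credits per day of CPU time. The function name and the benchmark numbers below are invented for illustration and are not measurements from my machines.

```python
SECONDS_PER_DAY = 86400.0


def claimed_credit(cpu_seconds, whetstone_mflops, dhrystone_mips):
    """Claimed credit under the assumed benchmark-based rule.

    A reference machine (1000 Whetstone MFLOPS, 1000 Dhrystone MIPS)
    claims 100 credits for one full day of CPU time.
    """
    benchmark_average = (whetstone_mflops + dhrystone_mips) / 2.0
    cpu_days = cpu_seconds / SECONDS_PER_DAY
    return cpu_days * 100.0 * benchmark_average / 1000.0


# Same workunit, same CPU time, same hardware. Only the benchmark
# scores reported by the core client differ (made-up numbers).
stock = claimed_credit(10000, whetstone_mflops=600, dhrystone_mips=1200)
optimized = claimed_credit(10000, whetstone_mflops=1200, dhrystone_mips=2400)
```

Under this rule, halving the benchmark scores halves the claim even though the science application takes exactly the same CPU time, which matches what I see between the stock and self-compiled clients.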

I realize the important thing in any project is the science, not the credit. Indeed, I choose to run R@H 100% of the time (with other projects ready to go in case R@H goes down) because of the science. I feel a project with the potential to help fight cancer, diabetes and other afflictions is more important than searching for ET, as interesting and exciting as that search can be. However, the credit is an attraction. I know I can't compete with others who have several fast machines running, but I do like to keep an eye on how well I'm doing relative to others near me in the standings. It just adds to the fun.

-Charlie
ID: 1652 · Rating: 0
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 42
Message 1655 - Posted: 23 Oct 2005, 18:57:56 UTC

I hear what you say but am more puzzled than before. I don't see how compiling in MMX, SSE etc. would improve the integer or floating point performance that the benchmarks return.

I would expect the superior floating point unit in the AMD to outperform a similar Pentium in large apps like Rosetta. In little programs like Seti, the small size means it basically fits in the large cache of the newer Pentiums which makes up for the poorer floating point units.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 1655 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 421
Message 1663 - Posted: 23 Oct 2005, 20:27:05 UTC - in response to Message 1655.  
Last modified: 23 Oct 2005, 20:31:19 UTC

I hear what you say but am more puzzled than before. I don't see how compiling in MMX, SSE etc. would improve the integer or floating point performance that the benchmarks return.

I would expect the superior floating point unit in the AMD to outperform a similar Pentium in large apps like Rosetta. In little programs like Seti, the small size means it basically fits in the large cache of the newer Pentiums which makes up for the poorer floating point units.


I don't think it's so much the MMX and/or SSE options. There are other optimizations the compiler can do. Anyway, here are the compile options I use. I believe these are the same as Ned Slider recommends on his website:

CFLAGS="-march=athlon-xp -O3 -fomit-frame-pointer -funroll-loops -fforce-addr -ffast-math -ftracer"

There is a discussion of the effects of these options at http://forums.pcper.com/showthread.php?t=354308. Got that from Ned Slider's website located at http://www.pperry.f2s.com/index.htm.
-Charlie
ID: 1663 · Rating: 0
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 1664 - Posted: 23 Oct 2005, 20:30:12 UTC - in response to Message 1635.  


On a different note, many people put a lot of effort into reducing the memory footprint over the last few weeks--is this reducing some of the problems all of you were having earlier?


Pardon my ignorance, but does "reducing the memory footprint" mean the client uses less RAM, and thus perhaps boxes with 256 MB can successfully be utilized now?

Thanks! :)

(And by the way, I never did have any problems, thank goodness.)

Regards,
Bob P.
ID: 1664 · Rating: 0
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 42
Message 1666 - Posted: 23 Oct 2005, 21:40:55 UTC
Last modified: 23 Oct 2005, 21:41:14 UTC

You got it in one Bob. Reducing the "footprint" of some parameter means making it smaller. Reduced memory footprint - uses less memory, reduced desk footprint - use less table space, reduced me footprint - less smelly trekking shoes for my wife to complain about.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 1666 · Rating: 0
Ulrich Metzner
Joined: 17 Sep 05
Posts: 22
Credit: 405,640
RAC: 0
Message 1680 - Posted: 24 Oct 2005, 7:43:28 UTC

Here is a little summary for application 4.78:

Result ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
571322 | 446833 | 23 Oct 2005 20:34:07 UTC | 24 Oct 2005 7:24:04 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
571307 | 446742 | 23 Oct 2005 20:34:07 UTC | 24 Oct 2005 7:24:04 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
558197 | 453166 | 23 Oct 2005 16:11:33 UTC | 23 Oct 2005 20:34:07 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
558170 | 453140 | 23 Oct 2005 16:11:33 UTC | 23 Oct 2005 20:34:07 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
548794 | 444768 | 23 Oct 2005 13:09:45 UTC | 23 Oct 2005 16:11:33 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
548776 | 444750 | 23 Oct 2005 13:09:45 UTC | 23 Oct 2005 16:11:33 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
520387 | 417610 | 24 Oct 2005 7:24:04 UTC | 21 Nov 2005 7:24:04 UTC | In Progress | Unknown | New | --- | --- | ---
520386 | 417609 | 24 Oct 2005 7:24:04 UTC | 21 Nov 2005 7:24:04 UTC | In Progress | Unknown | New | --- | --- | ---
490343 | 393057 | 23 Oct 2005 1:01:58 UTC | 23 Oct 2005 13:09:45 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
490342 | 393056 | 23 Oct 2005 1:01:58 UTC | 23 Oct 2005 13:09:45 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
480167 | 382959 | 22 Oct 2005 19:02:22 UTC | 23 Oct 2005 1:01:58 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
470188 | 373076 | 22 Oct 2005 13:13:12 UTC | 23 Oct 2005 1:01:58 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
470187 | 373075 | 22 Oct 2005 13:13:12 UTC | 22 Oct 2005 23:46:03 UTC | Over | Success | Done | 5,559.07 | 14.27 | 14.27
441698 | 350373 | 21 Oct 2005 15:32:31 UTC | 22 Oct 2005 13:13:12 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---

Except for one single WU, all the others that finished errored out...

greetz, Uli

ID: 1680 · Rating: 0
Ulrich Metzner
Joined: 17 Sep 05
Posts: 22
Credit: 405,640
RAC: 0
Message 1684 - Posted: 24 Oct 2005, 10:41:30 UTC

OK, the problem is not limited to AMD Athlons. Just got the same on a 1.8 GHz P4 Willamette CPU:

Result ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
592071 | 480093 | 24 Oct 2005 9:33:00 UTC | 24 Oct 2005 10:34:03 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
591998 | 480021 | 24 Oct 2005 9:33:00 UTC | 24 Oct 2005 10:34:03 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
571322 | 446833 | 23 Oct 2005 20:34:07 UTC | 24 Oct 2005 7:24:04 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
571307 | 446742 | 23 Oct 2005 20:34:07 UTC | 24 Oct 2005 7:24:04 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---

If nothing is done about this, I'll detach for a while, because this is simply a waste of bandwidth :/
greetz, Uli

ID: 1684 · Rating: 0
Ulrich Metzner
Joined: 17 Sep 05
Posts: 22
Credit: 405,640
RAC: 0
Message 1686 - Posted: 24 Oct 2005, 12:37:33 UTC
Last modified: 24 Oct 2005, 13:01:43 UTC


greetz, Uli

ID: 1686 · Rating: 0
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 1689 - Posted: 24 Oct 2005, 13:23:30 UTC - in response to Message 1666.  

You got it in one Bob. Reducing the "footprint" of some parameter means making it smaller. Reduced memory footprint - uses less memory, reduced desk footprint - use less table space, reduced me footprint - less smelly trekking shoes for my wife to complain about.


Thanks! I think I will try a 256 MB box and see what happens... If the shoes are still too large (which I don't think they will be) I will retire the 256 MB box to something else. ;)

Regards,
Bob P.
ID: 1689 · Rating: 0
Andrew
Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 1692 - Posted: 24 Oct 2005, 17:31:47 UTC
Last modified: 24 Oct 2005, 17:32:05 UTC

@Ulrich

It's too late now, but you should have uninstalled the 4.19 (or older) client and then installed the 5.2 client. You wouldn't have lost any WUs then :(



ID: 1692 · Rating: 0
Ulrich Metzner
Joined: 17 Sep 05
Posts: 22
Credit: 405,640
RAC: 0
Message 1693 - Posted: 24 Oct 2005, 18:14:30 UTC - in response to Message 1692.  
Last modified: 24 Oct 2005, 18:21:52 UTC

@Ulrich
It's too late now, but you should have uninstalled 4.19 (or older) client and then installed the 5.2 client. You wouldn't have lost any WU's then :(

Thanks, but that's exactly what I did. The installer already told me to first uninstall the old version :/
BTW: Nothing is really lost. I made a backup first (it's not the first time this has happened to me ;) ) and after I uninstalled 5.2.2, I copied back the old content. What's more annoying is that it had already fetched a new CPDN WU, which I have now trashed...
greetz, Uli

ID: 1693 · Rating: 0
Andrew
Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 1694 - Posted: 24 Oct 2005, 18:28:21 UTC

I see... well, that is annoying, but at least you made a backup :)
ID: 1694 · Rating: 0
atotos
Joined: 22 Oct 05
Posts: 8
Credit: 70
RAC: 0
Message 1698 - Posted: 24 Oct 2005, 21:15:45 UTC

I had 3 out of 4 WUs end in computation errors.....
ID: 1698 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 1699 - Posted: 25 Oct 2005, 0:22:33 UTC

I had one stuck on 1% for an hour and a half ... suspending it and restarting it did not help. But, a full exit and restart and it went through ...

So, there is something about the start-up that is wonky ...
ID: 1699 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org