Report Problems with Rosetta Version 5.16 I

Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I

Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 17026 - Posted: 24 May 2006, 21:42:38 UTC

Two successive application crashes; they look like memory errors. I rebooted the machine and it seems OK again. WUs:

17905138
17921169

Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 17029 - Posted: 24 May 2006, 21:58:48 UTC - in response to Message 17020.  

Linux is fine -- keep it going, please! I'd recommend aborting the Windows one, just in case.

A short message about one workunit. It is completing successfully on all clients, but on some Windows machines it appears to be grabbing an excessive amount of memory as the workunit progresses. If your computer is crunching a workunit with the following name, please abort it, just in case:

t283__CASP7_MAPRELAX_...

We have canceled these jobs for now. Incidentally, not many of these workunits have been sent out, maybe 1,000 out of the 100,000 in progress -- but we want to make sure there are no problems during CASP7.

Thanks! The TeraFLOPS estimate has exceeded 30 TFLOPS thanks to your efforts and a record-low error rate!


If I've got one running on a system here, but can afford the memory to keep it going, should I let it run to completion? Also, I've got another one queued on one of my Linux systems. Should I let that one run as well?


Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 17030 - Posted: 24 May 2006, 21:59:31 UTC - in response to Message 17026.  

Jimi, if you happen to be running something besides rosetta@home, have you noticed application crashes with other BOINC apps that have graphics?
Two successive application crashes; they look like memory errors. I rebooted the machine and it seems OK again. WUs:

17905138
17921169


Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 17032 - Posted: 24 May 2006, 23:56:47 UTC
Last modified: 24 May 2006, 23:57:52 UTC

Rhiju, I've had those fatal Windows errors with both 5.12 and 5.16 on this machine. All errors stopped when I turned off the screensaver. See the project list in my signature. No other project has caused this fatal Windows error while the screensaver was running. (Note: some projects don't have a screensaver, and the ones with zero RAC I haven't run in a while.)

Hope this helps.

tony



Steve Shedroff
Joined: 7 Nov 05
Posts: 11
Credit: 250,657
RAC: 0
Message 17033 - Posted: 25 May 2006, 0:53:22 UTC - in response to Message 16950.  

This may be a coincidence, but I just downloaded the most recent BOINC client and all my numbers are dropping. Work per day is about half of what it was before the new client. This is true on Mac OS X and Intel P4 systems. Is it just me?

What version of BOINC did you install?


Sorry, the client is 5.4.9 on both systems. One thing I noticed after the slowdown, and changed, appears to help the Intel boxes: I turned off the screensaver. At least two Intel boxes were hanging on the screensaver (no Rosetta progress while the screensaver was active). On both of these boxes, BOINC and Rosetta have consistently run better without the screensaver. The new install defaulted to turning the screensaver on. Numbers have been improving since this change.


hugothehermit
Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 17038 - Posted: 25 May 2006, 5:01:02 UTC
Last modified: 25 May 2006, 5:11:42 UTC

This computer Result ID Work unit ID

Rosetta version 5.16
BOINC version 5.4.9
OS WinXP home service pack 2

error msg:

25/05/2006 1:36:40 PM|rosetta@home|Aborting task JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1: exceeded disk limit: 103510443.000000 > 100000000.000000
25/05/2006 1:36:40 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1 (Maximum disk usage exceeded)
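
A quick way to read the abort message above: the two numbers appear to be byte counts (an assumption; the log line itself gives no units), so 100000000.000000 would be a 100 MB limit in decimal megabytes, and this task overshot it by roughly 3.5 MB. The Python sketch below uses a hypothetical helper, `disk_overrun`, to pull both numbers out of a log line in the format shown and do that arithmetic. It is only an illustration, not part of BOINC or Rosetta.

```python
import re

# Matches the "exceeded disk limit: USED > LIMIT" text in the BOINC log lines quoted above.
DISK_LIMIT_RE = re.compile(r"exceeded disk limit: ([0-9.]+) > ([0-9.]+)")

BYTES_PER_MB = 1_000_000  # assumption: the logged values are bytes; decimal megabytes


def disk_overrun(log_line):
    """Hypothetical helper: return (used_mb, limit_mb, over_mb), or None if no match."""
    match = DISK_LIMIT_RE.search(log_line)
    if match is None:
        return None
    used, limit = (float(value) for value in match.groups())
    return used / BYTES_PER_MB, limit / BYTES_PER_MB, (used - limit) / BYTES_PER_MB


line = ("25/05/2006 1:36:40 PM|rosetta@home|Aborting task "
        "JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1: "
        "exceeded disk limit: 103510443.000000 > 100000000.000000")
used_mb, limit_mb, over_mb = disk_overrun(line)
print(f"used {used_mb:.1f} MB of a {limit_mb:.0f} MB limit ({over_mb:.1f} MB over)")
# prints: used 103.5 MB of a 100 MB limit (3.5 MB over)
```

Under the same byte-count assumption, dag's later report (120577226 against the same limit) works out to about 20.6 MB over.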


edit: to add more info

Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 17040 - Posted: 25 May 2006, 6:02:20 UTC
Last modified: 25 May 2006, 6:06:16 UTC

Client errors here 17525724

and here 17359315


The screensaver never runs on this computer. It has run many successful work units before and after these errors.

Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 17043 - Posted: 25 May 2006, 7:55:22 UTC - in response to Message 17030.  

Sorry, Rhiju, I just saw this. Rosetta is the only BOINC program on the machine and crunching is all it does at the moment. It looks like memory instability; I lost 4 units in quick succession this morning, but a reboot seems to have fixed it.

WUs were:

17959274
17975623
17976020
17981072

I've had trouble with this RAM before and managed to tweak it back to life, but it seems to be going sour again. It's Crucial Ballistix DDR500, 2 x 1 GB, and it tends to do this kind of thing. :(

Jimi, if you happen to be running something besides rosetta@home, have you noticed application crashes with other BOINC apps that have graphics?
Two successive application crashes; they look like memory errors. I rebooted the machine and it seems OK again. WUs:

17905138
17921169


Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 17055 - Posted: 25 May 2006, 12:53:42 UTC
Last modified: 25 May 2006, 13:17:55 UTC

My bad: I found the missing WUs; they have low ID numbers.

I've still got this weird imbalance in CPU usage, though: on a dual-core, instead of 50:50 it's 97:3 (as in one WU using 97% of both cores).

On other machines, RAM usage is 100 MB or 235 MB. This WU, the one grabbing all the CPU, is using 770 MB! Is it some kind of doomsday machine? This is it:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=17193170

anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 17057 - Posted: 25 May 2006, 15:15:21 UTC

I finished one of them too.
https://boinc.bakerlab.org/rosetta/result.php?resultid=20693542

Peak memory usage: 382,820 kB

Peak virtual memory: 800,092 kB

Anders n

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 17064 - Posted: 25 May 2006, 15:48:38 UTC - in response to Message 17038.  

This computer Result ID Work unit ID

Rosetta version 5.16
BOINC version 5.4.9
OS WinXP home service pack 2

error msg:

25/05/2006 1:36:40 PM|rosetta@home|Aborting task JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1: exceeded disk limit: 103510443.000000 > 100000000.000000
25/05/2006 1:36:40 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1 (Maximum disk usage exceeded)


edit: to add more info


Rhiju fixed the problem: it was a write statement for in-house diagnostics of the new "jumping" feature. Sorry!


mikus
Joined: 7 Nov 05
Posts: 58
Credit: 700,115
RAC: 0
Message 17065 - Posted: 25 May 2006, 15:50:07 UTC - in response to Message 17038.  

(Maximum disk usage exceeded)

Got the same error (Rosetta version 5.16; BOINC version 5.4.9; Linux) -- Result ID

(1) I could NOT find anywhere that a user would have set a "disk limit" of 100 (megabytes?). If this is built into the Rosetta software, the limit ought to be made larger. [And if the error was not caused by the user+computer, credit ought to be given.]

(2) I noticed that __no__ results were uploaded to the server for the failing WU, despite my computer having spent eight hours crunching it. My understanding is that the Rosetta software constructs MULTIPLE 'decoys' while processing a WU -- if a problem arises after eight hours of crunching, *surely* this WU might have had one or more valid 'decoys' completed before the crash -- and they would deserve to be reported.

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17068 - Posted: 25 May 2006, 16:40:16 UTC - in response to Message 17065.  
Last modified: 25 May 2006, 17:13:27 UTC

(Maximum disk usage exceeded)

Got the same error (Rosetta version 5.16; BOINC version 5.4.9; Linux) -- Result ID

(1) I could NOT find anywhere that a user would have set a "disk limit" of 100 (megabytes?). If this is built into the Rosetta software, the limit ought to be made larger. [And if the error was not caused by the user+computer, credit ought to be given.]

(2) I noticed that __no__ results were uploaded to the server for the failing WU, despite my computer having spent eight hours crunching it. My understanding is that the Rosetta software constructs MULTIPLE 'decoys' while processing a WU -- if a problem arises after eight hours of crunching, *surely* this WU might have had one or more valid 'decoys' completed before the crash -- and they would deserve to be reported.


If you look at the other user who crunched that WU, they got an error, and credit was issued. Yours will be too, when they run the daily credit granting for errored WUs. This is regardless of whether the error was caused by the user (not that there is much one could do to CAUSE a failure).

It appears your post crossed in time with Dr. Baker's, but #1 was a bug where they were writing too much information to the output files. This is why these size limits are in place: they protect you from such problems. Making the limits larger would only make the problem worse.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

KaOh
Joined: 5 Oct 05
Posts: 4
Credit: 260,913
RAC: 6
Message 17069 - Posted: 25 May 2006, 17:05:09 UTC

https://boinc.bakerlab.org/rosetta/results.php?userid=2753
How about mine?
Almost all of them are errors. Only the short 1-hour jobs were normal.

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17070 - Posted: 25 May 2006, 17:18:17 UTC - in response to Message 17069.  

https://boinc.bakerlab.org/rosetta/results.php?userid=2753
How about mine?
Almost all of them are errors. Only the short 1-hour jobs were normal.

I can't link through to your results page, but it looks like all of your results with errors have already been granted credit. You are probably looking at the wrong display: when credit is issued by the daily run, it doesn't appear in the WU list. But if you look at a specific WU, like this one, you can see the credit claimed and the credit granted at the bottom.

BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 17072 - Posted: 25 May 2006, 17:54:13 UTC

Jimi wrote:

I've still got this weird imbalance in CPU usage, though: on a dual-core, instead of 50:50 it's 97:3 (as in one WU using 97% of both cores).

If you run a non-threaded application that's set up to use 100% of one CPU, does Task Manager show it using 100% of the CPU available on the machine, or just 50%? In other words, has BOINC been tricked into thinking it's on a dual-core machine while Windows is set up with the single-core HAL and only uses one CPU core for both instances of BOINC? (I'll go test SuperPI on my dual-core system at work.) I made the mistake of upgrading my system at work to a dual core, noticed the stats weren't what I was expecting, and had to perform a repair install of Windows to get the dual-CPU HAL set up.

Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 17073 - Posted: 25 May 2006, 18:03:52 UTC
Last modified: 25 May 2006, 18:04:43 UTC

No, Benny, that comes down to whether you have "ACPI Multiprocessor PC" in your devices instead of "ACPI Uniprocessor PC"; it makes switching between single and dual cores a hassle.

The machine eventually fell over and rebooted. The WU thought about crashing (the first line of BOINC debug output is in the result), but it picked itself up and completed normally.

Jose
Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 17081 - Posted: 25 May 2006, 19:55:47 UTC - in response to Message 16889.  
Last modified: 25 May 2006, 19:56:35 UTC

Jose, have you tried searching for malware/adware with Ad-Aware SE, and for spybots with Spybot - Search & Destroy, in addition to your virus program? They're free.

tony


And they seem to be the ones running amok in my computer, along with my latest Office update. I deleted almost all the applications on my computer and am very carefully restoring them one by one. On Saturday I will ask a friend to check why Rosetta is only using 19% of my CPU while "Idle Stuff" (you know I am a techie by the terminology I use, lol lol) takes more than 70% of the CPU.

At least I was able to complete a WU without error. Slowly, but it was completed. (Watch me jinx the computer again.)

“This and no other is the root from which a Tyrant springs; when he first appears he is a protector.”
Plato

dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 17092 - Posted: 25 May 2006, 21:51:40 UTC
Last modified: 25 May 2006, 21:53:36 UTC

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

https://boinc.bakerlab.org/rosetta/result.php?resultid=21426436
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.

rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 17093 - Posted: 25 May 2006, 22:19:17 UTC - in response to Message 17092.  

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

https://boinc.bakerlab.org/rosetta/result.php?resultid=21426436

Please refer to Dr. Baker's response (message 17064), which said this workunit had issues that have now been fixed.

Regards,
Bob P.



©2025 University of Washington
https://www.bakerlab.org