Report Problems with Rosetta Version 5.16 I

Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I

Profile anders n

Send message
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 17057 - Posted: 25 May 2006, 15:15:21 UTC

I finished one of them too.
https://boinc.bakerlab.org/rosetta/result.php?resultid=20693542

Top memory usage: 382 820 kB

Top virtual memory: 800 092 kB

Anders n

ID: 17057 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Send message
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 17064 - Posted: 25 May 2006, 15:48:38 UTC - in response to Message 17038.  

This computer Result ID Work unit ID

Rosetta version 5.16
BOINC version 5.4.9
OS WinXP home service pack 2

error msg:

25/05/2006 1:36:40 PM|rosetta@home|Aborting task JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1: exceeded disk limit: 103510443.000000 > 100000000.000000
25/05/2006 1:36:40 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1 (Maximum disk usage exceeded)


edit: to add more info


Rhiju fixed the problem: a write statement for in-house diagnostics for the new "jumping" feature. Sorry!


ID: 17064 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
mikus

Send message
Joined: 7 Nov 05
Posts: 58
Credit: 700,115
RAC: 0
Message 17065 - Posted: 25 May 2006, 15:50:07 UTC - in response to Message 17038.  

(Maximum disk usage exceeded)

Got same error (Rosetta version 5.16; BOINC version 5.4.9; Linux) -- Result ID

(1) Could NOT find anywhere that a user would have set a "disk limit" of 100 (megabytes?). If this is built into the Rosetta software, the limit ought to be made larger. [And if the error was not caused by the user+computer, credit ought to be given.]

(2) I noticed that __no__ results were uploaded to the server for the failing WU, despite my computer having spent eight hours crunching it. My understanding is that the Rosetta software constructs MULTIPLE 'decoys' while processing a WU -- if a problem arises after eight hours of crunching, *surely* this WU might have had one or more valid 'decoys' completed previous to the crash -- they would deserve being reported.
.
ID: 17065 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Feet1st
Avatar

Send message
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17068 - Posted: 25 May 2006, 16:40:16 UTC - in response to Message 17065.  
Last modified: 25 May 2006, 17:13:27 UTC

(Maximum disk usage exceeded)

Got same error (Rosetta version 5.16; BOINC version 5.4.9; Linux) -- Result ID

(1) Could NOT find anywhere that a user would have set a "disk limit" of 100 (megabytes?). If this is built into the Rosetta software, the limit ought to be made larger. [And if the error was not caused by the user+computer, credit ought to be given.]

(2) I noticed that __no__ results were uploaded to the server for the failing WU, despite my computer having spent eight hours crunching it. My understanding is that the Rosetta software constructs MULTIPLE 'decoys' while processing a WU -- if a problem arises after eight hours of crunching, *surely* this WU might have had one or more valid 'decoys' completed previous to the crash -- they would deserve being reported.
.


If you look at the other user that crunched that WU, they got an error, and credit was issued. Yours will be too, when they run the daily credit granting for the errored WUs. This is regardless of whether the error was caused by the user (not that there is much one could do to CAUSE a failure).

It appears your post crossed in time with Dr. Baker's, but #1 was a bug where they were writing too much information to the output files. This is why these size limits are in place: they protect you from such problems. Making the limits larger would only make the problem worse.
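
As a rough illustration of the limit check, here is a minimal Python sketch that parses the "exceeded disk limit" message quoted in this thread and reports how far over the limit a task went. The regular expression and the function name `parse_disk_limit_abort` are assumptions made for this example, based only on the log lines posted here; this is not BOINC's own code.

```python
import re

# Pattern for the abort message as it appears in the log lines quoted in this thread.
# Assumption: the format is
#   "Aborting task|result <name>: exceeded disk limit: <used> > <limit>"
ABORT_RE = re.compile(
    r"Aborting (?:task|result) (?P<name>\S+): "
    r"exceeded disk limit: (?P<used>[\d.]+) > (?P<limit>[\d.]+)"
)


def parse_disk_limit_abort(line):
    """Return (task name, bytes used, byte limit, bytes over), or None if no match."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    used = float(match.group("used"))
    limit = float(match.group("limit"))
    return match.group("name"), used, limit, used - limit


line = (
    "25/05/2006 1:36:40 PM|rosetta@home|Aborting task "
    "JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1: "
    "exceeded disk limit: 103510443.000000 > 100000000.000000"
)
name, used, limit, over = parse_disk_limit_abort(line)
print(f"{name}: {over / 1e6:.1f} MB over a {limit / 1e6:.0f} MB limit")
```

Running the same function over each abort line posted in this thread would show every failing task exceeding the same 100,000,000-byte limit, which fits a single source of runaway output rather than a per-user setting.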
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 17068 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
KaOh

Send message
Joined: 5 Oct 05
Posts: 4
Credit: 259,829
RAC: 0
Message 17069 - Posted: 25 May 2006, 17:05:09 UTC

https://boinc.bakerlab.org/rosetta/results.php?userid=2753
How about mine?
Almost all errors. Only the short 1-hour jobs were normal.
ID: 17069 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Feet1st
Avatar

Send message
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17070 - Posted: 25 May 2006, 17:18:17 UTC - in response to Message 17069.  

https://boinc.bakerlab.org/rosetta/results.php?userid=2753
How about mine?
Almost all errors. Only the short 1-hour jobs were normal.

I can't link through to your results page, but it looks like all of your results with errors have already been granted credit. You are probably looking at the wrong display. When the credit is issued by the daily run they don't appear in the WU list. But if you look at a specific WU, like this one, at the bottom, you can see credit claimed, and credit granted.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 17070 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
BennyRop

Send message
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 17072 - Posted: 25 May 2006, 17:54:13 UTC

Jimi wrote:

Still got this weird imbalance between CPU usage though - on a dual-core, instead of 50:50 it's 97:3 (as in one WU using 97% of both cores).

If you run a non-threaded application that's set up to use 100% of one CPU, does Task Manager show it using 100% of the CPU available on the machine, or just 50%? In other words, has BOINC been tricked into thinking it's on a dual-core machine while Windows is set up with the single-core HAL and is using only one CPU core for both instances of BOINC? (I'll go test SuperPI on my dual-core system at work.) I made the mistake of upgrading my system at work to a dual core, noticed the stats weren't what I was expecting, and had to perform a repair install of Windows to get the dual-CPU HAL set up.
ID: 17072 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jimi@0wned.org.uk

Send message
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 17073 - Posted: 25 May 2006, 18:03:52 UTC
Last modified: 25 May 2006, 18:04:43 UTC

No Benny, that's down to whether you have "ACPI Multiprocessor PC" in your devices instead of "ACPI Uniprocessor PC"; it makes switching between single and dual cores a hassle.

The machine eventually fell over and rebooted, the WU thought about crashing (there's the first line of BOINC debug in the result) but it picked itself up and completed normally.
ID: 17073 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jose

Send message
Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 17081 - Posted: 25 May 2006, 19:55:47 UTC - in response to Message 16889.  
Last modified: 25 May 2006, 19:56:35 UTC

Jose, have you tried searching for malware/adware with Ad-Aware SE, and searching for spybots with Spybot Search & Destroy, in addition to your virus program? They're free.

tony


And they seem to be the ones running amok in my computer, them and my latest Office update. I deleted almost all the applications on my computer and I am working very carefully to restore them one by one. On Saturday I will be asking a friend to check why Rosetta is only using 19% of my CPU while "Idle Stuff" (you know I am a techie by the terminology I use, lol) takes more than 70% of the CPU.

At least I was able to complete a WU without error. Slowly, but it was completed. (Watch me jinx the computer again.)

“This and no other is the root from which a Tyrant springs; when he first appears he is a protector.”
Plato
ID: 17081 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile dag
Avatar

Send message
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 17092 - Posted: 25 May 2006, 21:51:40 UTC
Last modified: 25 May 2006, 21:53:36 UTC

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

https://boinc.bakerlab.org/rosetta/result.php?resultid=21426436
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.
ID: 17092 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile rbpeake

Send message
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 17093 - Posted: 25 May 2006, 22:19:17 UTC - in response to Message 17092.  

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

https://boinc.bakerlab.org/rosetta/result.php?resultid=21426436

Please refer to Dr. Baker's response, some comments below, which said this workunit has issues which have now been fixed.

Regards,
Bob P.
ID: 17093 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Send message
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 17111 - Posted: 26 May 2006, 5:39:59 UTC - in response to Message 17093.  

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

https://boinc.bakerlab.org/rosetta/result.php?resultid=21426436

Please refer to Dr. Baker's response, some comments below, which said this workunit has issues which have now been fixed.


It was a write statement for in-house diagnostics that Rhiju removed from the code today, and the updated version (with some other improvements) is currently being tested on Ralph. We will send out the updated version after the Ralph results are back (we are trying to be as cautious as possible!) in the next day or two.
ID: 17111 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Alain Maes

Send message
Joined: 23 Nov 05
Posts: 3
Credit: 3,925,328
RAC: 1,013
Message 17124 - Posted: 26 May 2006, 12:07:48 UTC

Workunit https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18060667 got stuck at 1.041% after 08:55 minutes, blocking BOINC 5.4.9 for almost an hour. Good that I caught it early. Workunit aborted. This is the first time I have ever had to do this for Rosetta.

Kind regards

Alain
ID: 17124 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile rbpeake

Send message
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 17133 - Posted: 26 May 2006, 14:28:02 UTC - in response to Message 17124.  
Last modified: 26 May 2006, 14:28:51 UTC

Workunit https://boinc.bakerlab.org/rosetta/workunit.php?wuid=18060667 got stuck at 1.041% after 08:55 minutes, blocking BOINC 5.4.9 for almost an hour. Good that I caught it early. Workunit aborted. This is the first time I have ever had to do this for Rosetta.

Kind regards

Alain

Some workunits take a long time (over an hour) to reach their first structure, and this is normal behavior. They stay at 1.041% the whole time, until they reach their first structure.

So this work unit you aborted was likely OK.

There is a "watchdog" in the code that deletes work units that are truly stuck for a long time. I generally wait at least 4-5 hours before I start to get concerned.
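
To make the idea concrete, a watchdog of this kind could be sketched as follows in Python. This is only an illustration under assumed names and thresholds (the class `ProgressWatchdog`, a 4-hour timeout, polling a progress percentage); it is not Rosetta's actual watchdog code, whose details are not given in this thread.

```python
class ProgressWatchdog:
    """Flags a task as stuck when its progress has not changed for too long.

    Hypothetical sketch: the class name, the timeout, and the idea of polling
    a progress percentage are assumptions, not Rosetta's implementation.
    """

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_progress = None
        self.last_change_time = None

    def update(self, progress, now):
        """Record a progress reading taken at time `now` (in seconds).

        Returns True if progress has been unchanged for at least the timeout.
        """
        if progress != self.last_progress:
            # Progress moved (or this is the first reading): restart the clock.
            self.last_progress = progress
            self.last_change_time = now
            return False
        return now - self.last_change_time >= self.timeout_seconds


watchdog = ProgressWatchdog(timeout_seconds=4 * 3600)
print(watchdog.update(1.041, now=0))         # first reading
print(watchdog.update(1.041, now=3600))      # one hour at 1.041%
print(watchdog.update(1.041, now=4 * 3600))  # four hours unchanged
```

With a timeout measured in hours, a work unit that sits at 1.041% for one hour is not flagged; only a much longer stall is.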
Regards,
Bob P.
ID: 17133 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Feet1st
Avatar

Send message
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17136 - Posted: 26 May 2006, 14:44:12 UTC - in response to Message 17124.  

...stuck at 1.041% after 08:55 minutes, blocking BOINC 5.4.9 for almost an hour.

Can you explain what you mean by that? Do you mean that the WU was in a "running" status for an hour, but only shows 8:55min. of CPU time?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 17136 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Buffalo Bill
Avatar

Send message
Joined: 25 Mar 06
Posts: 71
Credit: 1,630,458
RAC: 0
Message 17146 - Posted: 26 May 2006, 16:50:36 UTC
Last modified: 26 May 2006, 16:52:06 UTC

Errored out. Max. disk usage exceeded.

JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_1618_0

http://www.boinc.bakerlab.org/rosetta/result.php?resultid=21291895
ID: 17146 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mike Gelvin
Avatar

Send message
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 17147 - Posted: 26 May 2006, 16:56:12 UTC - in response to Message 17146.  

Errored out. Max. disk usage exceeded.

JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_1618_0

http://www.boinc.bakerlab.org/rosetta/result.php?resultid=21291895


Can anyone identify the units that will cause this problem so I may delete them from my queue?

ID: 17147 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Feet1st
Avatar

Send message
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 17148 - Posted: 26 May 2006, 17:00:37 UTC
Last modified: 26 May 2006, 17:03:49 UTC

Failure:
5/25/2006 8:42:49 PM|rosetta@home|Aborting task JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_8465_0: exceeded disk limit: 139256061.000000 > 100000000.000000
5/25/2006 8:42:49 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_8465_0 (Maximum disk usage exceeded)

I crunch with a 24-hour time preference. Looks like it got about 15 hours into it.

Seeing a trend here?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 17148 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mike Gelvin
Avatar

Send message
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 17153 - Posted: 26 May 2006, 17:31:02 UTC - in response to Message 17111.  

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

https://boinc.bakerlab.org/rosetta/result.php?resultid=21426436

Please refer to Dr. Baker's response, some comments below, which said this workunit has issues which have now been fixed.


It was a write statement for in-house diagnostics that Rhiju removed from the code today, and the updated version (with some other improvements) is currently being tested on Ralph. We will send out the updated version after the Ralph results are back (we are trying to be as cautious as possible!) in the next day or two.



If this is true, I don't understand why 5.16 is still being run on Ralph.
ID: 17153 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile rbpeake

Send message
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 17155 - Posted: 26 May 2006, 17:36:06 UTC - in response to Message 17153.  

Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

It was a write statement for in-house diagnostics that Rhiju removed from the code today, and the updated version (with some other improvements) is currently being tested on Ralph. We will send out the updated version after the Ralph results are back (we are trying to be as cautious as possible!) in the next day or two.



If this is true, I don't understand why 5.16 is still being run on Ralph.

I read this as the code being included within the workunit itself (the JUMP_ALLBARCODE_t285__SAVE_ALL_OUT), and not within the 5.16 application, but maybe I misread this.

Regards,
Bob P.
ID: 17155 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote