Report Problems with Rosetta Version 5.16 I

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 16976 - Posted: 24 May 2006, 13:34:29 UTC - in response to Message 16974.  

BAD ERROR! Boinc 5.4.9 crunching WU t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom024__528_13504_0, screensaver appeared.. suddenly windows error message appeared about Rosetta@home doing illegal operation and windows had to end this process.. "send report to microsoft? [send] [don't send]" you probably know that message.. after closing the message: boinc happily crunches another WU.. now it looks like it was normal computing error .. but it wasn't ..

I was seeing those in Ralph on one computer, but not the others. I turned off the screensaver and haven't seen it again. I reported this in Ralph.

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 16980 - Posted: 24 May 2006, 14:25:37 UTC

WU: JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_2909_0
result: https://boinc.bakerlab.org/rosetta/result.php?resultid=21309179

This WU looks like it finished normally with no problem, but while it was crunching I looked at the stdout.txt file and it was 74,807,795 Bytes. It was mostly filled with lines like:

...
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
...

And so on. On rare occasion there would be a line like:

scored_frag_close( 80 91 100 240 ) cycle= 100 total-closed-frags: 11 0.00487611 -61.1343

thrown in. After great stretches of this there would be some normal looking stuff, then there would be another great stretch of "move not allowed" messages.
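
A quick way to see what is filling a stdout.txt like this one is to count how often each distinct message appears. The following is only a minimal sketch: the function name `summarize_log` is hypothetical, the sample lines are copied from the excerpt above, and stripping the trailing number is an assumption about which part of the message varies.

```python
from collections import Counter
import re

def summarize_log(lines):
    """Count how often each message appears, ignoring a trailing ': <number>'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line == "...":
            continue
        # Collapse "set_phi:: move not allowed: 88" to "set_phi:: move not allowed"
        key = re.sub(r":\s*\d+$", "", line)
        counts[key] += 1
    return counts

# Sample lines taken from the excerpt quoted in the post
sample = """set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
scored_frag_close( 80 91 100 240 ) cycle= 100 total-closed-frags: 11 0.00487611 -61.1343
""".splitlines()

for message, n in summarize_log(sample).most_common():
    print(n, message)
```

For a real file, an open file object can be passed in place of `sample`, since the function reads its input line by line.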

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 16991 - Posted: 24 May 2006, 16:17:14 UTC - in response to Message 16980.  

thanks! this will be easy to track down and fix


anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 16992 - Posted: 24 May 2006, 16:18:59 UTC


Sybr_E-N
Joined: 26 Nov 05
Posts: 2
Credit: 164,851
RAC: 0
Message 17005 - Posted: 24 May 2006, 18:10:50 UTC
Last modified: 24 May 2006, 18:38:33 UTC

--sorry my mistake

Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 17008 - Posted: 24 May 2006, 18:40:27 UTC - in response to Message 17005.  

I finished task t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom008__528_3596_0 five minutes ago and uploaded it, but I can't find that work unit in my stats, and no credit has been awarded. What happened?

The work unit did complete successfully.

My computer ID is: 76755

The total uploaded batch was:
24/05/2006 20:10:49|rosetta@home|Started upload of file v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_6789_0_0

24/05/2006 20:10:49|rosetta@home|Started upload of file v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_6790_0_0

24/05/2006 20:10:54|rosetta@home|Started upload of file t287__CASP7_ABRELAX_SAVE_ALL_OUT_hom001__527_6789_0_0

24/05/2006 20:10:54|rosetta@home|Started upload of file t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom008__528_3596_0_0




Once the work unit is complete, it is uploaded back to the servers. That unit then becomes "ready to report". It is not uncommon for BOINC (on your PC) to not report right away and actually accumulate several results to "report" back. This could take several hours.
Once reported the work unit goes through several steps before it actually shows up as complete in your stats. If they are working on any of the servers and have one of the components down, that could also influence when you actually get the credit.
I have never seen a work unit actually "lost" but I have heard of it a LONG LONG time ago, and I would not be afraid of that. Just be patient, you will get the credit due you.


BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 17010 - Posted: 24 May 2006, 19:02:52 UTC - in response to Message 16992.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=21413607

Maximum disk usage exceeded

Anders n


When I reported this error, I was asking how my settings (100% Rosetta, use no more than 50% of the HD, leave at least 100 Megs of HD space free) could add up to a limit of 100,000,000 bytes on a roughly 30 gig partition with roughly 8.5 gigs free.
It was suggested that, as happened to AMD_is_logical, my error report had grown incredibly large (in my case to over 100,000,000 bytes) and tripped this error.

Would the programmers please give an error report that grows larger than 100,000,000 bytes its own error message? Perhaps it could grab the first million characters and the last million characters of the file, so the project can see which WUs are having more problems, and what those problems are?
In my case, it was on hour 23 of my 24 hour per WU setting - so it probably isn't an error that will be seen that often. i.e. they need a cpu that cranks out models as fast or faster than mine; need a 12-24 hour per WU setting, and a WU that spits out errors right and left, like the one AMD_is_logical and Anders n are reporting.
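
The "first million and last million characters" idea can be sketched as follows. This is only an illustration of the suggestion, not anything from the Rosetta or BOINC code: the function name `head_and_tail` and its `limit` parameter are hypothetical, and it works in bytes rather than characters.

```python
import os

def head_and_tail(path, limit=1_000_000):
    """Return the first and last `limit` bytes of a file without reading it all."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(limit)
        if size <= limit:
            # The whole file fits in the head; there is no separate tail.
            return head, b""
        # If the two windows would overlap, start the tail where the head ended.
        f.seek(max(size - limit, limit))
        tail = f.read()
    return head, tail
```

Seeking to the tail instead of reading the whole file keeps memory use bounded by roughly twice `limit`, however large the output file has grown.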

tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 17013 - Posted: 24 May 2006, 19:27:17 UTC - in response to Message 16980.  

I had the same phenomenon with this WU.

https://boinc.bakerlab.org/rosetta/result.php?resultid=21390555

Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 17016 - Posted: 24 May 2006, 19:55:43 UTC - in response to Message 17013.  

Thanks for posting about this big file. I'm fixing this now for the new app. I thought those print statements would not normally get triggered, but this CASP target has found a way! I'll also see if I can find a way to turn it off for the current app and resend the workunits.


dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 17020 - Posted: 24 May 2006, 20:08:39 UTC - in response to Message 16939.  

A short message about one workunit. It is completing
successfully on all clients. But on some
Windows machines, it appears to be grabbing an
excessive amount of memory as the workunit progresses.
If your computer is crunching a workunit
with the following name, please abort it, just in case:

t283__CASP7_MAPRELAX_...

We have canceled these jobs for now. Incidentally,
not too many of these work units have been sent
out, maybe 1000 out of the 100,000 in progress --
but we want to make sure there are no problems during CASP7.

Thanks! The TeraFLOPS estimate has exceeded 30 TFlops
thanks to your efforts and a record low error rate!


If I've got one running on a system here, but can afford the memory to keep it going, should I let it run to completion? Also, I've got another one queued on one of my Linux systems. Should I let that one run as well?

Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 17026 - Posted: 24 May 2006, 21:42:38 UTC

2 successive application crashes, look like memory errors. Gave the machine a reboot and it seems ok again. WUs:

17905138
17921169

Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 17029 - Posted: 24 May 2006, 21:58:48 UTC - in response to Message 17020.  

Linux is fine -- keep it going, please! I'd recommend aborting the Windows one, just in case.


Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 17030 - Posted: 24 May 2006, 21:59:31 UTC - in response to Message 17026.  

Jimi, if you happen to be running something
beside rosetta@home, have you noticed application crashes with other
BOINC apps that have graphics?

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 17032 - Posted: 24 May 2006, 23:56:47 UTC
Last modified: 24 May 2006, 23:57:52 UTC

Rhiju, I've had those fatal Windows errors with both 5.12 and 5.16 on this machine. All errors stopped when I turned off the screensaver. See project list in Signature. No other project has caused this fatal Windows error when the screensaver was running. (note: some don't have a screensaver, and the ones with zero RAC I haven't run in a while)

hope this helps

tony




Steve Shedroff
Joined: 7 Nov 05
Posts: 11
Credit: 250,657
RAC: 0
Message 17033 - Posted: 25 May 2006, 0:53:22 UTC - in response to Message 16950.  

This may be coincidence, but I just downloaded the most recent BOINC Client and all my numbers are dropping. Work per day is about 1/2 of what it was before the new client. This is true on MacX and Intel P4 systems. Is it just me?

What version of BOINC did you install?


Sorry, the client is 5.4.9 on both systems. One thing I changed after I noticed the slowdown appears to help the Intel boxes: I turned off the screen saver. At least 2 Intel boxes were hanging on the screensaver (no Rosetta progress while the screensaver was active). On both these boxes, BOINC & Rosetta have consistently run better without the screensaver mode. The new install defaulted to turning on the screensaver. Numbers have been improving since this change.



Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 17034 - Posted: 25 May 2006, 1:27:44 UTC - in response to Message 17033.  

I have sent an e-mail to David Kim concerning the screen saver issue. He will be working specifically on this issue until it is fixed. He may post additional questions here to all of you on this point soon.

hugothehermit
Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 17038 - Posted: 25 May 2006, 5:01:02 UTC
Last modified: 25 May 2006, 5:11:42 UTC

This computer Result ID Work unit ID

Rosetta version 5.16
BOINC version 5.4.9
OS WinXP home service pack 2

error msg:

25/05/2006 1:36:40 PM|rosetta@home|Aborting task JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1: exceeded disk limit: 103510443.000000 > 100000000.000000
25/05/2006 1:36:40 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1 (Maximum disk usage exceeded)
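
To see how far over the limit the task went, the two numbers can be pulled out of the "exceeded disk limit" message and subtracted. This is a minimal sketch that assumes the message format shown in the log line above; the names `LIMIT_RE` and `disk_overrun` are hypothetical.

```python
import re

# Matches the "exceeded disk limit: <used> > <limit>" part of the message
LIMIT_RE = re.compile(r"exceeded disk limit: ([\d.]+) > ([\d.]+)")

def disk_overrun(log_line):
    """Return (used, limit, excess) in bytes, or None if the line does not match."""
    m = LIMIT_RE.search(log_line)
    if m is None:
        return None
    used, limit = float(m.group(1)), float(m.group(2))
    return used, limit, used - limit

# Log line copied from the error message above
line = ("25/05/2006 1:36:40 PM|rosetta@home|Aborting task "
        "JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1: exceeded disk limit: "
        "103510443.000000 > 100000000.000000")
used, limit, excess = disk_overrun(line)
print(f"{excess:,.0f} bytes over the limit")  # prints "3,510,443 bytes over the limit"
```

Here the task was roughly 3.5 MB past the 100,000,000 byte limit when it was aborted.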


edit: to add more info

Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 17040 - Posted: 25 May 2006, 6:02:20 UTC
Last modified: 25 May 2006, 6:06:16 UTC

Client errors here 17525724

and here 17359315


Screen saver never runs on this computer. It has run many successful work units before and after these errors.


Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 17043 - Posted: 25 May 2006, 7:55:22 UTC - in response to Message 17030.  

Sorry Rhiju, just saw this. Rosetta is the only BOINC program on the machine and crunching is all it does at the moment. It looks like memory instability; I lost 4 units in quick succession this morning but a reboot seems to have fixed it.

WUs were:

17959274
17975623
17976020
17981072

I've had trouble with this RAM before and managed to tweak it back into life, but it seems to be going sour again. It's Crucial Ballistix DDR500 2x1GB, and it tends to do this kind of thing. :(


Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 17055 - Posted: 25 May 2006, 12:53:42 UTC
Last modified: 25 May 2006, 13:17:55 UTC

My bad: found the missing WUs, they have low ID numbers.

Still got this weird imbalance between CPU usage though - on a dual-core, instead of 50:50 it's 97:3 (as in one WU using 97% of both cores).

On other machines, RAM usage is 100MB or 235MB. This WU grabbing all the CPU is using 770MB! Is that some kind of doomsday machine? This is it -

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=17193170

©2024 University of Washington
https://www.bakerlab.org