Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I
Author | Message |
---|---|
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
BAD ERROR! Boinc 5.4.9 crunching WU t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom024__528_13504_0, screensaver appeared.. suddenly windows error message appeared about Rosetta@home doing illegal operation and windows had to end this process.. "send report to microsoft? [send] [don't send]" you probably know that message.. after closing the message: boinc happily crunches another WU.. now it looks like it was normal computing error .. but it wasn't .. I was seeing those in Ralph on one computer, but not the others. I turned off the screensaver and haven't seen it again. I reported this in Ralph. |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
WU: JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_2909_0 result: https://boinc.bakerlab.org/rosetta/result.php?resultid=21309179 This WU looks like it finished normally with no problem, but while it was crunching I looked at the stdout.txt file and it was 74,807,795 Bytes. It was mostly filled with lines like: ... set_phi:: move not allowed: 88 set_psi:: move not allowed: 88 set_omega:: move not allowed: 88 set_phi:: move not allowed: 88 set_psi:: move not allowed: 88 set_omega:: move not allowed: 88 set_phi:: move not allowed: 88 set_psi:: move not allowed: 88 set_omega:: move not allowed: 88 set_phi:: move not allowed: 88 set_psi:: move not allowed: 88 set_omega:: move not allowed: 88 ... And so on. On rare occasion there would be a line like: scored_frag_close( 80 91 100 240 ) cycle= 100 total-closed-frags: 11 0.00487611 -61.1343 thrown in. After great stretches of this there would be some normal looking stuff, then there would be another great stretch of "move not allowed" messages. |
David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
WU: JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_2909_0 thanks! this will be easy to track down and fix |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=21413607 Maximum disk usage exceeded Anders n |
Sybr_E-N Joined: 26 Nov 05 Posts: 2 Credit: 164,851 RAC: 0 |
--sorry my mistake |
Mike Gelvin Joined: 7 Oct 05 Posts: 65 Credit: 10,612,039 RAC: 0 |
I've just finished, 5 minutes ago, and uploaded task t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom008__528_3596_0 . But I can't seem to find that work unit in my stats, nor credits are awarded ?? What happened?? Once the work unit is complete, it is uploaded back to the servers. That unit then becomes "ready to report". It is not uncommon for BOINC (on your PC) to not report right away and actually accumulate several results to "report" back. This could take several hours. Once reported the work unit goes through several steps before it actually shows up as complete in your stats. If they are working on any of the servers and have one of the components down, that could also influence when you actually get the credit. I have never seen a work unit actually "lost" but I have heard of it a LONG LONG time ago, and I would not be afraid of that. Just be patient, you will get the credit due you. |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=21413607 When I reported this error, it was because I couldn't see how my settings (100% Rosetta, use no more than 50% of the HD, leave at least 100 Megs of HD space free) would work out to a limit of 100,000,000 bytes on a roughly 30 gig partition with roughly 8.5 gigs free. It was suggested that, as with AMD_is_logical, my error report had grown incredibly large (in my case to over 100,000,000 bytes) and tripped this error. Would the programmers please give an error report growing larger than 100,000,000 bytes its own error message? Perhaps grab the first million characters and last million characters of the file, so the project can see which WUs are having more problems, and what those problems are. In my case, it was on hour 23 of my 24 hour per WU setting, so it probably isn't an error that will be seen that often: it takes a CPU that cranks out models as fast as or faster than mine, a 12-24 hour per WU setting, and a WU that spits out errors right and left, like the one AMD_is_logical and Anders n are reporting. |
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
WU: JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_2909_0 I had the same phenomenon with this WU. https://boinc.bakerlab.org/rosetta/result.php?resultid=21390555 |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Thanks for posting about this big file. I'm fixing this now for the new app -- I thought those print statements would not normally get triggered, but this CASP target has found a way! I'll also see if I can find a way to turn it off for the current app and resend the workunits. WU: JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_2909_0 |
dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
A short message about one workunit. It is completing If I've got one running on a system here, but can afford the memory to keep it going, should I let it run to completion? Also, I've got another one queued on one of my Linux systems. Should I let that one run as well? |
Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0 |
2 successive application crashes, look like memory errors. Gave the machine a reboot and it seems ok again. WUs: 17905138 17921169 |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Linux is fine -- keep it going, please! I'd recommend aborting the Windows one, just in case. A short message about one workunit. It is completing |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Jimi, if you happen to be running something besides rosetta@home, have you noticed application crashes with other BOINC apps that have graphics? 2 successive application crashes, look like memory errors. Gave the machine a reboot and it seems ok again. WUs: |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Rhiju, I've had those fatal windows errors with both 5.12 and 5.16 on this machine. All errors stopped when I turned off the screensaver. See project list in Signature. No other project has caused this fatal windows error when the screensaver was running. (note: some don't have a screensaver, and the ones with zero RAC I haven't run in a while) hope this helps tony |
Steve Shedroff Joined: 7 Nov 05 Posts: 11 Credit: 250,657 RAC: 0 |
This may be coincidence, but I just downloaded the most recent BOINC Client and all my numbers are dropping. Work per day is about 1/2 of what it was before the new client. This is true on MacX and Intel P4 systems. Is it just me? Sorry, Client is 5.4.9 on both systems. One thing I noticed and changed after the slowdown appears to help the Intel boxes: I turned off the screen saver. At least 2 Intel boxes were hanging on the screensaver (no Rosetta progress while the screensaver was active). On both these boxes, BOINC & Rosetta have consistently run better without the screensaver mode. The new install defaulted to turning on the screensaver. Numbers have been improving since this change. |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
This may be coincidence, but I just downloaded the most recent BOINC Client and all my numbers are dropping. Work per day is about 1/2 of what it was before the new client. This is true on MacX and Intel P4 systems. Is it just me? I have sent an e-mail to David Kim concerning the screen saver issue. He will be working specifically on this issue until it is fixed. He may post additional questions here to all of you on this point soon. Moderator9 ROSETTA@home FAQ Moderator Contact |
hugothehermit Joined: 26 Sep 05 Posts: 238 Credit: 314,893 RAC: 0 |
This computer Result ID Work unit ID Rosetta version: 5.16; BOINC version: 5.4.9; OS: WinXP Home, Service Pack 2; error msg: 25/05/2006 1:36:40 PM|rosetta@home|Aborting task JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1: exceeded disk limit: 103510443.000000 > 100000000.000000 25/05/2006 1:36:40 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_9568_1 (Maximum disk usage exceeded) edit: to add more info |
Mike Gelvin Joined: 7 Oct 05 Posts: 65 Credit: 10,612,039 RAC: 0 |
|
Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0 |
Sorry Rhiju, just saw this. Rosetta is the only BOINC program on the machine and crunching is all it does at the moment. It looks like memory instability; I lost 4 units in quick succession this morning but a reboot seems to have fixed it. WUs were: 17959274 17975623 17976020 17981072 I've had trouble with this RAM before and managed to tweak it back into life, but it seems to be going sour again. Crucial Ballistix DDR500 2x1GB, it tends to do this kind of thing. :( Jimi, if you happen to be running something |
Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0 |
My bad: found the missing WUs, they have low ID numbers. Still got this weird imbalance between CPU usage though - on a dual-core, instead of 50:50 it's 97:3 (as in one WU using 97% of both cores). On other machines, RAM usage is 100MB or 235MB. This WU grabbing all the CPU is using 770MB! Is that some kind of doomsday machine? This is it - https://boinc.bakerlab.org/rosetta/workunit.php?wuid=17193170 |
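
BennyRop's suggestion in this thread (keep only the first million and last million characters of an oversized output file) can be illustrated with a rough sketch. This is not Rosetta or BOINC code: the function name `head_and_tail`, the `keep` parameter, and the wording of the "omitted" marker are all made up for illustration, and the sketch assumes the file is simply read as raw bytes.

```python
import os


def head_and_tail(path, keep=1_000_000):
    """Return the first and last `keep` bytes of a file.

    Hypothetical helper for illustration only. If the file is small
    enough that head and tail would overlap, the whole file is returned.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= 2 * keep:
            return f.read()
        head = f.read(keep)            # first `keep` bytes
        f.seek(-keep, os.SEEK_END)     # jump to `keep` bytes before the end
        tail = f.read(keep)            # last `keep` bytes
    skipped = size - 2 * keep
    # Marker text is an arbitrary choice so a reader can see data was dropped.
    marker = f"\n... [{skipped} bytes omitted] ...\n".encode()
    return head + marker + tail
```

Applied to something like the 74,807,795-byte stdout.txt described earlier in the thread, a call such as `head_and_tail("stdout.txt")` would return roughly 2 MB: the start of the run, a marker, and the final messages before the task ended.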
©2024 University of Washington
https://www.bakerlab.org