Report Problems with Rosetta Version 5.16 I

Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 16881 - Posted: 22 May 2006, 22:15:48 UTC - in response to Message 16873.  
Last modified: 22 May 2006, 22:22:02 UTC

What settings do you feel are not being respected by the current (improved) checkpointing?

I would interpret the setting "write to disk at most..." as: After a checkpoint wait for x seconds before a new checkpoint and then do it as soon as possible.

Norbert


I believe that's exactly what they are doing. The problem is that "as soon as possible" isn't as often as many people would like. It's only about every 20 minutes that they reach a point in the model where they can checkpoint. But it depends on the protein and the CPU. A faster CPU hits that same point much faster than a slow CPU. So, what they do is... reach a point in the model where a checkpoint COULD be made, and if more than 20 min. has gone by since the last checkpoint was made, then another is made.... which I guess is your point now that I type it. Let me see if I can restate it...

"Why use the arbitrary 20 minutes number, when the user's preference might be for write to disk every 5 minutes, and my model may be hitting a checkpointable state every 5 minutes?"

It seems like that point was brought up on Ralph. The project is under maintenance at the moment so can't post a link.

[edit] I think it boiled down to the volume of data they have to write for the checkpoint. It was like 100+MB. And if they wrote that much data every... (I think the default is) 1 min, then your "faster" computer, which is reaching checkpointable points in the model rapidly, would be spending a considerable fraction of time writing the checkpoints rather than getting work done.
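The rule discussed above ("wait at least x seconds after a checkpoint, then write the next one as soon as the model reaches a checkpointable point") can be sketched roughly as follows. This is only an illustration, not Rosetta's or BOINC's actual code; the class name, the `min_interval_s` parameter and the injectable `clock` are all assumptions made for the example.

```python
import time


class CheckpointThrottle:
    """Hypothetical helper: decides whether to write a checkpoint."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        # min_interval_s plays the role of the "write to disk at most
        # every N seconds" preference (or the fixed 20 minutes).
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last_checkpoint = clock()

    def should_checkpoint(self):
        """Call only when the model is in a state that can be saved."""
        now = self.clock()
        if now - self.last_checkpoint >= self.min_interval_s:
            self.last_checkpoint = now
            return True
        return False
```

With a small `min_interval_s` and a fast CPU this returns True at almost every checkpointable point, which is where the cost of writing a large checkpoint each time would start to matter.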
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 16889 - Posted: 22 May 2006, 23:19:24 UTC
Last modified: 22 May 2006, 23:19:45 UTC

Jose, have you tried searching for malware/adware with Ad-Aware SE, and searching for spyware with Spybot Search & Destroy, in addition to your virus program? They're free.

tony

tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 16904 - Posted: 23 May 2006, 10:08:16 UTC
Last modified: 23 May 2006, 10:21:24 UTC

The following WU grew steadily in memory usage up to 550 MB physical RAM and about 700 MB virtual memory (I have 1 GB RAM and 1.24 GB virtual memory on that host):

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=17772949

After three and a half hours and 26 decoys I restarted BOINC, and memory usage started from 36 MB but is again growing with each completed model. Seems to me like a memory leak. Btw, I never looked at the graphics.

Edit: It seems Rosetta is no longer writing to the file stdout.txt after restarting BOINC. However it is writing to the file xxt283.out. Don't know if this means anything.

Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 16924 - Posted: 23 May 2006, 18:05:44 UTC - in response to Message 16904.  

Tralala: Thanks for posting about this problem. I thought I had fixed this issue on this workunit, but apparently there are still problems on some clients. I am canceling these workunits now. Aborting the jobs was the right thing to do.

The following WU grew steadily in memory usage up to 550 MB physical RAM and about 700 MB virtual memory (I have 1 GB RAM and 1.24 GB virtual memory on that host):

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=17772949

After three and a half hours and 26 decoys I restarted BOINC, and memory usage started from 36 MB but is again growing with each completed model. Seems to me like a memory leak. Btw, I never looked at the graphics.

Edit: It seems Rosetta is no longer writing to the file stdout.txt after restarting BOINC. However it is writing to the file xxt283.out. Don't know if this means anything.



tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 16928 - Posted: 23 May 2006, 18:40:31 UTC - in response to Message 16924.  

Tralala: Thanks for posting about this problem. I thought I had fixed this issue on this workunit, but apparently there are still problems on some clients. I am canceling these workunits now. Aborting the jobs was the right thing to do.


I could have aborted it the soft way by lowering the run time preference, but I was afraid it would kill one of my remote hosts with only 512 MB RAM. Fortunately that was not the case.

You can safeguard against those incidents if you specify a memory bound for all WU. If the virtual memory exceeds this bound the WU gets automatically aborted.
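As a rough illustration of the safeguard suggested above, the check itself is simple; the sketch below also estimates how fast memory grows per completed model, which is what made this WU look like a leak. The function names and the sample numbers are made up for the example, and this is not how the BOINC client actually enforces its limits.

```python
def growth_per_model(samples_mb):
    """Average memory growth (MB) between consecutive per-model samples."""
    if len(samples_mb) < 2:
        return 0.0
    return (samples_mb[-1] - samples_mb[0]) / (len(samples_mb) - 1)


def should_abort(virtual_mb, bound_mb):
    """True when the task's virtual memory exceeds the configured bound."""
    return virtual_mb > bound_mb
```

For example, `should_abort(700, 512)` would flag a task using about 700 MB of virtual memory on a host with a 512 MB bound.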

Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 16932 - Posted: 23 May 2006, 19:39:08 UTC

Curious behaviour.... Two work units "exited with 0" but had no finish file. They then restarted and appear to have resumed where they left off. They are still running.

Here's the log:

5/23/2006 9:21:11 AM||Rescheduling CPU: application exited
5/23/2006 9:21:11 AM|rosetta@home|Computation for task u287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_nterm__522_6410_0 finished
5/23/2006 9:21:11 AM|rosetta@home|Starting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1046_0 using rosetta version 516
5/23/2006 9:21:13 AM|rosetta@home|Started upload of file u287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_nterm__522_6410_0_0
5/23/2006 9:21:19 AM|rosetta@home|Finished upload of file u287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_nterm__522_6410_0_0
5/23/2006 9:21:19 AM|rosetta@home|Throughput 29328 bytes/sec
5/23/2006 9:21:24 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
5/23/2006 9:21:24 AM|rosetta@home|Reason: To report completed tasks
5/23/2006 9:21:24 AM|rosetta@home|Reporting 1 tasks
5/23/2006 9:21:29 AM|rosetta@home|Scheduler request succeeded
5/23/2006 10:22:32 AM||Rescheduling CPU: application exited
5/23/2006 10:22:32 AM|rosetta@home|Computation for task b287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_truncate__522_6500_0 finished
5/23/2006 10:22:32 AM|rosetta@home|Starting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1041_0 using rosetta version 516
5/23/2006 10:22:34 AM|rosetta@home|Started upload of file b287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_truncate__522_6500_0_0
5/23/2006 10:22:40 AM|rosetta@home|Finished upload of file b287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_truncate__522_6500_0_0
5/23/2006 10:22:40 AM|rosetta@home|Throughput 28853 bytes/sec
5/23/2006 10:22:45 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
5/23/2006 10:22:45 AM|rosetta@home|Reason: To report completed tasks
5/23/2006 10:22:45 AM|rosetta@home|Reporting 1 tasks
5/23/2006 10:22:50 AM|rosetta@home|Scheduler request succeeded
5/23/2006 11:04:46 AM|rosetta@home|Task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1046_0 exited with zero status but no 'finished' file
5/23/2006 11:04:46 AM|rosetta@home|If this happens repeatedly you may need to reset the project.
5/23/2006 11:04:46 AM||Rescheduling CPU: application exited
5/23/2006 11:04:46 AM|rosetta@home|Task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1041_0 exited with zero status but no 'finished' file
5/23/2006 11:04:46 AM|rosetta@home|If this happens repeatedly you may need to reset the project.
5/23/2006 11:04:46 AM||Rescheduling CPU: application exited
5/23/2006 11:04:46 AM|rosetta@home|Restarting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1046_0 using rosetta version 516
5/23/2006 11:04:46 AM|rosetta@home|Restarting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1041_0 using rosetta version 516
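
For anyone who wants to check their own message log for the same symptom, here is a minimal sketch that counts how often each task "exited with zero status but no 'finished' file". It assumes the pipe-separated `timestamp|project|message` layout shown in the log above; the function name is just for illustration.

```python
import re
from collections import Counter

# Message text as it appears in the log excerpt above.
NO_FINISH = re.compile(
    r"Task (\S+) exited with zero status but no 'finished' file"
)


def count_unfinished_exits(log_lines):
    """Return a Counter mapping task name -> number of such exits."""
    counts = Counter()
    for line in log_lines:
        # Split into timestamp, project, message (message may contain '|').
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue
        match = NO_FINISH.search(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts
```

A task that shows up here repeatedly is the case where the log suggests resetting the project.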


Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 16941 - Posted: 24 May 2006, 0:36:22 UTC - in response to Message 16863.  
Last modified: 24 May 2006, 0:39:34 UTC

LINUX problem:
I need help with this problem: while running Rosetta on Linux server with PentiumIV HyperThreading processor, Rosetta occasionally hangs in a very strange state: everything is running except Rosetta. Boinc is running. Application on other thread (Simap@home) is running. Just Rosetta isn't.


I had encountered this particular issue back in Jan/Feb-06 (also under Linux). Overall about 5-6 times.



I have been having the same problems with my Dell SC420... The Rosetta application just sleeps (watching in BoincView I have a 0.00 cpu efficiency, and when I "top" I see the Rosetta apps in memory, but 0% cpu usage). I have gotten this quite frequently while running Rosetta on a Linux box, even through all the different versions. Any ideas/suggestions from the Mods, Testers or Devs?


OK, it looks to be the same issue. Rosetta "frozen" (SN=Sleeping,Nice and consuming 0% CPU) although BOINC thinks it's running. Also, for some reason BOINC won't pre-empt Rosetta after say 1hr, so effectively the whole DC queue is stuck.
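
One way to spot this "sleeping with 0% CPU" state on Linux without watching top is to compare two readings of `/proc/<pid>/stat` taken some time apart. The sketch below is only an illustration under that assumption: the helper names are made up, and a process that is legitimately idle would look the same, so it is a hint rather than proof of a hang.

```python
def parse_stat(stat_line):
    """Return (state, cpu_ticks) from one /proc/<pid>/stat line."""
    # The command name is in parentheses and may contain spaces,
    # so split only after the last ')'.
    rest = stat_line[stat_line.rfind(")") + 2:].split()
    state = rest[0]                          # field 3: process state
    ticks = int(rest[11]) + int(rest[12])    # fields 14+15: utime + stime
    return state, ticks


def looks_hung(before, after):
    """True if the process is sleeping and used no CPU between readings."""
    _, ticks_before = parse_stat(before)
    state_after, ticks_after = parse_stat(after)
    return state_after == "S" and ticks_after == ticks_before
```

Reading the file twice, say a minute apart, and passing both lines to `looks_hung` would flag the situation described here.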

Since (e.g. here) you've encountered Rosetta "hangs" recently under Linux using BOINC v5.4.9 (which I see you're using now), we can rule out the BOINC v5.2.14 possibility. Also, you have a different kernel, 2.6.x (both myself and Aglarond had kernel 2.4.x and BOINC v5.2.14), so we can rule that out too.

I reiterate, though, that my Linux box that had this issue has been running smoothly for over 3 months, 24/7, crunching 90% Rosetta/Ralph, without a single "hung" instance. I thought it was some odd issue that was "solved" by recompiling Rosetta with the new BOINC API, but apparently you guys still have it...

Maybe do some thinking about SIGSEGV and SIGABRT:

SIGSEGV: segmentation violation
Stack trace (11 frames):
[0x882fbb3]
Exiting...
SIGABRT: abort called
Stack trace (18 frames):
[0x882fbb3]
https://boinc.bakerlab.org/rosetta/result.php?resultid=20134206

Steve Shedroff
Joined: 7 Nov 05
Posts: 11
Credit: 250,657
RAC: 0
Message 16946 - Posted: 24 May 2006, 2:33:49 UTC

This may be coincidence, but I just downloaded the most recent BOINC Client and all my numbers are dropping. Work per day is about 1/2 of what it was before the new client. This is true on MacX and Intel P4 systems. Is it just me?

senatoralex85
Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16955 - Posted: 24 May 2006, 5:55:22 UTC - in response to Message 16941.  

LINUX problem:
I need help with this problem: while running Rosetta on Linux server with PentiumIV HyperThreading processor, Rosetta occasionally hangs in a very strange state: everything is running except Rosetta. Boinc is running. Application on other thread (Simap@home) is running. Just Rosetta isn't.


I had encountered this particular issue back in Jan/Feb-06 (also under Linux). Overall about 5-6 times.



I have been having the same problems with my Dell SC420... The Rosetta application just sleeps (watching in BoincView I have a 0.00 cpu efficiency, and when I "top" I see the Rosetta apps in memory, but 0% cpu usage). I have gotten this quite frequently while running Rosetta on a Linux box, even through all the different versions. Any ideas/suggestions from the Mods, Testers or Devs?


OK, it looks to be the same issue. Rosetta "frozen" (SN=Sleeping,Nice and consuming 0% CPU) although BOINC thinks it's running. Also, for some reason BOINC won't pre-empt Rosetta after say 1hr, so effectively the whole DC queue is stuck.

-----------------------------------------------------------------------------

I am not sure, but I may have a similar problem. Once in a while I will leave my computer running for a few consecutive hours. When I come back, it seems that BOINC got stuck and stranded a workunit at "100% ready to report" status. If I hit the update button under the projects tab, it sends the workunit and simultaneously downloads another one. Why would it get stuck like that? I am running BOINC 4.45.


Aglarond
Joined: 29 Jan 06
Posts: 26
Credit: 446,212
RAC: 0
Message 16974 - Posted: 24 May 2006, 13:09:42 UTC

BAD ERROR! Boinc 5.4.9 crunching WU t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom024__528_13504_0, screensaver appeared.. suddenly windows error message appeared about Rosetta@home doing illegal operation and windows had to end this process.. "send report to microsoft? [send] [don't send]" you probably know that message.. after closing the message: boinc happily crunches another WU.. now it looks like it was normal computing error .. but it wasn't ..

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 16976 - Posted: 24 May 2006, 13:34:29 UTC - in response to Message 16974.  

BAD ERROR! Boinc 5.4.9 crunching WU t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom024__528_13504_0, screensaver appeared.. suddenly windows error message appeared about Rosetta@home doing illegal operation and windows had to end this process.. "send report to microsoft? [send] [don't send]" you probably know that message.. after closing the message: boinc happily crunches another WU.. now it looks like it was normal computing error .. but it wasn't ..

I was seeing those in Ralph on one computer, but not the others. I turned off the screensaver and haven't seen it again. I reported this in Ralph.

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 16980 - Posted: 24 May 2006, 14:25:37 UTC

WU: JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_2909_0
result: https://boinc.bakerlab.org/rosetta/result.php?resultid=21309179

This WU looks like it finished normally with no problem, but while it was crunching I looked at the stdout.txt file and it was 74,807,795 Bytes. It was mostly filled with lines like:

...
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
...

And so on. On rare occasion there would be a line like:

scored_frag_close( 80 91 100 240 ) cycle= 100 total-closed-frags: 11 0.00487611 -61.1343

thrown in. After great stretches of this there would be some normal looking stuff, then there would be another great stretch of "move not allowed" messages.
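
To see at a glance what is filling a huge stdout.txt like this, a small tally of the most frequent lines helps. This is just a sketch; the function name and the default of five lines are arbitrary choices for the example.

```python
from collections import Counter


def summarize_lines(lines, top=5):
    """Return the `top` most frequent non-empty lines with their counts."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common(top)
```

Passing an open file object for stdout.txt (it is iterated line by line, so the whole file is never held in memory as one string) would show the "move not allowed" messages at the top of the list with their counts.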

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 16991 - Posted: 24 May 2006, 16:17:14 UTC - in response to Message 16980.  

WU: JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_2909_0
result: https://boinc.bakerlab.org/rosetta/result.php?resultid=21309179

This WU looks like it finished normally with no problem, but while it was crunching I looked at the stdout.txt file and it was 74,807,795 Bytes. It was mostly filled with lines like:

...
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
...

And so on. On rare occasion there would be a line like:

scored_frag_close( 80 91 100 240 ) cycle= 100 total-closed-frags: 11 0.00487611 -61.1343

thrown in. After great stretches of this there would be some normal looking stuff, then there would be another great stretch of "move not allowed" messages.


thanks! this will be easy to track down and fix


anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 16992 - Posted: 24 May 2006, 16:18:59 UTC


Sybr_E-N
Joined: 26 Nov 05
Posts: 2
Credit: 164,851
RAC: 0
Message 17005 - Posted: 24 May 2006, 18:10:50 UTC
Last modified: 24 May 2006, 18:38:33 UTC

--sorry my mistake

Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 17008 - Posted: 24 May 2006, 18:40:27 UTC - in response to Message 17005.  

I've just finished, 5 minutes ago, and uploaded task t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom008__528_3596_0. But I can't seem to find that work unit in my stats, nor have any credits been awarded. What happened?

The work unit did complete successfully.

My computer ID is: 76755

The total uploaded batch was:
24/05/2006 20:10:49|rosetta@home|Started upload of file v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_6789_0_0

24/05/2006 20:10:49|rosetta@home|Started upload of file v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_6790_0_0

24/05/2006 20:10:54|rosetta@home|Started upload of file t287__CASP7_ABRELAX_SAVE_ALL_OUT_hom001__527_6789_0_0

24/05/2006 20:10:54|rosetta@home|Started upload of file t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom008__528_3596_0_0




Once the work unit is complete, it is uploaded back to the servers. That unit then becomes "ready to report". It is not uncommon for BOINC (on your PC) to not report right away and actually accumulate several results to "report" back. This could take several hours.
Once reported the work unit goes through several steps before it actually shows up as complete in your stats. If they are working on any of the servers and have one of the components down, that could also influence when you actually get the credit.
I have never seen a work unit actually "lost" but I have heard of it a LONG LONG time ago, and I would not be afraid of that. Just be patient, you will get the credit due you.


BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 17010 - Posted: 24 May 2006, 19:02:52 UTC - in response to Message 16992.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=21413607

Maximum disk usage exceeded

Anders n


When I reported this error, it was because I had a question about how my settings (100% Rosetta, use no more than 50% of the HD, leave at least 100 MB of HD space free) could amount to a 100,000,000-byte limit on a roughly 30 GB partition with roughly 8.5 GB free.
It was suggested that, as in AMD_is_logical's case, my error report had grown incredibly large (in my case to over 100,000,000 bytes) and tripped this error.

Would the programmers please give an error report growing larger than 100,000,000 bytes its own error message? Perhaps it could grab the first million characters and last million characters of the file, so the project can see which WUs are having more problems, and what those problems are.
In my case, it was on hour 23 of my 24 hour per WU setting - so it probably isn't an error that will be seen that often. i.e. they need a cpu that cranks out models as fast or faster than mine; need a 12-24 hour per WU setting, and a WU that spits out errors right and left, like the one AMD_is_logical and Anders n are reporting.
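The "first million and last million characters" idea could look something like this. It is only a sketch of the suggestion, not anything the project has implemented; the function name and the `n_bytes` parameter are made up, and it works on bytes rather than characters to keep the seeking simple.

```python
import os


def head_and_tail(path, n_bytes=1_000_000):
    """Return (head, tail) of a file as bytes.

    If the file is no larger than 2 * n_bytes, the whole content is
    returned as the head and the tail is empty, so nothing is duplicated.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= 2 * n_bytes:
            return f.read(), b""
        head = f.read(n_bytes)
        f.seek(-n_bytes, os.SEEK_END)  # jump to n_bytes before the end
        tail = f.read(n_bytes)
    return head, tail
```

Uploading just those two pieces would keep the report small while still showing how the run started and what it was printing when it hit the limit.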

tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 17013 - Posted: 24 May 2006, 19:27:17 UTC - in response to Message 16980.  

WU: JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_2909_0
result: https://boinc.bakerlab.org/rosetta/result.php?resultid=21309179

This WU looks like it finished normally with no problem, but while it was crunching I looked at the stdout.txt file and it was 74,807,795 Bytes. It was mostly filled with lines like:

...
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
...

And so on. On rare occasion there would be a line like:

scored_frag_close( 80 91 100 240 ) cycle= 100 total-closed-frags: 11 0.00487611 -61.1343

thrown in. After great stretches of this there would be some normal looking stuff, then there would be another great stretch of "move not allowed" messages.



I had the same phenomenon with this WU.

https://boinc.bakerlab.org/rosetta/result.php?resultid=21390555

Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 17016 - Posted: 24 May 2006, 19:55:43 UTC - in response to Message 17013.  

Thanks for posting about this big file. I'm fixing this now for the new app. I thought those print statements would not normally get triggered, but this CASP target has found a way! I'll also see if I can find a way to turn it off for the current app and resend the workunits.

WU: JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_2909_0
result: https://boinc.bakerlab.org/rosetta/result.php?resultid=21309179

This WU looks like it finished normally with no problem, but while it was crunching I looked at the stdout.txt file and it was 74,807,795 Bytes. It was mostly filled with lines like:

...
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
set_phi:: move not allowed: 88
set_psi:: move not allowed: 88
set_omega:: move not allowed: 88
...

And so on. On rare occasion there would be a line like:

scored_frag_close( 80 91 100 240 ) cycle= 100 total-closed-frags: 11 0.00487611 -61.1343

thrown in. After great stretches of this there would be some normal looking stuff, then there would be another great stretch of "move not allowed" messages.



I had the same phenomenon with this WU.

https://boinc.bakerlab.org/rosetta/result.php?resultid=21390555



dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 17020 - Posted: 24 May 2006, 20:08:39 UTC - in response to Message 16939.  

A short message about one workunit. It is completing successfully on all clients. But on some Windows machines, it appears to be grabbing an excessive amount of memory as the workunit progresses. If your computer is crunching a workunit with the following name, please abort it, just in case:

t283__CASP7_MAPRELAX_...

We have canceled these jobs for now. Incidentally, not too many of these work units have been sent out, maybe 1000 out of the 100,000 in progress, but we want to make sure there are no problems during CASP7.

Thanks! The TeraFLOPS estimate has exceeded 30 TFlops thanks to your efforts and a record low error rate!


If I've got one running on a system here, but can afford the memory to keep it going, should I let it run to completion? Also, I've got another one queued on one of my Linux systems. Should I let that one run as well?



©2025 University of Washington
https://www.bakerlab.org