Posts by Mike Gelvin

21) Message boards : Number crunching : Report Problems with Rosetta Version 5.24 (Message 19147)
Posted 23 Jun 2006 by Mike Gelvin
Post:
Bombed out due to high disk usage

25277738
22) Message boards : Number crunching : Merge Computers (Message 18471)
Posted 11 Jun 2006 by Mike Gelvin
Post:
Newbie, just connected one machine to Rosetta for the first time.
During the initial download received the following error:
"2006-06-11 10:32:16 [http://boinc.bakerlab.org/rosetta/] Scheduler request failed: Transferred a partial file"
It retried after one minute, then downloaded 10 tasks with no problem. However, when looking at My Computers I saw the same machine listed twice, each with 10 results, so I merged them.

Now I have one machine listed alright, but it shows 20 results when in fact I only have 10.

Probably not a problem except that 10 results will eventually time out and have to be resent. But now I'm a little gun-shy about joining 3 more machines as I had planned.

Any comments? Thanks!


Proceed and Enjoy!

The partial file is not a problem; BOINC takes care of those things nicely. The multi-machine thing was probably you getting used to how to do it. I'm guessing that after you do it a few times, there will be no problems.
23) Message boards : Number crunching : Merge Computers (Message 18388)
Posted 10 Jun 2006 by Mike Gelvin
Post:
Thank you very much!
24) Message boards : Rosetta@home Science : Abort t283? (Message 17548)
Posted 2 Jun 2006 by Mike Gelvin
Post:
I suspect they want us to crunch them all. If the structure is found, even after the deadline, that should be useful to them.
25) Message boards : Number crunching : Merge Computers (Message 17332)
Posted 30 May 2006 by Mike Gelvin
Post:
Not until we update the MySQL server on the main server. We're waiting to see how RALPH responds to the updates before screwin' around w/ R@H. -KEL


Thank you for your reply.
26) Message boards : Number crunching : Merge Computers (Message 17329)
Posted 30 May 2006 by Mike Gelvin
Post:
Bump... No replies? Ralph has this feature!
27) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 17153)
Posted 26 May 2006 by Mike Gelvin
Post:
Wow: 5/24/2006 9:49:03 PM Aborting result JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: exceeded disk limit: 120577226.000000 > 100000000.000000

http://boinc.bakerlab.org/rosetta/result.php?resultid=21426436

Please refer to Dr. Baker's response, some comments below, which said this workunit has issues which have now been fixed.


It was a write statement for in-house diagnostics that Rhiju removed from the code today, and the updated version (with some other improvements) is currently being tested on Ralph. We will send out the updated version after the Ralph results are back (we are trying to be as cautious as possible!) in the next day or two.



If this is true, I don't understand why 5.16 is still being run on Ralph.
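
For anyone who wants to pull the numbers out of an abort message like the one quoted at the top of this post, here is a minimal sketch. The regular expression and the function name `parse_disk_abort` are made up for illustration and only match this one message format; treating the two figures as bytes is an assumption, since the log line does not state a unit.

```python
import re

# Log line copied from the post above.
LINE = ("5/24/2006 9:49:03 PM Aborting result "
        "JUMP_ALLBARCODE_t285__SAVE_ALL_OUT_530_16143_0: "
        "exceeded disk limit: 120577226.000000 > 100000000.000000")

# Hypothetical pattern: it covers this single message format, nothing else.
PATTERN = re.compile(
    r"Aborting result (?P<result>\S+): exceeded disk limit: "
    r"(?P<used>[\d.]+) > (?P<limit>[\d.]+)"
)


def parse_disk_abort(line):
    """Return (result_name, used, limit), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return (match.group("result"),
            float(match.group("used")),
            float(match.group("limit")))


name, used, limit = parse_disk_abort(LINE)
overage = used - limit

# Assumption: both figures are byte counts, so dividing by 1e6 gives MB.
print(f"{name}: {overage / 1e6:.1f} MB over a {limit / 1e6:.0f} MB limit "
      f"({100 * overage / limit:.1f}% over)")
```

Under that assumption the result went roughly a fifth over its limit, which fits a stray diagnostic write rather than a small overshoot.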
28) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 17147)
Posted 26 May 2006 by Mike Gelvin
Post:
Errored out. Max. disk usage exceeded.

JUMP_RELAX_ALLBARCODE_t285__SAVE_ALL_OUT_530_1618_0

http://www.boinc.bakerlab.org/rosetta/result.php?resultid=21291895


Can anyone identify the units that will cause this problem so I may delete them from my queue?
29) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 17040)
Posted 25 May 2006 by Mike Gelvin
Post:
Client errors here 17525724

and here 17359315


Screen saver never runs on this computer. It has run many successful work units before and after these errors.
30) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 17008)
Posted 24 May 2006 by Mike Gelvin
Post:
I've just finished, 5 minutes ago, and uploaded task t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom008__528_3596_0. But I can't seem to find that work unit in my stats, and no credits have been awarded. What happened?

The work unit did complete successfully.

My computer ID is: 76755

The total uploaded batch was:
24/05/2006 20:10:49|rosetta@home|Started upload of file v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_6789_0_0

24/05/2006 20:10:49|rosetta@home|Started upload of file v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_6790_0_0

24/05/2006 20:10:54|rosetta@home|Started upload of file t287__CASP7_ABRELAX_SAVE_ALL_OUT_hom001__527_6789_0_0

24/05/2006 20:10:54|rosetta@home|Started upload of file t283__CASP7_ABRELAX_SAVE_ALL_OUT_hom008__528_3596_0_0




Once the work unit is complete, it is uploaded back to the servers. That unit then becomes "ready to report". It is not uncommon for BOINC (on your PC) to hold off reporting and instead accumulate several results to "report" back together. This could take several hours.
Once reported, the work unit goes through several steps before it actually shows up as complete in your stats. If they are working on any of the servers and have one of the components down, that could also influence when you actually get the credit.
I have never seen a work unit actually "lost", but I have heard of it a LONG LONG time ago, and I would not be afraid of that. Just be patient, you will get the credit due you.
31) Message boards : Rosetta@home Science : Abort t283? (Message 16944)
Posted 24 May 2006 by Mike Gelvin
Post:
We will submit these lowest energy structures for T283 this week


Does this mean t283 is done, and we should abort the remaining workunits on t283 to move on to the other problems?
32) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 16932)
Posted 23 May 2006 by Mike Gelvin
Post:
Curious behaviour.... Two work units "exited with 0" but had no finish file. They then restarted and appear to have resumed where they left off. They are still running.

Here's the log:

5/23/2006 9:21:11 AM||Rescheduling CPU: application exited
5/23/2006 9:21:11 AM|rosetta@home|Computation for task u287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_nterm__522_6410_0 finished
5/23/2006 9:21:11 AM|rosetta@home|Starting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1046_0 using rosetta version 516
5/23/2006 9:21:13 AM|rosetta@home|Started upload of file u287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_nterm__522_6410_0_0
5/23/2006 9:21:19 AM|rosetta@home|Finished upload of file u287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_nterm__522_6410_0_0
5/23/2006 9:21:19 AM|rosetta@home|Throughput 29328 bytes/sec
5/23/2006 9:21:24 AM|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
5/23/2006 9:21:24 AM|rosetta@home|Reason: To report completed tasks
5/23/2006 9:21:24 AM|rosetta@home|Reporting 1 tasks
5/23/2006 9:21:29 AM|rosetta@home|Scheduler request succeeded
5/23/2006 10:22:32 AM||Rescheduling CPU: application exited
5/23/2006 10:22:32 AM|rosetta@home|Computation for task b287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_truncate__522_6500_0 finished
5/23/2006 10:22:32 AM|rosetta@home|Starting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1041_0 using rosetta version 516
5/23/2006 10:22:34 AM|rosetta@home|Started upload of file b287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_truncate__522_6500_0_0
5/23/2006 10:22:40 AM|rosetta@home|Finished upload of file b287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_truncate__522_6500_0_0
5/23/2006 10:22:40 AM|rosetta@home|Throughput 28853 bytes/sec
5/23/2006 10:22:45 AM|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
5/23/2006 10:22:45 AM|rosetta@home|Reason: To report completed tasks
5/23/2006 10:22:45 AM|rosetta@home|Reporting 1 tasks
5/23/2006 10:22:50 AM|rosetta@home|Scheduler request succeeded
5/23/2006 11:04:46 AM|rosetta@home|Task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1046_0 exited with zero status but no 'finished' file
5/23/2006 11:04:46 AM|rosetta@home|If this happens repeatedly you may need to reset the project.
5/23/2006 11:04:46 AM||Rescheduling CPU: application exited
5/23/2006 11:04:46 AM|rosetta@home|Task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1041_0 exited with zero status but no 'finished' file
5/23/2006 11:04:46 AM|rosetta@home|If this happens repeatedly you may need to reset the project.
5/23/2006 11:04:46 AM||Rescheduling CPU: application exited
5/23/2006 11:04:46 AM|rosetta@home|Restarting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1046_0 using rosetta version 516
5/23/2006 11:04:46 AM|rosetta@home|Restarting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1041_0 using rosetta version 516
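
If you have a long message log and want to check which tasks hit this and whether each one was restarted afterwards, something like the following sketch could do it. It assumes the pipe-separated `timestamp|project|message` layout seen in the log above; the function name `summarize` and the two patterns are invented for this example and are not part of BOINC.

```python
import re
from collections import Counter

# The last eight lines of the log from this post.
LOG = """\
5/23/2006 11:04:46 AM|rosetta@home|Task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1046_0 exited with zero status but no 'finished' file
5/23/2006 11:04:46 AM|rosetta@home|If this happens repeatedly you may need to reset the project.
5/23/2006 11:04:46 AM||Rescheduling CPU: application exited
5/23/2006 11:04:46 AM|rosetta@home|Task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1041_0 exited with zero status but no 'finished' file
5/23/2006 11:04:46 AM|rosetta@home|If this happens repeatedly you may need to reset the project.
5/23/2006 11:04:46 AM||Rescheduling CPU: application exited
5/23/2006 11:04:46 AM|rosetta@home|Restarting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1046_0 using rosetta version 516
5/23/2006 11:04:46 AM|rosetta@home|Restarting task v287__CASP7_ABRELAX_SAVE_ALL_OUT_cterm__527_1041_0 using rosetta version 516
"""

# Hypothetical patterns, written against the message text shown above.
NO_FINISH = re.compile(r"^Task (\S+) exited with zero status but no ")
RESTART = re.compile(r"^Restarting task (\S+) using ")


def summarize(log_text):
    """Count 'no finished file' exits per task and collect restarted tasks."""
    no_finish = Counter()
    restarted = set()
    for line in log_text.splitlines():
        # Assumed layout: timestamp|project|message (project may be empty).
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        message = parts[2]
        match = NO_FINISH.match(message)
        if match:
            no_finish[match.group(1)] += 1
            continue
        match = RESTART.match(message)
        if match:
            restarted.add(match.group(1))
    return no_finish, restarted


no_finish, restarted = summarize(LOG)
for task, count in sorted(no_finish.items()):
    status = "restarted" if task in restarted else "not restarted"
    print(f"{task}: {count} exit(s) without a finished file, {status}")
```

A count that keeps climbing for the same task would be the "happens repeatedly" case the client message warns about.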
33) Message boards : Number crunching : Merge Computers (Message 16896)
Posted 23 May 2006 by Mike Gelvin
Post:
Bump
34) Message boards : Number crunching : Default Run Time (Message 16870)
Posted 22 May 2006 by Mike Gelvin
Post:
Without wanting to drag this thread off-topic, regarding the cache-size/FSB/CPU core etc... effects on rosetta:

is it possible to run the same job, with the same seed on two computers to get a comparison? If so, how is this done?

I just had a long exchange with Rhiju on this subject a few days ago. The answer is Yes it is possible to do what you describe, and they run this test on a regular schedule.

In anticipation of your next question: the first model is duplicated in both runs if the random number is the same at the start. However, in subsequent models the work unit processing will diverge to some degree, because the small influences of hydrogen elements in the model are not considered, and their effects accumulate as processing proceeds.

To bring it back on topic a bit, clearly longer run times might have an effect in comparisons of this type, but the goal here is to produce as many possible and plausible structures as possible, so this divergence is not a bad thing, and longer run times help. I am informed that these differences are growing smaller as the software improves. Obviously the best situation would be if all work units reached the same conclusion, and that conclusion was correct for the particular protein. It is heading in that direction but there is a way to go yet.


I'm still trying to understand what this means. It appears to mean that subsequent models are not "whole new attempts". If a model gets generated and has a terrible energy (not sure what that means), then why continue looking in that neighborhood? Wouldn't it be better either to look in a whole new place each time (using a previous analogy), or, if the first attempt is not as good as some "X", to stop trying near there? This "X" could be fed in with the workunit and be feedback from other models that have started to "zero in" on the answer. Not sure I'm making sense here, just some questions.
35) Message boards : Number crunching : Default Run Time (Message 16652)
Posted 19 May 2006 by Mike Gelvin
Post:
Web page for Default Run Time entry states:

Target CPU run time
(not selected defaults to 4 hours)

I have never chosen a run time, allowing the project to choose what might be best for me, so I assume the default is in play here (4 hours). However, it actually appears to be set at 3 hours across all my machines. Is this an error?
36) Message boards : Rosetta@home Science : R@H attempting all CASP7 targets? (Message 16318)
Posted 15 May 2006 by Mike Gelvin
Post:
Once a target is out... when is the deadline for getting the predictions? Some people have large caches, hence a specific work unit might not bubble to the top for over a week.
37) Message boards : Number crunching : Merge Computers (Message 16082)
Posted 12 May 2006 by Mike Gelvin
Post:
I know the merge computer function was broken. I also know the fix for it has been out for some time. When will we be able to merge again? This will be especially important as account managers come on line.
38) Message boards : Number crunching : LONG..... work unit (Message 14714)
Posted 27 Apr 2006 by Mike Gelvin
Post:
1 day 4 hrs and still going... currently at 8.31% complete. Is there a timer on these units so that they self-abort if they take too long? App is Rosetta 5.01
39) Message boards : Number crunching : LONG..... work unit (Message 14678)
Posted 26 Apr 2006 by Mike Gelvin
Post:
I have a condition on two of my computers that I have not seen before.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=13402092

It's an HBLR_1.0 unit.

It does not appear to be stuck, just VERY SLOW. It gains about 0.01% per 200 seconds. It has been running for 16 hours 50 minutes and is now at 5.15%. It looks like others have had "trouble" with this unit before (I am the third to receive it). Is it of value to allow it to continue? At this rate it should be done in 13 days, which is beyond the due date. Its app is Rosetta 5.01, running on a Win XP machine with SP2.

On the other computer, there is also an HBLR_1.0 work unit that appears to have been completed successfully by another person but was re-issued to me anyway. It's a much faster computer, but the unit has been running for 5 hours and is 2.91% complete.
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=13420677
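
The run-time estimates above can be reproduced with a rough sketch like the one below. It assumes progress stays linear, which may well not hold for these units, and `estimate_total` is just an illustrative helper name. The 13-day figure corresponds to the average rate so far (5.15% in 16 h 50 min); the momentary rate of 0.01% per 200 seconds would imply a longer total.

```python
from datetime import timedelta


def estimate_total(elapsed, percent_done):
    """Extrapolate the total run time, assuming progress stays linear."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed * (100.0 / percent_done)


def days(duration):
    """Convert a timedelta to fractional days."""
    return duration.total_seconds() / 86400


# Figures taken from the post above.
slow_average = estimate_total(timedelta(hours=16, minutes=50), 5.15)
slow_momentary = estimate_total(timedelta(seconds=200), 0.01)
fast_average = estimate_total(timedelta(hours=5), 2.91)

print(f"slow machine, average rate:   {days(slow_average):.1f} days total")
print(f"slow machine, momentary rate: {days(slow_momentary):.1f} days total")
print(f"fast machine, average rate:   {days(fast_average):.1f} days total")
```

Either way, both units look likely to run for days rather than hours, so comparing the estimate against the deadline is the useful check.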

40) Message boards : Number crunching : Report stuck & aborted WU here please - II (Message 14109)
Posted 19 Apr 2006 by Mike Gelvin
Post:
Aborted 1.04%

http://boinc.bakerlab.org/rosetta/result.php?resultid=17045924





©2019 University of Washington
http://www.bakerlab.org