Posts by SekeRob

1) Message boards : Number crunching : minirosetta 2.03 (Message 64566)
Posted 21 Dec 2009 by SekeRob
Post:
The present test standing is:

306433095 279395159 21 Dec 2009 13:45:43 UTC 21 Dec 2009 14:53:27 UTC Over Success Done 3,366.56 17.47 15.56
306421142 279383866 21 Dec 2009 12:39:42 UTC 21 Dec 2009 13:49:54 UTC Over Success Done 3,503.61 18.18 15.74
306417911 279381434 21 Dec 2009 12:22:40 UTC 21 Dec 2009 13:33:02 UTC Over Success Done 3,542.00 18.38 16.13
306408510 279371956 21 Dec 2009 11:33:58 UTC 21 Dec 2009 12:43:56 UTC Over Success Done 3,594.54 18.65 16.79
306393079 279357285 21 Dec 2009 10:11:58 UTC 21 Dec 2009 12:26:53 UTC Over Success Done 7,661.61 39.76 40.86
306381783 279347432 21 Dec 2009 9:11:43 UTC 21 Dec 2009 11:38:12 UTC Over Success Done 3,444.83 17.88 15.05
306365927 279333133 21 Dec 2009 7:44:11 UTC 21 Dec 2009 9:07:33 UTC Over Success Done 3,497.34 18.15 16.63

Looks like it's pretty well figured out that an hour is an hour most of the time. I do appreciate that there is a non-deterministic element and, if that long one is just incidental, it's a good project to act as filler when on a shutdown schedule. From 4 o'clock it's power-off.
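A quick check of "an hour is an hour" against the results listed above. This is only a sketch: it assumes the third-from-last column of the listing is CPU time in seconds (the listing itself carries no column headers), and the 25 percent tolerance is an arbitrary choice for illustration.

```python
# CPU times copied from the result listing above.
# Assumption: this column is CPU time in seconds.
cpu_seconds = [3366.56, 3503.61, 3542.00, 3594.54, 7661.61, 3444.83, 3497.34]

TARGET_HOURS = 1.0  # the 1 hour run time preference


def to_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


def outliers(samples, target_hours, tolerance=0.25):
    """Return the samples that miss the target by more than the given fraction."""
    limit = tolerance * target_hours
    return [s for s in samples if abs(to_hours(s) - target_hours) > limit]


long_tasks = outliers(cpu_seconds, TARGET_HOURS)
for s in long_tasks:
    print(f"{s:.2f} s = {to_hours(s):.2f} h")
```

Under that assumption only the 7,661.61 second task falls outside the band; the other six land within a few minutes of the hour.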
2) Message boards : Number crunching : minirosetta 2.03 (Message 64562)
Posted 21 Dec 2009 by SekeRob
Post:
Just to try it out, I set a 1 hour run time preference in the home profile [because Rosetta is on a small share], saved, and received 2 tasks with about a 1 hour run time, but now this one is running:

21/12/2009 12:23:40 rosetta@home [checkpoint_debug] result broker_idealclose_hb_t293__IGNORE_THE_REST_16362_82629_0 checkpointed

1:25 CPU time, 1:26 elapsed, and at 28 percent. It's checkpointing regularly, so I don't consider this a bad task. Why this long one in between? Mini 2.03 release.

PS: First 2 validated, this long one now Pending Validation at 2.11 hours... twice as long as specified.
3) Message boards : Number crunching : Minirosetta 1.86 (Message 62559)
Posted 28 Jul 2009 by SekeRob
Post:
None of my machines can get to the scheduler either. Although I can connect to the scheduling server it is not issuing my computers any new tasks. "Scheduler Request Succeeded: got 0 new tasks", has been that way since 1330 EST yesterday and I received three tasks right before that and they completed almost instantly.

With a few more log flags on, one gets:

28-7-2009 16:27:03|rosetta@home|Sending scheduler request: To fetch work. Requesting 1030 seconds of work, reporting 0 completed tasks
28-7-2009 16:27:08|rosetta@home|Scheduler request succeeded: got 0 new tasks
28-7-2009 16:27:08|rosetta@home|Message from server: Server error: can't attach shared memory
4) Message boards : Number crunching : Checkpointing under Rosetta Mini (Message 60819)
Posted 25 Apr 2009 by SekeRob
Post:
All parlance aside, something changed in the client from 6.6.20, in a ludicrous way, and whilst it was reported at the Berkeley developers' forums, no one is home even after a repeat bump. Anyway, if you now set the client to permit writes every 5 minutes and you have a quad core, it only allows 1 checkpoint per 20 minutes. So, here's how it now looks for minirosetta 1.54 on 6.6.24:

2009/04/25 16:00:16 rosetta@home Starting task frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 using minirosetta version 154
2009/04/25 16:21:27 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
2009/04/25 16:45:18 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
2009/04/25 17:09:59 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
2009/04/25 17:34:47 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed

mike chosen to only write about every 24 minutes. The slot file timestamps of the fastrelax chkpnt sets do not change in between ;>)
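The "about every 24 minutes" figure can be checked straight from the log lines above. A minimal sketch; the function names are made up for illustration, and it assumes the `YYYY/MM/DD HH:MM:SS` timestamp layout shown in this particular log.

```python
from datetime import datetime

# The four checkpoint_debug lines quoted above, verbatim.
LOG = """\
2009/04/25 16:21:27 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
2009/04/25 16:45:18 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
2009/04/25 17:09:59 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
2009/04/25 17:34:47 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
"""


def checkpoint_times(log_text):
    """Extract the timestamp of every checkpoint_debug 'checkpointed' line."""
    times = []
    for line in log_text.splitlines():
        if "[checkpoint_debug]" in line and line.rstrip().endswith("checkpointed"):
            # The first 19 characters hold the date and time.
            times.append(datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds elapsed between each pair of consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


gaps = gaps_in_seconds(checkpoint_times(LOG))
mean_minutes = sum(gaps) / len(gaps) / 60
print([g / 60 for g in gaps], mean_minutes)
```

The three gaps come out between 23 and 25 minutes each, which fits the observation better than the 20 minutes that a plain 4 x 5 minutes would give.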
5) Message boards : Number crunching : Checkpointing under Rosetta Mini (Message 60663)
Posted 16 Apr 2009 by SekeRob
Post:
I apologize Forza. I wasn't clear on what your observations were. Perhaps it has changed with the 6.6 version as well.

Which WCG project checkpoints every 7 seconds?

ALL projects of WCG behave like that, and most projects everywhere behave like that when making the write call; there has been no change between client 5, when the <options> for checkpoint_debug was added, and client 6. It so happened that this anonymous beta had a very short checkpointing interval, most convenient for testing. I commonly document this in WCG's FAQ section for folk who like to minimize suspension/progress loss.

Simply, if someone sets 5 minutes or 10 minutes or the max of 999 seconds, the design is to have a log entry when a (significant/recovery point) disk save is made. From the 1,2,3 I gather this is not how miniRosetta is coded. The checkpoint is logged and written when it actually occurs for a small piece, and not at the lowest interval permitted as set in the client. Outside of that, there are writes that are not logged. Personally, I think no one is interested in the log/write of the 2 small files, but rather in when the large dump is made, for those files seem critical to establishing the recovery point. Anyway, now I know.
6) Message boards : Number crunching : Checkpointing under Rosetta Mini (Message 60654)
Posted 15 Apr 2009 by SekeRob
Post:
Paul, see the third post to this thread. I've found the checkpoint debug messages report the attempts to checkpoint, not the physical flush to disk.

That is diametrically opposed to my findings.

WCG Beta of new project, setting Write to Disk at Most to 0 seconds. Logs one every 7 seconds. Then, changed the WTD to 60 seconds. Restarted client and get log entries shortly after 60 seconds have passed. Changed to 5 minutes. Get checkpoint log entries shortly after 5 minutes pass. Still client 6.6.23. The slot file timestamps consistently followed the log entry times.
7) Message boards : Number crunching : Checkpointing under Rosetta Mini (Message 60642)
Posted 15 Apr 2009 by SekeRob
Post:
As I said, the messages report when the application called the API to checkpoint. They do not indicate when data was written to disk. Some checkpoints store multiple files. So, between having multiple files, and having several checkpoints buffered into a single write, that may explain why you see "A whole swat of files gets written..." at once.

What have you set for your write at most setting? Are you expecting the C drive of a Windows PC to go idle and spin down to save power? In my experience, that will never happen, whether BOINC is running or not.

What BOINC version are you running?
What operating system are you using?

For the answers see the opening post you split off. Anyone who sets his disk to spin down shortens its life substantially, so no, mine always go 100%.

Multiple observations, and comparing to a test job for QMC, confirm that this application does not behave according to expectation, and then some, for it also writes files outside the time points when checkpoints are logged, whereas any other project writes files at the checkpoints.
8) Message boards : Number crunching : Rosetta 5.98 using full CPU resources (Message 60633)
Posted 14 Apr 2009 by SekeRob
Post:
Apologies if you already know this, but in case not, there's an option to run only when the screensaver is showing


Where is this option to run only when the screensaver is showing? I can't find it anywhere...


Best



In BOINC manager Go to the Advanced Menu and then click the Preferences Option. On the Processor Usage Tab, uncheck while computer is in use and then set the "only after computer has been idle for" box to whatever your screensaver time out is or some time like 5 or 10 minutes probably would work as well.

Brian

What was important until now is that the core client has to start a minute before the screensaver kicks in, AND that the screensaver is not required. Many use the blank screensaver with password. They eat cycles after all, or cause screen burn on old CRTs.
9) Message boards : Number crunching : Checkpointing under Rosetta Mini (Message 60629)
Posted 14 Apr 2009 by SekeRob
Post:
Anyway, the point is, is your application asking the core client for permission?


I believe it is. I am not a Rosetta coder to know for certain. This was posted during the testing on Ralph. I believe that the end of a model might be the only exception. I believe that is not really a checkpoint per se and that a write will always occur when the task reaches that point.

What are you seeing for file revision times in your slots directory?

This delay of writing for your app, and still writing them all is certainly clarified there given the seeming need to have them all to be able to restore from a system restart for instance.

No, 2 slot files, default.out and rng.state.gz, get modified at the exact same time stamps as recorded in the log, i.e. every few minutes, completely ignoring the "at most ..." setting; this from carefully watching the slot content. More interesting (disturbing), there are writes even when not logged, more frequent than the checkpoint log entries and offset by minutes. A whole swat of files gets written with a new chk_chk1_1... through 15. I'd rather you kept that in memory too till checkpoint write time. Now I'm even more uncomfortable with the whirring than I was before just looking at the log frequency.

Edit: Some typos and the checkpoint log for a longer time frame:

14/04/2009 19.11.59 World Community Grid [checkpoint_debug] result E000490_575B_002a0s009_1 checkpointed
14/04/2009 19.12.08 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.13.09 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.15.31 World Community Grid [checkpoint_debug] result R00270_b00f5fa921c9e1c31699fcd04438d3aa_01_000_6 checkpointed
14/04/2009 19.16.49 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.17.10 World Community Grid [checkpoint_debug] result HFCC_t1_00279513_TrkB_0002_0 checkpointed
14/04/2009 19.20.44 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.26.52 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.31.02 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.34.21 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.35.51 World Community Grid [checkpoint_debug] result R00270_b00f5fa921c9e1c31699fcd04438d3aa_01_000_6 checkpointed
14/04/2009 19.37.40 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.38.57 World Community Grid [checkpoint_debug] result HFCC_t1_00279513_TrkB_0002_0 checkpointed
14/04/2009 19.41.24 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.45.03 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
10) Message boards : Number crunching : Checkpointing under Rosetta Mini (Message 60627)
Posted 14 Apr 2009 by SekeRob
Post:
I wanted to discuss this topic in more detail, so opened this new thread.

Could you describe how you are confirming when a checkpoint is taken? I've found that the cc_config checkpoint_debug messages are misleading. They show every time the application has requested a checkpoint, but the checkpoints are actually buffered until the "write to disk at most" wait time is satisfied.

What I would suggest is review of the last change date/times on the files in the slot directory the task is running in.

Well, that's the first I've heard of that one. E.g. RICE & DDDT have checkpoints every 1 to 2 minutes, but only write and log them, as said, after the set time.

AFAIK, the science app can have code to check whether a write is permitted. QMC is another one that does not have the call in theirs, by their own confirmation the last time I had an exchange with them, along with a promise to implement it, which is now a good many months ago.

As for buffered and waiting, that is not the description I found. The checkpoint is skipped and the next one up will show, so if the checkpoint were to occur every 4 minutes and the "at most" is 5 minutes, the next checkpoint written to disk is the one occurring at the 8th minute.
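The skip rule described above can be written out as a small simulation. This is a sketch of the behaviour as stated in this post, not of the BOINC client's actual code; the function and parameter names are hypothetical, and it assumes the clock for the "at most" interval starts at minute 0.

```python
def written_checkpoints(app_interval_min, min_write_gap_min, run_minutes):
    """Minutes at which a checkpoint actually reaches the disk.

    The application reaches a checkpoint opportunity every
    `app_interval_min` minutes. An opportunity is written only if at least
    `min_write_gap_min` minutes have passed since the last write; otherwise
    it is skipped outright (not buffered) and the next opportunity is tried.
    """
    written = []
    last_write = 0  # assumption: the interval is counted from task start
    t = app_interval_min
    while t <= run_minutes:
        if t - last_write >= min_write_gap_min:
            written.append(t)
            last_write = t
        t += app_interval_min
    return written


# Checkpoint opportunity every 4 minutes, "at most" set to 5 minutes.
print(written_checkpoints(4, 5, 30))
```

With those numbers the opportunity at minute 4 is skipped and the first write lands at minute 8, so the effective interval becomes 8 minutes rather than 5.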

Mind you, I may have discovered an unrelated bug for sciences that do follow the rule of asking the core client (6.6.23). I just reported at the BOINCdev forum a project that continuously preempts every minute, and some projects treating the 5 minutes "at most" on a quad as 4x5 minutes before writing again, meaning after 20 minutes.

Anyway, the point is, is your application asking the core client for permission?
11) Message boards : Number crunching : Checkpointing under Rosetta Mini (Message 60621)
Posted 14 Apr 2009 by SekeRob
Post:
Noting the denser checkpointing, the application is not set to check with the core client whether checkpoint writing is at all permitted per the preferences. Set it to 5 minutes and the mini just happily ignores that and writes one every few minutes. Other projects such as WCG respect the 5 minute restriction and write at the first checkpoint to occur after 5 minutes. Client 6.6.23.
12) Message boards : Number crunching : High Priority (Message 46023)
Posted 11 Sep 2007 by SekeRob
Post:
Running just now two WUs with comment ''High Priority''. What is special about these?

What's the deadline like compared to others you ran?
13) Message boards : Number crunching : Problems with Rosetta version 5.40 (Message 31178)
Posted 15 Nov 2006 by SekeRob
Post:
Did not quite get the assurance that the 5.36 WU in the queue together with the 5.40 in the queue would take the proper version, thus canceled all and requested a fresh batch.

thanks

PS. At WCG the WU with version X marker, would work with version X software i.e. multiple versions could be in the queue.
14) Message boards : Number crunching : Still ignoring 64 bit users? (Message 30856)
Posted 9 Nov 2006 by SekeRob
Post:
The potential to improve the performance is there, especially by getting rid of arcane x87 code. But one will only know after trying, which hasn't been done yet. Instead, Rosetta went on to support cottage OSX/x86 systems...

That's highly presumptuous! Who's privy to what's been tried in the labs, and to the reasoning why, after so many years of x64 in the market, the user base has hardly grown, for the very reason that most users simply don't need it? The projects are aimed at the vast resource readily available and not at the bleeding edge of calculation power that requires a disproportionate amount of effort to implement. If easy access can be had to 10000 x32 machines versus 100 x64, where do you think the bets are placed? Well, tell you what, I'd go for the 10000 x32.
15) Message boards : Rosetta@home Science : Machines sitting Idle (Message 30800)
Posted 8 Nov 2006 by SekeRob
Post:
Currently, the project has two types of WUs, some require 256MB memory, some require 512MB. Your machines are hidden, so I can't tell. Do you have 256MB of memory?

I for one am unclear exactly how the system handles these memory limitations. Are you getting any messages in BOINC Manager as you attempt to download work? Do you know if a WU was actually downloaded before reporting the message?

I believe BOINC is working on some enhancements to the scheduling to help better accommodate such situations and deliver work that's appropriate to the client.


Can't remember where, but someone proposed that if you set up a second project of choice, e.g. Tanpaku or Simap, and give it e.g. a 0.01 factor of time share, Rosetta will be treated as primary. If there is no work that fits, the machine starts pulling work from the alternate projects until Rosetta has something on offer again. WCG has a few life science projects coming with very small needs, so have a sniff there too. In fact WCG restarted HPF2, which runs on the Rosetta principle: 1 WU, many seeds, no quorum. Dr. Bonneau, well known to Dr. Baker, is the architect of that work. It's expected to take a long time for that project to complete. HPF1 of the same took 27,000 CPU years to complete.
16) Message boards : Number crunching : Is XP Hibernation Function a Long-Setpoint Solution? (Message 30564)
Posted 3 Nov 2006 by SekeRob
Post:
Works well for me with the WCG UD proprietary agent and BOINC - 5 projects (UD/WCG/Rosetta/TANPAKU/SIMAP, the latter on periodic projects). I use it regularly as power is not stable here. The UPS allows an orderly close. Beforehand, put the agent in sleep mode, hold the shift key before mousing over the start button in WinXP (if the hibernation option does not show up in the close options), hit hibernate, and it takes a minute to close. Standby still requires some juice to maintain memory state, whereas hibernation does not. Very rarely does it jump back to the previous checkpoint save (due to the mentioned clock sync issues).
17) Message boards : Number crunching : Limiting CPU usage (Message 30106)
Posted 27 Oct 2006 by SekeRob
Post:
At this time, I'm doing fine with ThreadMaster, because my PC is second hand and the cooling fan doesn't work well. I suppose the bearing has worn out.

My question is:
Do I also have to restart my computer when I'm using BES, like I must do with ThreadMaster, to make the changes of CPU usage take effect, or can the CPU usage of BES be adjusted without rebooting?


From my experience, if you make changes to the parms of TM, they will take effect the next time you restart BOINC or any app that uses TM. Also, when BOINC moves on to the next WU it will read the parm for the particular project. It will not read them when switching projects using pre-empting with Keep In Memory on, say, an hourly project switch; only when starting a new WU. That's how it works on my WXP SP2.

cheers
18) Message boards : Number crunching : rosetta/boinc does not suspend (Message 29502)
Posted 17 Oct 2006 by SekeRob
Post:
First question always drilling home is the OS, BOINC Version, Antivirus, Firewall.
Second question: Did u do a search on the forum for the keyword 'screensaver'.....

Killing a process mid-flight could cause damage to the WU depending on what it happened to do at the moment....if e.g. during checkpoint writing, it could cause the loss of the recovery point and make it start all over.
19) Message boards : Number crunching : Limiting CPU usage (Message 29249)
Posted 12 Oct 2006 by SekeRob
Post:
The quickest way to nirvana, and to prevent any typing errors: put the complete text in the quote box below, including 'Windows....', into Notepad and save it to a file called rosetta.reg on your desktop. Ensure you select 'All' in the dropdown box below the file name, so that the saved file has a .reg extension. Then right-click on the file saved on the desktop and select 'install'. It will probably ask for a few okays. After that you're ready to restart BOINC. ThreadMaster will automatically read the new settings any time a new Work Unit is started.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ThreadMaster\Parameters\Applications]
"SpybotSD.exe"="5"
"Ad-Aware.exe"="5"
"dfrgntfs.exe"="10"
"dfrgfat.exe"="10"
"rosetta_5.25_windows_intelx86.exe"="80"
"rosetta_5.32_windows_intelx86.exe"="80"
20) Message boards : Rosetta@home Science : Prions...? (Message 29210)
Posted 12 Oct 2006 by SekeRob
Post:

That makes me wonder something else then. The number-crunching that we're doing here, should there be some concern that one of these proteins, if reproduced in some of the shapes we're looking at, could cause some major problems (ie. a new disease)?


Hey, that's an interesting point! Very interesting.. It's a small possibility I suppose.

These diseases are associated w/ the insolubility of proteins in vivo, resulting in the plaques that fall out of solution and form on the various tissues. I would bet that if a 'rogue molecule' were designed, it would be really difficult to work with in the lab, and folks would give up on it long before it was put into any testing pipeline...

Not only that, I'd like to think that something like this would show up in the extensive testing that would be done to satisfy the FDA before any human saw anything developed here - and there is a LOT of testing that is required.

Hmm... an interesting idea though... I smell a M. Crichton book in the making... -KEL


Like any new medicine / application it will have to go through clinical trials to determine bad side-effects....I'm sure in the case of the protein engineering, the side-effects will be considered....the cure being worse than the disease is not unthinkable....the last few recent cases of medicines retracted from the market come to mind.

