1)
Message boards :
Number crunching :
minirosetta 2.03
(Message 64566)
Posted 21 Dec 2009 by SekeRob Post: The present test standing is:
306433095 279395159 21 Dec 2009 13:45:43 UTC 21 Dec 2009 14:53:27 UTC Over Success Done 3,366.56 17.47 15.56
306421142 279383866 21 Dec 2009 12:39:42 UTC 21 Dec 2009 13:49:54 UTC Over Success Done 3,503.61 18.18 15.74
306417911 279381434 21 Dec 2009 12:22:40 UTC 21 Dec 2009 13:33:02 UTC Over Success Done 3,542.00 18.38 16.13
306408510 279371956 21 Dec 2009 11:33:58 UTC 21 Dec 2009 12:43:56 UTC Over Success Done 3,594.54 18.65 16.79
306393079 279357285 21 Dec 2009 10:11:58 UTC 21 Dec 2009 12:26:53 UTC Over Success Done 7,661.61 39.76 40.86
306381783 279347432 21 Dec 2009 9:11:43 UTC 21 Dec 2009 11:38:12 UTC Over Success Done 3,444.83 17.88 15.05
306365927 279333133 21 Dec 2009 7:44:11 UTC 21 Dec 2009 9:07:33 UTC Over Success Done 3,497.34 18.15 16.63
Looks like it's pretty well figured out that an hour is an hour most of the time. I do appreciate that there is a non-deterministic element and, if just incidental, it's a good project to act as filler when on a shutdown schedule. From 4 o'clock it's power-off. |
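As a rough check of the "an hour is an hour" remark, the run times in the results above can be compared with a one-hour target. This is only a sketch under assumptions: it takes the first numeric column after "Done" to be CPU seconds, and the 10% tolerance and the names TARGET_SECONDS, TOLERANCE and on_target are made up for illustration, not anything the project defines.

```python
# CPU times in seconds, copied from the seven results listed above
# (assumed to be the first numeric column after "Done").
CPU_SECONDS = [3366.56, 3503.61, 3542.00, 3594.54, 7661.61, 3444.83, 3497.34]

TARGET_SECONDS = 3600  # assumed one-hour run time target
TOLERANCE = 0.10       # assumed: within 10% of target counts as "an hour"


def on_target(seconds, target=TARGET_SECONDS, tolerance=TOLERANCE):
    """Return True when a task's CPU time is within the tolerance of the target."""
    return abs(seconds - target) <= tolerance * target


for secs in CPU_SECONDS:
    label = "on target" if on_target(secs) else "outlier"
    print(f"{secs / 3600:.2f} h  {label}")
```

With these assumed thresholds, only the 7,661.61 second task falls outside the band.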
2)
Message boards :
Number crunching :
minirosetta 2.03
(Message 64562)
Posted 21 Dec 2009 by SekeRob Post: Just to try it out, I set a 1 hour run time pref in the home profile [because Rosetta is on a small share], saved, and received 2 tasks with about a 1 hour run time, but now this one is running: 21/12/2009 12:23:40 rosetta@home [checkpoint_debug] result broker_idealclose_hb_t293__IGNORE_THE_REST_16362_82629_0 checkpointed, at 1:25 CPU time, 1:26 elapsed, and on 28 percent. It's checkpointing regularly, so don't consider this a bad task. Why this long one in between? Mini 2.03 release. PS: The first 2 validated; this long one is now Pending Validation at 2.11 hours... twice as long as specified. |
3)
Message boards :
Number crunching :
Minirosetta 1.86
(Message 62559)
Posted 28 Jul 2009 by SekeRob Post: None of my machines can get to the scheduler either. Although I can connect to the scheduling server it is not issuing my computers any new tasks. "Scheduler Request Succeeded: got 0 new tasks", has been that way since 1330 EST yesterday and I received three tasks right before that and they completed almost instantly.
With a few more log flags on, one gets:
28-7-2009 16:27:03|rosetta@home|Sending scheduler request: To fetch work. Requesting 1030 seconds of work, reporting 0 completed tasks
28-7-2009 16:27:08|rosetta@home|Scheduler request succeeded: got 0 new tasks
28-7-2009 16:27:08|rosetta@home|Message from server: Server error: can't attach shared memory |
4)
Message boards :
Number crunching :
Checkpointing under Rosetta Mini
(Message 60819)
Posted 25 Apr 2009 by SekeRob Post: All parlance aside, something changed in the client from 6.6.20, in a ludicrous way, and whilst it was reported at the Berkeley developers forums, no one is home even after a repeat bump. Anyway, if you now set the client to permit writes at most every 5 minutes and you have a quad core, it only allows 1 checkpoint per 20 minutes. So, here's how it now looks for minirosetta 1.54 on 6.6.24:
2009/04/25 16:00:16 rosetta@home Starting task frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 using minirosetta version 154
2009/04/25 16:21:27 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
2009/04/25 16:45:18 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
2009/04/25 17:09:59 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
2009/04/25 17:34:47 rosetta@home [checkpoint_debug] result frb_0_8_mike_chosen_cst_hb_t313__IGNORE_THE_REST_1BG2A_4_11064_44_0 checkpointed
mike chosen to only write about every 24 minutes. The slot file timestamps of the fastrelax chkpnt sets do not change in between ;>) |
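For anyone wanting to check the "about every 24 minutes" figure against their own log, the gaps between consecutive [checkpoint_debug] entries can be computed with a few lines of Python. This is a minimal sketch: the timestamps are the four from the log above, and checkpoint_gaps is a helper name made up for the example, not part of BOINC.

```python
from datetime import datetime

# Checkpoint times copied from the log above (minirosetta 1.54, client 6.6.24).
CHECKPOINTS = [
    "2009/04/25 16:21:27",
    "2009/04/25 16:45:18",
    "2009/04/25 17:09:59",
    "2009/04/25 17:34:47",
]


def checkpoint_gaps(stamps, fmt="%Y/%m/%d %H:%M:%S"):
    """Return the minutes between consecutive checkpoint log entries."""
    times = [datetime.strptime(s, fmt) for s in stamps]
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


gaps = checkpoint_gaps(CHECKPOINTS)
for gap in gaps:
    print(f"{gap:.1f} min")
```

The fmt argument would need adjusting for logs that use another date layout, such as dd/mm/yyyy with dots in the time.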
5)
Message boards :
Number crunching :
Checkpointing under Rosetta Mini
(Message 60663)
Posted 16 Apr 2009 by SekeRob Post: I apologize Forza. I wasn't clear on what your observations were. Perhaps it has changed with the 6.6 version as well. ALL projects of WCG behave like that, and most projects everywhere behave like that when making the write call, with no change between client 5, when the <options> for checkpoint_debug was added, and 6. It so happened that this anonymous beta had a very short checkpointing interval, most convenient for testing. I commonly document this in WCG's FAQ section for folk who like to minimize suspension/progress loss. Simply put, if someone sets 5 minutes or 10 minutes or the max of 999 seconds, the design is to have a log entry when a (significant/recovery point) disk save is made. From the 1,2,3 I gather this is not how miniRosetta is coded. The checkpoint is logged and written when actually occurring for a small piece, and not at the lowest interval permitted as set in the client. On top of that, there are writes that are not logged. Personally, I think no one is interested in the log/write of the 2 small files, but rather in when the large dump is made, for they seem critical to establishing the recovery point. Anyway, now I know. |
6)
Message boards :
Number crunching :
Checkpointing under Rosetta Mini
(Message 60654)
Posted 15 Apr 2009 by SekeRob Post: Paul, see the third post to this thread. I've found the checkpoint debug messages report the attempts to checkpoint, not the physical flush to disk. That is diametrically opposed to my findings. On a WCG beta of a new project, I set Write to Disk at Most to 0 seconds: it logs one every 7 seconds. Then I changed the WTD to 60 seconds, restarted the client, and got log entries shortly after 60 seconds had passed. Changed to 5 minutes: I get checkpoint log entries shortly after 5 minutes pass. Still client 6.6.23. The slot file timestamps consistently followed the log entry times. |
7)
Message boards :
Number crunching :
Checkpointing under Rosetta Mini
(Message 60642)
Posted 15 Apr 2009 by SekeRob Post: As I said, the messages report when the application called the API to checkpoint. They do not indicate when data was written to disk. Some checkpoints store multiple files. So, between having multiple files, and having several checkpoints buffered into a single write, that may explain why you see "A whole swat of files gets written..." at once. For the answers see the opening post you split off. Anyone who sets his disk to spin down shortens its life substantially, so no, mine always go 100%. Multiple observations and a comparison with a QMC test job confirm that this application does not behave according to expectation, and then some, for it also writes files outside the time points when checkpoints are logged, where any other project writes files at the checkpoints. |
8)
Message boards :
Number crunching :
Rosetta 5.98 using full CPU resources
(Message 60633)
Posted 14 Apr 2009 by SekeRob Post: Apologies if you already know this, but in case not, there's an option to run only when the screensaver is showing. Important, until now, was that the core client has to start a minute before the screensaver kicks in AND the screensaver is not required. Many use the blank screensaver with password. They eat cycles after all or cause screen-burn on old CRTs. |
9)
Message boards :
Number crunching :
Checkpointing under Rosetta Mini
(Message 60629)
Posted 14 Apr 2009 by SekeRob Post: Anyway, the point is, is your application asking the core client for permission? This delay of writing for your app, and still writing them all is certainly clarified there given the seeming need to have them all to be able to restore from a system restart for instance. No. Carefully watching the slot content, 2 slot files, default.out and rng.state.gz, get modified at the exact same time stamps as recorded in the log, i.e. every few minutes, completely ignoring the "at most ..." setting. More interesting (disturbing), there are writes even when not logged, more frequent than the checkpoint log entries and with minutes of offset. A whole swat of files gets written with a new chk_chk1_1... through 15. I'd rather you kept that in memory too till checkpoint write time. Now I'm even more uncomfortable with the whirring than I was before just looking at the log frequency. Edit: Some typos and the checkpoint log for a longer time frame:
14/04/2009 19.11.59 World Community Grid [checkpoint_debug] result E000490_575B_002a0s009_1 checkpointed
14/04/2009 19.12.08 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.13.09 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.15.31 World Community Grid [checkpoint_debug] result R00270_b00f5fa921c9e1c31699fcd04438d3aa_01_000_6 checkpointed
14/04/2009 19.16.49 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.17.10 World Community Grid [checkpoint_debug] result HFCC_t1_00279513_TrkB_0002_0 checkpointed
14/04/2009 19.20.44 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.26.52 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.31.02 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.34.21 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.35.51 World Community Grid [checkpoint_debug] result R00270_b00f5fa921c9e1c31699fcd04438d3aa_01_000_6 checkpointed
14/04/2009 19.37.40 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.38.57 World Community Grid [checkpoint_debug] result HFCC_t1_00279513_TrkB_0002_0 checkpointed
14/04/2009 19.41.24 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed
14/04/2009 19.45.03 rosetta@home [checkpoint_debug] result lr5_E_no_rama_04_intra_rep_rlbd_1ubi_SAVE_ALL_OUT_10755_841_0 checkpointed |
10)
Message boards :
Number crunching :
Checkpointing under Rosetta Mini
(Message 60627)
Posted 14 Apr 2009 by SekeRob Post: I wanted to discuss this topic in more detail, so opened this new thread. Well, that's the first I've heard of that one. E.g. RICE & DDDT have checkpoints every 1 to 2 minutes, but only write and log them, as said, after the set time. Afaik, the science app can have code to check if it is permitted. QMC is another one that does not have the call in theirs, by their own confirmation when I last had an exchange with them, along with a promise to implement it, which is now a good many months ago. As for buffered and waiting, that is not the description I found. The checkpoint is skipped and the next one up will show, so if the checkpoint were to occur every 4 minutes and the "at most" is 5 minutes, the next checkpoint written to disk is the one occurring at the 8th minute. Mind you, I may have discovered an unrelated bug for sciences that do follow the rule of asking the core client (6.6.23). I just reported at the BOINCdev forum a project being preempted continuously every minute, and some projects taking the 5 minutes "at most" on a quad as 4x5 minutes before writing again, meaning after 20 minutes. Anyway, the point is, is your application asking the core client for permission? |
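The skip rule described above (a checkpoint falling before the "at most" interval has passed is dropped, and the next one that qualifies gets written) can be sketched as a small simulation. This is only an illustration of the behaviour as described in the post, not the actual core client code; written_checkpoints and its parameter names are hypothetical.

```python
def written_checkpoints(candidate_every, at_most, run_minutes):
    """Simulate which checkpoints reach the disk.

    candidate_every: minutes between the points where the app could checkpoint
    at_most:         the "write to disk at most every N minutes" preference
    run_minutes:     how long the task runs
    Returns the minute marks at which a checkpoint is actually written.
    """
    written = []
    last_write = 0  # the task starts at minute 0
    for t in range(candidate_every, run_minutes + 1, candidate_every):
        # A candidate is skipped, not buffered, until the interval has elapsed.
        if t - last_write >= at_most:
            written.append(t)
            last_write = t
    return written


print(written_checkpoints(4, 5, 24))
```

With a candidate every 4 minutes and "at most" set to 5, the writes land at minutes 8, 16 and 24, matching the 8th-minute example. Passing 20 as at_most, to mimic the reported 4x5 behaviour on a quad, gives writes at minutes 20 and 40 over a 48 minute run.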
11)
Message boards :
Number crunching :
Checkpointing under Rosetta Mini
(Message 60621)
Posted 14 Apr 2009 by SekeRob Post: Noting the denser checkpointing, the application is not set to check with the core client if checkpoint writing is at all permitted per the preferences. Set it to 5 minutes and the mini just happily ignores it and writes one every few minutes. Other projects such as WCG respect the 5 minute restriction and write at the first checkpoint to happen after 5 minutes. Client 6.6.23. |
12)
Message boards :
Number crunching :
High Priority
(Message 46023)
Posted 11 Sep 2007 by SekeRob Post: Running just now two WUs with comment ''High Priority''. What is special about these? What's the deadline like compared to others you ran? |
13)
Message boards :
Number crunching :
Problems with Rosetta version 5.40
(Message 31178)
Posted 15 Nov 2006 by SekeRob Post: Did not quite get the assurance that the 5.36 WU in the queue together with the 5.40 in the queue would take the proper version, thus canceled all and requested a fresh batch. thanks PS. At WCG the WU with version X marker, would work with version X software i.e. multiple versions could be in the queue. |
14)
Message boards :
Number crunching :
Still ignoring 64 bit users?
(Message 30856)
Posted 9 Nov 2006 by SekeRob Post: The potential to improve the performance is there, especially by getting rid of arcane x87 code. But one will only know after trying, which hasn't been done yet. Instead, Rosetta went on to support cottage OSX/x86 systems... That's highly presumptuous! Who's privy to what's tried in the labs, and to the reasoning why, after so many years of x64 in the market, the user base lacks growth, for the very reason that most users simply don't need it? The projects are aimed at the vast resource readily available and not the bleeding edge of calculation power that requires a disproportionate amount of effort to implement. If easy access can be made to 10000 x32 machines versus 100 x64, where do u think the bets are placed? Well tell u what, I'd go for the 10000 x32. |
15)
Message boards :
Rosetta@home Science :
Machines sitting Idle
(Message 30800)
Posted 8 Nov 2006 by SekeRob Post: Currently, the project has two types of WUs, some require 256MB memory, some require 512MB. Your machines are hidden, so I can't tell. Do you have 256MB of memory? Can't remember where, but someone proposed that if u set up a second project of choice, e.g. Tanpaku or Simap, and give it e.g. a 0.01 factor of time share, Rosetta will be treated as primary. If there is no work that fits, the machine starts pulling work from alternate projects, until Rosetta has something on offer again. WCG has a few life science projects coming with very small needs, so have a sniff there too. In fact WCG restarted HPF2, which is running on the Rosetta principle: 1 WU, many seeds, no quorum. Drs. Bonneau, well known to Dr. Baker, is the architect of that work. It's expected to take a long time for that project to complete. HPF1 of same took 27,000 CPU years to complete. |
16)
Message boards :
Number crunching :
Is XP Hibernation Function a Long-Setpoint Solution?
(Message 30564)
Posted 3 Nov 2006 by SekeRob Post: Works well for me with the WCG UD proprietary agent and BOINC - 5 projects (UD/WCG/Rosetta/TANPAKU/SIMAP, the latter on periodic projects). I use it regularly as power is not stable here. The UPS allows an orderly close. Beforehand, put the agent in sleep mode, hold the shift key before mousing the start button in WinXP (if the hibernation option does not show up in the close options), hit hibernate and it takes a minute to close. Standby still requires some juice to maintain memory state, whereas hibernation does not. Very rarely does it jump back to the previous checkpoint save (due to the mentioned clock sync issues). |
17)
Message boards :
Number crunching :
Limiting CPU usage
(Message 30106)
Posted 27 Oct 2006 by SekeRob Post: At this time, I'm doing fine with ThreadMaster, because my PC is second hand and the cooling fan doesn't work well. I suppose the bearing has worn out. From my experience, if u make changes to the parms of TM, they will take effect the next time u restart BOINC or any app that uses TM. Also, when BOINC moves on to the next WU it will read the parm for the particular project. It will not read them when switching projects using the pre-empting Keep In Memory of, say, an hourly project switch, only when starting a new WU. That's how it works on my WXP SP2. cheers |
18)
Message boards :
Number crunching :
rosetta/boinc does not suspend
(Message 29502)
Posted 17 Oct 2006 by SekeRob Post: First question always drilling home is the OS, BOINC Version, Antivirus, Firewall. Second question: Did u do a search on the forum for the keyword 'screensaver'..... Killing a process mid-flight could cause damage to the WU depending on what it happened to do at the moment....if e.g. during checkpoint writing, it could cause the loss of the recovery point and make it start all over. |
19)
Message boards :
Number crunching :
Limiting CPU usage
(Message 29249)
Posted 12 Oct 2006 by SekeRob Post: The quickest way to nirvana, and to prevent any typing errors: put the complete text in the quote box below, including 'Windows....', into Notepad and save it to a file called rosetta.reg on your desktop. Ensure that u select All in the dropdown box below the file name and that the saved file has a .reg extension. Then right-click on the file saved on the desktop and select 'install'. Probably it will ask for a few okays. After that you're ready to restart BOINC. Threadmaster will automatically read the new settings anytime a new Work Unit is started. Windows Registry Editor Version 5.00 |
20)
Message boards :
Rosetta@home Science :
Prions...?
(Message 29210)
Posted 12 Oct 2006 by SekeRob Post:
Like any new medicine / application it will have to go through clinical trials to determine bad side-effects.... I'm sure in the case of the protein engineering, the side-effects will be considered.... the cure being worse than the disease is not unthinkable.... the last few recent cases of medicine retraction from the market come to mind. |
©2024 University of Washington
https://www.bakerlab.org