21)
Message boards :
Rosetta@home Science :
Possible cancer cure found in plant
(Message 65280)
Posted 11 Feb 2010 by Aegis Maelstrom Post: Highly interesting; we have just discussed this topic on my team's (BOINC@Poland) forum. :) "Maybe that is something for a future Boinc project, ways to speed up clinical trials!" I'm sorry Otto, I don't think I understand you. What do you mean by "basically zero progress", and what kind of progress would you like to see in the formal methodology of science? A double-blind test is a double-blind test. It requires patients, researchers and time. Sorry, welcome to real life. I'm discussing the method because I don't think you were talking about oncology itself, where the progress is clearly visible. Regarding BOINC and clinical trials: to be honest, I can't imagine how we could speed them up with BOINC. Certainly not by using our computing power. What we certainly can do, and are trying to do e.g. in Rosetta, is facilitate basic research and find new drug candidates. That is a lot already. :) And regarding "magic therapies"... one must remember that the world is full of hoaxes, swindles, cynical fraudsters, bigheads, greed and plainly crazy people. What is more, we humans tend to make mistakes. These are the sad reasons why we must protect ourselves and stick to the scientific method. This is why the process takes so long. Maybe it could be a bit shorter, and you can probably fight for an experimental therapy, but the clinical tests need to be done. Having said that, I'm sure the modern pharma/health care system must be changed, especially in the U.S. - but it needs to be based more on science. Not less. :) |
22)
Message boards :
Rosetta@home Science :
Design of protein-protein interfaces
(Message 64846)
Posted 7 Jan 2010 by Aegis Maelstrom Post: Great news, and please keep us updated! :) It should help to encourage us, the readers, and in turn our teammates, friends etc. to get more involved in R@H. |
23)
Message boards :
Cafe Rosetta :
Where is everyone
(Message 63641)
Posted 9 Oct 2009 by Aegis Maelstrom Post: Hi there! Nice to see yet another soul interested in the actual scientific progress of this great, shared effort we call Rosetta@Home (and BOINC, and distributed computing in general). Happy Crunching Everyone, and keep pushing for the science! :) |
24)
Message boards :
Number crunching :
Issue with checkpointing.
(Message 63640)
Posted 9 Oct 2009 by Aegis Maelstrom Post: Hi there, I am writing this report because I have seen a problem with checkpointing - unfortunately, again. Work Unit: lr5_combine_smooth_torsion_it06_A_rlbd_1cg5_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_15145_49 Computed on a portable version of good old BOINC 5.10.45 prepared by my team (BOINC@Poland) (sorry, I can't use a non-portable version, but this software has been heavily used before). The WU has obvious problems with checkpointing. It was computed on one computer and completed 8 models in almost 3 hrs. The progress was something like 4x.xx%. After a restart on another computer, the graphics app showed me Model 0, Step 0. Suddenly the progress dropped to something around 25% and now Model 0, Step 25 is being computed. It looks like all the work has been wasted. The stderr.txt file shows logs of two runs of this Work Unit - one in the morning and one right now (in the evening). See: [2009-10- 9 6:29:47:] :: BOINC:: Initializing ... ok. [2009-10- 9 6:29:47:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev32257.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/yfsong_lr5_combine_smooth_torsion_it06_A.zip Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr5_1cg5.out.zip Setting database description ... Setting up checkpointing ... 
Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Fullatom mode .. # cpu_run_time_pref: 21600 Fullatom mode .. Fullatom mode .. Fullatom mode .. Fullatom mode .. Fullatom mode .. Fullatom mode .. Fullatom mode .. Fullatom mode .. Fullatom mode .. [2009-10- 9 22:16: 1:] :: BOINC:: Initializing ... ok. [2009-10- 9 22:16: 1:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev32257.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/yfsong_lr5_combine_smooth_torsion_it06_A.zip Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr5_1cg5.out.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Fullatom mode .. # cpu_run_time_pref: 21600 I am pretty sure I have seen this bug before, so it is probably not specific to this particular WU. Can anyone confirm this issue and offer a solution? In a few days I will see the results of this WU - i.e. how many models were crunched and how many headers with the number of results are reported (see the known bug with multiple headers in the result file). Have a nice weekend and keep rocking. Best from Warsaw. :) a.m. |
25)
Message boards :
Rosetta@home Science :
DISCUSSION of Rosetta@home Journal (4)
(Message 62899)
Posted 12 Aug 2009 by Aegis Maelstrom Post: Hello Dr. Baker, Hello Team, my sincere congratulations on your upcoming publication in Nature - one of the most prestigious scientific journals! It is good to participate in such a productive project. I hope you will find time to deliver the all-round description of recent work, achievements and challenges you promised 2.5 weeks ago. Good luck with all your efforts, a.m.Poland |
26)
Message boards :
Number crunching :
Can't send back output files - BOINC manager issue.
(Message 60186)
Posted 17 Mar 2009 by Aegis Maelstrom Post: I can only suggest that you get some tasks, suspend network activity, complete the tasks, and study how those .out file look in the BOINC control files. That's precisely what I thought and did. :) I ran some further tasks and took a look at client_state.xml. After that, switching the status from 0 to 1 was obvious. Unfortunately, I couldn't see the rest of the solution. If there were an MD5 problem, you would have a message saying that, and the tasks would be marked as invalid in some way and removed from your task list. Oh, good to know that. The version in question is 5.10.45, tweaked as a portable app (it should not perform a regular installation, which would interfere with the OS). The OS is Windows 2000. To be honest, I don't count on finding a solution before the WU deadline :) however, I wanted to flag this problem on a popular BOINC forum and at least gather some data for others affected by this bug in the future. |
27)
Message boards :
Number crunching :
Can't send back output files - BOINC manager issue.
(Message 60168)
Posted 16 Mar 2009 by Aegis Maelstrom Post: Sorry for using this forum, but R@H tasks are affected, these fora are quite popular, and maybe someone knows the answer. The problem is as follows: I have finished several tasks and they show their status as "transferring". However, the manager does not show any result files to send back to the servers. The problem is *probably* a result of "one click too far" - running the portable version of the manager twice (I can't *install* an original one on this machine). To make the story short: the results are present in their projects directories and seem to be fine. Unfortunately, the manager does not see them. When you check the logs in stdoutdae.txt you see an access path, like projects/docking.cis.udel.edu/name_of_file not found. Originally the manager tries only once and then "forgets" about the files to transfer (but their WU status remains "transferring"). I worked around that by manually editing client_state.xml (changing the status flag from 0 to 1 in the proper file's section); however, the manager still doesn't see the files - you get the file not found log and that's all. What bugs me is that I couldn't find any string with an access path or anything else that could be corrupted. Everything in client_state.xml looked fine. Last things: The results are not reported. What is interesting, one of the original *sets of files* to transfer was not corrupted and got reported - that makes me think there is some access path responsible for all the files generated by one WU which I have missed. Second thing: further WUs have been downloaded from the servers and sent back without any problem. I see two possibilities: 1) BOINC manager is looking for the files in the wrong place. No idea why, probably some string got corrupted/deleted. I have tried to place a projects folder in different places - maybe I should try more? :) Is there any other file where things like access paths are stored? 
2) BOINC manager sees the files, but treats them as corrupted and doesn't give a proper report in its logs. The names of the files are correct, and I haven't seen anything strange about their sizes (both actual and in client_state). It is possible that the MD5 is not valid. :/ Any help will be appreciated. I have seen a couple of reports about this problem here and there but no solution was found. |
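One way to test possibility 2) from the post above is to compute each result file's actual size and MD5 and compare them by hand with what client_state.xml records for the same file. The sketch below is only an illustration: the function names are made up for this example, and which entries in client_state.xml hold the expected size and checksum is something to verify in your own file rather than take from here.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex) for one file, reading it in chunks."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large result files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def fingerprint_directory(directory):
    """Map file name -> (size, md5) for every regular file in `directory`."""
    return {
        entry.name: file_fingerprint(entry)
        for entry in sorted(Path(directory).iterdir())
        if entry.is_file()
    }
```

Calling `fingerprint_directory` on a project folder (for example the projects/docking.cis.udel.edu directory mentioned in the post) gives a table to check against the state file. A mismatch in size or checksum would point to possibility 2), while matching values would point back to a path problem.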
28)
Message boards :
Number crunching :
who do some tasks show two results?
(Message 59999)
Posted 6 Mar 2009 by Aegis Maelstrom Post: Alright, I couldn't wait any longer for advice. I haven't engineered the out file, just accepted the loss and finished the WU. Here you have it: 15 results ignored, the second bracket with 1 result accepted. I hope the remaining 15 results (IMVHO they looked nice) went to the database and are being used scientifically. If not, well, please correct this bug in the future. Best Regards from Warsaw, a.m. |
29)
Message boards :
Number crunching :
who do some tasks show two results?
(Message 59843)
Posted 27 Feb 2009 by Aegis Maelstrom Post: How could I force BOINC to... You are probably right. :):) However, my BOINC manager refuses to negotiate and flattery does not work either. ;) What Rosetta version is the task you are describing? Mini 1.54. The manager is 5.10.45 but, as I've said, the original double results error has been seen on later managers as well. The graphic always takes a minute or so to look right as a task starts. But I agree, if you had 15 models done, it should not have been starting back at model 1. Hmm... what really bugged me was the misplacement of the folding protein. It was visible only in one quarter; the rest was outside the box where it was supposed to be. But it's just graphics anyway. And yet there is this "urk" thing...
No, as far as I remember it didn't - it was at about 98.8%, I guess. I wanted to play it safe and send it back - that's why I halted other tasks and made BOINC return to this WU and finish it. The worst thing I'd have suspected would be crunching another decoy (however, it was obvious there was no time for that within the set 6 hrs). To my surprise, I saw this Model 1 - and then this table with results in the output file. Best from Warsaw, a.m. |
30)
Message boards :
Number crunching :
who do some tasks show two results?
(Message 59805)
Posted 25 Feb 2009 by Aegis Maelstrom Post: O.K., I think I've caught this error while it was happening. It's a different machine, a Pentium IV with BOINC 5.10.15. I waited for this Rosetta task to finally get done after circa 6 hrs of work. Finally the task got halted after 5:52 of runtime, having finished the 15th model, and the other project started. However, to my surprise, the task was not sent to the server - it was still waiting for some more crunching! I wanted to complete it and see the results, so I halted other tasks and started this WU. Then it attempted to crunch... but from model one, probably step one. The stage was named "urk", whatever that means, and everything looked like an error. The graphics seemed to be wrong as well - at first nothing, only the energy and RMSD lines, then the picture was shifted so one could see only a part of the protein, and then it got O.K. The progress dropped to 58%. I've switched off the client and started writing this bug report. I am pasting here the stderr.txt of this WU: BOINC:: Initializing ... ok. [2009- 2-25 11:47:10:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Unpacking WU data ... 
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 BOINC:: Initializing ... ok. [2009- 2-25 15:28:41:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 BOINC:: Initializing ... ok. [2009- 2-25 17:21:57:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. 
Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 Continuing computation from checkpoint: chk_chk1_FastRelax__S_1ist_1_00010_fa ... success! Continuing computation from checkpoint: chk_chk2_FastRelax__S_1ist_1_00010_fa ... success! Continuing computation from checkpoint: chk_chk3_FastRelax__S_1ist_1_00010_fa ... success! Continuing computation from checkpoint: chk_chk4_FastRelax__S_1ist_1_00010_fa ... success! Continuing computation from checkpoint: chk_chk5_FastRelax__S_1ist_1_00010_fa ... success! BOINC:: Initializing ... ok. [2009- 2-25 19:27:52:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... 
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 Continuing computation from checkpoint: chk_chk1_FastRelax__S_1ist_1_00012_fa ... success! Continuing computation from checkpoint: chk_chk2_FastRelax__S_1ist_1_00012_fa ... success! Continuing computation from checkpoint: chk_chk3_FastRelax__S_1ist_1_00012_fa ... success! Continuing computation from checkpoint: chk_chk4_FastRelax__S_1ist_1_00012_fa ... success! Continuing computation from checkpoint: chk_chk5_FastRelax__S_1ist_1_00012_fa ... success! Continuing computation from checkpoint: chk_chk6_FastRelax__S_1ist_1_00012_fa ... success! Continuing computation from checkpoint: chk_chk7_FastRelax__S_1ist_1_00012_fa ... success! ====================================================== DONE :: 1 starting structures 21149.3 cpu seconds This process generated 15 decoys from 15 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC:: Initializing ... ok. [2009- 2-25 20:57:32:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... 
ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/hw_mamaln_t290_3.loopbuild_SAVEALLOUT.1lop_.mtyka.boinc_files.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 As you can see, one table reporting 15 decoys has already been generated. I suppose that after some (mal)crunching a second table would be generated and hopefully both of them would be reported - but the first one would be ignored. Something obviously went wrong, and I would like to report this WU properly and show the generated results (I don't want them to be wasted, they are pretty good). I have to turn off this machine anyway, so I can wait, but could you tell me how I should transfer the results? Should I edit the stderr.txt file and cut off the latest lines? How could I force BOINC to just send back the existing 15 decoys? Mod.Sense? Anybody? |
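A log like the one pasted above is easier to read once it is split into sessions, one per `boinc_init()` timestamp, with a count of checkpoint resumes and any "generated N decoys" summary for each. The following is a minimal sketch that assumes the stderr.txt layout matches what appears in this post; the function name and the dictionary keys are made up for illustration and are not part of BOINC or Rosetta.

```python
import re
from datetime import datetime

# Matches e.g. "[2009- 2-25 11:47:10:] :: BOINC :: boinc_init()";
# the log pads single-digit fields with spaces, hence the \s* parts.
INIT_RE = re.compile(
    r"\[(\d{4})-\s*(\d{1,2})-\s*(\d{1,2})\s+(\d{1,2}):\s*(\d{1,2}):\s*(\d{1,2}):\]"
    r"\s*::\s*BOINC\s*::\s*boinc_init\(\)"
)
DONE_RE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")
RESUME_MARK = "Continuing computation from checkpoint:"


def summarize_sessions(stderr_text):
    """Return one dict per boinc_init() call found in the log text."""
    starts = list(INIT_RE.finditer(stderr_text))
    sessions = []
    for index, match in enumerate(starts):
        # A session runs until the next boinc_init() line (or end of text).
        end = starts[index + 1].start() if index + 1 < len(starts) else len(stderr_text)
        body = stderr_text[match.end():end]
        done = DONE_RE.search(body)
        sessions.append({
            "started": datetime(*map(int, match.groups())),
            "resumes": body.count(RESUME_MARK),
            "decoys": int(done.group(1)) if done else None,
        })
    return sessions
```

A session that reports decoys and is then followed by another session with zero resumes is the pattern described in this post: the task had already finished, yet the next start began again without restoring any checkpoint.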
31)
Message boards :
Number crunching :
Double sending the same WU - issue with Rosetta's scheduler?
(Message 59802)
Posted 25 Feb 2009 by Aegis Maelstrom Post: Task 1louA_BOINC_ABRELAX_IGNORE_THE_REST-ENV1000--1louA-_7587_31_0. Reported on 25 Feb 2009 10:10:21 UTC as a success; however, 0 credit points were granted, as the server states that "Task was reported too late to validate". The problem is, the deadline was 2 Mar 2009 19:32:07 UTC. On the other hand, as we can see, the same WU was issued on the 24th of Feb '09 to be returned today at 7:49. So here's my rant: Please, please, can you avoid sending the same WUs twice? It is a waste of our computing power. If you do need some WUs ASAP, please send them to the known 24/7 crunchers - or at least give them a short deadline so we have some chance to adjust. I'm sorry for my bitterness, but it's the kind of stupid mistake that just makes a cruncher angry. I hope at least that it was some random error, not a change of Rosetta's policy... Best regards to all the crunchers and the team, a.m.@Poland |
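The complaint above rests on a simple comparison of two timestamps. As a small sketch, the two dates quoted in the post can be parsed and compared as follows; the format string assumes the "day month year time UTC" layout shown on the task page, and the helper name is invented for this example.

```python
from datetime import datetime, timezone


def parse_task_time(text):
    """Parse a timestamp such as '25 Feb 2009 10:10:21 UTC' into an aware datetime."""
    naive = datetime.strptime(text, "%d %b %Y %H:%M:%S UTC")
    return naive.replace(tzinfo=timezone.utc)


reported = parse_task_time("25 Feb 2009 10:10:21 UTC")   # report time from the post
deadline = parse_task_time("2 Mar 2009 19:32:07 UTC")    # deadline from the post

# Positive margin means the result came back before its own deadline.
margin = deadline - reported
print("returned before deadline:", reported < deadline)
print("time to spare:", margin)
```

By these two values alone the result was returned with more than five days to spare, so the "too late to validate" message cannot be explained by this task's own deadline.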
32)
Message boards :
Number crunching :
Report long-running models here
(Message 59179)
Posted 30 Jan 2009 by Aegis Maelstrom Post: Aegis, that does sound a bit odd, but please allow it to run further. It should be caught by the watchdog once it has run for your 6hr preference plus 4 hours. And this is why it is having some trouble showing you any faster change in the % completed. Hi Mod! I restarted my computer to be sure and let it crunch the WU to the bitter end. :] First thing I noticed: I am not sure if the checkpointing is working correctly... After the initialization, the search started as a straight chain which went into SmallMoverEnergyCutRotamerTrials+Minimization. After a couple of seconds, I got something quasi-folded with a lot of high details - actually there were only high details (only the thin chains, with no thick ones!) and it got accepted as a low energy state. This low energy was over 300 000 energy units! - and it was step 1. After that the search continued, and after several seconds I got step 2 with a different low energy ("only" over 150 000 units). Finally, after 7:34:56 and I guess fewer than 200 steps, the WU finished, claiming success. Stderr out: <core_client_version>6.2.19</core_client_version> <![CDATA[ <stderr_txt> BOINC:: Initializing ... ok. [2009- 1-29 15:40:18:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. 
Setting database description ... Setting up checkpointing ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _00001 # cpu_run_time_pref: 21600 BOINC:: Initializing ... ok. [2009- 1-30 8:32:28:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _00001 Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_1 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_2 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_1 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_2 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_3 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_4 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_5 ... success! 
Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_6 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_7 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_8 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_9 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_10 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage4_kk_1 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage4_kk_2 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage4_kk_3 ... success! # cpu_run_time_pref: 21600 Continuing computation from checkpoint: chk_stage_1_ClassicRelax__S_00000001_fa ... success! Continuing computation from checkpoint: chk_stage_2_ClassicRelax__S_00000001_fa ... success! ====================================================== DONE :: 1 starting structures 27296.2 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> ]]> All of that looks quite similar to the behaviour I have seen in RALPH - please read the second part of this report. I haven't got any feedback regarding my test of the checkpointing; in the 1.54 summary Mike wrote that he had corrected some issues with checkpointing, but no one directly responded to my results saying "confirmed" or "here it works" - so I'm not sure whether it was ignored, solved or something else. I know that Mike is working very hard and needs to set priorities, but this one doesn't look good. 
I am not sure if it is a bug - my knowledge of how minirosetta actually runs is quite limited - but then again, if we knew more, we could be of much more help. While watching the graphics we would be much more aware of what is strange behaviour and what is not. I do appreciate your work (and my humble crunching devices) and I would like to help. Best for you all, a.m. BOINC@Poland |
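The numbers in this post can be cross-checked with a little arithmetic: the `# cpu_run_time_pref: 21600` line is the 6-hour preference in seconds, the moderator's "preference plus 4 hours" gives the watchdog limit, and the final "27296.2 cpu seconds" is the reported run time. Here is a small sketch of that calculation; the constant and function names are invented for illustration, and the 4-hour grace period is taken from the quoted moderator reply rather than from any Rosetta source code.

```python
RUN_TIME_PREF_S = 21600        # from "# cpu_run_time_pref: 21600" in the log
WATCHDOG_GRACE_S = 4 * 3600    # "plus 4 hours", as quoted from the moderator


def format_hms(seconds):
    """Format a number of seconds as H:MM:SS, dropping fractions of a second."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def watchdog_limit(pref_seconds, grace_seconds=WATCHDOG_GRACE_S):
    """CPU time after which the watchdog is expected to stop the task."""
    return pref_seconds + grace_seconds


cpu_seconds = 27296.2  # from the DONE block in the log
print("run time:      ", format_hms(cpu_seconds))
print("preference:    ", format_hms(RUN_TIME_PREF_S))
print("watchdog limit:", format_hms(watchdog_limit(RUN_TIME_PREF_S)))
print("over preference by:", format_hms(cpu_seconds - RUN_TIME_PREF_S))
```

The run time comes out as 7:34:56, matching the figure given in the post. The task exceeded the 6-hour preference by about an hour and a half but stayed under the 10-hour watchdog limit, which is consistent with it finishing on its own rather than being killed.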
33)
Message boards :
Number crunching :
Report long-running models here
(Message 59168)
Posted 29 Jan 2009 by Aegis Maelstrom Post: Hi There! I have yet another long unit, but with different behaviour than before. I'm not sure if it's specific to 1.54 or if it happened before. Task 1nkuA_BOINC_MPZN_with_zinc_abrelax_cs_frags_6231_146489_1 is being crunched on my laptop right now. It has been running for 6h 50min (and more to go!), while my prefs are 6 hrs. Still model 1. The graphics show strange behaviour. The accepted version is about 3/4 folded, but with a positive energy (71.4777; low energy is at 30.80764) and still at some earlier stage - without the high details specific to the last steps. The searching protein is a different beast - it can be fully folded and is always shown with high details. I am on "ShearMoverMoverBase+Minimization" right now (I think I've seen SmallMoverMoverBase-Minimization as well). And now the most important part: the Searching protein is moving about every 2 seconds, obviously changing itself, yet the "Step" number is virtually stalled: it is crawling, increasing by one every 40 seconds or more! Now I am on step 316 605. It looks as if the majority of the search attempts are not counted as "steps". I've been watching the task for 40 minutes. Maybe there were some changes in the accepted model, but I don't think so. Now I am waiting for this task to finish or be killed by the watchdog. The WU is taking a bit more memory than usual. It takes 160+ MB of RAM, 338 MB at peak and 305 MB of VM. The next strange thing is that I can't find in the BOINC logs (ver 6.2.19) when the WU started and whether it was checkpointed and paused to run another project (QMC) - I don't have logs from 9:05 to 20:06. o.O Do you know all the reasons why a WU can be a long-running one, and what is happening here? a.m. BOINC@Poland EDIT: The machine, as you can see: Win XP SP2, 512 MB RAM minus the RAM consumed by integrated graphics, enough space on the HDD for the swap file. 
EDIT2: The task was suspended after 07:19:22 and BOINC restarted a QMC workunit. I don't keep WUs in RAM while rotating projects (too little RAM) and I have set the default rotation time to 3 hours. |
34)
Message boards :
Cafe Rosetta :
What's playing on the stereo?
(Message 59005)
Posted 24 Jan 2009 by Aegis Maelstrom Post: Currently the Third Programme of Polish Public Radio. =) It really can rock at night and it does warm up my Internet connection. :> |
35)
Message boards :
Number crunching :
Ghost WU - issue with the database?
(Message 58988)
Posted 23 Jan 2009 by Aegis Maelstrom Post: Hi All, recently, while looking at the list of recent tasks for my main computer (here), I noticed it now says I have more work units to complete than I actually have waiting on my computer. Specifically, a new WU popped up on this list (see ID 220650574, and here). It says it was issued on the 13th of Jan. Well, since then I've completed many WUs issued on the 13th and later. What is more, I check the task webpage quite often and I don't remember any task waiting at the bottom of the list. I've checked the job_log_boinc.bakerlab.org_rosetta.txt file (I suppose it should log all the WUs) but I can't see this name (abinitio_norelax_homfrag_129_B_1t2iA_SAVE_ALL_OUT_4626_12762) there. Now I am puzzled about what the issue is. Is it some mix-up in the database? |
36)
Message boards :
Number crunching :
who do some tasks show two results?
(Message 58960)
Posted 21 Jan 2009 by Aegis Maelstrom Post: "Possibly - and this is really a guess, it might be that this bug is caused by not running the CPU at 100% utilization." Nah, I don't think that's it - I have 100% CPU utilization set. But I think you are right that it is some kind of checkpointing problem. Look at my previous post: it looks like it made a checkpoint after 8 decoys, before the six hours for the WU were up, and then the WU ran longer than the scheduled time just because the next decoy took some more time... I don't know why the scheduler thought it could make an additional decoy within the time limit... I'm not even sure this CPU time number is perfectly correct... :/ Maybe it has something to do with the preferences (now I have 6 hrs per WU, 3 hrs for rotation - but I'm not sure there actually was a QMC unit to rotate with, so maybe it ran all the time without a break...). Or maybe it is a different kind of bug. Certainly this is quite annoying - not only do you see that your machine has quite limited crunching power, but then it is "robbed" here and there. :] However, getting the generated but "ignored" results into the results database is what matters most. |
37)
Message boards :
Number crunching :
who do some tasks show two results?
(Message 58950)
Posted 20 Jan 2009 by Aegis Maelstrom Post: Hi All, unfortunately the problem seems to be less rare than I thought. Once again there are two DONE sections. This task:
stderr out
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 20372.3 cpu seconds
This process generated 8 decoys from 8 attempts
======================================================
BOINC :: Watchdog shutting down...
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 23460.2 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish
</stderr_txt>
]]>
Claimed credit 61.61. Granted 5.80 (only 1 decoy seen). Can the team check whether all the generated decoys are in the database, not only one? :S I would add that, as this is my second encounter with this bug, at least some added points would be highly appreciated... Take care, a.m.@Poland
P.S. Unfortunately I got no valid RALPH tasks. When I had time to babysit, the database spewed out only some wrong WUs... :/ |
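For anyone who wants to check their own results for the same symptom, here is a minimal sketch in Python that counts the DONE sections in a task's stderr text and adds up the decoys they report. It assumes only the text layout quoted in the post above; the names `DONE_RE` and `summarize_done_sections` are made up for this illustration and are not part of BOINC or Rosetta.

```python
import re

# Hypothetical helper: it only understands the stderr layout quoted in the post.
# \s* and \s+ tolerate the extra spaces and line breaks of a real stderr dump.
DONE_RE = re.compile(
    r"DONE ::\s*(\d+) starting structures\s+([\d.]+) cpu seconds\s+"
    r"This process generated\s+(\d+) decoys from\s+(\d+) attempts"
)

def summarize_done_sections(stderr_text):
    """Return (number of DONE sections, total decoys, last reported CPU seconds)."""
    matches = DONE_RE.findall(stderr_text)
    total_decoys = sum(int(m[2]) for m in matches)
    last_cpu = float(matches[-1][1]) if matches else 0.0
    return len(matches), total_decoys, last_cpu

# The two DONE sections from the post, copied as plain text.
sample = """
# cpu_run_time_pref: 21600
DONE :: 1 starting structures 20372.3 cpu seconds
This process generated 8 decoys from 8 attempts
BOINC :: Watchdog shutting down...
# cpu_run_time_pref: 21600
DONE :: 1 starting structures 23460.2 cpu seconds
This process generated 1 decoys from 1 attempts
BOINC :: Watchdog shutting down...
"""

sections, decoys, last_cpu = summarize_done_sections(sample)
print(sections, decoys, last_cpu)
```

More than one section means the task is affected. If the second CPU figure is the running total for the whole task (an assumption; the stderr text does not say), the task ran about 1860 seconds, roughly 31 minutes, past the 21600-second preference.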
38)
Message boards :
Number crunching :
Problems with web site
(Message 58912)
Posted 18 Jan 2009 by Aegis Maelstrom Post: Hi Mod, sorry for the previous claim regarding one of my two computers - I was obviously having a bad day and didn't notice the 0 seconds of work requested. Despite that, the claim regarding my 2nd computer was valid. For many hours there were no problems, but now we once again seem to be out of WUs:
2009-01-18 15:42:16|rosetta@home|Sending scheduler request: Requested by user. Requesting 2495 seconds of work, reporting 1 completed tasks
2009-01-18 15:42:21|rosetta@home|Scheduler request succeeded: got 0 new tasks
Best regards to all. a.m. |
39)
Message boards :
Number crunching :
Problems with web site
(Message 58874)
Posted 17 Jan 2009 by Aegis Maelstrom Post: I can't connect to the website. The server status seems to be O.K., but neither of my 2 computers - in different locations, with different IPs - can connect to the server and get a new work unit. According to the Rosetta server, my 2nd laptop managed to upload data and report the success, but it can't get any new WUs. On the first laptop I get the following:
2009-01-17 17:13:37|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
2009-01-17 17:13:59||Project communication failed: attempting access to reference site
2009-01-17 17:14:01||Internet access OK - project servers may be temporarily down.
2009-01-17 17:14:02|rosetta@home|Scheduler request failed: Couldn't connect to server
I have no problem with Internet access - I can even write this bug report. :D The same issue was reported by my teammate a couple of hours ago. Does anyone have the same problem?
EDIT: an interesting issue - my BOINC client repeated its attempt, but the second time it requested... 0 seconds of work! See:
2009-01-17 17:13:37|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
2009-01-17 17:13:59||Project communication failed: attempting access to reference site
2009-01-17 17:14:01||Internet access OK - project servers may be temporarily down.
2009-01-17 17:14:02|rosetta@home|Scheduler request failed: Couldn't connect to server
Finally, after a couple of attempts, my client managed to report a completed WU, but it still requested 0 seconds of work, so I got no new WUs.
2009-01-17 17:19:12|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
2009-01-17 17:20:48|rosetta@home|Scheduler request succeeded: got 0 new tasks
So I still get no new WUs.
I don't know where the problem is, but looking at my 2nd laptop I guess it won't get any new Rosetta WUs unless BOINC is at least restarted. The strange thing is that the 2nd laptop still communicates with the server (the "last seen" date in my computers window keeps getting updated) but it doesn't fetch new WUs although it should. |
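The log excerpts in these two "Problems with web site" posts mix two different situations: the client asking for 0 seconds of work, and the client asking for work but receiving none. Below is a rough Python sketch that tells them apart. It assumes the `timestamp|project|message` layout exactly as pasted in the posts; `ENTRY_RE`, `scheduler_events` and `classify` are hypothetical names for this illustration, not BOINC tools.

```python
import re

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
# One entry = timestamp|project|message, where the message runs until the next
# timestamp or the end of the text (the posts paste entries with and without breaks).
ENTRY_RE = re.compile(rf"({TS})\|([^|]*)\|(.*?)(?={TS}\||\Z)", re.DOTALL)

def scheduler_events(log_text):
    """Extract (timestamp, project, kind, value) tuples from pasted log text."""
    events = []
    for ts, project, msg in ENTRY_RE.findall(log_text):
        req = re.search(r"Requesting (\d+) seconds of work", msg)
        got = re.search(r"got (\d+) new tasks", msg)
        if req:
            events.append((ts, project, "requested_seconds", int(req.group(1))))
        elif got:
            events.append((ts, project, "new_tasks", int(got.group(1))))
    return events

def classify(events):
    """Pair each work request with the scheduler reply that follows it."""
    verdicts, pending = [], None
    for ts, _project, kind, value in events:
        if kind == "requested_seconds":
            pending = (ts, value)
        elif kind == "new_tasks" and pending is not None:
            req_ts, seconds = pending
            if seconds == 0:
                verdicts.append((req_ts, "client asked for no work"))
            elif value == 0:
                verdicts.append((req_ts, "work requested, none received"))
            else:
                verdicts.append((req_ts, "work received"))
            pending = None
    return verdicts

sample = (
    "2009-01-17 17:19:12|rosetta@home|Sending scheduler request: Requested by user. "
    "Requesting 0 seconds of work, reporting 1 completed tasks\n"
    "2009-01-17 17:20:48|rosetta@home|Scheduler request succeeded: got 0 new tasks\n"
    "2009-01-18 15:42:16|rosetta@home|Sending scheduler request: Requested by user. "
    "Requesting 2495 seconds of work, reporting 1 completed tasks\n"
    "2009-01-18 15:42:21|rosetta@home|Scheduler request succeeded: got 0 new tasks\n"
)

for when, verdict in classify(scheduler_events(sample)):
    print(when, verdict)
```

The first verdict points at the client side (it never asked for work), the second at the server side (work was requested and none was sent). Requests that fail with a connection error have no reply line, so the sketch leaves them unclassified.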
40)
Message boards :
Number crunching :
who do some tasks show two results?
(Message 58854)
Posted 16 Jan 2009 by Aegis Maelstrom Post: "The second problem is one I believe I've seen before as well. Some tasks seem to have two "done sections" as you called them. And it seems as though the credit system only sees one of them. I've asked the Project Team to look into this issue to see if they can determine the cause." Hi Mod, you probably remember this thread of mine. Obviously it is not a really widespread bug, but I hope it will be fixed. Best wishes to all of you. a.m. |