Message boards : Number crunching : minirosetta 2.05
Author | Message |
---|---|
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one ran for 11 min.
lr15clus_3fa_opt_.1bm8.1bm8.SAVE_ALL_OUT_IGNORE_THE_REST.c.13.6.pdb.pdb.JOB_17967_1_0
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=289719139
<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
ERROR: start_res != middle_res
ERROR:: Exit from: src/protocols/moves/KinematicMover.cc line: 132
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt> |
AdeB Joined: 12 Dec 06 Posts: 45 Credit: 4,428,086 RAC: 0 |
The same error as P.P.L. and Admin.
ERROR: start_res != middle_res
ERROR:: Exit from: src/protocols/moves/KinematicMover.cc line: 132
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
Task 317684657
AdeB |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,574,068 RAC: 10,198 |
Max, the watchdog should be expected to kick in if a model exceeds target runtime by more than 4 hours. And so with a short 2 hour preference, you will tend to see much more variation in runtimes than if you had a longer preference.
I have recently had a lot of tasks that ignore the Target CPU Time set in preferences. Most of them seem to belong to the type * boinc_filtered_loopbuild_threading *. Examples of such tasks:
t380__boinc_filtered_loopbuild_threading_cst_lb_tex_IGNORE_THE_REST_16900_2906_0 - 15002.5 cpu seconds, 2 decoys
t347__boinc_filtered_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_16901_9452_0 - 20591.2 cpu seconds, 2 decoys
t330__boinc_filtered_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_16901_3175_0 - 16323.3 cpu seconds, 2 decoys
t322__boinc_filtered_loopbuild_threading_cst_all_tex_IGNORE_THE_REST_16902_3299_0 - 21789.4 cpu seconds, 3 decoys
In all of these examples cpu_run_time_pref was set to 7200 sec, and every task generated 2 or more decoys (for 2 of them I saw that the 1st model took about 2 hr or more), so the program could have stopped correctly after the 1st decoy. But for some reason it did not do so. |
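The arithmetic behind the watchdog rule quoted in the post above can be sketched as follows. This is only an illustration: it assumes the 4-hour grace period is measured against total task CPU time, and the helper names are hypothetical rather than anything in BOINC or Rosetta.

```python
# Hypothetical helpers for illustration; not part of BOINC or Rosetta.
# Assumption: the watchdog allows 4 hours beyond the target CPU time.
WATCHDOG_GRACE_SECONDS = 4 * 3600


def watchdog_limit(cpu_run_time_pref: int) -> int:
    """CPU seconds after which the watchdog would be expected to step in."""
    return cpu_run_time_pref + WATCHDOG_GRACE_SECONDS


def classify(cpu_seconds: float, cpu_run_time_pref: int) -> str:
    """Place a task's CPU time relative to the target and the watchdog limit."""
    if cpu_seconds > watchdog_limit(cpu_run_time_pref):
        return "past watchdog limit"
    if cpu_seconds > cpu_run_time_pref:
        return "over target, within watchdog grace"
    return "within target"


# CPU times reported in the post, all with cpu_run_time_pref = 7200.
reported = {"t380": 15002.5, "t347": 20591.2, "t330": 16323.3, "t322": 21789.4}

for task, cpu in reported.items():
    print(f"{task}: {cpu:.1f} s -> {classify(cpu, 7200)}")
```

With a 7200-second preference the limit under this assumption is 21600 seconds, so only the t322 task is past it; the other three ran well over target but stayed inside the grace period.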
banditwolf Joined: 10 Jan 06 Posts: 28 Credit: 139,737 RAC: 0 |
I am seeing the 2.05 use more memory than previous versions. Up to 160k from ~60-100k. |
Sarel Joined: 11 May 06 Posts: 51 Credit: 81,712 RAC: 0 |
Hello, if these are the Protein-interface Design jobs then this is expected, since they work with very large complexes of proteins. If you turn on the graphics you'll see that the protein systems are much larger than the typical ones on Rosetta@home. These jobs are sent out with a requirement for 512 MB of memory to ensure that large-memory jobs are not sent out to low-resource machines.
Best, Sarel.
"I am seeing the 2.05 use more memory than previous versions. Up to 160k from ~60-100k." |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Task 317544195, lr15clus_opt_.1a32.1a32.IGNORE_THE_REST.c.2.8.pdb.pdb.JOB_17418_5_1, behaved strangely on Mac OS X. It hung at Model 2: step 0 and had to be aborted. In the Searching... pane in the graphics window the protein was compressed into a furball; the other protein displays seemed pretty normal. |
Craig Dickinson Joined: 7 May 07 Posts: 8 Credit: 1,021,887 RAC: 0 |
I am using a router so re-loaded the firmware as suggested and this has fixed the problem. Didn't think about the router being the cause as I would have expected that to have caused problems with other BOINC project updates or other software auto updaters. |
pvh Joined: 7 Feb 10 Posts: 3 Credit: 2,487,638 RAC: 0 |
I am seeing WUs that seem to be "stuck". If you look at the properties of the WU, you typically see something like:
CPU time at last checkpoint 00:35:26
CPU time 06:02:29
If you look at the graphics, you see that the protein is not changing shape at all and the energy and RMSD are perfectly constant. These jobs run on for around 25,000 seconds and (I assume) are then terminated by the watchdog. You get very low credit for these jobs. I assume this is a bug in the code. If so, please fix it quickly since it is wasting lots of CPU time. When I see such a WU, should I abort it, or is it better to leave it running?
This is with Rosetta Mini 2.05 on a 64-bit Linux system. I have seen this on both of my OpenSUSE 11.2 systems with the 2.6.31.8-0.1-desktop kernel (so hardware problems are ruled out). I have so far not seen this on my dual-core OpenSUSE 11.0 system with a custom 2.6.28.2-vanilla kernel. The latter is by far the slowest machine, so there is a (small) possibility that this is just random chance. I do not see an obvious pattern in which WUs suffer from this. |
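The symptom described in the post above, CPU time running far ahead of the last checkpoint, can be checked with a few lines of arithmetic. The sketch below is a rough heuristic only: the function names and the one-hour threshold are assumptions chosen for illustration, not values used by BOINC or Rosetta.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as shown in the task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_since_checkpoint(cpu_time: str, last_checkpoint: str) -> int:
    """CPU seconds accumulated since the last checkpoint was written."""
    return hms_to_seconds(cpu_time) - hms_to_seconds(last_checkpoint)


def looks_stuck(cpu_time: str, last_checkpoint: str, max_gap_seconds: int = 3600) -> bool:
    """Flag a task whose checkpoint gap exceeds the (assumed) threshold."""
    return seconds_since_checkpoint(cpu_time, last_checkpoint) > max_gap_seconds


# Values from the post: CPU time 06:02:29, last checkpoint 00:35:26.
gap = seconds_since_checkpoint("06:02:29", "00:35:26")
print(gap, looks_stuck("06:02:29", "00:35:26"))
```

For the values in the post the gap is 19623 seconds, a little under five and a half hours without a checkpoint. A long gap alone does not prove a task is stuck, since some models checkpoint rarely, so the constant energy and RMSD in the graphics remain the stronger evidence.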
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
"CPU time at last checkpoint 00:35:26"
Try closing down BOINC and re-opening it. That seems to do the trick. |
pvh Joined: 7 Feb 10 Posts: 3 Credit: 2,487,638 RAC: 0 |
"Try closing down BOINC and re-opening it. That seems to do the trick."
Thanks for the tip, but why did you remove my post? |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
lr15clusfa_opt_.1o5u.1o5u.SAVE_ALL_OUT_IGNORE_THE_REST.c.13.8.pdb.pdb.JOB_17715_7_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=317120311
Outcome: Client error
Client state: Compute error
Exit status: -1073741819 (0xc0000005)
CPU time: 15.39063 |
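The two forms of the exit status in the post above are the same 32-bit value read as signed and as unsigned. A minimal sketch of the conversion, with a hypothetical helper name:

```python
def exit_status_to_hex(status: int) -> str:
    """Reinterpret a signed 32-bit exit status as unsigned and format it as hex."""
    return f"0x{status & 0xFFFFFFFF:08x}"


print(exit_status_to_hex(-1073741819))  # 0xc0000005
```

On Windows, 0xC0000005 is the status code for an access violation, which fits a task that failed after only about 15 seconds of CPU time.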
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I don't know if this is a task problem or because of the validator problems; I'll put it here anyway.
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=291130717
2cgq_Jan28_2cgq_3cp0_ProteinInterfaceDesign_15Feb2010_18083_187_0
Validate error
# cpu_run_time_pref: 14400
======================================================
DONE :: 2 starting structures 14487.9 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt> |
sam_spade Joined: 2 Dec 08 Posts: 1 Credit: 453,056 RAC: 0 |
For almost a week I have been getting an error while downloading the app:
[error] Can't create HTTP response output file projects/boinc.bakerlab.org_rosetta/minirosetta_2.05_windows_x86_64.exe
What can I do? I have already tried resetting the project. The rosetta_beta version_598 app works well. |
[AF>Le_Pommier>MacBidouille.com] BlueG3 Joined: 16 Mar 08 Posts: 1 Credit: 43,585 RAC: 0 |
|
markj Joined: 21 Jun 08 Posts: 6 Credit: 18,060,229 RAC: 0 |
All, or at least most, of the ProteinInterface jobs cause validate errors. Would it be possible to fix this and post in this thread when the fix is made? It occurs on three different computers (PC, Mac), so it appears to be platform-independent. In the meantime, I am aborting all ProteinInterface jobs and leaving the others, which run OK.
markj |
J Joined: 23 Feb 10 Posts: 4 Credit: 68,995 RAC: 0 |
http://img80.imageshack.us/img80/4378/roserr1.jpg
Haven't been on this project long. No noticeable problems outside of punching 'ok'. Briefly searched the forums for C++ runtime error and didn't find anything, so cheers, here's a pic. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Looks like "J" has had a few compute errors reported on Win XP running BOINC 6.5.0:
abinitio_withrelax_nodisulf_nohomfrag_cst0.1_129_B_1wjgA_SAVE_ALL_OUT_17405_1983_0
abinitio_withrelax_nodisulf_nohomfrag_cst0.1_129_B_2hx5A_SAVE_ALL_OUT_17405_468_0
lrmixclus_opt_.5cro.5cro.SAVE_ALL_OUT_IGNORE_THE_REST.c.5.6.pdb.pdb.JOB_17886_5_1
The first was completed OK by a wingman.
The second is out being worked on right now.
The third failed on a wingman as well, after 2 min., with an error: The system cannot find the path specified. (0x3) - exit code 3 (0x3)
Rosetta Moderator: Mod.Sense |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 556 |
"I am running Rosetta@home (along with some other projects) on four machines, three running XP and one running Win7. Two of the machines have AMD 64 single-core processors and two have AMD 64 dual-core processors. The Rosetta app is 'rosetta mini 2.05' and the BOINC version is 6.10.18. About once a week or so I will notice that the Rosetta WU running on at least one of the machines has been running longer than usual and using little or no CPU time. I abort it, and the next WU runs fine. I have not seen this occur with any of my other projects (climate prediction, malaria control or world community grid)."
Thanks - one of my 2.05 workunits had the same problem, but now seems to be running well after a reboot.
https://boinc.bakerlab.org/rosetta/result.php?resultid=320652086
64-bit Vista SP2, BOINC 6.10.18, quad-core Intel, not using keep in memory when suspended (something tends to tie up lots of memory and make the computer unresponsive to the mouse and keyboard; haven't found what, though)
t311__boinc_filtered_loopbuild_threading type workunit
Before the reboot, it showed CPU time 03:39:05, last checkpoint 03:39:03, elapsed time so far 20:29:26, and was not using any CPU time.
After the reboot, that workunit restarted at about 4 hours elapsed time, but it is now using a CPU core again. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 556 |
Hi! One thing to look for: I've found that when the output file absent error occurs, it's a good idea to search the logfile for any reference to boinc_lockfile. Errors that refer to that file tend to cascade from one workunit to the next, at least with the older versions of BOINC, but not with some of the newer versions like the 6.10.18 I'm now using. They can also cascade to other BOINC projects that use a file with the same name, again for the older BOINC versions. |
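The log search suggested in the post above could be done with a short script like the one below. It is a sketch only: the function name is hypothetical, and the sample lines are placeholders written for this example, not real BOINC output.

```python
from typing import Iterable, List, Tuple


def find_lockfile_lines(
    lines: Iterable[str], needle: str = "boinc_lockfile"
) -> List[Tuple[int, str]]:
    """Return (line number, text) for every line that mentions the needle."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Placeholder lines for demonstration only; not real client log output.
sample_log = [
    "placeholder line one\n",
    "placeholder line mentioning boinc_lockfile\n",
    "placeholder line three\n",
]

for number, text in find_lockfile_lines(sample_log):
    print(f"{number}: {text}")
```

To scan a real log, pass an open file object instead of the sample list, for example `find_lockfile_lines(open(path))`, where `path` is wherever your BOINC client writes its log.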
Minardi Joined: 19 Jan 10 Posts: 1 Credit: 1,117,527 RAC: 0 |
I have had several tasks stall out and stop using CPU over the past few days. I am finishing up my rosetta tasks, then taking this machine off the project. I was running an XP machine and had no problems. It died, and I replaced it with a W7 64-bit machine and some tasks started stalling out on me. In reviewing this thread, it appears there is a problem with mini Rosetta 2.05 running on W7 machines. |
©2025 University of Washington
https://www.bakerlab.org