Message boards : Number crunching : Minirosetta 3.73-3.78
Author | Message |
---|---|
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 23,036,003 RAC: 7,522 |
Has anyone else gotten work units for Minirosetta 3.71 that are estimated to run 14 days? I'm running on an old (2009) Mac with 8GB of memory and lately I've gotten these here and there.
Rosetta appears OK. I just set my Rosetta PREFERENCES: CPU TARGET RUNTIME = 14 hours, enabled Rosetta computing on one of my Linux 64-bit systems, and Rosetta downloaded 50 14-hour jobs. I think the only difference between a default 6-hour job and a 14-hour job is the value Rosetta sets in the "-cpu_run_time 21600" command line option. I don't think Rosetta jobs care what system they execute on .... MACOS, Windows, Linux86/64.
I think ALL Rosetta jobs are set up to just try 99 decoys. The cpu_run_time limit set by the user is checked after every decoy. If that time is exceeded, Rosetta wraps up the job and it is completed early, before the 99 decoy limit is reached. NOTE: THIS is one reason why it is very, very tough to compare system performances. Jobs will only stop before the preference time IF and ONLY IF they complete the 99 decoys.
I would guess that your event log is showing some problem with the DISK SPACE available for Rosetta. BOINC has 3 possible limits on disk and I always seem to hit them accidentally:
1. maximum amount used
2. amount to leave free
3. maximum % of disk
The SAMPLE command line below will only differ in the leading OS name, and is added by the Rosetta server when it dispatches the job to a system. 
command: minirosetta_3.71_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 15 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip NTF2_215_N90N92K61_4_9_1_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 21600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3415526 |
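To make the stopping rule described in this post concrete, here is a minimal sketch in Python. It is not Rosetta's code. It assumes the behavior exactly as stated above: the runtime limit is checked only after a decoy finishes, and there is a cap of 99 decoys. The function name `run_job` and the list of per-decoy times are hypothetical.

```python
def run_job(decoy_times, cpu_run_time=21600, max_decoys=99):
    """Simulate the loop described in the post (illustration only).

    decoy_times: seconds each decoy takes (hypothetical input).
    cpu_run_time: the user's target runtime in seconds (21600 = 6 hours).
    Returns (decoys_completed, cpu_seconds_used).
    """
    elapsed = 0.0
    completed = 0
    for seconds in decoy_times:
        elapsed += seconds
        completed += 1
        # The limit is only checked between decoys, never in the middle of one.
        if elapsed > cpu_run_time:
            break
        # Assumed cap of 99 decoys, as stated in the post.
        if completed >= max_decoys:
            break
    return completed, elapsed


# Slow decoys: the time limit ends the job, slightly past 6 hours.
print(run_job([1000] * 200))  # (22, 22000.0)
# Fast decoys: the 99-decoy cap ends the job well before 6 hours.
print(run_job([100] * 200))   # (99, 9900.0)
```

The two calls show why run times are hard to compare across systems: a fast machine hits the decoy cap early, while a slow one runs until the time limit and a little beyond.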
James Adrian Send message Joined: 27 Apr 12 Posts: 5 Credit: 1,801,535 RAC: 0 |
rjs5, thanks for all the info! I checked the logs and didn't find any errors, and prefs show 6 hours as you mentioned below. If it happens again I'll wait for the 6 hour mark, just so I can see what happens. (:-)
|
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
Not all. If you check any FFD_ tasks in your list you will see they generate many hundreds of models (I have several with over 1000 models generated). If memory serves, the 99 model limit was enacted when some tasks created output files too large to be uploaded. The limit only applies to a particular type of task.
Others use the "preferred cpu time plus 4 hours" method to determine when to end things. When a model is completed the task calculates whether it has time left to complete another model. If the answer is no, the task wraps things up even though (to the cruncher) there appear to be hours left. If the answer is yes, the task will begin another model. Not all models are equal, however, even within the same task, so some will take longer than predicted. To ensure that otherwise good models aren't cut short just before completing (and to increase the odds that the task will complete at least one model) the task will continue past the preferred cpu time. At some point though, you gotta cut your losses, and so at preferred cpu time plus 4 hours the watchdog cuts bait and the task goes home. (I'm curious about the average overtime; my totally uninformed guess is that it's less than an hour.)
There are other types of tasks in which filters are employed to cut off models early. If a model passes the filter, the task continues working on that one model to the end. This results in dramatically disparate counts, with one task generating hundreds of models while another task from the same batch generates only one, two, five, etc.
Recently on ralph a filter was used to remove models, resulting in a file transfer error upon upload. The stderr out listed 13 models from 2 attempts, but since the models had been erased the file meant to contain them didn't exist. I'm guessing, based on DEK's post, which I may well have misinterpreted, that the server, possibly as part of a validation check, automatically gives the file transfer error (client error, compute error) when this particular file isn't part of the upload.
All these different strategies result, from the cruncher's point of view, in varied behavior which we struggle to interpret. Is it a problem with my computer or a problem with rosetta? Is it a problem at all? BOINC is complicated enough for the computer savvy, much more so for the majority of crunchers who just want to maximize their participation in rosetta and end up massively tangled up in the BOINC settings. The variety of legitimate behaviors exhibited by rosetta tasks trips up the volunteers trying to help them become untangled. From the researcher's point of view everything may look fine, working as expected, and any issues a lone cruncher is having are most likely due to their particular setup. And it probably is, but the lack of information leaves the volunteers flailing.
I have long wished for a reference, a database of tasks, in which the tasks are divided into broad categories of strategies employed (as above, with some info on how they "look" to the crunchers) and what, in a most basic way, is being asked (how does this particular protein fold, how do these two proteins interact, can we create a new protein to do x, etc.) Best, Snags |
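The "preferred cpu time plus 4 hours" behavior described in this post can be sketched roughly as follows. This is only an illustration of the description, not Rosetta's actual logic: the function `run_task`, the single fixed `estimate` for the next model, and the way the watchdog is modeled (as a simple cutoff rather than a separate monitoring thread) are all assumptions.

```python
GRACE_SECONDS = 4 * 3600  # "preferred cpu time plus 4 hours"


def run_task(model_times, preferred, estimate):
    """Simulate the end-of-task decision described in the post.

    model_times: actual seconds each model takes (hypothetical input).
    preferred:   the cruncher's preferred cpu time, in seconds.
    estimate:    predicted seconds for the next model (hypothetical).
    Returns (models_completed, cpu_seconds_used, reason_for_stopping).
    """
    deadline = preferred + GRACE_SECONDS
    elapsed = 0
    models = 0
    for actual in model_times:
        # A model that would run past the hard deadline is cut off by the watchdog.
        if elapsed + actual > deadline:
            return models, deadline, "watchdog"
        elapsed += actual
        models += 1
        # After each completed model: is there time left for another one?
        if elapsed + estimate > preferred:
            return models, elapsed, "no time for another model"
    return models, elapsed, "input exhausted"


six_hours = 6 * 3600
# Stops at 20000 s with 1600 s apparently left: another model would not fit.
print(run_task([5000] * 10, six_hours, 5000))
# The last model takes longer than predicted, so the task overruns 6 hours.
print(run_task([5000, 5000, 5000, 9000], six_hours, 5000))
# A runaway model is cut off at preferred time plus 4 hours.
print(run_task([5000, 50000], six_hours, 5000))
```

The three calls correspond to the three outcomes a cruncher can see: finishing early, running somewhat over, and being ended by the watchdog.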
krypton Volunteer moderator Project developer Project scientist Send message Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
Thanks for the report you guys! I'm responsible for the *MAP* jobs. I'm getting 90% success, which is "normal", but if it turns out that part of the 10% that fail are coming from mac(s), we could fix this! I'll do some local tests on my mac. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi krypton. I've had 9 of your tasks fail on one rig just today so far, all with the same error like so.
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1622019
P76481_PF12034_90-575_300-486_EN_MAP_hyb_cst_v02_i01_t000__krypton_SAVE_ALL_OUT_03_09_341621_123_0
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
(EDITED OUT THE REST)
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_f513f38.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/P76481_PF12034_90-575_300-486_EN_MAP_hyb_cst_v02_i01_t000__krypton.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
Stack trace (22 frames):
[0xd98d38f] [0xb7766404] [0xb6d9849] [0xb8ef314] [0xb8f1a90] [0xb8f4b33] [0xb90ae55] [0xb7ecda9] [0xb8cebea] [0xc2ff844] [0xc31427f] [0xabe3c0b] [0x8d92b93] [0xb04b065] [0xb05021c] [0xb0f6a35] [0xb0f959e] [0xb1b8bc3] [0xb1b524d] [0x8057071] [0xda24988] [0x8048131]
Exiting...
</stderr_txt> |
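For anyone puzzled by the three numbers in "process exited with code 193 (0xc1, -63)": they are the same value shown three ways. The short Python sketch below reproduces that arithmetic. The helper name `describe_exit_code` is made up for illustration, and the sketch says nothing about what the code means, only how the decimal, hexadecimal and signed 8-bit forms relate.

```python
def describe_exit_code(code):
    """Format an exit code as decimal, hex, and signed 8-bit value."""
    low_byte = code & 0xFF
    # Reinterpret the byte as a signed 8-bit integer (two's complement).
    signed = low_byte - 256 if low_byte > 127 else low_byte
    return f"{code} (0x{low_byte:x}, {signed})"


print(describe_exit_code(193))  # 193 (0xc1, -63)
```

Here 0xc1 is 12 * 16 + 1 = 193, and 193 - 256 = -63.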
Cap Send message Joined: 29 Aug 11 Posts: 3 Credit: 7,112,836 RAC: 0 |
I have had several tasks fail, and some of them fail to exit, leaving a boinc slot in use but no processing being done. Those I had to force quit. They all have an error message from malloc saying that a free was done on a block that was not allocated or a block was corrupted after being freed. Seems that the app is using a block after it has been freed. I don't know why boinc isn't cleaning up after some of these. |
krypton Volunteer moderator Project developer Project scientist Send message Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
The error turned out to be related to a new rotamer library we are using, which I happened to enable for the *MAP* jobs. I confirmed it on my mac; it appears to happen only on macs (and some older linux machines). I currently have no more jobs in the queue. For all future jobs I'll be reverting to the older rotamer library until the error is fixed! Thanks for the examples, they were helpful in debugging. Update: I just submitted a new batch of jobs *REDO_MAP*. If you get any errors from these, please report! Thanks, -krypton |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I just updated the minirosetta app to 3.73. This version includes new protocols, including the remodel protocol for design, and various bug fixes. It uses the latest Rosetta source. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,602,547 RAC: 8,833 |
|
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Forgot to mention, I added a project specific option for the black screen (like before). |
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
Mark Kramer Send message Joined: 25 Jun 10 Posts: 5 Credit: 74,534 RAC: 0 |
Mine seems to be locking up since Friday 4/22/16. I run both SETI and Rosetta but each time it's locked up, it's been stuck on Rosetta. It's running on an XP machine so I don't know if that's the issue. System logs don't show anything other than events stop happening. I've already uninstalled as well as completely wiped the BOINC folder then reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself when I've been using the computer despite my preferences. Since I haven't seen any other entries on the BOINC/SETI/Rosetta board, I presume that it is this computer or XP. As it stands, I'm going to remove BOINC for a while since the issues with it just turning itself on while the computer is in use as well as it locking up have become too much to ignore. However, if this is happening to anyone else and they worked around it, feel free to post. I also wasn't sure which thread to post this in so I posted it in the top 2 threads concerning technical problems. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,277,903 RAC: 1,635 |
Mine seems to be locking up since Friday 4/22/16. I run both SETI and Rosetta but each time it's locked up, it's been stuck on Rosetta. It's running on an XP machine so I don't know if that's the issue. System logs don't show anything other than events stop happening. I've already uninstalled as well as completely wiped the BOINC folder then reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself when I've been using the computer despite my preferences.
Does the SETI@HOME work use a graphics card with an Nvidia GPU? If so, the 364.* series of drivers for Nvidia GPUs has problems, so you might want to check whether going back to the 362.00 driver fixes any problems for you. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.
Rosetta Mini for Android is not available for your type of computer.
Do Network Communication successfully reports the task.
I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer |
LarryMajor Send message Joined: 1 Apr 16 Posts: 22 Credit: 31,533,212 RAC: 0 |
Same problem still exists today. It happened to two of my machines, running 32 and 64 bit Linux 3.16.0-4. Forcing an update reports/fetches jobs and clears the 24 hour wait time, and reports normally for about 12 hours or so, when the cycle repeats. |
Mark Kramer Send message Joined: 25 Jun 10 Posts: 5 Credit: 74,534 RAC: 0 |
Mine seems to be locking up since Friday 4/22/16. I run both SETI and Rosetta but each time it's locked up, it's been stuck on Rosetta. It's running on an XP machine so I don't know if that's the issue. System logs don't show anything other than events stop happening. I've already uninstalled as well as completely wiped the BOINC folder then reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself when I've been using the computer despite my preferences.
It does but, as it's a 9600GT, the highest driver is a 340.22. As I was running 337.88, I updated and then tried again. It has now outright crashed twice when I was running other programs. (Starcraft 2 when I was redoing graphics settings and SWTOR just now.) Because both crashes completely locked up the system to the point of needing a reboot, I couldn't check task manager to see if Minirosetta had started itself again. Reviewing the logs under admin tools didn't show me anything either. I've uninstalled BOINC again and I'm just going to run this computer as normal for the next two days. If it crashes again during that time, then I'll know that it's something else wrong with the computer. If it doesn't, then I'm probably going to lean towards it being an XP/older graphics card conflict with BOINC. |
Mark Kramer Send message Joined: 25 Jun 10 Posts: 5 Credit: 74,534 RAC: 0 |
Mine seems to be locking up since Friday 4/22/16. I run both SETI and Rosetta but each time it's locked up, it's been stuck on Rosetta. It's running on an XP machine so I don't know if that's the issue. System logs don't show anything other than events stop happening. I've already uninstalled as well as completely wiped the BOINC folder then reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself when I've been using the computer despite my preferences.
Follow-up: The outright crashes seem to have been caused by the driver upgrade, so I reverted it back to 337.88. That doesn't resolve the problem with minirosetta locking up or starting despite preferences, but at least I know BOINC wasn't causing the outright crashes. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I'm seeing a lot of robetta tasks crash immediately: the relevant line in the task log seems to be:
ERROR: Unable to open weights/patch file. None of (./)beta_cart or (./)beta_cart.wts or minirosetta_database/scoring/weights/beta_cart or minirosetta_database/scoring/weights/beta_cart.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 2884 [0x4485e82]
Sample tasks: 815715361 815591894
Boinc 7.2.42 Ubuntu 14.04 |
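The error message above lists four places where the beta_cart weights file was looked for. As a rough illustration of that search order (inferred only from the wording of the message, not taken from the Rosetta source), a Python sketch might look like this. The function names `candidate_weight_paths` and `find_weights` are hypothetical.

```python
from pathlib import Path


def candidate_weight_paths(name, database="minirosetta_database"):
    """List the four locations named in the error message, in order."""
    weights_dir = Path(database) / "scoring" / "weights"
    return [
        Path(name),                     # (./)beta_cart
        Path(name + ".wts"),            # (./)beta_cart.wts
        weights_dir / name,             # database copy, no extension
        weights_dir / (name + ".wts"),  # database copy with .wts
    ]


def find_weights(name, database="minirosetta_database"):
    """Return the first candidate that exists, or None if none do."""
    for path in candidate_weight_paths(name, database):
        if path.is_file():
            return path
    return None


for path in candidate_weight_paths("beta_cart"):
    print(path.as_posix(), "exists" if path.is_file() else "missing")
```

Run from a task's slot directory, a check like this would show whether beta_cart.wts is absent from the unpacked database, which is what the message suggests.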
tortuga1 Send message Joined: 16 Oct 08 Posts: 1 Credit: 734,150 RAC: 0 |
I'm seeing a lot of robetta tasks crash immediately : the relevant line in the task log seems to be: |
©2024 University of Washington
https://www.bakerlab.org