Posts by Hank Barta

1) Message boards : Number crunching : Rosetta diskcheck space (Message 79544)
Posted 15 Feb 2016 by Hank Barta
Post:
Yes, that was it. I increased that parameter to 5.0 and am getting work units again.

Many thanks for your help.
2) Message boards : Number crunching : Rosetta diskcheck space (Message 79534)
Posted 14 Feb 2016 by Hank Barta
Post:
BOINC Manager has some settings which allow you to limit the amount of disk it consumes. So your description that space is available, and the error you are seeing makes me tend to believe you will need to change your BOINC settings.

Hi Mod.Sense,
Thanks for the reply and the pointer to the text config file. Here is the content of /etc/boinc-client/global_prefs_override.xml:

<global_preferences>
   <run_on_batteries>1</run_on_batteries>
   <run_if_user_active>1</run_if_user_active>
   <run_gpu_if_user_active>0</run_gpu_if_user_active>
   <idle_time_to_run>3.000000</idle_time_to_run>
   <suspend_cpu_usage>0.000000</suspend_cpu_usage>
   <start_hour>0.000000</start_hour>
   <end_hour>0.000000</end_hour>
   <net_start_hour>0.000000</net_start_hour>
   <net_end_hour>0.000000</net_end_hour>
   <leave_apps_in_memory>0</leave_apps_in_memory>
   <confirm_before_connecting>0</confirm_before_connecting>
   <hangup_if_dialed>0</hangup_if_dialed>
   <dont_verify_images>1</dont_verify_images>
   <work_buf_min_days>0.000000</work_buf_min_days>
   <work_buf_additional_days>3.000000</work_buf_additional_days>
   <max_ncpus_pct>100.000000</max_ncpus_pct>
   <cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
   <disk_interval>60.000000</disk_interval>
   <disk_max_used_gb>1.000000</disk_max_used_gb>
   <disk_max_used_pct>80.000000</disk_max_used_pct>
   <disk_min_free_gb>0.100000</disk_min_free_gb>
   <vm_max_used_pct>75.000000</vm_max_used_pct>
   <ram_max_used_busy_pct>55.000000</ram_max_used_busy_pct>
   <ram_max_used_idle_pct>100.000000</ram_max_used_idle_pct>
   <max_bytes_sec_up>30003.200000</max_bytes_sec_up>
   <max_bytes_sec_down>499998.720000</max_bytes_sec_down>
   <cpu_usage_limit>100.000000</cpu_usage_limit>
   <daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
   <daily_xfer_period_days>0</daily_xfer_period_days>
</global_preferences>


The text settings match what I see in the GUI in Tools -> Computing preferences... -> disk and memory usage.

Is there something in my settings that causes this problem? I hadn't changed them before I noticed this (and they were probably at default values.) I think I have modified them since to see if that helped with this problem and it did not.
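
For what it's worth, the numbers in the log message line up with the 1 GB cap in the preferences rather than with free space on the filesystem. Below is a rough sketch of the arithmetic, assuming the client applies the most restrictive of the three disk limits and that "GB" in the preferences means 2**30 bytes. `allowed_disk_bytes` is a hypothetical helper, not a BOINC function, and the amount already used by BOINC is back-computed from the log message rather than measured.

```python
GIB = 1024 ** 3  # assumption: "GB" in the preferences means 2**30 bytes
MIB = 1024 ** 2  # assumption: the log message reports sizes in units of 2**20 bytes


def allowed_disk_bytes(total, free, used_by_boinc,
                       max_used_gb, max_used_pct, min_free_gb):
    """Hypothetical helper: the most restrictive of the three disk limits."""
    limit_abs = max_used_gb * GIB                          # disk_max_used_gb
    limit_pct = total * max_used_pct / 100.0               # disk_max_used_pct
    limit_free = used_by_boinc + free - min_free_gb * GIB  # disk_min_free_gb
    return min(limit_abs, limit_pct, limit_free)


# Sizes from the df output (df -H reports powers of 1000).
total, free = 37e9, 20e9
# Assumption: back-computed so that a 1 GiB cap leaves the 719.89 MB from the log.
used_by_boinc = (1024 - 719.89) * MIB

for cap_gb in (1.0, 5.0):
    allowed = allowed_disk_bytes(total, free, used_by_boinc, cap_gb, 80.0, 0.1)
    available = allowed - used_by_boinc
    print(f"disk_max_used_gb={cap_gb}: {available / MIB:.2f} MB available to BOINC")
```

Under these assumptions the 1.0 cap is the binding limit, and the 953.67 MB that Rosetta Mini asks for is simply 10^9 bytes expressed in the same units. Raising `disk_max_used_gb` lifts the cap well above that.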

Thanks!
3) Message boards : Number crunching : Rosetta diskcheck space (Message 79524)
Posted 12 Feb 2016 by Hank Barta
Post:
I'm currently not getting any Rosetta work units. I see a message in my logs that states:

Tue 09 Feb 2016 07:49:02 PM CST | rosetta@home | Rosetta Mini needs 233.78MB more disk space.  You currently have 719.89 MB available and it needs 953.67 MB.


And here is the disk space available in the /var/lib directory (where, AFAIK, Rosetta and other BOINC projects store data files):

hbarta@yggdrasil /var/lib/boinc-client $ df -H .
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/6a4d9d0c-ff6f-4cbe-8aa2-e2b126e87b4e   37G   15G   20G  43% /
hbarta@yggdrasil /var/lib/boinc-client $ 


With 20GB available, I do not see where BOINC is getting its disk space indication.

This is on Linux Mint, and BOINC identifies itself as:
Sun 31 Jan 2016 09:30:42 PM CST |  | Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu


Is this a Rosetta or a BOINC problem?

Thanks!
4) Message boards : Number crunching : Big WUs, tiny credit (Message 70391)
Posted 25 May 2011 by Hank Barta
Post:
I have recently received a number of WUs that take about 7 hours and the resulting credit granted is about 1/10 of the claimed credit. What is going on here? Is there a problem with my system? These seem to comprise about 10% of the work load on this host and do not appear on any of my other hosts (which are all of lesser CPU horsepower.)

Here are a couple recent samples:
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=387381739
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=387381738


Ordinarily this host is granted about 2/3 of claimed credit but that also seems to be down to about half. :(
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 70164)
Posted 29 Apr 2011 by Hank Barta
Post:
I ask that you alert us to new issues in this thread so that we can find them more easily.


Thank you for helping to deal with these. This morning I have seen a number of errors with a different signature. These run for 3-5 minutes before producing an error and exiting. The characteristic error seems to be:

ERROR: ct == final_atoms

An example is http://boinc.bakerlab.org/rosetta/workunit.php?wuid=382081360

thanks,
hank
6) Message boards : Number crunching : Mod.Sense (Message 70100)
Posted 23 Apr 2011 by Hank Barta
Post:
Hi Chris,
The lack of attention to this issue is highly irritating and I don't blame you at all for your decision to stop processing Rosetta. Here's what keeps me in:

1) I'm on the Overclockers team. We're presently #1 on RAC and trying to achieve #15 in total credits. And I'm helping.

2) The bad units are just a fraction of the total. As near as I can tell most of the WUs produce good results. The bad ones just add a bit to my bandwidth usage and a few seconds of compute time. Their actual impact is negligible for me. (I can see where it could be different for others.)

I hope the lack of attention to these details does not reflect the overall tenor of the project. It would be disappointing to know that all of the CPU cycles we've been donating are being squandered because the science end is handled similarly to these IT issues.

As others have mentioned, there are other DC projects that you could work on. I've tasked a video card with looking for messages from aliens and my PS3 is Folding@Home when I'm not watching movies or playing games.

At times, looking for alien communications seems silly compared to something like F@H or Rosetta, which are both more likely to produce tangible results. However, if we ever do hear from ET, the results could be (hopefully not literally) earth-shattering.
7) Message boards : Number crunching : Compute error (Message 70002)
Posted 9 Apr 2011 by Hank Barta
Post:
I suppose that when there are problems, the folks whose job it is to solve them spend time on that rather than getting into potentially endless conversations on the forum. ;)

I'm happy to know that they monitor the compute errors and will soon become aware of problems.

A new one I just saw is:


Setting up graphics native ...
Setting up folding (abrelax) ...
std::cerr: Exception was thrown:
Atom HE1 90 not found

(See http://boinc.bakerlab.org/rosetta/result.php?resultid=412361359 for further info.)

on Ross3X3_SAVE_ALL_OUT_k034_CS_frag_NOE_cst_005_23917_1066_0
8) Message boards : Number crunching : [rosetta@home] Message from server: Invalid or missing account key (Message 69860)
Posted 18 Mar 2011 by Hank Barta
Post:
Thanks for the help. That worked and the page with all of the command options is pretty useful.
9) Message boards : Number crunching : [rosetta@home] Message from server: Invalid or missing account key (Message 69854)
Posted 18 Mar 2011 by Hank Barta
Post:
That's as far as I've gotten trying to run BOINC/Rosetta on a system w/out a GUI. As near as I can tell, I need to execute:
boinccmd --project_attach http://boinc.bakerlab.org/rosetta/ "Hank Barta" password
in order to attach. The syntax indicated for this command is:
 --project_attach URL auth          attach to project


I have apparently not guessed the correct format for "auth" and any help would be appreciated.
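
As far as I understand it, "auth" here is the account key (authenticator), not the account name and password. Below is a minimal sketch, assuming `boinccmd --lookup_account URL email password` prints the account key, which is then passed to `--project_attach`. The helper names, e-mail address and key are placeholders, not real values.

```python
import shlex

PROJECT_URL = "http://boinc.bakerlab.org/rosetta/"


def lookup_account_cmd(url, email, password):
    """Hypothetical helper: command line that asks the project for the account key."""
    return ["boinccmd", "--lookup_account", url, email, password]


def project_attach_cmd(url, account_key):
    """Hypothetical helper: command line that attaches using the account key as 'auth'."""
    return ["boinccmd", "--project_attach", url, account_key]


# Placeholder credentials; substitute your own before running the printed commands.
print(shlex.join(lookup_account_cmd(PROJECT_URL, "user@example.com", "password")))
print(shlex.join(project_attach_cmd(PROJECT_URL, "ACCOUNT_KEY_FROM_LOOKUP")))
```

The sketch only prints the two command lines, so it can be checked without a running client.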

thanks,
hank
10) Message boards : Number crunching : minirosetta_2.1 tasks not exiting (Message 69818)
Posted 14 Mar 2011 by Hank Barta
Post:
Any particular reason for using such an old version of minirosetta?

I currently have a few workunits using 2.17.

That's a red herring. The actual executable is minirosetta_2.17_x86_64-pc-linux-gnu.

To the original question: I've seen this on other systems, and it happens from the very first WU, so I presume it indicates that there are multiple threads for each Rosetta process that I had not noticed before.

thanks,
hank
11) Message boards : Number crunching : minirosetta_2.1 tasks not exiting (Message 69797)
Posted 12 Mar 2011 by Hank Barta
Post:
I have a bunch of extra minirosetta_2.1 tasks that have not exited. They're just hanging around and using RAM:
hbarta@oak:~$ ps -el|grep rosetta
0 R   119 10056  1315 96  99   - - 123959 ?     ?        03:53:18 minirosetta_2.1
1 S   119 10057 10056  0  99   - - 123959 poll_s ?       00:00:00 minirosetta_2.1
1 S   119 10058 10057  0  99   - - 123959 hrtime ?       00:00:01 minirosetta_2.1
1 S   119 10059 10057  0  99   - - 123959 hrtime ?       00:00:00 minirosetta_2.1
0 R   119 10078  1315 95  99   - - 95345 ?      ?        02:56:14 minirosetta_2.1
1 S   119 10079 10078  0  99   - - 95345 poll_s ?        00:00:00 minirosetta_2.1
1 S   119 10080 10079  0  99   - - 95345 hrtime ?        00:00:01 minirosetta_2.1
1 S   119 10081 10079  0  99   - - 95345 hrtime ?        00:00:00 minirosetta_2.1
0 R   119 10102  1315 96  99   - - 90096 ?      ?        02:04:05 minirosetta_2.1
1 S   119 10103 10102  0  99   - - 90096 poll_s ?        00:00:00 minirosetta_2.1
1 S   119 10104 10103  0  99   - - 90096 hrtime ?        00:00:00 minirosetta_2.1
1 S   119 10105 10103  0  99   - - 90096 hrtime ?        00:00:00 minirosetta_2.1
0 R   119 10150  1315 86  99   - - 87666 ?      ?        00:30:51 minirosetta_2.1
1 S   119 10151 10150  0  99   - - 87666 poll_s ?        00:00:00 minirosetta_2.1
1 S   119 10152 10151  0  99   - - 87666 hrtime ?        00:00:00 minirosetta_2.1
1 S   119 10153 10151  0  99   - - 87666 hrtime ?        00:00:00 minirosetta_2.1
hbarta@oak:~$ 


BOINC Manager only shows active and ready-to-run tasks at the moment, so these must have already been reported. (There are 7 ready to run and 12 leftover tasks, so these must have completed and been reported.)
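
One way to check whether these are separate leftover tasks or helpers belonging to the running ones is to group the listing by parent PID. Below is a rough sketch, assuming the standard `ps -el` column order (PID in the fourth column, PPID in the fifth). `group_by_root` is a hypothetical helper, and only the first two groups of the listing are reproduced.

```python
from collections import defaultdict

# First eight lines of the ps listing above.
PS_OUTPUT = """\
0 R   119 10056  1315 96  99   - - 123959 ?     ?        03:53:18 minirosetta_2.1
1 S   119 10057 10056  0  99   - - 123959 poll_s ?       00:00:00 minirosetta_2.1
1 S   119 10058 10057  0  99   - - 123959 hrtime ?       00:00:01 minirosetta_2.1
1 S   119 10059 10057  0  99   - - 123959 hrtime ?       00:00:00 minirosetta_2.1
0 R   119 10078  1315 95  99   - - 95345 ?      ?        02:56:14 minirosetta_2.1
1 S   119 10079 10078  0  99   - - 95345 poll_s ?        00:00:00 minirosetta_2.1
1 S   119 10080 10079  0  99   - - 95345 hrtime ?        00:00:01 minirosetta_2.1
1 S   119 10081 10079  0  99   - - 95345 hrtime ?        00:00:00 minirosetta_2.1
"""


def group_by_root(ps_text):
    """Map each top-level PID to the PIDs that descend from it within the listing."""
    parent = {}
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        pid, ppid = int(fields[3]), int(fields[4])
        parent[pid] = ppid

    groups = defaultdict(list)
    for pid in parent:
        root = pid
        while parent[root] in parent:  # climb until the parent is outside the listing
            root = parent[root]
        groups[root].append(pid)
    return dict(groups)


for root, members in group_by_root(PS_OUTPUT).items():
    print(f"PID {root}: {len(members) - 1} descendant entries")
```

Each group has one entry in state R whose parent is 1315, plus three sleeping entries beneath it. That would be consistent with helper threads or child processes of a running task rather than finished tasks that failed to exit.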

Linux - Ubuntu 10.04 LTS
hbarta@oak:~$ uname -a
Linux oak 2.6.32-29-generic #58-Ubuntu SMP Fri Feb 11 20:52:10 UTC 2011 x86_64 GNU/Linux

BOINC Manager 6.10.17 (which comes from Ubuntu's repo).

Suspend/resume does not clear these. "Leave applications in memory while suspended" is not checked.

Is this the right place to report this sort of behavior?

Is there any more information I need to provide?

12) Message boards : Number crunching : Problems concerning Ferredoxin-Workunits ? (Message 69788)
Posted 11 Mar 2011 by Hank Barta
Post:
I haven't seen any more of this problem since yesterday so it appears to be resolved.
13) Message boards : Number crunching : Problems concerning Ferredoxin-Workunits ? (Message 69773)
Posted 9 Mar 2011 by Hank Barta
Post:
Likewise, including 9 of 24 complete work units on a machine I set up yesterday. :(

Most recent is:
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=370266669

thanks,
hank
14) Message boards : Number crunching : minirosetta 2.17 (Message 69772)
Posted 9 Mar 2011 by Hank Barta
Post:

Same here on Linux.

Also on Linux, and also apparently the same problem.

Out of 24 complete work units since I set up BOINC/Rosetta yesterday, 9 have finished with the compute error "ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context"

Can I presume that this represents a bug in the S/W or should I be looking for H/W problems?

the latest result is: http://boinc.bakerlab.org/rosetta/result.php?resultid=405000927

(Interesting to note that two other machines I have crunching have not encountered compute errors. However they have faster processors so they may be getting different types of work units.)

thanks,
hank






©2024 University of Washington
https://www.bakerlab.org