Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
Anyone else having trouble getting new work? As of this time stamp? ~16:00 left coast time? Just replaced a board, can't get any work. |
amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
Downloading new work now at 17:30. Some people are so nervous ..... lol |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
As of 20:35 PDT (West Coast USA), no work in last hour or so. S@H down for maintenance and Rosetta being asked for work, but none forthcoming. |
Don Joined: 22 Aug 17 Posts: 3 Credit: 2,325,443 RAC: 0 |
No new work for me either.
5793 Rosetta@home 3/28/2018 12:35:43 AM Sending scheduler request: To fetch work.
5794 Rosetta@home 3/28/2018 12:35:43 AM Requesting new tasks for CPU
5795 Rosetta@home 3/28/2018 12:35:44 AM Scheduler request completed: got 0 new tasks
5796 Rosetta@home 3/28/2018 12:35:44 AM No tasks sent |
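A minimal sketch for spotting empty scheduler replies in a saved event log. It assumes the log lines look like the ones pasted above; the pattern and the helper name `tasks_received` are illustrative, and other BOINC client versions may word the message differently.

```python
import re

# Pattern inferred from the "Scheduler request completed" line quoted in the post;
# this wording is an assumption and may vary between client versions.
GOT_TASKS = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def tasks_received(log_lines):
    """Return the task count from every 'Scheduler request completed' line."""
    counts = []
    for line in log_lines:
        match = GOT_TASKS.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


sample_log = [
    "5793 Rosetta@home 3/28/2018 12:35:43 AM Sending scheduler request: To fetch work.",
    "5794 Rosetta@home 3/28/2018 12:35:43 AM Requesting new tasks for CPU",
    "5795 Rosetta@home 3/28/2018 12:35:44 AM Scheduler request completed: got 0 new tasks",
    "5796 Rosetta@home 3/28/2018 12:35:44 AM No tasks sent",
]

counts = tasks_received(sample_log)
empty_replies = sum(1 for c in counts if c == 0)
print(f"{empty_replies} of {len(counts)} scheduler replies contained no work")
```

Counting empty replies over a longer log makes it easier to tell a brief server-side shortage from a host that never receives work.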
Darrell Joined: 28 Sep 06 Posts: 25 Credit: 51,934,631 RAC: 0 |
I just lost a few more Rosetta 4.07 after they clogged my 8GB RAM computer, then each wanted more RAM. When are the estimates going to get better so as to avoid all the wasted crunching? I have set ALL my 8GB computers to only run a single 4.07 task, so I would rather waste unused crunch time than used crunch time (and electricity). |
newman Joined: 18 Mar 10 Posts: 1 Credit: 584,269 RAC: 0 |
All my Rosetta 4.07 WUs error out after some seconds under Ubuntu 18.04. Mini Rosetta is working fine. Error code:
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message> process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_x86_64-pc-linux-gnu -ignore_unrecognized_res 1 -abinitio::fastrelax 1 -ex2aro 1 -abinitio::use_filters false -out:file:silent default.out -abinitio::rsd_wt_loop 0.5 -beta 1 -abinitio::detect_disulfide_before_relax 1 -relax::minimize_bond_angles 1 -in:file:native 00001.pdb -silent_gz 1 -abinitio::rsd_wt_helix 0.5 -relax::default_repeats 15 -beta_cart 1 -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::rg_reweight 0.5 -abinitio::increase_cycles 10 -out:file:silent_struct_type binary -ex1 1 -optimization::default_max_cycles 200 -relax::dualspace 1 -in:file:boinc_wu_zip NTF2chip_8437_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2141443
rosetta_4.07_x86_64-pc-linux-gnu: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
SIGABRT: abort called
Kind regards, Marcus |
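The three numbers in "exited with code 193 (0xc1, -63)" appear to be one value shown as decimal, hexadecimal, and a signed 8-bit integer. A small sketch of that arithmetic follows; the helper name `describe_exit_code` is hypothetical and the code only checks the number conversions, it does not say what the code means to BOINC.

```python
def describe_exit_code(code):
    """Format an exit status as decimal, hex, and signed 8-bit value."""
    unsigned = code & 0xFF  # keep only the low byte
    # Values of 128 and above wrap to negative numbers in two's complement.
    signed = unsigned - 256 if unsigned >= 128 else unsigned
    return f"{unsigned} ({unsigned:#x}, {signed})"


print(describe_exit_code(193))
```

For 193 this reproduces the text in the error report, so the negative number is not a second, separate error.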
mmonnin Joined: 2 Jun 16 Posts: 61 Credit: 25,390,629 RAC: 13,030 |
Most of mine do as well. Same OS. There are a few that do complete but over half have this error. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
The first Rosetta 4.07 that I ran on a new Ubuntu 18.04 machine (i7-8700) errored out immediately also. https://boinc.bakerlab.org/result.php?resultid=997076322 The 3.78 are running fine. |
Chris Jenks Joined: 16 Jun 06 Posts: 2 Credit: 4,471,026 RAC: 349 |
I have recently started running Rosetta on two cell phones using the Android version of BOINC. I thought things were working well but started noticing an excessive number of newly started tasks and started keeping track. What I am finding is that the "Elapsed time" for the tasks, and the percent complete, randomly decreases for the tasks, causing them to take much longer to finish than I would expect half way through. For example, task ab_12_01__vall_2011_1pgxA_vall_2011_9mers_3mers_535141_18556_0 currently shows elapsed time of 4:25:52 and % complete of 80.8%, but an hour ago this same task was shown with an elapsed time of 4:29 and 80.8% complete - it went backwards despite an hour of work. Is there anything I can do besides finding another project? Edit: I just noticed in the event log that the computation was suspended due to being on battery. My phone charges using Qi wireless, which causes it to cycle between 100% and 99% annoyingly. The suspension this cycle causes may be causing data loss. I should also mention that I greedily ran all four cores - maybe fewer will fix the errors. |
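A toy model of the effect suspected in the post, assuming (this is not confirmed in the thread) that a suspended task is unloaded and later resumes from its last checkpoint, losing any work done since then. The function name, the fixed checkpoint spacing, and the numbers are all illustrative.

```python
def reported_elapsed(run_segments, checkpoint_interval):
    """Elapsed seconds kept after each run segment ends in a suspension.

    Assumption: a checkpoint is written every `checkpoint_interval` seconds of
    elapsed task time, and each suspension discards work since the last one.
    """
    elapsed = 0
    for segment in run_segments:
        elapsed += segment
        # Roll back to the most recent checkpoint boundary.
        elapsed -= elapsed % checkpoint_interval
    return elapsed


# Three 100-second bursts with checkpoints 120 seconds apart never reach a checkpoint.
print(reported_elapsed([100, 100, 100], 120))
# Two 130-second bursts keep 240 of the 260 seconds worked.
print(reported_elapsed([130, 130], 120))
```

Under this model, frequent short suspensions (such as a charger toggling between 99% and 100%) could leave the elapsed time flat or lower after an hour of wall-clock time.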
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
No tasks sent today 23 May ? I have been getting both 3.78 and 4.07 all morning (Windows version). |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,229,863 RAC: 2,766 |
Android. Around midday today, 27 May again no work units. Check log and it states received 0. Your profile shows that you have no computers assigned to your account. I wouldn't worry about receiving WU until the computer shows up in your profile. Have you ever received any WU? |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,229,863 RAC: 2,766 |
Yes. I have received and crunched heaps of WU. All my BOINC score is from Rosetta. I assume BAM does not know how to associate an Android host.
Your post's AUTHOR information shows zero credits. Maybe you have 2 accounts. If not, there seems to be a problem.
Joined: 27 Mar 18 Posts: 4 Credit: 0 RAC: 0
Your USER PROFILE also shows zero.
Peter DM User ID 1991524 Rosetta@home member since 27 Mar 2018 Country Australia Total credit 0 Recent average credit 0.00 Computers View Team None Message boards 4 posts |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
I agree with RJS5. If your acct was just started on 27th, how could you have crunched any WUs at all? You MUST have a duplicate or separate acct as well. Do you have other computers/devices also crunching Rosetta? |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
I agree with RJS5. If your acct was just started on 27th, how could you have crunched any WUs at all? You MUST have a duplicate or separate acct as well. Do you have other computers/devices also crunching Rosetta? Did you notice the 27th of what month? |
Cheech Joined: 20 Nov 05 Posts: 1 Credit: 1,134,996 RAC: 0 |
Sorry if this has been asked before. I did searches but couldn't find anything. I contributed to Rosetta@home a few years ago, then I was unable to help. Now I am again able to help. Here is my problem. When I run Rosetta 4.07 tasks, they process fine. However, when I get Rosetta Mini 3.78 tasks, they never run to the end. They always hang, usually with less than 10% completed. My machine runs Windows 10 on an AMD A10-5750M processor. Any help is appreciated. -- Cheech |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
At this point, the only v3.78 tasks I see say they were aborted by the user. Once you have a failure like that and the task is reported, then the WU shows the error logs of the run, which sometimes gives a clue as to the issue. Rosetta Moderator: Mod.Sense |
Fynjy Joined: 18 Sep 06 Posts: 8 Credit: 10,762,260 RAC: 0 |
Hi! There is something strange with credits per task:
1011055107 911022117 3421373 29 Jun 2018, 21:29:21 UTC 5 Jul 2018, 7:57:22 UTC Completed and validated 86,066.99 85,311.74 153.97 Rosetta v4.07 i686-pc-linux-gnu
1011055621 911022559 3421373 29 Jun 2018, 21:29:21 UTC 2 Jul 2018, 0:15:13 UTC Completed and validated 85,928.51 85,169.40 924.92 Rosetta v4.07 i686-pc-linux-gnu
I've received 153.97 in one case and 924.92 in another on the same CPU for 1 day of calculation. Any guess?
Help people! Join TSC!Russia! |
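To put the two results on a common scale, here is a short sketch that converts each to credit per CPU hour. It assumes the second of the two time figures in each row is CPU time in seconds, which matches the usual BOINC results layout but is not stated in the post; the function name is illustrative.

```python
def credit_per_cpu_hour(credit, cpu_seconds):
    """Credit granted per hour of CPU time."""
    return credit / (cpu_seconds / 3600.0)


# task id: (assumed CPU seconds, granted credit), copied from the post
tasks = {
    1011055107: (85311.74, 153.97),
    1011055621: (85169.40, 924.92),
}

rates = {task: credit_per_cpu_hour(credit, cpu)
         for task, (cpu, credit) in tasks.items()}

for task, rate in rates.items():
    print(f"task {task}: {rate:.2f} credit per CPU hour")
print(f"ratio: {rates[1011055621] / rates[1011055107]:.1f}x")
```

Both tasks used nearly the same CPU time, so the gap in granted credit carries over almost unchanged to the hourly rate.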
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I've received 153.97 in one case and 924.92 in another on the same CPU for 1 day of calculation. Any guess?
I see that all the time. At first, I thought it was due to work unit memory size, but further investigation shows that is not the cause (or not the only cause). It has more to do with how many work units you run; it helps to leave a couple of cores free. But even that is not a complete cure; sometimes it works, and sometimes it doesn't. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=12544
I think that Windows is more consistent than Linux, and sometimes I do better with an Ivy Bridge CPU than with a Haswell or even a Coffee Lake, but that is not guaranteed either. I am fairly certain that Intel does better than AMD (Ryzen 1700), but that raises other issues, as I get errors on the AMD that I don't get on Intel chips at all.
Here are my most recent tests on six cores of an i7-4770 running Ubuntu 16.04. The other two cores are left free, and it is a dedicated machine with no other work running. They start out well, but then go downhill. https://boinc.bakerlab.org/rosetta/results.php?hostid=3416394&offset=0&show_names=0&state=4&appid=
One problem is that I don't know if it is a real effect, or just an artifact of the BOINC credit system. I will probably put that machine on another project where I know that it is working well. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
If running fewer concurrent tasks improves credit per minute, it would imply either memory contention, or L2 cache contention. If all of the cores are operating in the same L2 cache, then you can see how that would become the constrained resource. I don't mean to say there is anything wrong with a given computer, just that R@h is very memory intensive. Also others have found that machines with larger L2 caches seem to yield more credit per FLOPS rating per runtime minute. Rosetta Moderator: Mod.Sense |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,229,863 RAC: 2,766 |
If running fewer concurrent tasks improves credit per minute, it would imply either memory contention, or L2 cache contention. If all of the cores are operating in the same L2 cache, then you can see how that would become the constrained resource. I don't mean to say there is anything wrong with a given computer, just that R@h is very memory intensive. Also others have found that machines with larger L2 caches seem to yield more credit per FLOPS rating per runtime minute.
I found 2 of his WUs on one machine that had 153 and 863 credits: tasks 1011055293 and 1011055107. They processed 51 decoys and 61 decoys from the same number of attempts. I compared the command line options and there was only one difference on the higher credit run: relax::minimize_bond_lengths 1
IF they ran the same amount of time and the lower credit WU did 5/6 as many decoys, does a CREDIT of 17% as much seem reasonable?
BTW, there are NEAR ZERO floating-point operations in Rosetta code.
Credit 153.97 for 51 decoys from 51 attempts
Peak working set size 660.07 MB
Peak swap size 725.63 MB
Peak disk usage 557.30 MB
Credit 863.87 for 61 decoys from 61 attempts
Peak working set size 500.61 MB
Peak swap size 567.88 MB
Peak disk usage 539.84 MB
The two runs differ in the following command line options. They differ in the input file (153.97 vs 863.87).
in:file:boinc_wu_zip DRH_curve_X_h22_l3_h16_l2_01200_1_loop_10_0001_one_capped_0001_fragments_data.zip vs in:file:boinc_wu_zip PH180629_9_data.zip
jran 3552117 vs jran 3431960
relax::default_repeats 2 vs relax::default_repeats 15
(not set) vs relax::minimize_bond_lengths 1 |
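The comparison in the post can be checked with a few lines of arithmetic. This sketch uses only the decoy counts and credits quoted above; the variable names are illustrative, and it makes no claim about how the project actually computes credit.

```python
# Figures copied from the post: (decoys completed, credit granted)
low_run = (51, 153.97)
high_run = (61, 863.87)

decoy_ratio = low_run[0] / high_run[0]    # share of decoys in the low-credit run
credit_ratio = low_run[1] / high_run[1]   # share of credit in the low-credit run

credit_per_decoy_low = low_run[1] / low_run[0]
credit_per_decoy_high = high_run[1] / high_run[0]

print(f"decoys: {decoy_ratio:.0%} of the higher run")
print(f"credit: {credit_ratio:.0%} of the higher run")
print(f"credit per decoy: {credit_per_decoy_low:.2f} vs {credit_per_decoy_high:.2f}")
```

The low-credit run completed roughly five sixths as many decoys but received well under a fifth of the credit, which is the mismatch the post is asking about.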
©2024 University of Washington
https://www.bakerlab.org