Problems and Technical Issues with Rosetta@home

amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 88573 - Posted: 27 Mar 2018, 23:53:43 UTC

Anyone else having trouble getting new work? As of this time stamp? ~16:00 left coast time?
Just replaced a board, can't get any work.
amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 88574 - Posted: 28 Mar 2018, 0:42:57 UTC - in response to Message 88573.  

Downloading new work now at 17:30. Some people are so nervous ..... lol
James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 88575 - Posted: 28 Mar 2018, 3:40:12 UTC

As of 20:35 PDT (West Coast USA), no work in last hour or so. S@H down for maintenance and Rosetta being asked for work, but none forthcoming.
Don

Joined: 22 Aug 17
Posts: 3
Credit: 2,325,443
RAC: 0
Message 88576 - Posted: 28 Mar 2018, 4:25:21 UTC
Last modified: 28 Mar 2018, 4:41:07 UTC

No new work for me either.

5793	Rosetta@home	3/28/2018 12:35:43 AM	Sending scheduler request: To fetch work.	
5794	Rosetta@home	3/28/2018 12:35:43 AM	Requesting new tasks for CPU	
5795	Rosetta@home	3/28/2018 12:35:44 AM	Scheduler request completed: got 0 new tasks	
5796	Rosetta@home	3/28/2018 12:35:44 AM	No tasks sent
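
For anyone who wants to check how often the scheduler comes back empty, here is a minimal sketch. It assumes the event log has been copied out as plain text in the same layout as the excerpt above; the function name `count_new_tasks` and the `sample` list are made up for illustration and are not part of BOINC.

```python
import re

# Matches the "got N new tasks" message as it appears in the excerpt above.
GOT_TASKS = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def count_new_tasks(log_lines):
    """Return one task count per completed scheduler request found in the lines."""
    counts = []
    for line in log_lines:
        match = GOT_TASKS.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


# Sample input: the four log lines quoted in the post, tab-separated.
sample = [
    "5793\tRosetta@home\t3/28/2018 12:35:43 AM\tSending scheduler request: To fetch work.",
    "5794\tRosetta@home\t3/28/2018 12:35:43 AM\tRequesting new tasks for CPU",
    "5795\tRosetta@home\t3/28/2018 12:35:44 AM\tScheduler request completed: got 0 new tasks",
    "5796\tRosetta@home\t3/28/2018 12:35:44 AM\tNo tasks sent",
]

counts = count_new_tasks(sample)
empty_replies = sum(1 for c in counts if c == 0)
print(f"{len(counts)} scheduler replies, {empty_replies} with no work")
```

Run over a longer stretch of log, a long run of zeros would suggest the server simply has no work queued rather than a problem on the host.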
Darrell

Joined: 28 Sep 06
Posts: 25
Credit: 51,934,631
RAC: 0
Message 88741 - Posted: 23 Apr 2018, 9:42:41 UTC

I just lost a few more Rosetta 4.07 after they clogged my 8GB RAM computer, then each wanted more RAM. When are the estimates going to get better so as to avoid all the wasted crunching?

I have set ALL my 8GB computers to only run a single 4.07 task, so I would rather waste unused crunch time than used crunch time (and electricity).
newman

Joined: 18 Mar 10
Posts: 1
Credit: 584,269
RAC: 0
Message 88854 - Posted: 11 May 2018, 12:02:10 UTC

All my Rosetta 4.07 WUs error out after some seconds under Ubuntu 18.04. Mini Rosetta is working fine.

Error code:
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_x86_64-pc-linux-gnu -ignore_unrecognized_res 1 -abinitio::fastrelax 1 -ex2aro 1 -abinitio::use_filters false -out:file:silent default.out -abinitio::rsd_wt_loop 0.5 -beta 1 -abinitio::detect_disulfide_before_relax 1 -relax::minimize_bond_angles 1 -in:file:native 00001.pdb -silent_gz 1 -abinitio::rsd_wt_helix 0.5 -relax::default_repeats 15 -beta_cart 1 -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::rg_reweight 0.5 -abinitio::increase_cycles 10 -out:file:silent_struct_type binary -ex1 1 -optimization::default_max_cycles 200 -relax::dualspace 1 -in:file:boinc_wu_zip NTF2chip_8437_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2141443
rosetta_4.07_x86_64-pc-linux-gnu: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
SIGABRT: abort called
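
The three numbers in "process exited with code 193 (0xc1, -63)" appear to be the same value written three ways: decimal, hexadecimal, and reinterpreted as a signed 8-bit integer. A small sketch of that arithmetic follows; the helper name `describe_exit_code` is hypothetical, and this says nothing about the cause of the abort itself.

```python
def describe_exit_code(code):
    """Return (hex string, signed 8-bit value) for an exit status in 0..255."""
    if not 0 <= code <= 255:
        raise ValueError("expected an exit status in the range 0..255")
    # Values of 128 and above wrap around to negative in two's complement.
    signed = code - 256 if code >= 128 else code
    return hex(code), signed


print(describe_exit_code(193))  # ('0xc1', -63)
```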


Kind regards,
Marcus
mmonnin

Joined: 2 Jun 16
Posts: 61
Credit: 25,390,629
RAC: 31,778
Message 88856 - Posted: 11 May 2018, 12:38:31 UTC

Most of mine do as well. Same OS. There are a few that do complete but over half have this error.
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 88857 - Posted: 11 May 2018, 15:16:19 UTC

The first Rosetta 4.07 that I ran on a new Ubuntu 18.04 machine (i7-8700) errored out immediately also.
https://boinc.bakerlab.org/result.php?resultid=997076322

The 3.78 are running fine.
Chris Jenks

Joined: 16 Jun 06
Posts: 2
Credit: 4,471,026
RAC: 852
Message 88911 - Posted: 16 May 2018, 0:20:00 UTC
Last modified: 16 May 2018, 0:51:23 UTC

I recently started running Rosetta on two cell phones using the Android version of BOINC. I thought things were working well, but I noticed an excessive number of newly started tasks and began keeping track. What I am finding is that the "Elapsed time" and the percent complete randomly decrease, causing tasks to take much longer to finish than I would expect halfway through. For example, task ab_12_01__vall_2011_1pgxA_vall_2011_9mers_3mers_535141_18556_0 currently shows an elapsed time of 4:25:52 and 80.8% complete, but an hour ago this same task showed an elapsed time of 4:29 and 80.8% complete: it went backwards despite an hour of work. Is there anything I can do besides finding another project?

Edit: I just noticed in the event log that the computation was suspended due to being on battery. My phone charges using Qi wireless, which causes it to cycle between 100% and 99% annoyingly. The suspension this cycle causes may be causing data loss. I should also mention that I greedily ran all four cores - maybe fewer will fix the errors.
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 88980 - Posted: 23 May 2018, 11:25:05 UTC - in response to Message 88979.  

No tasks sent today 23 May ?

I have been getting both 3.78 and 4.07 all morning (Windows version).
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 23,229,863
RAC: 6,747
Message 89004 - Posted: 27 May 2018, 17:39:17 UTC - in response to Message 89001.  

Android. Around midday today, 27 May again no work units. Check log and it states received 0.
Log also says requesting work for Mali-880 the GPU in my phone. I wonder if this is why I'm getting no work ?
I have changed nothing and assume it was always asking for GPU work units.
Any ideas ?


Your profile shows that you have no computers assigned to your account.
I wouldn't worry about receiving WU until the computer shows up in your profile.

Have you ever received any WU?
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 23,229,863
RAC: 6,747
Message 89009 - Posted: 28 May 2018, 1:48:19 UTC - in response to Message 89008.  

Yes. I have received and crunched heaps of WU. All my BOINC score is from Rosetta. I assume BAM does not know how to associate an Android host.
Today I received 2 WU which will not last long.



Your post AUTHOR information shows zero credits. Maybe you have 2 accounts. If not, there seems to be some problem.

Joined: 27 Mar 18
Posts: 4
Credit: 0
RAC: 0


Your USER PROFILE also shows zero.

Peter DM
User ID 1991524
Rosetta@home member since 27 Mar 2018
Country Australia
Total credit 0
Recent average credit 0.00
Computers View

Team None
Message boards 4 posts
James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 89013 - Posted: 28 May 2018, 8:18:35 UTC - in response to Message 89010.  

I agree with RJS5. If your acct was just started on 27th, how could you have crunched any WUs at all? You MUST have a duplicate or separate acct as well. Do you have other computers/devices also crunching Rosetta?
robertmiles

Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,014
Message 89053 - Posted: 4 Jun 2018, 3:41:11 UTC - in response to Message 89013.  

I agree with RJS5. If your acct was just started on 27th, how could you have crunched any WUs at all? You MUST have a duplicate or separate acct as well. Do you have other computers/devices also crunching Rosetta?

Did you notice the 27th of what month?
Cheech

Joined: 20 Nov 05
Posts: 1
Credit: 1,134,996
RAC: 0
Message 89205 - Posted: 3 Jul 2018, 2:36:03 UTC

Sorry if this has been asked before. I did searches but couldn't find anything.
I contributed to Rosetta@home a few years ago, then I was unable to help. Now I am able to help again. Here is my problem. When I run Rosetta 4.07 tasks, they process fine. However, when I get Rosetta Mini 3.78 tasks, they never run to the end. They always hang, usually with less than 10% completed. My machine runs Windows 10 on an AMD A10-5750M processor. Any help is appreciated.

-- Cheech
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 89210 - Posted: 3 Jul 2018, 18:17:07 UTC - in response to Message 89205.  

At this point, the only v3.78 tasks I see say they were aborted by the user. Once you have a failure like that and the task is reported, then the WU shows the error logs of the run, which sometimes gives a clue as to the issue.
Rosetta Moderator: Mod.Sense
Fynjy

Joined: 18 Sep 06
Posts: 8
Credit: 10,762,260
RAC: 0
Message 89217 - Posted: 5 Jul 2018, 10:17:28 UTC

Hi! There is something strange with credits per task:

Task	Work unit	Computer	Sent	Time reported	Status	Run time (sec)	CPU time (sec)	Credit	Application
1011055107	911022117	3421373	29 Jun 2018, 21:29:21 UTC	5 Jul 2018, 7:57:22 UTC	Completed and validated	86,066.99	85,311.74	153.97	Rosetta v4.07 i686-pc-linux-gnu
1011055621	911022559	3421373	29 Jun 2018, 21:29:21 UTC	2 Jul 2018, 0:15:13 UTC	Completed and validated	85,928.51	85,169.40	924.92	Rosetta v4.07 i686-pc-linux-gnu


I've received 153.97 in one case and 924.92 in another on the same CPU for 1 day of calculation. Any guess?
Help people! Join TSC!Russia!
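
To put numbers on the gap, here is a rough sketch that converts the two results above into credits per hour. It assumes the first of the two large figures in each row (86,066.99 and 85,928.51) is the run time in seconds, as on the usual BOINC task list; the helper name `credit_per_hour` is made up for illustration.

```python
def credit_per_hour(credit, run_time_seconds):
    """Credits granted per hour of run time."""
    return credit / run_time_seconds * 3600


# Task ID -> (credit, run time in seconds), copied from the two rows above.
# Treating the first large figure as run time is an assumption.
tasks = {
    1011055107: (153.97, 86066.99),
    1011055621: (924.92, 85928.51),
}

rates = {tid: credit_per_hour(c, t) for tid, (c, t) in tasks.items()}
for tid, rate in rates.items():
    print(f"task {tid}: {rate:.2f} credits/hour")

ratio = rates[1011055621] / rates[1011055107]
print(f"ratio: {ratio:.1f}x")
```

With nearly identical run times on the same host, the per-hour rates differ by roughly the same factor as the raw credit.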
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89218 - Posted: 5 Jul 2018, 12:53:27 UTC - in response to Message 89217.  

I've received 153.97 in one case and 924.92 in another on the same CPU for 1 day of calculation. Any guess?

I see that all the time. At first, I thought it was due to work unit memory size, but further investigation shows that is not the cause (or not the only cause). It has more to do with how many work units you run; it helps to leave a couple of cores free. But even that is not a complete cure; sometimes it works, and sometimes it doesn't.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=12544

I think that Windows is more consistent than Linux, and sometimes I do better with an Ivy Bridge CPU than with a Haswell or even a Coffee Lake, but that is not guaranteed either. I am fairly certain that Intel does better than AMD (Ryzen 1700), but that raises other issues, as I get errors on the AMD that I don't get on Intel chips at all.

Here are my most recent tests on six cores of an i7-4770 running Ubuntu 16.04. The other two cores are left free, and it is a dedicated machine with no other work running. They start out well, but then go downhill.
https://boinc.bakerlab.org/rosetta/results.php?hostid=3416394&offset=0&show_names=0&state=4&appid=

One problem is that I don't know if it is a real effect, or just an artifact of the BOINC credit system.
I will probably put that machine on another project where I know that it is working well.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 89220 - Posted: 6 Jul 2018, 1:54:55 UTC - in response to Message 89218.  

If running fewer concurrent tasks improves credit per minute, it would imply either memory contention, or L2 cache contention. If all of the cores are operating in the same L2 cache, then you can see how that would become the constrained resource. I don't mean to say there is anything wrong with a given computer, just that R@h is very memory intensive. Also others have found that machines with larger L2 caches seem to yield more credit per FLOPS rating per runtime minute.
Rosetta Moderator: Mod.Sense
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 23,229,863
RAC: 6,747
Message 89225 - Posted: 6 Jul 2018, 15:32:06 UTC - in response to Message 89220.  
Last modified: 6 Jul 2018, 15:59:44 UTC

If running fewer concurrent tasks improves credit per minute, it would imply either memory contention, or L2 cache contention. If all of the cores are operating in the same L2 cache, then you can see how that would become the constrained resource. I don't mean to say there is anything wrong with a given computer, just that R@h is very memory intensive. Also others have found that machines with larger L2 caches seem to yield more credit per FLOPS rating per runtime minute.


I found 2 of his WU on one machine that had 153 and 863 credits.
Task 1011055293 and 1011055107
They processed 51 decoys and 61 decoys from the same number of attempts.
I compared the command line options and there was only one difference on the higher credit run:
relax::minimize_bond_lengths 1

IF they ran the same amount of time and the lower credit WU did 5/6 as many decoys, does a CREDIT of 17% as much seem reasonable?

BTW, there are NEAR ZERO floating point operations in Rosetta code.


Credit 153.97 for 51 decoys from 51 attempts
Peak working set size 660.07 MB
Peak swap size 725.63 MB
Peak disk usage 557.30 MB

Credit 863.87 for 61 decoys from 61 attempts
Peak working set size 500.61 MB
Peak swap size 567.88 MB
Peak disk usage 539.84 MB
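
The "5/6 as many decoys" and "17% as much credit" figures can be checked directly from the two summaries above. This is just the arithmetic, with the numbers copied from this post; it makes no claim about how the credit is actually computed on the server.

```python
# Credit and decoy counts copied from the two run summaries above.
runs = [
    {"credit": 153.97, "decoys": 51},
    {"credit": 863.87, "decoys": 61},
]
low, high = runs

decoy_ratio = low["decoys"] / high["decoys"]    # close to 5/6
credit_ratio = low["credit"] / high["credit"]   # close to 17-18%
per_decoy = [r["credit"] / r["decoys"] for r in runs]

print(f"decoy ratio:  {decoy_ratio:.3f}")
print(f"credit ratio: {credit_ratio:.3f}")
for r, cpd in zip(runs, per_decoy):
    print(f"{r['credit']} credits / {r['decoys']} decoys = {cpd:.2f} per decoy")
```

So the lower-credit run was granted roughly a fifth as much credit per decoy, which is the mismatch being asked about.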

The two runs differ in the following command line options (153.97-credit run on the left, 863.87-credit run on the right):

in:file:boinc_wu_zip DRH_curve_X_h22_l3_h16_l2_01200_1_loop_10_0001_one_capped_0001_fragments_data.zip	in:file:boinc_wu_zip PH180629_9_data.zip
jran 3552117	jran 3431960
relax::default_repeats 2	relax::default_repeats 15
(not set)	relax::minimize_bond_lengths 1
©2024 University of Washington
https://www.bakerlab.org