Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

robertmiles
Joined: 16 Jun 08
Posts: 689
Credit: 9,351,286
RAC: 4,534
Message 88320 - Posted: 19 Feb 2018, 20:58:54 UTC - in response to Message 88319.  

Are you sure that 1 hour is even an allowed value for CPU time? I haven't checked lately, but 3 hours used to be the lowest allowed value.

3hrs used to be the default, but 1hr was (and still is) the minimum allowed.

I agree the 1hr option should be removed. And with so many multi-core processors out there, the minimum should probably be 3hrs. 2hrs is also a current option.


Because the tasks run for 6 hours and then hit computation errors, it's better to have the 1-hour task. Then, once a task reaches 2-3 hours, it's known to be bad and can be manually aborted instead of wasting a full 6 hours.

Depends on what caused the computation error.

Each Rosetta@Home task is usually composed of 100 subtasks. The first of these only checks that the computer is handling such tasks properly; if it is the only one completed, the results of the task are useless.

The other 99 are either from 99 different starting points, or 99 iterations from one starting point. Only as many are actually done as will fit into the time allowed.

If the cause of the computation error is in only one starting point, it's probably best to run as many subtasks as will complete before reaching that starting point, since those subtasks don't lead to a computation error. I don't think the project has said whether it can recover the output of all the properly completed subtasks if a later subtask gives a computation error.

On the other hand, if the cause of the computation error is in an input file shared by all 99 of these subtasks, it is best for the first one to detect the error and stop the whole task.
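A rough way to see the run-time trade-off discussed here is to count how many subtasks fit into the chosen CPU run time. This is a hypothetical model, not Rosetta's actual scheduler: it assumes a fixed, made-up time per subtask, that subtasks run back to back until the next one would overrun the target, that at least one is always attempted, and (per the description above) that the first completed subtask is only a check and contributes no useful result.

```python
def subtasks_completed(cpu_run_time_s: float, seconds_per_subtask: float,
                       max_subtasks: int = 100) -> int:
    """Hypothetical model: subtasks run back to back until the next one
    would exceed the CPU run-time target; at least one is always attempted."""
    if seconds_per_subtask <= 0:
        raise ValueError("seconds_per_subtask must be positive")
    fits = int(cpu_run_time_s // seconds_per_subtask)
    return max(1, min(max_subtasks, fits))


def useful_subtasks(cpu_run_time_s: float, seconds_per_subtask: float,
                    max_subtasks: int = 100) -> int:
    """Completed subtasks minus the first one, which only checks the host."""
    return subtasks_completed(cpu_run_time_s, seconds_per_subtask, max_subtasks) - 1


if __name__ == "__main__":
    per_subtask = 15 * 60  # assumed 15 minutes per subtask (illustrative only)
    for hours in (1, 2, 3, 6):
        print(f"{hours} h run: {useful_subtasks(hours * 3600, per_subtask)} useful subtasks")
```

With the assumed 15-minute subtasks this prints 3, 7, 11 and 23 useful subtasks, so a 1-hour run spends a quarter of its time on the check subtask while a 6-hour run spends about 4% on it.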

Note that allowing longer runs reduces the amount of communications time required to get input files from the server to your computer and get the output files back.
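The communication point is simple arithmetic: a core that finishes shorter tasks has to fetch more of them per day, so any fixed per-task transfer cost is paid more often. The sketch assumes a hypothetical, constant transfer time per task; real download and upload sizes are not given in this thread.

```python
def daily_transfer_seconds(run_hours: float, transfer_s_per_task: float) -> float:
    """Seconds per day one core spends on transfers, assuming a fixed
    (hypothetical) transfer cost per task and back-to-back tasks."""
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    tasks_per_day = 24 / run_hours
    return tasks_per_day * transfer_s_per_task


if __name__ == "__main__":
    transfer_s = 60  # assumed 60 s of download/upload per task (illustrative)
    for hours in (1, 3, 6):
        print(f"{hours} h tasks: {24 / hours:.0f} tasks/day/core, "
              f"{daily_transfer_seconds(hours, transfer_s):.0f} s of transfers/day/core")
```

Under that assumption, moving from 1-hour to 6-hour tasks cuts transfers per core from 24 to 4 a day (1440 s down to 240 s).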
Sid Celery
Joined: 11 Feb 08
Posts: 979
Credit: 21,656,538
RAC: 13,957
Message 88432 - Posted: 6 Mar 2018, 8:10:04 UTC - in response to Message 88320.  

Are you sure that 1 hour is even an allowed value for CPU time? I haven't checked lately, but 3 hours used to be the lowest allowed value.

3hrs used to be the default, but 1hr was (and still is) the minimum allowed.

I agree the 1hr option should be removed. And with so many multi-core processors out there, the minimum should probably be 3hrs. 2hrs is also a current option.

Because the tasks run for 6 hours and then hit computation errors, it's better to have the 1-hour task. Then, once a task reaches 2-3 hours, it's known to be bad and can be manually aborted instead of wasting a full 6 hours.

Depends on what caused the computation error.

Quite.

Everyone seems to have gone away. Probably just as well.
No-one mailed me about mining. Probably just as well too.

I think the minimum task runtime should be changed up anyway, before anyone comes back.

I've replaced an old machine with a new one over the last month and been overclocking it gradually, ending up with loads of aborted tasks after I went a bit too far (corrected now). I wouldn't mind betting these other guys have been overclocking too and Rosetta has found their machines' weak spots. Just guessing though.
amgthis
Joined: 25 Mar 06
Posts: 66
Credit: 172,677,259
RAC: 89,359
Message 88573 - Posted: 27 Mar 2018, 23:53:43 UTC

Anyone else having trouble getting new work as of this timestamp (~16:00 West Coast time)?
Just replaced a board, can't get any work.
amgthis
Joined: 25 Mar 06
Posts: 66
Credit: 172,677,259
RAC: 89,359
Message 88574 - Posted: 28 Mar 2018, 0:42:57 UTC - in response to Message 88573.  

Downloading new work now at 17:30. Some people are so nervous ..... lol
James W
Joined: 25 Nov 12
Posts: 59
Credit: 547,523
RAC: 211
Message 88575 - Posted: 28 Mar 2018, 3:40:12 UTC

As of 20:35 PDT (West Coast USA), no work in the last hour or so. S@H is down for maintenance, so Rosetta is being asked for work, but none is forthcoming.
Don
Joined: 22 Aug 17
Posts: 3
Credit: 578,914
RAC: 469
Message 88576 - Posted: 28 Mar 2018, 4:25:21 UTC
Last modified: 28 Mar 2018, 4:41:07 UTC

No new work for me either.

5793	Rosetta@home	3/28/2018 12:35:43 AM	Sending scheduler request: To fetch work.	
5794	Rosetta@home	3/28/2018 12:35:43 AM	Requesting new tasks for CPU	
5795	Rosetta@home	3/28/2018 12:35:44 AM	Scheduler request completed: got 0 new tasks	
5796	Rosetta@home	3/28/2018 12:35:44 AM	No tasks sent
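For anyone wanting to check a longer event log for empty scheduler replies, here is a small sketch. It is not a BOINC tool; it assumes the copied lines are tab-separated (sequence, project, timestamp, message) with US-style timestamps, exactly as pasted above, and that the reply wording matches "got N new tasks". Other client versions or locales may format these differently.

```python
import re
from datetime import datetime

# Matches the count in messages like "Scheduler request completed: got 0 new tasks".
GOT_TASKS = re.compile(r"got (\d+) new tasks?")


def parse_event_line(line: str):
    """Split one copied event-log line into (seq, project, time, message).
    Assumes tab-separated columns, as in the lines pasted in this post."""
    seq, project, stamp, message = [f.strip() for f in line.strip().split("\t")][:4]
    return int(seq), project, datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), message


def tasks_received(lines):
    """Return the task counts reported by scheduler-reply lines, in order."""
    counts = []
    for line in lines:
        _, _, _, message = parse_event_line(line)
        match = GOT_TASKS.search(message)
        if match:
            counts.append(int(match.group(1)))
    return counts


if __name__ == "__main__":
    log = [
        "5793\tRosetta@home\t3/28/2018 12:35:43 AM\tSending scheduler request: To fetch work.\t",
        "5794\tRosetta@home\t3/28/2018 12:35:43 AM\tRequesting new tasks for CPU\t",
        "5795\tRosetta@home\t3/28/2018 12:35:44 AM\tScheduler request completed: got 0 new tasks\t",
        "5796\tRosetta@home\t3/28/2018 12:35:44 AM\tNo tasks sent",
    ]
    print(tasks_received(log))  # [0]
```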
Darrell
Joined: 28 Sep 06
Posts: 25
Credit: 44,152,991
RAC: 40,211
Message 88741 - Posted: 23 Apr 2018, 9:42:41 UTC

I just lost a few more Rosetta 4.07 tasks after they filled the RAM on my 8 GB computer and then each wanted more. When are the memory estimates going to get better, so we can avoid all this wasted crunching?

I have set ALL my 8 GB computers to run only a single 4.07 task, because I would rather leave crunch time unused than waste crunch time (and electricity) on tasks that get lost.
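Limiting a machine to one task is a memory budget worked out by hand. The sketch below shows that arithmetic; the per-task figures are placeholders, since nobody in this thread states how much RAM a 4.07 task actually needs, and the 2 GB reserved for the OS is likewise an assumption.

```python
def max_concurrent_tasks(total_ram_gb: float, reserved_gb: float,
                         per_task_gb: float, cores: int) -> int:
    """How many tasks fit in RAM at once, capped by core count.
    All figures are caller-supplied assumptions, not measured Rosetta values."""
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    usable = total_ram_gb - reserved_gb
    if usable <= 0:
        return 0
    return min(cores, int(usable // per_task_gb))


if __name__ == "__main__":
    # Hypothetical 8 GB, 8-thread machine keeping 2 GB for the OS.
    for per_task in (0.5, 1.5, 4.0):
        print(f"{per_task} GB/task -> {max_concurrent_tasks(8, 2, per_task, 8)} tasks")
```

If a task's real footprint is close to the 4 GB case, only one fits alongside the OS, which is the setting described in the post.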
newman
Joined: 18 Mar 10
Posts: 1
Credit: 481,938
RAC: 719
Message 88854 - Posted: 11 May 2018, 12:02:10 UTC

All my Rosetta 4.07 WUs error out after a few seconds under Ubuntu 18.04. Mini Rosetta is working fine.

Error code:
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_x86_64-pc-linux-gnu -ignore_unrecognized_res 1 -abinitio::fastrelax 1 -ex2aro 1 -abinitio::use_filters false -out:file:silent default.out -abinitio::rsd_wt_loop 0.5 -beta 1 -abinitio::detect_disulfide_before_relax 1 -relax::minimize_bond_angles 1 -in:file:native 00001.pdb -silent_gz 1 -abinitio::rsd_wt_helix 0.5 -relax::default_repeats 15 -beta_cart 1 -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::rg_reweight 0.5 -abinitio::increase_cycles 10 -out:file:silent_struct_type binary -ex1 1 -optimization::default_max_cycles 200 -relax::dualspace 1 -in:file:boinc_wu_zip NTF2chip_8437_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2141443
rosetta_4.07_x86_64-pc-linux-gnu: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
SIGABRT: abort called


Kind regards,
Marcus
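The exit line in that log reports one status value in three notations: decimal 193, hex 0xc1, and -63 when the low byte is read as a signed 8-bit integer (193 - 256). The number alone does not identify the cause; the informative part is the failed glibc assertion in `loadlocale.c` and the SIGABRT that follow it. The sketch below only checks that arithmetic; the regex assumes the exact wording shown in this log.

```python
import re

# Assumes the wording "process exited with code N (0xHH, S)" seen in this log.
EXIT_LINE = re.compile(r"process exited with code (\d+) \((0x[0-9a-f]+), (-?\d+)\)")


def decode_exit_code(code: int):
    """Return the low byte of an exit status as hex and as a signed 8-bit value."""
    low = code & 0xFF
    signed = low - 256 if low >= 128 else low
    return hex(low), signed


def check_exit_line(text: str) -> bool:
    """Confirm the three printed numbers are the same value in three notations."""
    match = EXIT_LINE.search(text)
    if match is None:
        raise ValueError("no exit-code line found")
    code = int(match.group(1))
    return decode_exit_code(code) == (match.group(2), int(match.group(3)))


if __name__ == "__main__":
    print(decode_exit_code(193))  # ('0xc1', -63)
    print(check_exit_line("process exited with code 193 (0xc1, -63)</message>"))  # True
```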
mmonnin
Joined: 2 Jun 16
Posts: 41
Credit: 5,882,502
RAC: 857
Message 88856 - Posted: 11 May 2018, 12:38:31 UTC

Most of mine do as well. Same OS. There are a few that do complete but over half have this error.
Jim1348
Joined: 19 Jan 06
Posts: 300
Credit: 9,196,254
RAC: 12,060
Message 88857 - Posted: 11 May 2018, 15:16:19 UTC

The first Rosetta 4.07 that I ran on a new Ubuntu 18.04 machine (i7-8700) errored out immediately also.
https://boinc.bakerlab.org/result.php?resultid=997076322

The 3.78 are running fine.
Chris Jenks
Joined: 16 Jun 06
Posts: 1
Credit: 117,759
RAC: 110
Message 88911 - Posted: 16 May 2018, 0:20:00 UTC
Last modified: 16 May 2018, 0:51:23 UTC

I have recently started running Rosetta on two cell phones using the Android version of BOINC. I thought things were working well, but I began noticing an excessive number of newly started tasks and started keeping track. What I am finding is that the "Elapsed time" and percent complete of a task randomly decrease, so tasks that are halfway through take much longer to finish than I would expect. For example, task ab_12_01__vall_2011_1pgxA_vall_2011_9mers_3mers_535141_18556_0 currently shows an elapsed time of 4:25:52 and 80.8% complete, but an hour ago the same task showed an elapsed time of 4:29 and 80.8% complete; it went backwards despite an hour of work. Is there anything I can do besides finding another project?

Edit: I just noticed in the event log that computation was suspended because the phone was on battery. My phone charges using Qi wireless charging, which makes it annoyingly cycle between 100% and 99%. The suspensions this cycling causes may be causing data loss. I should also mention that I greedily ran all four cores; maybe using fewer will fix the errors.
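One plausible explanation for elapsed time going backwards: if the app is removed from memory each time BOINC suspends it, it restarts from its last checkpoint, and the client shows the elapsed time recorded at that checkpoint, so any work since then is lost. The sketch models that under simplifying assumptions: checkpoints land exactly every `checkpoint_interval_s` seconds (the command line in Marcus's log above uses `-checkpoint_interval 120`, but whether these phone tasks did is an assumption), and every suspension evicts the task.

```python
def resume_point(elapsed_s: int, checkpoint_interval_s: int) -> int:
    """Elapsed time of the last checkpoint at or before elapsed_s, assuming
    (hypothetically) checkpoints land exactly every checkpoint_interval_s."""
    return (elapsed_s // checkpoint_interval_s) * checkpoint_interval_s


def run_with_evictions(segments_s, checkpoint_interval_s):
    """Each segment runs from the last checkpoint, then the task is evicted
    (suspended without being kept in memory) and restarts from that checkpoint.
    Returns the elapsed time shown after each restart."""
    elapsed = 0
    shown = []
    for seg in segments_s:
        elapsed = resume_point(elapsed + seg, checkpoint_interval_s)
        shown.append(elapsed)
    return shown


if __name__ == "__main__":
    # Illustrative numbers only: charging blips every 90 s vs 120 s checkpoints.
    print(run_with_evictions([90] * 5, 120))   # [0, 0, 0, 0, 0]
    print(run_with_evictions([300] * 3, 120))  # [240, 480, 720]
```

When suspensions come more often than checkpoints, the task makes no net progress at all, which would match an hour of work disappearing.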
Peter DM
Joined: 27 Mar 18
Posts: 5
Credit: 0
RAC: 0
Message 88979 - Posted: 23 May 2018, 10:40:59 UTC

No tasks sent today, 23 May?
Jim1348
Joined: 19 Jan 06
Posts: 300
Credit: 9,196,254
RAC: 12,060
Message 88980 - Posted: 23 May 2018, 11:25:05 UTC - in response to Message 88979.  

No tasks sent today, 23 May?

I have been getting both 3.78 and 4.07 all morning (Windows version).
Peter DM
Joined: 27 Mar 18
Posts: 5
Credit: 0
RAC: 0
Message 89001 - Posted: 27 May 2018, 12:35:18 UTC

Android. Around midday today, 27 May, again no work units. I checked the log and it states 0 received.
The log also says it is requesting work for the Mali-880, the GPU in my phone. I wonder if this is why I'm getting no work?
I have changed nothing and assume it was always asking for GPU work units.
Any ideas?
rjs5
Joined: 22 Nov 10
Posts: 250
Credit: 8,037,564
RAC: 0
Message 89004 - Posted: 27 May 2018, 17:39:17 UTC - in response to Message 89001.  

Android. Around midday today, 27 May, again no work units. I checked the log and it states 0 received.
The log also says it is requesting work for the Mali-880, the GPU in my phone. I wonder if this is why I'm getting no work?
I have changed nothing and assume it was always asking for GPU work units.
Any ideas?


Your profile shows that you have no computers assigned to your account.
I wouldn't worry about receiving WU until the computer shows up in your profile.

Have you ever received any WU?
Peter DM
Joined: 27 Mar 18
Posts: 5
Credit: 0
RAC: 0
Message 89008 - Posted: 27 May 2018, 21:36:35 UTC - in response to Message 89004.  

Yes. I have received and crunched heaps of WU. All my BOINC score is from Rosetta. I assume BAM does not know how to associate an Android host.
Today I received 2 WU which will not last long.
rjs5
Joined: 22 Nov 10
Posts: 250
Credit: 8,037,564
RAC: 0
Message 89009 - Posted: 28 May 2018, 1:48:19 UTC - in response to Message 89008.  

Yes. I have received and crunched heaps of WU. All my BOINC score is from Rosetta. I assume BAM does not know how to associate an Android host.
Today I received 2 WU which will not last long.



Your post's AUTHOR information shows zero credit. Maybe you have two accounts. If not, there seem to be some problems.

Joined: 27 Mar 18
Posts: 4
Credit: 0
RAC: 0


Your USER PROFILE also shows zero.

Peter DM
User ID 1991524
Rosetta@home member since 27 Mar 2018
Country Australia
Total credit 0
Recent average credit 0.00
Computers View

Team None
Message boards 4 posts
Peter DM
Joined: 27 Mar 18
Posts: 5
Credit: 0
RAC: 0
Message 89010 - Posted: 28 May 2018, 3:33:00 UTC

Does anyone from UW actually monitor this thread?
Server status shows no unsent Android tasks, and the number of tasks in progress is continually falling.
There are no Android tasks being sent.
James W
Joined: 25 Nov 12
Posts: 59
Credit: 547,523
RAC: 211
Message 89013 - Posted: 28 May 2018, 8:18:35 UTC - in response to Message 89010.  

I agree with RJS5. If your account was only started on the 27th, how could you have crunched any WUs at all? You MUST have a duplicate or separate account as well. Do you have other computers/devices also crunching Rosetta?
robertmiles
Joined: 16 Jun 08
Posts: 689
Credit: 9,351,286
RAC: 4,534
Message 89053 - Posted: 4 Jun 2018, 3:41:11 UTC - in response to Message 89013.  

I agree with RJS5. If your account was only started on the 27th, how could you have crunched any WUs at all? You MUST have a duplicate or separate account as well. Do you have other computers/devices also crunching Rosetta?

Did you notice the 27th of what month?

©2019 University of Washington
http://www.bakerlab.org