Rosetta@home

Minirosetta 3.52

David E K
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 76770 - Posted 28 May 2014 19:30:11 UTC

This version includes a couple bug fixes for symmetric modeling. Please report issues/bugs here.

Killersocke@rosetta

Joined: Nov 13 06
Posts: 20
ID: 129065
Credit: 664,666
RAC: 225
Message 76816 - Posted 10 Jun 2014 1:03:11 UTC
Last modified: 10 Jun 2014 1:15:57 UTC

For two days I've been getting the same broken WUs.

They all have the same exit status:

Exit status -1073741819 (0xffffffffc0000005)

Client error Compute error
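For anyone decoding these reports: the decimal exit status and the hex value are the same number. Read as a signed 32-bit integer, 0xC0000005 is -1073741819, and 0xffffffffc0000005 is that value sign-extended to 64 bits. 0xC0000005 is the Windows access-violation status, the same code as in the "Access Violation (0xc0000005)" reports later in this thread. Here is a minimal Python sketch of the conversion. The helper names are illustrative only.

```python
def as_uint32(status: int) -> int:
    """Reinterpret a signed exit status as an unsigned 32-bit value."""
    return status & 0xFFFFFFFF


def as_uint64(status: int) -> int:
    """Same value sign-extended to 64 bits, as in 0xffffffffc0000005."""
    return status & 0xFFFFFFFFFFFFFFFF


def as_int32(code: int) -> int:
    """Go back from a 32-bit hex code to the signed decimal exit status."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code


status = -1073741819
print(hex(as_uint32(status)))  # 0xc0000005
print(hex(as_uint64(status)))  # 0xffffffffc0000005
print(as_int32(0xC0000005))    # -1073741819
```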

Task ID 666681305

Task ID 666713879

Task ID 666800013

Task ID 666800018

regards

Killersocke@rosetta

Joined: Nov 13 06
Posts: 20
ID: 129065
Credit: 664,666
RAC: 225
Message 76818 - Posted 10 Jun 2014 16:42:40 UTC

...
and more today...

I'm not happy about this

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,273,242
RAC: 5,666
Message 76819 - Posted 10 Jun 2014 17:14:17 UTC

I'm seeing failures on Linux with tasks named T0763*.

See 666899120

ERROR: HybridizeFoldtreeDynamic: failed to build fold tree from cuts and jumps
ERROR:: Exit from: src/protocols/hybridization/HybridizeFoldtreeDynamic.cc line: 676
SIGSEGV: segmentation violation
Stack trace (20 frames):
[0xbb009c7]

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 76821 - Posted 11 Jun 2014 2:28:58 UTC

This task erred just now.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=604963960

T0763_jump_13_70_71_90_hybrid_withlookback_169118_137_0

Watchdog active.
SIGSEGV: segmentation violation
Stack trace (20 frames):
[0xbb009c7]
[0xb7739400]
[0xb3d01ce]
[0xb449439]
[0xa06a109]
[0xa0698a3]
[0xa048b2e]
[0x9823d96]
[0x982475a]
[0x9826b8e]
[0x87aee21]
[0x87e57d9]
[0x9da5fe2]
[0x9da837e]
[0x9f99f2d]
[0xa017d65]
[0xa015595]
[0x8055314]
[0xbb91f88]
[0x8048131]

Exiting...

</stderr_txt>

____________


P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 76822 - Posted 11 Jun 2014 6:03:59 UTC

Had more T0763 fail with this error as well.

ERROR: HybridizeFoldtreeDynamic: failed to build fold tree from cuts and jumps
ERROR:: Exit from: src/protocols/hybridization/HybridizeFoldtreeDynamic.cc line: 676
SIGSEGV: segmentation violation
Stack trace (20 frames):

____________


achalupka

Joined: Feb 25 09
Posts: 1
ID: 303234
Credit: 727,370
RAC: 0
Message 76825 - Posted 11 Jun 2014 13:47:19 UTC

I have approximately 20 completed work units stuck in the uploading state. I'm not sure if this is an issue specific to Minirosetta 3.52 or a general BOINC issue, but none of my other projects are showing it. It's been like this for several days.

[VENETO] boboviz

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 76844 - Posted 16 Jun 2014 7:13:57 UTC

Crash after a few seconds: 668119976

ERROR: Warning: can't open file hybrid_jumping\t000_.fasta!
ERROR:: Exit from: ..\..\..\src\core\sequence\util.cc line: 148
# cpu_run_time_pref: 7200

ERROR: Warning: can't open file hybrid_jumping\t000_.fasta!
ERROR:: Exit from: ..\..\..\src\core\sequence\util.cc line: 148

ERROR: Warning: can't open file hybrid_jumping\t000_.fasta!
ERROR:: Exit from: ..\..\..\src\core\sequence\util.cc line: 148
____________

Dennis

Joined: Sep 11 08
Posts: 1
ID: 278209
Credit: 404,173
RAC: 1
Message 76845 - Posted 16 Jun 2014 11:56:30 UTC

I have 7 jobs with the same error as [VENETO] boboviz

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 76847 - Posted 17 Jun 2014 6:01:06 UTC

Hi.

I've had a few of these that got validate errors after either a couple of hours or a few minutes.

Tc761_jump_hybrid_1.0_169577_3670_0


ERROR: Warning: can't open file hybrid_jumping/t000_.fasta!
ERROR:: Exit from: src/core/sequence/util.cc line: 148

ERROR: Warning: can't open file hybrid_jumping/t000_.fasta!
ERROR:: Exit from: src/core/sequence/util.cc line: 148
======================================================
DONE :: 99 starting structures 1201 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

____________


sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 76857 - Posted 21 Jun 2014 15:05:45 UTC - in response to Message ID 76770.
Last modified: 21 Jun 2014 15:07:07 UTC

I'm trying out the commands for viewing structure predictions given on
http://boinc.bakerlab.org/rosetta/rah_view_predictions.php

However, stderr.txt shows:
[2014- 6-21 4:23:47:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
semget failure! Cannot create semaphore!
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: /var/lib/boinc/projects/boinc.bakerlab.org_rosetta/minirosetta_3.52_x86_64-pc-linux-gnu -extract -all -s chk_S_00000010_ClassicAbinitio__stage4_kk_1.out
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
ERROR: Option matching -extract not found in command line top-level context

Because of this error, no PDB files are created.

I'm using this approach because Linux hosts currently don't support a screensaver or GUI.

[VENETO] boboviz

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 76876 - Posted 25 Jun 2014 9:19:12 UTC

After 108 minutes, 669716479 returns this error:

Starting work on structure: _00001
# cpu_run_time_pref: 7200
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004
Starting work on structure: _00005
Starting work on structure: _00006
Starting work on structure: _00007
WARNING! attempt to create gzipped file ../../projects/boinc.bakerlab.org_rosetta/benchmark_0014_alex_metric_58c8eb0a169e5c49e1cef5331416f2432f6d8260_ploops_32_input_0002_no_lig_fragments_contact_opt_iteration_1_dbe4269ef4fb4de99e2f824cdcbaf1c6_fold_SAVE_ALL_OUT_170252_3113_0_0 failed.
======================================================
DONE :: 1 starting structures 6499.73 cpu seconds
This process generated 7 decoys from 7 attempts
======================================================
BOINC :: WS_max 5.23985e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>benchmark_0014_alex_metric_58c8eb0a169e5c49e1cef5331416f2432f6d8260_ploops_32_input_0002_no_lig_fragments_contact_opt_iteration_1_dbe4269ef4fb4de99e2f824cdcbaf1c6_fold_SAVE_ALL_OUT_170252_3113_0_0</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>


____________

krypton
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 16 11
Posts: 106
ID: 436004
Credit: 1,877,239
RAC: 1,982
Message 76892 - Posted 27 Jun 2014 18:24:58 UTC - in response to Message ID 76857.

Hi sgaboinc,

See my post here:
http://boinc.bakerlab.org/rosetta/forum_thread.php?id=6358

Tell me if it doesn't work! If it does, I'll try to get the R@H people to update the rah_view_predictions.php page with these instructions.

BetelgeuseFive

Joined: Aug 10 10
Posts: 1
ID: 389214
Credit: 497,755
RAC: 501
Message 76912 - Posted 29 Jun 2014 14:22:48 UTC


I have one task with the same problem as reported by Killersocke:

Exit status -1073741819 (0xffffffffc0000005)

http://boinc.bakerlab.org/rosetta/result.php?resultid=670799921

Most of the tasks complete without problems, however.

Tom

sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 76914 - Posted 29 Jun 2014 15:35:58 UTC - in response to Message ID 76892.

Hi sgaboinc,

See my post here:
http://boinc.bakerlab.org/rosetta/forum_thread.php?id=6358

Tell me if it doesn't work! If it does, I'll try to get the R@H people to update the rah_view_predictions.php page with these instructions.

Hi krypton,

Thanks very much for those instructions. I'm following up in the original thread, as this seems somewhat off-topic here.

Original thread:
http://boinc.bakerlab.org/rosetta/forum_thread.php?id=6358

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 76926 - Posted 30 Jun 2014 3:52:45 UTC
Last modified: 30 Jun 2014 3:53:21 UTC

Hi.

Had these two fail after about 2 hrs 30 min.


http://boinc.bakerlab.org/rosetta/workunit.php?wuid=608458156

pd1_graftsheet_41limit_1L3E2L3E2L3E3L13H3L3E1L_1-2.P.0_SAVE_ALL_OUT__171170_1_1

--------------------------------------------------------------------------------

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=608509726

pd1_graftsheet_41limit_1L4E3L4E3L4E2L15H1L_1-2.P.0_SAVE_ALL_OUT__171149_1_0

# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 8514.78 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
BOINC :: WS_max 6.80565e+38

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc

</stderr_txt>

--------------------------------------------------------------------

# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 8480.43 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
BOINC :: WS_max 6.80565e+38

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc

</stderr_txt>
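`St9bad_alloc` is the Itanium C++ ABI (GCC/Clang) mangled type name for `std::bad_alloc`, the exception `operator new` throws when an allocation fails. `St` abbreviates `std::` and `9` is the length of `bad_alloc`. On Linux, `c++filt -t St9bad_alloc` should print the readable name. The sketch below decodes only this narrow `St<length><name>` pattern and is not a general demangler. In both logs above, the exception is reported after the `called boinc_finish` line, after the decoys were already listed as done.

```python
def demangle_std_name(mangled: str) -> str:
    """Decode an optional 'St' (std::) prefix plus one <length><identifier>.

    Only covers simple names such as St9bad_alloc; real mangled names can
    nest, so use c++filt for anything else.
    """
    prefix, rest = "", mangled
    if rest.startswith("St"):
        prefix, rest = "std::", rest[2:]
    i = 0
    while i < len(rest) and rest[i].isdigit():
        i += 1
    if i == 0:
        raise ValueError(f"no length prefix in {mangled!r}")
    length = int(rest[:i])
    name = rest[i:]
    if len(name) != length:
        raise ValueError(f"length {length} does not match {name!r}")
    return prefix + name


print(demangle_std_name("St9bad_alloc"))  # std::bad_alloc
```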
____________


sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 76927 - Posted 30 Jun 2014 4:53:35 UTC - in response to Message ID 76926.
Last modified: 30 Jun 2014 4:55:27 UTC


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc


I've seen the same message; however, it seemed to appear when the job had completed and I clicked Update. I'm not sure whether it's caused by clicking Update (i.e. some scheduler error?).
http://boinc.bakerlab.org/rosetta/result.php?resultid=670699728

Apparently the tasks I left alone to report automatically did not show the error, nor did those where I clicked Update a while after the job completed.

Let us know if the BOINC client may be involved here.

sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 76930 - Posted 30 Jun 2014 16:35:06 UTC
Last modified: 30 Jun 2014 16:46:03 UTC

Another case of std::bad_alloc failures:

task: pd1_graftsheet_41limit_1L5E2L12H4L8H2L5E1L_1-2.P.0_SAVE_ALL_OUT__171083_4_0
http://boinc.bakerlab.org/rosetta/result.php?resultid=671116399

======================================================
DONE :: 21 starting structures 10605.5 cpu seconds
This process generated 21 decoys from 21 attempts
======================================================
BOINC :: WS_max 1.12221e-190

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc

</stderr_txt>
The aborts occur at the end of the job and seem to be related to particular tasks:
pd1_graftsheet_41limit_1L5E2L12H4L8H2L5E1L*

I'm not sure if that has anything to do with it.

TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 76934 - Posted 30 Jun 2014 21:37:36 UTC

Lots of errors in the last few days. My wingmen get errors too; only very occasionally does one of them get validated.
Part of the error message:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x009B7DEB write attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...

Hope this will be solved soon.
____________
Greetings,
TJ.

[VENETO] boboviz

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 76947 - Posted 2 Jul 2014 13:35:20 UTC
Last modified: 2 Jul 2014 13:36:13 UTC

Same here, like TJ 671434192

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x016F7DEB write attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...
____________

Killersocke@rosetta

Joined: Nov 13 06
Posts: 20
ID: 129065
Credit: 664,666
RAC: 225
Message 76948 - Posted 2 Jul 2014 13:55:21 UTC

Client error/Compute error

Task ID 671091234
Workunit 608788253
Created 30 Jun 2014

Task ID 671559600
Workunit 608647393
Created 2 Jul 2014

Task ID 671560728
Workunit 609197398
Created 2 Jul 2014

Task ID 671576410
Workunit 609210460
Created 2 Jul 2014

Task ID 671592963
Workunit 609224067
Created 2 Jul 2014

As I get no response on this board, to whom do I need to escalate these issues?

I will stop working until this problem is solved.

regards

Trotador

Joined: May 30 09
Posts: 61
ID: 318648
Credit: 40,027,802
RAC: 5,226
Message 76963 - Posted 6 Jul 2014 10:10:54 UTC
Last modified: 6 Jul 2014 10:11:43 UTC

In my case, all units failing with client error/compute error have names starting with:

pd1_graftsheet_41limit_.....

but only some of them fail; the majority complete OK. No WU with a different name is causing issues, at least not as repeatedly as this one.

I don't think it could be related to the length of the WU names, because there are other units with longer names that complete OK.

All my PCs run Ubuntu, and the error message is always the same:

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc

In nearly all cases the wingmen fail as well; I've seen only one case in which a wingman succeeded.

In my case the failure is especially annoying because it occurs after the unit has been processing for a long time, while the wingmen seem to fail right at the beginning.

It is more difficult to trace errors because the database does not allow viewing long names, sorting by status, etc.

TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 76967 - Posted 7 Jul 2014 14:29:11 UTC - in response to Message ID 76963.

It is more difficult to trace errors because the database does not allow viewing long names, sorting by status, etc.


Yes, and the server software is very obsolete, but no one on this project bothers to update it, as most other projects did several years ago.
____________
Greetings,
TJ.

[VENETO] boboviz

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 76968 - Posted 7 Jul 2014 15:07:09 UTC

Yes, and the server software is very obsolete, but no one on this project bothers to update it


I'm ignorant about that, but... is it so difficult to update the BOINC server code? Do they have to recompile all the software?
I found this; it seems to be a "standard" procedure:
http://boinc.berkeley.edu/trac/wiki/ToolUpgrade
____________

Link

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 338,704
RAC: 23
Message 76973 - Posted 8 Jul 2014 9:01:17 UTC - in response to Message ID 76968.

I'm ignorant about that, but... is it so difficult to update the BOINC server code? Do they have to recompile all the software?
I found this; it seems to be a "standard" procedure:
http://boinc.berkeley.edu/trac/wiki/ToolUpgrade

It was reported in the past that Rosetta has quite a lot of modifications to the server software (this is not the first time someone has asked for a new server version). Getting all of that working with server software a few generations newer might not be as easy a task as using Windows Update. Hence Rosetta will put off updating the servers as long as possible; apparently even the issues with v7 clients were not reason enough.
____________
.

[VENETO] boboviz

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 76975 - Posted 8 Jul 2014 9:44:25 UTC

It was reported in the past that Rosetta has quite a lot of modifications to the server software (this is not the first time someone has asked for a new server version). Getting all of that working with server software a few generations newer might not be as easy a task as using Windows Update.


Thanks for the answer!
____________

Mark Darrall

Joined: Nov 9 08
Posts: 1
ID: 287298
Credit: 10,956
RAC: 51
Message 76995 - Posted 12 Jul 2014 11:43:47 UTC

The project seems to be running okay so far, but the Minirosetta 3.52 graphics won't run; it just returns "Not Responding." It doesn't seem to be affecting anything else.

This is on an older Core2 x86 running 32-bit Vista with integrated Intel graphics. This system was recently rebuilt, so all drivers are up to date.

Older versions of Rosetta have run fine in the past...

Thank you!

Killersocke@rosetta

Joined: Nov 13 06
Posts: 20
ID: 129065
Credit: 664,666
RAC: 225
Message 76997 - Posted 12 Jul 2014 18:13:37 UTC

OK guys,
this project is now dead for me.

Me

Good bye

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,273,242
RAC: 5,666
Message 77000 - Posted 12 Jul 2014 20:40:22 UTC

I'm also seeing the occasional "bad_alloc" failure on Ubuntu 14.04.

Sample: Task 673942668

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc

</stderr_txt>

TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 77004 - Posted 12 Jul 2014 22:19:45 UTC

Errors keep flowing in, but no answer from the project guys as usual.
I have set it to very low priority.
____________
Greetings,
TJ.

[VENETO] boboviz

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 77009 - Posted 14 Jul 2014 9:32:21 UTC

Some "benchmark_alex_metric" errors (after 2 h), like this one:
673949101

upload failure: <file_xfer_error>
<file_name>benchmark_0023_alex_metric_332d631779a7456b4ee7d35a7bbca2b0522dcada_ploops_42_input_0002_no_lig_fragments_contact_opt_iteration_6_a2d64bd69c1740e4b71103f0f2d70098_fold_SAVE_ALL_OUT_173723_311_1_0</file_name>
<error_code>-161 (not found)</error_code>

____________

[VENETO] boboviz

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 77011 - Posted 14 Jul 2014 12:41:40 UTC

674405695

<message>
(unknown error) - exit code -529697949 (0xe06d7363)
</message>
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x75961D4D
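This exit code decodes the same way as the 0xc0000005 status earlier in the thread: -529697949 is the signed 32-bit reading of 0xE06D7363. That value is the code Microsoft's C++ runtime uses for exceptions raised with `throw`, and its low three bytes spell `msc` in ASCII. This fits the debugger's "(C++ Exception)" label. The log alone does not name the exception type. An out-of-memory `std::bad_alloc` would be consistent with the Linux reports above, but that is an inference, not something this log shows. A small Python check follows. The function name is illustrative.

```python
def decode_msvc_exit(exit_code: int):
    """Return the unsigned 32-bit code and its low three bytes as ASCII.

    Only meaningful for codes whose low bytes are printable ASCII, such as
    0xE06D7363; other codes may raise UnicodeDecodeError.
    """
    code = exit_code & 0xFFFFFFFF
    tag = code.to_bytes(4, "big")[1:].decode("ascii")
    return code, tag


code, tag = decode_msvc_exit(-529697949)
print(hex(code), tag)  # 0xe06d7363 msc
```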


____________

Monty

Joined: Oct 14 07
Posts: 4
ID: 212497
Credit: 146,497
RAC: 0
Message 77012 - Posted 14 Jul 2014 18:02:04 UTC

Same here, like boboviz above me:

Task ID: 674414117



[VENETO] boboviz

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 77016 - Posted 16 Jul 2014 8:21:58 UTC

Some "benchmark_alex_metric" errors (after 2 h)


Erratum: A LOT of "benchmark_alex_metric" errors after 2h
____________

TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 77019 - Posted 16 Jul 2014 9:49:46 UTC - in response to Message ID 77016.

Some "benchmark_alex_metric" errors (after 2 h)


Erratum: A LOT of "benchmark_alex_metric" errors after 2h

We can keep posting about the errors, but nothing changes, as ever here. We have to wait until all these crappy WUs are finished, or at least until they've run on our systems, using energy without contributing to science. It's a pity.
____________
Greetings,
TJ.

sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 77035 - Posted 19 Jul 2014 13:13:48 UTC
Last modified: 19 Jul 2014 13:25:24 UTC

Perhaps the Rosetta designers need to enhance the app, or even the website, so that the 'errors' are more meaningful, lol :)

At least we've had a scientist come out to clarify one of the 'hot' errors:
http://boinc.bakerlab.org/forum_thread.php?id=6485&nowrap=true#77013

It turns out that some errors can simply be because the job did not find any structures, deemed 'dead end elimination' (i.e. either impossible combinations or perhaps algorithmic chaos).

In addition, I'd think that Rosetta@home, and even BOINC, may need to look into granting credit for network bandwidth consumed (it makes sense, after all). In these cases the errors are due to finding 'dead ends', and if that's true, it is different from having no result: it could mean that the particular combination leads to a 'dead end', hence no structures.

While I'm no scientist in molecular simulations, I'm aware that some algorithms/solution searches can lead to systematic 'chaos' when things are sufficiently complex and non-linear.

http://en.wikipedia.org/wiki/Chaos_theory


Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77332 - Posted 14 Aug 2014 2:43:35 UTC

Why is it that some tasks have had their deadline extended to 14 days while many remain at 10 days? Is it by accident or related to the extra users that have been added recently?
____________

David E K
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77333 - Posted 14 Aug 2014 3:20:29 UTC

I extended the deadline to 14 days and also updated the default run time to 6 hours during the recent chaos. Any input on this is appreciated, good or bad. I can revert to the previous values if necessary. Thanks.

Remember, you can always set the run time preference to your liking, from as low as 1 hr to as long as 2 days. It is the "Target CPU run time" option in the Rosetta@home specific preferences. Also keep in mind that it's a target run time but if the job is a large protein, it may take longer than 1 hour to generate 1 model so the actual run time can exceed the target run time (at least 1 model is generated).
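The "at least 1 model" rule can be shown with a toy loop. This is a hypothetical simplification, not Rosetta's actual scheduling code. It assumes a new model is started only while elapsed CPU time is below the target, and that the first model always runs to completion.

```python
def simulate_run(model_minutes, target_minutes: float):
    """Toy model of a target run time (hypothetical, not Rosetta's code).

    Starts another model only while elapsed time is below the target,
    but always completes at least one model. Returns (models, elapsed).
    """
    elapsed = 0.0
    models = 0
    for minutes in model_minutes:
        if models >= 1 and elapsed >= target_minutes:
            break  # target reached; don't start another model
        elapsed += minutes
        models += 1
    return models, elapsed


# Small protein: 10-minute models against a 1-hour target.
print(simulate_run([10] * 20, 60))  # (6, 60.0)
# Large protein: one model alone takes 90 minutes, so the target is exceeded.
print(simulate_run([90] * 5, 60))   # (1, 90.0)
```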

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 77334 - Posted 14 Aug 2014 14:43:52 UTC

Up to 2 days? So that's new too. Used to be that 1 day was the top.

2 days will be great for full-time crunchers. Minimize network bandwidth, great if your bandwidth is capped. Minimize hits on the project servers.

Just beware that if you make changes, the work you've already downloaded will switch to the new target runtime. So it's best to make changes when your cache of outstanding work is low, and not to bump up the days-between-connections setting at the same time. You want BOINC Manager to see a few new WUs complete before changing the network settings. Again, this is just to help prevent it from getting way too much work. It's not the end of the world; you can always abort a few, or adjust your runtime preference back down a couple of notches.
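Here are rough numbers for that warning. Already-downloaded tasks pick up the new target, so raising the target multiplies the hours sitting in the cache. The following is a back-of-the-envelope Python sketch. The task count, crunching hours per day, and single-core assumption are all hypothetical. The 14-day figure is the deadline mentioned above.

```python
def days_to_clear(n_tasks: int, target_hours: float, crunch_hours_per_day: float) -> float:
    """Days needed to finish a cache of tasks on one core (hypothetical numbers)."""
    return n_tasks * target_hours / crunch_hours_per_day


DEADLINE_DAYS = 14
# Hypothetical: 20 tasks fetched while the target was 3 h; host crunches 8 h per day.
before = days_to_clear(20, 3, 8)
after = days_to_clear(20, 6, 8)  # same cache after raising the target to 6 h
print(before, before <= DEADLINE_DAYS)  # 7.5 True
print(after, after <= DEADLINE_DAYS)    # 15.0 False
```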

Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start.

DK, please post a notice on the project news and perhaps the twitter feed as well.
____________
Rosetta Moderator: Mod.Sense

Defender

Joined: Mar 22 08
Posts: 1
ID: 248666
Credit: 3,867,607
RAC: 3,808
Message 77336 - Posted 14 Aug 2014 18:34:43 UTC

Because you are talking about twitter: Why is there no activity on the Facebook page since 2010? https://www.facebook.com/pages/Rosettahome/161671540539170

David E K
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77337 - Posted 14 Aug 2014 19:55:45 UTC

Lab members are just not posting to the R@h FB page. I'll add something to our technical news.

David E K
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77338 - Posted 14 Aug 2014 20:12:05 UTC - in response to Message ID 77334.



Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start.

No, I didn't, but I will. Thanks for the suggestion.

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77339 - Posted 15 Aug 2014 2:24:59 UTC - in response to Message ID 77333.

I extended the deadline to 14 days and also updated the default run time to 6 hours during the recent chaos. Any input on this is appreciated, good or bad. I can revert to the previous values if necessary. Thanks.

6 hours? I didn't realise.

On my two default (24 hour) machines I'm already set to 8hrs, so that's no problem for me. I've got access to one of my less-regular machines on another username, which was set to 4hrs, and notice it's still at 4hrs. I've changed it to the 6hr default to fall in line and will see how it goes over the next week. I'm not expecting an issue.

I have one less regular machine which is only on for a day or two a week for a few hours and set at default, which I'm inclined to knock back down to 3hrs, even though the 14 day deadline will help.

I understand why it's been done with the vast increase in active users and dare say the vast majority are set at default and will be none the wiser. The sign of this being a bad move will be if there's an increase in people missing even the extended deadline as Boinc defaults aren't that productive imo.
____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77340 - Posted 15 Aug 2014 2:50:11 UTC - in response to Message ID 77333.

I extended the deadline to 14 days and also updated the default run time to 6 hours during the recent chaos. Any input on this is appreciated, good or bad. I can revert to the previous values if necessary. Thanks.

On a similar subject, when the next CASP tasks are loaded, is it possible to give them appropriate deadlines but keep non-CASP tasks at 14 days? IIRC, CASP tasks had to get back to you within 48 hours (including runtime) to meet your deadlines. I mentioned this before, but it was too late to do anything about it.

____________

sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 77341 - Posted 15 Aug 2014 12:57:49 UTC
Last modified: 15 Aug 2014 13:31:18 UTC

Thanks for providing the user-selectable "Target CPU run time".
As it turns out, when it's left at the default I'm seeing some jobs with estimated run times of up to 10 hours and various others of 5 hours, so I've gone ahead and set some values somewhat lower than 6 hours, as that is more appropriate for me.
However, I'd think the user-selectable "Target CPU run time" would really be a useful thing to help the various participants.

I set a lower run time simply because I only run BOINC/Rosetta during 'idle' hours, and the PC is switched off when no one is at home.

I'm also of the same opinion as Sid Celery that extended run times don't help much, given that it takes longer to return results and some of the jobs may simply expire before they can be completed by the deadline.

I normally play the 'good citizen': I pull just enough jobs, complete them, and submit the results. This gives a much better turnaround time, and the jobs often complete without errors; even if one errors out, it is reported back as soon as the status is known. I think this is much better for returning results promptly, possibly for the scientists who are waiting for incremental results, than pulling a lot of jobs that I may not crunch after all and may later simply have to cancel.

That may be different for other participants who leave their PCs crunching round the clock and/or use a 'slower' CPU. Hence, the user-selectable "Target CPU run time" is a useful thing.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 77342 - Posted 15 Aug 2014 13:28:35 UTC - in response to Message ID 77338.



Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start.

No, I didn't, but I will. Thanks for the suggestion.


Of course, doubling it is only correct for the portion of folks that are running at the default runtime. I was thinking when I had posted the suggestion that I'd had a reason for not suggesting this previously and couldn't recall what it was.

Anyway, I suspect more than half of the profiles are using the defaults. They are also probably the ones who pay the least attention to message boards and user preferences, so in the spirit of "set it and forget it", that is the portion it is most important to match.
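This shows why doubling the estimate only suits default users. Suppose the FLOPS estimate is sized for the 6-hour default. A host running a different target will then see actual times differ from the estimate by roughly the ratio of the two targets, and its duration correction factor has to drift toward that ratio. The following is a simplified sketch. It assumes the host's speed is otherwise measured accurately and ignores every other source of estimation error.

```python
DEFAULT_TARGET_HOURS = 6.0  # run time the estimate is sized for (assumption)


def expected_dcf(user_target_hours: float, basis_hours: float = DEFAULT_TARGET_HOURS) -> float:
    """Ratio of actual to estimated run time, ignoring every other error source."""
    return user_target_hours / basis_hours


for hours in (3, 6, 8, 24):
    print(hours, round(expected_dcf(hours), 2))
# 3 0.5
# 6 1.0
# 8 1.33
# 24 4.0
```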
____________
Rosetta Moderator: Mod.Sense

sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 77343 - Posted 15 Aug 2014 16:09:37 UTC - in response to Message ID 77341.
Last modified: 15 Aug 2014 16:15:36 UTC

thanks for providing the user selectable "Target CPU run time".
as it turns out when left in the default i'm seeing some jobs running to the extent of 10 hours (in estimated run) and various others 5 hours (in estimated run)


turns out that the boinc manager (gui) estimates for time left may be somewhat off, most of those '5 hours' tasks seemed to be completed in the original 3+ hours, '10 hours' tasks seemed to be completing in about 6 hours which is the default run time if nothing is selected.
the estimates going off the mark may be due to my pc running slower possibly for various reasons including multitasking with other non boinc tasks

The estimated computation size for the 6- and 3-hour jobs seemed OK, though: 80,000 GFLOPs and 40,000 GFLOPs respectively. Not sure if this info is useful.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77344 - Posted 15 Aug 2014 17:52:02 UTC
Last modified: 15 Aug 2014 17:52:16 UTC

Most of the 40+ thousand active hosts are definitely 'set it and forget it' users.

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77346 - Posted 16 Aug 2014 2:07:34 UTC - in response to Message ID 77342.

Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start.

No I didn't but I will. Thanks for the suggestion.

Of course, doubling it is only correct for the portion of folks who are running at the default runtime. When I posted the suggestion, I was thinking that I'd had a reason for not suggesting this previously, but I couldn't recall what it was.

I spotted this earlier and guessed it might be related.

On the flip side, as one of those who have already tweaked their runtime, I also raised the default work buffer from 0.25 days to 2 days, so by the time those tasks are worked through, it should resolve itself.

In the meantime, the rush of demand for tasks from the new users ought to have settled down too. It's all good. Probably... <cough>

____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77347 - Posted 16 Aug 2014 2:26:12 UTC - in response to Message ID 77341.

Thanks for providing the user-selectable "Target CPU run time".
As it turns out, when left at the default, I'm seeing some jobs running as long as 10 hours (estimated) and various others 5 hours (estimated). I've gone ahead and set a run time somewhat lower than 6 hours, as that is more appropriate for me.
However, I'd think the user-selectable "Target CPU run time" would be really useful to the various participants.

The reason I set a lower run time is simply that I only run it (BOINC/Rosetta) during 'idle' hours, and the PC is switched off when no one is at home.

I'm also of the same opinion as Celery that extended run times don't help much, given that it takes longer to return results and that some of the jobs may simply expire before they can be completed by the deadline.

It's not quite that. Rosetta keeps a record of how much uptime your machine has, so as long as your processing pattern is consistent it'll make the necessary allowances.

I was referring more to the Boinc defaults of only running when other processing is below a certain %age. When I first started with Rosetta I found the WU processing was more stop than start, so if people are like you and only run for a certain part of the day and turn the machine off when unneeded, tasks can take an awful lot of time to complete. Whether that exceeds the new extended deadline of 14 days depends on the individual host. I certainly know people who only switch on for 2 or 3 hours maybe twice a week. In their case I can well imagine 14 days not being enough to complete a 6 hour task within the deadline.

For that kind of reason, I consider Boinc defaults to be very unfriendly for productive task completion - it could even be that Rosetta isn't the project for them.
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77348 - Posted 16 Aug 2014 3:05:28 UTC

There should be a way to complete the task if models have been generated, the run time is not close to the target run time, and the deadline is near. I can look into adding such code.
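The early-completion check described above could look roughly like the following Python sketch. It is hypothetical, not Rosetta's code: the field names, the `run_fraction` estimate of how much wall-clock time the host actually spends crunching, and the `near_target`/`safety_factor` thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    models_done: int            # models completed so far
    cpu_seconds: float          # CPU time used so far
    target_cpu_seconds: float   # user's "Target CPU run time"
    seconds_to_deadline: float  # wall-clock time left before the deadline
    run_fraction: float         # assumed share of wall-clock time the task runs


def should_finish_early(t: TaskState, near_target: float = 0.9,
                        safety_factor: float = 2.0) -> bool:
    """Return True if the task should stop now and report the models it has."""
    if t.models_done == 0:
        return False  # nothing to report yet
    if t.cpu_seconds >= near_target * t.target_cpu_seconds:
        return False  # close to the target: let it finish normally
    remaining_cpu = t.target_cpu_seconds - t.cpu_seconds
    # CPU time the host is likely to deliver before the deadline.
    expected_cpu = t.seconds_to_deadline * t.run_fraction
    # The deadline counts as "near" if that CPU time can't comfortably cover the rest.
    return expected_cpu < safety_factor * remaining_cpu


if __name__ == "__main__":
    hour, day = 3600.0, 86400.0
    state = TaskState(models_done=3, cpu_seconds=2 * hour,
                      target_cpu_seconds=6 * hour,
                      seconds_to_deadline=1 * day, run_fraction=0.1)
    print(should_finish_early(state))  # True
```

Here a host that runs about 10% of the time has roughly 2.4 CPU hours left before the deadline but still needs 4 hours to reach its target, so the sketch reports the 3 finished models instead of risking a timeout.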

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 77349 - Posted 16 Aug 2014 12:45:41 UTC - in response to Message ID 77348.

There should be a way to complete the task if models have been generated, the run time is not close to the target run time, and the deadline is near. I can look into adding such code.


Love it!

...but I'd have to say, overall, it might generate more TFLOPS if you instead worked on somehow making it easier for developers of the various protocols to implement more frequent checkpointing. If the casual user didn't lose an hour of progress when they power off, they would generally reach completion before the deadline.

Another idea would be to implement trickle reporting of partial results when a model is completed. This would bring many of the results back to the project much sooner, but would no doubt complicate WU validation. It would help eliminate the trade-off between an efficient, long runtime and having immediate results in the hands of the researcher.
____________
Rosetta Moderator: Mod.Sense

sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 77351 - Posted 16 Aug 2014 15:20:55 UTC - in response to Message ID 77347.
Last modified: 16 Aug 2014 15:26:31 UTC


It's not quite that. Rosetta keeps a record of how much uptime your machine has, so as long as your processing pattern is consistent it'll make the necessary allowances.

I was referring more to the Boinc defaults of only running when other processing is below a certain %age. When I first started with Rosetta I found the WU processing was more stop than start, so if people are like you and only run for a certain part of the day and turn the machine off when unneeded, tasks can take an awful lot of time to complete. Whether that exceeds the new extended deadline of 14 days depends on the individual host. I certainly know people who only switch on for 2 or 3 hours maybe twice a week. In their case I can well imagine 14 days not being enough to complete a 6 hour task within the deadline.

For that kind of reason, I consider Boinc defaults to be very unfriendly for productive task completion - it could even be that Rosetta isn't the project for them.



I also suspect that a lengthy default run time may *discourage* some users (especially new, novice users). I noticed recently that some of the work units I've completed were aborted by other users or ended with a 'no reply' status.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616151740
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616144558
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616162794

While it's uncertain whether users aborted the jobs because of the run time or simply abandoned BOINC after trying it out, I'd think 'too lengthy' a default run time could have this *discouraging* effect.

But of course, there is now the "Target CPU run time" setting that users can define, which would help alleviate that for affected users.

Perhaps it could be documented on an easily accessible page so that novice users could take note of the feature.

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77353 - Posted 17 Aug 2014 1:37:43 UTC - in response to Message ID 77349.

There should be a way to complete the task if models have been generated, the run time is not close to the target run time, and the deadline is near. I can look into adding such code.

Love it!

...but I'd have to say, overall, it might generate more TFLOPS if you instead worked on somehow making it easier for developers of the various protocols to implement more frequent checkpointing. If the casual user didn't lose an hour of progress when they power off, they would generally reach completion before the deadline.

+1 from me.

The biggest problem is the jobs that don't checkpoint at all and run until the watchdog shuts them down. With the new default, anything less than 10 hours of solid running restarts them from scratch at every reboot until the deadline passes. For those using BOINC defaults (already stated to be the vast majority of users), it would be more productive to abort them on sight. Chances are they'll hardly ever report back. That should be obvious.
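The cost of missing checkpoints for a host that only crunches "2 or 3 hours maybe twice a week" can be made concrete with a toy Python model. Assumptions, not Rosetta's behaviour: the task needs a fixed number of hours of crunching, checkpoints land at every multiple of `checkpoint_h` hours of accumulated progress, and a shutdown discards everything since the last checkpoint. Real protocols checkpoint at protocol-specific points, and the watchdog kill is ignored here.

```python
from typing import Optional


def sessions_needed(work_h: float, session_h: float,
                    checkpoint_h: Optional[float] = None,
                    max_sessions: int = 1000) -> Optional[int]:
    """Number of crunching sessions until a task finishes, or None if it never does.

    checkpoint_h=None models a task that never checkpoints: every shutdown
    throws away all progress, so it can only finish inside a single session.
    """
    saved = 0.0  # progress (hours) that survives a shutdown
    for n in range(1, max_sessions + 1):
        if saved + session_h >= work_h:
            return n  # finishes during this session
        if checkpoint_h is not None:
            # Keep progress up to the last checkpoint reached in this session.
            saved = ((saved + session_h) // checkpoint_h) * checkpoint_h
    return None


if __name__ == "__main__":
    # 2.5 h per session, twice a week: at most 4 sessions before a 14-day deadline.
    print(sessions_needed(6, 2.5))                    # None: never finishes
    print(sessions_needed(6, 2.5, checkpoint_h=1.0))  # 3: fits within 4 sessions
```

With hourly checkpoints this host finishes a 6-hour task in 3 sessions, inside the 14-day deadline; without them it never finishes, however long the deadline is.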

____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77354 - Posted 17 Aug 2014 1:52:35 UTC - in response to Message ID 77351.

I was referring more to the Boinc defaults of only running when other processing is below a certain %age. When I first started with Rosetta I found the WU processing was more stop than start, so if people are like you and only run for a certain part of the day and turn the machine off when unneeded, tasks can take an awful lot of time to complete. Whether that exceeds the new extended deadline of 14 days depends on the individual host. I certainly know people who only switch on for 2 or 3 hours maybe twice a week. In their case I can well imagine 14 days not being enough to complete a 6 hour task within the deadline.

For that kind of reason, I consider Boinc defaults to be very unfriendly for productive task completion - it could even be that Rosetta isn't the project for them.

I also suspect that a lengthy default run time may *discourage* some users (especially new, novice users). I noticed recently that some of the work units I've completed were aborted by other users or ended with a 'no reply' status.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616151740
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616144558
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616162794

While it's uncertain whether users aborted the jobs because of the run time or simply abandoned BOINC after trying it out, I'd think 'too lengthy' a default run time could have this *discouraging* effect.

But of course, there is now the "Target CPU run time" setting that users can define, which would help alleviate that for affected users.

Perhaps it could be documented on an easily accessible page so that novice users could take note of the feature.

I suspect you have too high an expectation of most users. Target CPU runtime has always existed. It's just a little more flexible now. But the people who post here, like you and me, are very much the exception. The "set & forget" option is much more the norm. A document would be nice - no objection to it - but unlikely to gain much of a readership beyond what it is now.

Aborting tasks is clearly different from tasks being timed out. One is an active choice, the other the result of no choice at all. I doubt there's much of a "discouragement" factor. More that defaults don't coincide with a normal pattern of behaviour for ordinary people.

That's why I suggested the proportion of tasks failing to meet deadline should be monitored following the changes. Personally I'd have gone to 4hrs first, but obviously the vast increase in users required a more extreme and urgent response at the time.

I trust TPTB will make the appropriate assessment, seeing as they're the ultimate beneficiaries.
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77355 - Posted 17 Aug 2014 3:06:49 UTC

You all raise great points and suggestions. Definitely more frequent checkpointing would save a bunch of computing for some protocols, particularly the homology modeling protocol. I will look into this. Forward folding has pretty good checkpointing in place and after CASP, will likely be the most common type of workunit.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 77356 - Posted 17 Aug 2014 12:00:36 UTC

So, short of frequent checkpoints on all protocols, it would be ideal to not send the WUs that do not checkpoint as well to hosts that are not active many hours per day. [arm twist]If you upgrade the BOINC server code, you could use the job size matching to avoid assigning such tasks to machines that have a higher average turnaround, or low % BOINC active.[/arm twist]
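A host filter in the spirit of this suggestion might look like the following Python sketch. It is hypothetical and is not BOINC's job size matching code, which groups jobs by size and hosts by speed; the `Host` and `Job` fields loosely mirror the average turnaround and "% BOINC active" statistics mentioned above, but the names and thresholds are invented.

```python
from dataclasses import dataclass


@dataclass
class Host:
    avg_turnaround_days: float  # average time from sending a task to its report
    active_frac: float          # fraction of the day BOINC is allowed to run


@dataclass
class Job:
    checkpoints: bool     # does the protocol checkpoint regularly?
    cpu_hours: float      # CPU time one uninterrupted run may need
    deadline_days: float


def can_send(job: Job, host: Host, slack: float = 2.0) -> bool:
    """Hypothetical eligibility rule for sending a job to a host."""
    # Every job: the host's usual turnaround, with some slack, must beat the deadline.
    if host.avg_turnaround_days * slack > job.deadline_days:
        return False
    if job.checkpoints:
        return True
    # Non-checkpointing job: require enough active hours per day for one full run.
    return 24.0 * host.active_frac >= job.cpu_hours


if __name__ == "__main__":
    always_on = Host(avg_turnaround_days=1.0, active_frac=0.9)
    part_time = Host(avg_turnaround_days=5.0, active_frac=0.2)
    long_job = Job(checkpoints=False, cpu_hours=10.0, deadline_days=14.0)
    print(can_send(long_job, always_on), can_send(long_job, part_time))  # True False
```

The part-time host still qualifies for checkpointing work; it is only kept away from the jobs that would restart from scratch at every shutdown.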
____________
Rosetta Moderator: Mod.Sense

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 77357 - Posted 17 Aug 2014 12:31:37 UTC

[arm twist]A few other improvements that both the project and the users might enjoy in updated server code (we've had requests for several of the team and user functions; support for badges must be in there somewhere too; it might even fix the message boards so this thread isn't two screens wide):

[25675] Add feature for specifying plan classes in an XML file
[25321] Move antique file deletion to a separate program
[22778] Server support for Virtualbox applications
[22440] Deal correctly with 32-bit apps that require > 2GB RAM
[20807] Improved implementation of locality scheduling
[20149] Client versions include release. Projects may need to update app_version.min_core_version, config options
[19053] Project-specified access control for admin web pages
[18764] All project-specific scheduling policies on a per-job level
[18182] Support read-only DB replica correctly
[17430] Support a combination of locality and regular scheduling
[15543] Fix problem where clients with malformed global prefs get perpetual "Incomplete request" errors; fix bug that broke create_work
[15398] Handle quotes and slashes correctly in profiles and forums; fix bugs in team foundership transfer mechanism
[15281] Add support for matchmaker scheduling
[15137] Add "job size matching" feature (send large jobs to fast hosts)
[14842] Add super-easy mechanism for submitting single jobs
[14767] Add mechanism for assigning work to hosts, users, or teams
[14448] Add uniform/flexible notification mechanism; users can choose 1 email per event, daily digest email, or no email. REQUIRES ADDING NOTIFY.PHP AS A PERIODIC TASK IN CONFIG.XML
[14367] Add 'weak account key' mechanism
[14297] Config option to make team forums visible only to members
[14294] Prevent UOTD from showing big image on front page. Use show_uotd().
[14272] Team search feature
[14240] HTML-escape text in BOINC-wide team export file
[14234] Add "team message board" feature
[14229] Add optional user job submission system
[14084] Add user search feature - link to this from home page
[13964] lines/page in top user/team/host lists is configurable
[13945] Add "merge computers by name" feature
[13732] New and improved "Find a team" function
[13673] Fix an annoyance using team foundership transfer
[13463] Preserve project specific preferences during web RPC
[13231] Let team founders view history of people joining/quitting team
[13223] Support for 'BOINC-wide teams'
[13193] Add 'suspend_if_no_recent_input' preference (let hosts power down)
[13182] Add 'mark all threads as read' feature (forums)
[13127] Improved feeder query; may fix DB performance problems
[13045] Relax restrictions on merging hosts
[12912] Add <no_darwin_6>, <no_amd_k6> options
[12834] Make list of supported platforms visible in get_project_config.php
[12813] Add a forum preference for private message notification
[12785] Add "merge hosts by name" function
[12754] Add Paypal-based donation system
[12743] Add mechanism to end project gracefully
[/arm twist]
____________
Rosetta Moderator: Mod.Sense

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77361 - Posted 18 Aug 2014 17:18:29 UTC

I'll look into the server upgrade. It will be a long process since there is a lot of R@h-specific code. Priorities for now are first to release our Android app and then to add a replica DB and upgrade the server code. The latter may require significant downtime, so we need to plan it around the ongoing research projects in the lab. We also have to look into hardware upgrades.

TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 77363 - Posted 18 Aug 2014 22:45:44 UTC - in response to Message ID 77361.

I'll look into the server upgrade. It will be a long process since there is a lot of R@h-specific code. Priorities for now are first to release our Android app and then to add a replica DB and upgrade the server code. The latter may require significant downtime, so we need to plan it around the ongoing research projects in the lab. We also have to look into hardware upgrades.

Thank you, that would be great!
____________
Greetings,
TJ.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 77364 - Posted 19 Aug 2014 6:31:57 UTC

I'll look into the server upgrade. It will be a long process since there is a lot of R@h-specific code. Priorities for now are first to release our Android app and then to add a replica DB and upgrade the server code. The latter may require significant downtime, so we need to plan it around the ongoing research projects in the lab. We also have to look into hardware upgrades.

That's great!!

P.S. Please try to optimize the code for Android (memory footprint, for example).
____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 77369 - Posted 20 Aug 2014 14:14:55 UTC
Last modified: 20 Aug 2014 14:17:10 UTC

[25675] Add feature for specifying plan classes in an XML file
[25321] Move antique file deletion to a separate program
[22778] Server support for Virtualbox applications


This is an old list (updated to June 2012). At my request, DA has updated it.
Here are the other changes to the server code:

15-18 Aug 2014 Add support for per-app credit
8 Aug 2014 Convey user CPID to client (for BoincTasks?)
29 Jul 2014 version.xml can specify API version (for compressed apps)
25 Jul 2014 partial support in scheduler for generic coprocessors (e.g. ASICs)
16 Jul 2014 scheduler support for client "brand"; store in DB
14 Jul 2014 add <maintenance_delay> config option
8 Jul 2014 matchmaker (score-based) scheduling is now the default
3 Jul 2014 fix bugs in changing code signing key
3 Jul 2014 scheduler: fix bugs if project has both NCI and regular apps
10 Jun 2014 add "delete_spammers.php" for removing various types of spam accounts
6 Jun 2014 app versions (as well as apps) can be marked as "beta"
4 Jun 2014 support CPU OpenCL apps in plan class spec
27 May 2014 fully implement targeted jobs
18 May 2014 include badges in XML stats export
8 May 2014 send notices w/ video or images only to 7.3+ clients
6 May 2014 file_deleter: delete .gz versions also
6 May 2014 add web page showing top CPU models and their stats
4 May 2014 apps can be marked as "exact fraction done" (base completion time est only on FD)
30 Apr 2014 generalize interface to PHPMailer
20 Apr 2014 support remote input files in create_work
18 Apr 2014 let projects disable forums and/or teams
10 Apr 2014 support efficient bulk job creation in create_work
2 Apr 2014 store job peak mem/disk usage in DB
26 Mar 2014 support gzipped input files
21 Mar 2014 use mysqli PHP functions if available
18 Mar 2014 add validator that checks for string in stderr
8 Mar 2014 enforce GPU job limits separately for each GPU type
6 Mar 2014 store gpu_active_frac, and use it in runtime estimation
5-20 Dec 2013 add generic support for badges
23 May 2013 parse client "product name" (e.g. phone model) and store in DB
9 May 2013 use HTTPS for forms containing password
25 Apr 2013 add support for multi-size apps
9 Apr 2013 add new score-based scheduling
27 Aug 2012 add support for limited locality scheduling
17 Aug 2012 add support for volunteer data archival
11 Jul 2012 pagination in forums
25 Jun 2012 scheduler: support Intel GPUs
____________

Orgil

Joined: Dec 11 05
Posts: 82
ID: 35185
Credit: 169,751
RAC: 0
Message 77371 - Posted 21 Aug 2014 2:18:00 UTC

Finished WUs haven't validated for a full day. I checked the server status and everything is looking green. What happened?!
____________

Miklos,M

Joined: Dec 8 13
Posts: 23
ID: 489912
Credit: 4,942,189
RAC: 0
Message 77375 - Posted 21 Aug 2014 17:42:44 UTC

Are we getting longer WUs effective 8/31/14? Their estimated time to completion seems to be 40 hours or so. My preferences haven't changed and are still set to a maximum of 1 day per WU per CPU.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77376 - Posted 21 Aug 2014 17:48:16 UTC

The jobs should still run based on the target cpu run time preference. The estimate is likely off because the workunit estimated FLOPS value has been doubled. The client should make better estimates as more jobs get processed but if the problem persists or if the job is actually running significantly longer than your target run time, let us know. Thanks.
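A rough Python sketch of why the estimate starts high and then corrects itself. Assumptions: the estimate is the workunit's estimated floating-point operations divided by the host's measured speed, multiplied by a per-project duration correction factor (DCF) that the client adjusts after each finished task. The update below is a plain moving average chosen for illustration; it is not the BOINC client's exact rule, and newer BOINC versions may rely on server-side estimates instead. The host speed is hypothetical.

```python
def estimate_hours(fpops_est: float, host_flops: float, dcf: float = 1.0) -> float:
    """Runtime estimate: estimated work / measured speed, scaled by a correction factor."""
    return fpops_est / host_flops * dcf / 3600.0


def update_dcf(dcf: float, fpops_est: float, host_flops: float,
               actual_seconds: float, alpha: float = 0.1) -> float:
    """Nudge the correction factor toward the observed actual/raw ratio.

    Simplified moving average; the real BOINC client uses its own rules.
    """
    raw_seconds = fpops_est / host_flops
    return (1.0 - alpha) * dcf + alpha * (actual_seconds / raw_seconds)


if __name__ == "__main__":
    fpops = 80_000e9  # 80,000 GFLOPs, as quoted earlier in the thread
    speed = 3.7e9     # hypothetical host: 3.7 GFLOPS per core
    dcf = 1.0
    print(round(estimate_hours(fpops, speed, dcf), 1))  # 6.0
    for _ in range(50):  # 50 tasks that each stop at a 3-hour target
        dcf = update_dcf(dcf, fpops, speed, actual_seconds=3 * 3600)
    print(round(estimate_hours(fpops, speed, dcf), 1))  # 3.0
```

The task itself still stops at the target CPU run time; only the displayed estimate is off, and it converges toward the real run time as finished tasks feed back into the correction factor.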

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 77377 - Posted 21 Aug 2014 19:03:58 UTC

I'll look into the server upgrade. It will be a long process since there is a lot of R@h specific code.


Do Rosetta@Home and Ralph@Home run on the same version of the server code? If not, you could try updating Ralph first and see what happens before updating Rosetta...
____________

Orgil

Joined: Dec 11 05
Posts: 82
ID: 35185
Credit: 169,751
RAC: 0
Message 77381 - Posted 22 Aug 2014 4:31:33 UTC
Last modified: 22 Aug 2014 4:32:44 UTC

My WUs have been waiting 48 hours to validate or are still in the upload state. Houston, we have a problem?!
____________

sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 77387 - Posted 22 Aug 2014 17:42:58 UTC - in response to Message ID 77354.
Last modified: 22 Aug 2014 18:05:25 UTC


I suspect you have too high an expectation of most users. Target CPU runtime has always existed. It's just a little more flexible now. But the people who post here, like you and me, are very much the exception. The "set & forget" option is much more the norm. A document would be nice - no objection to it - but unlikely to gain much of a readership beyond what it is now.

Aborting tasks is clearly different from tasks being timed out. One is an active choice, the other the result of no choice at all. I doubt there's much of a "discouragement" factor. More that defaults don't coincide with a normal pattern of behaviour for ordinary people.

That's why I suggested the proportion of tasks failing to meet deadline should be monitored following the changes. Personally I'd have gone to 4hrs first, but obviously the vast increase in users required a more extreme and urgent response at the time.

I trust TPTB will make the appropriate assessment, seeing as they're the ultimate beneficiaries.


I'd guess server-side resource constraints could be part of the reason for some of these bottlenecks. There may be 'other solutions', e.g. an even more distributed computing design, or partnering with a willing institution to run 'mirror' servers, that could help alleviate some of the issues. But I'd think such software changes could affect the design of BOINC itself and could take considerable effort to diagnose, develop and integrate with Rosetta.

Hence, I'd guess that for the immediate term, a somewhat longer default run time is a *practical* way to alleviate some of the issues.

Nevertheless, I'm trying to make do with a somewhat longer self-defined run time (4 hrs) as a compromise.

I do agree that running long jobs doesn't match the average 'normal' usage pattern of a desktop or even notebook computer, as users want to shut down their PC/notebook for various reasons. A simple example: a computer running with a rather loud fan would be annoying at night in a bedroom, and the (naive?) user could simply decide to abort the jobs and shut down.

I used to run a PC whose fan ran almost like a jet engine (*noisy*), mainly due to an old graphics card lol.

sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 77391 - Posted 23 Aug 2014 3:02:05 UTC - in response to Message ID 77387.
Last modified: 23 Aug 2014 3:13:18 UTC


Personally I'd have gone to 4hrs first, but obviously the vast increase in users required a more extreme and urgent response at the time.

I trust TPTB will make the appropriate assessment, seeing as they're the ultimate beneficiaries.


I'd guess server-side resource constraints could be part of the reason for some of these bottlenecks. There may be 'other solutions', e.g. an even more distributed computing design, or partnering with a willing institution to run 'mirror' servers, that could help alleviate some of the issues. But I'd think such software changes could affect the design of BOINC itself and could take considerable effort to diagnose, develop and integrate with Rosetta.



I'd guess other possible designs/paradigms, such as a QoS (quality of service) design, could also be used to alleviate some of the high server load issues.
For example, when the server is busy it could 'announce' QoS controls and issue tokens with a queue number and a waiting period, asking the participating hosts to back off and wait for the predetermined period before retrying.
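The token idea can be sketched in a few lines of Python. This is a hypothetical illustration, not part of BOINC's client or server: `Token`, `issue_tokens` and `capacity_per_sec` are invented names. Each waiting host gets the next queue number and is told to wait in proportion to its place in the queue, so retries arrive spread out at a rate the server can handle.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    queue_number: int
    wait_seconds: float


def issue_tokens(n_requests: int, capacity_per_sec: float,
                 first_number: int = 1) -> List[Token]:
    """Give each waiting host a queue number and a back-off period.

    Host i is told to wait i / capacity_per_sec seconds, so the retries
    reach the server at no more than capacity_per_sec per second.
    """
    return [Token(first_number + i, i / capacity_per_sec) for i in range(n_requests)]


if __name__ == "__main__":
    tokens = issue_tokens(n_requests=12, capacity_per_sec=5)
    per_second = Counter(int(t.wait_seconds) for t in tokens)
    print(dict(per_second))  # {0: 5, 1: 5, 2: 2}
```

Twelve simultaneous requests become at most five retries per second; a real design would also need token expiry and some protection against hosts that ignore their wait period.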

However, in the same way, these could involve various changes to BOINC (both client and server) and integration with Rosetta, and could require a rather large development effort.

QoS has similar limitations to a lengthened run time; however, a big difference is that the participant's computer is *idle* while waiting to re-contact the server. This could help in cases where, for instance, jobs run with a noisy PC fan, since the fan would likely wind down to lower speeds and make less noise.

Norman

Joined: Oct 3 06
Posts: 3
ID: 116141
Credit: 1,744,777
RAC: 0
Message 77392 - Posted 23 Aug 2014 6:55:38 UTC

I have discovered a serious memory leak in Rosetta Mini 3.52 on a MacBook Pro with 8 GB of physical memory, running Mac OS X 10.9.4. I watched as my system slowed to a crawl and then hung over several hours while it was otherwise idle.

On another occasion, I watched the Memory panel of Activity Monitor as my system slowed down, virtual memory grew to 49 GB, and the swap file grew to 13 GB. Each of the three running Rosetta processes was using about 1 GB, but they were not growing.

I interpret this as filling the available disk space with the swap file. Mac OS X apparently does not cope well with a full disk because there were also many weird error messages in the Console log. When I suspended the Rosetta project in BOINC Manager, my system returned to normal and has been running smoothly all day.

I will not resume running Rosetta until you tell me this bug is fixed.
____________

Orgil

Joined: Dec 11 05
Posts: 82
ID: 35185
Credit: 169,751
RAC: 0
Message 77396 - Posted 24 Aug 2014 5:27:13 UTC

I have a few completed WU results that have been in the upload state for 4 days. And why is no one from the project answering my questions?! These results are not my property, nor the project staff's property; they are scientific property. It is shocking that the project server status shows a false green light while a cruncher cannot upload the results.
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77397 - Posted 24 Aug 2014 5:40:55 UTC

Not sure why your client isn't uploading results. Is anyone else having this issue? Is there any useful info in your client log?

Norman, that's a pretty serious bug/bad workunit. Any specifics? WU id?

Orgil

Joined: Dec 11 05
Posts: 82
ID: 35185
Credit: 169,751
RAC: 0
Message 77398 - Posted 24 Aug 2014 8:47:39 UTC
Last modified: 24 Aug 2014 8:52:28 UTC

The status says: Upload pending, project backoff .. (counting time)

WUs:
1

application Rosetta Mini
created 18 Aug 2014 14:04:30 UTC
name tj_8_7_ordered_X_25_h20_BAB_20_BAB_wD_fragments_abinitio_SAVE_ALL_OUT_185149_3629
minimum quorum 1
initial replication 1
max # of error/total/success tasks 1, 2, 1

2

application Rosetta Mini
created 6 Aug 2014 9:32:18 UTC
name 1L-18H-2L-8E-4L-8E-1L_1-2.A.0.rsmn_0060_2_fold_SAVE_ALL_OUT_183385_151
minimum quorum 1
initial replication 1
max # of error/total/success tasks 1, 2, 1

3

application Rosetta Mini
created 16 Aug 2014 8:56:24 UTC
name flu.c05g_3_input_0244_0001_ss1_1_ss2_2_ss3_2_ss4_2_ss5_2_0001_0001_0001.B_fragments_fold_188798_181
minimum quorum 1
initial replication 1
max # of error/total/success tasks 1,

4

application Rosetta Mini
created 16 Aug 2014 6:23:12 UTC
name db_triangle104B_fold_SAVE_ALL_OUT_189886_7480
minimum quorum 1
initial replication 1
max # of error/total/success tasks 1, 2, 1
____________

Norman

Joined: Oct 3 06
Posts: 3
ID: 116141
Credit: 1,744,777
RAC: 0
Message 77403 - Posted 24 Aug 2014 12:27:27 UTC

For Mac OS X memory leak, three task names:
5htube05_relax_SAVE_ALL_OUT_189789_1457_0
batch2_pdb16_relax_SAVE_ALL_OUT_189866_5873_0
1L-7E-2L-11H-3L-7E-2L-11H-1L_1-2.P.0_0002_fold_SAVE_ALL_OUT_190736_101_0
____________

Miklos,M

Joined: Dec 8 13
Posts: 23
ID: 489912
Credit: 4,942,189
RAC: 0
Message 77404 - Posted 25 Aug 2014 11:24:20 UTC

Errors in the new long tasks.

Chunfu Xu

Joined: Oct 2 13
Posts: 2
ID: 484425
Credit: 8,816
RAC: 0
Message 77409 - Posted 25 Aug 2014 18:11:00 UTC - in response to Message ID 77403.

For Mac OS X memory leak, three task names:
5htube05_relax_SAVE_ALL_OUT_189789_1457_0
batch2_pdb16_relax_SAVE_ALL_OUT_189866_5873_0
1L-7E-2L-11H-3L-7E-2L-11H-1L_1-2.P.0_0002_fold_SAVE_ALL_OUT_190736_101_0



The 5htube* work unit was submitted by me. I am sorry that it caused a problem to your computer. I have identified the problem and will avoid it in the future. Sorry about that.

Murasaki
Avatar

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 376,620
RAC: 363
Message 77412 - Posted 25 Aug 2014 21:58:21 UTC
Last modified: 25 Aug 2014 22:01:42 UTC

The "extremely long file name that goes over the Windows character limit" issue is back:

684142940, 684308973

WARNING! attempt to create gzipped file ../../projects/boinc.bakerlab.org_rosetta
/benchmark_0008_master_babd28351e57425d68b32333be5a837fb7cd5818_ploops
_64_input_0002_no_lig_fragments_contact_opt_iteration_2_50447fb2412049d0b1fecfb10acecfee
_fold_SAVE_ALL_OUT_170398_3761_0_0 failed.


As Windows has a path limit of 256 characters and the above path is 228 characters (excluding the file extension and higher levels of the path) you are bound to generate errors on a regular basis.

This issue has come up before but I guess that some of the scientists missed the memo.

Can you put in place a character limit for scientists submitting work?
I guess there will be a small inconvenience for the scientists in not being as descriptive as they want to be, but at least you don't scare the crunchers away with swathes of compute errors.

Norman

Joined: Oct 3 06
Posts: 3
ID: 116141
Credit: 1,744,777
RAC: 0
Message 77414 - Posted 26 Aug 2014 2:31:42 UTC

"For Mac OS X memory leak, three task names:
5htube05_relax_SAVE_ALL_OUT_189789_1457_0
batch2_pdb16_relax_SAVE_ALL_OUT_189866_5873_0
1L-7E-2L-11H-3L-7E-2L-11H-1L_1-2.P.0_0002_fold_SAVE_ALL_OUT_190736_101_0"

"The 5htube* work unit was submitted by me. I am sorry that it caused a problem to your computer. I have identified the problem and will avoid it in the future. Sorry about that."

"Avoid it" in the future is not enough. You have described changing the input data for the work unit, but since I am a retired software engineer, I know that the root cause of this problem probably is a software bug.
If something can go wrong in the future, then it will. This software bug caused me to lose a week of work tracking it down. I will not use Minirosetta again until someone tells me that this bug is fixed.
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77419 - Posted 26 Aug 2014 18:07:45 UTC - in response to Message ID 77412.
Last modified: 26 Aug 2014 18:37:07 UTC

The "extremely long file name that goes over the Windows character limit" issue is back
Can you put in place a character limit for scientists submitting work?
I guess there will be a small inconvenience for the scientists in not being as descriptive as they want to be, but at least you don't scare the crunchers away with swathes of compute errors.


Thanks for catching this. Yes, there is a character limit imposed, but this job somehow slipped through. I'll have to reduce the maximum number of characters allowed so this doesn't happen again. [edit] I see now how it slipped through and have fixed our submission code. Thanks!
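A minimal sketch of the kind of submission-side length check being discussed, under stated assumptions: the output file is written into a BOINC project directory such as C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta (the default Windows data directory; real installs vary), the binding limit is the Win32 MAX_PATH value of 260 characters (which includes the terminating null), and 16 characters are held back for suffixes appended after submission. The function names and the suffix allowance are hypothetical; this is not the project's actual submission code.

```python
# Win32 MAX_PATH is 260 characters, including the terminating null.
MAX_PATH = 260


def max_job_name_length(project_dir: str, suffix_allowance: int = 16) -> int:
    """Longest job name that keeps <project_dir>\\<name><suffix> within MAX_PATH.

    suffix_allowance is a hypothetical reserve for extensions such as ".gz"
    or result-index suffixes added after submission.
    """
    # +1 for the separator between directory and file name,
    # +1 for the terminating null that MAX_PATH counts.
    used = len(project_dir.rstrip("\\")) + 1 + suffix_allowance + 1
    return MAX_PATH - used


def validate_job_name(name: str, project_dir: str) -> None:
    """Raise ValueError if the job name would produce an over-long path."""
    limit = max_job_name_length(project_dir)
    if len(name) > limit:
        raise ValueError(f"job name has {len(name)} characters; limit is {limit}")
```

With the default Windows project directory above (56 characters), this leaves room for job names of up to 186 characters.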

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77420 - Posted 26 Aug 2014 18:08:25 UTC

We'll definitely track this bug down and make sure it's fixed in the next app update.

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 338,704
RAC: 23
Message 77421 - Posted 26 Aug 2014 19:00:49 UTC - in response to Message ID 77412.

As Windows has a path limit of 256 characters and the above path is 228 characters (excluding the file extension and higher levels of the path) you are bound to generate errors on a regular basis.

File names are limited to 255 characters, full paths to 32,767 characters, see http://en.wikipedia.org/wiki/NTFS -> Internals.
____________
.

Murasaki
Avatar

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 376,620
RAC: 363
Message 77422 - Posted 26 Aug 2014 21:11:13 UTC - in response to Message ID 77421.
Last modified: 26 Aug 2014 21:11:49 UTC

As Windows has a path limit of 256 characters and the above path is 228 characters (excluding the file extension and higher levels of the path) you are bound to generate errors on a regular basis.

File names are limited to 255 characters, full paths to 32,767 characters, see http://en.wikipedia.org/wiki/NTFS -> Internals.


Yes and no.
Yes, the maximum path can be extended to approximately 32,767 characters.
No, the default behaviour in Windows is to limit the path to 256 characters (even under NTFS).

See Microsoft's "Naming Files, Paths, and Namespaces" documentation for more details.
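To illustrate the "yes and no": Microsoft's documentation describes the \\?\ prefix, which tells the Win32 file APIs to skip path normalization and raises the limit to roughly 32,767 characters. Because the prefix disables "." and ".." handling, a relative path like the ../../projects/... one in the error above has to be normalized and made absolute first. A minimal sketch using Python's ntpath module; the helper name is hypothetical:

```python
import ntpath


def to_extended_length_path(path: str) -> str:
    r"""Return path with the Win32 extended-length prefix (\\?\) applied.

    The prefix is only valid on absolute paths, and Windows does not
    resolve "." or ".." once it is present, so normalize first.
    """
    if path.startswith("\\\\?\\"):
        return path  # already prefixed
    full = ntpath.normpath(path)
    if not ntpath.isabs(full):
        raise ValueError("extended-length paths must be absolute")
    if full.startswith("\\\\"):
        # UNC share: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + full[2:]
    return "\\\\?\\" + full
```

For example, C:\ProgramData\BOINC\slots\0\..\..\projects\x.gz becomes \\?\C:\ProgramData\BOINC\projects\x.gz.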

googloo
Avatar

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 6,140,242
RAC: 5,464
Message 77423 - Posted 27 Aug 2014 22:22:10 UTC

Unless I'm seeing a lot of old work units, the majority of the ones on my computer do not reflect the "workunit estimated FLOPS value has been doubled." At this point I have 50 work units awaiting processing and only two of them show the longer estimated time.

Miklos,M

Joined: Dec 8 13
Posts: 23
ID: 489912
Credit: 4,942,189
RAC: 0
Message 77427 - Posted 28 Aug 2014 18:25:24 UTC

I am hoping to receive more of the "25 hour" WUs and not those at 45-50 hours. The latter ones result in errors for me.

ThrowerGB

Joined: Dec 4 05
Posts: 3
ID: 29380
Credit: 4,609,660
RAC: 3,822
Message 77431 - Posted 30 Aug 2014 13:28:28 UTC

Norman, in message 77392, reports a memory leak under OSX. I've been having the same problem for the last few weeks. The problem grows until the system freezes. As a result I've had to shut down R@H computations until the bug is fixed. This is unfortunate.

Sorry about posting this way. I can't see how to comment on a specific message directly.
____________

[AF>france>pas-de-calais]symaski62

Joined: Sep 19 05
Posts: 47
ID: 506
Credit: 33,871
RAC: 0
Message 77432 - Posted 30 Aug 2014 17:50:50 UTC - in response to Message ID 77427.

I am hoping to receive more of the "25 hour" WUs and not those at 45-50 hours. The latter ones result in errors for me.


0 to 100% = 24 hours in minirosetta (25 hours in BOINC)



____________

googloo
Avatar

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 6,140,242
RAC: 5,464
Message 77439 - Posted 4 Sep 2014 17:28:11 UTC

I am still getting a lot of work units for which the "workunit estimated FLOPS value has been doubled" has not happened (tj_ and rb_). My duration correction factor is all over the place. Please make up your mind(s).

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 960
ID: 14
Credit: 2,340,677
RAC: 1,123
Message 77440 - Posted 4 Sep 2014 18:19:59 UTC - in response to Message ID 77439.

I am still getting a lot of work units for which the "workunit estimated FLOPS value has been doubled" has not happened (tj_ and rb_). My duration correction factor is all over the place. Please make up your mind(s).


Thanks for the heads up. New rb_ jobs should be fixed. Older jobs that are still in our system unfortunately will keep the old values. They should gradually get purged out as they complete.
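A simplified model of why a mix of old and new estimates makes the duration correction factor (DCF) swing. This is a sketch assuming the classic BOINC client estimate of a workunit's rsc_fpops_est divided by the host's FLOPS, scaled by the DCF; the real client also blends in elapsed-time progress, and the task size and host speed below are made-up numbers.

```python
def estimated_runtime_s(fpops_est: float, host_flops: float, dcf: float = 1.0) -> float:
    """Simplified BOINC-style runtime estimate in seconds."""
    return fpops_est / host_flops * dcf


def implied_dcf(actual_s: float, fpops_est: float, host_flops: float) -> float:
    """DCF that would make the estimate match an observed runtime."""
    return actual_s / (fpops_est / host_flops)
```

If a task really runs 6 hours (21,600 s) on a hypothetical 4 GFLOPS host, an old-style estimate of 4.32e13 FLOPs implies a DCF of 2.0, while the doubled estimate of 8.64e13 implies 1.0. A queue that alternates between the two keeps pulling the DCF back and forth between those values.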

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 77577 - Posted 15 Oct 2014 9:35:53 UTC

696381923

ERROR: Assertion failure: runtime_assert( ( begin + size - 1 ) <= pose.total_residue() );
ERROR:: Exit from: ..\..\..\src\protocols\simple_moves\FragmentMover.cc line: 307
std::cerr: Exception was thrown:


[ERROR] EXCN_utility_exit has been thrown from: ..\..\..\src\protocols\simple_moves\FragmentMover.cc line: 307
ERROR: Assertion failure: runtime_assert( ( begin + size - 1 ) <= pose.total_residue() );
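The failing assertion checks that a fragment insertion window stays inside the protein. Residues are numbered from 1, so a fragment of size residues starting at begin ends at begin + size - 1, and that must not exceed the pose's residue count. A minimal Python restatement of the same condition (the function name is hypothetical; the real check is the C++ runtime_assert quoted above):

```python
def fragment_fits(begin: int, size: int, total_residue: int) -> bool:
    """Mirror of runtime_assert((begin + size - 1) <= pose.total_residue()).

    Residues are 1-indexed, so a window of `size` residues starting at
    `begin` covers begin .. begin + size - 1 inclusive.
    """
    return begin + size - 1 <= total_residue
```

For a 100-residue pose, a 9-residue fragment starting at residue 92 ends exactly at residue 100 and passes; starting at 93 it would end at 101 and trip the assertion.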

____________

jpmfc46

Joined: Sep 17 14
Posts: 1
ID: 1007870
Credit: 389,010
RAC: 412
Message 77578 - Posted 16 Oct 2014 4:46:18 UTC

Rosetta Mini 3.52
Event log: exited with zero status but no finished file
If this happens repeatedly you may need to reset the project.
Well, it seems it happens quite a lot! Should I reset the project?

Murasaki
Avatar

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 376,620
RAC: 363
Message 77579 - Posted 16 Oct 2014 18:03:52 UTC - in response to Message ID 77578.

Rosetta Mini 3.52
Event log: exited with zero status but no finished file
If this happens repeatedly you may need to reset the project.
Well, it seems it happens quite a lot! Should I reset the project?


That is a random BOINC bug that can affect any project and is not specifically a Rosetta issue.

See this BOINC FAQ page for several tips on how to fix the problem. I've been told that item 7 on the list often turns out to be the correct solution for most people, and it certainly worked for me.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 552
ID: 25524
Credit: 1,530,641
RAC: 1,012
Message 77584 - Posted 17 Oct 2014 14:44:56 UTC

And this

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x01530DD9 read attempt to address 0x00000000
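For anyone matching these reports up: 0xc0000005 is the Windows STATUS_ACCESS_VIOLATION code, and a read attempt at address 0x00000000 indicates a null-pointer dereference. The same code shows up in task lists as the signed exit status -1073741819, which is just the 32-bit value read as a signed integer. A small conversion sketch (function names are illustrative):

```python
STATUS_ACCESS_VIOLATION = 0xC0000005


def exit_status_to_hex(status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned hex code."""
    return f"0x{status & 0xFFFFFFFF:08x}"


def is_access_violation(status: int) -> bool:
    """True if the (possibly negative) exit status is 0xC0000005."""
    return (status & 0xFFFFFFFF) == STATUS_ACCESS_VIOLATION
```

exit_status_to_hex(-1073741819) returns "0xc0000005".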

____________

Killersocke@rosetta

Joined: Nov 13 06
Posts: 20
ID: 129065
Credit: 664,666
RAC: 225
Message 77591 - Posted 19 Oct 2014 7:10:29 UTC

After restarting my PC this morning I got

7 x Client error / Compute error

Task:
https://boinc.bakerlab.org/rosetta/result.php?resultid=696831411
https://boinc.bakerlab.org/rosetta/result.php?resultid=696830180
https://boinc.bakerlab.org/rosetta/result.php?resultid=696823199
https://boinc.bakerlab.org/rosetta/result.php?resultid=696820951
https://boinc.bakerlab.org/rosetta/result.php?resultid=696819719
https://boinc.bakerlab.org/rosetta/result.php?resultid=696815927
https://boinc.bakerlab.org/rosetta/result.php?resultid=696812279

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2014-10-18 23:33:27:] :: BOINC:: Initializing ... ok.
[2014-10-18 23:33:27:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: projects/boinc.bakerlab.org_rosetta/minirosetta_3.52_windows_x86_64.exe @6htube8_53_fold_and_dock_flags -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip fold_and_dock_6htube8_53_data.zip -nstruct 10000 -cpu_run_time 21600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3669074
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_3d2618f.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/fold_and_dock_6htube8_53_data.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
[2014-10-19 8:43:40:] :: BOINC:: Initializing ... ok.
[2014-10-19 8:43:40:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: projects/boinc.bakerlab.org_rosetta/minirosetta_3.52_windows_x86_64.exe @6htube8_53_fold_and_dock_flags -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip fold_and_dock_6htube8_53_data.zip -nstruct 10000 -cpu_run_time 21600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3669074
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_3d2618f.zip
error: cannot create ./minirosetta_database/chemical/mm_atom_type_sets/fa_standard/top_all27_prot_na.rtf
error: cannot create ./minirosetta_database/rotamer/ExtendedOpt1-5/asn.bbdep.rotamers.lib.gz
error: cannot create ./minirosetta_database/rotamer/ExtendedOpt1-5/asp.bbdep.rotamers.lib.gz
error: cannot create ./minirosetta_database/rotamer/ExtendedOpt1-5/cys.bbdep.rotamers.lib.gz
error: cannot create ./minirosetta_database/rotamer/ExtendedOpt1-5/his.bbdep.rotamers.lib.gz
error: cannot create ./minirosetta_database/rotamer/ExtendedOpt1-5/lys.bbdep.rotamers.lib.gz
error: cannot create ./minirosetta_database/sampling/fragpicker_rama_tables/H_dcP.counts.gz
error: cannot create ./minirosetta_database/sampling/orientations/orientgridall/data/c48u54799.grid.gz
error: cannot create ./minirosetta_database/sampling/spheres/sphere_17282_neighbors.dat.gz
error: cannot create ./minirosetta_database/scoring/score_functions/etable/etable.twobead.dlj.dat.gz
error: cannot create ./minirosetta_database/scoring/score_functions/goap/angle_table.dat.gz
error: cannot create ./minirosetta_database/scoring/score_functions/goap/dist_table.dat.gz
error: cannot create ./minirosetta_database/sequence/genome_9mers/Homo_sapiens.GRCh37.66.pep.all.9mers.sort.refscore.gz
error: cannot create ./minirosetta_database/sequence/genome_9mers/Mus_musculus.NCBIM37.66.pep.all.9mers.sort.refscore.gz
error: cannot create ./minirosetta_database/sequence/tcell_ep_9mers/HLA-DRB10701.refscore.gz
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/fold_and_dock_6htube8_53_data.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00006_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00006_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00006_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00006_FragmentSampler__stage4_kk_1 ... success!
Continuing computation from checkpoint: chk_S_00006_FragmentSampler__stage4_kk_2 ... success!

ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6
ERROR:: Exit from: ..\..\..\src\core\pose\symmetry\util.cc line: 889

ERROR: Unable to open database file for Dun10 rotamer library: minirosetta_database\rotamer/ExtendedOpt1-5/cys.bbdep.rotamers.lib
ERROR:: Exit from: ..\..\..\src\core\pack\dunbrack\RotamerLibrary.cc line: 1145

ERROR: Energies:: operation NOT permitted during scoring.
ERROR:: Exit from: ..\..\..\src\core\scoring\Energies.cc line: 372
std::cerr: Exception was thrown:

[ERROR] EXCN_utility_exit has been thrown from: ..\..\..\src\core\scoring\Energies.cc line: 372
ERROR: Energies:: operation NOT permitted during scoring.

</stderr_txt>
]]>
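The first error in that log (util.cc line 889) asserts that a coordinate-system rotation matrix has a determinant within 1e-6 of 1, i.e. that it neither reflects nor scales. A minimal plain-Python sketch of the same test, expanding the 3x3 determinant by cofactors along the first row. The function names are hypothetical, and note that a determinant check alone does not prove the matrix is orthogonal:

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def det_is_one(m, tol=1e-6):
    """Mirror of std::abs(coordsys_rot.det() - 1.0) < 1e-6."""
    return abs(det3(m) - 1.0) < tol
```

A 90-degree rotation about z passes; a mirror image (diag(1, 1, -1), determinant -1) or a slightly scaled matrix fails.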

sgaboinc

Joined: Apr 2 14
Posts: 170
ID: 498515
Credit: 125,409
RAC: 0
Message 77606 - Posted 25 Oct 2014 5:17:29 UTC
Last modified: 25 Oct 2014 5:48:31 UTC

Feature request: currently the target run time can only be selected in even increments, e.g. 2 hours, 4 hours, 6, 8, 10, 12, etc.
However, I may like to specify 3, 5, 7, 9, etc.

Could you kindly allow the target run time to be selected in one-hour steps rather than only in even hours?

That would allow finer granularity in finding the trade-off between the number of models completed each run, the processing prowess of each CPU, and personal preferences depending on the situation (e.g. the maximum run time available in the evening before shutting down the PC).

Matti Ollikainen

Joined: Jun 23 11
Posts: 1
ID: 422864
Credit: 3,177
RAC: 0
Message 77693 - Posted 27 Nov 2014 16:16:39 UTC
Last modified: 27 Nov 2014 16:25:11 UTC

Hi! When the BOINC screensaver is on, showing pictures of proteins, numeric details, etc., I can also see information about the calculation and the project itself. What puzzles me is the discontinuity of the calculation, even when the computer is obviously otherwise idle. The calculation starts and stops, starts and stops, for about half a second to a second each phase. Is this normal, or should the antivirus surveillance program or the firewall (Avast and Comodo, respectively) be suspected? The OS is Win 7 and BOINC v7.4.27 x64.

At least Comodo does some harm to BOINC and often isolates minirosetta (v3.52) as an unknown program. I've decreased the safety level of the Defence+ module and this seems to help somewhat.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 77694 - Posted 28 Nov 2014 0:26:32 UTC

In order to ensure the majority of the idle computing power goes to useful work, the resources devoted to the graphics are limited, so they are not constantly refreshed.

In order to ensure the computing resources are available for whatever the primary purpose of the machine is, the tasks run at low priority, so if any other task on the machine pops in wanting CPU, it will take precedence. This can result in the CPU allocation being choppy as other, higher priority tasks request CPU.
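On Linux and macOS the same idea can be seen with process niceness: a worker that raises its own nice value to 19 (the lowest priority) only gets CPU time that the rest of the system is not using, which is why the progress looks choppy whenever anything else wants the CPU. This is a sketch using Python's Unix-only os.nice; it is not how the BOINC client itself sets priorities (the client runs tasks at the lowest priority on each platform).

```python
import os


def lower_own_priority() -> int:
    """Raise this process's nice value to 19 (lowest priority) and return it.

    os.nice(increment) adds to the current niceness and returns the new
    value. Unprivileged processes may raise niceness but cannot lower it.
    """
    current = os.nice(0)          # an increment of 0 just reports the value
    return os.nice(19 - current)  # land exactly on 19
```

After the call, the scheduler hands this process CPU only when higher-priority work is not runnable.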
____________
Rosetta Moderator: Mod.Sense

Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 475,455
RAC: 852
Message 77777 - Posted 29 Dec 2014 19:10:08 UTC

I just got a new computation error. I recently switched my runtime preference from one day to two days in the middle of running result 707976354, so I don't know if this is a problem caused by switching in the middle of the work unit or a bug in Minirosetta 3.52.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 77780 - Posted 29 Dec 2014 21:27:03 UTC - in response to Message ID 77777.
Last modified: 29 Dec 2014 21:42:05 UTC

I just got a new computation error. I recently switched my runtime preference from one day to two days in the middle of running result 707976354, so I don't know if this is a problem caused by switching in the middle of the work unit or a bug in Minirosetta 3.52.


No, I've not seen changing the runtime preference cause any problem. The only issue might be if you made it shorter than an active task had already run.

[edit]...But now I see similar errors on other tasks that tried to run with the 48hr runtime preference, so I've sent a note to the Project Team to look into that. It appears the tasks have an internal runtime limit that may need to be extended to match the 48hrs (plus the 4hr watchdog).

Until they get a chance to resolve it, I'd suggest going back to the 24hr runtime preference.
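A minimal sketch of the mismatch described above, assuming the task enforces an internal maximum runtime that has to cover the user's runtime preference plus the 4-hour watchdog grace period. The 30-hour internal limit in the example is a made-up value for illustration, not the app's real setting.

```python
WATCHDOG_GRACE_H = 4  # the watchdog cuts in at runtime preference + 4 hours


def required_internal_limit_h(runtime_pref_h: float) -> float:
    """Longest legitimate runtime a task may need under a given preference."""
    return runtime_pref_h + WATCHDOG_GRACE_H


def preference_is_safe(runtime_pref_h: float, internal_limit_h: float) -> bool:
    """True if the internal limit leaves room for the preference plus grace."""
    return internal_limit_h >= required_internal_limit_h(runtime_pref_h)
```

With a hypothetical 30-hour internal limit, a 24hr preference (needing 28 hours) is safe, while a 48hr preference (needing 52 hours) would fail.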
____________
Rosetta Moderator: Mod.Sense

Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 475,455
RAC: 852
Message 77781 - Posted 30 Dec 2014 4:31:41 UTC - in response to Message ID 77780.
Last modified: 30 Dec 2014 4:37:48 UTC

I just got a new computation error. I recently switched my runtime preference from one day to two days in the middle of running result 707976354, so I don't know if this is a problem caused by switching in the middle of the work unit or a bug in Minirosetta 3.52.


No, I've not seen changing the runtime preference cause any problem. The only issue might be if you made it shorter than an active task had already run.

[edit]...But now I see similar errors on other tasks that tried to run with the 48hr runtime preference, so I've sent a note to the Project Team to look into that. It appears the tasks have an internal runtime limit that may need to be extended to match the 48hrs (plus the 4hr watchdog).

Until they get a chance to resolve it, I'd suggest going back to the 24hr runtime preference.

Thanks!

By the way, since you noticed that the 2-day limit apparently causes errors, you might want to have that option's description edited to tell users not to use it. Better yet, remove that option, move everyone on the 2-day limit to the 1-day limit, and announce why on the message board and in the news section.

P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 77836 - Posted 17 Jan 2015 1:51:16 UTC

Hi.

I just checked why my RAC has gone down; it seems to be because these types of tasks are all getting validate errors.

This applies to my rigs and to the others that have done the same tasks.

These are just a few of mine that I found.

=======================================

svm_above_15_v2_1Q9BA_abinitio_SAVE_ALL_OUT_236779_94


svm_both_v2_1M8AA_abinitio_SAVE_ALL_OUT_235545_52


svm_above_15_v2_1OKSA_abinitio_SAVE_ALL_OUT_236783_53


svm_both_3AUBA_abinitio_SAVE_ALL_OUT_235153_97



____________


JugNut

Joined: Apr 30 12
Posts: 11
ID: 450030
Credit: 1,211,988
RAC: 13
Message 77874 - Posted 29 Jan 2015 13:50:45 UTC

Not sure what's going on here: https://boinc.bakerlab.org/rosetta/result.php?resultid=713895694
I have my WUs set to 1 hour, but this WU ran for 5.1 hrs. Is this behaviour normal with some types of tasks?
On a side note, besides the long runtime there is the small credit given: instead of getting 5 times more credit, as was asked for, the WU received 5 times less credit. This certainly isn't the first time I've seen this happen either, but thankfully it only seems to happen a few times a day. So is it normal?

Any ideas?

TIA

JugNut

Joined: Apr 30 12
Posts: 11
ID: 450030
Credit: 1,211,988
RAC: 13
Message 77875 - Posted 29 Jan 2015 16:44:48 UTC
Last modified: 29 Jan 2015 17:13:09 UTC

Sorry, my bad, it's not the above link, it's this one: https://boinc.bakerlab.org/rosetta/result.php?resultid=713894281

@ P . P . L: I have many of those validate errors too, but you still end up getting credited for them in the end. It's considered a normal part of the process. The way I understand it, these WUs are given credit by a script once every 24 hrs, but it doesn't show up in your results in the normal spot. If you wait 24-48 hrs and then click the task details link, you'll see at the very bottom that they did get credited eventually, after a day or two.

Like this one of yours.. http://boinc.bakerlab.org/rosetta/result.php?resultid=713405534

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77879 - Posted 30 Jan 2015 7:33:20 UTC - in response to Message ID 77875.

Sorry, my bad, it's not the above link, it's this one: https://boinc.bakerlab.org/rosetta/result.php?resultid=713894281

At the end of the log it looks like the watchdog had to force the task to shut down - the watchdog is a kind of fail-safe if something goes wrong with the task and it doesn't complete properly.

It happens very occasionally. Looks like you were unlucky with that one.

____________

JugNut

Joined: Apr 30 12
Posts: 11
ID: 450030
Credit: 1,211,988
RAC: 13
Message 77880 - Posted 30 Jan 2015 12:41:46 UTC
Last modified: 30 Jan 2015 12:44:03 UTC

Thanks for answering, Sid. You're right; luckily there only seem to be about 4 or 5 a day, but it adds up, especially when for some reason they only get credited with a fraction of what they should. Also, it's hard to tell exactly how many there are, as I would have to search through hundreds of tasks each day to find them.

Rosetta's task viewing leaves much to be desired. As you'd know, on other projects you can click on, say, errors & get a list of errors, or search for a particular task name; that would be a big help here.

PS: I've just noticed that, out of the blue, I'm having compute errors like this https://boinc.bakerlab.org/rosetta/result.php?resultid=714075014 on one of my PCs. It's exactly the same error as described in this earlier post: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6444&nowrap=true#76816
So far I've had more than a dozen of these over the last 4 hours or so. The strange thing is that many of them end up getting validated by the next person along. I'm not sure whether that makes any difference, but in most cases where one does get validated, it was by someone using Linux. Just a thought. I'll keep digging into it.

Thanks again..

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77884 - Posted 2 Feb 2015 4:45:21 UTC - in response to Message ID 77880.

Thanks for answering, Sid. You're right; luckily there only seem to be about 4 or 5 a day, but it adds up, especially when for some reason they only get credited with a fraction of what they should. Also, it's hard to tell exactly how many there are, as I would have to search through hundreds of tasks each day to find them.

Rosetta's task viewing leaves much to be desired. As you'd know, on other projects you can click on, say, errors & get a list of errors, or search for a particular task name; that would be a big help here...

You're right on that last bit about sorting tasks by error etc. 4 or 5 a day sounds a lot, so I just had a look at your machines and tasks and... holy moly!

Why is it you have really fast computers but run with just a 1hr runtime? You have nearly 1000 tasks per machine either complete or in progress! That must be taking up massive bandwidth at both ends. In the context of 1000s, the occasional few tasks going wrong is trivial. I thought 4-5 would be a lot.
____________

JugNut

Joined: Apr 30 12
Posts: 11
ID: 450030
Credit: 1,211,988
RAC: 13
Message 77885 - Posted 3 Feb 2015 11:14:55 UTC
Last modified: 3 Feb 2015 11:58:12 UTC

Hi Sid,
Thank you again for your reply. The reason I use 1hr is simply because it credits the most, or at least it certainly seems to. While credits are nowhere near the top of my list for crunching, they are, as for most others, a side interest. On different occasions I checked other PCs with similar rigs to mine, and those I checked that were using longer times never earned as much as I was getting on average. And I figure, since I'm helping anyway, what does it matter? After all, if using the 1hr option was bad, why would it still be an option?

If it were imperative to get crunchers to use longer times, there would be an advantage for them to do so; at the moment there isn't. A simple way to achieve this, if it is indeed a project necessity, would be to offer crunchers a bonus for crunching longer times, for the extra risk & commitment involved. Things go pear-shaped here more than on most other projects. Other projects give bonuses for quick return & doing long tasks, so it could be done here too.

Plus, the extra overhead at my end seems negligible when running larger units, although I didn't do a thorough check when I last used the longer times, so I could be wrong about it. Of course, if it became a necessity for the project's good, I would happily oblige. As for the errors & problems I had: if I'm having them, there could well be who knows how many others with such errors, so I thought they would be worth reporting, especially since the majority of crunchers don't use the forums at all and will just move on when they find too many errors.

Crunch-on Cheers Greg.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 77886 - Posted 3 Feb 2015 17:19:33 UTC

Short runtimes simply report and claim credit sooner. Longer runtimes reduce the runtime and network overhead. Credit is very hard to compare, as different tasks can have different performance characteristics. There is some overhead just in opening the zip files and reference data used by a task, so the fewer times you do that in a day, the less overhead in the processing. Longer runtimes should be a smidge more efficient. They also reduce the number of tasks on your pending and completed lists, and the number of hits to the project servers. I don't think anyone intended to imply a long runtime was "imperative", just that it may offer some benefits for you by reducing the overall number of tasks, disk space requirements, etc. The underlying work results are the same, so there is no premium either way for return time or run length. The choice is there to help adapt to various usage scenarios.
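The per-task overhead point is easy to see with a little arithmetic. Assuming a fixed startup cost per task for unpacking the database zip and reference data (the 3-minute figure below is purely hypothetical), the share of CPU time spent on that overhead is startup time divided by runtime, so it shrinks as the runtime preference grows:

```python
def overhead_fraction(startup_min: float, runtime_h: float) -> float:
    """Share of a task's runtime spent on fixed per-task startup work."""
    return startup_min / (runtime_h * 60)


for hours in (1, 3, 6, 8, 24):
    print(f"{hours:>2} h runtime: {overhead_fraction(3, hours):.2%} overhead")
```

Under that assumption a 1-hour task spends 5% of its time on startup, while a 6-hour task spends well under 1%.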

BEWARE: changes to the runtime preference will affect tasks currently on your machine, and BOINC has to crunch a few with the new runtime preference before it accurately factors it into its future work requests. So ideally you reduce your work buffer and change the runtime preference gradually over the course of a week, then bump the work buffer back up as desired. Also, there currently seems to be an issue with the 2-day preference, so I suggest using 1 day.
____________
Rosetta Moderator: Mod.Sense

JugNut

Joined: Apr 30 12
Posts: 11
ID: 450030
Credit: 1,211,988
RAC: 13
Message 77889 - Posted 4 Feb 2015 2:23:43 UTC
Last modified: 4 Feb 2015 2:40:56 UTC

No worries, Mod.Sense. I'll give that some thought and also try some longer WUs later to see what they're like now.
Thanks for your time.

Greg

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77890 - Posted 4 Feb 2015 5:02:05 UTC - in response to Message ID 77885.

Hi Sid,
Thank you again for your reply. The reason I use 1hr is simply because it credits the most, or at least it certainly seems to. While credits are nowhere near the top of my list for crunching, they are, as for most others, a side interest. On different occasions I checked other PCs with similar rigs to mine, and those I checked that were using longer times never earned as much as I was getting on average. And I figure, since I'm helping anyway, what does it matter? After all, if using the 1hr option was bad, why would it still be an option?

If it were imperative to get crunchers to use longer times, there would be an advantage for them to do so; at the moment there isn't. A simple way to achieve this, if it is indeed a project necessity, would be to offer crunchers a bonus for crunching longer times, for the extra risk & commitment involved. Things go pear-shaped here more than on most other projects. Other projects give bonuses for quick return & doing long tasks, so it could be done here too.

Plus, the extra overhead at my end seems negligible when running larger units, although I didn't do a thorough check when I last used the longer times, so I could be wrong about it. Of course, if it became a necessity for the project's good, I would happily oblige. As for the errors & problems I had: if I'm having them, there could well be who knows how many others with such errors, so I thought they would be worth reporting, especially since the majority of crunchers don't use the forums at all and will just move on when they find too many errors.

Crunch-on Cheers Greg.

I didn't mean to make a big deal about bandwidth (though it is a side issue). I just meant it was such a chore to work through your task lists to see what issue you were having. 4 or 5 errors is a lot with the default 6hr runtimes, but at 1hr (with up to 16 cores running at a time) that's 4/5 out of 384 tasks a day, not 64.
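The task counts follow from cores × (24 / runtime in hours). A quick sketch, assuming every core runs Rosetta tasks back to back all day; by the same formula a 6-hour runtime on 16 cores gives 64 tasks a day:

```python
def tasks_per_day(cores: int, runtime_h: float) -> float:
    """Tasks a host completes per day if every core runs tasks back to back."""
    return cores * 24 / runtime_h


def error_rate(errors_per_day: float, cores: int, runtime_h: float) -> float:
    """Fraction of a day's tasks that failed."""
    return errors_per_day / tasks_per_day(cores, runtime_h)
```

Five errors a day on 16 cores at 1hr is 5/384, about 1.3% of tasks; at 6hr the same five errors would be about 7.8%.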

For what it's worth, I did see someone experiment with different runtimes on a machine and the differences were barely perceptible, with just the slightest advantage to longer runtimes (nothing conclusive either way though). On the back of that, also with the bandwidth usage in mind, I admit, I decided to change from the default to 8hrs, but it's completely down to you.

If you're looking to maximise credit, I guess it's worth bearing in mind that if you have a rogue task, like the one you first reported, instead of over-running by 4hrs on a 1hr task (watchdog cuts in at runtime +4hrs) you lose 5 tasks worth of processing, whereas a default 6hr task will run for 10hrs, only losing 1.67 tasks worth of processing. This is very much splitting hairs though. Whatever suits you.
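The arithmetic behind "5 tasks worth" versus "1.67 tasks worth": a rogue task keeps running until the watchdog stops it at the runtime preference plus 4 hours, and that wasted time is measured in units of a normal task's runtime. A small sketch (names are illustrative):

```python
WATCHDOG_GRACE_H = 4  # watchdog cuts in at runtime + 4 hours


def tasks_lost_to_rogue(runtime_h: float, grace_h: float = WATCHDOG_GRACE_H) -> float:
    """Normal tasks' worth of CPU time consumed by one task the watchdog kills."""
    return (runtime_h + grace_h) / runtime_h
```

A 1hr preference gives (1 + 4) / 1 = 5 tasks lost; a 6hr preference gives 10 / 6, about 1.67.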

As Mod.Sense says, don't make a dramatic change. Either run down tasks first before switching and/or only change runtime by one step at a time. If 1000 tasks at 1hr suddenly became 1000 at 6hrs you'd have a problem!
____________

JugNut

Joined: Apr 30 12
Posts: 11
ID: 450030
Credit: 1,211,988
RAC: 13
Message 77891 - Posted 4 Feb 2015 8:14:05 UTC

Hi Sid,
It was me who had the wrong slant on things probably from skimming posts that I read before I read yours. Thanks again for your help I hope I can return the favour some day.

Cheers Greg

Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 475,455
RAC: 852
Message 77893 - Posted 4 Feb 2015 20:24:12 UTC
Last modified: 4 Feb 2015 20:26:07 UTC

Work unit 647152330 generated result files that were too big to upload when the work unit processing time limit is set to 24 hours. Please see my result log and the result log for someone who used a shorter work unit time limit.

Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 475,455
RAC: 852
Message 77895 - Posted 5 Feb 2015 4:36:37 UTC - in response to Message ID 77893.

Work unit 647152330 generated result files that were too big to upload when the work unit processing time limit is set to 24 hours. Please see my result log and the result log for someone who used a shorter work unit time limit.

I found the relevant BOINC event log entries by digging into the appropriate BOINC data directory. By default, this file is located at C:\ProgramData\BOINC\stdoutdae.old in Windows 7. The BOINC event log entries are listed below.
02-Feb-2015 13:14:03 [rosetta@home] Computation for task A__2_2015_01_29_B__2_2015_01_29_patchdock_split_02_150129_SAVE_ALL_OUT__242418_37_0 finished
02-Feb-2015 13:14:03 [rosetta@home] Output file A__2_2015_01_29_B__2_2015_01_29_patchdock_split_02_150129_SAVE_ALL_OUT__242418_37_0_0 for task A__2_2015_01_29_B__2_2015_01_29_patchdock_split_02_150129_SAVE_ALL_OUT__242418_37_0 exceeds size limit.
02-Feb-2015 13:14:03 [rosetta@home] File size: 65833683.000000 bytes. Limit: 50000000.000000 bytes

Unless the upload file size limit is raised, I will therefore have to change my preferences to 12-hour work units once my current work units drain out, to prevent this error.
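A rough sketch of why halving the runtime should help, assuming the output file grows roughly linearly with runtime (more runtime means more saved models). That scaling is an assumption, not something the log confirms. The sketch reads the size and limit from the event-log line quoted above; the function names are hypothetical.

```python
import re

# Matches the "File size: ... bytes. Limit: ... bytes" line from the BOINC event log above.
SIZE_LINE = re.compile(r"File size: ([\d.]+) bytes\. Limit: ([\d.]+) bytes")


def parse_size_line(line: str) -> tuple[float, float]:
    """Return (file_size_bytes, limit_bytes) from a BOINC size-limit log line."""
    match = SIZE_LINE.search(line)
    if match is None:
        raise ValueError("not a BOINC file-size line")
    return float(match.group(1)), float(match.group(2))


def max_runtime_under_limit(observed_bytes: float, observed_hours: float,
                            limit_bytes: float) -> float:
    """Estimate the longest runtime that stays under the limit.

    Assumes output size is proportional to runtime.
    """
    bytes_per_hour = observed_bytes / observed_hours
    return limit_bytes / bytes_per_hour


log_line = ("02-Feb-2015 13:14:03 [rosetta@home] File size: 65833683.000000 bytes. "
            "Limit: 50000000.000000 bytes")
size, limit = parse_size_line(log_line)
print(f"Estimated 12h output: {size * 12 / 24:.0f} bytes (limit {limit:.0f})")
print(f"Estimated max runtime: {max_runtime_under_limit(size, 24, limit):.1f} h")
```

Under that linear assumption a 12-hour run would produce roughly half the 24-hour output, comfortably under the 50,000,000-byte limit.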

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 9,805,706
RAC: 6,914
Message 77899 - Posted 6 Feb 2015 3:01:11 UTC - in response to Message ID 77895.

Blimey! That's a new one! I've never come across an output file that big, and I never knew there was a limit on file size either.
____________

Costa

Joined: Jul 19 15
Posts: 5
ID: 1127732
Credit: 6,519,588
RAC: 2
Message 78496 - Posted 26 Jul 2015 2:29:28 UTC

Do you do GPU crunching, or plan to?
Do you support NVIDIA and/or ATI?

P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 78498 - Posted 26 Jul 2015 4:26:25 UTC
Last modified: 26 Jul 2015 4:27:14 UTC

Short answer to both: NO.
____________



Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC