Minirosetta 3.52

Message boards : Number crunching : Minirosetta 3.52

Killersocke@rosetta

Joined: 13 Nov 06
Posts: 29
Credit: 2,579,125
RAC: 0
Message 76948 - Posted: 2 Jul 2014, 13:55:21 UTC

Client error/Compute error

Task ID 671091234
Workunit 608788253
Created 30 Jun 2014

Task ID 671559600
Workunit 608647393
Created 2 Jul 2014

Task ID 671560728
Workunit 609197398
Created 2 Jul 2014

Task ID 671576410
Workunit 609210460
Created 2 Jul 2014

Task ID 671592963
Workunit 609224067
Created 2 Jul 2014

Since I am getting no response on this board:
To whom do I need to escalate these issues?

I will stop working until this problem is solved.

regards
Trotador
Joined: 30 May 09
Posts: 108
Credit: 273,488,896
RAC: 105,997
Message 76963 - Posted: 6 Jul 2014, 10:10:54 UTC
Last modified: 6 Jul 2014, 10:11:43 UTC

In my case, all units failing with client error/compute error have names starting with:

pd1_graftsheet_41limit_.....

but only some of them fail; the majority complete OK. No WU starting with a different name is causing issues, at least not as repeatedly as this one.

I don't think it is related to the length of the WU names, because there are other units with longer names that complete OK.

All my PCs are Ubuntu and the error message is always the same one:

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc

In nearly all cases the wingmen fail as well; I've only seen one case in which a wingman succeeded.

In my case the failure is especially annoying because it occurs after the unit has been processing for a long time, whereas wingmen seem to fail right at the beginning.

It is more difficult to trace errors because the database does not let you see long names, sort by status, etc.
TJ
Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 76967 - Posted: 7 Jul 2014, 14:29:11 UTC - in response to Message 76963.  

It is more difficult to trace errors because the database does not let you see long names, sort by status, etc.


Yes, and the server software is very obsolete, but no one on this project bothers to update it, as most other projects did several years back.
Greetings,
TJ.
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1866
Credit: 8,186,159
RAC: 7,029
Message 76968 - Posted: 7 Jul 2014, 15:07:09 UTC

Yes, and the server software is very obsolete, but no one on this project bothers to update it


I'm ignorant about that, but... is it so difficult to update the BOINC server code? Do they have to recompile all the software?
I found this; it seems to be a "standard" procedure:
http://boinc.berkeley.edu/trac/wiki/ToolUpgrade
Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 76973 - Posted: 8 Jul 2014, 9:01:17 UTC - in response to Message 76968.  

I'm ignorant about that, but... is it so difficult to update the BOINC server code? Do they have to recompile all the software?
I found this; it seems to be a "standard" procedure:
http://boinc.berkeley.edu/trac/wiki/ToolUpgrade

It was reported in the past that Rosetta has quite a lot of modifications to the server software (this is not the first time that someone has asked for a new server version). Getting all of that working with server software a few generations newer might not be as easy a task as using Windows Update. Hence Rosetta will put off updating the servers for as long as possible; apparently even the issues with v7 clients were not reason enough.
.
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1866
Credit: 8,186,159
RAC: 7,029
Message 76975 - Posted: 8 Jul 2014, 9:44:25 UTC

It was reported in the past that Rosetta has quite a lot of modifications to the server software (this is not the first time that someone has asked for a new server version). Getting all of that working with server software a few generations newer might not be as easy a task as using Windows Update.


Thanks for the answer!
Mark Darrall
Joined: 9 Nov 08
Posts: 2
Credit: 406,925
RAC: 151
Message 76995 - Posted: 12 Jul 2014, 11:43:47 UTC

The project seems to be running okay so far, but the Minirosetta 3.52 graphics won't run; the window just returns "Not Responding." It doesn't seem to be affecting anything else.

This is on an older Core2 x86 running 32-bit Vista with integrated Intel graphics. This system was recently rebuilt, so all drivers are up to date.

Older versions of Rosetta have run fine in the past...

Thank you!
Killersocke@rosetta
Joined: 13 Nov 06
Posts: 29
Credit: 2,579,125
RAC: 0
Message 76997 - Posted: 12 Jul 2014, 18:13:37 UTC

OK guys,
this project is now dead for me.

Me

Good bye
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 77000 - Posted: 12 Jul 2014, 20:40:22 UTC

I'm also seeing the occasional "bad_alloc" failure on Ubuntu 14.04.

Sample: Task 673942668

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc

</stderr_txt>
TJ
Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 77004 - Posted: 12 Jul 2014, 22:19:45 UTC

Errors keep flowing in, but no answer from the project guys as usual.
I have set it to very low priority.
Greetings,
TJ.
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1866
Credit: 8,186,159
RAC: 7,029
Message 77009 - Posted: 14 Jul 2014, 9:32:21 UTC

Some "benchmark_alex_metric" errors (after 2 h), like this:
673949101

upload failure: <file_xfer_error>
<file_name>benchmark_0023_alex_metric_332d631779a7456b4ee7d35a7bbca2b0522dcada_ploops_42_input_0002_no_lig_fragments_contact_opt_iteration_6_a2d64bd69c1740e4b71103f0f2d70098_fold_SAVE_ALL_OUT_173723_311_1_0</file_name>
<error_code>-161 (not found)</error_code>

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1866
Credit: 8,186,159
RAC: 7,029
Message 77011 - Posted: 14 Jul 2014, 12:41:40 UTC

674405695

<message>
(unknown error) - exit code -529697949 (0xe06d7363)
</message>
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x75961D4D
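
A note on reading this message: the decimal exit code and the hex value in parentheses are the same 32-bit number, once signed and once unsigned. 0xe06d7363 is the code the Microsoft C++ runtime uses for a thrown C++ exception (its low three bytes spell "msc"), which fits the "Out Of Memory (C++ Exception)" reason and looks like the Windows counterpart of the std::bad_alloc reports from Ubuntu earlier in the thread. A minimal sketch of the conversion, with hypothetical helper names:

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def describe_exit_code(code: int) -> str:
    """Show the hex form of an exit code plus its low three bytes as ASCII."""
    unsigned = to_unsigned32(code)
    # The low three bytes of the MSVC C++ exception code are ASCII letters.
    tag = (unsigned & 0x00FFFFFF).to_bytes(3, "big").decode("ascii", errors="replace")
    return f"{unsigned:#010x} (low bytes: {tag!r})"


# Exit code copied from the task output quoted above.
print(describe_exit_code(-529697949))
```

The code only says that some C++ exception escaped; the "Out Of Memory" text in the exception record is what points to a failed allocation.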


Monty
Joined: 14 Oct 07
Posts: 4
Credit: 146,497
RAC: 0
Message 77012 - Posted: 14 Jul 2014, 18:02:04 UTC

Same here, like boboviz above me:

Task ID: 674414117



[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1866
Credit: 8,186,159
RAC: 7,029
Message 77016 - Posted: 16 Jul 2014, 8:21:58 UTC

Some "benchmark_alex_metric" errors (after 2 h)


Erratum: A LOT of "benchmark_alex_metric" errors after 2h
TJ
Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 77019 - Posted: 16 Jul 2014, 9:49:46 UTC - in response to Message 77016.  

Some "benchmark_alex_metric" errors (after 2 h)


Erratum: A LOT of "benchmark_alex_metric" errors after 2h

We can keep posting about the errors, but nothing changes, as ever here. We have to wait until all these crappy WUs are finished, or at least have run on our systems, using energy but not contributing to science. It's a pity.
Greetings,
TJ.
sgaboinc
Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 77035 - Posted: 19 Jul 2014, 13:13:48 UTC
Last modified: 19 Jul 2014, 13:25:24 UTC

perhaps the rosetta designers need to enhance the app or even the web so that the 'errors' are more meaningful, lol :)

we've a scientist who at least came out to clarify one of the 'hot' errors.
https://boinc.bakerlab.org/forum_thread.php?id=6485&nowrap=true#77013

it turns out that some errors can be simply due to the job not finding any structures, deemed 'dead end elimination' (i.e. either impossible combinations or perhaps algorithmic chaos)

in addition, i'd think that rosetta@home and even boinc may need to look into granting credits for network bandwidth consumed (it makes sense after all). in these cases the errors are due to finding 'dead ends', and if that's true it is different from no result: it could mean that the particular combination leads to a 'dead end', hence no structures

while i'm no scientist in molecular simulations, i'm aware that some algorithms/solution search can lead to systematic 'chaos' when things are sufficiently complex and non-linear

http://en.wikipedia.org/wiki/Chaos_theory


Sid Celery
Joined: 11 Feb 08
Posts: 1990
Credit: 38,547,106
RAC: 16,165
Message 77332 - Posted: 14 Aug 2014, 2:43:35 UTC

Why is it that some tasks have had their deadline extended to 14 days while many remain at 10 days? Is it by accident or related to the extra users that have been added recently?
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77333 - Posted: 14 Aug 2014, 3:20:29 UTC

I extended the deadline to 14 days and also updated the default run time to 6 hours during the recent chaos. Any input on this is appreciated, good or bad. I can revert to the previous values if necessary. Thanks.

Remember, you can always set the run time preference to your liking, from as low as 1 hr to as long as 2 days. It is the "Target CPU run time" option in the Rosetta@home specific preferences. Also keep in mind that it's a target run time but if the job is a large protein, it may take longer than 1 hour to generate 1 model so the actual run time can exceed the target run time (at least 1 model is generated).
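
To illustrate why the actual run time can exceed the target, here is a rough sketch of a "target run time" loop. It is not Rosetta's code: the function name `run_task`, the per-model costs, and the assumption that the clock is checked only between models are all hypothetical.

```python
def run_task(target_seconds, model_costs):
    """Simulate a task that generates models until the target run time is reached.

    model_costs lists the (hypothetical) seconds needed for each model.
    Returns (models_generated, elapsed_seconds).
    """
    elapsed = 0.0
    models = 0
    for cost in model_costs:
        # Always finish at least one model; after that, stop as soon as
        # the target run time has been reached.
        if models >= 1 and elapsed >= target_seconds:
            break
        elapsed += cost
        models += 1
    return models, elapsed


# Small protein: 10-minute models, 1-hour target.
print(run_task(3600, [600] * 10))
# Large protein: 90-minute models, 1-hour target. One model still completes.
print(run_task(3600, [5400] * 10))
```

In the second call the task overshoots the 1-hour target because the single required model alone takes 90 minutes.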
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77334 - Posted: 14 Aug 2014, 14:43:52 UTC

Up to 2 days? So that's new too. Used to be that 1 day was the top.

2 days will be great for full-time crunchers. Minimize network bandwidth, great if your bandwidth is capped. Minimize hits on the project servers.

Just beware that if you make changes, the work you've already downloaded will switch to the new target runtime. So, best to make changes when cache of outstanding work is low, and not to bump up the days between connection time setting at the same time. You want BOINC Manager to see a few new WUs complete before changing the network settings. Again, just to help prevent it getting way too much work. Not the end of the world, you can always abort a few, or adjust your runtime preference back down a couple notches.

Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start.
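
A rough sketch of why the estimate matters, under a simplified model in which the client predicts run time as estimated FLOPs divided by host speed, scaled by the duration correction factor (DCF). The helper names, the host speed, and the assumption that the previous default run time was 3 hours are illustrative only, not taken from BOINC's source or from the project.

```python
def estimated_runtime(fpops_est, host_flops, dcf=1.0):
    """Predicted run time in seconds under the simplified model."""
    return fpops_est / host_flops * dcf


def implied_dcf(actual_seconds, fpops_est, host_flops):
    """Correction factor the client would need to match the observed run time."""
    return actual_seconds / (fpops_est / host_flops)


HOST_FLOPS = 2e9                      # hypothetical host speed
OLD_ESTIMATE = 3 * 3600 * HOST_FLOPS  # estimate sized for an assumed 3-hour default
NEW_TARGET = 6 * 3600                 # new 6-hour default run time, in seconds

# Estimate left unchanged: the client has to learn a DCF of about 2.
print(implied_dcf(NEW_TARGET, OLD_ESTIMATE, HOST_FLOPS))
# Estimate doubled along with the run time: DCF stays near 1 from the start.
print(implied_dcf(NEW_TARGET, 2 * OLD_ESTIMATE, HOST_FLOPS))
```

Until the correction factor catches up, the client would underestimate each task by roughly half and could fetch too much work, which is the risk described above.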

DK, please post a notice on the project news and perhaps the twitter feed as well.
Rosetta Moderator: Mod.Sense
Defender
Joined: 22 Mar 08
Posts: 10
Credit: 13,486,408
RAC: 0
Message 77336 - Posted: 14 Aug 2014, 18:34:43 UTC

Because you are talking about twitter: Why is there no activity on the Facebook page since 2010? https://www.facebook.com/pages/Rosettahome/161671540539170



©2024 University of Washington
https://www.bakerlab.org