Message boards : Number crunching : Minirosetta 3.52
Author | Message |
---|---|
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
Client error/Compute error: Task ID 671091234, Workunit 608788253, created 30 Jun 2014; Task ID 671559600, Workunit 608647393, created 2 Jul 2014; Task ID 671560728, Workunit 609197398, created 2 Jul 2014; Task ID 671576410, Workunit 609210460, created 2 Jul 2014; Task ID 671592963, Workunit 609224067, created 2 Jul 2014. As I get no response on this board, to whom do I need to escalate these issues? I will stop working until this problem is solved. Regards |
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 1 |
In my case, all units failing with client error/compute error have names starting with pd1_graftsheet_41limit_..... but only some of them fail; the majority complete OK. No WU starting with a different name is causing issues, at least not as repeatedly as this one. I don't think it is related to the length of the WU names, because other units with longer names complete OK. All my PCs run Ubuntu and the error message is always the same: BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish terminate called after throwing an instance of 'std::bad_alloc' what(): St9bad_alloc In nearly all cases the wingmen fail as well; I've only seen one case in which a wingman succeeded. In my case the failure is especially annoying because it occurs when the unit has been processing for a long time, while wingmen seem to fail right at the beginning. It is more difficult to trace errors because the database does not allow you to see long names, sort by status, etc. |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
"It is more difficult to trace errors because the database does not allow you to see long names, sort by status, etc." Yes, and the server software is very obsolete, but no one on this project bothers to update it, as most other projects did several years ago. Greetings, TJ. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,560,822 RAC: 6,600 |
"Yes, and the server software is very obsolete, but no one on this project bothers to update it." I'm ignorant about that, but... is it so difficult to update the BOINC server code? Do they have to recompile all the software? I found this; it seems to be a "standard" procedure: http://boinc.berkeley.edu/trac/wiki/ToolUpgrade |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"I'm ignorant about that, but... is it so difficult to update the BOINC server code? Do they have to recompile all the software?" It was reported in the past that Rosetta has quite a lot of modifications to the server software (this is not the first time that someone has asked for a new server version). Getting all of that working with server software a few generations newer might not be as easy a task as using Windows Update. Hence Rosetta will put off updating the servers for as long as possible; apparently even the issues with v7 clients were not reason enough. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,560,822 RAC: 6,600 |
"It was reported in the past that Rosetta has quite a lot of modifications to the server software (this is not the first time that someone has asked for a new server version). Getting all of that working with server software a few generations newer might not be as easy a task as using Windows Update." Thanks for the answer! |
Mark Darrall Joined: 9 Nov 08 Posts: 2 Credit: 432,591 RAC: 296 |
The project seems to be running okay so far, but the Minirosetta 3.52 graphics won't run; the window just returns "Not Responding." It doesn't seem to be affecting anything else. This is on an older Core2 x86 running 32-bit Vista with integrated Intel graphics. This system was recently rebuilt, so all drivers are up to date. Older versions of Rosetta have run fine in the past... Thank you! |
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
|
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I'm also seeing the occasional "bad_alloc" failure on Ubuntu 14.04. Sample: Task 673942668 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish terminate called after throwing an instance of 'std::bad_alloc' what(): St9bad_alloc </stderr_txt> |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Errors keep flowing in, but no answer from the project guys as usual. I have set it to very low priority. Greetings, TJ. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,560,822 RAC: 6,600 |
Some "benchmark_alex_metric" errors (after 2 h), like this: 673949101 upload failure: <file_xfer_error> |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,560,822 RAC: 6,600 |
674405695 <message> |
Monty Joined: 14 Oct 07 Posts: 4 Credit: 146,497 RAC: 0 |
Same here, like boboviz above me: Task ID: 674414117 |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,560,822 RAC: 6,600 |
'Some "benchmark_alex_metric" errors (after 2 h)' Erratum: A LOT of "benchmark_alex_metric" errors after 2 h |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
'Some "benchmark_alex_metric" errors (after 2 h)' We can keep posting about the errors but nothing changes, as ever here; we have to wait until all these crappy WUs are finished, or at least have run on our systems, using energy but not contributing to science. It's a pity. Greetings, TJ. |
sgaboinc Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
perhaps the rosetta designers need to enhance the app or even the web site so that the 'errors' are more meaningful, lol :) we've a scientist who at least came out to clarify one of the 'hot' errors. https://boinc.bakerlab.org/forum_thread.php?id=6485&nowrap=true#77013 it turns out that some errors can be simply due to the job not finding any structures, deemed 'dead end elimination' (i.e. either impossible combinations or perhaps algorithmic chaos). in addition, i'd think that rosetta@home and even boinc may need to look into granting credits for network bandwidth consumed (it makes sense after all). in these cases the errors are due to finding 'dead ends', and if that's true it is different from no result: it could mean that that particular combination leads to a 'dead end', hence no structures. while i'm no scientist in molecular simulations, i'm aware that some algorithms/solution searches can lead to systematic 'chaos' when things are sufficiently complex and non-linear http://en.wikipedia.org/wiki/Chaos_theory |
Sid Celery Joined: 11 Feb 08 Posts: 2121 Credit: 41,179,074 RAC: 11,480 |
Why is it that some tasks have had their deadline extended to 14 days while many remain at 10 days? Is it by accident or related to the extra users that have been added recently? |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I extended the deadline to 14 days and also updated the default run time to 6 hours during the recent chaos. Any input on this is appreciated, good or bad. I can revert to the previous values if necessary. Thanks. Remember, you can always set the run time preference to your liking, from as low as 1 hour to as long as 2 days. It is the "Target CPU run time" option in the Rosetta@home specific preferences. Also keep in mind that it's a target run time: if the job is a large protein, it may take longer than 1 hour to generate 1 model, so the actual run time can exceed the target run time (at least 1 model is always generated). |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Up to 2 days? So that's new too. It used to be that 1 day was the top. 2 days will be great for full-time crunchers: it minimizes network bandwidth, which is great if your bandwidth is capped, and it minimizes hits on the project servers. Just beware that if you make changes, the work you've already downloaded will switch to the new target runtime. So it is best to make changes when your cache of outstanding work is low, and not to bump up the days-between-connections setting at the same time. You want BOINC Manager to see a few new WUs complete before changing the network settings. Again, this is just to help prevent it from getting way too much work. It's not the end of the world; you can always abort a few, or adjust your runtime preference back down a couple of notches. Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start. DK, please post a notice on the project news and perhaps the Twitter feed as well. Rosetta Moderator: Mod.Sense |
Defender Joined: 22 Mar 08 Posts: 10 Credit: 13,517,861 RAC: 502 |
Since you are talking about Twitter: why has there been no activity on the Facebook page since 2010? https://www.facebook.com/pages/Rosettahome/161671540539170 |
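
David E K's description of the "Target CPU run time" preference can be illustrated with a rough sketch. This is not Rosetta or BOINC code: the function name `simulate_run` and its parameters are hypothetical, and it assumes that a task always finishes one model and then starts another only if the next model is expected to fit within the target.

```python
def simulate_run(target_hours, hours_per_model):
    """Return (models_completed, actual_hours) for a hypothetical task.

    Assumptions (not taken from Rosetta's source): every model takes the
    same time, the first model is always completed, and a further model
    is started only if it would finish within the target run time.
    """
    if target_hours <= 0 or hours_per_model <= 0:
        raise ValueError("both durations must be positive")

    # At least one model is always generated, even if it overshoots the target.
    models = 1
    elapsed = hours_per_model

    # Keep going only while the next model still fits inside the target.
    while elapsed + hours_per_model <= target_hours:
        elapsed += hours_per_model
        models += 1

    return models, elapsed


# A large protein with a 1-hour target: one model, run time exceeds the target.
print(simulate_run(1, 3.0))   # (1, 3.0)
# The 6-hour default with 1.5-hour models: four models fit exactly.
print(simulate_run(6, 1.5))   # (4, 6.0)
```

Under these assumptions a longer target mainly means more models per task, which fits Mod.Sense's point about fewer downloads and fewer hits on the project servers.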
©2024 University of Washington
https://www.bakerlab.org