Posts by Shoikan

1) Message boards : Number crunching : two "validate errors" in 24 hours (Message 52759)
Posted 27 Apr 2008 by Shoikan
Post:
Well, that's it. Two in a row today convinced me to leave this project. It's a pity though, because I think that the science behind it is very promising. But something has to be done about the software.

I'll gladly reattach when the serious issues with WU validation are addressed.

Regards

PS: Sorry about the lousy English
2) Message boards : Number crunching : two "validate errors" in 24 hours (Message 52754)
Posted 27 Apr 2008 by Shoikan
Post:
What I mean is that I'm getting one of these errors almost every day, and they don't look like compute errors: the WUs run to completion, then error out when they are uploaded.

It's a waste of cycles, methinks.
3) Message boards : Number crunching : two "validate errors" in 24 hours (Message 52738)
Posted 26 Apr 2008 by Shoikan
Post:
Validate errors keep happening on a regular basis. Is anyone else experiencing this same problem?

Regards.

4) Message boards : Number crunching : two "validate errors" in 24 hours (Message 52609)
Posted 19 Apr 2008 by Shoikan
Post:
Hi Mod Sense, thanks for replying.

These are the messages related to the last WU that failed to validate:

18/04/2008 7:08:37|rosetta@home|Starting 1bm8__CONTROL_ABRELAX_040808_FRAGS_UNCONDENSED_SAVE_ALL_OUT_-1bm8_-__3079_2612_0
18/04/2008 7:08:37|rosetta@home|Starting task 1bm8__CONTROL_ABRELAX_040808_FRAGS_UNCONDENSED_SAVE_ALL_OUT_-1bm8_-__3079_2612_0 using rosetta_beta version 596
18/04/2008 11:12:36|rosetta@home|Restarting task 1bm8__CONTROL_ABRELAX_040808_FRAGS_UNCONDENSED_SAVE_ALL_OUT_-1bm8_-__3079_2612_0 using rosetta_beta version 596
18/04/2008 12:44:27|rosetta@home|Computation for task 1bm8__CONTROL_ABRELAX_040808_FRAGS_UNCONDENSED_SAVE_ALL_OUT_-1bm8_-__3079_2612_0 finished
18/04/2008 12:44:33|rosetta@home|Started upload of 1bm8__CONTROL_ABRELAX_040808_FRAGS_UNCONDENSED_SAVE_ALL_OUT_-1bm8_-__3079_2612_0_0
18/04/2008 12:44:36|rosetta@home|Finished upload of 1bm8__CONTROL_ABRELAX_040808_FRAGS_UNCONDENSED_SAVE_ALL_OUT_-1bm8_-__3079_2612_0_0

Seems pretty normal to me; nothing suggests anything abnormal happened to the WU.

Regards
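
For anyone who wants to check a log like the one above without reading timestamps by eye, here is a minimal Python sketch. It assumes the `date time|project|message` layout seen in the pasted lines (day/month/year, 24-hour clock); the helper names `parse_line` and `first_time` are made up for illustration and are not part of BOINC. The elapsed time it prints is wall-clock time between the first start and the finish message, not CPU time.

```python
from datetime import datetime

# Task name copied from the log lines in the post.
TASK = "1bm8__CONTROL_ABRELAX_040808_FRAGS_UNCONDENSED_SAVE_ALL_OUT_-1bm8_-__3079_2612_0"

# Four of the pasted log lines, rebuilt with the task name substituted in.
LOG = f"""\
18/04/2008 7:08:37|rosetta@home|Starting task {TASK} using rosetta_beta version 596
18/04/2008 11:12:36|rosetta@home|Restarting task {TASK} using rosetta_beta version 596
18/04/2008 12:44:27|rosetta@home|Computation for task {TASK} finished
18/04/2008 12:44:36|rosetta@home|Finished upload of {TASK}_0
"""

def parse_line(line):
    """Split 'timestamp|project|message' into (datetime, project, message)."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, message

def first_time(events, prefix):
    """Return the timestamp of the first event whose message starts with prefix."""
    return next(t for t, _, msg in events if msg.startswith(prefix))

events = [parse_line(line) for line in LOG.splitlines() if line.strip()]
started = first_time(events, "Starting task")
finished = first_time(events, "Computation for task")
uploaded = first_time(events, "Finished upload")

wall_clock = finished - started
restarts = sum(msg.startswith("Restarting task") for _, _, msg in events)

print("wall-clock time from start to finish:", wall_clock)
print("restarts seen:", restarts)
print("upload completed after computation finished:", uploaded > finished)
```

A single restart and an upload that completes right after the computation agree with the reading above that nothing unusual shows up on the client side.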
5) Message boards : Number crunching : two "validate errors" in 24 hours (Message 52601)
Posted 18 Apr 2008 by Shoikan
Post:
Another one today :(

It's really annoying. Can't anybody tell me what's happening?

Regards
6) Message boards : Number crunching : two "validate errors" in 24 hours (Message 52481)
Posted 15 Apr 2008 by Shoikan
Post:
If I'm not wrong, this error is related to some problem on the Rosetta server end that loses the WUs.

Is this normal? Can I do something on my end to fix it?

Regards.
7) Message boards : Number crunching : Report Problems with Rosetta Version 5.25 (Message 19678)
Posted 2 Jul 2006 by Shoikan
Post:
I've had 5 FRA_t329 WUs result in compute errors after just a few seconds of computing, all in a row and on different computers.

Here they are:

26659064
26667400
26668807
26669392
26670318

Regards
8) Message boards : Number crunching : Miscellaneous Work Unit Errors Version 5.01 (Message 14289)
Posted 21 Apr 2006 by Shoikan
Post:
First 5.01 error:

17794627

Error code (an oldie):

<message> - exit code -1073741811 (0xc000000d)
</message>

Regards.
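
The two numbers in that error message are the same value written two ways: the exit code is printed as a signed 32-bit integer and then again in hexadecimal. A small Python sketch of the conversion, with `to_unsigned32` as a made-up helper name:

```python
def to_unsigned32(code):
    """Reinterpret a signed 32-bit integer as its unsigned 32-bit bit pattern."""
    return code & 0xFFFFFFFF

exit_code = -1073741811                       # value quoted in the post
hex_code = f"0x{to_unsigned32(exit_code):08x}"
print(hex_code)                               # 0xc000000d
```

On Windows, 0xC000000D is the NTSTATUS value STATUS_INVALID_PARAMETER. The post does not say what triggered it, so this only helps when searching for the code, not when diagnosing the cause.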
9) Message boards : Number crunching : Miscellaneous Work Unit Errors - II (Message 14032)
Posted 18 Apr 2006 by Shoikan
Post:
10) Message boards : Number crunching : No checkpoint in more than 1 hour - Largescale_large_fullatom... (Message 13944)
Posted 17 Apr 2006 by Shoikan
Post:
Thanks for the quick replies!

So do I need to have my switch time > time for the entire WU to complete?
Isn't there any checkpointing?

-Sid


ALL Work Units will checkpoint at the completion of a model. For some Work Units this means every 5 minutes; for larger ones this could mean 5 or 6 hours. Also, ALL Work Units will complete AT LEAST one model, no matter how you set your user-selectable time setting.

The BEST answer if you can do it, is to set your preferences to keep the application in memory during a swap. You could try to set the swap time to 4+ hours, but there is no guarantee that that will make it to a checkpoint. It depends on the size of the protein.

Also keep in mind that "keep in memory" only works if you do not turn your machine off, or stop BOINC for some reason, as these actions would also remove the application from memory.
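
To put rough numbers on the explanation above, here is a simplified Python sketch of one run slice between application switches. It is an illustration under stated assumptions, not how BOINC or Rosetta actually schedule work: every model is assumed to take the same time, the only checkpoint is at model completion, and without "keep in memory" the task is assumed to restart from its last checkpoint after each switch. The function name `slice_outcome` is made up.

```python
def slice_outcome(model_minutes, switch_minutes, keep_in_memory):
    """Return (minutes_kept, minutes_lost) for one run slice between switches.

    Assumptions (not taken from BOINC itself): equal-length models,
    checkpoints only at model completion, and a restart from the last
    checkpoint whenever the application is removed from memory.
    """
    if keep_in_memory:
        # Nothing is discarded: the task resumes exactly where it stopped.
        return switch_minutes, 0
    # Only whole models completed within the slice reach a checkpoint.
    kept = (switch_minutes // model_minutes) * model_minutes
    return kept, switch_minutes - kept

# Small models (5 minutes each) with a 60-minute switch time: nothing lost.
print(slice_outcome(5, 60, keep_in_memory=False))     # (60, 0)
# Large models (5 hours each) with a 60-minute switch time: the whole slice is lost.
print(slice_outcome(300, 60, keep_in_memory=False))   # (0, 60)
# Raising the switch time to 4 hours still never reaches the 5-hour checkpoint.
print(slice_outcome(300, 240, keep_in_memory=False))  # (0, 240)
# Keeping the application in memory avoids the loss.
print(slice_outcome(300, 60, keep_in_memory=True))    # (60, 0)
```

Under these assumptions, a switch time shorter than one model means no progress is ever saved, which is why keeping the application in memory is given as the best answer.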


This issue has to be addressed ASAP. Many cycles go directly to the trash can because of this. An improved checkpointing system should be the #1 priority on the TO DO list of Rosie's development team.

Regards.
11) Message boards : Number crunching : Incredible (Message 13815)
Posted 15 Apr 2006 by Shoikan
Post:
12) Message boards : Number crunching : Incredible (Message 13747)
Posted 14 Apr 2006 by Shoikan
Post:
How about giving a few example links to the failed WUs from your machine, and a description of your computer's OS, OS version, amount of Ram, cpu, and cpu speed, and free HD space?


OK, I'll do it, but you still haven't replied to my question.

Regards.
13) Message boards : Number crunching : Incredible (Message 13746)
Posted 14 Apr 2006 by Shoikan
Post:
Then, why aren't they testing their new workunits in the testing environment?

I can't understand it.

Regards and thank you for replying.
14) Message boards : Number crunching : Incredible (Message 13742)
Posted 14 Apr 2006 by Shoikan
Post:
To the attention of this project's manager:

Your buggy client/WUs are wasting many valuable computing cycles.

Do not advertise it as a working project; it is in a beta state, at best.

PS: Sorry about the lousy English.






©2024 University of Washington
https://www.bakerlab.org