Message boards : Number crunching : Incredible
Author | Message |
---|---|
Shoikan Send message Joined: 4 Apr 06 Posts: 14 Credit: 180,211 RAC: 0 |
To the attention of the project manager: your buggy client/WUs are wasting many valuable computing cycles. Do not advertise this as a working project; it is in a beta state, at best. PS: sorry about the lousy English. |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Rosetta acknowledges there are issues. It created an entirely separate project called Ralph (Rosetta Alpha) to make a good-faith effort to find and kill these bugs. tony PS: your English is perfectly understandable |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
How about giving a few example links to the failed WUs from your machine, and a description of your computer's OS, OS version, amount of Ram, cpu, and cpu speed, and free HD space? |
Shoikan Send message Joined: 4 Apr 06 Posts: 14 Credit: 180,211 RAC: 0 |
Then, why aren't they testing their new workunits in the testing environment? I can't understand it. Regards and thank you for replying. |
Shoikan Send message Joined: 4 Apr 06 Posts: 14 Credit: 180,211 RAC: 0 |
"How about giving a few example links to the failed WUs from your machine, and a description of your computer's OS, OS version, amount of Ram, cpu, and cpu speed, and free HD space?" OK, I'll do it, but that still doesn't answer my question. Regards. |
anders n Send message Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
"Then, why aren't they testing their new workunits in the testing environment?" Here are some words from David Baker: "As you know, I mistakenly sent out a large batch of jobs without properly testing them first on RALPH. I apologize again for the trouble this caused you over the weekend." This was version 4.97. Anders n |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
"Then, why aren't they testing their new workunits in the testing environment?" I think they are, for the most part; a recent type of WU got released too soon. Dr. Baker apologized to us (the users), and I think that it shouldn't happen again soon (i.e., he learned his lesson). I don't think they'd have created Ralph and be actively giving time to it if they weren't serious about improving our experience. |
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
There have been several issues going on concurrently over the last few days, and I honestly can't blame many of the people who got upset. 1/ The Rosetta 4.98 software upgrade last Friday/Saturday, which had to be rolled back (see prior posts in this thread). 2/ The faulty WU series which Rhiju, Project scientist, asked to be aborted (obviously, judging by the number of pageviews in the forum, fewer than 1% of active crunchers read these forums!): "Please do abort these workunits (below); otherwise, your client will continue to crunch the jobs until it times out (about 48 hours on a Windows machine). The good news is that we will give credit to all the jobs that time out, and are increasing the rigor of in-house testing to prevent this from happening in the future. And this little adventure helped us track down a pernicious bug in our code. Unfortunately, we don't yet have fixes for *all* the stuck jobs, though -- please continue to post info on other jobs that stop moving. It helps!" 3/ The "big" *_largescale_large_fullatom_relax_* WUs of the last few days, which are going to upset even more people, because they'll effectively get "stuck" (NOT due to bugs, but due to the combination of PC speed and BOINC interaction) on slower PCs / non-24/7 / switch-apps-every < 4hr / leave-in-mem-when-preempted=no. Those PCs won't be able to complete even ONE (1) model before Rosetta gets unloaded from memory to run another BOINC project, so it will start all over from scratch, ad infinitum, until the WU times out. Also, as progress stays at 1% for hours, people will (mistakenly) assume it's the 1% bug, abort the WUs, receive 0 credit, complain, etc. Just look at the posts of the last few hours. This isn't pretty... PS: Having said that, I should still add that I personally have had ONLY ONE (1) WU get stuck which I had to manually abort, in 3+ months of crunching for the project with 3 x P4 PCs (mostly 24/7). Maybe I've been lucky.
Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
"How about giving a few example links to the failed WUs from your machine, and a description of your computer's OS, OS version, amount of Ram, cpu, and cpu speed, and free HD space?" Shoikan: The request was to make it possible to identify which problem or problems were affecting you, and point out any known cures for said problems. The only errors I've had in the last few months were from buggy WUs, and one ghosted WU. And they've now promised to wait until the Ralph test runs are over before releasing WUs to Rosetta. |
Shoikan Send message Joined: 4 Apr 06 Posts: 14 Credit: 180,211 RAC: 0 |
|
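
For anyone trying to picture the restart loop Dimitris Hatzopoulos describes in point 3/, here is a rough toy model in Python. It is only a sketch under stated assumptions, not BOINC's or Rosetta's actual scheduling code: the function name, its parameters, and all the numbers are made up for illustration, and it assumes that work inside a model is lost whenever the app is unloaded from memory.

```python
def simulate_work_unit(model_minutes, switch_minutes, leave_in_memory, deadline_minutes):
    """Toy model (hypothetical, not BOINC code).

    Returns the minutes of run time needed to finish one model,
    or None if the deadline passes before a single model completes.
    """
    elapsed = 0    # run time consumed by this work unit so far
    progress = 0   # minutes of work kept toward the current model
    while elapsed < deadline_minutes:
        # Run until the next app switch, the end of the model, or the deadline.
        run = min(switch_minutes, model_minutes - progress, deadline_minutes - elapsed)
        elapsed += run
        progress += run
        if progress >= model_minutes:
            return elapsed
        if not leave_in_memory:
            # Assumption: no checkpoint inside a model, so unloading loses the work.
            progress = 0
    return None


# Hypothetical slow PC: one model takes 5 hours, apps switch every 60 minutes,
# and the work unit times out after 48 hours of run time.
print(simulate_work_unit(300, 60, leave_in_memory=False, deadline_minutes=48 * 60))  # None
print(simulate_work_unit(300, 60, leave_in_memory=True, deadline_minutes=48 * 60))   # 300
# Hypothetical faster PC: a model fits inside one switch interval.
print(simulate_work_unit(45, 60, leave_in_memory=False, deadline_minutes=48 * 60))   # 45
```

In this toy model, the job only gets stuck when a single model takes longer than the switch interval and the app is not kept in memory; changing either one lets it finish.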
©2024 University of Washington
https://www.bakerlab.org