Incredible

Message boards : Number crunching : Incredible

Shoikan

Joined: 4 Apr 06
Posts: 14
Credit: 180,211
RAC: 0
Message 13742 - Posted: 14 Apr 2006, 18:14:27 UTC

To the attention of this project manager:

Your buggy client/WUs are wasting many valuable computing cycles.

Do not advertise it as a working project; it is in a beta state, at best.

PS: Sorry about the lousy English.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 13744 - Posted: 14 Apr 2006, 18:27:29 UTC

Rosetta acknowledges there are issues. It created an entirely separate project called Ralph (Rosetta Alpha) to make a good-faith effort to find and kill these bugs.

tony

PS: Your English is perfectly understandable.


BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 13745 - Posted: 14 Apr 2006, 18:37:20 UTC

How about giving a few example links to the failed WUs from your machine, and a description of your computer's OS, OS version, amount of RAM, CPU, CPU speed, and free HD space?


Shoikan

Joined: 4 Apr 06
Posts: 14
Credit: 180,211
RAC: 0
Message 13746 - Posted: 14 Apr 2006, 18:37:35 UTC

Then, why aren't they testing their new workunits in the testing environment?

I can't understand it.

Regards and thank you for replying.

Shoikan

Joined: 4 Apr 06
Posts: 14
Credit: 180,211
RAC: 0
Message 13747 - Posted: 14 Apr 2006, 18:39:39 UTC - in response to Message 13745.  

> How about giving a few example links to the failed WUs from your machine, and a description of your computer's OS, OS version, amount of RAM, CPU, CPU speed, and free HD space?

OK, I'll do it, but that still doesn't answer my question.

Regards.
anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 13748 - Posted: 14 Apr 2006, 18:41:28 UTC - in response to Message 13746.  

> Then, why aren't they testing their new workunits in the testing environment?
>
> I can't understand it.
>
> Regards and thank you for replying.

Here are some words from David Baker:

> As you know, I mistakenly sent out a large batch of jobs without properly testing them first on RALPH. I apologize again for the trouble this caused you over the weekend.


This was version 4.97

Anders n
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 13749 - Posted: 14 Apr 2006, 18:42:24 UTC - in response to Message 13746.  

> Then, why aren't they testing their new workunits in the testing environment?
>
> I can't understand it.
>
> Regards and thank you for replying.

I think they are, for the most part; a recent type of WU got released too soon. Dr. Baker apologized to us (the users), and I think it shouldn't happen again soon (i.e., he learned his lesson). I don't think they'd have created Ralph and be actively giving time to it if they weren't serious about improving our experience.

Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 13750 - Posted: 14 Apr 2006, 18:50:13 UTC
Last modified: 14 Apr 2006, 19:30:37 UTC

There have been several issues going on concurrently over the last few days, and I honestly can't blame many of the people who got upset.

1/ The Rosetta 4.98 software upgrade last Friday/Saturday, which had to be rolled back (see prior posts in this thread)

2/ The faulty WU series which Rhiju, a project scientist, asked to be aborted (judging by the number of pageviews in the forum, obviously fewer than 1% of active crunchers read these forums!)

Please do abort these workunits (below); otherwise, your client will continue to crunch the jobs until it times out (about 48 hours on a Windows machine). The good news is that we will give credit to all the jobs that time out, and are increasing the rigor of in-house testing to prevent this from happening in the future. And this little adventure helped us track down a pernicious bug in our code. Unfortunately, we don't yet have fixes for *all* the stuck jobs, though -- please continue to post info on other jobs that stop moving. It helps!

Jobs that should be aborted:
TRUNCATE_TERMINI_FULLRELAX_1enh__433
TRUNCATE_TERMINI_FULLRELAX_1b3aA_433
TRUNCATE_TERMINI_FULLRELAX_1ptq__433
TRUNCATE_TERMINI_FULLRELAX_2tif__433



3/ The "big" *_largescale_large_fullatom_relax_* WUs of the last few days, which are going to upset even more people. NOT because of bugs, but because of the combination of PC speed and BOINC interaction, they will effectively get "stuck" on slower PCs (non-24/7, switch-apps interval under 4 hours, or leave-in-mem-when-preempted = no). Those PCs won't be able to complete even ONE (1) model before Rosetta gets unloaded from memory to run another BOINC project, so the WU will start all over from scratch, ad infinitum, until it times out. Also, as progress stays at 1% for hours, people will (mistakenly) assume it's the 1% bug, abort the WUs, receive 0 credit, and complain.
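
As a rough illustration of point 3, here is a toy model of the restart loop. It is only a sketch under stated assumptions, not BOINC's actual scheduler: it assumes two projects alternating in equal time slices, that progress is saved only when a whole model finishes, and that unloading from memory discards any partial model. The function name, the 4-hour model time, and the 48-hour deadline are hypothetical values chosen for the example.

```python
def completed_models(model_hours, switch_hours, keep_in_memory, deadline_hours):
    """Count models finished before the deadline in a simplified
    two-project schedule (toy model, not BOINC's real scheduler)."""
    elapsed = 0.0    # wall-clock hours since the WU started
    progress = 0.0   # hours of work done on the current model
    done = 0
    while elapsed < deadline_hours:
        # Rosetta gets one time slice (cut short if the deadline arrives).
        time_slice = min(switch_hours, deadline_hours - elapsed)
        elapsed += time_slice
        progress += time_slice
        # Every full model's worth of work counts as a saved result.
        while progress >= model_hours:
            progress -= model_hours
            done += 1
        # Assumption: being unloaded from memory loses the partial model.
        if not keep_in_memory:
            progress = 0.0
        # The other project now runs for one slice.
        elapsed += switch_hours
    return done


# Hypothetical slow PC: one model needs 4 hours of CPU time.
print(completed_models(4, 1, False, 48))  # 1-hour slices, unloaded each time
print(completed_models(4, 1, True, 48))   # same slices, kept in memory
print(completed_models(4, 4, False, 48))  # 4-hour slices, unloaded each time
```

Under these assumptions, the first configuration never finishes a model, while either keeping the application in memory or lengthening the switch interval to at least one model's run time lets the WU make progress.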

Just look at the posts of the last few hours. This isn't pretty...

PS: Having said that, I should still add that I personally have had ONLY ONE (1) WU get stuck that I had to abort manually, in 3+ months of crunching for the project with 3 x P4 PCs (mostly 24/7). Maybe I've been lucky.
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 13793 - Posted: 14 Apr 2006, 23:12:00 UTC - in response to Message 13747.  

>> How about giving a few example links to the failed WUs from your machine, and a description of your computer's OS, OS version, amount of RAM, CPU, CPU speed, and free HD space?
>
> OK, I'll do it, but that still doesn't answer my question.
>
> Regards.

Shoikan:
The request was to make it possible to identify which problem or problems were affecting you, and point out any known cures for said problems.

The only errors I've had in the last few months were from buggy WUs, and one ghosted WU. And they've now promised to wait until the Ralph test runs are over before releasing WUs to Rosetta.

Shoikan

Joined: 4 Apr 06
Posts: 14
Credit: 180,211
RAC: 0
Message 13815 - Posted: 15 Apr 2006, 9:20:52 UTC - in response to Message 13793.  






©2024 University of Washington
https://www.bakerlab.org