Help us solve the 1% bug!

Ledo
Joined: 22 Feb 06
Posts: 3
Credit: 42,171
RAC: 0
Message 11234 - Posted: 23 Feb 2006, 10:03:15 UTC

I attached to the project yesterday, and since then I haven't been able to finish downloading the following file: avgE_from_pdb.gz.

This is the error I got when the download failed:

2/23/2006 9:54:38 AM|rosetta@home|Started download of avgE_from_pdb.gz
2/23/2006 9:54:39 AM|rosetta@home|Temporarily failed download of avgE_from_pdb.gz: error 403

Should I detach from the project and try again to see whether the problem is solved? (Without this file, my WUs are permanently stuck in "downloading" status on the Work tab.)
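
The two log lines quoted in this post share a simple layout: a timestamp, the project name, and the message text, separated by `|`. As a minimal sketch, assuming that layout holds for other lines too (this thread does not confirm it), the Python below pulls the file name and error code out of "Temporarily failed download" messages. The names `LogEntry`, `parse_line`, and `failed_downloads` are hypothetical helpers, not part of BOINC. If the 403 is an HTTP status code, it would mean "Forbidden", i.e. the server refused the request.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class LogEntry:
    """One message-log line (hypothetical structure, inferred from this thread)."""
    timestamp: str
    project: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split a 'timestamp|project|message' line; return None if it is malformed."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    return LogEntry(*parts)


# Matches the failure message exactly as it is worded in the quoted log.
FAILED = re.compile(r"Temporarily failed download of (\S+): error (\d+)")


def failed_downloads(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (file_name, error_code) for every failed-download message."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        match = FAILED.search(entry.message)
        if match:
            yield match.group(1), int(match.group(2))


log = [
    "2/23/2006 9:54:38 AM|rosetta@home|Started download of avgE_from_pdb.gz",
    "2/23/2006 9:54:39 AM|rosetta@home|Temporarily failed download of avgE_from_pdb.gz: error 403",
]
print(list(failed_downloads(log)))  # [('avgE_from_pdb.gz', 403)]
```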
ID: 11234 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 11242 - Posted: 23 Feb 2006, 12:54:10 UTC - in response to Message 11234.  
Last modified: 23 Feb 2006, 12:55:07 UTC

Ledo wrote:
I attached to the project yesterday, and since then I haven't been able to finish downloading the following file: avgE_from_pdb.gz.

This is the error I got when the download failed:

2/23/2006 9:54:38 AM|rosetta@home|Started download of avgE_from_pdb.gz
2/23/2006 9:54:39 AM|rosetta@home|Temporarily failed download of avgE_from_pdb.gz: error 403

Should I detach from the project and try again to see whether the problem is solved? (Without this file, my WUs are permanently stuck in "downloading" status on the Work tab.)

It appears you lost your socket connection to Rosetta. Can you retry the communication? Does it yield the same result? It should automatically try to finish the download that got interrupted.
ID: 11242 · Rating: 0
Snake Doctor
Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 11244 - Posted: 23 Feb 2006, 13:09:59 UTC - in response to Message 10926.  

Today I have the same problem, but I'm not sure whether this is really a bug or just a very large WU. The client isn't frozen and the step counter is rising (step 1.544.555 so far), but progress has been at 1% for 1.20 hours.

greetings
Thorm

The percentage done seems to be updated after a model/stage is completed. Your Athlon processor is slow by today's standards, so it makes sense that you would see longer periods between updates than someone with a faster processor. I've looked at your results and your machine seems to be doing fine.


There is an item in the FAQs thread that discusses this issue.

We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
ID: 11244 · Rating: 0
Ledo
Joined: 22 Feb 06
Posts: 3
Credit: 42,171
RAC: 0
Message 11249 - Posted: 23 Feb 2006, 15:04:05 UTC - in response to Message 11242.  
Last modified: 23 Feb 2006, 15:27:55 UTC

Astro wrote:
It appears you lost your socket connection to Rosetta. Can you retry the communication? Does it yield the same result? It should automatically try to finish the download that got interrupted.


I aborted the download of this file, and the WU immediately finished with the status "client error downloading". BOINC then downloaded another WU, which has several files, and it got stuck on that same file again. I hit the Retry button several times and the result is the same: error 403. I have other projects running on this PC and there is no problem with them.

Edit: I've now seen the system requirements. The PC I attached is a 350 MHz machine, which is below the minimum required for this project; that's why I can't finish the download. Time to change to another project.
ID: 11249 · Rating: 0
aguiar@carrier.com.br
Joined: 19 Feb 06
Posts: 6
Credit: 367,089
RAC: 0
Message 11320 - Posted: 24 Feb 2006, 13:51:51 UTC
Last modified: 24 Feb 2006, 14:14:32 UTC

Hi! I'm a newcomer to BOINC and very interested in Rosetta. I downloaded a WU yesterday and now it is running on my computer.

I am following the graphics to find out if it is working well (since the BOINC program says it is only 1% complete and time to complete is now 15:23 and increasing instead of decreasing). Graphics now read as follows:

Workunit: PRODUCTION_ABINITIO_INCREASECYCLES50_1urnA_317_213
1% Complete
CPU time: 2 hr 18 min 23 sec (and counting)
Stage: Ab initio
Model: 1
Step: 1299394 (and counting)

Is that OK? Is it running correctly? If so, how much time would be required (approximately) to finish this WU?

Thanks and regards,
Valter Aguiar.

Edit: I just noticed that it stepped back. It is now at step 151736 and counting. This is the second time this has happened with this same WU.
ID: 11320 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11322 - Posted: 24 Feb 2006, 13:58:26 UTC
Last modified: 24 Feb 2006, 14:01:21 UTC

For people having many work Unit Errors!!

I have received an e-mail from Dr. Baker with information for any of you who are having a lot of Work Unit errors.

"Could you help us to recommend to people having problems with lots of WU to set the target run time to a smaller value like 2 hours. We think there aren't any new bugs, just with longer run times it is more likely for a WU to have problems."

So if you are having a lot of errors please reset your Time setting to 2 hours and see if that helps.
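
The reasoning in Dr. Baker's e-mail can be illustrated with a little arithmetic. This is only a sketch: it assumes each hour of crunching fails independently with the same probability, and the 2% per-hour figure is made up for illustration, not measured from Rosetta. `completion_probability` is a hypothetical name.

```python
def completion_probability(failure_rate_per_hour: float, run_time_hours: float) -> float:
    """Chance that a WU survives its whole run, assuming a constant,
    independent failure probability for every hour (an assumption)."""
    return (1.0 - failure_rate_per_hour) ** run_time_hours


# Hypothetical 2% chance of an error in any given hour.
for hours in (2, 10, 24):
    chance = completion_probability(0.02, hours)
    print(f"{hours:>2} h target run time: {chance:.1%} chance of finishing")
```

Under this simple model a shorter target run time raises the chance that any one WU finishes, but the failure rate per hour of crunching stays the same.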

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 11322 · Rating: 0
Nite Owl
Joined: 2 Nov 05
Posts: 87
Credit: 3,019,449
RAC: 0
Message 11343 - Posted: 24 Feb 2006, 19:08:28 UTC - in response to Message 11322.  
Last modified: 24 Feb 2006, 19:09:43 UTC

Moderator9 wrote:
For people having many work Unit Errors!!

I have received an e-mail from Dr. Baker with information for any of you who are having a lot of Work Unit errors.

"Could you help us to recommend to people having problems with lots of WU to set the target run time to a smaller value like 2 hours. We think there aren't any new bugs, just with longer run times it is more likely for a WU to have problems."

So if you are having a lot of errors please reset your Time setting to 2 hours and see if that helps.

That kinda defeats the whole purpose of having an adjustable Target Runtime, doesn't it?
ID: 11343 · Rating: 0
Angus
Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 11349 - Posted: 24 Feb 2006, 21:41:12 UTC - in response to Message 11343.  
Last modified: 24 Feb 2006, 21:41:27 UTC

Nite Owl wrote:
That kinda defeats the whole purpose of having an adjustable Target Runtime, doesn't it?


Right. It doesn't do much for the dial-up and restricted-bandwidth users' issue. The whole point of extending the run time was to reduce the download frequency.

If the WUs will only run reliably for 2 hours, then there are still problems to be solved.

Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)
ID: 11349 · Rating: -1
AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 11361 - Posted: 25 Feb 2006, 0:56:57 UTC - in response to Message 11343.  

Nite Owl wrote:
For people having many work Unit Errors!!
...

That kinda defeats the whole purpose of having an adjustable Target Runtime, doesn't it?

I don't think a lot of people are having many of these crunching errors. I haven't seen such an error on any of the 13 systems I have crunching on rosetta. (I am currently using a 10 hour target run time.)

The suggestion for a shorter target run time was for the few people who are having a lot of errors.
ID: 11361 · Rating: 0
Nite Owl
Joined: 2 Nov 05
Posts: 87
Credit: 3,019,449
RAC: 0
Message 11399 - Posted: 25 Feb 2006, 21:14:31 UTC - in response to Message 11349.  

Angus wrote:
If the WUs will only run reliably for 2 hours, then there are still problems to be solved.

My point exactly...
Join the Teddies@WCG
ID: 11399 · Rating: 0
Snake Doctor
Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 11411 - Posted: 26 Feb 2006, 7:36:06 UTC - in response to Message 11320.  
Last modified: 26 Feb 2006, 7:37:45 UTC

aguiar@carrier.com.br wrote:
Hi! I'm a newcomer to BOINC and very interested in Rosetta. I downloaded a WU yesterday and now it is running on my computer.

I am following the graphics to find out if it is working well (since the BOINC program says it is only 1% complete and time to complete is now 15:23 and increasing instead of decreasing). Graphics now read as follows:

Workunit: PRODUCTION_ABINITIO_INCREASECYCLES50_1urnA_317_213
1% Complete
CPU time: 2 hr 18 min 23 sec (and counting)
Stage: Ab initio
Model: 1
Step: 1299394 (and counting)

Is that OK? Is it running correctly? If so, how much time would be required (approximately) to finish this WU?

Thanks and regards,
Valter Aguiar.

Edit: I just noticed that it stepped back. It is now at step 151736 and counting. This is the second time this has happened with this same WU.


You should read the FAQ list located here. Everything you have reported is normal behavior.


We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
ID: 11411 · Rating: 0
ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11575 - Posted: 2 Mar 2006, 23:21:57 UTC

24 hrs done and 27 left on this one, so I aborted it.



3/2/2006 5:22:32 PM|rosetta@home|Unrecoverable error for result ABINITew_hom002_1ew4A_322_56_0 (aborted via GUI RPC)

ID: 11575 · Rating: 0
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 11651 - Posted: 4 Mar 2006, 16:01:47 UTC

I believe this is the first 1% error WU that I have ever received:

SSFEATURES_BARCODE_ABINITIO_1ew4A_334_354
Regards,
Bob P.
ID: 11651 · Rating: 0
ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11690 - Posted: 5 Mar 2006, 18:51:53 UTC

I am running at 2 hrs and still getting errors, so those who think there is no new bug should think again. I didn't have these problems until recently. Is processing these WUs doing any good, or should we wait until they find a fix?
I am crunching just as many hours but getting a lot less done.
ID: 11690 · Rating: 0
Los Alcoholicos~La Muis
Joined: 4 Nov 05
Posts: 34
Credit: 1,041,724
RAC: 0
Message 11733 - Posted: 6 Mar 2006, 21:17:17 UTC

My first 1% WU on this PC. It was stuck at 1% for 6:30 hrs before I noticed it. After a restart of BOINC Manager it froze again at model 2106, so I aborted it.

HBLR_1.0_2reb_332_989_0
ID: 11733 · Rating: 0
[B@H] Ray
Joined: 20 Sep 05
Posts: 118
Credit: 100,251
RAC: 0
Message 11784 - Posted: 8 Mar 2006, 17:11:43 UTC

I am running BOINC 4.72 on two machines and have never had a WU stuck at 1%. I also do not run P@H. Could it be a Rosetta and newer-BOINC problem, or a Rosetta and P@H problem?

In reading this thread I see others running BOINC 4.72 who never had one stuck at 1%.


Pizza@Home Rays Place Rays place Forums
ID: 11784 · Rating: 0
Morten Starkeby
Joined: 18 Feb 06
Posts: 10
Credit: 472,142
RAC: 0
Message 11785 - Posted: 8 Mar 2006, 17:54:19 UTC - in response to Message 8802.  
Last modified: 8 Mar 2006, 17:55:53 UTC

Got my first 1% stuck bug.

The work unit is stuck at 1% for 1 hour and 54 minutes at the time of this writing. It is stuck at model 1, step 20880, in the Ab initio stage. Name of work unit: HB_BARCODE_30_1a32__351_1964

I suspended BOINC and ran the following command from the command prompt:

rosetta_4.82_windows_intelx86.exe cc 1a32 _ -abrelax -stringent_relax -more_relax_cycles -output_chi_silent -vary_omega -rand_envpair_res_wt -rand_SS_wt -farlx -ex1 -ex2 -silent -barcode_from_fragments -new_centroid_packing -barcode_from_fragments_length 30 -ssblocks -barcode_mode 3 -omega_weight 0.5 -jitter_frag -jitter_variation gauss -output_silent_gz -nstruct 10 -paths ccfrags200.txt -relax_score_filter -filter1 -115 -filter2 -130 -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -increase_cycles 10 -cpu_run_time 7200 -constant_seed -jran 3046839

The command executed perfectly, and proceeded quickly beyond 1% (at the time of writing it is at model 3, 20.9%). I then aborted the stand alone rosetta execution.
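
Posts in this thread distinguish a slow WU (step counter still moving while the percentage sits at 1%) from a stuck one (step counter frozen, as with step 20880 here). A minimal sketch of that check follows. It assumes you note down the CPU time and step counter from the graphics window by hand; `seconds_since_step_changed`, `looks_stuck`, and the one-hour threshold are hypothetical choices, not anything Rosetta or BOINC provides.

```python
from typing import Sequence, Tuple

# One observation: (CPU seconds elapsed, step counter shown in the graphics).
Sample = Tuple[float, int]


def seconds_since_step_changed(samples: Sequence[Sample]) -> float:
    """CPU seconds between the last change of the step counter and the
    newest sample. A counter that jumps backwards still counts as a change."""
    if not samples:
        return 0.0
    changed_at = samples[0][0]
    for (_, prev_step), (cpu_seconds, step) in zip(samples, samples[1:]):
        if step != prev_step:
            changed_at = cpu_seconds
    return samples[-1][0] - changed_at


def looks_stuck(samples: Sequence[Sample], threshold_seconds: float = 3600.0) -> bool:
    """Flag a WU whose step counter has not moved for threshold_seconds
    (the one-hour default is an arbitrary assumption)."""
    return seconds_since_step_changed(samples) >= threshold_seconds


# Counter keeps moving, including one reset to a lower value: not flagged.
slow_but_alive = [(0.0, 100), (1800.0, 900_000), (3600.0, 151_736), (5400.0, 400_000)]
# Counter frozen at 20880 for 1 hour 54 minutes (6840 s): flagged.
frozen = [(0.0, 100), (600.0, 20_880), (4200.0, 20_880), (7440.0, 20_880)]

print(looks_stuck(slow_but_alive), looks_stuck(frozen))  # False True
```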

I am using version 5.3.24 of BOINC (a beta version).
ID: 11785 · Rating: 0
Zazie
Joined: 1 Mar 06
Posts: 2
Credit: 159,032
RAC: 0
Message 11816 - Posted: 9 Mar 2006, 12:09:11 UTC

Hi, with half of my workunits errored out and the last one (HB_BARCODE_30_1ten__351_2528_0) stuck at 1% for 10+ hours, I decided not to waste my CPU time anymore and withdrew from the project. Sorry guys, I would have loved to help you with my tiny contribution, but there are too many bugs in Rosetta and, from what I see on the message boards, no-one from your staff is too worried.
ID: 11816 · Rating: 0
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 11817 - Posted: 9 Mar 2006, 12:46:36 UTC - in response to Message 11816.  

Zazie wrote:
...from what I see on the message boards, no-one from your staff is too worried.

I don't know what message boards you are reading, but fixing the bugs is the top priority. See David Baker's latest Journal entry.

Regards,
Bob P.
ID: 11817 · Rating: 0
ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11838 - Posted: 9 Mar 2006, 23:11:13 UTC

81 hrs and no credit to show for it except an ever-declining RAC and a lot of wasted cycles.

3/9/2006 5:13:49 PM|rosetta@home|Unrecoverable error for result HOMSdc_homDB002_1dcj__339_185_0 (aborted via GUI RPC)

ID: 11838 · Rating: 0
©2024 University of Washington
https://www.bakerlab.org