Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,786
RAC: 1,319
Message 75148 - Posted: 20 Feb 2013, 12:12:25 UTC - in response to Message 75145.  

One thing to remember is almost ALL webpages are cached and only refreshed on a periodic basis, some by the minute, some by the hour and some by the day or week. There is no way to tell unless you check and notice a change, or they come out and actually tell us.


The server status page says when it was last updated.


Those WORDS were in the way, THANKS now I can SEE!!!

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75213 - Posted: 9 Mar 2013, 22:29:58 UTC
Last modified: 9 Mar 2013, 22:32:17 UTC

This download problem is still happening. This one file is just over 4 MB; it keeps getting stuck and retrying after a couple of KB, over and over.

Some larger files download OK; others don't.

I'm not having any problems with my other projects, JUST ROSETTA.

Sun 10 Mar 2013 09:20:09 EST Project communication failed: attempting access to reference site
Sun 10 Mar 2013 09:20:09 EST rosetta@home Temporarily failed download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz: HTTP error
Sun 10 Mar 2013 09:20:10 EST rosetta@home Started download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz
Sun 10 Mar 2013 09:20:11 EST Internet access OK - project servers may be temporarily down.
Sun 10 Mar 2013 09:25:19 EST Project communication failed: attempting access to reference site
Sun 10 Mar 2013 09:25:19 EST rosetta@home Temporarily failed download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz: HTTP error
Sun 10 Mar 2013 09:25:20 EST Internet access OK - project servers may be temporarily down.
Sun 10 Mar 2013 09:25:20 EST rosetta@home Started download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz
Sun 10 Mar 2013 09:30:32 EST Project communication failed: attempting access to reference site
Sun 10 Mar 2013 09:30:32 EST rosetta@home Temporarily failed download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz: HTTP error
Sun 10 Mar 2013 09:30:34 EST rosetta@home Started download of rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz
Sun 10 Mar 2013 09:30:35 EST Internet access OK - project servers may be temporarily down.
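
For anyone who wants to measure this pattern in their own event log, here is a minimal sketch in Python. It assumes the line layout shown in the log above (weekday, date, time, time zone, project name, message); the function names are made up for illustration and are not part of BOINC. It groups the "Temporarily failed download" lines by file and reports the gap between consecutive failures.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Sun 10 Mar 2013 09:20:09 EST rosetta@home Temporarily failed download of <file>: HTTP error"
FAILED = re.compile(
    r"^\w{3} (\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) \w+ "
    r"rosetta@home Temporarily failed download of (\S+): (.+)$"
)


def failed_downloads(lines):
    """Return {file_name: [datetime of each temporary failure]} (hypothetical helper)."""
    failures = {}
    for line in lines:
        match = FAILED.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            failures.setdefault(match.group(2), []).append(when)
    return failures


def retry_gaps_seconds(times):
    """Seconds between consecutive failures of the same file."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


NAME = "rb_03_09_36646_70000_h001__hsapoe4_aah001_19_05.200_v1_3.gz"
sample = [
    f"Sun 10 Mar 2013 09:20:09 EST rosetta@home Temporarily failed download of {NAME}: HTTP error",
    f"Sun 10 Mar 2013 09:20:10 EST rosetta@home Started download of {NAME}",
    "Sun 10 Mar 2013 09:20:11 EST Internet access OK - project servers may be temporarily down.",
    f"Sun 10 Mar 2013 09:25:19 EST rosetta@home Temporarily failed download of {NAME}: HTTP error",
    f"Sun 10 Mar 2013 09:30:32 EST rosetta@home Temporarily failed download of {NAME}: HTTP error",
]

failures = failed_downloads(sample)
gaps = retry_gaps_seconds(failures[NAME])
print(len(failures[NAME]), "failures, gaps in seconds:", gaps)
```

On the excerpt above this shows the same file failing roughly every five minutes, which is the retry loop described in the post.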

Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 75220 - Posted: 11 Mar 2013, 9:59:46 UTC - in response to Message 75213.  

I posted a possible solution in the Slow to download thread; it has helped a lot of people with SETI downloads, and it might also help here.
.

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75224 - Posted: 11 Mar 2013, 20:44:14 UTC - in response to Message 75220.  

I posted a possible solution in the Slow to download thread; it has helped a lot of people with SETI downloads, and it might also help here.


I'm using Ubuntu on both rigs, not Windows!



PanicMan
Joined: 31 Jan 10
Posts: 7
Credit: 276,651
RAC: 0
Message 75296 - Posted: 29 Mar 2013, 21:02:25 UTC

I have been having issues with Rosetta for about a week or so now. I reset the project sometime last week due to errors. Now I noticed this when I got home today: there were about 200 of these messages, so I copy/pasted a small section, along with 2 that had computational errors.

3/29/2013 8:25:05 AM | rosetta@home | Task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 exited with zero status but no 'finished' file
3/29/2013 8:25:05 AM | rosetta@home | If this happens repeatedly you may need to reset the project.
3/29/2013 8:25:05 AM | rosetta@home | Restarting task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 using minirosetta version 345 in slot 1
3/29/2013 8:25:07 AM | rosetta@home | Task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 exited with zero status but no 'finished' file
3/29/2013 8:25:07 AM | rosetta@home | If this happens repeatedly you may need to reset the project.
3/29/2013 8:25:07 AM | rosetta@home | Restarting task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 using minirosetta version 345 in slot 2
3/29/2013 8:25:47 AM | rosetta@home | Task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 exited with zero status but no 'finished' file
3/29/2013 8:25:47 AM | rosetta@home | If this happens repeatedly you may need to reset the project.
3/29/2013 8:25:49 AM | rosetta@home | Task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 exited with zero status but no 'finished' file
3/29/2013 8:25:49 AM | rosetta@home | If this happens repeatedly you may need to reset the project.
3/29/2013 8:25:49 AM | rosetta@home | Restarting task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 using minirosetta version 345 in slot 2
3/29/2013 8:26:28 AM | rosetta@home | Task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 exited with zero status but no 'finished' file
3/29/2013 8:26:28 AM | rosetta@home | If this happens repeatedly you may need to reset the project.
3/29/2013 8:26:31 AM | rosetta@home | Task binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1 exited with zero status but no 'finished' file
3/29/2013 8:26:31 AM | rosetta@home | If this happens repeatedly you may need to reset the project.
3/29/2013 8:27:10 AM | rosetta@home | Task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 exited with zero status but no 'finished' file

The messages last week were similar, but the tasks were obviously different. Is this a known issue, or has something changed that is causing this? Thanks in advance for any help.
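
With about 200 of these messages, a count per task can show whether a few tasks are looping or many tasks are affected. The following is a rough sketch in Python, assuming the pipe-separated event log layout pasted above; the helper name is hypothetical and nothing here comes from BOINC itself.

```python
import re
from collections import Counter

# Matches the "no 'finished' file" message and captures the task name.
NO_FINISHED = re.compile(
    r"\| rosetta@home \| Task (\S+) exited with zero status but no 'finished' file"
)


def restart_counts(lines):
    """Count how often each task exited without a 'finished' file (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = NO_FINISHED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


TASK_A = ("binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_"
          "disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0")
TASK_B = ("binding_helix_0339_disulf1_1_disulf2_1_disulf3_4_disulf4_2_"
          "disulf5_1_0005_abinitio_SAVE_ALL_OUT_73421_2970_1")

sample = [
    f"3/29/2013 8:25:05 AM | rosetta@home | Task {TASK_A} exited with zero status but no 'finished' file",
    "3/29/2013 8:25:05 AM | rosetta@home | If this happens repeatedly you may need to reset the project.",
    f"3/29/2013 8:25:07 AM | rosetta@home | Task {TASK_B} exited with zero status but no 'finished' file",
    f"3/29/2013 8:25:47 AM | rosetta@home | Task {TASK_A} exited with zero status but no 'finished' file",
]

counts = restart_counts(sample)
for task, n in counts.most_common():
    print(n, task)
```

In the section pasted above, the same two tasks repeat about every 40 seconds, so a count like this would be dominated by those two names.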

Rabinovitch
Joined: 28 Apr 07
Posts: 28
Credit: 5,439,728
RAC: 0
Message 75304 - Posted: 1 Apr 2013, 4:17:47 UTC

Hi all! Please consider my problem, described in an adjacent thread.
From Siberia with love!

lugal
Joined: 16 Jul 08
Posts: 2
Credit: 175,028
RAC: 0
Message 75349 - Posted: 11 Apr 2013, 17:32:36 UTC

Hi all,

I have had this problem for several weeks now and have reset several times.

This is most annoying, and I hope someone will start looking into it soon.

regards to all Lugal

chillwater
Joined: 18 Dec 09
Posts: 1
Credit: 3,483,338
RAC: 0
Message 75353 - Posted: 12 Apr 2013, 11:29:35 UTC

Greetings,
I have received 13 client error/compute error in the last three days. Had only 2 or 3 in the previous 5 days. At least one wingman errored the same wu. Have no other errors in other projects (6).

Just so you know.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,786
RAC: 1,319
Message 75355 - Posted: 12 Apr 2013, 12:40:13 UTC - in response to Message 75353.  

Greetings,
I have received 13 client error/compute error in the last three days. Had only 2 or 3 in the previous 5 days. At least one wingman errored the same wu. Have no other errors in other projects (6).

Just so you know.


Apparently so have A LOT of other people too!! It could be as simple as a bad batch or a server-side problem. HOPEFULLY Rosetta is working on it and NOT just watching!!

Drag'n Smoke
Joined: 20 Jan 13
Posts: 1
Credit: 3,690
RAC: 0
Message 75356 - Posted: 12 Apr 2013, 16:03:10 UTC

Work units are not being completed. They are resetting themselves with some impossible deadlines. Just what is the problem???

TJ
Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 75360 - Posted: 12 Apr 2013, 21:35:27 UTC - in response to Message 75355.  

Greetings,
I have received 13 client error/compute error in the last three days. Had only 2 or 3 in the previous 5 days. At least one wingman errored the same wu. Have no other errors in other projects (6).

Just so you know.


Apparently so have A LOT of other people too!! It could be as simple as a bad batch or a server-side problem. HOPEFULLY Rosetta is working on it and NOT just watching!!

I have 36 errors out of 176 WUs; it started on April 10. However, it seems that today the error rate is very high.
Don't expect anything, we have to go through this batch.
Greetings,
TJ.
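
To put the figures above in proportion, here is the arithmetic as a tiny Python sketch. The function name is invented for illustration; the numbers 36 and 176 are the ones quoted in this post.

```python
def error_rate_percent(errors, total):
    """Share of work units that errored, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


rate = error_rate_percent(36, 176)
print(f"{rate:.1f}% of work units errored")
```

That works out to roughly one work unit in five over the whole period since April 10.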

BarryAZ
Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 75392 - Posted: 20 Apr 2013, 3:47:49 UTC - in response to Message 75360.  

One thing seems apparent -- the Cryo work units have a nasty tendency to yield computation errors. I wish I could configure the client to avoid them. Better yet, I wish the project folks would stop making them generally available until the problem is dealt with at the project level. Those units hit computation errors on numerous different computers -- where no other project is kicking up errors -- so it seems reasonably clear that the problem is a project-specific and work-unit-class-specific issue.




Greetings,
I have received 13 client error/compute error in the last three days. Had only 2 or 3 in the previous 5 days. At least one wingman errored the same wu. Have no other errors in other projects (6).

Just so you know.


Apparently so have A LOT of other people too!! It could be as simple as a bad batch or a server-side problem. HOPEFULLY Rosetta is working on it and NOT just watching!!

I have 36 errors out of 176 WUs; it started on April 10. However, it seems that today the error rate is very high.
Don't expect anything, we have to go through this batch.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,786
RAC: 1,319
Message 75396 - Posted: 20 Apr 2013, 12:43:10 UTC - in response to Message 75392.  

One thing seems apparent -- the Cryo work units have a nasty tendency to yield computation errors. I wish I could configure the client to avoid them. Better yet, I wish the project folks would stop making them generally available until the problem is dealt with at the project level. Those units hit computation errors on numerous different computers -- where no other project is kicking up errors -- so it seems reasonably clear that the problem is a project-specific and work-unit-class-specific issue.


AMEN TO THAT!! What I WISH Rosetta had, like MOST other BOINC projects DO HAVE, is a way to select which tasks I want and which I don't want!! I abort EVERY cryo unit I see but they are insidious!! I abort 5 and upon the update they send me 3 more, I abort those and then they send me even more!! It is frustrating and causing me to rethink my contribution to Rosie right now!!!

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 75398 - Posted: 20 Apr 2013, 16:49:15 UTC - in response to Message 75396.  

One thing seems apparent -- the Cryo work units have a nasty tendency to yield computation errors. I wish I could configure the client to avoid them. Better yet, I wish the project folks would stop making them generally available until the problem is dealt with at the project level. Those units hit computation errors on numerous different computers -- where no other project is kicking up errors -- so it seems reasonably clear that the problem is a project-specific and work-unit-class-specific issue.


AMEN TO THAT!! What I WISH Rosetta had, like MOST other BOINC projects DO HAVE, is a way to select which tasks I want and which I don't want!! I abort EVERY cryo unit I see but they are insidious!! I abort 5 and upon the update they send me 3 more, I abort those and then they send me even more!! It is frustrating and causing me to rethink my contribution to Rosie right now!!!



And you will notice that NONE of the project guys read these threads, and I do mean NONE!!! I saw on the download tab that YF Song was attached to the cryo units, and I'm surprised he has not shown up here. Certainly the results they are getting back should be telling them something is wrong with the tasks, but that does not seem to be the case.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75399 - Posted: 20 Apr 2013, 18:15:45 UTC

These must work for some people, the POTD got one done:
Apr 20, 2013
Predictor of the day: Congratulations to Pat McGee for predicting the lowest energy structure for workunit cryo_bf__chain_J_subrun_002_SAVE_ALL_OUT__78436_0


I had not been seeing any cryos, but I finally got one, so I suspended other tasks to run it. It failed for me too, and it had already failed for the wingman.

On the question of why they do not let you select the work you wish to do, I'll just offer this comment: they are working a constant mix of various types of work, so if such a choice existed, there would quickly be 100 choices. Enabling such a thing would tax the scheduler to no end.

In the end, I believe I'm correct in saying that so long as the tasks complete normally, you don't so much need such a choice.
Rosetta Moderator: Mod.Sense

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75400 - Posted: 20 Apr 2013, 18:30:57 UTC

I found I had two other cryo tasks; all failed. One had a successful wingman, which was a Mac. It looks like the POTD only has a Mac as well.
Rosetta Moderator: Mod.Sense

BarryAZ
Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 75404 - Posted: 21 Apr 2013, 3:35:13 UTC - in response to Message 75399.  

OK folks, thanks for the confirmation -- what I find I do is manually go in and 'pre-abort' the cryos before they start -- but it takes a fair amount of intervention and some bad ones slip by. It wouldn't be so bad if they failed quickly, but some go computation error after a couple of hours of wasted processing time.

I would very much like it if the project folks got on this and stopped generating them until they figured out what the problem is on the project side of things.
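
The manual 'pre-abort' routine described above could in principle be scripted. What follows is only a sketch in Python under stated assumptions: that affected tasks can be recognized by a `cryo_` name prefix (only one such name appears in this thread), that the project URL below is the one your client is attached with, and that the commands are run through BOINC's `boinccmd --task URL task_name abort` operation. The function name is invented, the task names are copied from this thread purely for illustration, and the sketch only prints the commands rather than running them.

```python
def abort_commands(task_names, project_url, prefix="cryo_"):
    """Build one `boinccmd --task URL NAME abort` argument list per matching task.

    Hypothetical helper: it only builds commands and does not run anything.
    """
    return [
        ["boinccmd", "--task", project_url, name, "abort"]
        for name in task_names
        if name.startswith(prefix)
    ]


# Assumption: replace with the URL shown for Rosetta@home in your own client.
PROJECT_URL = "http://boinc.bakerlab.org/rosetta/"

# Names copied from this thread for illustration; in practice they would
# come from your own task list.
tasks = [
    "cryo_bf__chain_J_subrun_002_SAVE_ALL_OUT__78436_0",
    "binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_"
    "disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0",
]

commands = abort_commands(tasks, PROJECT_URL)
for argv in commands:
    print(" ".join(argv))
```

Note that this does not address the concern raised in this thread that aborted units simply go back into the queue for other hosts; it only saves the manual clicking.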

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 75405 - Posted: 21 Apr 2013, 5:09:23 UTC

I had a cryo error also, when it had nearly finished the 12 hours I allow for such workunits.

I noticed earlier, that after running for 5 hours, it had not written any checkpoints at all.

BarryAZ
Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 75406 - Posted: 21 Apr 2013, 7:28:26 UTC - in response to Message 75405.  

Indeed -- the problem of course is that while I periodically ferret out ALL Cryo units I have, not only does that not stop me from getting new ones, but also the ones I abort simply go back into the queue for future downloads.

I realize that some of the cryo workunits are OK -- but it seems to me that it is incumbent on the project folks to simply stop these from going out at the project level and debug them there.

There are no doubt plenty of folks running Rosetta in a 'no attention mode' - and they are really wasting CPU cycles here.

I had a cryo error also, when it had nearly finished the 12 hours I allow for such workunits.

I noticed earlier, that after running for 5 hours, it had not written any checkpoints at all.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,786
RAC: 1,319
Message 75408 - Posted: 21 Apr 2013, 11:03:16 UTC - in response to Message 75406.  

Indeed -- the problem of course is that while I periodically ferret out ALL Cryo units I have, not only does that not stop me from getting new ones, but also the ones I abort simply go back into the queue for future downloads.

I realize that some of the cryo workunits are OK -- but it seems to me that it is incumbent on the project folks to simply stop these from going out at the project level and debug them there.

There are no doubt plenty of folks running Rosetta in a 'no attention mode' - and they are really wasting CPU cycles here.


And I think THAT is a major part of what could be Rosetta's downfall: they simply don't manage their project well and just keep on keeping on despite the problems they are having. I found TEN cryo units this morning on ONE machine; one had errored out and I aborted the rest!! I have already turned a different machine off, as in NO NEW TASKS, and will move on to Poem with it. Two other machines are now on Eon with Malaria as backups! I am getting VERY tired of aborting cryo units or being asleep and having them error out!!! I only have Windows machines, so it seems they will NEVER work for me!!



©2024 University of Washington
https://www.bakerlab.org