Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

bcrosby

Joined: 13 Apr 07
Posts: 2
Credit: 2,742,612
RAC: 0
Message 73839 - Posted: 15 Sep 2012, 13:13:56 UTC

My issue is a persistent 'Communication deferred' situation - routinely 18 hrs plus. I'm able to download units, and have a set pending upload/reporting ... but with communication constantly 'deferred', that never happens.

There are no local computational settings that I can see to invoke this delay in finished job reporting. Is there something more 'central' that I need to adjust (with my Account) to get this out of the way?
ArcSedna

Joined: 23 Oct 11
Posts: 14
Credit: 61,243,157
RAC: 70,740
Message 73842 - Posted: 16 Sep 2012, 0:59:07 UTC - in response to Message 73839.  
Last modified: 16 Sep 2012, 1:05:08 UTC

My issue is a persistent 'Communication deferred' situation - routinely 18 hrs plus. I'm able to download units, and have a set pending upload/reporting ... but with communication constantly 'deferred', that never happens.

There are no local computational settings that I can see to invoke this delay in finished job reporting. Is there something more 'central' that I need to adjust (with my Account) to get this out of the way?


Looking at your computer summary, your "Maximum daily WU quota per CPU" has dropped very low (only 1/day).
This means you are allowed only 1 WU per CPU core per day. (In your case, 1 x 4 cores = 4 WUs per day.)

The maximum for this value is 100. Each time the computer returns a WU error, the value is reduced little by little, until it finally drops to 1 (or zero?).

Your computer seems to be returning a lot of error results (*1), so the WU quota has dropped to 1.
The way to recover from this is to return valid, successful results. The WU quota will then climb back to its normal state (100/day), little by little.
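
To make the quota arithmetic above concrete, here is a rough sketch. The 1-to-100 range and the "quota x cores" rule come from the post; the step sizes in `adjust_quota` are purely hypothetical placeholders, since the real server-side adjustment rule is not stated in this thread.

```python
MAX_QUOTA = 100  # per-core ceiling mentioned in the post
MIN_QUOTA = 1    # floor mentioned in the post (the post is unsure whether 0 is possible)


def daily_task_limit(quota_per_core: int, cores: int) -> int:
    """Tasks a host may fetch per day: per-core quota times number of cores."""
    return quota_per_core * cores


def adjust_quota(quota: int, task_succeeded: bool,
                 error_step: int = 1, success_step: int = 1) -> int:
    """Hypothetical adjustment rule: the step sizes are assumptions,
    not the project's actual server settings."""
    if task_succeeded:
        return min(MAX_QUOTA, quota + success_step)
    return max(MIN_QUOTA, quota - error_step)


print(daily_task_limit(1, 4))  # 4, matching "1 x 4 cores = 4 WUs per day"
```

Whatever the real step sizes are, the shape is the same: errors push the quota toward the floor, and only valid results pull it back up.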

As for the computation errors, they do NOT seem to be your fault (and are hard to solve on the user side).
Similar issues have been reported:
Thread : Client error for ALL tasks since a month. (Linux 64 bits boinc 7.0.27)


(*1) These errors are hard to notice, because the local BOINC Manager shows "Ready to report", not "Computation error".
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6050&nowrap=true#73836
What is most troubling about this problem, is that on the CLIENT, it shows that it completed successfully with "Ready to report". It doesn't even show that it resulted in an error at all! It is only after checking my Tasks that I see that it was Client error.
J.S.

Joined: 25 Jul 12
Posts: 3
Credit: 845
RAC: 0
Message 73892 - Posted: 27 Sep 2012, 9:39:01 UTC

Hi, I just became aware of an issue on my system: the BOINC client does not save the state of the tasks when I restart my netbook. So when I had to restart my system, the tasks were reset and started back from zero.

Is there anything I can do to prevent this? The usual tasks run six to seven hours on my system and I'm running two at a time, so resetting it is like losing a whole day of CPU time I donated. :-(

I'm running BOINC 7.0.28 on Lubuntu Linux 3.2.0-31. My machine is a Samsung N150 netbook, if that matters.
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 73893 - Posted: 27 Sep 2012, 16:18:08 UTC - in response to Message 73892.  

Hi, I just became aware of an issue on my system: the BOINC client does not save the state of the tasks when I restart my netbook. So when I had to restart my system, the tasks were reset and started back from zero.

Is there anything I can do to prevent this? The usual tasks run six to seven hours on my system and I'm running two at a time, so resetting it is like losing a whole day of CPU time I donated. :-(

I'm running BOINC 7.0.28 on Lubuntu Linux 3.2.0-31. My machine is a Samsung N150 netbook, if that matters.


Although I'm not that familiar with netbooks, I believe that their processing power is generally pretty limited. Your tasks may simply not be making it to the first "checkpoint", or save point, therefore they start over from the beginning.
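
For anyone unsure what a "checkpoint" means here, the idea can be sketched roughly as below. This is a generic illustration, not how Rosetta or BOINC actually implements it: the file format, the `run` loop, and all names are made up for the example.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write the state to a temp file, then rename it over the old checkpoint,
    so a crash mid-write never leaves a half-written file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0}


def run(path: str, total_steps: int, checkpoint_every: int,
        stop_after: Optional[int] = None) -> int:
    """Hypothetical work loop: resume from the checkpoint, save every N steps.
    `stop_after` simulates the machine being restarted part-way through."""
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            break  # simulated restart: unsaved progress is lost
        state["step"] += 1
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["step"]
```

If a run is interrupted before the first save, `load_checkpoint` finds nothing and the work restarts from step 0, which is the behaviour described above. Anything done after the most recent save is lost in the same way.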
BarryAZ

Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 24
Message 73897 - Posted: 27 Sep 2012, 23:06:21 UTC

Looks like the folks back at the shop had something go bump earlier today.

The entire site was unreachable for a while.

Then I could access the site, but the server status page showed a number of processes not running.

Now the server status page is green, but uploads and reporting are still not working.

Waiting on a Baker Lab update on this.

P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73899 - Posted: 27 Sep 2012, 23:17:08 UTC

I'm not able to upload either.


Fri 28 Sep 2012 09:14:09 EST Internet access OK - project servers may be temporarily down.
Fri 28 Sep 2012 09:14:09 EST rosetta@home Temporarily failed upload of rb_09_26_33702_63856_t000__casp9_ben_IGNORE_THE_REST_03_08_60229_13_0_0: connect() failed
Fri 28 Sep 2012 09:14:09 EST rosetta@home Backing off 1 min 0 sec on upload of rb_09_26_33702_63856_t000__casp9_ben_IGNORE_THE_REST_03_08_60229_13_0_0
Fri 28 Sep 2012 09:14:09 EST rosetta@home Temporarily failed upload of rb_09_26_33707_63866_h003__casp9_ben_IGNORE_THE_REST_05_18_60239_11_0_0: connect() failed
Fri 28 Sep 2012 09:14:09 EST rosetta@home Backing off 1 min 0 sec on upload of rb_09_26_33707_63866_h003__casp9_ben_IGNORE_THE_REST_05_18_60239_11_0_0
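
If you want to pull the backoff times out of log lines like these, a small sketch follows. The pattern is written only against the "Backing off N min N sec on upload of NAME" wording shown above; other client versions may word it differently, so treat the regex as an assumption.

```python
import re

# Matches only the wording seen in the log excerpt above.
BACKOFF_RE = re.compile(r"Backing off (\d+) min (\d+) sec on upload of (\S+)")


def parse_backoff(line: str):
    """Return (file_name, backoff_seconds), or None if the line is not a backoff message."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    minutes, seconds, name = match.groups()
    return name, int(minutes) * 60 + int(seconds)
```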

At least the web page works. ;)

KWSN THE Holy Hand Grenade!

Joined: 3 May 07
Posts: 5
Credit: 2,542,452
RAC: 0
Message 73900 - Posted: 27 Sep 2012, 23:25:13 UTC

Agreed: uploading and any type of reporting (reporting a task, requesting new work, or just a stats update) are offline!
Sid Celery

Joined: 11 Feb 08
Posts: 1990
Credit: 38,522,839
RAC: 15,277
Message 73902 - Posted: 28 Sep 2012, 0:57:37 UTC

Arghh!!! I set all my projects on my laptop to No New Tasks to clear an issue here, now can't grab new ones. Bad timing. No downtime all year then I get hit at the worst moment :(

Oh well, let's give WCG a bit of love for a change...
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 73906 - Posted: 28 Sep 2012, 6:09:01 UTC

System still broken: NO uploads or downloads!

harlequin
Joined: 29 Dec 06
Posts: 1
Credit: 655,030
RAC: 0
Message 73907 - Posted: 28 Sep 2012, 9:48:06 UTC

Hello!

Same here:
28.09.2012 11:18:10 | rosetta@home | Reporting 15 completed tasks, not requesting new tasks
28.09.2012 11:18:11 |  | Project communication failed: attempting access to reference site
28.09.2012 11:18:11 | rosetta@home | Scheduler request failed: Couldn't connect to server
28.09.2012 11:18:12 |  | Internet access OK - project servers may be temporarily down.



J.S.

Joined: 25 Jul 12
Posts: 3
Credit: 845
RAC: 0
Message 73908 - Posted: 28 Sep 2012, 9:54:37 UTC - in response to Message 73893.  

Although I'm not that familiar with netbooks, I believe that their processing power is generally pretty limited.

Granted, but irrelevant for my question.

Your tasks may simply not be making it to the first "checkpoint", or save point, therefore they start over from the beginning.

Not that I know very much about the checkpoints you are referring to, but a task that has run for, say, five hours straight and indicates a progress of about 75% should have reached some kind of checkpoint by then - otherwise, the software would be fundamentally broken.

Can anybody else help?
Bockovic

Joined: 22 Feb 10
Posts: 1
Credit: 3,975
RAC: 0
Message 73909 - Posted: 28 Sep 2012, 10:25:06 UTC - in response to Message 73908.  

Your tasks may simply not be making it to the first "checkpoint", or save point, therefore they start over from the beginning.

Not that I know very much about the checkpoints you are referring to, but a task that has run for, say, five hours straight and indicates a progress of about 75% should have reached some kind of checkpoint by then - otherwise, the software would be fundamentally broken.

Can anybody else help?

Maybe you should try playing with the settings a little? Go to Preferences > Disk and memory usage. Then check/uncheck "Leave applications in memory while suspended" and try decreasing the time for "Tasks checkpoints to disk every:" (I put 600 seconds here).
Maybe there is a problem writing to disk if the partition is closed, or BOINC does not see it properly, or something else (I am not familiar with Linux). Try changing the default folder for tasks. Be careful if you do that, and monitor the behaviour of the BOINC client. I changed my BOINC folder to the D: partition, and in some older version of BOINC it would get stuck at "Reconnecting to client", as if it could not access 127.0.0.1. Very strange behaviour. When I let it install to the default C: drive everything worked fine, but there was no room on that partition because Windows is there of course, so the PC crashed. Solved it with a newer version of BOINC :)
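
If you would rather check these two settings on disk than in the Manager, something like the sketch below may help. It assumes the local preferences live in a `global_prefs_override.xml` file in the BOINC data directory, with `leave_apps_in_memory` and `disk_interval` elements. The sample XML here is illustrative, not copied from a real machine, so compare it with your own file first. The fallback values used when an element is missing are also just assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; a real override file usually contains many more elements.
SAMPLE_PREFS = """<global_preferences>
   <leave_apps_in_memory>1</leave_apps_in_memory>
   <disk_interval>60.000000</disk_interval>
</global_preferences>"""


def read_checkpoint_prefs(xml_text: str) -> dict:
    """Extract the two settings discussed above from a preferences XML string."""
    root = ET.fromstring(xml_text)
    leave = root.findtext("leave_apps_in_memory", default="0").strip()
    interval = root.findtext("disk_interval", default="60").strip()
    return {
        "leave_apps_in_memory": leave == "1",
        "disk_interval_seconds": float(interval),
    }


print(read_checkpoint_prefs(SAMPLE_PREFS))
```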

Hope this helps someone ;)
Bockovic

J.S.

Joined: 25 Jul 12
Posts: 3
Credit: 845
RAC: 0
Message 73910 - Posted: 28 Sep 2012, 10:33:09 UTC - in response to Message 73909.  

Maybe you should try to play a little bit with settings? Go to Preferences > Disk and memory usage. Then check/uncheck "Leave applications in memory while suspended" and try to decrease time for "Tasks checkpoints to disk every:" (I put here 600 seconds). (...)

My default there is 60 seconds. The application is supposed to stay in memory while suspended (box is checked).

I often also suspend the application manually, so it can finish writing to disk before I suspend/hibernate the machine.

Is there a place where I can read more about the "reset conditions" of a task - under which circumstance can this happen?

Oh, by the way, yes, I tried turning it off and on again. I also uninstalled and reinstalled. ;-)
TJ

Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 73911 - Posted: 28 Sep 2012, 10:40:18 UTC - in response to Message 73910.  

Maybe you should try to play a little bit with settings? Go to Preferences > Disk and memory usage. Then check/uncheck "Leave applications in memory while suspended" and try to decrease time for "Tasks checkpoints to disk every:" (I put here 600 seconds). (...)

My default there is 60 seconds. The application is supposed to stay in memory while suspended (box is checked).

I often also suspend the application manually, so it can finish writing to disk before I suspend/hibernate the machine.

Is there a place where I can read more about the "reset conditions" of a task - under which circumstance can this happen?

Oh, by the way, yes, I tried turning it off and on again. I also uninstalled and reinstalled. ;-)


This won't help you, but I have read somewhere, for another project, that hibernating a PC (under Windows) will result in strange behavior of BOINC and that tasks error out eventually.
Greetings,
TJ.
Doug_Hood

Joined: 15 Dec 05
Posts: 2
Credit: 3,416,526
RAC: 0
Message 73913 - Posted: 28 Sep 2012, 14:17:00 UTC - in response to Message 73907.  

Hello!

Same here:
28.09.2012 11:18:10 | rosetta@home | Reporting 15 completed tasks, not requesting new tasks
28.09.2012 11:18:11 |  | Project communication failed: attempting access to reference site
28.09.2012 11:18:11 | rosetta@home | Scheduler request failed: Couldn't connect to server
28.09.2012 11:18:12 |  | Internet access OK - project servers may be temporarily down.




Same here. I have about 50 work units trying to report for the last 15+ hours

Bill Kozorra

Joined: 25 Jan 11
Posts: 5
Credit: 86,535,873
RAC: 1,880
Message 73914 - Posted: 28 Sep 2012, 14:34:07 UTC

Is anyone having problems uploading results? Not one of my computers will upload. My Internet access is ok. I noticed that the server status page says that everything is ok. Still nothing will upload.
TJ

Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 73915 - Posted: 28 Sep 2012, 14:42:35 UTC - in response to Message 73914.  

Is anyone having problems uploading results? Not one of my computers will upload. My Internet access is ok. I noticed that the server status page says that everything is ok. Still nothing will upload.


Yes, see my thread "upload problem".
No new work either.
Greetings,
TJ.
Chuck

Joined: 13 Aug 10
Posts: 3
Credit: 3,297
RAC: 0
Message 73916 - Posted: 28 Sep 2012, 15:39:24 UTC - in response to Message 73914.  

Is anyone having problems uploading results? Not one of my computers will upload. My Internet access is ok. I noticed that the server status page says that everything is ok. Still nothing will upload.


I'm seeing the same upload failures myself: nothing from rosetta@home will upload; it just keeps saying project backoff, with a timer delay of anywhere from 2 minutes to 2 hours or more.
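
The growing delays are what you would expect from an exponential backoff, where the wait roughly doubles after each failed attempt up to some ceiling. A minimal sketch of that general technique is below. The 60-second base and 2-hour cap are only illustrative numbers picked to resemble what people are reporting in this thread; they are not taken from the BOINC client's actual settings, and the jitter rule is likewise an assumption.

```python
import random
from typing import Optional


def backoff_delay(failures: int, base: float = 60.0, cap: float = 7200.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait after `failures` consecutive failed attempts (1-based).

    The delay doubles with each failure and is clamped at `cap`.
    If `rng` is given, the delay is randomised between half and full value
    so that many clients do not all retry at the same moment.
    """
    delay = min(cap, base * 2 ** (failures - 1))
    if rng is not None:
        delay = rng.uniform(delay / 2, delay)
    return delay


for attempt in range(1, 9):
    print(attempt, backoff_delay(attempt))
```

With these example numbers the delay reaches the 2-hour cap on the eighth consecutive failure, so a long outage leaves clients retrying only rarely until a transfer succeeds or a retry is triggered manually.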
Chuck

Joined: 13 Aug 10
Posts: 3
Credit: 3,297
RAC: 0
Message 73917 - Posted: 28 Sep 2012, 15:43:38 UTC - in response to Message 70813.  

Hi Snagletooth

I have never installed a firewall in Ubuntu - I go through a router which protects me.

As I say, everything was fine (therefore port access in the router was setup ok) until suddenly.

And since 6am this morning on this machine, 2 tasks completed ok - the next 9 exited with errors after a few minutes (1hr 09, 30m, 19m, 07m, 17m, 15m, 01m, 0.01m, 0.02m respectively).

It seems odd that 2 computers suddenly became error prone at the same time and in the same approximate quantity.

I feel very despondent about this - research into cancer is very personal to me and I thought I was contributing, albeit in a small way.

Please help as I am not technical in sorting out why this is being wasted. I'm afraid your paste of the error is double-dutch to me.

David


Please remember, David, even 2 out of 9 completed WUs get us closer to a cure. Every WU done is another step closer. Don't give up. You can always run BOINC on a Windows OS. Keep on crunching.
BarryAZ

Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 24
Message 73918 - Posted: 28 Sep 2012, 16:22:36 UTC

While this project has occasionally encountered connectivity problems (perhaps a few times a year), that is relatively rare in the distributed processing world.

Somewhat more troublesome is -- and this is VERY rare for Rosetta -- an information blackout. As we move toward a full day of the outage, we've not seen any information from the project folks, not even an acknowledgement of what folks are reporting here.

It may well be that they are aware of the problem and are working on it, but at this juncture, for the community here, it is all speculation. I am very much looking forward to at least an acknowledgement that the folks back at the lab are aware there is a problem.




©2024 University of Washington
https://www.bakerlab.org