Posts by David E K

61) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81499)
Posted 19 Apr 2017 by Profile David E K
Post:
A note from our systems engineer:

UW IT got back to us and said that they removed the filtering on their equipment. Indeed, since around 11 am all five BOINC web servers have been receiving results normally with a minimal failure rate: about 10 failures in the last 6 hours, versus 15000/day before.

The issue seems to be resolved for now, but let's keep an eye on things. UW IT provided no information about what triggered this block, so it might happen again.

Luki
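
To put the two failure figures in the note on the same footing, the 6-hour count can be scaled to a daily rate. This is only a minimal sketch, assuming the failure rate stays roughly constant over a day; the helper name per_day is illustrative and not part of any BOINC tooling.

```python
def per_day(count: float, hours: float) -> float:
    """Scale a count observed over `hours` to a 24-hour rate."""
    return count * 24.0 / hours


after_fix = per_day(10, 6)   # about 10 failures in the last 6 hours
before_fix = 15000.0         # failures per day while the filtering was active

print(f"after fix:  {after_fix:.0f} failures/day")
print(f"before fix: {before_fix:.0f} failures/day")
print(f"reduction:  {before_fix / after_fix:.0f}x")
```

Under that constant-rate assumption, 10 failures in 6 hours corresponds to 40 per day.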
62) Message boards : Number crunching : RALPH@Home ? (Message 81498)
Posted 19 Apr 2017 by Profile David E K
Post:
Thanks for the heads up. A db table became corrupted. It's back up now.
63) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81474)
Posted 17 Apr 2017 by Profile David E K
Post:
Luki was referring to a specific job and was not speaking generally. The data size is variable.

I think this issue is not just affecting uploads. When I tried to submit my last post on this thread last night, it timed out and the data was truncated. I had to manually remove the truncated post and repost.
64) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81471)
Posted 17 Apr 2017 by Profile David E K
Post:
Have one WU stuck on upload since at least Friday. Other Rosetta tasks seem to complete and upload fine.
If this problem is so elusive, why aren't there any admins/programmers actively communicating with folks that have this problem in order to solve it?

Ralf


We can see the errors in our logs. I don't know what additional info from users would help, but if you have any that may help, please continue to post it here.

The issue manifested at around 11:30 am on April 12. See Luki's (our systems engineer) notes:

1) The problem started on Wednesday 4/12 at 11:30 am PST. All web servers started misbehaving at the same time. Upload timeouts went up 100x (from ~80/day to ~8000/day).
2) Out of ~37000 unique client IPs that uploaded results this week, only ~1400 are affected (4%). So it's not random.
3) The network or machine loads have not really changed.
4) The size of the uploaded results is only ~600 KB, yet the upload stalls after ~8 KB (client stops sending data to server). Hence the upload_handler waits for more data until apache times out the request. The upload handler uses no CPU and causes no IO load (yet).
5) As you know, the web server nodes are directly connected to UW switches; our switches can't be to blame here. Still, I tried moving one of the public IPs (bsrv5) to another server, connected to another UW switch -- the problem moved with it instantaneously.
6) The culprit really seems to be at the network level: it looks as if the TCP ACKs don't make it to the client, yet we capture them on the wire. Is UW dropping them? Do they perhaps trigger an IDS?

Luki
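
The stall described in note 4 can be reproduced in miniature. The sketch below is not the BOINC upload handler or Apache; it only illustrates, under stated assumptions, what a receiver sees when a client sends ~8 KB of a ~600 KB upload and then goes silent. The function name receive_upload, the local socket pair, and the 0.5 s timeout are all illustrative choices.

```python
import socket


def receive_upload(conn: socket.socket, expected: int, timeout_s: float):
    """Read until `expected` bytes arrive or the socket times out.

    Returns (bytes_received, timed_out).
    """
    conn.settimeout(timeout_s)
    received = 0
    while received < expected:
        try:
            chunk = conn.recv(65536)
        except socket.timeout:
            # No more data arrived in time: give up, as a web server
            # eventually does when it times out the request.
            return received, True
        if not chunk:  # peer closed the connection
            break
        received += len(chunk)
    return received, False


EXPECTED = 600 * 1024  # ~600 KB result, per the notes
SENT = 8 * 1024        # the client stalls after ~8 KB

client, server = socket.socketpair()
client.sendall(b"x" * SENT)  # first 8 KB goes out, then the client is silent
received, timed_out = receive_upload(server, EXPECTED, timeout_s=0.5)
client.close()
server.close()

print(f"received {received} of {EXPECTED} bytes, timed_out={timed_out}")
```

While it waits, the receiver does nothing but block on the socket, which is consistent with the note that the handler uses no CPU and causes no IO load.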


We are still trying to figure out what is causing this and will keep you all posted if we make progress.
65) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81450)
Posted 15 Apr 2017 by Profile David E K
Post:
Could you modify your Server Status web page to show which of the server programs handle uploads and downloads?


I just added some text:

Web servers: boinc, srv1, srv2, srv3, srv4, srv5 (upload and download servers)


boinc is load balanced among the srv web servers. The srv servers handle uploads and downloads.
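
The post does not say how the balancing is done. As a purely illustrative sketch, assuming a simple round-robin policy (an assumption, not a description of the actual Rosetta@home setup), requests arriving at boinc could be spread over the srv servers like this. The names make_balancer and pick are hypothetical.

```python
from itertools import cycle

# Server names as listed on the Server Status page.
BACKENDS = ["srv1", "srv2", "srv3", "srv4", "srv5"]


def make_balancer(backends):
    """Return a function that hands out backends in round-robin order."""
    rotation = cycle(backends)

    def pick() -> str:
        return next(rotation)

    return pick


pick = make_balancer(BACKENDS)
assigned = [pick() for _ in range(7)]  # seven consecutive requests
print(assigned)
```

With five backends, the sixth request wraps around to srv1 again.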
66) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81449)
Posted 15 Apr 2017 by Profile David E K
Post:
This has been a very elusive issue. Our sys admins have been working pretty hard at trying to figure out what is going on. It may be a network issue on the UW side but we are not sure. Still working on trying to figure this out. Thanks for all the feedback/updates.
67) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81437)
Posted 15 Apr 2017 by Profile David E K
Post:
We've been trying to troubleshoot this issue but still do not know what is causing it. Our sys admin said that UW-IT has been contacted to determine if it's a UW network issue. Sorry for any inconvenience.
68) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81434)
Posted 14 Apr 2017 by Profile David E K
Post:
I'll take a look. Sorry for being late on this.
69) Message boards : Number crunching : Minirosetta android 3.83 (Message 81412)
Posted 6 Apr 2017 by Profile David E K
Post:
I updated the validator to hopefully address this issue.

4 res6 tasks validated ok, including one that hit 600 decoys, after running the full time. One res7 task pulled up short again. Though I probably need to let a few more complete to give it a proper chance. Different tasks going through now.


I'll look into that one further. It should have been ok with the limits I set in the validator. Thanks!
70) Message boards : Number crunching : Minirosetta android 3.83 (Message 81410)
Posted 6 Apr 2017 by Profile David E K
Post:
I updated the validator to hopefully address this issue.
71) Message boards : Number crunching : Minirosetta android 3.83 (Message 81407)
Posted 5 Apr 2017 by Profile David E K
Post:
Interesting. I'll check with the researcher who owns these jobs. Looks like the job created TOO MANY models so it failed validation.
72) Message boards : Number crunching : Minirosetta android 3.83 (Message 81404)
Posted 4 Apr 2017 by Profile David E K
Post:
It seems like there are lots of v3.73 tasks in the buffer at your end. I've seen no v3.83 even download yet.


I'll take a look. So far the success rate is high and 3.83 results are coming in.
73) Message boards : Number crunching : Android - Immediate Computation Error, over and over again (Message 81400)
Posted 4 Apr 2017 by Profile David E K
Post:
The stable release is out now for android platforms.
74) Message boards : Number crunching : Minirosetta android 3.83 (Message 81399)
Posted 3 Apr 2017 by Profile David E K
Post:
Please post issues/bugs regarding the minirosetta android version 3.83 application in this thread.
75) Message boards : Number crunching : Android - Immediate Computation Error, over and over again (Message 81397)
Posted 1 Apr 2017 by Profile David E K
Post:
Yep, I updated the app last night and hopefully this version will be stable for Android 4-7+ versions. Look for an update on R@h soon!


Awesome thanks! Will you let us know when the r@h project is ready? Or should I just check periodically?


I'll let everyone know. It's looking good so far. We are testing some protocols/jobs this weekend and if everything looks good, we'll push it out on Monday.
76) Message boards : Number crunching : upcoming 3-day-long challenge (April 5-8) -> enough WUs available ? (Message 81396)
Posted 1 Apr 2017 by Profile David E K
Post:
Hello,

There will be another 3-day-long challenge (A); 15 teams have already signed up.

I: WU availability:

David, could you please check if there will be enough WUs available for the challenge time ?


II. BOINC users: please participate
- 1. tell your teamfounder to join the challenge on boincstats
- 2. the users just have to crunch the WUs in that time period. nothing more


Erkan

(A) https://boincstats.com/en/stats/challenge/team/chat/912



Plenty of work units. There's a huge month-long backlog. Thanks!
77) Message boards : Number crunching : Android - Immediate Computation Error, over and over again (Message 81390)
Posted 30 Mar 2017 by Profile David E K
Post:
Yep, I updated the app last night and hopefully this version will be stable for Android 4-7+ versions. Look for an update on R@h soon!
78) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81386)
Posted 28 Mar 2017 by Profile David E K
Post:
My mac is running well. Are others having mac client issues?


My iMac hasn't received any work for several days. I've tried manual Updates and Reset Project from the BOINC Manager, but still get no new work.


I've gotten projects from others but none from Rosetta.



LHC gives me no tasks and the Lattice Project downloads fail one at a time.

I'm on a new MacBook.

Joan


I'm not sure what's happening but I'll look into it.
79) Message boards : Number crunching : RALPH@Home ? (Message 81373)
Posted 27 Mar 2017 by Profile David E K
Post:
I noticed that over the weekend and contacted our sys admins. Hopefully they'll be able to get it back up soon. Thanks.
80) Message boards : Number crunching : Android - Immediate Computation Error, over and over again (Message 81356)
Posted 22 Mar 2017 by Profile David E K
Post:
I'm still working on debugging the Android app for Android 7+ devices. The current version being tested on Ralph@home appears to be pretty stable for Android 4+, 5+, and 6+ devices. If you'd like to help with the testing phase, please attach to this project manually via the BOINC client on your phone, using http://ralph.bakerlab.org as the project URL.

Once I figure out the issue with Android 7+ devices and test it on Ralph@home, I'll post the updated, fixed app on Rosetta@home. Thanks!





©2024 University of Washington
https://www.bakerlab.org