61)
Message boards :
Number crunching :
Stuck on uploading is a new problem?
(Message 81499)
Posted 19 Apr 2017 by David E K Post: A note from our systems engineer: UW IT got back to us and said that they removed the filtering on their equipment. And indeed, since around 11 am all five BOINC web servers have been receiving results OK with a minimal failure rate: about 10 failures in the last 6 hours vs. 15000/day. The issue seems to be resolved for now, but let's keep an eye on things. UW IT provided no information about what triggered this block, so it might happen again. Luki
62)
Message boards :
Number crunching :
RALPH@Home ?
(Message 81498)
Posted 19 Apr 2017 by David E K Post: Thanks for the heads-up. A database table became corrupted. It's back up now.
63)
Message boards :
Number crunching :
Stuck on uploading is a new problem?
(Message 81474)
Posted 17 Apr 2017 by David E K Post: Luki was referring to a specific job and was not speaking generally; the data size is variable. I think this issue is not just affecting uploads: when I tried to post my last message on this thread last night, it timed out and the data was truncated. I had to manually remove the truncated post and repost.
64)
Message boards :
Number crunching :
Stuck on uploading is a new problem?
(Message 81471)
Posted 17 Apr 2017 by David E K Post: "Have one WU stuck on upload since at least Friday. Other Rosetta tasks seem to complete and upload fine."
We can see the errors in our logs. I don't know what additional info from users would help, but if you have information that may help, please continue to post it here. The issue began at around 11:30 am on April 12. See the notes from Luki, our systems engineer:
1) The problem started on Wednesday 4/12 at 11:30 am PST. All web servers started misbehaving at the same time. Upload timeouts went up 100x (from ~80/day to ~8000/day).
2) Out of ~37000 unique client IPs that uploaded results this week, only ~1400 are affected (4%). So it's not random.
3) The network and machine loads have not really changed.
4) The uploaded results are only ~600 KB, yet the upload stalls after ~8 KB (the client stops sending data to the server). Hence the upload_handler waits for more data until Apache times out the request. The upload handler uses no CPU and causes no I/O load (yet).
5) As you know, the web server nodes are directly connected to UW switches; our switches can't be to blame here. Still, I tried moving one of the public IPs (bsrv5) to another server, connected to another UW switch, and the problem moved with it instantaneously.
6) The culprit really seems to be at the network level, as if the TCP ACKs don't make it to the client, yet we capture them on the wire. Is UW dropping them, perhaps because they trigger an IDS?
Luki
We are still trying to figure out what is causing this and will keep you all posted if we make progress.
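Note 4 describes a server-side read that blocks until a timeout because the client went silent mid-upload. The minimal Python sketch below reproduces that pattern on a local socket pair: the "client" sends only the first ~8 KB of a ~600 KB body and then stops without closing the connection, so the receiver waits until its read timeout fires. This is a hypothetical illustration, not the BOINC upload_handler or Apache code; the function names `receive_upload` and `simulate_stalled_upload` and the 0.5 s timeout are assumptions chosen for the demo.

```python
import socket

EXPECTED_SIZE = 600 * 1024    # ~600 KB result file, per note 4
SENT_BEFORE_STALL = 8 * 1024  # client stops sending after ~8 KB, per note 4


def receive_upload(conn: socket.socket, expected: int, timeout: float) -> tuple[int, bool]:
    """Read until `expected` bytes arrive or no data arrives within `timeout` seconds.

    Returns (bytes_received, timed_out). Hypothetical helper for illustration only.
    """
    conn.settimeout(timeout)
    received = 0
    while received < expected:
        try:
            chunk = conn.recv(min(65536, expected - received))
        except socket.timeout:
            # No data within the timeout: the analogue of Apache timing out the request.
            return received, True
        if not chunk:
            # Peer closed the connection before sending everything.
            break
        received += len(chunk)
    return received, False


def simulate_stalled_upload(timeout: float = 0.5) -> tuple[int, bool]:
    """Send only the first ~8 KB, then go silent without closing the connection."""
    server_side, client_side = socket.socketpair()
    try:
        client_side.sendall(b"x" * SENT_BEFORE_STALL)
        return receive_upload(server_side, EXPECTED_SIZE, timeout)
    finally:
        server_side.close()
        client_side.close()


if __name__ == "__main__":
    got, timed_out = simulate_stalled_upload()
    print(f"received {got} of {EXPECTED_SIZE} bytes; timed out: {timed_out}")
```

The receiver spends the whole timeout idle, which matches the observation that the upload handler uses no CPU and causes no I/O while it waits.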
65)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 81450)
Posted 15 Apr 2017 by David E K Post: "Could you modify your Server Status web page to show which of the server programs handle uploads and downloads?" I just added some text: Web servers: boinc, srv1, srv2, srv3, srv4, srv5 (upload and download servers). boinc is load balanced among the srv web servers, and the srv servers handle uploads and downloads.
66)
Message boards :
Number crunching :
Stuck on uploading is a new problem?
(Message 81449)
Posted 15 Apr 2017 by David E K Post: This has been a very elusive issue. Our sys admins have been working pretty hard to figure out what is going on. It may be a network issue on the UW side, but we are not sure. We are still working on it. Thanks for all the feedback and updates.
67)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 81437)
Posted 15 Apr 2017 by David E K Post: We've been trying to troubleshoot this issue but still do not know what is causing it. Our sys admin said that UW-IT has been contacted to determine if it's a UW network issue. Sorry for any inconvenience.
68)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 81434)
Posted 14 Apr 2017 by David E K Post: I'll take a look. Sorry for being late on this.
69)
Message boards :
Number crunching :
Minirosetta android 3.83
(Message 81412)
Posted 6 Apr 2017 by David E K Post: "I updated the validator to hopefully address this issue." I'll look into that one further. It should have been OK with the limits I set in the validator. Thanks!
70)
Message boards :
Number crunching :
Minirosetta android 3.83
(Message 81410)
Posted 6 Apr 2017 by David E K Post: I updated the validator to hopefully address this issue.
71)
Message boards :
Number crunching :
Minirosetta android 3.83
(Message 81407)
Posted 5 Apr 2017 by David E K Post: Interesting. I'll check with the researcher who owns these jobs. It looks like the job created TOO MANY models, so it failed validation.
72)
Message boards :
Number crunching :
Minirosetta android 3.83
(Message 81404)
Posted 4 Apr 2017 by David E K Post: "It seems like there are lots of v3.73 tasks in the buffer at your end. I haven't seen any v3.83 tasks download yet." I'll take a look. So far the success rate is high and 3.83 results are coming in.
73)
Message boards :
Number crunching :
Android - Immediate Computation Error, over and over again
(Message 81400)
Posted 4 Apr 2017 by David E K Post: The stable release is out now for Android platforms.
74)
Message boards :
Number crunching :
Minirosetta android 3.83
(Message 81399)
Posted 3 Apr 2017 by David E K Post: Please post issues/bugs regarding the minirosetta Android version 3.83 application in this thread.
75)
Message boards :
Number crunching :
Android - Immediate Computation Error, over and over again
(Message 81397)
Posted 1 Apr 2017 by David E K Post: "Yep, I updated the app last night and hopefully this version will be stable for Android 4-7+ versions. Look for an update on R@h soon!" I'll let everyone know. It's looking good so far. We are testing some protocols/jobs this weekend, and if everything looks good, we'll push it out on Monday.
76)
Message boards :
Number crunching :
upcoming 3-day-long challenge (April 5-8) -> enough WUs available ?
(Message 81396)
Posted 1 Apr 2017 by David E K Post: Hello, there are plenty of work units. There's a huge month-long backlog. Thanks!
77)
Message boards :
Number crunching :
Android - Immediate Computation Error, over and over again
(Message 81390)
Posted 30 Mar 2017 by David E K Post: Yep, I updated the app last night and hopefully this version will be stable for Android 4-7+ versions. Look for an update on R@h soon!
78)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 81386)
Posted 28 Mar 2017 by David E K Post: My Mac is running well. Are others having Mac client issues? I'm not sure what's happening, but I'll look into it.
79)
Message boards :
Number crunching :
RALPH@Home ?
(Message 81373)
Posted 27 Mar 2017 by David E K Post: I noticed that over the weekend and contacted our sys admins. Hopefully they'll be able to get it back up soon. Thanks.
80)
Message boards :
Number crunching :
Android - Immediate Computation Error, over and over again
(Message 81356)
Posted 22 Mar 2017 by David E K Post: I'm still working on debugging the Android app for Android 7+ devices. The version currently testing on Ralph@home appears to be pretty stable for Android 4+, 5+, and 6+ devices. If you'd like to help with the testing phase, please attach to this project manually via the BOINC client on your phone, using http://ralph.bakerlab.org as the project URL. Once I figure out the issue with Android 7+ devices and test the fix on Ralph@home, I'll post the updated app on Rosetta@home. Thanks!
©2024 University of Washington
https://www.bakerlab.org