Message boards : Cafe Rosetta : Other projects.
| Author | Message |
|---|---|
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
I took a very rare visit to the WCG forums to see what was being said, and the credit increase was almost completely ignored.
All recent credits have been withdrawn and, having re-read their forum thread, I think they actually had it sussed quite early on, but I couldn't follow the discussion well enough first time around. Let's hope their second attempt goes better (and mine, tbf).
My BOINC WCG credits are back down from 54.5m to where they started before the weekend - 35,655,547
|
|
mikey Send message Joined: 5 Jan 06 Posts: 1900 Credit: 12,896,559 RAC: 209 |
On a linux box (mine is Linux Mint 22.3), an easy way to poll for new rosetta tasks is using…
https://www.sidock.si/sidock/ is doing some Chemistry-related stuff as well |
|
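The polling tip quoted above is truncated in the thread. Below is a minimal sketch of one way to do it on a Linux box, assuming the stock boinccmd CLI that ships with the BOINC client is on PATH; the project URL and the 10-minute interval are illustrative, not taken from the post:

```python
#!/usr/bin/env python3
"""Periodically ask the local BOINC client to poll a project for new tasks."""
import subprocess
import time

PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"  # assumed Rosetta@home URL
POLL_SECONDS = 600  # illustrative interval - keep it gentle on the scheduler

while True:
    # Tell the running client to contact the project's scheduler now,
    # instead of waiting out its own internal backoff.
    subprocess.run(["boinccmd", "--project", PROJECT_URL, "update"], check=False)
    time.sleep(POLL_SECONDS)
```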
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
SiDock has just come back. All transfers uploaded, tasks sent and 8 tasks received back. WCG still down and Rosetta still hit-and-hope
|
|
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 442 Credit: 15,623,340 RAC: 19,647 |
A new update from WCG:
April 9, 2026 - BOINC web traffic has been blocked at the load balancer for maintenance; all BOINC scheduler, download and upload requests will be met with HTTP 503 codes until maintenance is completed. We expect completion between April 10th and April 11th, but no earlier than 18:00 UTC on April 10th, to allow projected file transfers and the migration of sharded database table records between Citus Postgres workers to complete. We will update here and on the forums if we expect extended maintenance over the weekend. Volunteers should expect that a successful rollout during this maintenance window will increase workunit availability and put another dent in the validation backlog. |
|
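A quick way to check from a volunteer machine whether the maintenance block is still in place is to probe the project's address and look for the 503. A minimal sketch, assuming the public WCG site answers plain HTTPS; the URL and timeout are illustrative:

```python
#!/usr/bin/env python3
"""Probe a BOINC project URL and report whether it is still returning 503."""
import urllib.error
import urllib.request

URL = "https://www.worldcommunitygrid.org/"  # assumed public WCG address

try:
    with urllib.request.urlopen(URL, timeout=30) as resp:
        print("status:", resp.status, "- looks like it is back")
except urllib.error.HTTPError as e:
    if e.code == 503:
        print("503: maintenance still in progress, try again later")
    else:
        print("HTTP error:", e.code)
except urllib.error.URLError as e:
    print("connection failed:", e.reason)
```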
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
A new update from WCG
Not back yet, so I'll keep my fingers crossed for tomorrow.
Meanwhile, SiDock went back down again not long after my first results were sent back. And some of the tasks that got sent back initially when it came back up missed the deadline and weren't credited, from the look of things.
Crunching tasks is like pulling teeth at the moment, even if we finally get sent some work to do. Very frustrating
|
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
A new update from WCG <sigh>
|
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
WCG April 13, 2026
We are aware of the web site and forum issues - looking into it. Our certificates are valid. We aren't back online - yet - we are still waiting for some answers from UT and UHN about the cause of this issue. We will update once we have some answers.
|
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
WCG April 13, 2026
All my outstanding WCG transfers went through yesterday, but tasks weren't able to upload. This evening they uploaded. No new tasks have come down yet, though. In the meantime I'm still only sneaking the occasional one or two Rosetta tasks - not all of which run successfully either.
Meanwhile, completed SiDock tasks are still failing to upload, and every attempt leads to a 24hr backoff
|
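Rather than waiting out a 24-hour backoff, the client can be nudged by hand. A minimal sketch, again assuming the stock boinccmd CLI: --get_file_transfers lists anything queued or backed off, and --network_available retries deferred network activity immediately:

```python
#!/usr/bin/env python3
"""List stuck BOINC file transfers and retry them immediately."""
import subprocess

# Show transfers currently queued or sitting in a backoff.
result = subprocess.run(["boinccmd", "--get_file_transfers"],
                        capture_output=True, text=True)
print(result.stdout)

# Clear the backoff and retry deferred network communication now.
subprocess.run(["boinccmd", "--network_available"], check=False)
```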
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
All my outstanding WCG transfers went through yesterday, but tasks weren't able to upload.
Tasks coming down now - 138 at a first grab for me
|
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
All my outstanding WCG transfers went through yesterday, but tasks weren't able to upload.
WCG tasks are still coming down. However, nothing seems to be transferring back, so tasks are piling up with a status of 'uploading'.
Glancing at the WCG forums, I now realise a huge credit update has gone to Boincstats. Boincstats seems to have an issue between hosts and teams: I've had an update totalling 2.85m across 3 hosts, but my team update is only 987k
|
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
However, nothing seems to be transferring back, so tasks are piling up with a status of 'uploading'
Transfers went through about an hour ago and new tasks started arriving just now. Some downloads failed, I suspect because many hosts are all polling at the same time. I got 143 tasks, then a further 20, so availability seems high, for the moment at least
|
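Every host retrying on the same schedule right after an outage is a classic thundering-herd problem; the BOINC client's own backoff already randomises retries, but the idea looks roughly like the sketch below. Full-jitter exponential backoff, with an illustrative base, cap, and retry budget:

```python
import random
import time

def full_jitter_delay(attempt, base=60.0, cap=3600.0):
    """Randomised delay: uniform over [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

for attempt in range(6):  # illustrative retry budget
    # ... attempt the download here, and break out on success ...
    time.sleep(full_jitter_delay(attempt))
```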
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
However, nothing seems to be transferring back, so tasks are piling up with a status of 'uploading'
Now getting "HTTP service unavailable" again, so it's not currently reliable. Perhaps give it an extra few hours before piling in for new tasks, although uploading transfers seems OK for the moment
|
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2578 Credit: 47,209,648 RAC: 1,295 |
Sorry - a bit late on this from WCG:
April 17, 2026 - More problems, this time issues with the data center.
Website/Forum Outage - possibly due to a service interruption in our cloud environment that hosting is working to fix; it prevents us from accessing our running instances for maintenance, and may be responsible for other issues, although this is currently unclear. Hosting provided an ETA of 1h at 19:00 UTC today, April 17th, 2026, and we will keep volunteers posted as we get information and attempt to come back online.
BOINC Backend Outage - ongoing after a brief success window on Wednesday, April 15th, 2026, when we enabled the BOINC stats dump after seeing the new architecture handle load and fixing the 503 upload issue. However, in attempting to fix the 404s on the download path by rebuilding the input files and writing them to the tmpfs cache from which downloads are served, we pushed several nodes into various failure states: SUnreclaim slab memory exhaustion from the overhead of writing every file to tmpfs en masse, and ill-advised queries run against Postgres without EXPLAIN first and without paging results to disk, which backed up everything else waiting on Postgres and caused a soft lockup on the node. There were also issues with the io_method = 'io_uring' setting in combination with our network-attached volumes for the Postgres datadir - a setting we may have to change and evaluate before restarting. The naive "should be back up tonight" note in the forums by the WCG tech was based on having recovered from one of these soft lockups many times in the past few weeks, before understanding the reason for the initial crash or causing the later crashes on other nodes while attempting various methods of safely regenerating workunits that were throwing 404s on download after being assigned by the scheduler. We will bring the system online as soon as we are confident the results the scheduler will assign have download URLs with files at those paths, and the cluster is stable again.
|
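The SUnreclaim exhaustion mentioned in that update is visible in /proc/meminfo, so it can be watched before it takes a node down. A minimal monitoring sketch, assuming a Linux host; the 4 GiB alert threshold is illustrative, not from the WCG update:

```python
#!/usr/bin/env python3
"""Report unreclaimable slab memory (SUnreclaim) from /proc/meminfo."""

def meminfo_kib(field: str) -> int:
    """Return the named /proc/meminfo field's value in KiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

THRESHOLD_KIB = 4 * 1024 * 1024  # 4 GiB, illustrative

sunreclaim = meminfo_kib("SUnreclaim")
print(f"SUnreclaim: {sunreclaim} KiB")
if sunreclaim > THRESHOLD_KIB:
    print("warning: unreclaimable slab memory is unusually high")
```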