I am getting intermittent 'No work from project' replies from the Rosetta server. Last time it took my client 3 attempts to get work. Yet I see in the status that 89,369 wu are queued. What is causing this? I have my work queue set at 0.1 days, but with this going on it might be safer to set a higher number to avoid running dry.
____________ BOINC.BE: For Belgians who love the smell of glowing red cpu's in the morning Tutta55's Lair
'Houston, we have a problem'.....no download, no work and I notice that the 'Queued' have exceeded 100,000........help! One box dry already.
____________
I have 5 PC's running Rosetta. Two are running BOINC 5.2.2 and have plenty of work. One is running 4.68 and is also getting work. The other two are running 4.45 and suddenly cannot get work (so they're crunching SETI and Predictor instead).
Not sure if it means anything, probably coincidence?
Edit: Just to clarify, I have 4 boxes, currently away so only able to monitor 1. It has 20 WU queued, but usually I can set the 'network update' to, say, 3 days, and still get another 40 WU on top of that. Right now I can't (it says 'no new work available').
____________
Team CFVault.com http://www.cfvault.com
Actually this reminds me of something I thought about the other day - potentially someone with a large number of boxes (say, the behemoth Housing and Food Services) could set their network settings at 10 days (the max) and update all their boxes at the same time. Surely that would suck the project dry of WUs. Or would it? Just a thought.
____________
Team CFVault.com http://www.cfvault.com
I managed to get a handful of WU for these two PCs - hopefully there will be more in an hour or two when they run dry again
For what it is worth: I'm on v5.2.5 and have been getting work regularly for days. Suddenly overnight I received only 2 WU mixed in with the No Work From Project messages. LHC has no work and Rosetta has no work (supposedly) so I am down to one project at work.
____________
I have no Rosetta wu's either, just the No work... message. Core client 4.25. Server status now shows 165,000+ wu's queued.
____________ Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
I'm getting new WU's from time to time and getting no work the rest of the time. My backup project (seti) has kicked in when I have no RAH work.
Remember, the servers are on the west coast of the U.S. As I type this from the east coast it is 10:30 AM here and 7:30 AM there. The people are probably just getting into work now so give them a chance to size up the situation and fix it. It will probably just be a restart of the servers.
I'm using Trux's 5.3.1 client, and I've noticed that I'd get the "No work..." message, and then at 10:05 EST I downloaded 2 WU, and since then I get the "No work..." msg again. :(
Looking at my msg log, I noticed that I started getting no work at around 06:45 EST (11:45 UTC, I believe) this morning.
When did everyone else start seeing the "No work..." msg?
____________
Yeah, that's it, it's dry as the desert right now. No more wus. On the plus side, it's the first time some of my computers' CPU temperatures went below 60 Celsius, and it's noticeably cooler in the room :)
BTW - this should be reflected on boincstats tonight - should see a noticeable drop in returned WUs.
____________
Team CFVault.com http://www.cfvault.com
November 1, 2005
Give your computer some variety - check out Rosetta@home. This new BOINC-based project aims to solve the ab initio protein structure prediction problem, and to design new chemical catalysts and potential HIV vaccines. Participants who found the lowest energy structures have already been acknowledged in a scientific paper describing the results.
Maybe there's a surge of new subscribers causing a serious backlog in work production.
____________ BOINC.BE: For Belgians who love the smell of glowing red cpu's in the morning Tutta55's Lair
Getting same thing here. It first asked for work at 15:30 UTC today and got the "No work from project" message. I've also noticed that the server status on Rosetta's home page says "Server status as of 30 Oct 2005 13:42:37 UTC".
Is this a related problem or just a coincidence?
____________
02/11/2005 19:03:43|rosetta@home|Message from server: Server has software problem
02/11/2005 19:03:43|rosetta@home|Project is down
02/11/2005 19:03:48|rosetta@home|Deferring communication with project for 59 minutes and 54 seconds
I just noticed that too, now it's back. I think they may have been fixing the issue. Seems ok now...fingers crossed. I'll check the rest of my clients..
My thought also. Is everyone asleep at the switchboard? What good is donating unused CPU cycles if they continue to remain unused? I'm being sarcastic of course, I have another project to churn. Still can't be good for the Rosetta project to leave the hosts in limbo. Rosetta, EAH thanks you!
One ran out and switched to CPDN. (usually runs Rosetta 75% of time, SETI and CPDN get smaller amounts). Was not watching to see if that was intermittent or just stopped getting Rosetta.
Other system usually runs CPDN most the time but has 17 Rosetta WU's ( half 1pvaA, the long running ones ), suspended CPDN on this one to run those Rosetta units. This system usually runs CPDN about 65%, most of the rest goes to Rosetta and a small amount to SETI. Sure wish that I could send some of these to the other system.
Will be good when the 1st can download again and get these two back to what they usually do. But for now SETI seems to get the most benefit from Rosetta not handing out work; the RAC there has dropped a lot with doing a lot more Rosetta and CPDN anyway.
I'm new :P
Trying to switch-over my puters to R@H because FaD is closing down as of Dec 16 ...
Hence, a lot of FaD crunchers are looking for a new project. Most want a medically oriented project. Could be Rosetta, or Predictor or even Folding. Who can tell :)
I made my choice ;)
____________
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
Looks like the database load is high. Working on a fix now.
>Exactly! The first indication that trouble was brewing was approximately 2 AM their local time this morning. I'm sure if your system went down at that time it would take until the next morning for you to react as well. To my knowledge, this is the first time this problem has appeared at R@H and I hope they have it up soon.
____________
Yes we're impatient. That's easy when isolated in our own little fiefdoms and rely solely on message boards to know what is going on, to get comfort in knowing that someone else is sharing the same experience or not, to get the attention of the project team. I would have expected at least an acknowledgment of the problem before now in spite of the difference in time zones. Don't those dedicated university types show up to work at 6 am? (chuckle, chuckle).
____________
"11/2/2005 11:12:43 AM|rosetta@home|Message from server: Project is temporarily shut down for maintenance
11/2/2005 11:12:43 AM|rosetta@home|Project is down"
I still got plenty of WU's, so I'm not worried, and got Seti and Seti Beta to crunch too, so no big deal here. :-)
Jeremy
____________
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
I was doing a server restart. The project may be down intermittently (for just a few minutes each time) for the next hour or so as I purge the database.
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
I hope we have not reached the maximum capacity :o
Nothing to worry about. Just have to purge the database of old WUs.
....... I would have expected at least an acknowledgment of the problem before now in spite of the difference in time zones. Don't those dedicated university types show up to work at 6 am? (chuckle, chuckle).
>Can't argue about the acknowledgment part. They usually do communicate very well, however, maybe they can't communicate and scratch their heads at the same time? :) .....cheers, Rog.
P.S.... I see David is on the job.....thanks for the info!
____________
I'm not getting any work either, but I'm also seeing a message that the project is down. The message also says that there is a server problem.
The main project page seems to show that the project is still up and running.
Same here.
It started last night. Got a couple of w/u today. Last night I got to the end of a 3-hour w/u and it would not finish. Grrrr. I refuse to get 5.2.6. Enough is enough, BOINC, get your Shxt together. 3 releases in 4 days. If I cannot get work, too bad!
Doug
____________
I guess you may be sleeping now or at least not at work, but it looks like the problem still isn't solved. None of my PC's are getting any work from Rosetta. Maybe a short announcement on the home page would be appropriate if it's going to take a while to fix?
Was able to upload a WU just fine about 5pm PST; my computer has not requested any more work yet from Rosetta. But, as I said in my last post, I still have plenty of WU's for Rosetta, since my machine doesn't crunch them that fast, and I also have other projects that take their fair share. The way I look at it, it gets fixed when it gets fixed. The "big guys" know what's wrong, as David has already said, and when it's fixed, I'm more than certain he will let us all know, or you will start getting more WU's. Pretty simple there I think. :-) Happy crunching.
A bit tense are we? Just leave it attached with the other projects and let the client work it out.
I'm thinking the same. I can't get any WU's from Seti either, but LHC has created WU's I'm crunching right now. And since LHC has 50% of my share, I guess the debt is in its favour. So let me crunch LHC WU's until the people at Rosetta and Seti have sorted it out or my BOINC manager does.
That's the benefit of BOINC, one can crunch more than one project and never have an idle processor!
____________
"I'm trying to maintain a shred of dignity in this world." - Me
I attached a Linux box and a Windoze box a few hours back. Per the stats, there seems to be plenty of work ready to send, but on both systems, I'm getting "no work available", now every 10-20 minutes.
Is there a problem with the servers, or maybe the network?
Something's not right, and it would be nice to understand if I should retry tomorrow, next week, next month, or next year.
Thanks in advance.
Read David Kim's posts above. It looks like they are having to purge the database of completed work units. Might take a few hours I'd guess. Might want to give it another try tomorrow or a little later this evening. They are on Pacific (Washington state) time, so they might still get it back up tonight though. I just fired up Einstein for the evening or until Rosetta comes back. Plenty of work, the database just choked on the unpurged finished work units.
____________ Team MacNN - The best Macintosh team ever.
I managed to download a WU, but now I get this message:
11/3/2005 7:41:45 AM|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
11/3/2005 7:41:45 AM|rosetta@home|Reason: Requested by user
11/3/2005 7:41:45 AM|rosetta@home|Requesting 8640 seconds of new work
11/3/2005 7:42:00 AM|rosetta@home|Scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
11/3/2005 7:42:00 AM|rosetta@home|Message from server: Server has software problem
11/3/2005 7:42:00 AM|rosetta@home|Project is down
11/3/2005 7:42:05 AM|rosetta@home|Deferring communication with project for 59 minutes and 54 seconds
____________
"I'm trying to maintain a shred of dignity in this world." - Me
=-( 1 machine dry. 3 running on fumes. No more work consistently now. I know I can let them download from other projects, but it will be with reluctance.
____________ BOINC.BE: For Belgians who love the smell of glowing red cpu's in the morning Tutta55's Lair
Just joined today, got 1 WU and now I'm having the same problem. "No work from project." Get it fixed or the 1 WU I have done will be the only one.
A bit tense are we? Just leave it attached with the other projects and let the client work it out.
Yes, I was a bit tense at the time. It wasn't the only thing I was having a problem with at that time. Other problem gone, and I am now waiting patiently until they get whatever is wrong fixed and my computer can resume work.
In the meantime I have five other projects going.
Glad to hear that Poohbear.
And that's what Boinc is all about... If one project isn't producing work, at least your other projects will keep your computer busy :)
____________
Got plenty of work last night but not today. Still have some on the system that runs mostly Rosetta, but LHC has work now so I set that to 50% until Rosetta is working correctly again. LHC and CPDN will keep that system busy. Only 4 waiting to be done for Rosetta there.
Still have plenty of Rosetta on the other system that usually runs CPDN most of the time, but it is going down; should last another 4 to 5 days on that system.
____________
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
Here is an update. The database is still being purged. It has gone through over 500,000 WU's and results but there are close to another 500,000 to go. The purging and archiving is also burdening the database server of course. I ask for people's patience as it is taking a while to write out the data into archive files. I do not want to just delete the WU records without archiving them so I can still link the structure predictions (output data) with who and what computer did them to give everyone feedback and credit.
We'll be sure to add new hardware in the future if necessary. I'd like to acknowledge our hard working servers with an image :)
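For anyone curious what "archive first, then purge" looks like in practice, here is a minimal sketch. It is not the BOINC purge utility and it does not use the project's real schema: the `result` table, its columns, and the JSON-lines archive file are all made up for illustration, and it uses Python's built-in sqlite3 rather than MySQL. The only point is the ordering: each batch of old rows is written to the archive file before it is deleted, and the work is done in small batches so the database is not tied up for long stretches.

```python
import json
import sqlite3


def purge_and_archive(conn, archive_path, cutoff, batch_size=1000):
    """Archive rows reported before `cutoff`, then delete them, batch by batch.

    The `result` table and its columns are hypothetical, not BOINC's schema.
    Returns the number of rows purged.
    """
    purged = 0
    with open(archive_path, "a", encoding="utf-8") as archive:
        while True:
            rows = conn.execute(
                "SELECT id, host_id, user_id, credit, reported_at "
                "FROM result WHERE reported_at < ? ORDER BY id LIMIT ?",
                (cutoff, batch_size),
            ).fetchall()
            if not rows:
                break
            # Write the archive records first so nothing is deleted unarchived.
            for row in rows:
                archive.write(json.dumps({
                    "id": row[0],
                    "host_id": row[1],
                    "user_id": row[2],
                    "credit": row[3],
                    "reported_at": row[4],
                }) + "\n")
            archive.flush()
            # Only now remove the same rows from the live table.
            conn.executemany(
                "DELETE FROM result WHERE id = ?", [(row[0],) for row in rows]
            )
            conn.commit()
            purged += len(rows)
    return purged
```

Because the archive keeps `host_id` and `user_id` alongside each result, the archived records can still be tied back to the user and computer that produced them, which is the reason given above for not simply deleting the rows.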
Seems to be a long time to do the purging though? Do you know what the bottleneck is? I work with large volume DB's, mostly Oracle & DB2 mind, and I know this can't be a fair comparison, but I've run applications which unload millions of transactions in a matter of minutes.
I presume this is MySQL or PostgreSQL? I don't want to serve up the obvious, but are the tables optimized, is it IO writing to disk, etc.?
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
Hi Bok,
We use MySQL and I haven't optimized the setup yet, but I'm glad you mentioned it because that is one of the first things on my list to do. It is also writing a bunch of data to disk to generate archive files. I'm also printing out debug statements since this is the first time using the purge utility and I want to monitor its progress.
Please let me know if anyone is still experiencing problems getting work. The database load is down but I am going to continue purging tonight.
I'm not able to get any work right now, nor am I able to report a handful of my last results.
EDIT: I see that the database is down, so you must be working on that purge at the moment. Thanks for keeping the server status page updated!
____________
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
Sorry, I'm still working on the database server. But hope to finish within the next hour.
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
I am purging the database more overnight. The load on the database has reduced though so let me know if you are still experiencing problems getting work.
Carlos Joined: Oct 29 05 Posts: 1 ID: 7401 Credit: 114,043 RAC: 33
Time is Central Time Zone (USA)
11/3/2005 10:55:17 PM|rosetta@home|Fetching master file
11/3/2005 10:55:22 PM|rosetta@home|Master file download succeeded
11/3/2005 10:55:27 PM|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
11/3/2005 10:55:27 PM|rosetta@home|Reason: To fetch work
11/3/2005 10:55:27 PM|rosetta@home|Requesting 43200 seconds of new work
11/3/2005 10:55:32 PM|rosetta@home|Scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
11/3/2005 10:55:32 PM|rosetta@home|No work from project
11/3/2005 10:55:37 PM|rosetta@home|Deferring communication with project for 54 seconds
11/3/2005 10:56:33 PM|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
11/3/2005 10:56:33 PM|rosetta@home|Reason: To fetch work
11/3/2005 10:56:33 PM|rosetta@home|Requesting 43200 seconds of new work
11/3/2005 10:56:38 PM|rosetta@home|Scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
11/3/2005 10:56:38 PM|rosetta@home|No work from project
11/3/2005 10:56:43 PM|rosetta@home|Deferring communication with project for 54 seconds
11/3/2005 10:57:39 PM|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
11/3/2005 10:57:39 PM|rosetta@home|Reason: To fetch work
11/3/2005 10:57:39 PM|rosetta@home|Requesting 43200 seconds of new work
11/3/2005 10:57:44 PM|rosetta@home|Scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
11/3/2005 10:57:44 PM|rosetta@home|No work from project
11/3/2005 10:57:49 PM|rosetta@home|Deferring communication with project for 54 seconds
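If you want to count how often this happens instead of eyeballing the message log, something like the sketch below would do it. It assumes only what is visible in the lines pasted above: each line is `timestamp|project|message`, and the messages of interest are "Requesting N seconds of new work" and "No work from project". The function name and the summary keys are my own invention, and timestamps are left as plain text because different posters in this thread have different date formats.

```python
import re

# Matches the request line seen in the pasted logs, e.g.
# "Requesting 43200 seconds of new work"
REQUEST_RE = re.compile(r"Requesting (\d+) seconds of new work")


def summarize_log(lines):
    """Count work requests and 'No work from project' replies in log lines."""
    summary = {"requests": 0, "no_work": 0, "requested_days": []}
    for line in lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not a "timestamp|project|message" line
        _timestamp, _project, message = parts
        match = REQUEST_RE.search(message)
        if match:
            summary["requests"] += 1
            # Convert the requested seconds of work into days of queue.
            summary["requested_days"].append(int(match.group(1)) / 86400)
        elif message == "No work from project":
            summary["no_work"] += 1
    return summary


sample = [
    "11/3/2005 10:55:27 PM|rosetta@home|Requesting 43200 seconds of new work",
    "11/3/2005 10:55:32 PM|rosetta@home|No work from project",
    "11/3/2005 10:55:37 PM|rosetta@home|Deferring communication with project for 54 seconds",
]
print(summarize_log(sample))
# {'requests': 1, 'no_work': 1, 'requested_days': [0.5]}
```

For reference, 43200 seconds is half a day of queue, and the 8640 seconds requested in the earlier log corresponds to a 0.1 day setting.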
Plenty of work now, thanks David.
I think the purge is now so far ahead that it purges results the minute they have come in and validated (and credit has been granted). All I see now on all my hosts is work units not yet completed (even minutes after they have been reported).
Hopefully these results aren't lost to the science?
____________
*** Join BOINC@Australia today ***
Same here,
A finished w/u gets uploaded, is given credit, then "disappears" like magic into cyberspace. I hope not?
Thanks
Doug
____________
I am purging the database more overnight. The load on the database has REDUCED though so let me know if you are still experiencing problems getting work.
But the homepage says:
Our database is being purged and old workunits and results are being archived. As a result, the load on the server is HIGH and work flow has been reduced.
I think the homepage should state "load on the server is HIGH" other way round -> "As a result, the load on the server is LOWER/REDUCED/etc."
My English is not very good but this put me in confusion on first reading :-)
____________
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
The database purge has finished. Completed WU's and results should now be kept in the database for 1 week before they get purged and archived so users can view their results. WU's and results (and the credits granted) that have been purged and archived have not been lost. The credits for each user are totaled and kept in the database. The result output (structures and rmsd vs energy) can and will still be linked to users and computers for the results that have been archived. Thanks for everyone's patience!!
Your message is welcomed but I feel uneasy because I would like to know if the real reason for the downtime has been permanently remedied. For the period preceding the downtime I observed what seemed to be a large number of Work Units that ran perhaps 1/4 of the usual time (tests?). If the effect was to quadruple the number of WU processed in a given time period leading to a data base overflow, could we experience this phenomenon again or has it been resolved?
____________
Dave,
Very good way to do it,
I was wondering how the database changes were affecting the "Results" tab
and now understand. Thanks for the help explaining this process.
Also have a great weekend
Doug Worrall
____________
The database I believe had not been purged since the project started on BOINC, which included several weeks of short work units. Therefore if I may read between the lines, a weekly purge should remedy the problem, and if not I suppose the project team would make any necessary adjustments to prevent a similar "outage" in the future.
This is what happened with Einstein@home, and after the first "emergency" purge the project has been running very smoothly ever since. My two cents. :)
____________
Regards,
Bob P.
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
Here are the steps we are or will be taking in the near future:
1. Weekly database purge and archive to keep the wu and results tables at manageable sizes.
2. Adding more memory to the database server. Memory to fill out our server will be ordered today which will at least double the current size to 8 gigs (we still have to check how many slots are available but an excess will be ordered regardless).
3. We are currently looking into getting a very beefy database server.
4. The size of work units will be increased (note however that they will still depend on factors like the size of the protein and what kind of prediction is being run).
We will also get any necessary hardware to handle increased demand.
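To see why steps 1 and 4 pull in the same direction, here is some back-of-the-envelope arithmetic. Everything numeric in it is a made-up assumption (10,000 hosts, each crunching around the clock on one result at a time, 2-hour versus 30-minute work units); only the 1-week retention comes from the posts above. The function names are hypothetical too.

```python
def results_per_day(active_hosts, avg_runtime_hours):
    """Results returned per day if every host crunches 24 hours a day."""
    return active_hosts * 24 / avg_runtime_hours


def rows_kept(active_hosts, avg_runtime_hours, retention_days=7):
    """Rough number of completed results sitting in the table between purges."""
    return results_per_day(active_hosts, avg_runtime_hours) * retention_days


hosts = 10_000  # hypothetical host count, not a project statistic

for runtime in (2.0, 0.5):  # hypothetical average runtimes in hours
    print(runtime, results_per_day(hosts, runtime), rows_kept(hosts, runtime))
```

Under these assumptions, cutting the runtime to a quarter means four times as many results per day and four times as many rows held for the retention week, which is the effect asked about earlier in the thread. Longer work units shrink that number, and the weekly purge puts a ceiling on it.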
After 5 failed attempts I finally got work. I'm a newbie and the signup process was a snap. I like the new BOINC 5.2.6. Email and password, now that's the way to go.
Do you have any plans yet for a regular schedule for maintenance (backups, purging old results, etc.) or is this still being assessed?
____________
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
The state of our servers and expansion for the future is still being assessed. Two dual Opterons are going to be ordered today and we are considering possibly getting a 16 gig server. Once we have our architecture set and stable, we will set up a regular maintenance schedule for backups, purging, cleanup, etc. Our goal is to have a system that may be capable of handling Seti-like usage.