Problems with web site

Message boards : Number crunching : Problems with web site


robertmiles
Joined: 16 Jun 08
Posts: 1218
Credit: 13,366,970
RAC: 54
Message 58334 - Posted: 1 Jan 2009, 21:43:51 UTC

I recently upgraded my version of BOINC from 5.10.45 to 6.2.28. Before that, I had selected 14 CPU hours as the preferred workunit length. Both of the Rosetta@home workunits I've downloaded since the BOINC upgrade came with an expected length of 29 hours. Are you out of 14 hour workunits? Or does Rosetta@home have problems with the combination of BOINC 6.2.28 and a selection of 14 hours as the preferred length?

1/1/2009 1:08:22 PM|rosetta@home|Sending scheduler request: To fetch work. Requesting 15375 seconds of work, reporting 0 completed tasks
1/1/2009 1:08:27 PM|rosetta@home|Scheduler request succeeded: got 1 new tasks
1/1/2009 1:08:29 PM|rosetta@home|Started download of mpzn_vvaa1t3kA09_05.200_v1_3.gz
1/1/2009 1:08:29 PM|rosetta@home|Started download of mpzn_vvaa1t3kA03_05.200_v1_3.gz
1/1/2009 1:08:52 PM|rosetta@home|Finished download of mpzn_vvaa1t3kA03_05.200_v1_3.gz
1/1/2009 1:08:52 PM|rosetta@home|Started download of mpzn_vv1t3kA.psipred_ss2.gz
1/1/2009 1:08:53 PM|rosetta@home|Finished download of mpzn_vv1t3kA.psipred_ss2.gz
1/1/2009 1:08:53 PM|rosetta@home|Started download of mpzn_vv1t3kA.pdb.gz
1/1/2009 1:08:55 PM|rosetta@home|Finished download of mpzn_vv1t3kA.pdb.gz
1/1/2009 1:09:45 PM|rosetta@home|Finished download of mpzn_vvaa1t3kA09_05.200_v1_3.gz


1/1/2009 2:47:43 PM|rosetta@home|Starting 1t3kA_BOINC_MPZN_vanilla_abrelax_5901_13195_1
1/1/2009 2:47:44 PM|rosetta@home|Starting task 1t3kA_BOINC_MPZN_vanilla_abrelax_5901_13195_1 using minirosetta version 147
1/1/2009 2:47:46 PM|rosetta@home|Sending scheduler request: To fetch work. Requesting 3208 seconds of work, reporting 0 completed tasks
1/1/2009 2:47:51 PM|rosetta@home|Scheduler request succeeded: got 1 new tasks
1/1/2009 2:47:53 PM|rosetta@home|Started download of boinc_mfr_aawd20_03_05.200_v1_3.gz
1/1/2009 2:47:53 PM|rosetta@home|Started download of boinc_mfr_aawd20_09_05.200_v1_3.gz
1/1/2009 2:48:01 PM|rosetta@home|Finished download of boinc_mfr_aawd20_03_05.200_v1_3.gz
1/1/2009 2:48:01 PM|rosetta@home|Started download of wd20_.fasta
1/1/2009 2:48:02 PM|rosetta@home|Finished download of wd20_.fasta
1/1/2009 2:48:02 PM|rosetta@home|Started download of boinc_description_file.txt
1/1/2009 2:48:03 PM|rosetta@home|Finished download of boinc_description_file.txt
1/1/2009 2:48:03 PM|rosetta@home|Started download of wd20.pdb
1/1/2009 2:48:06 PM|rosetta@home|Finished download of wd20.pdb
1/1/2009 2:48:06 PM|rosetta@home|Started download of wd202.pdb
1/1/2009 2:48:09 PM|rosetta@home|Finished download of wd202.pdb
1/1/2009 2:48:16 PM|rosetta@home|Finished download of boinc_mfr_aawd20_09_05.200_v1_3.gz


robertmiles
Joined: 16 Jun 08
Posts: 1218
Credit: 13,366,970
RAC: 54
Message 58336 - Posted: 1 Jan 2009, 21:57:30 UTC - in response to Message 57446.  

> Mod.Sense suggested a quick fix for this by using the Redirect directive on our webserver. The main question now is does the client handle redirects so please let me know if this fixes the issue or not.
>
> Thanks Mod.Sense!


Note that I've seen messages on other BOINC projects indicating that, after certain changes to the set of servers, a single update is not enough to make clients download the file specifying which servers to use; instead, it takes about 10 updates in a row that each try to download at least one workunit but fail to download any.

hedera
Joined: 15 Jul 06
Posts: 73
Credit: 4,935,548
RAC: 4
Message 58447 - Posted: 4 Jan 2009, 6:29:10 UTC

I just realized my CPU usage was essentially nil, and when I checked BOINC, the message log shows it's been requesting tasks from Rosetta all day (since 8:15 AM local) and never gotten any tasks to work on.

I'm running BOINC Manager 6.2.19 on Windows XP SP2, with 4 GB of RAM and a 3.2 GHz Pentium 4 CPU. I've never seen this problem before. Every request for tasks gets this:

01/03/2009 10:23:32 PM|rosetta@home|Sending scheduler request: Requested by user. Requesting 60480 seconds of work, reporting 0 completed tasks
01/03/2009 10:23:37 PM|rosetta@home|Scheduler request succeeded: got 0 new tasks

Is there something wrong with requesting 60480 seconds of work? Do I need to upgrade BOINC Manager again? Have you no tasks available? Help?
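
For reference, 60480 seconds is 16.8 hours of work, which is an ordinary request size. A minimal sketch for pulling these numbers out of message-log lines follows. It assumes the `timestamp|project|message` layout seen in the excerpts in this thread; the function name `summarize` is made up for illustration and is not part of BOINC.

```python
import re

# Assumed line layout, taken from the log excerpts in this thread:
#   <timestamp>|<project>|<message>
REQUEST_RE = re.compile(r"Requesting (\d+) seconds of work")
REPLY_RE = re.compile(r"got (\d+) new tasks")


def summarize(log_lines):
    """Pair each work request with the scheduler reply that follows it.

    Returns a list of (hours_requested, tasks_received) tuples.
    """
    results = []
    pending_seconds = None
    for line in log_lines:
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a message-log line in the assumed layout
        message = parts[2]
        request = REQUEST_RE.search(message)
        if request:
            pending_seconds = int(request.group(1))
            continue
        reply = REPLY_RE.search(message)
        if reply and pending_seconds is not None:
            results.append((pending_seconds / 3600, int(reply.group(1))))
            pending_seconds = None
    return results


log = [
    "01/03/2009 10:23:32 PM|rosetta@home|Sending scheduler request: "
    "Requested by user. Requesting 60480 seconds of work, reporting 0 completed tasks",
    "01/03/2009 10:23:37 PM|rosetta@home|Scheduler request succeeded: got 0 new tasks",
]
for hours, tasks in summarize(log):
    print(f"asked for {hours:.1f} hours of work, received {tasks} task(s)")
```

A request that succeeds but returns 0 tasks means the scheduler was reachable and simply had nothing to hand out, which is a different situation from a failed request.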
--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

Greg_BE
Joined: 30 May 06
Posts: 5652
Credit: 5,622,096
RAC: 0
Message 58452 - Posted: 4 Jan 2009, 9:33:09 UTC - in response to Message 58447.  

There is no work available for the time being due to a system problem. Perhaps by Monday morning sometime, Pacific time, the team will have corrected this problem. Hang in there. See the threads about server problems and about not getting any work to follow what's going on. You can also tell from the home page stats whether there is any work, and you can look at the server status page via the link on the lower left-hand side of the home page.


> I just realized my CPU usage was essentially nil, and when I checked BOINC, the message log shows it's been requesting tasks from Rosetta all day (since 8:15 AM local) and never gotten any tasks to work on.
>
> I'm running BOINC Manager 6.2.19 on Windows XP SP2, with 4 GB of RAM and a 3.2 GHz Pentium 4 CPU. I've never seen this problem before. Every request for tasks gets this:
>
> 01/03/2009 10:23:32 PM|rosetta@home|Sending scheduler request: Requested by user. Requesting 60480 seconds of work, reporting 0 completed tasks
> 01/03/2009 10:23:37 PM|rosetta@home|Scheduler request succeeded: got 0 new tasks
>
> Is there something wrong with requesting 60480 seconds of work? Do I need to upgrade BOINC Manager again? Have you no tasks available? Help?

robertmiles
Joined: 16 Jun 08
Posts: 1218
Credit: 13,366,970
RAC: 54
Message 58455 - Posted: 4 Jan 2009, 11:13:40 UTC - in response to Message 58447.  

> I just realized my CPU usage was essentially nil, and when I checked BOINC, the message log shows it's been requesting tasks from Rosetta all day (since 8:15 AM local) and never gotten any tasks to work on.
>
> I'm running BOINC Manager 6.2.19 on Windows XP SP2, with 4 GB of RAM and a 3.2 GHz Pentium 4 CPU. I've never seen this problem before. Every request for tasks gets this:
>
> 01/03/2009 10:23:32 PM|rosetta@home|Sending scheduler request: Requested by user. Requesting 60480 seconds of work, reporting 0 completed tasks
> 01/03/2009 10:23:37 PM|rosetta@home|Scheduler request succeeded: got 0 new tasks
>
> Is there something wrong with requesting 60480 seconds of work? Do I need to upgrade BOINC Manager again? Have you no tasks available? Help?


The system status shows that one of the two work generator programs is down, and the other one can't keep up with the demand for more workunits; there were only 22 workunits available the last time I looked, not necessarily including any suitable for your machine.

For cases like this, you might want to add another project rather reliable at supplying workunits, but give it only a small fraction of your available CPU time:

http://boinc.fzk.de/poem/

This one, at least, offers rather short workunits compared to many BOINC projects. Also, it is working on protein folding, as Rosetta@home is. Doesn't claim to be helping any specific diseases, though.

Giving it only a small share of your available CPU time means that even if you keep it enabled, it normally won't use much of your machine, but it will still fill in any times when you can't get workunits from Rosetta@home.
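
To make the arithmetic concrete, here is a minimal sketch. It assumes that BOINC divides CPU time among projects in proportion to their resource share settings, and that a project with no work simply drops out of the split. The share values (100 and 5), the project labels, and the function name `cpu_fractions` are made up for illustration.

```python
def cpu_fractions(resource_shares, has_work=None):
    """Split CPU time in proportion to resource share.

    resource_shares: dict mapping project label -> resource share value.
    has_work: optional set of projects that currently have workunits;
              projects without work are left out of the split.
    """
    if has_work is None:
        has_work = set(resource_shares)
    active = {name: share for name, share in resource_shares.items()
              if name in has_work}
    total = sum(active.values())
    if total <= 0:
        return {}  # nothing to run
    return {name: share / total for name, share in active.items()}


# Hypothetical settings: Rosetta@home at 100, the backup project at 5.
shares = {"rosetta": 100, "poem": 5}

# Both projects have work: the backup gets only a small slice.
print(cpu_fractions(shares))

# Rosetta@home has no work: the backup fills the whole machine.
print(cpu_fractions(shares, has_work={"poem"}))
```

With these example values the backup project gets roughly 5% of the CPU time while Rosetta@home has work, and all of it when Rosetta@home has none.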

mikey
Joined: 5 Jan 06
Posts: 1887
Credit: 6,129,451
RAC: 3
Message 58457 - Posted: 4 Jan 2009, 11:28:26 UTC - in response to Message 58455.  

> For cases like this, you might want to add another project rather reliable at supplying workunits, but give it only a small fraction of your available CPU time:
>
> http://boinc.fzk.de/poem/
>
> This one, at least, offers rather short workunits compared to many BOINC projects. Also, it is working on protein folding, as Rosetta@home is. Doesn't claim to be helping any specific diseases, though.


Thanks, I was looking for another project with short turnaround times that was also doing something for science. I always thought Poem had to do with poetry, silly me! I couldn't right poetry if my life depended on it, so I always ignored the site. Thanks again!

robertmiles
Joined: 16 Jun 08
Posts: 1218
Credit: 13,366,970
RAC: 54
Message 58465 - Posted: 4 Jan 2009, 13:15:28 UTC - in response to Message 58457.  
Last modified: 4 Jan 2009, 13:16:41 UTC

>> For cases like this, you might want to add another project rather reliable at supplying workunits, but give it only a small fraction of your available CPU time:
>>
>> http://boinc.fzk.de/poem/
>>
>> This one, at least, offers rather short workunits compared to many BOINC projects. Also, it is working on protein folding, as Rosetta@home is. Doesn't claim to be helping any specific diseases, though.
>
> Thanks, I was looking for another project with short turnaround times that was also doing something for science. I always thought Poem had to do with poetry, silly me! I couldn't right poetry if my life depended on it, so I always ignored the site. Thanks again!


You're welcome.

The SIMAP site also has short workunits, but is only active perhaps one week every month. Active today, though.

http://boinc.bio.wzw.tum.de/boincsimap/

The malariacontrol.net site also has short workunits, but it is becoming less active and often doesn't have any workunits available.

http://www.malariacontrol.net/

Neither is as closely related to what Rosetta@home is doing, though.

You might also want to watch Cels@home. Currently inactive for some changes including moving to another server, and with fewer workunits than requested even when it was active. Typical workunits were about 6 CPU hours on my machine.

http://cels-at-home-dev.dyndns.org/cels/

This website is related to Cels@home, but it's not very clear if that's where they plan to move:

http://ficp.engr.utexas.edu/cels/

Predictor@home has been essentially inactive for several months; I don't know how long the workunits are.

World Community Grid tends to provide long workunits, but is about to go inactive in order to move to another server site.

Sid Celery
Joined: 11 Feb 08
Posts: 1916
Credit: 35,478,949
RAC: 811
Message 58487 - Posted: 4 Jan 2009, 18:23:00 UTC - in response to Message 58457.  

> I couldn't right poetry if my life depended on it...

Surely you mean you couldn't write poetry if your life depended on it. That sounds right.

Ok, sorry. I'm waiting for work and have nothing better to say. You may beat me with a stick...

upstatelabs
Joined: 22 Jun 06
Posts: 10
Credit: 516,767
RAC: 0
Message 58700 - Posted: 9 Jan 2009, 15:56:18 UTC

Seems like there aren't any new WUs available....

Any idea when it'll be back to normal?

Greg_BE
Joined: 30 May 06
Posts: 5652
Credit: 5,622,096
RAC: 0
Message 58702 - Posted: 9 Jan 2009, 17:22:34 UTC - in response to Message 58700.  

> Seems like there aren't any new WUs available....
>
> Any idea when it'll be back to normal?



No one but the team does. It's 9:20 AM back in Seattle, so they should be looking into the problem by now.

upstatelabs
Joined: 22 Jun 06
Posts: 10
Credit: 516,767
RAC: 0
Message 58703 - Posted: 9 Jan 2009, 19:34:12 UTC - in response to Message 58702.  

> No one but the team does. It's 9:20 AM back in Seattle, so they should be looking into the problem by now.


Things seem to be fixed... I have WUs again :)


Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 58708 - Posted: 9 Jan 2009, 22:16:24 UTC

I have not been able to tell if they are issuing again, but WCG is on the air and taking work back and the web site is up again ...

I am not sure if they are issuing work again or not ... I can't tell from my buffers ... and I just lowered their priority so I am probably not asking for work yet ...

But most of their projects are Bio related ...

robertmiles
Joined: 16 Jun 08
Posts: 1218
Credit: 13,366,970
RAC: 54
Message 58713 - Posted: 10 Jan 2009, 6:02:05 UTC - in response to Message 58708.  

> I have not been able to tell if they are issuing again, but WCG is on the air and taking work back and the web site is up again ...
>
> I am not sure if they are issuing work again or not ... I can't tell from my buffers ... and I just lowered their priority so I am probably not asking for work yet ...
>
> But most of their projects are Bio related ...


They are issuing workunits again, although perhaps not as many. They're still checking for any unexpected effects of their new environment, such as running their server software on new servers.

Kurre
Joined: 12 Apr 06
Posts: 9
Credit: 69,240
RAC: 0
Message 58806 - Posted: 14 Jan 2009, 15:20:27 UTC

My computer can't report work. This is the last error indication I could find in my logs. I have about 5 workunits that my computer tried to report but couldn't. They just disappear from my computer and must be floating out there somewhere in the big cloud.
Is it my installation, or is this a general issue?

Running BOINC 6.4.5

2009-01-14 15:32:23||Internet access OK - project servers may be temporarily down.
2009-01-14 15:32:23|rosetta@home|Finished upload of abinitio_norelax_homfrag_129_B_2ccvA_SAVE_ALL_OUT_4626_12633_0_0
2009-01-14 15:32:25|rosetta@home|Scheduler request failed: Transferred a partial file
2009-01-14 15:33:25|rosetta@home|Sending scheduler request: To fetch work.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58808 - Posted: 14 Jan 2009, 16:07:46 UTC

If the tasks were removed from the list in your BOINC Manager, then another scheduler request went through successfully. BOINC will automatically retry until it goes through.
Rosetta Moderator: Mod.Sense

Kurre
Joined: 12 Apr 06
Posts: 9
Credit: 69,240
RAC: 0
Message 58810 - Posted: 14 Jan 2009, 16:44:53 UTC - in response to Message 58808.  
Last modified: 14 Jan 2009, 16:47:44 UTC

> If the tasks were removed from the list in your BOINC Manager, then another scheduler request went through successfully. BOINC will automatically retry until it goes through.


The thing is that they disappear from my local BOINC client, but they are still marked as in progress and new on the website. This is an example:
220365003 200760705 12 Jan 2009 19:28:50 UTC 22 Jan 2009 19:28:50 UTC In Progress Unknown New

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58813 - Posted: 14 Jan 2009, 18:26:48 UTC

The only way I know of for tasks to show on the website, but not appear on your local BOINC Manager display is when your machine never received the work in the first place. The server thinks that it assigned it to you, but your machine never saw it. Some people called these "ghost WUs".

Many of the task names are very similar. The simplest way to tell them apart is by the numbers at the very end of the name. Are you certain these are the tasks that you completed?
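
As a rough illustration of comparing names by their trailing numbers, here is a minimal sketch. It assumes task names end in underscore-separated numeric fields, as the names quoted in this thread do; the helper `trailing_numbers` is hypothetical and not part of BOINC.

```python
import re

# Matches the run of "_<digits>" groups at the very end of a task name.
TRAILING_NUMBERS_RE = re.compile(r"(?:_\d+)+$")


def trailing_numbers(task_name):
    """Return the numeric fields at the end of a task name as a tuple of strings."""
    match = TRAILING_NUMBERS_RE.search(task_name)
    if match is None:
        return ()
    return tuple(match.group(0).strip("_").split("_"))


# Two similar-looking names taken from the logs in this thread.
first = "abinitio_norelax_homfrag_129_B_2ccvA_SAVE_ALL_OUT_4626_12633_0_0"
second = "abinitio_norelax_homfrag_129_B_1ten__SAVE_ALL_OUT_4626_10396_0_0"

print(trailing_numbers(first))
print(trailing_numbers(second))
print("same task?", trailing_numbers(first) == trailing_numbers(second))
```

Comparing only the trailing fields avoids being misled by the long shared prefix of the names.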

Here is a link to your host.
Rosetta Moderator: Mod.Sense

Kurre
Joined: 12 Apr 06
Posts: 9
Credit: 69,240
RAC: 0
Message 58814 - Posted: 14 Jan 2009, 19:29:25 UTC - in response to Message 58813.  
Last modified: 14 Jan 2009, 19:36:38 UTC

> The only way I know of for tasks to show on the website, but not appear on your local BOINC Manager display is when your machine never received the work in the first place. The server thinks that it assigned it to you, but your machine never saw it. Some people called these "ghost WUs".
>
> Many of the task names are very similar. The simplest way to tell them apart is by the numbers at the very end of the name. Are you certain these are the tasks that you completed?
>
> Here is a link to your host.


It seems I have two different problems.
Today I had the same problem that I had a few years ago: the server was down, but my client didn't care about that and just flushed the results.

13-Jan-2009 17:28:02 [rosetta@home] Finished upload of t075_1_NMRREF_1_t075_1_S_00002_0000200IGNORE_THE_REST_070000_6211_20_0_0
13-Jan-2009 20:50:14 [rosetta@home] Finished upload of abrelax_nofilter_-1n0u_-SAVE_ALL_OUT_6206_14591_0_0
13-Jan-2009 22:07:34 [rosetta@home] Finished upload of MaR214A_t071_1_RDC_NMR_NESG_SAVE_ALL_OUT_6215_13455_0_0
14-Jan-2009 15:32:23 [rosetta@home] Finished upload of abinitio_norelax_homfrag_129_B_2ccvA_SAVE_ALL_OUT_4626_12633_0_0
14-Jan-2009 17:21:43 [rosetta@home] Finished upload of abinitio_norelax_homfrag_129_B_1ten__SAVE_ALL_OUT_4626_10396_0_0
14-Jan-2009 19:25:12 [rosetta@home] Finished upload of t076_1_NMRREF_1_t076_1_idid_model_05_coreIGNORE_THE_REST_idl_6217_7848_0_0


Then in another log I found these notes, which indicate problems at reboot or uncontrolled shutdowns of the client. I had that feeling before today that the error happened around or after reboots. I had to reboot the computer several times after patches and upgrades following a three-week vacation.

cant find C:ProgramBOINC\RebootPending.txt

Let's hope that the double \ is just an error in the log string ;-)

So no, I can't be sure yet that my client received all those jobs. I have to dig a bit deeper into the log files before I can say that, but I don't have the time for that right now.

Could there be some problems in the code that handles restarting the workunits? It is probably a common piece of code shared by all your projects.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58815 - Posted: 14 Jan 2009, 20:23:07 UTC

Mike posted yesterday some of the issues they have been working on for the next release.

I don't know the details of exactly what he means, but as I read his comment about "Bug fix in checkpointing machinery, states were not being correctly restored", I would say it sounds possible that this is exactly what you are talking about. If so, then yes, some problems were uncovered, and fixes are being tested and should be available in the next release.
Rosetta Moderator: Mod.Sense

Kurre
Joined: 12 Apr 06
Posts: 9
Credit: 69,240
RAC: 0
Message 58818 - Posted: 14 Jan 2009, 23:00:59 UTC - in response to Message 58815.  

> Mike posted yesterday some of the issues they have been working on for the next release.
>
> I don't know the details of exactly what he means, but as I read his comment about "Bug fix in checkpointing machinery, states were not being correctly restored", I would say it sounds possible that this is exactly what you are talking about. If so, then yes, some problems were uncovered, and fixes are being tested and should be available in the next release.


Ah, OK. The one about instability in handling text files might fit my earlier problem, because my machine gets an error number 2 from XP (that is, file not found), and it's text files that don't seem to be handled correctly when a restart is done. It seems to affect any WUs, so it's probably fixed now.

The error I had today is probably harder to isolate because it might not be easily reproducible. Your server or my connection has to be in a special state so that the BOINC client thinks everything is OK until it's too late (the files have already been deleted from my client). It's not easy to get a two-phase commit, or whatever it's called today, to work properly.

Some wasted CPU time, but you can't get them all.
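
The general idea behind that remark can be sketched as follows: keep a local result until the other side has explicitly acknowledged it, and only then delete it. This is an illustration of the principle only, not BOINC's actual client code; the class `ResultStore` and the `send` callback are hypothetical.

```python
class ResultStore:
    """Holds finished results until the server acknowledges them."""

    def __init__(self):
        self.pending = {}  # task name -> result payload

    def add(self, name, payload):
        self.pending[name] = payload

    def report(self, send):
        """Try to report every pending result.

        send(name, payload) must return True only when the server has
        confirmed receipt. A result is deleted locally only after that
        confirmation; on failure it stays pending for the next retry.
        Returns the names that are still pending.
        """
        for name in list(self.pending):
            try:
                acknowledged = send(name, self.pending[name])
            except OSError:
                acknowledged = False  # e.g. connection dropped mid-transfer
            if acknowledged:
                del self.pending[name]
        return sorted(self.pending)


def server_down(name, payload):
    raise OSError("Transferred a partial file")


def server_up(name, payload):
    return True


store = ResultStore()
store.add("task_a", b"result data")
print(store.report(server_down))  # result is kept for a retry
print(store.report(server_up))    # result is removed after the ack
```

The cost of this ordering is that a result may occasionally be sent twice, which is usually easier to handle than a result that is lost.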



©2023 University of Washington
https://www.bakerlab.org