Warning: Don't shut down BOINC Manager..!!

STE\/E

Joined: 17 Sep 05
Posts: 125
Credit: 4,100,301
RAC: 84
Message 74 - Posted: 17 Sep 2005, 12:37:02 UTC

Don't shut down the BOINC Manager, or the WU will start over again at 0:00 time. This is in the first few minutes (about 10); I don't know whether, if you let it run longer, the client_state.xml file will pick it up where you left off.
ID: 74 · Rating: 0
Profile JimB
Joined: 17 Sep 05
Posts: 19
Credit: 228,111
RAC: 0
Message 81 - Posted: 17 Sep 2005, 14:04:59 UTC - in response to Message 74.  

Don't shut down the BOINC Manager, or the WU will start over again at 0:00 time. This is in the first few minutes (about 10); I don't know whether, if you let it run longer, the client_state.xml file will pick it up where you left off.


I had a momentary power outage and rebooted the computer. Rosetta stayed at 26.67% done when it came back up. Could that be the interval at which it saves its work?


"Be all that you can be...considering." Harold Green
ID: 81 · Rating: 0
Profile Webmaster Yoda
Joined: 17 Sep 05
Posts: 161
Credit: 162,253
RAC: 0
Message 84 - Posted: 17 Sep 2005, 14:40:32 UTC
Last modified: 17 Sep 2005, 14:41:09 UTC

Too late - similar thing just happened to me after 10 hours crunching. I'll wait for the optimised Windows version before trying again.
ID: 84 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 87 - Posted: 17 Sep 2005, 15:18:56 UTC - in response to Message 84.  
Last modified: 17 Sep 2005, 15:22:24 UTC

Too late - similar thing just happened to me after 10 hours crunching. I'll wait for the optimised Windows version before trying again.


It will pick up on the structure where it last left off. It will show 0% at the start, but after it initializes (a few minutes to read in databases and load structures), it will continue from where it was before it was stopped. The checkpoints are, however, at relatively large intervals (structures, 12-15/workunit). The updated app will have the same behaviour.
ID: 87 · Rating: 0
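
To illustrate the structure-level progress described in Message 87, here is a minimal Python sketch. It assumes the percentage shown is simply the number of completed structures divided by the total; the function name `progress_percent` and the choice of 15 structures are illustrative assumptions, not taken from the Rosetta application. Under that assumption, progress can only move in steps of roughly 6.67%, which would be consistent with a figure like the 26.67% reported in Message 81.

```python
def progress_percent(completed_structures: int, total_structures: int) -> float:
    """Percent complete when progress only advances once per finished structure."""
    if total_structures <= 0:
        raise ValueError("total_structures must be positive")
    if not 0 <= completed_structures <= total_structures:
        raise ValueError("completed_structures out of range")
    return round(100.0 * completed_structures / total_structures, 2)


# Hypothetical workunit with 15 structures (the post gives a range of 12-15).
for done in (4, 5, 6):
    print(done, progress_percent(done, 15))
```

With coarse steps like these, any work done on a structure that had not finished before a restart would be repeated, which is why the progress appears to jump back to the last completed structure.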
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 88 - Posted: 17 Sep 2005, 15:36:26 UTC

With all projects, when you shut down and then restart, it will only start at the last checkpoint. If it takes a long time to get to the first checkpoint, well, then you will start back at zero.

Since I just attached to the project, it is not clear to me how it works yet ...

But, I am interested in that it seems to be another project with long run time work units. Though that may be a mistaken perspective. Hmmm, it looks like short deadlines too ... this may not be good ... :(
ID: 88 · Rating: 0
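
The checkpoint behaviour described in Message 88 can be sketched generically as follows. This is not the BOINC API or the Rosetta code: the JSON file format, the `step` and `cpu_seconds` fields, and the function names are all assumptions made for illustration. The point is only that a restart resumes from whatever was last written to disk, and starts from zero if nothing was written yet.

```python
import json
import os
import tempfile
from typing import Optional


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0, "cpu_seconds": 0.0}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file first, then swap it in, so a crash mid-write
    does not corrupt the previous checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)


def run(path: str, total_steps: int, stop_after: Optional[int] = None) -> dict:
    """Work step by step, checkpointing after each step; resume if restarted."""
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # simulate a shutdown before the work is finished
        state["step"] += 1
        state["cpu_seconds"] += 1.0  # stand-in for the measured CPU time of a step
        save_checkpoint(path, state)
        steps_this_run += 1
    return state
```

Calling `run(path, 5, stop_after=2)` and then `run(path, 5)` finishes the remaining three steps rather than all five. Storing the accumulated CPU time in the same checkpoint is a design choice of this sketch, so that both progress and time survive a restart together.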
Profile [B^S] DonaldXP

Joined: 17 Sep 05
Posts: 1
Credit: 58,122
RAC: 0
Message 92 - Posted: 17 Sep 2005, 15:55:09 UTC - in response to Message 87.  
Last modified: 17 Sep 2005, 15:57:14 UTC


It will pick up on the structure where it last left off. It will show 0% at the start, but after it initializes (a few minutes to read in databases and load structures), it will continue from where it was before it was stopped. The checkpoints are, however, at relatively large intervals (structures, 12-15/workunit). The updated app will have the same behaviour.



Yes, but the CPU time counter is also reset to zero, so the claimed credit of a restarted WU is lower than that of a "normal" WU.
I had to reboot. Rosetta started from 0% and jumped after 1 minute to the former 33%, but CPU time and credit also started from 0. So after 5 hours of crunching I have a CPU time of 50 minutes and a claimed credit of 9.65, instead of 5 hours of CPU time and an estimated 20 credits.
SZTAKI had the same behaviour at the beginning, but they fixed it very fast.

cheers

Donald
ID: 92 · Rating: 0
devn

Joined: 17 Sep 05
Posts: 18
Credit: 2,063
RAC: 0
Message 120 - Posted: 17 Sep 2005, 22:17:36 UTC

Re previous post: the same thing happened to me. I shut down because of powerful thunderstorms overhead, and when I rebooted, the CPU time started from zero, although the WU progress showed 40% after a couple of minutes (same as before shutdown).
ID: 120 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 121 - Posted: 17 Sep 2005, 22:25:29 UTC - in response to Message 120.  
Last modified: 17 Sep 2005, 22:32:51 UTC

Does anyone know if this is a boinc client issue or something that can be coded into the application to make sure the cpu time does not reset? I really want to make sure people get appropriate credit for their cpu time. I will increase the deadline for future workunits and if necessary, make them shorter. It should only be an issue with the windows app but that will change when I update it with an optimized version soon.

I'm guessing the manager just displays the current cpu time since restart, but will report the total cpu time for the workunit. Does anyone know if this is correct?
ID: 121 · Rating: 0
Stanley A Bourdon

Joined: 17 Sep 05
Posts: 3
Credit: 112,907
RAC: 2
Message 129 - Posted: 18 Sep 2005, 1:30:12 UTC - in response to Message 121.  

I'm guessing the manager just displays the current cpu time since restart, but will report the total cpu time for the workunit. Does anyone know if this is correct?


NO

I do not know about your site, but on SZTAKI, when this happened, the displayed CPU time, even if it was only a few seconds out of thousands, was what was reported. They were able to correct it quickly.

http://szdg.lpds.sztaki.hu/szdg/
Stanley


Boinc Wikipedia - the FAQ in active change
ID: 129 · Rating: 0
Hermes

Joined: 17 Sep 05
Posts: 2
Credit: 113,946
RAC: 0
Message 143 - Posted: 18 Sep 2005, 9:36:28 UTC - in response to Message 121.  

Does anyone know if this is a boinc client issue or something that can be coded into the application to make sure the cpu time does not reset? I really want to make sure people get appropriate credit for their cpu time.


It's an application issue, as this problem only occurs with rosetta but not with others like seti, einstein, mfold, sixtrack, hadsm, etc. I think something is missing in the boinc documentation as this is a problem mainly encountered by new projects.

I'm guessing the manager just displays the current cpu time since restart, but will report the total cpu time for the workunit. Does anyone know if this is correct?


I would argue no.
To remember the CPU time at the checkpoint, it needs to be saved in client_state.xml, in the active_task section, under checkpoint_cpu_time. This number stays at 0.000000 throughout the complete work unit, so when the application is restarted, the amount of CPU time spent so far is read as zero.
ID: 143 · Rating: 0
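
The check described in Message 143 can be sketched in Python. The element names `active_task` and `checkpoint_cpu_time` come from that post; everything else here is an assumption for illustration, including the surrounding layout of the sample document, the `result_name` element, and the helper names. The real client_state.xml contains far more than this.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical, heavily trimmed sample. The nesting is assumed, not copied
# from a real client_state.xml.
SAMPLE = """
<client_state>
  <active_task_set>
    <active_task>
      <result_name>example_result_0</result_name>
      <checkpoint_cpu_time>0.000000</checkpoint_cpu_time>
    </active_task>
  </active_task_set>
</client_state>
"""


def checkpoint_cpu_times(xml_text: str) -> Dict[str, float]:
    """Map each active task's name to its saved checkpoint CPU time in seconds."""
    root = ET.fromstring(xml_text)
    times = {}
    # iter() finds active_task elements at any depth, so the exact nesting
    # assumed in SAMPLE does not matter.
    for task in root.iter("active_task"):
        name = task.findtext("result_name", default="(unnamed)")
        times[name] = float(task.findtext("checkpoint_cpu_time", default="0"))
    return times


def cpu_time_after_restart(checkpoint_cpu_time: float, seconds_since_restart: float) -> float:
    """CPU time as it would read if counting resumed from the saved value."""
    return checkpoint_cpu_time + seconds_since_restart


print(checkpoint_cpu_times(SAMPLE))
```

If the saved value is always 0.0, then `cpu_time_after_restart(0.0, 3000.0)` yields only the 3000 seconds (50 minutes) accumulated since the restart. Had about 15000 seconds been saved at the checkpoint, the same call would give 18000 seconds (5 hours), which is roughly the gap described in Message 92.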
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 158 - Posted: 18 Sep 2005, 16:05:40 UTC - in response to Message 121.  

I'm guessing the manager just displays the current cpu time since restart, but will report the total cpu time for the workunit. Does anyone know if this is correct?

When you find out, this is the type of thing that I want to capture in the Wiki in the development side. If you don't have time to make it look good, you can send me the "rough" notes of what module and so forth and I will make the changes.
ID: 158 · Rating: 0
Profile Angus

Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 217 - Posted: 20 Sep 2005, 2:12:33 UTC

NOW I find out about this.

I just had to restart BOINC mgr for external reasons, and the CPU time has reset to 0:00

Since I had about 93% of this WU done after crunching since Friday afternoon on an XP-2600+, and I'm likely to only get squat for credit, it's going in the bit bucket.

After the restart, it showed 0:00 CPU time, and the % complete shows 60.0%, neither of which is correct.

Why are the Windows clients so slow?????

This project is going on reset/suspend until the problems are fixed.

Also, If you access the web site through Boinc Mgr v 4.72, isn't it automatically supposed to log you in to the web site, without having to go find the danged key?

Also, I thought the BOINC registrations were being keyed off email and a password now - what happened to that?


Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)
ID: 217 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 218 - Posted: 20 Sep 2005, 2:54:21 UTC - in response to Message 217.  
Last modified: 20 Sep 2005, 2:56:41 UTC

NOW I find out about this.

I just had to restart BOINC mgr for external reasons, and the CPU time has reset to 0:00

Since I had about 93% of this WU done after crunching since Friday afternoon on an XP-2600+, and I'm likely to only get squat for credit, it's going in the bit bucket.

After the restart, it showed 0:00 CPU time, and the % complete shows 60.0%, neither of which is correct.

Why are the Windows clients so slow?????

This project is going on reset/suspend until the problems are fixed.

Also, If you access the web site through Boinc Mgr v 4.72, isn't it automatically supposed to log you in to the web site, without having to go find the danged key?

Also, I thought the BOINC registrations were being keyed off email and a password now - what happened to that?



I'm working to fix the various issues. We will have a fix soon. It's one of those things where more and more issues are creeping up and I want to address them in the updated app version. In the near future, we will upgrade the server to BOINC 5+, which uses usernames and passwords. The Windows app will be optimized, total CPU time will be saved after reboot/exit, and there will be a better time estimate. The % complete will still not track elapsed time accurately, because each prediction trajectory is different, there are two types of predictions per workunit (low- and high-res), and there are filters that can lengthen any given workunit (not significantly, but enough to throw off the correspondence with time). It will also still move in big steps, for this particular example. However, the percentage complete should be accurate as far as result data is concerned.
ID: 218 · Rating: 0
Profile Angus

Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 220 - Posted: 20 Sep 2005, 3:18:59 UTC

Are we talking hours, days, weeks or what for this new app?


Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)
ID: 220 · Rating: 0
RDC

Joined: 16 Sep 05
Posts: 43
Credit: 101,644
RAC: 0
Message 221 - Posted: 20 Sep 2005, 3:36:19 UTC - in response to Message 217.  

Also, I thought the BOINC registrations were being keyed off email and a password now - what happened to that?



I'm not 100% positive, but I recall reading on one of the project sites that the e-mail login feature depends on whether the project servers have the latest BOINC scripting installed. The Seti@home site always has the most current scripts, while the other projects usually lag behind in keeping the scripts updated.

Also remember, Rosetta is a Beta project, there will be some problems


ID: 221 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 222 - Posted: 20 Sep 2005, 4:09:52 UTC - in response to Message 220.  

Are we talking hours, days, weeks or what for this new app?



I will be on vacation starting the beginning of next week, for a week, so I hope to get the update available and running okay by then. I don't want to work much during vacation. Of course, it may not completely address all the issues, but at least it will be a start. My priorities are to make sure the CPU time gets checkpointed, so that a reboot or force quit will not reset it back to zero, and to optimize the Windows app.
ID: 222 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 223 - Posted: 20 Sep 2005, 4:13:56 UTC - in response to Message 221.  

Also, I thought the BOINC registrations were being keyed off email and a password now - what happened to that?



I'm not 100% positive, but I recall reading on one of the project sites that the e-mail login feature depends on whether the project servers have the latest BOINC scripting installed. The Seti@home site always has the most current scripts, while the other projects usually lag behind in keeping the scripts updated.

Also remember, Rosetta is a Beta project, there will be some problems



There are also minor database updates for the e-mail login feature.
ID: 223 · Rating: 0
Profile KWSN Sir Clark

Joined: 18 Sep 05
Posts: 46
Credit: 387,432
RAC: 0
Message 227 - Posted: 20 Sep 2005, 10:17:21 UTC

Thanks for the info.

Enjoy your well-earned vacation next week.
ID: 227 · Rating: 0
Profile drezha

Joined: 21 Sep 05
Posts: 5
Credit: 76,513
RAC: 0
Message 268 - Posted: 21 Sep 2005, 0:36:25 UTC

I've had something like this...

It'll do work, and I've done about 20 minutes' work on a WU and had no progress appear at all...
ID: 268 · Rating: 0
Divide Overflow

Joined: 17 Sep 05
Posts: 82
Credit: 921,382
RAC: 0
Message 287 - Posted: 21 Sep 2005, 18:03:41 UTC

I had a strange system crash on my laptop the other day while running Rosetta@home. I'm paying close attention to see if anything like it happens again. I was running Lotus Notes, MS Word, Excel, PowerPoint and Visio at the same time, so I am 99% certain that the crash was due to a Microsoft product problem rather than a BOINC or Rosetta application issue! ;)

The WU in progress did lose all of its checkpointing information, however. When I rebooted the system and started crunching again, the WU did not recover any previous CPU time, and ended up finishing with only 4 hours of reported processing time rather than the typical 10 - 11. There was also an odd access violation error listed in the result's stderr out. I've noticed that this WU was sent out to another machine that previously had a similar error.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13099

I'm not sure if this is a problem with the WU or if it's a problem due to my system crash.
ID: 287 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org