Message boards : Number crunching : Warning: Don't shut down BOINC Manager..!!
Author | Message |
---|---|
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,100,301 RAC: 139 |
Don't shut down BOINC Manager, or the WU will start over again at 0:00 time. This happens in the first few minutes (about 10). I don't know whether, if you let it run longer, the client_state.xml file will pick it up where you left off. |
JimB Joined: 17 Sep 05 Posts: 19 Credit: 228,111 RAC: 0 |
Quote: "Don't shut down the BOINC Manager or the WU will start over again at 0:00 Time ... This is in the first few minutes (about 10) I don't know if you let it run longer if the client_state.xml file will pick it up where you left off .." I had a momentary power outage and rebooted the computer. Rosetta stayed at 26.67% done when it came back up. Could that be the interval at which it saves its work? "Be all that you can be...considering." Harold Green |
Webmaster Yoda Joined: 17 Sep 05 Posts: 161 Credit: 162,253 RAC: 0 |
Too late - similar thing just happened to me after 10 hours crunching. I'll wait for the optimised Windows version before trying again. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Quote: "Too late - similar thing just happened to me after 10 hours crunching. I'll wait for the optimised Windows version before trying again." It will pick up at the structure where it last left off. It will show 0% at the start, but after it initializes (a few minutes to read in databases and load structures), it will continue from where it was before it was stopped. Checkpoints are, however, at relatively large intervals: one per structure, with 12-15 structures per workunit. The updated app will have the same behaviour. |
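A minimal sketch of the per-structure checkpointing described above, assuming a workunit of 15 structures and a hypothetical JSON checkpoint file (Rosetta's real checkpoint format is not described in this thread). A structure still in progress at shutdown is lost, and a restarted task resumes at the last completed structure, which is why % complete moves in large steps. It does not model the few minutes of initialization during which the display shows 0%.

```python
import json
import os
import tempfile

# Hypothetical checkpoint file holding only the count of finished structures.
def load_checkpoint(path):
    """Return the number of structures already completed (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["structures_done"]

def save_checkpoint(path, structures_done):
    # Write to a temporary file, then rename, so a crash mid-write leaves the old checkpoint intact.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"structures_done": structures_done}, f)
    os.replace(tmp, path)

def run_workunit(path, total_structures, stop_after=None):
    """Compute structures one at a time, checkpointing after each one.

    stop_after simulates shutting down BOINC Manager after that many structures
    have been finished in this session. Returns the fraction-done values reported.
    """
    done = load_checkpoint(path)
    reported = [done / total_structures]  # a restarted task resumes at the last checkpoint
    finished_this_session = 0
    while done < total_structures:
        if stop_after is not None and finished_this_session == stop_after:
            break  # shutdown: a structure in progress at this point would be lost
        # ... the actual structure prediction would run here (omitted) ...
        done += 1
        finished_this_session += 1
        save_checkpoint(path, done)
        reported.append(done / total_structures)
    return reported

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "checkpoint.json")
        before = run_workunit(ckpt, 15, stop_after=5)  # shut down after 5 of 15 structures
        after = run_workunit(ckpt, 15)                 # restart
        print(f"{before[-1]:.1%} -> resumes at {after[0]:.1%}")  # prints "33.3% -> resumes at 33.3%"
```

Coarse checkpoints keep the overhead low, at the cost of losing up to one structure's worth of work on every restart.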
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
With all projects, when you shut down and then restart, the work resumes only from the last checkpoint. If it takes a long time to reach the first checkpoint and you stop before then, you will start back at zero. Since I just attached to the project, it is not clear to me how it works yet... But I am interested that it seems to be another project with long-running work units, though that may be a mistaken perspective. Hmmm, it looks like short deadlines too... this may not be good... :( |
[B^S] DonaldXP Joined: 17 Sep 05 Posts: 1 Credit: 58,122 RAC: 0 |
Yes, but the CPU time counter is also reset to zero, so the claimed credit of a restarted WU is lower than that of a "normal" WU. I had to reboot; Rosetta started from 0% and after 1 minute jumped to the former 33%, but CPU time and credit also started from 0. So after 5 hours of crunching I have a CPU time of 50 minutes and a claimed credit of 9.65, instead of 5 hours of CPU time and an estimated 20 credits. SZTAKI had the same behaviour at the beginning, but they fixed it very fast. Cheers, Donald |
devn Joined: 17 Sep 05 Posts: 18 Credit: 2,063 RAC: 0 |
Re previous post... same thing happened to me. I shut down because of powerful thunderstorms overhead, and when I rebooted, the CPU time started from zero, although the WU progress showed 40% after a couple of minutes (same as before shutdown). |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Does anyone know if this is a BOINC client issue or something that can be coded into the application to make sure the CPU time does not reset? I really want to make sure people get appropriate credit for their CPU time. I will increase the deadline for future workunits and, if necessary, make them shorter. It should only be an issue with the Windows app, but that will change when I update it with an optimized version soon. I'm guessing the manager just displays the current CPU time since restart, but will report the total CPU time for the workunit. Does anyone know if this is correct? |
Stanley A Bourdon Joined: 17 Sep 05 Posts: 3 Credit: 112,907 RAC: 3 |
Quote: "I'm guessing the manager just displays the current cpu time since restart, but will report the total cpu time for the workunit. Does anyone know if this is correct?" No. I don't know about your site, but on SZTAKI, when this happened, the displayed CPU time was what was reported, even if it was only a few seconds out of thousands. They were able to correct it quickly. http://szdg.lpds.sztaki.hu/szdg/ Stanley Boinc Wikipedia - the FAQ in active change |
Hermes Joined: 17 Sep 05 Posts: 2 Credit: 113,946 RAC: 0 |
Quote: "Does anyone know if this is a boinc client issue or something that can be coded into the application to make sure the cpu time does not reset? I really want to make sure people get appropriate credit for their cpu time." It's an application issue, as this problem only occurs with Rosetta and not with others like SETI, Einstein, mfold, SixTrack, HadSM, etc. I think something is missing in the BOINC documentation, as this is a problem mainly encountered by new projects. Quote: "I'm guessing the manager just displays the current cpu time since restart, but will report the total cpu time for the workunit. Does anyone know if this is correct?" I would argue no. To remember the CPU time at the checkpoint, it needs to be saved in client_state.xml, in the active_task section under checkpoint_cpu_time. This number stays at 0.000000 throughout the complete work unit, so when the application is restarted, the CPU time spent so far is read as zero. |
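To check for the symptom described above on a given machine, a small read-only script can list each active task's checkpoint_cpu_time. This is a sketch under assumptions: besides checkpoint_cpu_time, it assumes each active_task element has result_name and fraction_done children (check this against your own client_state.xml), and the SAMPLE document is made up for illustration, not copied from a real client. A task that reports progress but still has a checkpoint_cpu_time of 0 is one whose CPU time would restart at zero.

```python
import xml.etree.ElementTree as ET

def checkpoint_cpu_times(xml_text):
    """Map result name -> (checkpoint_cpu_time, fraction_done) for each active_task element."""
    root = ET.fromstring(xml_text)
    out = {}
    # iter() finds active_task at any depth, so the exact nesting does not matter.
    for task in root.iter("active_task"):
        name = task.findtext("result_name", default="?")
        cpu = float(task.findtext("checkpoint_cpu_time", default="0"))
        frac = float(task.findtext("fraction_done", default="0"))
        out[name] = (cpu, frac)
    return out

def flag_unsaved_cpu_time(xml_text):
    """Names of tasks that show progress but have no checkpointed CPU time."""
    return [name for name, (cpu, frac) in checkpoint_cpu_times(xml_text).items()
            if frac > 0 and cpu == 0.0]

# Made-up example data; result names and values are hypothetical.
SAMPLE = """<client_state>
  <active_task_set>
    <active_task>
      <result_name>example_rosetta_result_0</result_name>
      <checkpoint_cpu_time>0.000000</checkpoint_cpu_time>
      <fraction_done>0.266667</fraction_done>
    </active_task>
    <active_task>
      <result_name>example_other_result_0</result_name>
      <checkpoint_cpu_time>5400.000000</checkpoint_cpu_time>
      <fraction_done>0.400000</fraction_done>
    </active_task>
  </active_task_set>
</client_state>"""

if __name__ == "__main__":
    print(flag_unsaved_cpu_time(SAMPLE))  # prints ['example_rosetta_result_0']
```

Since the client rewrites client_state.xml while it runs, a script like this should only ever read the file (or a copy of it), never modify it.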
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Quote: "I'm guessing the manager just displays the current cpu time since restart, but will report the total cpu time for the workunit. Does anyone know if this is correct?" When you find out, this is the type of thing I want to capture in the Wiki, on the development side. If you don't have time to make it look good, you can send me "rough" notes on which module is involved and so forth, and I will make the changes. |
Angus Joined: 17 Sep 05 Posts: 412 Credit: 321,053 RAC: 0 |
NOW I find out about this. I just had to restart BOINC Manager for external reasons, and the CPU time has reset to 0:00. Since I had about 93% of this WU done after crunching since Friday afternoon on an XP-2600+, and I'm likely to get squat for credit, it's going in the bit bucket. After the restart, it showed 0:00 CPU time, and % complete shows 60.0%, neither of which is correct. Why are the Windows clients so slow? This project is going on reset/suspend until the problems are fixed. Also, if you access the web site through BOINC Manager v4.72, isn't it supposed to log you in to the web site automatically, without having to go find the danged key? Also, I thought the BOINC registrations were being keyed off email and a password now - what happened to that? Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :) "You can't fix stupid" (Ron White) |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Quote: "NOW I find out about this." I'm working to fix the various issues, and we will have a fix soon. It's one of those things where more and more issues are creeping up, and I want to address them in the updated app version. In the near future, we will upgrade the server to BOINC 5+, which uses usernames and passwords. The Windows app will be optimized, total CPU time will be saved after reboot/exit, and there will be a better time estimate. The % complete will still not track elapsed time accurately, because each prediction trajectory is different, there are two types of predictions per workunit (low-res and high-res), and filters are applied that can lengthen any given workunit (not significantly, but enough to throw off the correspondence with time). It will also still advance in big steps, as in this particular example. However, the percentage complete should be accurate as far as result data is concerned. |
Angus Joined: 17 Sep 05 Posts: 412 Credit: 321,053 RAC: 0 |
Are we talking hours, days, weeks or what for this new app? |
RDC Joined: 16 Sep 05 Posts: 43 Credit: 101,644 RAC: 0 |
Quote: "Also, I though the BOINC registrations were being keyed off email and a password now - what happened to that?" I'm not 100% positive, but I recall reading on one of the project sites that the e-mail login feature depends on whether the project servers have the latest BOINC scripting installed. The SETI@home site always has the most current scripts, while the other projects usually lag behind in keeping their scripts updated. Also remember, Rosetta is a beta project; there will be some problems. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Quote: "Are we talking hours, days, weeks or what for this new app?" I will be on vacation for a week starting at the beginning of next week, so I hope to get the update available and running okay by then. I don't want to work much during vacation. Of course, it may not completely address all the issues, but at least it will be a start. My priorities are to make sure the CPU time gets checkpointed, so a reboot or force quit will not reset it back to zero, and to optimize the Windows app. |
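On the application side, checkpointing CPU time amounts to carrying the CPU time accumulated up to the last checkpoint across restarts. In the BOINC API, an application normally signals a finished checkpoint by calling boinc_checkpoint_completed(), which lets the client record checkpoint_cpu_time. The Python sketch below does not use the BOINC API; it only illustrates the bookkeeping, with a hypothetical class and a hypothetical JSON checkpoint file that stores the CPU time alongside the app's own state.

```python
import json
import os
import tempfile
import time

class CpuTimeCheckpointer:
    """Hypothetical sketch: keep total CPU time across restarts via the app's own checkpoint.

    CPU time spent after the last checkpoint is still lost on shutdown,
    exactly like the unfinished work itself.
    """

    def __init__(self, path):
        self.path = path
        self.saved_cpu = 0.0  # CPU seconds accumulated in earlier sessions
        self.state = {}
        if os.path.exists(path):
            with open(path) as f:
                data = json.load(f)
            self.saved_cpu = data["cpu_time"]
            self.state = data["state"]
        self.session_start = time.process_time()  # CPU time of this process at (re)start

    def total_cpu_time(self):
        # Earlier sessions plus CPU time used by this process since it started.
        return self.saved_cpu + (time.process_time() - self.session_start)

    def checkpoint(self, state):
        # Save app state and the running CPU total together, atomically via rename.
        self.state = state
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"cpu_time": self.total_cpu_time(), "state": state}, f)
        os.replace(tmp, self.path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.json")
        first = CpuTimeCheckpointer(path)
        sum(i * i for i in range(10**6))  # stand-in for real work
        first.checkpoint({"structures_done": 5})
        restarted = CpuTimeCheckpointer(path)  # simulate a restart
        print(f"carried over {restarted.saved_cpu:.3f} s of CPU time; state = {restarted.state}")
```

Storing the CPU total in the same file as the work state keeps the two consistent: both roll back to the same checkpoint.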
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Quote: "Also, I though the BOINC registrations were being keyed off email and a password now - what happened to that?" The e-mail login feature also requires some minor database updates. |
KWSN Sir Clark Joined: 18 Sep 05 Posts: 46 Credit: 387,432 RAC: 0 |
Thanks for the info. Enjoy your well-earned vacation next week. |
drezha Joined: 21 Sep 05 Posts: 5 Credit: 76,513 RAC: 0 |
I've had something like this... It keeps working, but I've done about 20 minutes of work on a WU and no progress has appeared at all... |
Divide Overflow Joined: 17 Sep 05 Posts: 82 Credit: 921,382 RAC: 0 |
I had a strange system crash on my laptop the other day while running Rosetta@home. I'm paying close attention to see if anything like it happens again. I was running Lotus Notes, MS Word, Excel, PowerPoint and Visio at the same time, so I am 99% certain that the crash was due to a Microsoft product problem rather than a BOINC or Rosetta application issue! ;) The WU in progress did lose all of its checkpointing information, however. When I rebooted the system and started crunching again, the WU did not recover any previous CPU time, and it ended up finishing with only 4 hours of reported processing time rather than the typical 10-11. There was also an odd access violation error listed in the result's stderr output. I've noticed that this WU was sent out to another machine that previously had a similar error. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13099 I'm not sure if this is a problem with the WU or a problem due to my system crash. |
©2024 University of Washington
https://www.bakerlab.org