WUs estimated time way off to elapsed time

Message boards : Number crunching : WUs estimated time way off to elapsed time

San-Fernando-Valley

Joined: 16 Mar 16
Posts: 6
Credit: 117,897
RAC: 0
Message 86810 - Posted: 14 Jul 2017, 6:57:58 UTC

Very annoying:

Estimated time at start for each WU is stated as about 4 hours.
BUT actual elapsed time is around 1 DAY (24 hours) !!!!

On all six of my PCs, for every WU done in the last few days.

ABSOLUTELY unacceptable.

I'm sure someone can inform me of what I'm doing wrong OR perhaps misunderstanding (on this project).
"Not everything that can be counted counts, and not everything that counts can be counted."
- Albert Einstein (1879-1955)
Mod.Sense
Volunteer moderator
Project administrator

Joined: 22 Aug 06
Posts: 3538
Credit: 0
RAC: 0
Message 86812 - Posted: 14 Jul 2017, 15:39:56 UTC

It sounds like you recently changed your R@h runtime preference to 1 day. This is set from the website by clicking on your user name at the upper right corner, and selecting the "Rosetta@home preferences" link. There is a preference for each location (default, home, work, school). You can see which location a given host is associated with by looking at the event log as BOINC Manager starts up.

So, either the preference changed, or perhaps the location associated with the host has changed to one with a different runtime preference. Either way, BOINC Manager will learn how long the tasks are taking to complete, and adjust in a couple of days. While the estimates are off, BOINC Manager's requests for new work are off as well, because it requests work based on its current estimated time remaining. To the extent this does not agree with your actual runtimes, the work requests will be off.
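
To make the effect on work requests concrete, here is a rough sketch. It assumes a simplified model in which the client fills a fixed buffer of hours by dividing it by the estimated task runtime. This is not BOINC's actual work-fetch logic, and the function names and the 24-hour buffer are made up for illustration.

```python
def tasks_requested(buffer_hours: float, estimated_hours: float) -> int:
    """Tasks fetched to fill the buffer under this simplified model."""
    return int(buffer_hours // estimated_hours)


def actual_backlog_hours(buffer_hours: float, estimated_hours: float,
                         actual_hours: float) -> float:
    """Hours of work really fetched once true runtimes are considered."""
    return tasks_requested(buffer_hours, estimated_hours) * actual_hours


if __name__ == "__main__":
    # Figures from the first post: 4-hour estimate, 24-hour actual runtime.
    # The 24-hour buffer per core is an assumed value.
    print(tasks_requested(24, 4))           # tasks fetched per core
    print(actual_backlog_hours(24, 4, 24))  # hours those tasks really take
```

Under these assumptions, a buffer meant to hold one day of work ends up holding several days of it until the estimates catch up.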
Rosetta Moderator: Mod.Sense
Paul

Joined: 29 Oct 05
Posts: 190
Credit: 58,937,270
RAC: 22,532
Message 86936 - Posted: 1 Aug 2017, 12:41:14 UTC

I recently had to abort a number of very long-running tasks. Several jobs had run for over 24 hours and others were at 3% after 12 hours.

These workunits are running on a server dedicated to Rosetta@Home running Linux.

Any ideas?

Task WorkUnit
931190869 840070369
930920244 839826801
930919923 839826521
930912635 839819841
930986536 839886162
Thx!

Paul

Mod.Sense
Volunteer moderator
Project administrator

Joined: 22 Aug 06
Posts: 3538
Credit: 0
RAC: 0
Message 86939 - Posted: 1 Aug 2017, 15:07:28 UTC - in response to Message 86936.  

Paul, I'm not certain if the BOINC Manager has the same quirk on Linux that it does on Windows. Sometimes on Windows it will show tasks with a state of "running", but they are not actually getting CPU dispatched to them. You could check that using top or another utility that shows the most active tasks on your machine or, given the number of CPUs, it may be easier to review the task properties for actual CPU time and confirm that the number reported is increasing.
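
As a rough illustration of that check on Linux, the sketch below samples a process's cumulative CPU time twice from /proc/<pid>/stat and reports whether it grew. The function names and the 10-second sampling interval are my own choices, and the PIDs have to be supplied by you (for example from top).

```python
import time


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the process state (field 3 in proc(5)), so
    # utime (field 14) and stime (field 15) are at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def read_cpu_ticks(pid: int) -> int:
    """Read the current cumulative CPU ticks of a running process."""
    with open(f"/proc/{pid}/stat") as handle:
        return cpu_ticks_from_stat(handle.read())


def is_getting_cpu(pid: int, wait_seconds: float = 10.0) -> bool:
    """True if the process accumulated CPU time during the wait."""
    before = read_cpu_ticks(pid)
    time.sleep(wait_seconds)
    return read_cpu_ticks(pid) > before


if __name__ == "__main__":
    import sys
    for arg in sys.argv[1:]:
        pid = int(arg)
        status = "getting CPU" if is_getting_cpu(pid) else "no CPU time in sample"
        print(pid, status)
```

Saved under a name of your choosing, it would be run with the PIDs of the suspect tasks as arguments.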

If you see some tasks running long and confirm they are not getting CPU time, ending and restarting BOINC Manager seems to reset things. Unfortunately, as with any time you end BOINC Manager and its tasks, you lose the work done since the last checkpoint on each active task.

If the tasks are getting CPU time and running long, the "watch-dog" should step in and mark them as completed once they go 4 hours (CPU hours, not wall-clock hours) past your runtime preference.

Note: It appears Paul's runtime preference is set to 8 hours for this host.
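
The watchdog arithmetic can be sketched as follows. The 4-hour grace period is taken from the description above rather than from the Rosetta source, and the function names are hypothetical.

```python
# Grace period as described in this post; not verified against Rosetta's code.
WATCHDOG_GRACE_CPU_HOURS = 4


def watchdog_cutoff_cpu_hours(runtime_preference_hours: float) -> float:
    """CPU hours after which the watchdog is expected to end a task."""
    return runtime_preference_hours + WATCHDOG_GRACE_CPU_HOURS


def is_past_watchdog(cpu_hours_used: float,
                     runtime_preference_hours: float) -> bool:
    """True if a task has used more CPU time than the expected cutoff."""
    return cpu_hours_used > watchdog_cutoff_cpu_hours(runtime_preference_hours)


if __name__ == "__main__":
    # With the 8-hour preference noted above, the cutoff is 12 CPU hours.
    print(watchdog_cutoff_cpu_hours(8))
    print(is_past_watchdog(24, 8))
```

A task with 24 hours of elapsed time but far fewer CPU hours would not trip this check, which would point toward the task not being given CPU time.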
Rosetta Moderator: Mod.Sense
Paul

Joined: 29 Oct 05
Posts: 190
Credit: 58,937,270
RAC: 22,532
Message 86980 - Posted: 5 Aug 2017, 13:07:20 UTC - in response to Message 86939.  

Is BOINC Manager required?? I have BOINC Client set to launch at startup. If I close BOINC Manager and let the tasks continue to run will I avoid the bug? Can BOINC Client send & receive tasks without BOINC Manager??

Just looking for a workaround

Thx
Thx!

Paul

Mod.Sense
Volunteer moderator
Project administrator

Joined: 22 Aug 06
Posts: 3538
Credit: 0
RAC: 0
Message 86989 - Posted: 6 Aug 2017, 5:12:36 UTC - in response to Message 86980.  

Well, I guess you're right, technically all of these logistics are handled in the client, and the manager just presents them and then interacts with the client to effect changes you specify (to suspend a task, perform a scheduler request etc.). So, unfortunately, no, I would not expect any change by not using the manager.
Rosetta Moderator: Mod.Sense
Paul

Joined: 29 Oct 05
Posts: 190
Credit: 58,937,270
RAC: 22,532
Message 86997 - Posted: 6 Aug 2017, 21:00:25 UTC

All

I really need some help here. I have at least 5 cores that are idle because of this bug. I have never experienced this in the past. Is there a version of BOINC that does not have this bug? I hate to wait for 8 hours & abort tasks but I can restart BOINC every 8 hours.

Please help
Thx!

Paul

Mod.Sense
Volunteer moderator
Project administrator

Joined: 22 Aug 06
Posts: 3538
Credit: 0
RAC: 0
Message 86999 - Posted: 6 Aug 2017, 22:45:03 UTC - in response to Message 86997.  

I hadn't remembered until now, but I believe, at least on Windows, the issue where BOINC shows tasks are running but is not giving them CPU is related to the preference for % of CPU. There are two similar preferences there. One is the preference for the number of CPUs to utilize, and the other is the preference for the percentage of CPU time to be running tasks.

From what we've seen before, it only seems to be a problem when you use the preference for the percentage of CPU time. So the workaround would be to set that preference to 100% and, if your environment requires it, limit the number of CPUs to less than 100% instead.
Rosetta Moderator: Mod.Sense
Paul

Joined: 29 Oct 05
Posts: 190
Credit: 58,937,270
RAC: 22,532
Message 87010 - Posted: 8 Aug 2017, 11:53:02 UTC

I think I got it fixed. I aborted several tasks that ran over but I also killed a few instances of minirosetta running but using 0% CPU. When I counted the minirosetta tasks I found 50 of them but I only have 48 cores. I killed the 2 tasks at 0%. So far everything has been back to normal for about 48 hours.

Wish I knew exactly what fixed it
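
For anyone wanting to repeat that count, here is a minimal Linux-only sketch. It assumes one task per core is expected (as with 100% of CPUs enabled) and matches processes by the substring "minirosetta" in their command line. The helper names are made up for illustration.

```python
import os


def count_matching(cmdlines, needle="minirosetta"):
    """Count command lines that mention the given executable name."""
    return sum(1 for cmdline in cmdlines if needle in cmdline)


def list_cmdlines(proc_root="/proc"):
    """Yield the command line of every process visible under /proc."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as handle:
                # Arguments are NUL-separated in /proc/<pid>/cmdline.
                yield handle.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable


def surplus_tasks(task_count, core_count):
    """Number of task processes beyond one per core."""
    return max(0, task_count - core_count)


if __name__ == "__main__":
    tasks = count_matching(list_cmdlines())
    cores = os.cpu_count() or 1
    print(f"{tasks} minirosetta processes on {cores} cores, "
          f"{surplus_tasks(tasks, cores)} more than expected")
```

With the numbers from this post (50 processes, 48 cores) the surplus is 2, matching the two stuck tasks that were killed.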
Thx!

Paul

Paul

Joined: 29 Oct 05
Posts: 190
Credit: 58,937,270
RAC: 22,532
Message 87065 - Posted: 13 Aug 2017, 11:58:19 UTC - in response to Message 86999.  

I can't find any information on this bug and the problem is back. Can you tell me how to get this reported to the BOINC team?

In all the years I have run BOINC I have never seen this before. It is a dedicated cruncher so I have it set to 100% CPU utilization & 100% CPUs.
Thx!

Paul

Mod.Sense
Volunteer moderator
Project administrator

Joined: 22 Aug 06
Posts: 3538
Credit: 0
RAC: 0
Message 87069 - Posted: 13 Aug 2017, 20:30:05 UTC

Specifically, the bug I've seen is where the R@h tasks do not get CPU time. Given your description of long runtimes, that is one explanation. Another is that the specific R@h tasks are running long. This has been seen with some recent protocols being developed.

So, if the tasks are indeed not getting CPU time, you could post on the message boards for BOINC here:
https://boinc.berkeley.edu/dev/forum_forum.php?id=2
Rosetta Moderator: Mod.Sense
furukitsune

Joined: 19 Mar 16
Posts: 6
Credit: 2,784,657
RAC: 2,228
Message 87097 - Posted: 18 Aug 2017, 4:04:54 UTC

When I recently upgraded to BOINC version 7.6.33, I started seeing the same problems: clients using 0% CPU, tasks never finishing, etc. I reverted to BOINC version 7.2.42 and all problems disappeared. You can see CPU usage in Windows Resource Monitor, memory tab. I did have a problem with cc_config not being reset when I re-installed BOINC, so do a clean install.

fk
Jim1348

Joined: 19 Jan 06
Posts: 299
Credit: 8,792,031
RAC: 18,266
Message 87100 - Posted: 18 Aug 2017, 13:46:41 UTC - in response to Message 86810.  
Last modified: 18 Aug 2017, 13:47:22 UTC

Estimated time at start for each WU is stated as about 4 hours.
BUT actual elapsed time is around 1 DAY (24 hours) !!!!

They are always inaccurate when you start, but eventually correct themselves.
To speed up this correction, you can use an app_config.xml file, placed in the Rosetta project folder.

It looks like this:

<app_config>
  <app>
    <name>minirosetta</name>
    <fraction_done_exact/>
  </app>
</app_config>

I assume you are familiar with the app_config file. If not, just create it in Notepad and save it as "app_config.xml" (not as a .txt file).
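
If you would rather script it than use a text editor, a minimal sketch along these lines checks that the XML is well-formed before writing it under the right file name. The function name is made up, and the project folder path is something you must supply yourself, since its location depends on your BOINC installation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Same content as the file shown above.
APP_CONFIG = """<app_config>
  <app>
    <name>minirosetta</name>
    <fraction_done_exact/>
  </app>
</app_config>
"""


def write_app_config(project_dir: str) -> Path:
    """Validate the XML, then write it as app_config.xml in project_dir."""
    ET.fromstring(APP_CONFIG)  # raises ParseError if the XML is malformed
    target = Path(project_dir) / "app_config.xml"
    target.write_text(APP_CONFIG, encoding="utf-8")
    return target


if __name__ == "__main__":
    import sys
    # Pass the Rosetta project folder as the first argument.
    print(write_app_config(sys.argv[1]))
```

Writing the file from a script also avoids the editor silently appending a .txt extension.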
Paul

Joined: 29 Oct 05
Posts: 190
Credit: 58,937,270
RAC: 22,532
Message 87104 - Posted: 19 Aug 2017, 12:34:55 UTC - in response to Message 87097.  

It sounds like 7.2.42 is the fix I need. I am running on Linux so I know how to apt-get but I don't know how to ask for an older version. Is there a way to apt-get a specific version? How do I tell software updates that I don't want updates on that app?
Thx!

Paul

Paul

Joined: 29 Oct 05
Posts: 190
Credit: 58,937,270
RAC: 22,532
Message 87105 - Posted: 19 Aug 2017, 12:36:27 UTC - in response to Message 87100.  

When I look at the processes, they are getting 0% CPU so I think it is a bug in BOINC. Maybe it will be fixed soon.
Thx!

Paul

LarryMajor

Joined: 1 Apr 16
Posts: 21
Credit: 27,212,959
RAC: 88,702
Message 87106 - Posted: 19 Aug 2017, 16:08:44 UTC - in response to Message 87104.  

apt-get install package_name=version
is the syntax for it. I'm running Debian and, since I upgraded to kernel 4.9.0, apt only wants to use 7.6.33 of boinc, boinc-client and boinc-manager, if that's any help.

When you find the versions and packages that work for you -
apt-mark hold package_name
should keep the system from upgrading them.

My machine is very similar to yours and it's been on 7.6.33 for close to a month with no problems, so if there's anything that I can check on it that might help, let me know.
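
The two apt commands above could be wrapped in a small dry-run helper like the sketch below. The package name boinc-client comes from this post, while the version string is only a placeholder; the versions your repositories actually offer can be listed with `apt-cache policy boinc-client`, and an older one may simply not be available. The function names are hypothetical.

```python
import subprocess


def install_version_cmd(package: str, version: str) -> list:
    """Build the apt-get command that installs one specific version."""
    return ["apt-get", "install", f"{package}={version}"]


def hold_cmd(package: str) -> list:
    """Build the apt-mark command that blocks future upgrades."""
    return ["apt-mark", "hold", package]


def run(cmd, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # needs root privileges


if __name__ == "__main__":
    # "VERSION" is a placeholder; substitute a version listed by apt-cache.
    package, version = "boinc-client", "VERSION"
    run(install_version_cmd(package, version))
    run(hold_cmd(package))
```

By default it only prints the commands, so you can review them before running anything as root.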




©2019 University of Washington
http://www.bakerlab.org