BOINC not requesting work

Message boards : Number crunching : BOINC not requesting work

PMH_UK

Joined: 9 Aug 08
Posts: 12
Credit: 1,243,749
RAC: 527
Message 101324 - Posted: 16 Apr 2021, 18:54:58 UTC

One PC will not request work, even with the other waiting units and the one executing unit suspended.
BOINC 7.16.6 on a newly built Ubuntu 20.04.
WCG CPU units are working OK.
Can anyone help interpret and debug?
Log extract with work_fetch_debug on below:

13412 Rosetta@home 16/04/2021 17:42:21 work fetch resumed by user
13413 16/04/2021 17:42:21 [work_fetch] Request work fetch: project work fetch resumed by user
13414 16/04/2021 17:42:22 choose_project(): 1618591342.513777
13415 16/04/2021 17:42:22 [work_fetch] ------- start work fetch state -------
13416 16/04/2021 17:42:22 [work_fetch] target work buffer: 103680.00 + 8640.00 sec
13417 16/04/2021 17:42:22 [work_fetch] --- project states ---
13418 Rosetta@home 16/04/2021 17:42:22 [work_fetch] REC 0.000 prio -0.000 can request work
13419 Einstein@Home 16/04/2021 17:42:22 [work_fetch] REC 0.168 prio -0.000 can't request work: "no new tasks" requested via Manager
13420 World Community Grid 16/04/2021 17:42:22 [work_fetch] REC 197.084 prio -24.357 can't request work: scheduler RPC backoff (106.08 sec)
13421 WUProp@Home 16/04/2021 17:42:22 [work_fetch] REC 0.001 prio -0.000 can't request work: non CPU intensive
13422 16/04/2021 17:42:22 [work_fetch] --- state for CPU ---
13423 16/04/2021 17:42:22 [work_fetch] shortfall 272807.27 nidle 0.00 saturated 38425.40 busy 0.00
13424 Rosetta@home 16/04/2021 17:42:22 [work_fetch] share 1.000
13425 Einstein@Home 16/04/2021 17:42:22 [work_fetch] share 0.000
13426 World Community Grid 16/04/2021 17:42:22 [work_fetch] share 0.000
13427 16/04/2021 17:42:22 [work_fetch] --- state for Intel GPU ---
13428 16/04/2021 17:42:22 [work_fetch] shortfall 112320.00 nidle 1.00 saturated 0.00 busy 0.00
13429 Rosetta@home 16/04/2021 17:42:22 [work_fetch] share 0.000 no applications
13430 Einstein@Home 16/04/2021 17:42:22 [work_fetch] share 0.000
13431 World Community Grid 16/04/2021 17:42:22 [work_fetch] share 0.000
13432 16/04/2021 17:42:22 [work_fetch] ------- end work fetch state -------
13433 Rosetta@home 16/04/2021 17:42:22 choose_project: scanning
13434 Rosetta@home 16/04/2021 17:42:22 can fetch CPU
13435 Rosetta@home 16/04/2021 17:42:22 can't fetch Intel GPU: no applications
13436 Einstein@Home 16/04/2021 17:42:22 choose_project: scanning
13437 Einstein@Home 16/04/2021 17:42:22 skip: "no new tasks" requested via Manager
13438 WUProp@Home 16/04/2021 17:42:22 choose_project: scanning
13439 WUProp@Home 16/04/2021 17:42:22 skip: non CPU intensive
13440 World Community Grid 16/04/2021 17:42:22 choose_project: scanning
13441 World Community Grid 16/04/2021 17:42:22 skip: scheduler RPC backoff
13442 16/04/2021 17:42:22 [work_fetch] No project chosen for work fetch
13443 Rosetta@home 16/04/2021 17:42:25 update requested by user
13444 16/04/2021 17:42:25 [work_fetch] Request work fetch: project updated by user
13445 Rosetta@home 16/04/2021 17:42:27 piggyback_work_request()
13446 16/04/2021 17:42:27 [work_fetch] ------- start work fetch state -------
13447 16/04/2021 17:42:27 [work_fetch] target work buffer: 103680.00 + 8640.00 sec
13448 16/04/2021 17:42:27 [work_fetch] --- project states ---
13449 Rosetta@home 16/04/2021 17:42:27 [work_fetch] REC 0.000 prio -0.000 can request work
13450 Einstein@Home 16/04/2021 17:42:27 [work_fetch] REC 0.168 prio -0.000 can't request work: "no new tasks" requested via Manager
13451 World Community Grid 16/04/2021 17:42:27 [work_fetch] REC 197.084 prio -24.357 can't request work: scheduler RPC backoff (101.04 sec)
13452 WUProp@Home 16/04/2021 17:42:27 [work_fetch] REC 0.001 prio -0.000 can't request work: non CPU intensive
13453 16/04/2021 17:42:27 [work_fetch] --- state for CPU ---
13454 16/04/2021 17:42:27 [work_fetch] shortfall 272837.98 nidle 0.00 saturated 38420.16 busy 0.00
13455 Rosetta@home 16/04/2021 17:42:27 [work_fetch] share 1.000
13456 Einstein@Home 16/04/2021 17:42:27 [work_fetch] share 0.000
13457 World Community Grid 16/04/2021 17:42:27 [work_fetch] share 0.000
13458 16/04/2021 17:42:27 [work_fetch] --- state for Intel GPU ---
13459 16/04/2021 17:42:27 [work_fetch] shortfall 112320.00 nidle 1.00 saturated 0.00 busy 0.00
13460 Rosetta@home 16/04/2021 17:42:27 [work_fetch] share 0.000 no applications
13461 Einstein@Home 16/04/2021 17:42:27 [work_fetch] share 0.000
13462 World Community Grid 16/04/2021 17:42:27 [work_fetch] share 0.000
13463 16/04/2021 17:42:27 [work_fetch] ------- end work fetch state -------
13464 Rosetta@home 16/04/2021 17:42:27 piggyback: resource CPU
13465 Rosetta@home 16/04/2021 17:42:27 [work_fetch] using MC shortfall 0.000000 instead of shortfall 272837.976021
13466 Rosetta@home 16/04/2021 17:42:27 [work_fetch] set_request() for CPU: ninst 4 nused_total 0.00 nidle_now 0.00 fetch share 1.00 req_inst 0.00 req_secs 0.00
13467 Rosetta@home 16/04/2021 17:42:27 piggyback: resource Intel GPU
13468 Rosetta@home 16/04/2021 17:42:27 piggyback: can't fetch Intel GPU: no applications
13469 Rosetta@home 16/04/2021 17:42:27 [work_fetch] request: CPU (0.00 sec, 0.00 inst) Intel GPU (0.00 sec, 0.00 inst)
13470 Rosetta@home 16/04/2021 17:42:27 Sending scheduler request: Requested by user.
13471 Rosetta@home 16/04/2021 17:42:27 Not requesting tasks: don't need (CPU: ; Intel GPU: )
13472 Rosetta@home 16/04/2021 17:42:30 Scheduler request completed
13473 Rosetta@home 16/04/2021 17:42:30 Project requested delay of 31 seconds
13474 16/04/2021 17:42:30 [work_fetch] Request work fetch: RPC complete
13475 Rosetta@home 16/04/2021 17:42:30 work fetch suspended by user

Paul.

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,534,243
RAC: 9,744
Message 101328 - Posted: 16 Apr 2021, 20:01:50 UTC - in response to Message 101324.  


A couple of things. First, your cache size on the PC comes into play, and that list says nothing about it, i.e. the 'store at least ___ days of work' and 'store up to an additional ___ days of work' settings. If you already have enough work from your other projects, then Rosetta can't send you any more work or your cache would overfill. The other thing is that you seem to be using a zero resource share for a lot of projects and a ONE for Rosetta. There's not a whole lot of wiggle room there, so I would bump Rosetta up to 100; then you should get Rosetta work almost every time they have work for your PC, again depending on your cache sizes.

PMH_UK
Joined: 9 Aug 08
Posts: 12
Credit: 1,243,749
RAC: 527
Message 101330 - Posted: 16 Apr 2021, 20:54:20 UTC - in response to Message 101328.  

Thanks Mikey.
Work share is 900 WCG and 100 Rosetta, with 1.2 plus 0.1 days of work.
Currently about 0.5 day is loaded, as WCG units are limited by number to 8.
I also tried suspending the waiting WCG units and 1 running unit, and it is still not requesting.
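
As a quick cross-check (a minimal sketch, not anything BOINC itself runs), the seconds in the log's "target work buffer" line convert to the 1.2 plus 0.1 days quoted here. Reading the CPU "saturated" figure as the amount of work already queued is an assumption about what that field means; it is not confirmed anywhere in this thread.

```python
SECONDS_PER_DAY = 86_400


def to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


# Values copied from the log line
# "target work buffer: 103680.00 + 8640.00 sec".
min_days = to_days(103680.00)
extra_days = to_days(8640.00)
print(f"{min_days:.1f} + {extra_days:.1f} days")  # prints "1.2 + 0.1 days"

# Value copied from "--- state for CPU ---": "saturated 38425.40".
# ASSUMPTION: treated here as the work already on hand, in seconds.
saturated_days = to_days(38425.40)
print(f"{saturated_days:.2f} days")  # prints "0.44 days"
```

Under that assumption, the 0.44 days lines up with the "about 0.5 day loaded" figure, which is well below the 1.2 day target.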

Paul.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1467
Credit: 14,313,891
RAC: 16,169
Message 101331 - Posted: 16 Apr 2021, 21:04:17 UTC - in response to Message 101324.  

One PC will not request work, even with the other waiting units and the one executing unit suspended.
BOINC will not request work for a project when a Task is suspended because it has no way of knowing when it will be unsuspended, and no way of knowing if that Task can be returned in time, let alone any new work.

With the number of projects you are running, the large size of your cache, the limited number of cores/threads your systems have, and the Resource Share settings you are using, it's very unlikely that Rosetta will be running on your systems at all times.

With the number of projects you are attached to, you would be better off with no cache at all. I'd suggest 0.01 days and 0.0 additional days.
If you wish to do more Rosetta work, you need to increase its Resource Share (from memory, the largest possible value is 1000) or reduce the Resource Share of the others (WCG doesn't follow the usual BOINC method for adjusting this, so I've no idea how you would accomplish that). Keep in mind that Resource Share is a ratio, not a percentage.
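
To illustrate the "ratio, not a percentage" point, here is a small sketch using the 900 (WCG) and 100 (Rosetta) shares mentioned earlier in the thread. The helper name `share_fractions` is made up for this example, and the other attached projects are left out because their shares are not given here.

```python
def share_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Turn raw Resource Share values into fractions of the total."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Shares as stated in the thread: 900 for WCG, 100 for Rosetta.
current = share_fractions({"World Community Grid": 900, "Rosetta@home": 100})
for name, fraction in current.items():
    print(f"{name}: {fraction:.0%}")
# prints "World Community Grid: 90%" and "Rosetta@home: 10%"

# Multiplying every share by the same factor changes nothing,
# because only the ratio between projects matters.
scaled = share_fractions({"World Community Grid": 9000, "Rosetta@home": 1000})
```

So raising Rosetta's number only helps if the other projects' numbers are not raised by the same factor.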



The other problem is a Rosetta one: for some time now, new Tasks have been misconfigured to require far more RAM than they actually need to run. As a result, many systems with smaller amounts of system RAM can no longer get work, or can get only a limited number of Tasks, even if they have plenty of available cores/threads. Several of your systems fall into this category. Increasing the amount of RAM available to BOINC may help.
In your account, under Computing preferences, Memory:
When computer is in use, use at most: 95 %
When computer is not in use, use at most: 95 %
Leave non-GPU tasks in memory while suspended: leave unselected
Page/swap file: use at most: 75 %
But until the researchers sending out these misconfigured Tasks fix it at their end, it's going to remain a problem.



Also your most recently added system needs to have the BOINC Benchmarks run in order for it to receive a reasonable amount of Credit for work done; it's still using the default values.
Grant
Darwin NT

PMH_UK
Joined: 9 Aug 08
Posts: 12
Credit: 1,243,749
RAC: 527
Message 101334 - Posted: 16 Apr 2021, 21:16:12 UTC - in response to Message 101331.  

Thanks Grant.
Only tasks I have are 8 WCG, 4 running so I tried suspending those to force Rosetta to fetch.
System has been running for several days, just forced a benchmark to be sure, still won't request.
Other systems request and get some units, but I often see memory or disk messages on them as they are modest PCs.

Paul.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1467
Credit: 14,313,891
RAC: 16,169
Message 101335 - Posted: 16 Apr 2021, 21:29:58 UTC - in response to Message 101334.  

Only tasks I have are 8 WCG, 4 running so I tried suspending those to force Rosetta to fetch.
System has been running for several days, just forced a benchmark to be sure, still won't request.
The default Runtime for Rosetta work is 8 hours, and the deadlines are 3 days. With that many WCG tasks loaded up and only 4 cores/threads, it's not going to request more work until it can be sure it will be able to return it in time, and it won't request more Rosetta work if it has to complete more work for other projects in order to meet your Resource Share settings. All it is doing is trying to meet the settings you have made.
The smaller your cache and the fewer projects you run, the sooner your Resource Share settings will be met (as in weeks). The larger the cache and the more projects, the longer it will take (as in months).

Micromanaging this actually makes things worse. Set your cache to zero and change your Resource Share settings to favour Rosetta; then, once it has cleared the present backlog of WCG work, it will (when it can) start doing more Rosetta work. Just let it do its thing once you have set your preferences. The issues with the configured requirements for Rosetta Tasks at present are just going to result in less Rosetta work being done at some times, and mostly Rosetta work at others.
Over time, your Resource Share settings will be met. Whether that time frame is weeks or many months depends on the settings you choose and whether or not you micromanage things.
Grant
Darwin NT

PMH_UK
Joined: 9 Aug 08
Posts: 12
Credit: 1,243,749
RAC: 527
Message 101337 - Posted: 16 Apr 2021, 22:02:21 UTC - in response to Message 101335.  

Sorted.
I had edited app_config to limit Rosetta to 1 task running with BoincTasks.
This was OK on earlier versions of BOINC but this version silently failed.

Paul.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1467
Credit: 14,313,891
RAC: 16,169
Message 101339 - Posted: 16 Apr 2021, 23:03:28 UTC - in response to Message 101337.  
Last modified: 16 Apr 2021, 23:07:00 UTC

Sorted.
I had edited app_config to limit Rosetta to 1 task running with BoincTasks.
Yeah, I guess that would do it.
Although I can't see any sign of such a limit in the work request logs. The only thing is this line:
13465 Rosetta@home 16/04/2021 17:42:27 [work_fetch] using MC shortfall 0.000000 instead of shortfall 272837.976021
where it replaces the 272837.976021 shortfall with a value of 0, i.e. no shortfall. So no need for new work.
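
A rough sketch of how that line could be spotted automatically in a saved log. The regular expression is written against the exact wording of the line quoted above. Reading "MC" as "max concurrent" (the kind of limit set in app_config) is an assumption; nothing in this thread confirms it. The function name `mc_override` is invented for the example.

```python
import re

# The log line quoted above, copied verbatim.
LINE = (
    "13465 Rosetta@home 16/04/2021 17:42:27 [work_fetch] "
    "using MC shortfall 0.000000 instead of shortfall 272837.976021"
)

# Matches the two numbers in the "using MC shortfall ..." message.
PATTERN = re.compile(
    r"using MC shortfall (?P<mc>\d+(?:\.\d+)?) "
    r"instead of shortfall (?P<plain>\d+(?:\.\d+)?)"
)


def mc_override(line: str):
    """Return (mc_shortfall, plain_shortfall), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return float(match.group("mc")), float(match.group("plain"))


result = mc_override(LINE)
if result is not None:
    mc, plain = result
    if mc == 0.0 and plain > 0.0:
        # A positive shortfall was replaced by zero, so no CPU seconds
        # end up being requested from the project.
        print(f"shortfall of {plain:.0f} sec replaced by 0")
```

Scanning a whole log for lines where the first number is zero and the second is not would flag the same situation on other hosts.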
Grant
Darwin NT

PMH_UK
Joined: 9 Aug 08
Posts: 12
Credit: 1,243,749
RAC: 527
Message 101371 - Posted: 19 Apr 2021, 19:21:31 UTC - in response to Message 101339.  

It was not clear to me from the logs, but checking showed app_config entries with GPU sections added and mangled.
That was OK in previous BOINC versions for CPU projects, but the bad format caused 7.16.6 to not request work.
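
For anyone wanting to check their own file, here is a minimal sketch that tests whether an app_config.xml is well-formed XML. The two sample files are illustrative only: the actual mangled file was not posted in this thread, and the app name "rosetta" is a placeholder. Python's XML parser is not the parser BOINC uses, so passing this check does not guarantee the client will accept the file; it only catches obvious damage such as unclosed tags.

```python
import xml.etree.ElementTree as ET

# Illustrative file limiting one app to a single running task.
GOOD = """<app_config>
  <app>
    <name>rosetta</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>"""

# Same file with a GPU section that is never closed
# (hypothetical damage, not the file from this thread).
MANGLED = """<app_config>
  <app>
    <name>rosetta</name>
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
  </app>
</app_config>"""


def check_app_config(text: str) -> str:
    """Report whether the text parses as XML with an <app_config> root."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed: {err}"
    if root.tag != "app_config":
        return f"unexpected root element <{root.tag}>"
    return "well-formed"


print(check_app_config(GOOD))
print(check_app_config(MANGLED))
```

To check a real file, read its contents and pass them to `check_app_config` in place of the sample strings.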



©2024 University of Washington
https://www.bakerlab.org