Rosetta 4.0+

Message boards : Number crunching : Rosetta 4.0+

[AF>Le_Pommier] Jerome_C2005
Joined: 22 Aug 06
Posts: 25
Credit: 1,011,580
RAC: 0
Message 96265 - Posted: 8 May 2020, 14:52:17 UTC - in response to Message 96162.  
Last modified: 8 May 2020, 14:53:41 UTC

As expected, it canceled hundreds of tasks.
The cache settings seem to be followed; I don't have hundreds of tasks anymore.

I'm afraid I spoke too soon: it started requesting too many tasks again without my changing anything in my small cache. I still have 120 waiting to run, and in the past few days it has already canceled another 160 because of the deadline... with only 17 valid tasks in the log...

So it is still requesting tasks way above the cache setting :(

Jim1348
Joined: 19 Jan 06
Posts: 522
Credit: 31,711,375
RAC: 106,032
Message 96267 - Posted: 8 May 2020, 16:05:39 UTC - in response to Message 96265.  

I'm afraid I spoke too soon: it started requesting too many tasks again without my changing anything in my small cache. I still have 120 waiting to run

It looks like that is your machine running BOINC 7.16.6. I had the same problem on WCG after I upgraded BOINC from 7.14.2 to the next version, whatever it was. It went berserk and downloaded work units until it reached the 10-day limit (or got exhausted, whichever came first). I ended up with hundreds of work units.

It is apparently due to a change in the BOINC scheduler. But the servers don't necessarily know how to deal with it, at least until they "learn". I posted about it on the WCG forum a few months ago.
I never found a good solution, except to manually control the downloads. After a while, it starts working again. Good luck.

Jonathan
Joined: 4 Oct 17
Posts: 20
Credit: 1,256,443
RAC: 0
Message 96268 - Posted: 8 May 2020, 16:11:45 UTC - in response to Message 96265.  

Try exiting BOINC and all tasks. Edit your BOINC preferences on the project to use 8 CPUs out of 24 (roughly 33%). Then start BOINC again.
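A minimal sketch of a local equivalent, assuming a global_prefs_override.xml file in the BOINC data directory is used instead of the website. These are computing preferences, so they apply to every attached project, not only Rosetta. The value 34 is chosen on the assumption that the client rounds 24 × pct / 100 down to a whole number of CPUs, in which case 33.3 could give 7 CPUs rather than 8.

```xml
<!-- global_prefs_override.xml (sketch): overrides the website computing preferences on this host only -->
<global_preferences>
  <!-- Use at most 34% of the CPUs: 24 x 34 / 100 = 8.16, i.e. 8 CPUs on a 24-thread host -->
  <max_ncpus_pct>34</max_ncpus_pct>
</global_preferences>
```

After saving the file, BOINC Manager's Options > Read local prefs file (or `boinccmd --read_global_prefs_override`) should apply it without restarting the client.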

Grant (SSSF)
Joined: 28 Mar 20
Posts: 755
Credit: 5,317,037
RAC: 22,026
Message 96278 - Posted: 8 May 2020, 20:00:21 UTC - in response to Message 96268.  
Last modified: 8 May 2020, 20:02:42 UTC

Try exiting BOINC and all tasks. Edit your BOINC preferences on the project to use 8 CPUs out of 24 (roughly 33%). Then start BOINC again.
No need to exit BOINC to do that; just make the changes on the website and save them. The next time the BOINC Manager contacts the server (or you click Update), it will get the new settings.



Having said that, hopefully this will be noticed by those who can do something about it, so they can check their changes; it shouldn't be happening with the latest work allocation changes.
Grant
Darwin NT

Bryn Mawr
Joined: 26 Dec 18
Posts: 159
Credit: 4,894,900
RAC: 9,183
Message 96279 - Posted: 8 May 2020, 20:44:00 UTC - in response to Message 96267.  

I'm afraid I spoke too soon: it started requesting too many tasks again without my changing anything in my small cache. I still have 120 waiting to run

It looks like that is your machine running BOINC 7.16.6. I had the same problem on WCG after I upgraded BOINC from 7.14.2 to the next version, whatever it was. It went berserk and downloaded work units until it reached the 10-day limit (or got exhausted, whichever came first). I ended up with hundreds of work units.

It is apparently due to a change in the BOINC scheduler. But the servers don't necessarily know how to deal with it, at least until they "learn". I posted about it on the WCG forum a few months ago.
I never found a good solution, except to manually control the downloads. After a while, it starts working again. Good luck.


I changed the 10-day limit in the cc_config file down to 1 day because I didn’t like the way one project would run away with the machine after not having WUs for a while. I’m not certain that this also controls the time it takes to learn a machine’s throughput, but I suspect it does.

Jim1348
Joined: 19 Jan 06
Posts: 522
Credit: 31,711,375
RAC: 106,032
Message 96280 - Posted: 8 May 2020, 21:27:39 UTC - in response to Message 96279.  

I changed the 10-day limit in the cc_config file down to 1 day because I didn’t like the way one project would run away with the machine after not having WUs for a while. I’m not certain that this also controls the time it takes to learn a machine’s throughput, but I suspect it does.

Mine was set to the default (0.1 + 0.5 days). It ignored that, but it seems to have finally collapsed once it reached the 10-day limit, which may be part of the server code.
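For reference, the default buffer mentioned here corresponds to the two work-buffer computing preferences, "Store at least X days of work" and "Store up to an additional X days of work". A sketch of setting them locally, assuming a global_prefs_override.xml file is used rather than the website; as this thread shows, the client may still fetch more than this while its runtime estimates are off.

```xml
<!-- global_prefs_override.xml (sketch): a work buffer of 0.1 + 0.5 days -->
<global_preferences>
  <!-- "Store at least 0.1 days of work" -->
  <work_buf_min_days>0.1</work_buf_min_days>
  <!-- "Store up to an additional 0.5 days of work" -->
  <work_buf_additional_days>0.5</work_buf_additional_days>
</global_preferences>
```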

Grant (SSSF)
Joined: 28 Mar 20
Posts: 755
Credit: 5,317,037
RAC: 22,026
Message 96283 - Posted: 9 May 2020, 1:06:27 UTC - in response to Message 96279.  

I’m not certain that this also controls the time it takes to learn a machine’s throughput, but I suspect it does.
The larger the cache, the longer it takes to determine how long different Tasks on different applications on different projects run for. Until it sorts that out, there's no way it can meet your Resource share settings.
The smaller the cache, the sooner it can get things sorted.
Grant
Darwin NT

Bryn Mawr
Joined: 26 Dec 18
Posts: 159
Credit: 4,894,900
RAC: 9,183
Message 96288 - Posted: 9 May 2020, 3:33:05 UTC - in response to Message 96280.  

I changed the 10-day limit in the cc_config file down to 1 day because I didn’t like the way one project would run away with the machine after not having WUs for a while. I’m not certain that this also controls the time it takes to learn a machine’s throughput, but I suspect it does.

Mine was set to the default (0.1 + 0.5 days). It ignored that, but it seems to have finally collapsed once it reached the 10-day limit, which may be part of the server code.


Not the buffer size :-

<rec_half_life_days>X</rec_half_life_days>
A project's scheduling priority is determined by its estimated credit in the last X days. Default is 10; set it larger if you run long high-priority jobs.

[AF>Le_Pommier] Jerome_C2005
Joined: 22 Aug 06
Posts: 25
Credit: 1,011,580
RAC: 0
Message 96301 - Posted: 9 May 2020, 11:57:06 UTC

@Jonathan & Grant: "Edit your BOINC preferences on the project to use 8 CPUs out of 24 (roughly 33%)"

I obviously don't want to do this: I want all 24 cores to be used, not only 8 of them (otherwise I wouldn't rent such a host).

I now limit Rosetta to 6 concurrent tasks via an app_config (I found that even 8 was too much for the machine's 8 GB of RAM...), and all the other cores are crunching Universe tasks at the moment. I suspect this might be why the Rosetta cache is too big: maybe it calculates the number of tasks needed for 24 cores rather than 8 or 6? But even so, with the very small cache I have set, it doesn't make much sense.
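A minimal app_config.xml of the kind described above might look like the following. It is a sketch, assuming the file sits in Rosetta's folder under projects/ in the BOINC data directory. It uses `<project_max_concurrent>` so that no application name has to be guessed; the actual file described here may instead have used per-application `<app>` entries with `<max_concurrent>`.

```xml
<!-- app_config.xml (sketch), placed in the Rosetta project folder -->
<app_config>
  <!-- Run at most 6 Rosetta tasks at the same time; other projects can use the remaining cores -->
  <project_max_concurrent>6</project_max_concurrent>
</app_config>
```

BOINC Manager's Options > Read config files should pick up changes to this file without restarting the client.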

But I assume it will self-regulate after some time. It now has 118 ongoing tasks (and 95 recently canceled because of the deadline), which is much less than the 1000 I had at the very beginning (when I had a bigger cache). In any case, canceling unprocessed tasks at the deadline is standard BOINC behavior, so let it be.

Jim1348
Joined: 19 Jan 06
Posts: 522
Credit: 31,711,375
RAC: 106,032
Message 96305 - Posted: 9 May 2020, 12:38:17 UTC - in response to Message 96288.  

Not the buffer size :-

<rec_half_life_days>X</rec_half_life_days>
A project's scheduling priority is determined by its estimated credit in the last X days. Default is 10; set it larger if you run long high-priority jobs.

OK, I see what you are saying, but I am not sure why you would set that larger. I want the estimated time to converge faster.
So I routinely set mine as follows when installing BOINC:
<rec_half_life_days>1.000000</rec_half_life_days>
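For anyone wondering where that line goes: a sketch of a minimal cc_config.xml in the BOINC data directory, assuming no other options are set there yet. If the file already exists, the element belongs inside its existing `<options>` section instead.

```xml
<cc_config>
  <options>
    <!-- Base scheduling priority on estimated credit with a 1-day half-life (default is 10 days) -->
    <rec_half_life_days>1.000000</rec_half_life_days>
  </options>
</cc_config>
```

Options > Read config files in BOINC Manager (or `boinccmd --read_cc_config`) should reload it without restarting the client.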

That was not the source of my problem. It was some incompatibility between the new BOINC (after 7.14.2) and the server.
It worked OK on some projects but not others. I have not seen the problem for a while now, so it does eventually correct itself.

Bryn Mawr
Joined: 26 Dec 18
Posts: 159
Credit: 4,894,900
RAC: 9,183
Message 96309 - Posted: 9 May 2020, 14:39:33 UTC - in response to Message 96305.  
Last modified: 9 May 2020, 14:40:46 UTC

Not the buffer size :-

<rec_half_life_days>X</rec_half_life_days>
A project's scheduling priority is determined by its estimated credit in the last X days. Default is 10; set it larger if you run long high-priority jobs.

OK, I see what you are saying, but I am not sure why you would set that larger. I want the estimated time to converge faster.
So I routinely set mine as follows when installing BOINC:
<rec_half_life_days>1.000000</rec_half_life_days>

That was not the source of my problem. It was some incompatibility between the new BOINC (after 7.14.2) and the server.
It worked OK on some projects but not others. I have not seen the problem for a while now, so it does eventually correct itself.


I have also set mine down to 1 day (the "set it larger" advice was part of the official documentation). I didn’t know whether it would help you, so I made the suggestion on the off chance that it would.

furukitsune
Joined: 19 Mar 16
Posts: 8
Credit: 3,852,696
RAC: 2,825
Message 97902 - Posted: 4 Jul 2020, 14:24:43 UTC
Last modified: 4 Jul 2020, 14:33:50 UTC

Getting an error on all v4.20 tasks, which still validate:

https://boinc.bakerlab.org/result.php?resultid=1214389007

Extracting in slot directory: minirosetta_database.zip
error: cannot create ./minirosetta_database/scoring/qsar/shape_histogram_data.js
Permission denied

This also happened on other versions.
All other files in the minirosetta database unpack without problems.
I can copy minirosetta to another directory and unpack this file with no problem.

Running Win7; JavaScript is enabled.
The file seems to contain only numbers.

Any suggestions?


fk

Brian Nixon
Joined: 12 Apr 20
Posts: 156
Credit: 2,515,335
RAC: 29,557
Message 97904 - Posted: 4 Jul 2020, 16:58:38 UTC - in response to Message 97902.  

furukitsune wrote:
error: cannot create ./minirosetta_database/scoring/qsar/shape_histogram_data.js
Permission denied
Are you running security software which might be blocking this file? If so, check the logs for that software. If it is blocking it, consider telling it to ignore the BOINC application and/or data directory.

Stevie G
Joined: 15 Dec 18
Posts: 41
Credit: 142,095
RAC: 207
Message 98075 - Posted: 14 Jul 2020, 18:54:54 UTC - in response to Message 87456.  

Project server status shows work has been available, but I have not received any Rosetta tasks for over a week.

Why is that?

Steven Gaber
Oldsmar, FL

Grant (SSSF)
Joined: 28 Mar 20
Posts: 755
Credit: 5,317,037
RAC: 22,026
Message 98076 - Posted: 14 Jul 2020, 19:10:50 UTC - in response to Message 98075.  

Project server status shows work has been available, but I have not received any Rosetta tasks for over a week.

Why is that?
See my response to the same question you asked in another thread.
Grant
Darwin NT
©2020 University of Washington
https://www.bakerlab.org