Posts by Bryn Mawr

1) Message boards : Number crunching : Not getting any python work (Message 102948)
Posted 4 days ago by Bryn Mawr
Post:
You say that you *can’t* use app_config to limit the number of work units but I’ve been doing so for a couple of years with zero problems.
If you check out several of the other threads that Greg_BE has posted to, you would see that using max_concurrent has caused huge amounts of problems in the past- basically the Scheduler allocates 100s (or more) Tasks, when only a few are actually needed.
While it might not cause problems with your systems & projects, it does with Greg_BE here at Rosetta on one of his systems.
As jim1348 posted, he has also had similar issues, when using project_max_concurrent.

There is a bug (rare as it is) with max_concurrent/project_max_concurrent that has yet to be resolved.


I have obviously seen the bug reports you reference and they usually take the form “don’t do it, it always causes problems”.

For the sake of balance, and to inform those trying to track down the bug, I wanted to register the fact that, in some circumstances, it works ok and is stable.
2) Message boards : Number crunching : Not getting any python work (Message 102944)
Posted 5 days ago by Bryn Mawr
Post:
OK, that is useful. It may happen only when running multiple work units (or at least more than two).
In that case, smaller memory may be better. You can't use an app_config to limit the number of work units until they get the download bug fixed.

I can run Rosetta in a second BOINC instance and limit it to one or two work units at a time, but that affects both the pythons and the non-pythons.
They need to give us some way to select them.

Thanks.


You say that you *can’t* use app_config to limit the number of work units but I’ve been doing so for a couple of years with zero problems.

Each of my projects has :-

<app_config>
<project_max_concurrent>N</project_max_concurrent>
</app_config>


and it limits the processing as required with no runaway downloads.
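
A minimal sketch of a slightly fuller app_config.xml, for illustration only. The app name `rosetta_python_projects` and the limits 1 and 4 are assumptions, so check the app names your client actually reports before copying it. The file goes in the project's folder under the BOINC data directory.

```xml
<app_config>
  <!-- Optional per-application cap; the app name below is an assumption -->
  <app>
    <name>rosetta_python_projects</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <!-- Cap on tasks running at once across the whole project; 4 stands in for N -->
  <project_max_concurrent>4</project_max_concurrent>
</app_config>
```

After saving, Read config files in the BOINC Manager menu should pick it up. Some users in this thread report excessive work fetch with these limits, so it is worth watching the task count for a while.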
3) Questions and Answers : Windows : No New Tasks for PC (Message 102922)
Posted 7 days ago by Bryn Mawr
Post:
Hi, I am running Rosetta@Home on two computers, one 2016 MacBook Pro, and one PC Laptop purchased and set up last year. My Macbook has been downloading and running tasks successfully, however, my PC has never downloaded any Rosetta@Home or Ralph@home tasks. I am not sure why this is, or how to fix it, and would appreciate any support that could be offered.


According to Rosetta’s records the pc has processed 65 WUs over the time it’s been connected so we need to find out what has changed since it did so.

First place to look is the computer preferences page, especially the sections that deal with suspending activity and memory allocation. Could you list the values you have set?
4) Message boards : Number crunching : Lots of computer time, little credit (Message 102879)
Posted 15 days ago by Bryn Mawr
Post:


And how can CPU's be "slowed" if each core is doing its own thing?
All RAH tasks have their own cores as do the other projects.
The GPU stuff has its own core to monitor its work and one core is free for system and web.
So what exactly is "slowing" things down?


A task does not have exclusive access to a core. The work is threaded and a task can be interrupted at any time depending on its priority, and Boinc runs at minimal priority, so the CPU time will be shorter than the elapsed time to the extent of the interruptions it experiences.
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 102858)
Posted 17 days ago by Bryn Mawr
Post:
Bryn Mawr - added the half_life, will sit back and see what happens.
Current WCG is dying in credits, guess I will have to pump that one up higher in %
Or just let things be until they have a chance to settle down- with 8 active projects, even with the changed half life value, i'd expect you're looking at a couple of weeks. One week bare minimum.
Then adjust Resource share as necessary.



Ok..will do.
It's 5 active.
I thought I had 2 GPU projects, but it seems just one at the moment.
So it's 3-4 CPU projects.


I recently (6 weeks ago) added a 5th project (6 if you include Ralph which very rarely has work) because 3 of the projects were out of work / broken at the same time.

One of my crunchers is now back to running smoothly whilst the other still has the occasional lump or bump as one project or another grabs a bit extra but is almost there.



I had more than a lump and a bump before I tried dividing up the computer.
Like now, WCG is really really down close to dead and now that I opened things back up it still is down, but the results I checked are pending. So there is hope.


That’s the project, not your machine. I’ve just had two days of low WCG credits and the shortfall turned up this morning - c’est la vie.
6) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 102852)
Posted 19 days ago by Bryn Mawr
Post:
Bryn Mawr - added the half_life, will sit back and see what happens.
Current WCG is dying in credits, guess I will have to pump that one up higher in %
Or just let things be until they have a chance to settle down- with 8 active projects, even with the changed half life value, i'd expect you're looking at a couple of weeks. One week bare minimum.
Then adjust Resource share as necessary.



Ok..will do.
It's 5 active.
I thought I had 2 GPU projects, but it seems just one at the moment.
So it's 3-4 CPU projects.


I recently (6 weeks ago) added a 5th project (6 if you include Ralph which very rarely has work) because 3 of the projects were out of work / broken at the same time.

One of my crunchers is now back to running smoothly whilst the other still has the occasional lump or bump as one project or another grabs a bit extra but is almost there.
7) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 102843)
Posted 23 days ago by Bryn Mawr
Post:


Initially I left things alone and then credits got all out of whack, then I ran into issues with tasks taking up days on end and nothing else getting done by other projects and just a bunch of yoyo stuff going on.
So I took back control. Again, everything was working fine until this bug showed up. And that just showed up. Maybe after updating to the latest BOINC.

Anyway..I'll mess around with things until I find the right mix.
No need to clog up this thread.


As Grant has said, the more you mess around with things the worse the situation will become.

Set rec_half_life to 1, sit back and chill for a month and the system will follow your project shares.
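
A minimal sketch of how that setting might look in cc_config.xml, assuming the option meant here is the client's `rec_half_life_days` (the value is in days, so 1 matches the suggestion above):

```xml
<cc_config>
  <options>
    <!-- Half-life, in days, of the recent estimated credit average used for scheduling -->
    <rec_half_life_days>1</rec_half_life_days>
  </options>
</cc_config>
```

The file lives in the BOINC data directory. The client reads it at startup, or when Read config files is chosen in the Manager.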
8) Message boards : Number crunching : 1.03 (vbox64) is out for rosetta python projects (Message 102735)
Posted 29 days ago by Bryn Mawr
Post:
Hopefully they don't switch exclusively to vbox - or give users the ability to choose workunit types. There are doubtlessly countless machines that do not have virtual box installed and likely will not do so as they simply sit and crunch with users not checking the forums.

How much memory does each WU consume with these new python WUs? Do they finally use SSE/Avx? I don't think I have gotten one yet.
thanks


If my machines are anything to go by, if you don’t have VirtualBox installed it won’t download python tasks.
9) Message boards : Number crunching : 1.03 (vbox64) is out for rosetta python projects (Message 102734)
Posted 29 days ago by Bryn Mawr
Post:
I doubt we'll be out of normal Rosetta units for a while yet because I believe the Robetta server takes requests in from around the world and distributes them on r@h. I wouldn't expect all those users to start using trrosetta yet, but I might well be wrong. Would be good to hear from someone in the project.


The problem is the selection of tasks to queue for delivery to us users.

There may be millions of tasks waiting but there are only ever 29k tasks queued and that queue can be (and has been) skewed to be all python.
10) Message boards : Number crunching : rosetta python projects (vbox64) (Message 102676)
Posted 18 Sep 2021 by Bryn Mawr
Post:
I'm not a lucky man....
<error_code>-119 (md5 checksum failed for file)</error_code>
<error_message>MD5 check failed</error_message>




Would you please include a link to one of the tasks that is causing this?

Do you have any other work from Rosetta?

Here is a crazy idea, set the project to 'no new tasks'. Finish all the non python work you have and then click on reset the project button. All I am thinking here is that it will maybe randomly reattach at a better point in the server queue. It is not a real fix to the problem, but it will take you back to a clean state.

Also just for the heck of it, be sure that you have the latest VM and extension pack. Again, may or may not help with the problem, but it's worth a try.


The server queue currently has 29k python tasks and just 2 Rosetta 4.20 tasks so it’s no go until that changes.
11) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 102673)
Posted 18 Sep 2021 by Bryn Mawr
Post:
Oh well, a few more hours & i'll be out of work again, even though this time there's still millions available.
I don't want to jinx things, but work appears to be flowing again.
Messages complaining about the lack of VirtualBox keep occurring, but at least i can get work again.


I’m with you, I will not be running virtualbox or the python tasks so if that screws up running normal Rosetta so be it.
12) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 102671)
Posted 18 Sep 2021 by Bryn Mawr
Post:


I was making a comparison between Rosetta and other projects I participated in.

It would be better if the credit system were on a level playing field.

S. Gaber


It would be massively better and in an ideal world we would have a way of enforcing it but in the real world where Boinc is open source software we cannot.
13) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 102624)
Posted 15 Sep 2021 by Bryn Mawr
Post:
So it takes 8 hours to complete one R@H task. BOINC sent me 16 tasks, all with the same deadline, three days hence on September 17, 2021. There are only 72 hours in those three days. Right now I have 15 uncompleted tasks, at 8 hours each, totalling 120 hours to complete. Ain't gonna happen. I could do it in maybe 5 days.


Which means that Boinc has supplied exactly the number of tasks that your machine could do in 3 days - 72 hours on each of 2 processors = 144 hours capacity.


That would be true if I didn't have three other projects to run. If I do all of those Rosetta tasks on time, my other projects would suffer.


I agree that Boinc has not taken that into account but I stand by the statement as made.
14) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 102622)
Posted 15 Sep 2021 by Bryn Mawr
Post:
So it takes 8 hours to complete one R@H task. BOINC sent me 16 tasks, all with the same deadline, three days hence on September 17, 2021. There are only 72 hours in those three days. Right now I have 15 uncompleted tasks, at 8 hours each, totalling 120 hours to complete. Ain't gonna happen. I could do it in maybe 5 days.


Which means that Boinc has supplied exactly the number of tasks that your machine could do in 3 days - 72 hours on each of 2 processors = 144 hours capacity.
15) Message boards : Number crunching : Ralph Config (Message 102502)
Posted 30 Aug 2021 by Bryn Mawr
Post:
Well I’ve learnt a lesson tonight, after having Ralph active on my rigs for some while I finally pulled some work down. Whilst I have a limit set for Rosetta there was no such limit for Ralph so it collected 24 WUs and started all of them, promptly suspending half a dozen for lack of memory and just as promptly locking the machine solid.

I rebooted and copied the app_config across before it locked up again. A second reboot and it’s run ok since.

Moral? Don’t give any project free rein to take more cores than you can possibly process - after a drought or if the other projects run out of work you’ll be in trouble.
16) Message boards : Number crunching : No Work Units Fetched (Message 102467)
Posted 26 Aug 2021 by Bryn Mawr
Post:
What I don’t understand is why you stop the project for the interim rather than just leave it be.

I stop the download of the R@H work 'cause i don't like to have mixed different project wu's on my pc.
So i'm crunching other projects until Rosetta's return


Ah, I understand although I find that a mixture of projects is more efficient than filling all cores with the same type of WU.
17) Message boards : Number crunching : No Work Units Fetched (Message 102458)
Posted 25 Aug 2021 by Bryn Mawr
Post:
Add TN-Grid alongside (as I have done) but why switch and stop Rosetta?

Usually, the "no-wus status" lasts at least 4 days.
I'll return when wus are on the road....


What I don’t understand is why you stop the project for the interim rather than just leave it be.

It’s doing no harm sitting there waiting for more work to turn up.
18) Message boards : Number crunching : No Work Units Fetched (Message 102438)
Posted 23 Aug 2021 by Bryn Mawr
Post:
And we're out of work again,

Switch to TN-Grid


Add TN-Grid alongside (as I have done) but why switch and stop Rosetta?
19) Message boards : Number crunching : Why is it when there is an outage or some other issue there is never any news? (Message 102403)
Posted 16 Aug 2021 by Bryn Mawr
Post:
I know that the tech guru of RAH left some years ago, but honestly, now this has become a real hit and miss project. With all the big banner logos on the home page you would think they would have someone on call to fix these kinds of things.

How long have we been "dry" now? A few days and no news?
Insane.
I've been around since 2006 and ever since the "Guru" left, this project's attention to technical details has gone down and the silence out of the group is deafening.

Most of us now just shrug and let our other projects take over, but this has always been my top project. Now I am beginning to think less of it due to these issues.


I would have to agree, even a simple “no more work until next month” would set expectations.
20) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 102339)
Posted 5 Aug 2021 by Bryn Mawr
Post:
This is probably an issue for the BOINC board, but since I know you jamokes I will ask here first.

On one of my hosts, the BOINC manager will not launch. I try to run it as I have in the past, but nothing happens at all.

The host seems to be processing work, based on the CPU usage. But no joy with the manager program.

Any suggestions? I suppose that I can uninstall/reinstall; will that disturb anything?


Look for a file in your home directory, 5 bytes long and called Boinc-Manager-xxx or similar. If you find it, delete it - it’s a lock file.





©2021 University of Washington
https://www.bakerlab.org