Posts by CIA

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101921)
Posted 25 May 2021 by CIA
Post:
To everyone fighting in here and calling names: This is a bug-reporting thread. Please take your argument elsewhere. Nothing you all are posting has anything to do with the point of this particular thread.
2) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101304)
Posted 14 Apr 2021 by CIA
Post:
No WU's will start. Some examples: Pre-helical-bundles
TMWFY3V

Have tried aborting the first batch, but the second one, of 15 WU's, also did not run.

This has happened once before, a couple of weeks ago.

jm


This happened to me. Initially they appeared to be hung on "waiting to start" but I let them sit for a while (about 15 minutes) and they did eventually start on their own. Let them sit for a bit and see if the same happens to you.

When I say "I let them sit for a while" I mean I tinkered with them, doing all the normal diagnostics (suspend, resume, change to run always, etc). After the tinkering apparently did nothing to help, I put all my settings back to my normal defaults. As I was pondering what to do next I got distracted by something unrelated and walked away from my machine. When I returned about 15 minutes later they had all started up on their own. So I guess patience might be the trick.
3) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101292)
Posted 13 Apr 2021 by CIA
Post:
/edit. Just to add a datapoint. While it's not conclusive, all the Miniprotein_relax8 units I'm getting that run long do "complete" and show as valid, even after going 10 hours over. Of these units that run over, many are "seconds" sent to me from other machines that failed to process the WU. My machine is running OSX and completes them fine (beyond running 10hrs over). All the failed machines are windows or linux based. That said, I know Macs make up a small percentage of computers on this project, so I might have just not gotten a resend from a Mac in my small sample.

I am also running on a Mac. The mini protein_relax8 units also do complete after ~18.7 hours and provide credit; however, the credit is in the "two-hundred" range for 67,000+ seconds of work. So, I've gone in and aborted all of the "ready to start" mini protein_relax8 units and now I have all pre-helical-bundles_round1_attempt1 queued up.



Yea, I suppose I could do that but I'm honestly here for the science and if Reddit has taught me anything, internet points aren't worth anything. 8-). The long WU's are producing results, and that might be helpful to researchers. So I let them run.

I've got a few long units now going on my 36hr boxes. They are coming up on the 46-hour cutoff, and I wonder what results they will provide.
4) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101263)
Posted 12 Apr 2021 by CIA
Post:

Pretty much all of my mini protein_relax8 units are seconds (meaning they failed on another machine before I got them), and almost all of them are completing but taking 18 hours to do so. They are creating very few decoys.

Example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1366333671
Have you changed the setting to allow 18 hours? Because all mine are sticking to the 8 hours. I'm getting 50% of the mini protein_relax8 completing in 8 hours, and the other 50% failing, usually taking 5 hours to do so.


During the latest drought I had this machine set to 36 hours, but on Friday, when it became clear the drought had ended, I set it back to its normal default 8-hour runtime. So it's running for the standard 8 hours and then, as others have mentioned, 10 additional hours on top before the auto-cutoff happens.

All my other machines are set to 36 hours, and while none of them have completed any of these longer units, some of them are showing signs it will happen to them also. For example on one machine I have a miniprotein WU that is only 57% done 22 hours in. I have a feeling it's going to crunch for 46 hours (set time limit +10hr cutoff).
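The arithmetic behind those cutoff times can be sketched as below. This is only an illustration of the "runtime preference plus 10 hours" rule as described in this thread; the function name and constant are hypothetical, and the 10-hour figure is taken from the posts, not from any Rosetta documentation.

```python
# Assumption from the posts above: a task is cut off about 10 hours
# after the configured target runtime.
EXTRA_HOURS_BEFORE_CUTOFF = 10


def cutoff_hours(runtime_preference_hours: float) -> float:
    """Return the approximate wall time at which a long-running task is cut off."""
    return runtime_preference_hours + EXTRA_HOURS_BEFORE_CUTOFF


for pref in (8, 36):
    print(f"{pref} h preference -> cut off at about {cutoff_hours(pref)} h")
```

That matches the 18 hours seen on the 8-hour machine and the 46 hours expected on the 36-hour machines.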


/edit. Just to add a datapoint. While it's not conclusive, all the Miniprotein_relax8 units I'm getting that run long do "complete" and show as valid, even after going 10 hours over. Of these units that run over, many are "seconds" sent to me from other machines that failed to process the WU. My machine is running OSX and completes them fine (beyond running 10hrs over). All the failed machines are windows or linux based. That said, I know Macs make up a small percentage of computers on this project, so I might have just not gotten a resend from a Mac in my small sample.
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101260)
Posted 12 Apr 2021 by CIA
Post:
Over the course of this afternoon I’ve had 6 segv errors, all on files starting miniprotein.

Anyone else? Or do I start checking my hardware?

It's not just you. I've got 29 that failed across a number of machines. They are all miniprotein_relax8 series that have died after running for an hour.
Same here, and on prehelical (although I didn't check the error type).



Pretty much all of my mini protein_relax8 units are seconds (meaning they failed on another machine before I got them), and almost all of them are completing but taking 18 hours to do so. They are creating very few decoys.

Example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1366333671
6) Message boards : Number crunching : Rosetta work units won't start or run (Message 101189)
Posted 9 Apr 2021 by CIA
Post:
Just to chime in here on the 'Ready to Start' situation...

I noticed a few days ago that when I would get a work unit or two (during the most recent drought) they would stick on "ready to start" for quite some time before actually starting. I was tinkering with one machine trying to get them to start up, but then I got distracted. A few minutes later, the WU's just started up on their own. So basically, while odd, I think patience might be the solution.

Let them sit for a while and see if they do start up. If they haven't started in 15 minutes then something else is probably wrong.
7) Message boards : Number crunching : A Change to the Default? (Message 100456)
Posted 20 Jan 2021 by CIA
Post:
I’ve had one sebvN_jp_TCRB task take over 11 hours to finish, and I’ve got three others still running after more than 10 (against a default run time of 8)


I have most of my crunchers set to 36 hour tasks. I've had a couple of the Sebv ones you mentioned roll through. Even though I have my WU's set to 36 hours, the Sebv ones are all ending early (about 24 hours in). They only produce 3 decoys in that time.
8) Message boards : Number crunching : Shortcomings of Apple M1 Mac mini (Message 100351)
Posted 8 Jan 2021 by CIA
Post:
It's looking very much like a limitation of Apple's Rosetta for non-native applications, as when using a native application it can use all of the available RAM if it needs to.
A pity BOINC can't make full use of the available resources till a native BOINC & supporting science applications are developed. I suspect there will be quite a wait till that happens.


Exactly. That said, even though this is all running on emulation, based on the numbers I've seen so far if I was to just run one core on one task the points awarded are impressive. None of my other machines (dedicated 24/7 crunchers) crack 500 points per 8 hour task, most are in the 300's. The M1 mini, under emulation, is putting up 500+, some even 700-800 point tasks. Under emulation! The potential once we get a native client is huge.

This weekend I might set it to run with 25% of the CPUs (2 of 8) and see what it does when it's not fighting for memory. Monday I need to hand it over to the new owner though.


Does anyone know who handles the Mac client for Rosetta, and who handles the Mac client for BOINC? I'd be willing to start a kickstarter to get them a basic Mini to test code. They are only $699.
9) Message boards : Number crunching : Shortcomings of Apple M1 Mac mini (Message 100347)
Posted 8 Jan 2021 by CIA
Post:
I had the machine set to 90% on both. I did a fresh restart of the machine, and only launched BOINC and Activity Monitor (to see CPU and memory use). I had BOINC in the foreground (as the active app), and ran nothing else for the entire day and as I mentioned, stayed consistent between 7-8GB of memory used, never more.
And what is your Use at most xxx % of the CPUs set to?


Seeing is believing, here's the prefs as set up on the M1 based Mac Mini. Again, the tasks it did complete put up really good points, it just didn't utilize the full potential of the machine. After a fresh reboot, it was the only app running besides Activity monitor for 24 hours.

10) Message boards : Number crunching : Shortcomings of Apple M1 Mac mini (Message 100345)
Posted 8 Jan 2021 by CIA
Post:
Just a little update after running it for most of the day. It eventually settled down and pretty much only runs on 3 of the 8 cores and never goes over 7.5GB of RAM, even when nothing else is running. (Again, 16GB machine) All the other tasks are either "waiting to start" or "waiting for memory".
In your Rosetta account, Preferences, When and how BOINC uses your computer, Computing preferences, what are your memory settings?
eg
    When computer is in use, use at most 95 %
    When computer is not in use, use at most 95 %


I had the machine set to 90% on both. I did a fresh restart of the machine, and only launched BOINC and Activity Monitor (to see CPU and memory use). I had BOINC in the foreground (as the active app), and ran nothing else for the entire day and as I mentioned, stayed consistent between 7-8GB of memory used, never more.

Currently I'm encoding something in Premiere while running out the last task, and memory use is bumping up against the 16GB max, so it can use more memory when other things are going on. The last task running is one of those rb monsters, taking up over 2.5GB of memory at around 65% complete.
11) Message boards : Number crunching : Shortcomings of Apple M1 Mac mini (Message 100336)
Posted 8 Jan 2021 by CIA
Post:
Just a little update after running it for most of the day. It eventually settled down and pretty much only runs on 3 of the 8 cores and never goes over 7.5GB of RAM, even when nothing else is running. (Again, 16GB machine) All the other tasks are either "waiting to start" or "waiting for memory".

The numbers (per core) that it puts up once it finishes a WU are impressive, especially considering it's running everything under emulation. Unfortunately, until the memory issue is solved with a native app it's a lot of power going to waste. It does crunch, just not to its full potential.

I do look forward to seeing the full potential once native versions of BOINC and Rosetta are released for Apple Silicon down the road.
12) Message boards : Number crunching : Shortcomings of Apple M1 Mac mini (Message 100333)
Posted 7 Jan 2021 by CIA
Post:
Here's a screenshot showing CPU, memory etc usage on the M1 Mini. I'm guessing the first 4 cores are the low power ones.

13) Message boards : Number crunching : Shortcomings of Apple M1 Mac mini (Message 100332)
Posted 7 Jan 2021 by CIA
Post:
I'm just going to chime in here, as we just took delivery of an M1-based Mac mini for our front desk person here at work, but I intercepted it to "test" prior to deployment as a basic Outlook/word processing box.

Our machine has 16GB of RAM, 512GB HD. Last night I started Rosetta, but also was compressing an hour long video (ProRes 422 --> H.265) using Apple Compressor. I have to say, the power of this machine doing both tasks is impressive. With BOINC set to 'always run', the video encode was still twice as fast as my classic 12c/24t MacPro, even when the MacPro is only encoding and doing nothing else.

So anyway, the first few tasks have a longer CPU time than you would normally see for a machine dedicated to 100% BOINC/Rosetta because of the background encode. Also I was just reminded in this thread about the memory options in the computing pref menu. I now have the machine set to use 90% of the memory all the time. For the next 24 hours I'm going to just run BOINC alone, and see how it does as it settles in. The first 8 tasks were not really clean as the memory was set low and the computer was also encoding video.

It will be interesting to see how the tasks are handled with 4 of the 8 cores being low power, highly efficient, vs the other 4 being high power, high speed. Of the 8 tasks downloaded and running, it will sometimes run all 8 at once, but most of the time I see "waiting for memory" on a few tasks. Currently as I look over at it, 5 of 8 tasks are running, the rest have no memory. Activity Monitor shows only 8GB of RAM being actively used (out of 16 available), so the emulation software running behind the scenes must be sucking up quite a bit.

It would be nice to see an M1-native version of the BOINC client and the Rosetta application (not to be confused with the Apple Rosetta2 emulation layer) to see what this machine can really do.
As others have mentioned, even running full bore last night with Rosetta and full GPU video compression happening, you couldn't hear the machine at all, the fan didn't seem to be working hard (or at all) and the Mini was barely warm to the touch.

If you want to see it in action over the next day or so, machine is here: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=5980699
14) Message boards : Number crunching : 3832 new hosts per day? (Message 100026)
Posted 14 Dec 2020 by CIA
Post:
If a two hour task completes 10 models, then an eight hour task is more likely to complete about 40 models, not 15.

Note that not all tasks can complete in two hours. With such a short runtime preference, you are more likely to see tasks running longer than the preference. When you look at credits, you really must consider the amount of actual CPU time, not the number of work units, and not just the runtime preference.

There are no "missing results". So, set your preferences in a way that works for you and your machine.

If you use Dr. Baker's analogy of exploring a planet's surface for the highest or lowest elevation on the planet, then each model is one of the explorers. They start their exploration from a random point on the planet. When a work unit has enough time to begin another model, that next model will be started at another random point on the planet, with no regard to the first model or what it found. If you drop 10,000 explorers on the planet, your success in finding the true highest or lowest elevation would essentially be proportional to the surface area of the planet. If 10,000 explorers is adequate for Mars, you might need 100,000 for Saturn. So, when they feel they have a Saturn-sized protein for study, they might create more work units. But, as you point out, they have no way to predict exactly how many models will result. If they approach the end of work units coming back in and still only have 80,000 results, then they create more work units to obtain the 100,000 results desired.

Having said that, once they see the results, they can sometimes give hints to future explorers, or essentially drop more of them near the Himalayas. So they might create a secondary batch of work units, which are designed to concentrate the focus based on what was learned on the first round.



This is a terrific explanation, thank you. Based on this explanation I think I'm going to bump my WU processing time (on my dedicated 24/7 zero cache machines) from 24 hours to 36.
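The explorer analogy quoted above can be sketched as a tiny random-restart search. This is a toy illustration only, not how Rosetta itself works: the `elevation` landscape, the downhill walk, and every name here are hypothetical stand-ins. Each "model" starts at a random point with no knowledge of the others, and more models can only match or improve the lowest point found.

```python
import math
import random


def elevation(x: float) -> float:
    """Hypothetical bumpy 1-D landscape; lower is better."""
    return math.sin(3 * x) + 0.1 * (x - 2) ** 2


def run_model(start: float, step: float = 0.01, max_steps: int = 1000) -> float:
    """One explorer: walk downhill from `start` until no neighbor is lower."""
    x = start
    for _ in range(max_steps):
        best = min((x - step, x, x + step), key=elevation)
        if best == x:
            break  # local low point reached
        x = best
    return elevation(x)


def lowest_found(n_models: int, seed: int = 0) -> float:
    """Drop n_models explorers at random starting points and keep the best result."""
    rng = random.Random(seed)
    return min(run_model(rng.uniform(-10, 10)) for _ in range(n_models))


for n in (1, 10, 100):
    print(n, "models -> lowest elevation found:", round(lowest_found(n), 3))
```

A longer runtime simply lets one work unit fit in more of these independent models, which is why more hours per task means more explorers returned per download.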
15) Message boards : Number crunching : 0 new tasks, Rosetta? (Message 98883)
Posted 8 Sep 2020 by CIA
Post:
Since we are off topic anyway, I have 9 machines running on my account. They run Rosetta 24/7. Back in March when I set these 9 machines up, they had the default 8hr runtime, but these days 8 of the 9 are now set to 24hr runtimes. They've been on 24hr runtimes for about 3 months, and returned hundreds of tasks while at that time limit. Boinc still shows fresh tasks as 8 hours on them though, so I have my cache at 0 to avoid getting dozens of WU's I can't possibly finish before the deadline. Once they start crunching they gradually adjust the "Time Remaining" and "Elapsed" time to = 24 hours, but when they are new they still show 8.

Do I just need to detach and re-attach to Rosetta to get it to see these machines now have a 24hr runtime set on them?
16) Message boards : Number crunching : 0 new tasks, Rosetta? (Message 98779)
Posted 7 Sep 2020 by CIA
Post:
Units that bounced back to the server for whatever reason (not started on time, not finished on time etc) will trickle in and out for the next day or so and really clear all the actual work. Given the holiday weekend I'd not expect a proper refill until Tuesday at the earliest. Hopefully I'm wrong on that.

If you have work queued up, might want to up your WU time to 24hrs so they can keep crunching longer until the drought ends.


How does that work? Does a WU get given to me with 50 things to do, my PC does as many of those 50 as it can in 8 hours, then sends those answers back, then the server puts the uncompleted work back together in new WUs?

And why does Rosetta do it this way when no other project does? I can see the advantage that everyone knows exactly how long something will take, whether they have a fast or a slow computer, but it seems a bit complicated from their end.



I was referring to WU's as a whole, not what's inside them. Often for whatever reason you will see machines that are (for example) dual core laptops with 96 WU's in their queue. There's no way they are finishing them all in the 3 day window so the WU's get bounced back once the timers run out and they then are sent out to someone else. That's what I was referring to.
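As a rough sketch of why that queue can't be cleared: assuming the default 8-hour runtime, a 72-hour (3 day) deadline, and a machine crunching 24/7 on both cores, the numbers work out as below. The function name is hypothetical, and real hosts will do worse than this since they are rarely on around the clock.

```python
def tasks_finishable(cores: int, runtime_hours: float, deadline_hours: float) -> int:
    """Upper bound on tasks a host can finish before the deadline,
    assuming every core runs one task at a time, nonstop."""
    per_core = int(deadline_hours // runtime_hours)
    return cores * per_core


queued = 96
done = tasks_finishable(cores=2, runtime_hours=8, deadline_hours=72)
print(f"finishable: {done}, bounced back to the server: {queued - done}")
```

So even in the best case, most of those 96 WU's time out and get resent to someone else.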
17) Message boards : Number crunching : 0 new tasks, Rosetta? (Message 98770)
Posted 6 Sep 2020 by CIA
Post:
Units that bounced back to the server for whatever reason (not started on time, not finished on time etc) will trickle in and out for the next day or so and really clear all the actual work. Given the holiday weekend I'd not expect a proper refill until Tuesday at the earliest. Hopefully I'm wrong on that.

If you have work queued up, might want to up your WU time to 24hrs so they can keep crunching longer until the drought ends.
18) Message boards : Number crunching : 0 new tasks, Rosetta? (Message 98749)
Posted 4 Sep 2020 by CIA
Post:
Looks like we are running low on Tasks again....
19) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 98303)
Posted 24 Jul 2020 by CIA
Post:
Why not just create a new project called "RosettaBIG" and be very clear in the project description that it's essentially the same as Rosetta, but intended for machines that can run 24/7 and have lots of RAM.

"If you are unsure if your computer is qualified for this project, please join the normal, original Rosetta project instead."

You might not get a ton of participants, but the people who do sign up would have big iron machines and are willing to donate lots (24/7) of CPU time.
20) Message boards : Number crunching : Why are my 'Remaining' time estimates so far off? (Message 98164)
Posted 17 Jul 2020 by CIA
Post:
Just as a heads up, you are running an older version of BOINC. I suggest you download the current version. https://boinc.berkeley.edu/download_all.php

©2021 University of Washington
https://www.bakerlab.org