Posts by Greg_BE

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104384)
Posted 1 day ago by Greg_BE
Post:
Total queued jobs: 2,589,661
In progress: 53,882
Successes last 24h: 34,678

that's what the page says.
Pretty small numbers against the 2 mill.
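To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the last-24h success rate stays constant and that no new work gets queued, neither of which the status page promises. The variable names are made up for illustration.

```python
# Figures copied from the server status page quoted above.
queued = 2_589_661
in_progress = 53_882
successes_per_day = 34_678

# Assumption: the last-24h success rate holds steady and nothing new is queued.
days_to_drain = queued / successes_per_day
share_in_progress = in_progress / queued

print(f"Days to clear the queue at this rate: {days_to_drain:.1f}")
print(f"Share of queued jobs in progress: {share_in_progress:.1%}")
```

Under those assumptions the queue would take a bit over two months to clear, with only about 2% of it being worked on at any moment.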
2) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104383)
Posted 1 day ago by Greg_BE
Post:
Most projects don't take nearly as much as the pythons or LHC of course.

I like memory, but beyond 64 GB you have stability problems, since you have to use all four slots.
Sometimes it works, but you often have to juggle memory around. You may have to spend more than you anticipated.
Two slots is a lot safer.
Everything works better with more memory, if you're not using it you get a massive disk cache. Using all four slots does not cause stability problems. Always test your new memory with memtest before use, even quality stuff has duds.


I have 49 and change spread out over 4 slots.
Everything works as it should.
The new drive is 500 gigs and it will be dedicated to BOINC
So there is more than enough room for swap or whatever else BOINC wants to do.
3) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104382)
Posted 1 day ago by Greg_BE
Post:
Sid Celery posted something a few months ago that he received from Admin or someone like that who said that the Python job that had been submitted by one of the IPD researchers was "huge".
It's not that big, it's only a few million tasks. I've seen the queue at 15 million. But maybe that was several projects at once.
You are correct but these 2 million tasks will take a long time to finish at the current rate because only 15,000 or so are running at any given point.
That's why it's "huge"
Yep.
Roughly 1 in 133 is being processed. Compared to Rosetta 4.20 at their peak (20 million queued up, 400k in progress) 1 in 50 being processed.

And given the huge issues with Python tasks, such as those that sit there not actually using any CPU time so they're not actually being processed, I'd suggest that the 1 in 133 value in reality is way, way, waaaay worse than that.



Well you've seen the numbers. People come and try it out and leave.
Others can't get it to work and leave.
Without the staff taking notice or caring, the number of systems will trend downward, or stay flat at best, instead of upward.
But again, they don't care about numbers, just as long as the work gets done eventually.
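The "1 in 133" and "1 in 50" figures quoted above are just queued tasks divided by running tasks. A minimal sketch of that arithmetic, using the round numbers from the discussion (2 million queued with about 15,000 running now, versus 20 million queued with 400k in progress at the Rosetta 4.20 peak); the helper name `one_in_n` is hypothetical:

```python
# Round figures taken from the discussion above.
python_queued, python_running = 2_000_000, 15_000
peak_queued, peak_running = 20_000_000, 400_000

def one_in_n(queued, running):
    """How many queued tasks exist for each task that is currently running."""
    return queued / running

print(f"Pythons now: 1 in {one_in_n(python_queued, python_running):.0f}")
print(f"Rosetta 4.20 peak: 1 in {one_in_n(peak_queued, peak_running):.0f}")
```

This only counts tasks marked as in progress. It says nothing about whether those tasks are actually using CPU time, which is the point made above.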
4) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104381)
Posted 1 day ago by Greg_BE
Post:
Pythons for 12 hours? They average 2 hours here.

As for only being able to run a couple at a time, you need a lot of RAM. I can run 5 but it won't do 6 in 16GB.

No, not an individual run for 12 hours. After running a series of them continually.
I didn't say anything about running only two. I usually run at least eight, and am presently running twenty on a Ryzen 3900X with 80 GB of memory.



Holy cow! 80 gigs?!?! That's more than my budget can afford!
It hurt enough to put in 32 on top of the 16 I put in about 4 years ago.

I just can't see investing much more for being a volunteer.
At best another 1080 or better, but that's it.
I have a Ryzen with 64GB, but it's my main computer. Less than that is pitiful by today's standards. It will take 128GB.
I have two Boinc only machines with 36GB in them. I upped them just enough to run LHC.


Once I get my new drive installed this weekend, I should be able to undo the restriction I have right now on Python. With the current memory, I should be able to run a few more pythons plus all my other projects, or a full load of pythons (16), and still have a little bit of memory left over.
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104370)
Posted 1 day ago by Greg_BE
Post:
I have posted this many times before. They should make it a sticky if there were any moderator around to do it.

If you are running VirtualBox 6.1.x, you will get the "Vm job unmanageable" problem with the pythons. That is true whether you are running Windows or Linux.
The difference is that it can be fixed with Windows. You go back to VirtualBox 5.2.44
https://www.virtualbox.org/wiki/Download_Old_Builds_5_2

Unfortunately, that does not work on Linux, at least Ubuntu. Firstly, Ubuntu 20.04.3 works only with VBox 6.1.x.
Secondly, even going back to Ubuntu 18.04.6, which allows you to install VBox 5.2.44, still has the problem.

They need to fix it at the project end, by compiling a new Vbox wrapper. They did it on LHC, and it works there. (It has to do with the COM interface, in case you are interested.)

NB: If you reboot frequently, you may not see the problem. It usually occurs after the pythons have been running 12 hours or so, but I have seen it even after a reboot on Ubuntu.



Jim, I went back to 6.1 and I do not have problems.
I can run all my projects there.
Going back to 5.2 is a good place to start for troubleshooting Python VM problems, but this can affect other projects.
6) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104369)
Posted 1 day ago by Greg_BE
Post:
Pythons for 12 hours? They average 2 hours here.

As for only being able to run a couple at a time, you need a lot of RAM. I can run 5 but it won't do 6 in 16GB.

No, not an individual run for 12 hours. After running a series of them continually.
I didn't say anything about running only two. I usually run at least eight, and am presently running twenty on a Ryzen 3900X with 80 GB of memory.



Holy cow! 80 gigs?!?! That's more than my budget can afford!
It hurt enough to put in 32 on top of the 16 I put in about 4 years ago.

I just can't see investing much more for being a volunteer.
At best another 1080 or better, but that's it.
7) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104368)
Posted 1 day ago by Greg_BE
Post:
Robetta, as far as I can tell, is separate from Rosetta@home and is used mostly by researchers outside of the Baker Lab/IPD. It's an interface for users who wish to get computing power for their jobs.
Jobs that require the use of Rosetta 4.20 that are submitted to Robetta get sent to Rosetta@home but the rest goes to the other servers that they set up when they launched RoseTTAFold.



Ok..so then where do they get the million something tasks in queue?
But yet there appears to be only a few thousand released?
There has always been a million something in queue, even back when it was just 4.2 alone.
So something doesn't add up.
And that you can't see what is next in line....but yet you can see Robetta?
Plus someone kept quoting Robetta information some time ago as if that was where RAH gets its work.
8) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104355)
Posted 1 day ago by Greg_BE
Post:
https://robetta.bakerlab.org/queue.php?id=&target=&username=&seq=&page=2
Since these Robetta jobs aren't run at Rosetta@home, it doesn't make sense that they would get queued at Rosetta@home.
They seem to be directly sent from Robetta to the Baker Lab cluster and to the HHMI's Janelia Research Campus.

My bet is the 2.6 million tasks on the Rosetta@home queue are all Pythons.
So how come there are so many usernames, it certainly looks like Boinc.



That's not users related to us. That's the person who submitted the protein.
Again...if it's Robetta, it's not us. That's all there is to it.


I'll have to dig around some more.
I have always known from other sources that Robetta server supplied Rosetta servers.
That Robetta was where everything is stored.
That may not be the case, but it's something that will take some digging around.
The group is not that transparent on how their setup works.
9) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104351)
Posted 2 days ago by Greg_BE
Post:
Ah ha! Well sorry for the misspelling. Quick typing and no thinking after work.
4th dimension does exist...but anyway...again after work..not thinking. Missed the pun.

Now...if you go here: https://robetta.bakerlab.org/queue.php?id=&target=&username=&seq=&page=2 and look at the active tasks at random, you will see that the majority are queued for RoseTTAFold which is the AI.

You can read about it here:
https://www.ipd.uw.edu/2021/07/rosettafold-accurate-protein-structure-prediction-accessible-to-all/

Now bed...
10) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104346)
Posted 2 days ago by Greg_BE
Post:
Ignore that big fancy number on the front page.
That is what they have in queue for both the AI and RAH of which 99% are AI tasks.
But the point is that was spotted dropping suddenly, so they must have removed some, presumably due to problems. I wish other projects had that number. All we get to see is the little front end buffer on most projects.



But again, that number is moot for this aspect of the project.
It has no bearing on what we do.

Just watch the numbers I quoted. That is all you need to be concerned about. Because that is the work WE get, not the machine.
It just looks cool to say..oh we have 2 million tasks queued up, but when you dig deeper on Robetta, then you see, AI, AI, AI,AI.....Rosetta,AI,AI,AI,AI maybe a Rosetta.
11) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104345)
Posted 2 days ago by Greg_BE
Post:
Check this out from a 4.2 task today

File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)

Gees...really?!?!?
Looks rather like something tried to use the 4th dimension. Does your processor not support that function?

I have no graphics on my CPU.
Besides, the program is supposed to take care of any graphics or whatever.
12) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104342)
Posted 2 days ago by Greg_BE
Post:
Check this out from a 4.2 task today

<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_01_17_185861_181891_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 1 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_01_17_185861_181891_ab_t000__robetta.zip -frag3 rb_01_17_185861_181891_ab_t000__robetta.200.3mers.index.gz -fragA rb_01_17_185861_181891_ab_t000__robetta.200.9mers.index.gz -fragB rb_01_17_185861_181891_ab_t000__robetta.200.5mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1484534
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
------------------------ Begin developer's backtrace -------------------------
BACKTRACE:
------------------------- End developer's backtrace --------------------------


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>

Gees...really?!?!?
13) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104341)
Posted 2 days ago by Greg_BE
Post:
They make snapshots even if the app is stuck in a loop going nowhere.
I have seen 30+ snapshots with only 5 minutes of CPU time. Wasters.

By the way, what happened to the 700,000 workunits that vanished from the front page queue?
It's down to only 1.8 million.
Are they trying to find the buggy ones?
Interesting, now up to 2.2 million. I'll try grabbing some and see what happens.



Ignore that big fancy number on the front page.
That is what they have in queue for both the AI and RAH of which 99% are AI tasks.

Get to the next layer deep where it breaks down 4.2 and python.

This is the real number for us lowly PC crunchers:
Application | Unsent | In progress | Runtime of last 100 tasks in hours: average (min - max) | Users in last 24 hours
Rosetta | 0 | 61887 | 6.62 (0.28 - 51.23) | 2600
rosetta python projects | 4999 | 13547 | 4.59 (0.71 - 57.86) | 1059
14) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104327)
Posted 3 days ago by Greg_BE
Post:
And wow, you were in bed for under 8 hours.


yeah and I am paying for that.

I didn't think of looking in the VM for info.
If I get stuck next time, I'll have a look.
15) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104324)
Posted 3 days ago by Greg_BE
Post:
This is weird https://boinc.bakerlab.org/rosetta/result.php?resultid=1464088106
1.5 days processing for 20 minutes or so cpu time.

So here's the breakdown:

2022-01-18 10:33:11 (15556): Status Report: Elapsed Time: '15314.521130'
2022-01-18 10:33:11 (15556): Status Report: CPU Time: '29.109375'


2022-01-18 00:17:37 (15156): Creating new snapshot for VM.
2022-01-18 00:17:42 (15156): Deleting stale snapshot.
2022-01-18 00:17:43 (15156): Checkpoint completed.
2022-01-18 00:21:45 (15156): VM state change detected. (old = 'running', new = 'paused')
2022-01-18 00:22:01 (15156): Powering off VM.
2022-01-18 00:22:01 (15156): Successfully stopped VM

(end of my day, so I shut down: suspend, shut down client (leave in memory), exit BOINC)

Now I restart:
2022-01-18 08:12:38 (15556): VM state change detected. (old = 'poweredoff', new = 'running')
2022-01-18 08:12:38 (15556): Status Report: Elapsed Time: '9314.493395'
2022-01-18 08:12:38 (15556): Status Report: CPU Time: '18.328125'
2022-01-18 08:12:38 (15556): Preference change detected
2022-01-18 08:12:38 (15556): Setting CPU throttle for VM. (100%)
2022-01-18 08:12:38 (15556): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 180 seconds) or (Vbox_job.xml: 600 seconds))
2022-01-18 08:32:02 (15556): Creating new snapshot for VM.
2022-01-18 08:32:12 (15556): Deleting stale snapshot.

Then this point
2022-01-18 10:33:11 (15556): Status Report: Elapsed Time: '15314.521130'
2022-01-18 10:33:11 (15556): Status Report: CPU Time: '29.109375'

here is 6 hrs
2022-01-18 12:31:57 (15556): Status Report: Elapsed Time: '21314.549383'
2022-01-18 12:31:57 (15556): Status Report: CPU Time: '39.125000'
2022-01-18 12:37:42 (15556): Creating new snapshot for VM.
2022-01-18 12:37:43 (15556): Deleting stale snapshot.

2022-01-18 14:28:55 (15556): Status Report: Elapsed Time: '27314.711735'
2022-01-18 14:28:55 (15556): Status Report: CPU Time: '49.218750'

2022-01-18 16:09:27 (15556): Status Report: Elapsed Time: '33315.182032'
2022-01-18 16:09:27 (15556): Status Report: CPU Time: '59.093750'

2022-01-18 18:11:20 (15556): Status Report: Elapsed Time: '39315.521685'
2022-01-18 18:11:20 (15556): Status Report: CPU Time: '68.562500'

Something went nuts, but does not show up in the report:

2022-01-18 19:27:47 (15556): Checkpoint completed.
2022-01-18 19:33:12 (11508): Detected: vboxwrapper 26202
2022-01-18 19:33:12 (11508): Detected: BOINC client v7.16.20
2022-01-18 19:33:13 (11508): Detected: VirtualBox VboxManage Interface (Version: 6.1.30)
2022-01-18 19:33:13 (11508): Feature: Checkpoint interval offset (88 seconds)
2022-01-18 19:33:13 (11508): Detected: Minimum checkpoint interval (600.000000 seconds)
2022-01-18 19:33:13 (11508): Restore from previously saved snapshot.
2022-01-18 19:33:14 (11508): Restore completed.


2022-01-18 19:33:19 (11508): Status Report: Elapsed Time: '43879.012785'
2022-01-18 19:33:19 (11508): Status Report: CPU Time: '75.46875'


2022-01-18 21:13:48 (11508): Status Report: Elapsed Time: '49879.776962'
2022-01-18 21:13:48 (11508): Status Report: CPU Time: '86.453125'

2022-01-18 22:59:05 (11508): Status Report: Elapsed Time: '55880.065147'
2022-01-18 22:59:05 (11508): Status Report: CPU Time: '96.125000'

2022-01-19 00:02:14 (11508): VM state change detected. (old = 'running', new = 'paused')
2022-01-19 00:02:44 (11508): Powering off VM.
2022-01-19 00:02:44 (11508): Successfully stopped VM.

*End of day 1*


Start day 2

2022-01-19 07:58:26 (16032): VM state change detected. (old = 'poweredoff', new = 'running')
2022-01-19 07:58:26 (16032): Status Report: Elapsed Time: '58981.617149'
2022-01-19 07:58:26 (16032): Status Report: CPU Time: '100.656250'

2022-01-19 10:00:34 (16032): Status Report: Elapsed Time: '64981.857656'
2022-01-19 10:00:34 (16032): Status Report: CPU Time: '112.250000'

2022-01-19 11:46:01 (16032): Status Report: Elapsed Time: '70982.433000'
2022-01-19 11:46:01 (16032): Status Report: CPU Time: '122.140625'

2022-01-19 13:26:46 (16032): Status Report: Elapsed Time: '76982.663074'
2022-01-19 13:26:46 (16032): Status Report: CPU Time: '132.531250'

2022-01-19 15:11:43 (16032): Status Report: Elapsed Time: '82982.833196'
2022-01-19 15:11:43 (16032): Status Report: CPU Time: '142.390625'

2022-01-19 17:17:08 (16032): Status Report: Elapsed Time: '88982.986887'
2022-01-19 17:17:08 (16032): Status Report: CPU Time: '152.312500'

2022-01-19 19:05:11 (16032): Status Report: Elapsed Time: '94983.557718'
2022-01-19 19:05:11 (16032): Status Report: CPU Time: '161.968750'

This is where I take the time to look and see how things are going and say WTF! 2 days! Come on! ABORT
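For anyone who wants to check their own stderr output the same way, here is a rough Python sketch that pulls the last pair of "Status Report" lines out of a log and works out how much of the elapsed time was real CPU time. It assumes both figures are in seconds, which the log itself does not state. The function name `cpu_efficiency` and the regular expression are made up for illustration and are not part of vboxwrapper.

```python
import re

# A few lines copied from the log above (an early and the final Status Report pair).
log = """\
2022-01-18 08:12:38 (15556): Status Report: Elapsed Time: '9314.493395'
2022-01-18 08:12:38 (15556): Status Report: CPU Time: '18.328125'
2022-01-19 19:05:11 (16032): Status Report: Elapsed Time: '94983.557718'
2022-01-19 19:05:11 (16032): Status Report: CPU Time: '161.968750'
"""

PATTERN = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")

def cpu_efficiency(text):
    """Return (elapsed, cpu, cpu / elapsed) using the last reported values."""
    elapsed = cpu = None
    for kind, value in PATTERN.findall(text):
        if kind == "Elapsed Time":
            elapsed = float(value)
        else:
            cpu = float(value)
    if not elapsed or cpu is None:
        raise ValueError("no complete Status Report pair found")
    return elapsed, cpu, cpu / elapsed

elapsed, cpu, ratio = cpu_efficiency(log)
# Assumption: both values are seconds.
print(f"elapsed {elapsed / 3600:.1f} h, CPU {cpu / 60:.1f} min, efficiency {ratio:.2%}")
```

If the seconds assumption holds, the final report works out to well under 1% CPU efficiency, which fits a VM that is running but hardly doing any work.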
16) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104312)
Posted 4 days ago by Greg_BE
Post:
I had selected also Biology but I still would get only Milkyway and Asteroid. I have two GPU boards running Einstein, a GTX 1060 with 3 GB RAM and a GTX 1650 with 4 GB RAM. My latest PC has a AMD Ryzen 4500U which has graphic capabilities, but it is slower compared to the two GTX boards. There was a period when Gravitational Waves GPU tasks required more than 3 GB RAM and I had to buy the GTX 1650 board.
Tullio



I have been running Einstein with 1050 as my most powerful GPU back when I started, now I can use a 1080 I got a few years ago.
These work fine for what I want to do.
I've already put enough money into this system, no need to upgrade anything that still works.

I think I have about 500 sitting here next to me. New full case to be able to handle the expanded radiator set me back a good chunk.
New CPU hurt. Burned the other one.
1080 was 2nd hand from a graphics design company server, was cheap as far as GPUs go.
1050 has been with me forever.
New MOBO and new digital PSU from awhile back. I burned up one of those cheaper ones. Forgot who that was by.
It all adds up.
As much as I envy the Xeon guys, there is no way I will ever be able to afford that.
My dream is a Threadripper, but then that's a new MOBO again. Forget it.


I chose this project and Einstein because they are both based in my old home state of Washington (not DC).
LIGO is out at Hanford and that is about 80 minutes or so from my parents place.
This one, well I used to live in Seattle, so I read about this in the Seattle Times and joined up.
It's a shame to see them shove us off to the side, but I guess that's what happens when you get to be big time.
17) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104308)
Posted 4 days ago by Greg_BE
Post:
Here are their numbers: https://boinc.netsoft-online.com/e107_plugins/boinc/bp.php?project=6

Mostly going down
18) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104297)
Posted 5 days ago by Greg_BE
Post:
Welcome to the button pressers club , sigh . . . .

Interesting point in that so many big crunchers have stopped crunching for Rosetta that my Xeon system is now 9th in 'top computers'.
That's about 60 top crunchers that had big RAC gone.
The pythons that ate Rosetta
sounds like the title for a short story
or a logo for a T-shirt


I'm buried so deep I can't find my system. I'm not even in the top 200....so I'm out in the wasteland somewhere
19) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104284)
Posted 6 days ago by Greg_BE
Post:
And the "fun" continues.
6 plus hours for 25 mins cpu time. 4 hrs plus for completion.
CPU usage .41% or less
Just jettisoned 5 tasks.
20) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 104277)
Posted 6 days ago by Greg_BE
Post:
I can't believe the CEO doesn't care about 6 in 7 machines not doing his research. I'll be emailing the admin dude if I don't get a response soon.

I concluded some time ago that the great Dr. David Baker (who should get a Nobel Prize) turned over the management of Rosetta to others a long time ago.
And with the advent of AI, he can do more in-house anyway. As long as enough of the peripheral work is done by others, that is good enough for him.

He has more valuable uses for his time. And we have other projects if we want them.



EXACTLY

The minute he stopped writing blogs here about the research, I knew he had moved on.
When the grad students stopped writing here and the mod left I knew this was going to be a side project
Then with all the problems, DEK doesn't respond to anything and we have to figure it out on our own.
We have a spammer in the new section, they don't care.
We have problems, they don't care.
This is just like TACC, it's a side project now.
Out of those 2 million tasks, just 5,000 are assigned here.

So why be surprised? Us old guys are used to this. We have seen the decline over the years.
If you got one machine that can run Python keep that running and put your other machines to work on something that does work. I would have given up long ago and moved on.





©2022 University of Washington
https://www.bakerlab.org