Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104319 - Posted: 19 Jan 2022, 17:29:11 UTC - in response to Message 104316.  
Last modified: 19 Jan 2022, 17:30:11 UTC

Rosetta 4.20 here again. I am running 5 of them and 1 rosetta python.
Tullio
They must come in regular small bursts, because there's always a fair amount running according to server status. I've only got 5 pythons on the only computer that will run them. For some reason it refuses to run 6 (it has 6 cores), even if there's loads of RAM left.
ID: 104319 · Rating: 0
tullio

Joined: 10 May 20
Posts: 63
Credit: 630,125
RAC: 0
Message 104320 - Posted: 19 Jan 2022, 17:45:11 UTC

I can run 2 rosetta pythons at most on my 12 GB RAM. If there is a third it will be waiting for memory. My Intel i5 9400F has 3 cores that is six processors.
Tullio
ID: 104320 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104322 - Posted: 19 Jan 2022, 18:03:26 UTC - in response to Message 104320.  

I can run 2 rosetta pythons at most on my 12 GB RAM. If there is a third it will be waiting for memory. My Intel i5 9400F has 3 cores that is six processors.
Tullio
You could be right, they probably ask for more RAM than they actually use, just in case. I forgot that machine with 6 cores only had 16GB. I stole some of it to put in my new Ryzen. Must have 64GB on my gaming machine! It's currently running 5 pythons using 11.5/16GB. But it doesn't say "waiting for memory" like I've seen before. It just doesn't start them. And I'm sure I've seen it under half utilizing the memory and not starting one.
ID: 104322 · Rating: 0
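A rough sketch of the "they ask for more RAM than they actually use" idea from the post above. It assumes a task is only started if a fixed per-task reservation still fits in the RAM the client may use. The function `tasks_that_fit`, the 3.0 GB reservation, and the 100% usable fraction are hypothetical illustration values, not figures from the project or from the BOINC client. Only the 16 GB total and the 11.5 GB observed for 5 pythons come from the post.

```python
def tasks_that_fit(total_ram_gb, usable_fraction, reserved_per_task_gb):
    """Count tasks that fit if each one reserves a fixed amount of RAM.

    Simplified model (an assumption), not the BOINC client's real scheduler.
    """
    usable_gb = total_ram_gb * usable_fraction
    return int(usable_gb // reserved_per_task_gb)


observed_per_task_gb = 11.5 / 5     # about 2.3 GB each, from the post above
hypothetical_reservation_gb = 3.0   # assumed value, for illustration only

# Judged by observed usage, a sixth task would still fit in 16 GB...
print(tasks_that_fit(16, 1.0, observed_per_task_gb))
# ...but with a larger up-front reservation, fewer tasks are admitted.
print(tasks_that_fit(16, 1.0, hypothetical_reservation_gb))
```

If something like this is going on, the gap between reserved and used memory would explain a task not starting while plenty of RAM looks free. It would not explain the missing "waiting for memory" status.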
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104324 - Posted: 19 Jan 2022, 19:55:26 UTC

This is weird https://boinc.bakerlab.org/rosetta/result.php?resultid=1464088106
1.5 days processing for 20 minutes or so cpu time.

So here's the breakdown:

2022-01-18 10:33:11 (15556): Status Report: Elapsed Time: '15314.521130'
2022-01-18 10:33:11 (15556): Status Report: CPU Time: '29.109375'


2022-01-18 00:17:37 (15156): Creating new snapshot for VM.
2022-01-18 00:17:42 (15156): Deleting stale snapshot.
2022-01-18 00:17:43 (15156): Checkpoint completed.
2022-01-18 00:21:45 (15156): VM state change detected. (old = 'running', new = 'paused')
2022-01-18 00:22:01 (15156): Powering off VM.
2022-01-18 00:22:01 (15156): Successfully stopped VM

(End of my day, so I shut down via suspend, shut down client (leave in memory), exit BOINC.)

Now I restart:
2022-01-18 08:12:38 (15556): VM state change detected. (old = 'poweredoff', new = 'running')
2022-01-18 08:12:38 (15556): Status Report: Elapsed Time: '9314.493395'
2022-01-18 08:12:38 (15556): Status Report: CPU Time: '18.328125'
2022-01-18 08:12:38 (15556): Preference change detected
2022-01-18 08:12:38 (15556): Setting CPU throttle for VM. (100%)
2022-01-18 08:12:38 (15556): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 180 seconds) or (Vbox_job.xml: 600 seconds))
2022-01-18 08:32:02 (15556): Creating new snapshot for VM.
2022-01-18 08:32:12 (15556): Deleting stale snapshot.

Then this point
2022-01-18 10:33:11 (15556): Status Report: Elapsed Time: '15314.521130'
2022-01-18 10:33:11 (15556): Status Report: CPU Time: '29.109375'

here is 6 hrs
2022-01-18 12:31:57 (15556): Status Report: Elapsed Time: '21314.549383'
2022-01-18 12:31:57 (15556): Status Report: CPU Time: '39.125000'
2022-01-18 12:37:42 (15556): Creating new snapshot for VM.
2022-01-18 12:37:43 (15556): Deleting stale snapshot.

2022-01-18 14:28:55 (15556): Status Report: Elapsed Time: '27314.711735'
2022-01-18 14:28:55 (15556): Status Report: CPU Time: '49.218750'

2022-01-18 16:09:27 (15556): Status Report: Elapsed Time: '33315.182032'
2022-01-18 16:09:27 (15556): Status Report: CPU Time: '59.093750'

2022-01-18 18:11:20 (15556): Status Report: Elapsed Time: '39315.521685'
2022-01-18 18:11:20 (15556): Status Report: CPU Time: '68.562500'

Something went nuts, but does not show up in the report:

2022-01-18 19:27:47 (15556): Checkpoint completed.
2022-01-18 19:33:12 (11508): Detected: vboxwrapper 26202
2022-01-18 19:33:12 (11508): Detected: BOINC client v7.16.20
2022-01-18 19:33:13 (11508): Detected: VirtualBox VboxManage Interface (Version: 6.1.30)
2022-01-18 19:33:13 (11508): Feature: Checkpoint interval offset (88 seconds)
2022-01-18 19:33:13 (11508): Detected: Minimum checkpoint interval (600.000000 seconds)
2022-01-18 19:33:13 (11508): Restore from previously saved snapshot.
2022-01-18 19:33:14 (11508): Restore completed.


2022-01-18 19:33:19 (11508): Status Report: Elapsed Time: '43879.012785'
2022-01-18 19:33:19 (11508): Status Report: CPU Time: '75.46875'


2022-01-18 21:13:48 (11508): Status Report: Elapsed Time: '49879.776962'
2022-01-18 21:13:48 (11508): Status Report: CPU Time: '86.453125'

2022-01-18 22:59:05 (11508): Status Report: Elapsed Time: '55880.065147'
2022-01-18 22:59:05 (11508): Status Report: CPU Time: '96.125000'

2022-01-19 00:02:14 (11508): VM state change detected. (old = 'running', new = 'paused')
2022-01-19 00:02:44 (11508): Powering off VM.
2022-01-19 00:02:44 (11508): Successfully stopped VM.

*End of day 1*


Start day 2

2022-01-19 07:58:26 (16032): VM state change detected. (old = 'poweredoff', new = 'running')
2022-01-19 07:58:26 (16032): Status Report: Elapsed Time: '58981.617149'
2022-01-19 07:58:26 (16032): Status Report: CPU Time: '100.656250'

2022-01-19 10:00:34 (16032): Status Report: Elapsed Time: '64981.857656'
2022-01-19 10:00:34 (16032): Status Report: CPU Time: '112.250000'

2022-01-19 11:46:01 (16032): Status Report: Elapsed Time: '70982.433000'
2022-01-19 11:46:01 (16032): Status Report: CPU Time: '122.140625'

2022-01-19 13:26:46 (16032): Status Report: Elapsed Time: '76982.663074'
2022-01-19 13:26:46 (16032): Status Report: CPU Time: '132.531250'

2022-01-19 15:11:43 (16032): Status Report: Elapsed Time: '82982.833196'
2022-01-19 15:11:43 (16032): Status Report: CPU Time: '142.390625'

2022-01-19 17:17:08 (16032): Status Report: Elapsed Time: '88982.986887'
2022-01-19 17:17:08 (16032): Status Report: CPU Time: '152.312500'

2022-01-19 19:05:11 (16032): Status Report: Elapsed Time: '94983.557718'
2022-01-19 19:05:11 (16032): Status Report: CPU Time: '161.968750'

This is where I take the time to look and see how things are going and say WTF! 2 days! Come on! ABORT
ID: 104324 · Rating: 0
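The "Status Report" pairs in the log above can be turned into a CPU-to-elapsed ratio with a few lines of Python. This is only a sketch: it assumes both values are in seconds and that each "Elapsed Time" line is followed by its matching "CPU Time" line, as in the excerpts. The names `STATUS_RE` and `cpu_efficiency` are made up for the example, and the sample lines are copied from the post.

```python
import re

# Matches vboxwrapper stderr lines such as:
# 2022-01-19 19:05:11 (16032): Status Report: CPU Time: '161.968750'
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)")


def cpu_efficiency(log_text):
    """Return (elapsed, cpu, cpu / elapsed) for each Elapsed/CPU pair."""
    samples = []
    elapsed = None
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if not match:
            continue
        kind, value = match.group(1), float(match.group(2))
        if kind == "Elapsed Time":
            elapsed = value
        elif elapsed:  # CPU Time line with a preceding, non-zero elapsed time
            samples.append((elapsed, value, value / elapsed))
            elapsed = None
    return samples


LOG = """\
2022-01-19 17:17:08 (16032): Status Report: Elapsed Time: '88982.986887'
2022-01-19 17:17:08 (16032): Status Report: CPU Time: '152.312500'
2022-01-19 19:05:11 (16032): Status Report: Elapsed Time: '94983.557718'
2022-01-19 19:05:11 (16032): Status Report: CPU Time: '161.968750'
"""

for elapsed, cpu, ratio in cpu_efficiency(LOG):
    print(f"elapsed {elapsed / 3600:.1f} h, cpu {cpu:.0f} s, ratio {ratio:.2%}")
```

A healthy task should show a ratio close to 100% per core. On these lines it comes out well under 1%, which matches the "going nowhere" impression.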
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104325 - Posted: 19 Jan 2022, 20:39:54 UTC - in response to Message 104324.  

Welcome to the club. ALL tasks for 6 of my machines do that. 1 in 50 tasks for my "good" machine do that. Whatever the bug is, it can be visible sometimes on some hardware and always on other hardware. I think we can't see enough information unless we're inside the VM. Is that possible?

And wow, you were in bed for under 8 hours.
ID: 104325 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104327 - Posted: 19 Jan 2022, 23:08:23 UTC - in response to Message 104325.  

And wow, you were in bed for under 8 hours.


Yeah, and I am paying for that.

I didn't think of looking in the VM for info.
If I get stuck next time, I'll have a look.
ID: 104327 · Rating: 0
.clair.

Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 104328 - Posted: 20 Jan 2022, 1:30:30 UTC

They make snapshots even if the app is stuck in a loop going nowhere.
I have seen 30+ snapshots with only 5 minutes of CPU time. Time wasters.

By the way, what happened to the 700,000 workunits that vanished from the front page queue?
It's down to only 1.8 million.
Are they trying to find the buggy ones?
ID: 104328 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2132
Credit: 41,484,592
RAC: 17,238
Message 104329 - Posted: 20 Jan 2022, 3:11:44 UTC - in response to Message 104291.  

Just checking in because I had a fair few Rosetta 4.20 tasks come down.
But I think they already ran out...

I'm useful like that
YOU!! You stole them! I wanted those. I'm going to hunt you down, and I mean physically!

I actually did. Full buffer on both machines I have near me before mentioning it.
No need to thank me.
I'll make tea - do you take sugar?
ID: 104329 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104335 - Posted: 20 Jan 2022, 17:18:23 UTC - in response to Message 104329.  

Just checking in because I had a fair few Rosetta 4.20 tasks come down.
But I think they already ran out...

I'm useful like that
YOU!! You stole them! I wanted those. I'm going to hunt you down, and I mean physically!

I actually did. Full buffer on both machines I have near me before mentioning it.
No need to thank me.
I'll make tea - do you take sugar?
I don't like hot drinks. Orange juice or vodka please, or both.
ID: 104335 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104336 - Posted: 20 Jan 2022, 17:18:52 UTC - in response to Message 104328.  

They make snapshots even if the app is stuck in a loop going nowhere.
I have seen 30+ snapshots with only 5 minutes of CPU time. Time wasters.

By the way, what happened to the 700,000 workunits that vanished from the front page queue?
It's down to only 1.8 million.
Are they trying to find the buggy ones?
Interesting, now up to 2.2 million. I'll try grabbing some and see what happens.
ID: 104336 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104339 - Posted: 20 Jan 2022, 17:46:00 UTC - in response to Message 104336.  

Well that didn't work, I tried 3 machines. Two of them failed pythons (no CPU time) and the other took four 4.20 tasks and got a computation error!
ID: 104339 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104341 - Posted: 20 Jan 2022, 19:05:46 UTC - in response to Message 104336.  
Last modified: 20 Jan 2022, 19:18:05 UTC

They make snapshots even if the app is stuck in a loop going nowhere.
I have seen 30+ snapshots with only 5 minutes of CPU time. Time wasters.

By the way, what happened to the 700,000 workunits that vanished from the front page queue?
It's down to only 1.8 million.
Are they trying to find the buggy ones?
Interesting, now up to 2.2 million. I'll try grabbing some and see what happens.



Ignore that big fancy number on the front page.
That is what they have in queue for both the AI and RAH of which 99% are AI tasks.

Get to the next layer deep where it breaks down 4.2 and python.

This is the real number for us lowly PC crunchers:
Application | Unsent | In progress | Runtime of last 100 tasks in hours: average (min - max) | Users in last 24 hours
Rosetta | 0 | 61887 | 6.62 (0.28 - 51.23) | 2600
rosetta python projects | 4999 | 13547 | 4.59 (0.71 - 57.86) | 1059
ID: 104341 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104342 - Posted: 20 Jan 2022, 19:06:28 UTC

Check this out from a 4.2 task today

<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_01_17_185861_181891_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 1 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_01_17_185861_181891_ab_t000__robetta.zip -frag3 rb_01_17_185861_181891_ab_t000__robetta.200.3mers.index.gz -fragA rb_01_17_185861_181891_ab_t000__robetta.200.9mers.index.gz -fragB rb_01_17_185861_181891_ab_t000__robetta.200.5mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1484534
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
------------------------ Begin developer's backtrace -------------------------
BACKTRACE:
------------------------- End developer's backtrace --------------------------


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>

Gees...really?!?!?
ID: 104342 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104343 - Posted: 20 Jan 2022, 19:33:14 UTC - in response to Message 104341.  
Last modified: 20 Jan 2022, 19:33:48 UTC

Ignore that big fancy number on the front page.
That is what they have in queue for both the AI and RAH of which 99% are AI tasks.
But the point is that was spotted dropping suddenly, so they must have removed some, presumably due to problems. I wish other projects had that number. All we get to see is the little front end buffer on most projects.
ID: 104343 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104344 - Posted: 20 Jan 2022, 19:35:13 UTC - in response to Message 104342.  

Check this out from a 4.2 task today

File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)

Gees...really?!?!?
Looks rather like something tried to use the 4th dimension. Does your processor not support that function?
ID: 104344 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104345 - Posted: 20 Jan 2022, 19:48:38 UTC - in response to Message 104344.  

Check this out from a 4.2 task today

File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)

Gees...really?!?!?
Looks rather like something tried to use the 4th dimension. Does your processor not support that function?

I have no graphics on my CPU.
Besides, the program is supposed to take care of any graphics or whatever.
ID: 104345 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104346 - Posted: 20 Jan 2022, 19:51:40 UTC - in response to Message 104343.  

Ignore that big fancy number on the front page.
That is what they have in queue for both the AI and RAH of which 99% are AI tasks.
But the point is that was spotted dropping suddenly, so they must have removed some, presumably due to problems. I wish other projects had that number. All we get to see is the little front end buffer on most projects.



But again, that number is mute to this aspect of the project.
It has no bearing on what we do.

Just watch the numbers I quoted. That is all you need to be concerned about. Because that is the work WE get, not the machine.
It just looks cool to say..oh we have 2 million tasks queued up, but when you dig deeper on Robetta, then you see, AI, AI, AI,AI.....Rosetta,AI,AI,AI,AI maybe a Rosetta.
ID: 104346 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104347 - Posted: 20 Jan 2022, 19:55:34 UTC - in response to Message 104345.  

I have no graphics on my CPU.
Besides, the program is supposed to take care of any graphics or whatever.
I was trying to make a joke. It said the angle wasn't within the normal 360 degrees.
ID: 104347 · Rating: 0
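On the error itself: `-nan(ind)` is not an angle outside the range. As far as I know it is how Microsoft's C runtime prints an indeterminate NaN ("not a number"), and NaN fails every comparison, so any range check rejects it. A minimal Python sketch of that behaviour follows. `check_chi` is a hypothetical stand-in modelled on the error message, not the Rosetta code.

```python
import math


def check_chi(angle):
    """Hypothetical range check modelled on the error message text."""
    if not (-180.0 <= angle <= 180.0):
        raise ValueError(f"chi angle must be between -180 and 180: {angle}")
    return angle


nan = float("nan")
print(nan < -180.0, nan > 180.0)  # NaN is neither below nor above the range
print(-180.0 <= nan <= 180.0)     # yet the "in range" test fails as well
print(math.isnan(nan))            # an explicit NaN test is the reliable check

try:
    check_chi(nan)
except ValueError as err:
    print(err)
```

So the message most likely means an earlier calculation produced an undefined result, and the range check was simply the first place that noticed.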
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,097,379
RAC: 17,254
Message 104348 - Posted: 20 Jan 2022, 19:57:44 UTC - in response to Message 104346.  

Ignore that big fancy number on the front page.
That is what they have in queue for both the AI and RAH of which 99% are AI tasks.
But the point is that was spotted dropping suddenly, so they must have removed some, presumably due to problems. I wish other projects had that number. All we get to see is the little front end buffer on most projects.



But again, that number is mute to this aspect of the project.
Actually the word is moot, not that I ever use that strange sounding word. I also have an aversion to the word dupe.

It has no bearing on what we do.

Just watch the numbers I quoted. That is all you need to be concerned about. Because that is the work WE get, not the machine.
It just looks cool to say..oh we have 2 million tasks queued up, but when you dig deeper on Robetta, then you see, AI, AI, AI,AI.....Rosetta,AI,AI,AI,AI maybe a Rosetta.
Admittedly it doesn't tell you which of the two apps it is, but it's more meaningful than the tiny number, which is just the first bunch in their RAM buffer or whatever. The main number is what we always used to look at to see how much work was left. You could see there was a month's supply etc.
ID: 104348 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104351 - Posted: 20 Jan 2022, 23:33:40 UTC

Ah ha! Well sorry for the misspelling. Quick typing and no thinking after work.
4th dimension does exist...but anyway...again after work..not thinking. Missed the pun.

Now...if you go here: https://robetta.bakerlab.org/queue.php?id=&target=&username=&seq=&page=2 and look at the active tasks at random, you will see that the majority are queued for RoseTTAFold which is the AI.

You can read about it here:
https://www.ipd.uw.edu/2021/07/rosettafold-accurate-protein-structure-prediction-accessible-to-all/

Now bed...
ID: 104351 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org