Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98129 - Posted: 16 Jul 2020, 17:29:21 UTC - in response to Message 98126.  

cheapest per FLOP over the next few years.
There's a fairly thorough recent discussion on that topic in the thread "The most efficient cruncher rig possible".
ID: 98129 · Rating: 0
Mr P Hucker

Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98132 - Posted: 16 Jul 2020, 18:58:52 UTC - in response to Message 98129.  

cheapest per FLOP over the next few years.
There's a fairly thorough recent discussion on that topic in the thread "The most efficient cruncher rig possible".


I use a spreadsheet and insert all the CPUs and GPUs available on eBay. Well, not all, just the 30 most common ones. You can enter the cost of the CPU, motherboard, RAM, electricity, etc.
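A minimal sketch of that kind of spreadsheet calculation, assuming you rank builds by total cost of ownership (hardware plus electricity over a chosen number of years) divided by throughput. The `Build` fields, prices, wattages and throughput figures below are hypothetical placeholders; substitute your own eBay prices and benchmark numbers.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365

@dataclass
class Build:
    name: str
    hardware_cost: float  # CPU + motherboard + RAM + PSU etc. (hypothetical figures)
    watts: float          # average wall power while crunching
    throughput: float     # any consistent speed measure, e.g. GFLOPS or expected RAC

def cost_per_throughput(b: Build, years: float, price_per_kwh: float) -> float:
    """Hardware cost plus electricity over `years`, divided by throughput."""
    energy_kwh = b.watts / 1000 * HOURS_PER_YEAR * years
    total_cost = b.hardware_cost + energy_kwh * price_per_kwh
    return total_cost / b.throughput

# Made-up example builds, purely for illustration.
builds = [
    Build("used Xeon workstation", hardware_cost=300, watts=250, throughput=100),
    Build("new desktop", hardware_cost=800, watts=120, throughput=120),
]

# Cheapest per unit of throughput first, over 3 years at 0.15 per kWh.
for b in sorted(builds, key=lambda b: cost_per_throughput(b, years=3, price_per_kwh=0.15)):
    print(f"{b.name}: {cost_per_throughput(b, 3, 0.15):.2f} per unit of throughput")
```

Spreading the electricity cost over the expected service life is what lets a cheap but power-hungry used machine lose to a newer, more efficient one, as it does with these made-up numbers.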
ID: 98132 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2145
Credit: 41,560,787
RAC: 9,320
Message 98151 - Posted: 17 Jul 2020, 1:20:43 UTC
Last modified: 17 Jul 2020, 1:35:15 UTC

Not a problem or technical issue with the website, but a little information:

After the task outage last month, I guess people re-prioritised other projects, understandably.
Since tasks became available again, the number of tasks returned by hosts, which had dropped from 750k/day to 200k/day, is now back up to 500k/day.
The current number of tasks queued waiting to run is now up at 17 million - the highest number I can ever recall seeing. 6-7 million was usual.
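To put those figures in perspective, a rough back-of-envelope estimate of how long the queue would last. It assumes the return rate stays constant and nothing new is added to the queue (in practice new work keeps being generated), and pairing the usual 6-7 million backlog with the old 750k/day rate is likewise an assumption made only for comparison.

```python
def days_to_drain(queued_tasks: float, returned_per_day: float) -> float:
    """Rough backlog estimate: constant return rate, no new tasks added."""
    return queued_tasks / returned_per_day

usual = days_to_drain(6.5e6, 750e3)  # mid-point of "6-7 million" at the old 750k/day
now = days_to_drain(17e6, 500e3)     # figures from the post above
print(f"usual backlog: {usual:.1f} days, current backlog: {now:.1f} days")
```

Roughly 34 days of queued work versus under 9 days usually is why extra hosts are so welcome right now.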

So if you've switched some hosts away while we were out of work, now is exactly the time to bring them back.

Pass it on.

Now that I'm back at work after lockdown (since a month ago), I've added back 2 PCs, and I discovered that Android Rosetta tasks are working on my phone again (v7.4.53).

And, for what little it's worth, I've added ~2.2% to the overclock on my 2 main PCs over the last couple of days. Every little helps.
ID: 98151 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 98152 - Posted: 17 Jul 2020, 4:00:59 UTC - in response to Message 98132.  

cheapest per FLOP over the next few years.
There's a fairly thorough recent discussion on that topic in the thread "The most efficient cruncher rig possible".


I use a spreadsheet and insert all the CPUs and GPUs available on eBay. Well, not all, just the 30 most common ones. You can enter the cost of the CPU, motherboard, RAM, electricity, etc.

Any discussion of how much faster memory helps? About all I've been able to find so far is that, at least for Rosetta@home, it does help. If it makes a difference, my computer uses DDR4 memory.
ID: 98152 · Rating: 0
Rayalot72

Joined: 27 Mar 20
Posts: 2
Credit: 259,104
RAC: 486
Message 98156 - Posted: 17 Jul 2020, 12:32:03 UTC
Last modified: 17 Jul 2020, 12:34:46 UTC

I'm having some sort of issue where multiple instances of rosetta_4.20_windows_x86_64.exe are running, seemingly independently of BOINC. Closing BOINC, suspending the project, etc. doesn't get rid of these processes. They're also taking up a massive chunk of CPU, despite my having set BOINC to use 30% CPU time. What are these?

ID: 98156 · Rating: 0
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2002
Credit: 9,790,281
RAC: 3,640
Message 98157 - Posted: 17 Jul 2020, 12:47:42 UTC - in response to Message 98152.  

Any discussion of how much faster memory helps? About all I've been able to find so far is that, at least for Rosetta@home, it does help.

Any discussion of how much a faster drive helps? I see a great difference in RAC between my "old" SATA drive and my new SSD (with the same memory and CPU).
ID: 98157 · Rating: 0
Keith Myers

Joined: 29 Mar 20
Posts: 97
Credit: 332,619
RAC: 15
Message 98159 - Posted: 17 Jul 2020, 13:38:58 UTC

I seem to be getting errors on 50% of my tasks. I'm only running one task at a time. No other projects have any appreciable errors.
The tasks fail with 139 (0x0000008B) Unknown error code, and there is nothing else of much use in the output except "process got signal 11".

<core_client_version>7.17.0</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu -beta -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip fp200714_fbfb_pair46_X_4_f_e0_239_X_0001_0001_rlx_fragments_fold_data.zip -abinitio::rg_reweight 0.5 -out:file:silent default.out -silent_gz -mute all -in:file:native 00001.pdb -out:file:silent_struct_type binary -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2057750
Using database: database_357d5d93529_n_methyl/minirosetta_database

</stder

None of them have been the scaffold tasks recently talked about in this thread. Does anybody have any ideas?
ID: 98159 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 98160 - Posted: 17 Jul 2020, 14:50:46 UTC - in response to Message 98156.  
Last modified: 17 Jul 2020, 15:18:10 UTC

I'm having some sort of issue where multiple instances of rosetta_4.20_windows_x86_64.exe are running, seemingly independently of BOINC. Closing BOINC, suspending the project, etc. doesn't get rid of these processes. They're also taking up a massive chunk of CPU, despite my having set BOINC to use 30% CPU time. What are these?

rosetta_4.20_windows_x86_64.exe is the program that actually does the work for the current Rosetta@home tasks. They NEED a lot of CPU to do their work. A separate copy is needed for each of the Rosetta@home tasks currently running on your computer.

As for the 30% CPU time setting, I've thought of two possibilities:

1. This program ignores that setting.

2. The timing for that setting is handled inside the program, which does not release the CPU.

Does anyone else here know which, if either, of these possibilities is correct?
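One common way to enforce a CPU-time cap from outside a program is duty cycling: a supervisor repeatedly resumes the task for a slice of each period and suspends it for the rest. The sketch below only simulates such a hypothetical throttler (the function names and the 1-second period are illustrative assumptions, not BOINC's code). It shows that a 30% cap averages out to 30% over time while the task still runs at full speed during each running slice, so a monitor that samples at the wrong moment can see a process using a whole core.

```python
def throttle_timeline(cpu_fraction: float, period_s: float, periods: int):
    """Simulate a hypothetical duty-cycle throttler.

    In every period the task is resumed for cpu_fraction * period_s seconds,
    then suspended for the remainder. Returns (start, end, state) intervals.
    """
    if not 0 < cpu_fraction <= 1:
        raise ValueError("cpu_fraction must be in (0, 1]")
    if periods < 1:
        raise ValueError("periods must be at least 1")
    run = cpu_fraction * period_s
    timeline = []
    for i in range(periods):
        t0 = i * period_s
        timeline.append((t0, t0 + run, "running"))
        if run < period_s:
            timeline.append((t0 + run, t0 + period_s, "suspended"))
    return timeline

def cpu_share(timeline) -> float:
    """Fraction of wall-clock time spent in 'running' intervals."""
    busy = sum(end - start for start, end, state in timeline if state == "running")
    total = timeline[-1][1] - timeline[0][0]
    return busy / total

timeline = throttle_timeline(0.30, period_s=1.0, periods=60)
print(timeline[:2])
print(f"average CPU share: {cpu_share(timeline):.2f}")
```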
ID: 98160 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 98161 - Posted: 17 Jul 2020, 15:10:18 UTC - in response to Message 98159.  
Last modified: 17 Jul 2020, 15:15:10 UTC

I seem to be getting errors on 50% of my tasks. I'm only running one task at a time. No other projects have any appreciable errors.
The tasks fail with 139 (0x0000008B) Unknown error code, and there is nothing else of much use in the output except "process got signal 11".

<core_client_version>7.17.0</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>

[snip]

None of them have been the scaffold tasks recently talked about in this thread. Does anybody have any ideas?

Signal 11 under Linux is also known as a segmentation fault (SIGSEGV).

It means that the program tried to access some memory location that the program should not have had access to.

This is often because the program wrote something other than a memory address to a location that was supposed to contain the memory address of something it should have been able to reach.

Another cause is that the program used something as the address of something it could reach without ever setting that supposed address, and therefore used whatever value happened to be there already as a memory address.
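The "139 (0x0000008B)" code in these reports appears to follow the common Unix shell convention that a process killed by signal N is reported with exit status 128 + N, so 139 - 128 = 11, which is SIGSEGV on Linux. A small helper to decode such codes, as a sketch; signal numbers differ between platforms, so the lookup uses whatever the local `signal` module defines.

```python
import signal

def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status.

    By convention, a process killed by signal N is reported as 128 + N.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"

print(hex(139), describe_exit_code(139))  # 0x8b SIGSEGV (on Linux)
```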

This is not something you can fix - it needs to be done by someone with more knowledge of the internals of the program.

Do we have a moderator here who can tell a developer to try that workunit with more debugging enabled, in order to see more about what went wrong?

You might be able to help the developer by posting a pointer to at least one of these failed tasks.
ID: 98161 · Rating: 0
Keith Myers

Joined: 29 Mar 20
Posts: 97
Credit: 332,619
RAC: 15
Message 98162 - Posted: 17 Jul 2020, 15:35:47 UTC
Last modified: 17 Jul 2020, 15:47:56 UTC

Well, I assume the first set of errors was caused by changing my runtime preference for tasks midstream while they were running. It seems you shouldn't change from 4 hours to 8 hours and back to 4 hours.

But all the recent errors were on tasks downloaded with the 4 hour runtime from the start.

https://boinc.bakerlab.org/rosetta/result.php?resultid=1222334855
https://boinc.bakerlab.org/rosetta/result.php?resultid=1222430421
https://boinc.bakerlab.org/rosetta/result.php?resultid=1222469090
https://boinc.bakerlab.org/rosetta/result.php?resultid=1222561787
https://boinc.bakerlab.org/rosetta/result.php?resultid=1222594710

This task has an invalid pointer error. https://boinc.bakerlab.org/rosetta/result.php?resultid=1221693374
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 ***
Same with this task. https://boinc.bakerlab.org/rosetta/result.php?resultid=1222039913

[Edit] From some Googling, it seems the program tried to release (free) memory that had not been properly allocated.
ID: 98162 · Rating: 0
Rayalot72

Joined: 27 Mar 20
Posts: 2
Credit: 259,104
RAC: 486
Message 98163 - Posted: 17 Jul 2020, 15:40:36 UTC - in response to Message 98160.  

It's whatever, I've fixed it. I suspended tasks in BOINC, killed every Rosetta process I could find, and resumed Rosetta.

I've had BOINC for quite a while now, it's not going to just suddenly ignore my CPU settings. It looks more to me like BOINC had some rogue tasks for unknown reasons.
ID: 98163 · Rating: 0
Mr P Hucker

Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98169 - Posted: 17 Jul 2020, 20:12:41 UTC - in response to Message 98151.  
Last modified: 17 Jul 2020, 20:13:26 UTC

After the task outage last month, I guess people re-prioritised other projects, understandably.
Since tasks became available again, the number of tasks returned by hosts, which had dropped from 750k/day to 200k/day, is now back up to 500k/day.
The current number of tasks queued waiting to run is now up at 17 million - the highest number I can ever recall seeing. 6-7 million was usual.
So if you've switched some hosts away while we were out of work, now is exactly the time to bring them back.


I really don't understand why people do that. I have all my computers set to run at least two projects. If one goes wrong or runs out of work, the computer will run only the other one, with no intervention from me. When it's fixed, it'll go back to running them in the proportion I've set (and in fact it tries to make up lost ground by doing more of the one that was broken for a while). You could even have Rosetta at weight 1,000,000 and another project at 1.
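A simplified model of that fallback, assuming CPU time is split in proportion to resource share among only those projects that currently have work. BOINC's real scheduler also tracks recent usage and catches up on a project that was starved (the "make up lost ground" behaviour described above), which this sketch ignores; the project names are placeholders.

```python
from typing import Dict

def cpu_fractions(shares: Dict[str, float], has_work: Dict[str, bool]) -> Dict[str, float]:
    """Split CPU time in proportion to resource share, counting only
    projects that currently have work. Projects without work get 0."""
    active = {p: s for p, s in shares.items() if has_work.get(p, False)}
    total = sum(active.values())
    # If no project has work, nothing is in `active`, so no division happens.
    return {p: (active[p] / total if p in active else 0.0) for p in shares}

shares = {"Rosetta": 1_000_000, "Other": 1}
print(cpu_fractions(shares, {"Rosetta": True, "Other": True}))   # Rosetta gets almost everything
print(cpu_fractions(shares, {"Rosetta": False, "Other": True}))  # Other takes over completely
```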

I'm having some sort of issue where multiple instances of rosetta_4.20_windows_x86_64.exe are running, seemingly independently of BOINC. Closing BOINC, suspending the project, etc. doesn't get rid of these processes. They're also taking up a massive chunk of CPU, despite my having set BOINC to use 30% CPU time. What are these?



Setting 30% of cores works better. Then you run fewer tasks at full speed instead of lots of tasks more slowly.
ID: 98169 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 98172 - Posted: 17 Jul 2020, 20:26:28 UTC - in response to Message 98169.  

Setting 30% of cores works better. Then you run fewer tasks at full speed instead of lots of tasks more slowly.

With hyper threading, each full core is divided into two "virtual" cores, each with its own instruction stream, referred to as a "thread".
That allows the hardware to be used more efficiently, so that it is idle less of the time, but each of the two threads runs more slowly than if only one were used per core.

Typically, you get about 30% greater output using 100% of the cores than when using only 50%, even though each work unit runs faster in the latter case.
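The arithmetic behind that trade-off, as a sketch that takes the post's rule-of-thumb 30% gain as an assumption (the real gain depends on the CPU and the workload) and uses a hypothetical 6-core, 12-thread chip. Each of the 12 tasks runs at only about 0.65 of its solo speed, yet total output is 7.8 task-equivalents instead of 6.

```python
def hyperthreading_estimate(physical_cores: int, ht_gain: float = 0.30):
    """Compare one task per physical core with one task per logical CPU
    on a 2-way hyperthreaded chip. Speeds are relative to one task
    running alone on one core (= 1.0); ht_gain is an assumed figure."""
    one_per_core_total = physical_cores * 1.0
    all_threads_total = physical_cores * (1.0 + ht_gain)
    per_task_speed = all_threads_total / (2 * physical_cores)
    return one_per_core_total, all_threads_total, per_task_speed

half, full, per_task = hyperthreading_estimate(6)
print(f"6 tasks: total {half:.2f}; 12 tasks: total {full:.2f}, each at {per_task:.2f}")
```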
ID: 98172 · Rating: 0
Mr P Hucker

Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98174 - Posted: 17 Jul 2020, 20:47:17 UTC - in response to Message 98172.  

With hyper threading, each full core is divided into two "virtual" cores, each with its own instruction stream, referred to as a "thread".
That allows the hardware to be used more efficiently, so that it is idle less of the time, but each of the two threads runs more slowly than if only one were used per core.

Typically, you get about 30% greater output using 100% of the cores than when using only 50%, even though each work unit runs faster in the latter case.


True, but I don't want to lose that 30%.

And BOINC calls threads cores. Maybe it can't detect which they are?
ID: 98174 · Rating: 0
Jim1348

Send message
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 98176 - Posted: 17 Jul 2020, 20:56:17 UTC - in response to Message 98174.  
Last modified: 17 Jul 2020, 21:12:55 UTC

True, but I don't want to lose that 30%.

And BOINC calls threads cores. Maybe it can't detect which they are?

You ARE losing by using less than the full number. I reserve cores to support a GPU, or to support other desktop use, but for dedicated machines I allow the maximum number possible.
BOINC detects virtual cores, since they appear the same as a real core to the operating system. So your i5-8600K shows up as 12 cores on BOINC, even though it has only 6 real cores.

EDIT: It is sometimes useful to limit the number of cores in order to limit memory use. For example, each Rosetta work unit should be allocated at least 1 GB (preferably more). But if you have only 8 GB of memory in a 12-core machine, then by limiting BOINC to use only 50% of the cores, you only need 6 GB for Rosetta. There are sometimes other reasons, but for maximum output, you use as many cores as possible.
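The memory reasoning above, as a sketch: the number of tasks that can run is capped by both the CPU limit and the memory budget. The 1 GB per task figure comes from the post; the `ram_percent` parameter is meant to mirror BOINC's "use at most X% of memory" preference, but BOINC itself works from the tasks' measured memory use, so treat this as a planning estimate only.

```python
import math

def max_concurrent_tasks(logical_cpus: int, cpu_percent: float, ram_gb: float,
                         ram_percent: float, gb_per_task: float) -> int:
    """Number of tasks that fit both the CPU limit and the memory budget."""
    by_cpu = math.floor(logical_cpus * cpu_percent / 100)
    by_ram = math.floor(ram_gb * ram_percent / 100 / gb_per_task)
    return max(0, min(by_cpu, by_ram))

# 12 logical CPUs, 8 GB RAM, 1 GB per Rosetta task:
print(max_concurrent_tasks(12, 50, 8, 100, 1.0))   # limited by the 50% CPU setting
print(max_concurrent_tasks(12, 100, 8, 80, 1.0))   # limited by an 80% memory cap instead
```

Either way the machine ends up running 6 tasks here; the difference is which limit does the work.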
ID: 98176 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 98177 - Posted: 17 Jul 2020, 21:14:19 UTC - in response to Message 98174.  
Last modified: 17 Jul 2020, 21:26:25 UTC

[snip]

True, but I don't want to lose that 30%.

And BOINC calls threads cores. Maybe it can't detect which they are?

"Threads" has two meanings for programs. One is the use of virtual cores. The other is setting up a list of things to be done that will not interfere with any other member of the list, so that if any member of the list is currently running but encounters a reason why it must wait, any other member of the list can take over the CPU core, and the program continues to make progress during that wait.

This does not require using multiple CPU cores, but it is still possible for more than one CPU core to each be working on one member of the list at the same time. BOINC applications that do this and allow the program to use more than one CPU core at once are unpopular and therefore seldom used.

BOINC tends to use the second meaning instead.
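The second meaning can be illustrated with a small Python sketch (the job and function names are hypothetical). Each thread spends its time waiting, as a stand-in for disk, network or GPU waits, so four 0.2-second jobs finish in roughly 0.2 seconds instead of 0.8. In CPython only one thread executes Python bytecode at a time, so the speed-up here comes from overlapping waits rather than from extra cores, which is the point: this kind of threading does not require multiple cores.

```python
import threading
import time

def waiting_job(seconds: float, results: list, index: int) -> None:
    """Stand-in for work that mostly waits (disk, network, GPU)."""
    time.sleep(seconds)
    results[index] = f"job {index} done"

def run_threaded(n_jobs: int, wait_s: float):
    """Start one thread per job and wait for all of them."""
    results = [None] * n_jobs
    threads = [threading.Thread(target=waiting_job, args=(wait_s, results, i))
               for i in range(n_jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

results, elapsed = run_threaded(4, 0.2)
print(results, f"{elapsed:.2f}s")
```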

Hyperthreading means that each physical CPU core has two sets of registers, and can therefore switch very quickly from working for one program to working for another if the first program needs to wait for main memory, which is slower than the CPU in almost all computers these days. This makes each physical core act almost like two cores, except for some timing issues. These two cores are called virtual cores.
ID: 98177 · Rating: 0
Mr P Hucker

Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98178 - Posted: 17 Jul 2020, 21:16:43 UTC - in response to Message 98176.  
Last modified: 17 Jul 2020, 21:23:30 UTC

You ARE losing by using less than the full number.


I know, which is why I don't do that. I treat a thread as I would a core. The jiggery-pokery inside the CPU is none of my business :-)

I reserve cores to support a GPU


I stopped doing that, as it doesn't make much difference. GPU threads are given higher priority on the CPU, so they always seem to get what they need.

What does help, though (especially if your GPU is better than your CPU), is running more than one task on the GPU at once. Then it can take two CPU cores to help it out instead of one, and it can also work on the task that doesn't need the CPU at that point while the other one is stalled from the GPU's point of view.
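For projects that do have GPU applications, running two tasks per GPU is usually configured with an `app_config.xml` file in that project's folder in the BOINC data directory, setting `gpu_usage` to 0.5. A small sketch that writes such a file; the app name is a hypothetical placeholder (Rosetta@home itself has no GPU application), so use the application name your GPU project reports. BOINC Manager picks the file up via Options > Read config files.

```python
import xml.etree.ElementTree as ET

def gpu_app_config(app_name: str, tasks_per_gpu: int, cpus_per_task: float = 1.0) -> str:
    """Build app_config.xml text that runs `tasks_per_gpu` copies of one
    GPU app per GPU (gpu_usage = 1 / tasks_per_gpu)."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # placeholder app name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpus_per_task:g}"
    return ET.tostring(root, encoding="unicode")

print(gpu_app_config("example_gpu_app", tasks_per_gpu=2))
```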

or to support other desktop use


I pause BOINC completely for games, and the GPU for watching video. This is done automatically with exclusive applications. Otherwise, full power. Well er.... except when TThrottle slows it down for overheating or because I don't like fan noise above 50% in the lounge.

BOINC detects virtual cores, since they appear the same as a real core to the operating system. So your i5-8600K shows up as 12 cores on BOINC, even though it has only 6 real cores.


Pah, my Xeons have 24 :-)

Anyway you're wrong on 2 counts :-P
My i5 has 6 cores and no HT, so 6 threads too.
And Windows can tell whether they're threads or cores. On my Xeons, Task Manager lists "cores 12, logical processors 24".
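On Linux, the same distinction between cores and logical processors that Task Manager shows can be read from /proc/cpuinfo. A minimal sketch, assuming x86-style `physical id` and `core id` fields (some other architectures omit them, in which case every logical CPU collapses into a single entry). For comparison, Python's `os.cpu_count()` reports logical CPUs, which, as noted above, is also what BOINC counts.

```python
import os

def count_cpus(cpuinfo_text: str):
    """Return (physical_cores, logical_cpus) parsed from /proc/cpuinfo text.

    Each blank-line-separated block with a 'processor' field is one logical
    CPU; distinct (physical id, core id) pairs are physical cores.
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            cores.add((fields.get("physical id"), fields.get("core id")))
    return len(cores), logical

if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("cores, logical CPUs:", count_cpus(f.read()))
print("os.cpu_count():", os.cpu_count())
```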

It is sometimes useful to limit the number of cores in order to limit memory use. For example, each Rosetta work unit should be allocated at least 1 GB (preferably more). But if you have only 8 GB of memory in a 12-core machine, then by limiting BOINC to use only 50% of the cores, you only need 6 GB for Rosetta. There are sometimes other reasons, but for maximum output, you use as many cores as possible.


I just let BOINC handle it. I tell BOINC to use at most 80% of RAM. If there isn't enough, there isn't enough, and some tasks sit waiting. If I see that happening regularly, I consider buying more RAM.
ID: 98178 · Rating: 0
Mr P Hucker

Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98179 - Posted: 17 Jul 2020, 21:26:16 UTC - in response to Message 98177.  
Last modified: 17 Jul 2020, 21:26:34 UTC

BOINC applications that use the first meaning and allow the program to use more than one CPU core at once are unpopular and therefore seldom used.


Why on earth would they be unpopular? I prefer it: it means I have fewer tasks running at once but am still using the whole processor. LHC (Atlas) and Milkyway (Nbody) do it.
ID: 98179 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 98181 - Posted: 17 Jul 2020, 21:35:16 UTC - in response to Message 98179.  

BOINC applications that use the first meaning and allow the program to use more than one CPU core at once are unpopular and therefore seldom used.

Why on earth would they be unpopular? I prefer it: it means I have fewer tasks running at once but am still using the whole processor. LHC (Atlas) and Milkyway (Nbody) do it.

Unclear, but none of the BOINC projects my computer participates in (over a dozen) use it. One of them is Milkyway, but I don't recall ever getting an Nbody task from them.
ID: 98181 · Rating: 0
Mr P Hucker

Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 98183 - Posted: 17 Jul 2020, 21:52:17 UTC - in response to Message 98181.  

Unclear, but none of the BOINC projects my computer participates in (over a dozen) use it. One of them is Milkyway, but I don't recall ever getting an Nbody task from them.


Nbody runs only on CPU. If you use your CPU for Milkyway, half your tasks should be Nbody. If you only use your GPU, you will only get Separation.

I thought the lack of multi-threaded projects was the difficulty coding it (or impossibility if everything depends on the result of the last calculation).
ID: 98183 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org