Problems and Technical Issues with Rosetta@home

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107080 - Posted: 3 Oct 2022, 21:54:02 UTC - in response to Message 107078.  
Last modified: 3 Oct 2022, 21:57:13 UTC

because if you look at the images you will see that it is called python.
GPU grid runs python.
The specific task name is Python apps for GPU hosts.
I wish you would just stay with the conversation and not argue with me over names.
You said WCG and Folding, you didn't say GPUgrid, if you're going to give me incorrect information I can't help you.


It's all ok...we figured it out.
I was trying to make dinner and answer the two of you hitting me with questions at the same time and try to get images uploaded. All this after work. So excuse me if I was jumbling answers.

We know now what the issue is and I figured that was probably the case last night when I opened up the tab in task manager for BOINC. Was shocked at how many processes were running with GPU Python.

Anyway...I'll dump GPU Grid when it finishes and see how the system balances out after that.

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,526,036
RAC: 10,392
Message 107081 - Posted: 4 Oct 2022, 2:04:07 UTC - in response to Message 107079.  

WCG is back. But they keep coughing up those damn transient errors now and then. I would have thought they had that fixed by now the way they are hyping being back up and running.
No errors here. I'm on 2 million credit per day with zero failed tasks.

Tasks are running fine from WCG. What Greg's talking about is the downloading of tasks throwing up HTTP transient errors by the bucketful.
I'm getting the same and it's driving me crackers.
They did solve it a week or two back, but it's returned with a vengeance in the last several days.
It's taking longer to get a successful download of all the task components than the OPN GPU tasks take to run here.

Aurum
Joined: 12 Jul 17
Posts: 32
Credit: 38,158,977
RAC: 0
Message 107083 - Posted: 4 Oct 2022, 14:05:05 UTC - in response to Message 107080.  
Last modified: 4 Oct 2022, 14:07:10 UTC

Was shocked at how many processes were running with GPU Python.
Anyway...I'll dump GPU Grid when it finishes and see how the system balances out after that.
PythonGPU works well if run right. Do not try to run it on a CPU with fewer than 32 threads. I've tried 24 threads and it's very slow.
Running 2 PythonGPU WUs and nothing else is best. The 2 PythonGPU WUs play well together and can share those CPU threads. If you try to run a different project alongside a PythonGPU WU, it'll have annoying quirks. Since most of the work is actually done on the CPU, it can use less powerful GPUs, e.g. a 1080, than the acemd4 WU needs. Here's the app_config I use on an i9-10980XE with a 3060 Ti:
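<app_config>
<!-- i9-10980XE   18c36t   2x16=32 GB   L3 Cache 24.75 MB  3060 Ti -->
    <app>
        <name>PythonGPU</name>
        <!-- BOINC documents plan_class under app_version, so it may be ignored here -->
        <plan_class>cuda1131</plan_class>
        <gpu_versions>
            <!-- CPU threads budgeted per task -->
            <cpu_usage>32</cpu_usage>
            <!-- half a GPU per task, so two tasks share one card -->
            <gpu_usage>0.5</gpu_usage>
        </gpu_versions>
        <!-- run at most two PythonGPU tasks at once -->
        <max_concurrent>2</max_concurrent>
        <!-- estimate remaining time from the reported fraction done -->
        <fraction_done_exact/>
    </app>
</app_config>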
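
For anyone wanting to sanity-check numbers like these before restarting the client, here is a rough Python sketch that reads an app_config snippet and works out how many tasks the GPU share allows and how many CPU threads that budgets. The `summarize` function, the trimmed sample XML and the `host_threads` parameter are made up for illustration. This is plain arithmetic on the settings, not a model of what the BOINC scheduler actually does.

```python
import math
import xml.etree.ElementTree as ET

# Trimmed copy of the config from the post (illustrative only).
APP_CONFIG = """
<app_config>
    <app>
        <name>PythonGPU</name>
        <gpu_versions>
            <cpu_usage>32</cpu_usage>
            <gpu_usage>0.5</gpu_usage>
        </gpu_versions>
        <max_concurrent>2</max_concurrent>
    </app>
</app_config>
"""


def summarize(xml_text, host_threads, host_gpus=1):
    """Return the per-app figures implied by an app_config snippet."""
    rows = []
    for app in ET.fromstring(xml_text).iter("app"):
        cpu = float(app.findtext("gpu_versions/cpu_usage"))
        gpu = float(app.findtext("gpu_versions/gpu_usage"))
        limit = int(app.findtext("max_concurrent"))
        # Tasks that fit when each one claims `gpu` of a single device.
        gpu_slots = host_gpus * math.floor(1 / gpu + 1e-9)
        running = min(limit, gpu_slots)
        rows.append({
            "app": app.findtext("name"),
            "tasks_at_once": running,
            "threads_budgeted": running * cpu,
            "host_threads": host_threads,
        })
    return rows


summary = summarize(APP_CONFIG, host_threads=36)
print(summary)
```

With the values above, two tasks run at once and 64 threads are budgeted on a 36-thread CPU, so the figure acts as a reservation rather than a literal count. That fits the remark that the two WUs share the threads.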


Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107084 - Posted: 4 Oct 2022, 14:15:49 UTC

I've given up on GPUgrid. Despite countless promises, they refuse to produce tasks that run on AMD GPUs, which is what I have exclusively. AMD cards are faster for the cost than Nvidia, which severely cripples double precision. DP is needed for a lot of projects, not just Milkyway: some of the calculations for most projects are DP, a lot more than the 1/64 nonsense that Nvidia gives us.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107085 - Posted: 4 Oct 2022, 15:06:47 UTC

I've also given up on Folding@Home. Nice idea, because there's not much biology on Boinc. But if Folding aren't willing to join the rest of us on Boinc, I can no longer be bothered. It's too much effort to run a different program alongside Boinc, which has no clue what other projects are running on the processors and GPUs, so balancing loads is ridiculous.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107086 - Posted: 4 Oct 2022, 18:20:19 UTC - in response to Message 107085.  

I've also given up on Folding@Home. Nice idea, because there's not much biology on Boinc. But if Folding aren't willing to join the rest of us on Boinc, I can no longer be bothered. It's too much effort to run a different program alongside Boinc, which has no clue what other projects are running on the processors and GPUs, so balancing loads is ridiculous.



I have no problem with FAH on my system.
Seems to balance fine with BOINC

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107087 - Posted: 4 Oct 2022, 18:23:04 UTC

Is this guy done with 4.2 already? 0 tasks in queue!
FFS...this is nuts.
I guess 4.2 is small batches of stuff.
Where is all the rest of the stuff that Robetta has stocked up?
All for the in house systems?

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 107088 - Posted: 4 Oct 2022, 18:25:00 UTC - in response to Message 107086.  

Just divide 100 by the number of cores, subtract the result from 100, and put that number in "Use at most x% of the CPUs".
Example: 100/8=12.5
100-12.5=87.5
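
The same arithmetic as a small Python sketch, in case it helps with other core counts. The function name `percent_leaving_free` and the `free` parameter (how many cores to keep idle) are invented for this example. It only computes the percentage and does nothing to the BOINC preferences themselves.

```python
def percent_leaving_free(cores, free=1):
    """Percentage of CPUs to allow so that `free` cores stay idle."""
    if not 0 <= free < cores:
        raise ValueError("free must be between 0 and cores - 1")
    return 100 - 100 * free / cores


print(percent_leaving_free(8))      # 87.5, matching the example above
print(percent_leaving_free(16, 2))  # 87.5, two of sixteen cores kept free
```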

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107089 - Posted: 4 Oct 2022, 18:33:27 UTC - in response to Message 107086.  

I've also given up on Folding@Home. Nice idea, because there's not much biology on Boinc. But if Folding aren't willing to join the rest of us on Boinc, I can no longer be bothered. It's too much effort to run a different program alongside Boinc, which has no clue what other projects are running on the processors and GPUs, so balancing loads is ridiculous.


I have no problem with FAH on my system.
Seems to balance fine with BOINC
I don't see how. If I run two projects in Boinc, it chooses to run one or the other. If I run Folding as well, Folding doesn't know when Boinc managed to get WCG tasks.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107090 - Posted: 4 Oct 2022, 18:34:25 UTC - in response to Message 107088.  

Just divide 100 by the number of cores, subtract the result from 100, and put that number in "Use at most x% of the CPUs".
Example: 100/8=12.5
100-12.5=87.5
Until you have 24 cores and get recurring decimals.... I have to remember Boinc rounds down for CPUs and up for GPUs!
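
To show why the recurring decimal bites, here is a rough Python sketch. It takes the rounding-down behaviour for CPUs as stated in this post and treats it as an assumption, not something checked against the client source. `usable_cpus` and `safe_percent` are hypothetical helper names.

```python
import math


def usable_cpus(cores, percent):
    """CPUs left after rounding down (assumed behaviour, per the post)."""
    return max(1, math.floor(cores * percent / 100))


def safe_percent(cores, free=1):
    """Smallest whole percentage that still yields cores - free CPUs."""
    target = cores - free
    for percent in range(1, 101):
        if usable_cpus(cores, percent) >= target:
            return percent
    return 100  # guard; the loop always returns when free >= 0


print(usable_cpus(24, 95.83))  # 22: truncating 95.8333... loses an extra core
print(safe_percent(24))        # 96
print(usable_cpus(24, 96))     # 23
```

So under that assumption it is safer to round the percentage up than to type in a truncated decimal.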

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107091 - Posted: 4 Oct 2022, 18:44:53 UTC - in response to Message 107089.  

I've also given up on Folding@Home. Nice idea, because there's not much biology on Boinc. But if Folding aren't willing to join the rest of us on Boinc, I can no longer be bothered. It's too much effort to run a different program alongside Boinc, which has no clue what other projects are running on the processors and GPUs, so balancing loads is ridiculous.


I have no problem with FAH on my system.
Seems to balance fine with BOINC
I don't see how. If I run two projects in Boinc, it chooses to run one or the other. If I run Folding as well, Folding doesn't know when Boinc managed to get WCG tasks.



I don't know...maybe because I am not using CPU for FAH. I have so many CPU projects there is no room for anything else. So I put FAH in GPU only mode.

Since this project doesn't have anything going on, I am going back to GPU and see what that does to the other projects I run in CPU. If it cuts them in half then I will terminate GPU and find something else.

But this all takes time since I run only 14 hrs a day.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107092 - Posted: 4 Oct 2022, 19:00:46 UTC - in response to Message 107091.  

But this all takes time since I run only 14 hrs a day.
Lightweight, I run 10 machines 24 hours a day :-)

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,526,036
RAC: 10,392
Message 107094 - Posted: 6 Oct 2022, 2:00:42 UTC

Ooh, new batch of tasks again? No idea how big yet
We could get used to this

srettie
Volunteer moderator
Project developer
Project scientist
Joined: 10 Jul 17
Posts: 9
Credit: 50,961
RAC: 0
Message 107096 - Posted: 6 Oct 2022, 4:27:40 UTC - in response to Message 107094.  

Ooh, new batch of tasks again? No idea how big yet
We could get used to this


I certainly hope so!

Looks like things are running okay so far. This is batch 2. ~5mil total for this experiment on the way.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,781,025
RAC: 4,962
Message 107097 - Posted: 6 Oct 2022, 9:02:45 UTC - in response to Message 107096.  

Looks like things are running okay so far. This is batch 2. ~5mil total for this experiment on the way.


Great!!

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,526,036
RAC: 10,392
Message 107099 - Posted: 6 Oct 2022, 10:29:36 UTC - in response to Message 107096.  

Ooh, new batch of tasks again? No idea how big yet
We could get used to this

I certainly hope so!

Looks like things are running okay so far. This is batch 2. ~5mil total for this experiment on the way.

Because there was a slight gap between the previous batch ending and the new one being issued, my systems switched over to my backup project, so I haven't started running any yet.
But the previous batch all ran successfully - no reason to think this one will be any different.

Ideally, I don't want to run my backup project, so if there are any other researchers needing to have work done, send them our way and we'll get a flow going without interruptions.
Things work more reliably on both sides of the server divide that way

.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 107101 - Posted: 6 Oct 2022, 23:52:35 UTC - in response to Message 107096.  

Ooh, new batch of tasks again? No idea how big yet
We could get used to this


I certainly hope so!

Looks like things are running okay so far. This is batch 2. ~5mil total for this experiment on the way.

Nice to have lots of R4.2 work without problematic pythons eating my SSD :-)

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107102 - Posted: 6 Oct 2022, 23:55:35 UTC - in response to Message 107101.  
Last modified: 6 Oct 2022, 23:55:52 UTC

Nice to have lots of R4.2 work without problematic pythons eating my SSD :-)
Those things last forever despite what people say. Anyway, SSDs suck, I use NVME.

hadron
Joined: 4 Sep 22
Posts: 68
Credit: 1,586,757
RAC: 973
Message 107103 - Posted: 7 Oct 2022, 0:29:45 UTC - in response to Message 107102.  

... SSDs suck, I use NVME.

You might benefit from a good google search on NVMe. It's not what you seem to think it is.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107106 - Posted: 7 Oct 2022, 12:04:15 UTC - in response to Message 107103.  

... SSDs suck, I use NVME.

You might benefit from a good google search on NVMe. It's not what you seem to think it is.
I know exactly what they are and have several; they're much faster. An SSD over SATA is far too slow, because of the SATA.

Are you from the Android forum?
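
For the SATA versus NVMe speed point above, a back-of-envelope Python sketch of the interface ceilings. It assumes a SATA III link and a PCIe 3.0 x4 link for the NVMe drive; the thread does not say which PCIe generation or lane count is actually in use, so treat those as placeholders. The helper name `usable_mb_per_s` is made up. Both kinds of drive are SSDs; only the interface differs.

```python
def usable_mb_per_s(line_rate_gbit, payload_bits, encoded_bits, lanes=1):
    """Payload bandwidth in MB/s after line-encoding overhead."""
    payload_gbit = line_rate_gbit * payload_bits / encoded_bits * lanes
    return payload_gbit * 1000 / 8


# SATA III: 6 Gbit/s line rate with 8b/10b encoding.
sata3 = usable_mb_per_s(6.0, 8, 10)
# PCIe 3.0 x4 (assumed): 8 GT/s per lane with 128b/130b encoding.
pcie3_x4 = usable_mb_per_s(8.0, 128, 130, lanes=4)

print(round(sata3), round(pcie3_x4))  # 600 3938
```

These are link ceilings before protocol overhead, so real drives land somewhat below them.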

©2024 University of Washington
https://www.bakerlab.org