Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,807,931
RAC: 5,060
Message 107467 - Posted: 17 Oct 2022, 4:56:07 UTC - in response to Message 107466.  

Some BOINC projects are now using a very high initial estimate for the run time, possibly in order to get an especially high priority for their tasks. GPUGRID is one of them. You need to watch how the estimated time to completion drops in order to see if that happened,
No, it's just Boinc, it doesn't have a clue how to add up, it's always doing stupid stuff. Like leaving everything to the very last minute and returning tasks late.
ID: 107467 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,807,931
RAC: 5,060
Message 107468 - Posted: 17 Oct 2022, 5:16:10 UTC - in response to Message 107466.  

Some BOINC projects are now using a very high initial estimate for the run time, possibly in order to get an especially high priority for their tasks. GPUGRID is one of them. You need to watch how the estimated time to completion drops in order to see if that happened,
If that was true, it would only work once. Boinc then adjusts using the duration correction factor. So, no point in a project doing so.
ID: 107468 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 13,912,321
RAC: 2,674
Message 107469 - Posted: 17 Oct 2022, 14:47:31 UTC - in response to Message 107468.  

Some BOINC projects are now using a very high initial estimate for the run time, possibly in order to get an especially high priority for their tasks. GPUGRID is one of them. You need to watch how the estimated time to completion drops in order to see if that happened,
If that was true, it would only work once. Boinc then adjusts using the duration correction factor. So, no point in a project doing so.

What I'm seeing with GPUGRID work does not agree with you. The tasks start out with an expected runtime of hundreds of days (well past the deadline) and actually finish in less than two days, over and over.

One possibility I've thought of - the duration correction factor works for CPU tasks, but does not work for GPU tasks. I've only seen this odd behavior on GPU tasks, GPUGRID seldom offers any CPU tasks.
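A minimal sketch of the duration correction factor (DCF) idea discussed above: the client keeps a per-project multiplier and nudges it toward the ratio of actual to estimated runtime after each finished task. The update rule, the 0.1 weight, and the 300-day and 2-day figures are illustrative assumptions, not BOINC's actual code or GPUGRID's real numbers.

```python
DAY = 86400.0  # seconds per day


def update_dcf(dcf: float, estimated_s: float, actual_s: float, weight: float = 0.1) -> float:
    """Move the correction factor toward actual/estimated (hypothetical rule)."""
    ratio = actual_s / estimated_s
    return (1.0 - weight) * dcf + weight * ratio


def corrected_estimate(raw_estimate_s: float, dcf: float) -> float:
    """Estimate shown to the user: the project's raw estimate scaled by the DCF."""
    return raw_estimate_s * dcf


raw = 300 * DAY    # assumed raw estimate ("hundreds of days")
actual = 2 * DAY   # assumed real runtime ("less than two days")

dcf = 1.0
history = []       # corrected estimate, in days, shown for each successive task
for _ in range(20):
    history.append(corrected_estimate(raw, dcf) / DAY)
    dcf = update_dcf(dcf, raw, actual)

print([round(days, 1) for days in history])
```

Under this model the displayed estimate shrinks with every completed task. If the estimate instead starts at hundreds of days "over and over", that suggests no such correction is being applied to those tasks.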
ID: 107469 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 107470 - Posted: 17 Oct 2022, 16:41:53 UTC - in response to Message 107469.  

Some BOINC projects are now using a very high initial estimate for the run time, possibly in order to get an especially high priority for their tasks. GPUGRID is one of them. You need to watch how the estimated time to completion drops in order to see if that happened,
If that was true, it would only work once. Boinc then adjusts using the duration correction factor. So, no point in a project doing so.

What I'm seeing with GPUGRID work does not agree with you. The tasks start out with an expected runtime of hundreds of days (well past the deadline) and actually finish in less than two days, over and over.

One possibility I've thought of - the duration correction factor works for CPU tasks, but does not work for GPU tasks. I've only seen this odd behavior on GPU tasks, GPUGRID seldom offers any CPU tasks.



I run GPUGRID as well. I think what throws off BOINC's time-remaining estimate is that, although it is a GPU task, it runs a ton of CPU processes, and this backwards running is what BOINC does not understand.
I am 80.80% done with the current task after 2 days, 8 hours and 42 minutes, yet the remaining time to completion shows 11 days, 11 hours and 16 minutes. Somehow that does not make sense.
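As a sanity check on those figures, here is a small sketch that extrapolates the remaining time linearly from the elapsed time and the fraction done. It assumes progress is roughly linear, which a real task need not be; the function name is made up for illustration.

```python
from datetime import timedelta


def linear_remaining(elapsed: timedelta, fraction_done: float) -> timedelta:
    """Remaining time if the task keeps progressing at its average rate so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed / fraction_done  # projected total runtime
    return total - elapsed


elapsed = timedelta(days=2, hours=8, minutes=42)
remaining = linear_remaining(elapsed, 0.8080)
print(remaining)  # roughly thirteen and a half hours
```

By that measure the task has well under a day to go, far less than the 11 days and 11 hours reported.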
ID: 107470 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 107471 - Posted: 17 Oct 2022, 16:43:06 UTC

Cosmology, I used their fine print to get to the University's Astronomy Department and emailed them asking if they could find the right person for the BOINC side.
No response so far. Doubt there ever will be.
ID: 107471 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 107472 - Posted: 17 Oct 2022, 23:02:23 UTC - in response to Message 107260.  

For those of you interested in other BOINC projects:

collatz has shut down, and expects a few months before it is ready to restart.

SETI@home has shut down, and plans to restart only when they get a new source of data.

cosmology@home appears to have a server problem - they've been unreachable for about a day.

RNA World stopped creating workunits over a year ago, but is still trying to get their last 20 or so workunits finished. The remaining workunits are expected to run for months each.



So what's left out there for science? Denis is down with an unexpected model issue or something along those lines. WCG is barely getting by: not all of its projects are online, and it seems to kick up transient HTTP errors by the dozens. SiDock is the only really new project kicking butt.
I run a few other projects as well along with GPU Grid.
ID: 107472 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,807,931
RAC: 5,060
Message 107473 - Posted: 18 Oct 2022, 1:04:06 UTC - in response to Message 107469.  

What I'm seeing with GPUGRID work does not agree with you. The tasks start out with an expected runtime of hundreds of days (well past the deadline) and actually finish in less than two days, over and over.

One possibility I've thought of - the duration correction factor works for CPU tasks, but does not work for GPU tasks. I've only seen this odd behavior on GPU tasks, GPUGRID seldom offers any CPU tasks.
Complain to GPU Grid and suggest you might leave if they mess around. Stealing CPU time from other projects is theft.
ID: 107473 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,807,931
RAC: 5,060
Message 107474 - Posted: 18 Oct 2022, 1:06:09 UTC - in response to Message 107471.  

Cosmology, I used their fine print to get to the University's Astronomy Department and emailed them asking if they could find the right person for the BOINC side.
No response so far. Doubt there ever will be.
Well I assume there's an Astronomy department still alive and kicking, and they'll pass it on or tell you something's not in use any more. It seems odd to me though they'd leave a server switched on with nobody doing anything. However there must be, because twice recently I've seen it grind to a halt due to lack of disk space, then suddenly there was disk space, that can only mean human intervention.
ID: 107474 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,807,931
RAC: 5,060
Message 107475 - Posted: 18 Oct 2022, 1:10:56 UTC - in response to Message 107472.  
Last modified: 18 Oct 2022, 1:11:36 UTC

So what's left out there for science? Denis is down with an unexpected model issue or something along those lines. WCG is barely getting by: not all of its projects are online, and it seems to kick up transient HTTP errors by the dozens. SiDock is the only really new project kicking butt.
I run a few other projects as well along with GPU Grid.
I run these:

Climate prediction: rare CPU tasks at the moment, but lots more coming with a new program they're writing to use multiple cores to allow more accurate research, and virtualbox so the linux stuff can be run on windows.

Cosmology

Denis

Einstein - new radio waves research by a PhD student

LHC - constant supply of single core and multicore VB work.

Milkyway

QuChemPedia - interesting chemistry research

Rosetta

Sidock

TN-Grid - genetics

Universe

WCG - loads of work now, and I've never had a single error you speak of
ID: 107475 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 13,912,321
RAC: 2,674
Message 107476 - Posted: 18 Oct 2022, 5:07:38 UTC - in response to Message 107473.  

What I'm seeing with GPUGRID work does not agree with you. The tasks start out with an expected runtime of hundreds of days (well past the deadline) and actually finish in less than two days, over and over.

One possibility I've thought of - the duration correction factor works for CPU tasks, but does not work for GPU tasks. I've only seen this odd behavior on GPU tasks, GPUGRID seldom offers any CPU tasks.
Complain to GPU Grid and suggest you might leave if they mess around. Stealing CPU time from other projects is theft.

They aren't using much CPU time - only GPU time.

They aren't increasing the GPU time they get - only getting it sooner than otherwise.
ID: 107476 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,807,931
RAC: 5,060
Message 107477 - Posted: 18 Oct 2022, 5:15:54 UTC - in response to Message 107476.  

They aren't using much CPU time - only GPU time.
Which is worse, and clearly what I meant.

They aren't increasing the GPU time they get - only getting it sooner than otherwise.
And disturbing your queue of work so things could get returned late.

Climate prediction do the opposite and refuse to change, they shoot themselves in the foot. They want the tasks back in 1 month, but they set the deadline at 1 year. So Boinc doesn't do them as urgently as it should and they get the tasks back late.
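A deliberately simplified model of the two effects described in this thread, not BOINC's real scheduler: a task "looks urgent" when its estimated remaining runtime exceeds the time left before its deadline, and queued work is ordered by earliest deadline. The task names and every number except the 1-month and 1-year figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    est_days: float       # estimated remaining runtime, in days
    deadline_days: float  # days until the reporting deadline


def looks_urgent(task: Task) -> bool:
    """True when the estimate alone says the deadline will be missed."""
    return task.est_days > task.deadline_days


def deadline_order(tasks: list[Task]) -> list[str]:
    """Task names sorted by earliest deadline first."""
    return [t.name for t in sorted(tasks, key=lambda t: t.deadline_days)]


queue = [
    Task("climate_1y_deadline", est_days=20, deadline_days=365),
    Task("inflated_estimate", est_days=300, deadline_days=5),
    Task("ordinary_a", est_days=3, deadline_days=14),
    Task("ordinary_b", est_days=3, deadline_days=60),
]

urgent = [t.name for t in queue if looks_urgent(t)]
order_long = deadline_order(queue)

# Same queue, but with the climate task given the 1-month deadline actually wanted.
queue[0] = Task("climate_1m_deadline", est_days=20, deadline_days=30)
order_short = deadline_order(queue)

print(urgent, order_long, order_short, sep="\n")
```

In this toy queue the inflated estimate is the only task flagged as urgent, and the climate task moves ahead of other work only once its deadline matches when the results are really needed.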
ID: 107477 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 107479 - Posted: 18 Oct 2022, 6:58:41 UTC - in response to Message 107475.  
Last modified: 18 Oct 2022, 7:10:29 UTC

Cosmology

Denis - he's not creating any work right now and has gone silent.
Tasks ready to send 0
Tasks in progress 0

Einstein - new radio waves research by a PhD student - already got that one

LHC - constant supply of single core and multicore VB work. - Running ATLAS

Milkyway - Got this

QuChemPedia - interesting chemistry research - Can't run this, does not work on my system. Might try again.

Rosetta - duh

Sidock - got this

TN-Grid - genetics - interesting, will have a look ***Is this CPU or GPU or both?***

Universe - used to run that.

WCG - loads of work now, and I've never had a single error you speak of

Only getting OPN because I am trying to fill my GPU queue. CPU has enough
10/18/2022 8:50:47 AM | World Community Grid | Temporarily failed download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt: transient HTTP error
10/18/2022 8:50:47 AM | World Community Grid | Backing off 00:10:09 on download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt
(and repeat 15 times) - aborted. They didn't do a blinking thing overnight, which is normal when they are stalled out with transient errors. I think I abort more stalled downloads than I do work. Seems to be a GPU task thing. CPU works just fine. Almost think it's time to stuff WCG for a month and come back. Another group that is in over their heads and can't fix stuff.
ID: 107479 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,807,931
RAC: 5,060
Message 107481 - Posted: 18 Oct 2022, 7:27:18 UTC - in response to Message 107479.  
Last modified: 18 Oct 2022, 7:27:47 UTC

QuChemPedia - interesting chemistry research - Can't run this, does not work on my system. Might try again.
I don't remember having problems, but they're out of work just now so I can't test, and I haven't run it for so long my tasks aren't in Boinctasks history or server history.

TN-Grid - genetics - interesting, will have a look ***Is this CPU or GPU or both?***
CPU only.

WCG - loads of work now, and I've never had a single error you speak of

Only getting OPN because I am trying to fill my GPU queue. CPU has enough
10/18/2022 8:50:47 AM | World Community Grid | Temporarily failed download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt: transient HTTP error
10/18/2022 8:50:47 AM | World Community Grid | Backing off 00:10:09 on download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt
(and repeat 15 times) - aborted. They didn't do a blinking thing overnight, which is normal when they are stalled out with transient errors. I think I abort more stalled downloads than I do work. Seems to be a GPU task thing. CPU works just fine. Almost think it's time to stuff WCG for a month and come back. Another group that is in over their heads and can't fix stuff.
So your only problem is you have to retry downloads? You need to run "boinccmd --network_available" from a batch file and/or task scheduler. This will retry all downloads. You can set it to run as often as you like.
ID: 107481 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 107484 - Posted: 18 Oct 2022, 10:32:54 UTC - in response to Message 107481.  
Last modified: 18 Oct 2022, 10:36:59 UTC

QuChemPedia - interesting chemistry research - Can't run this, does not work on my system. Might try again.
I don't remember having problems, but they're out of work just now so I can't test, and I haven't run it for so long my tasks aren't in Boinctasks history or server history.

TN-Grid - genetics - interesting, will have a look ***Is this CPU or GPU or both?***
CPU only. - bah..there is enough CPU stuff out there and I have plenty of work for CPU. I'm looking for GPU now. If Denis comes back something has to go.

WCG - loads of work now, and I've never had a single error you speak of

Only getting OPN because I am trying to fill my GPU queue. CPU has enough
10/18/2022 8:50:47 AM | World Community Grid | Temporarily failed download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt: transient HTTP error
10/18/2022 8:50:47 AM | World Community Grid | Backing off 00:10:09 on download of ab73b5548cff860eccd4f3bd6afcc1b4.pdbqt
(and repeat 15 times) - aborted. They didn't do a blinking thing overnight, which is normal when they are stalled out with transient errors. I think I abort more stalled downloads than I do work. Seems to be a GPU task thing. CPU works just fine. Almost think it's time to stuff WCG for a month and come back. Another group that is in over their heads and can't fix stuff.
So your only problem is you have to retry downloads? You need to run "boinccmd --network_available" from a batch file and/or task scheduler. This will retry all downloads. You can set it to run as often as you like.


I tried looking that up. I don't really understand or know what to do to make downloads retry every 5 minutes or so.
ID: 107484 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 107485 - Posted: 18 Oct 2022, 10:39:50 UTC
Last modified: 18 Oct 2022, 10:40:49 UTC

TN is only using 29.51%. GPU grid is done processing. Now just a lucky WCG and a PrimeGrid are running.
So what's up with that?

Oh and since I was having trouble with WCG GPU, I opened it up to CPU, but still only get GPU if they succeed in downloading.
ID: 107485 · Rating: 0

kotenok2000
Joined: 22 Feb 11
Posts: 234
Credit: 336,960
RAC: 833
Message 107486 - Posted: 18 Oct 2022, 10:41:03 UTC - in response to Message 107484.  

cd /d e:\Program Files\BOINC

:loop

boinccmd --network_available

TIMEOUT /T 300 /nobreak
goto loop
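
For anyone not on Windows, or who prefers a script over a batch file, the same loop can be sketched in Python. It assumes `boinccmd` is on the PATH (otherwise pass its full path; the `e:\Program Files\BOINC` folder above is simply this poster's install location). The function names and the `cycles`, `run` and `sleep` parameters are invented here so the loop can be tried without a BOINC client.

```python
import subprocess
import time


def retry_transfers(boinccmd="boinccmd", run=subprocess.run):
    """Ask the BOINC client to retry deferred network transfers once."""
    return run([boinccmd, "--network_available"], check=False)


def retry_loop(interval_s=300, cycles=None, boinccmd="boinccmd",
               run=subprocess.run, sleep=time.sleep):
    """Call boinccmd every interval_s seconds; cycles=None means loop forever."""
    done = 0
    while cycles is None or done < cycles:
        retry_transfers(boinccmd=boinccmd, run=run)
        sleep(interval_s)  # 300 seconds matches TIMEOUT /T 300 in the batch file
        done += 1
    return done


# To run it for real, call retry_loop() with no arguments and leave it running.
```

Passing `check=False` means a failed call (for example, the client not running yet) does not stop the loop, which mirrors how the batch version simply carries on to the next pass.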
ID: 107486 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,807,931
RAC: 5,060
Message 107487 - Posted: 18 Oct 2022, 10:48:16 UTC - in response to Message 107486.  

cd /d e:\Program Files\BOINC

:loop

boinccmd --network_available

TIMEOUT /T 300 /nobreak
goto loop
Which I placed in a batch file and told Windows to run at startup and minimised.
ID: 107487 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,807,931
RAC: 5,060
Message 107488 - Posted: 18 Oct 2022, 10:49:54 UTC - in response to Message 107485.  

TN is only using 29.51%.
Do you mean a TNGrid task only uses 29.51% of a CPU core?

Oh and since I was having trouble with WCG GPU, I opened it up to CPU, but still only get GPU if they succeed in downloading.
Their settings on the server are probably still fucked, they were last time I tried to change them, so you can't make a computer get different sorts of tasks than you usually do.
ID: 107488 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 107489 - Posted: 18 Oct 2022, 11:18:43 UTC - in response to Message 107487.  

cd /d e:\Program Files\BOINC

:loop

boinccmd --network_available

TIMEOUT /T 300 /nobreak
goto loop
Which I placed in a batch file and told Windows to run at startup and minimised.



what do you call this file?
ID: 107489 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 514
Message 107490 - Posted: 18 Oct 2022, 11:20:02 UTC

So WCG is screwed up, Cosmology as well. What's going on?
I went back to MOO. I used to do them long ago, but looks like my account was deleted at some point in time.
ID: 107490 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org