Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 106235 - Posted: 17 May 2022, 8:32:36 UTC - in response to Message 106227.  

The important lines were:

2022-05-16 17:21:17 (2912): Status Report: Elapsed Time: '6000.901087'
2022-05-16 17:21:17 (2912): Status Report: CPU Time: '13.968750'

Basically, the emulated operating system encountered an error and didn't report what it was. It then started waiting for another command, but the task didn't have any, so it did almost nothing for a few hours.
Have the programmers not heard of something called "timeout"?
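A rough sketch of the kind of check a timeout could be based on, using the two log lines above. The function names, the 10-minute grace period and the 5% threshold are all made up for illustration; this is not how the project's wrapper actually works.

```python
import re

# Matches lines such as:
# 2022-05-16 17:21:17 (2912): Status Report: CPU Time: '13.968750'
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def parse_status(lines):
    """Return the last reported (elapsed, cpu) times in seconds found in lines."""
    latest = {}
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            latest[match.group(1)] = float(match.group(2))
    return latest.get("Elapsed Time"), latest.get("CPU Time")


def looks_stalled(elapsed_s, cpu_s, min_elapsed_s=600.0, min_ratio=0.05):
    """Hypothetical rule: after a grace period, flag tasks using almost no CPU."""
    if elapsed_s is None or cpu_s is None or elapsed_s < min_elapsed_s:
        return False
    return cpu_s / elapsed_s < min_ratio


log = [
    "2022-05-16 17:21:17 (2912): Status Report: Elapsed Time: '6000.901087'",
    "2022-05-16 17:21:17 (2912): Status Report: CPU Time: '13.968750'",
]
elapsed, cpu = parse_status(log)
print(looks_stalled(elapsed, cpu))  # about 14 s of CPU in 6000 s elapsed
```

With the figures from the log, the CPU-to-elapsed ratio is well under 1%, so a rule like this would flag the task long before it sat idle for hours.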
ID: 106235 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 106236 - Posted: 17 May 2022, 8:34:32 UTC - in response to Message 106229.  
Last modified: 17 May 2022, 8:35:32 UTC

OLDER? Really? Ryzen 3700x here, barely 2 years old in my system.
ROTFPMSL at your computer being insulted.

Project doesn't give a S--- about failures. As long as they get the data somehow from someone and if its just one task somewhere that dies...oh well.
There's a lot less pythons in the queue than there was. Either we've crunched them way faster than I thought we would, or they've been deleting some, or many have failed. Perhaps the next batch will have improvements.
ID: 106236 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 106238 - Posted: 17 May 2022, 19:07:48 UTC - in response to Message 106236.  

OLDER? Really? Ryzen 3700x here, barely 2 years old in my system.
ROTFPMSL at your computer being insulted. <-- yeah, don't you know microchips have feelings?

Project doesn't give a S--- about failures. As long as they get the data somehow from someone and if its just one task somewhere that dies...oh well.
There's a lot less pythons in the queue than there was. Either we've crunched them way faster than I thought we would, or they've been deleting some, or many have failed. Perhaps the next batch will have improvements.
<-- what is the source of these python tasks and has anybody ever seen the output from them?
ID: 106238 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 106239 - Posted: 18 May 2022, 13:49:58 UTC
Last modified: 18 May 2022, 13:50:56 UTC

The machine I'm trying to run python on keeps getting banned, despite completing most of them successfully. Is there ever an end to problems here?

And yes I know they have feelings, that's why I buy "broken" GPUs on Ebay and try to get them to do something. Actually that's probably cruel as the poor things thought they'd retired.
ID: 106239 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 106240 - Posted: 18 May 2022, 18:10:27 UTC - in response to Message 106239.  

The machine I'm trying to run python on keeps getting banned, despite completing most of them successfully. Is there ever an end to problems here?

And yes I know they have feelings, that's why I buy "broken" GPUs on Ebay and try to get them to do something. Actually that's probably cruel as the poor things thought they'd retired.



Probably your GPUs are complaining loudly and the RAH server is taking sympathy.
Maybe you insulted it too many times?
Or you're just lucky 13.
ID: 106240 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 106241 - Posted: 19 May 2022, 5:43:05 UTC

I've lost a Radeon 6990 and a cockatiel in the last couple of days, I'm not happy. Things should be made to last forever.
ID: 106241 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 106244 - Posted: 19 May 2022, 14:14:53 UTC - in response to Message 106241.  

I've lost a Radeon 6990 and a cockatiel in the last couple of days, I'm not happy. Things should be made to last forever.

Note that it would then take forever to determine if they actually last forever or not.
ID: 106244 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 106245 - Posted: 19 May 2022, 14:22:03 UTC - in response to Message 106244.  

I've lost a Radeon 6990 and a cockatiel in the last couple of days, I'm not happy. Things should be made to last forever.

Note that it would then take forever to determine if they actually last forever or not.
Forever is not a fixed time. It could be 5 times longer than normal for example.
ID: 106245 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 106247 - Posted: 19 May 2022, 18:44:55 UTC - in response to Message 106245.  
Last modified: 19 May 2022, 18:47:07 UTC

I've lost a Radeon 6990 and a cockatiel in the last couple of days, I'm not happy. Things should be made to last forever.

Note that it would then take forever to determine if they actually last forever or not.
Forever is not a fixed time. It could be 5 times longer than normal for example.



You know those McDonald's trackers for your table? (well here in EU we have them)
I looked at the underside last night, made in Thailand assembled in China.
And since most stuff these days is or was made in China, it's cheap and throw away.
The GPU manufacturers would not be in business if their cards lasted forever.
Cars used to last forever, but the "forever" went away in the 80s I think when we switched over to make things as cheap as possible and charge regular price to make more profit and get the consumer to buy more.
ID: 106247 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 106248 - Posted: 20 May 2022, 5:56:26 UTC - in response to Message 106247.  

You know those McDonald's trackers for your table? (well here in EU we have them)
I looked at the underside last night, made in Thailand assembled in China.
And since most stuff these days is or was made in China, it's cheap and throw away.
The GPU manufacturers would not be in business if their cards lasted forever.
Cars used to last forever, but the "forever" went away in the 80s I think when we switched over to make things as cheap as possible and charge regular price to make more profit and get the consumer to buy more.
Damn, closed browser after checking preview thinking I'd posted it, so the following is shorter as I'm lazy.

What's a McD tracker? I'm in the UK but rarely go there. Cars last me 20 years now, used to be 10. GPUs get replaced for the latest game. If the old one keeps value, those gamers have more money to buy the new one. If something breaks you don't buy from the same make again and write a nasty review, so making shit quality stuff harms your company.
ID: 106248 · Rating: 0
raymond
Joined: 27 Apr 20
Posts: 1
Credit: 418,877
RAC: 0
Message 106249 - Posted: 20 May 2022, 22:57:35 UTC

Why am I getting a notice "Waiting to contact project servers"?
ID: 106249 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,402,382
RAC: 19,540
Message 106250 - Posted: 21 May 2022, 0:54:11 UTC - in response to Message 106249.  

Why am I getting a notice "Waiting to contact project servers"?
No idea.
In the Advanced view of BOINC Manager, select the Projects tab, select Rosetta & click on Update.
Then check in Tools, Event log & see what messages are there.
Grant
Darwin NT
ID: 106250 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,534,176
RAC: 10,708
Message 106262 - Posted: 24 May 2022, 22:33:14 UTC

I've got one weird Python task that's been running now for 26hrs, but it has been using the CPU for 25.5hrs and has checkpointed regularly - most recently 8 minutes ago.
I've got no idea why it won't end itself.
Does the watchdog no longer work?
CPU time 1d 02:32:21
CPU time since checkpoint 00:08:12
Elapsed time 1d 01:32:50
Estimated time remaining 01:05:35
Fraction done 95.897%
Virtual memory size 98.97 MB
Working set size 2.79 GB

I'm going to abort it now and see what it reports
It should show here aagb-HPR_pp-NMPHE-GPN_pp-BPRO_pp_6_2605012_6_1
ID: 106262 · Rating: 0
kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 106263 - Posted: 24 May 2022, 22:36:22 UTC - in response to Message 106262.  

Does the .out file in c:\programdata\boinc\slots\[slot number here]\shared change?
ID: 106263 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,534,176
RAC: 10,708
Message 106264 - Posted: 24 May 2022, 22:59:39 UTC - in response to Message 106262.  

CPU time 1d 02:32:21
CPU time since checkpoint 00:08:12
Elapsed time 1d 01:32:50
Estimated time remaining 01:05:35
Fraction done 95.897%
Virtual memory size 98.97 MB
Working set size 2.79 GB

I'm going to abort it now and see what it reports
It should show here aagb-HPR_pp-NMPHE-GPN_pp-BPRO_pp_6_2605012_6_1

Apologies, it's this task, not the one shown above
aagb-PHE_pp-mPIP-GGLY-mB3LEU_3_2686388_6_0
Run time 1 days 2 hours 37 min 11 sec
CPU time 1 days 2 hours 37 min 11 sec
Validate state Invalid
Application version rosetta python projects v1.03 (vbox64)
windows_x86_64
Peak working set size 96.44 MB
Peak swap size 195.97 MB
Peak disk usage 7,948.44 MB

Can anyone spot the error in the task?
ID: 106264 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,534,176
RAC: 10,708
Message 106265 - Posted: 24 May 2022, 23:05:55 UTC - in response to Message 106263.  

Does the .out file in c:\programdata\boinc\slots\[slot number here]\shared change?

Sorry, I didn't see this, but neither do I know what .out file I should look at, nor what slot it was running in, nor know if or how it might've changed.
Task aborted now - I assume the info has gone now?
ID: 106265 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 106266 - Posted: 24 May 2022, 23:08:41 UTC - in response to Message 106264.  

CPU time 1d 02:32:21
CPU time since checkpoint 00:08:12
Elapsed time 1d 01:32:50
Estimated time remaining 01:05:35
Fraction done 95.897%
Virtual memory size 98.97 MB
Working set size 2.79 GB

I'm going to abort it now and see what it reports
It should show here aagb-HPR_pp-NMPHE-GPN_pp-BPRO_pp_6_2605012_6_1

Apologies, it's this task, not the one shown above
aagb-PHE_pp-mPIP-GGLY-mB3LEU_3_2686388_6_0
Run time 1 days 2 hours 37 min 11 sec
CPU time 1 days 2 hours 37 min 11 sec
Validate state Invalid
Application version rosetta python projects v1.03 (vbox64)
windows_x86_64
Peak working set size 96.44 MB
Peak swap size 195.97 MB
Peak disk usage 7,948.44 MB

Can anyone spot the error in the task?

No error I can spot before this line, then several:

Hypervisor System Log:

However, these can be due to the abort.

It may be a task that ran much longer than expected, without anything going wrong. If so, just letting it run a while longer would have let it finish.
ID: 106266 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 106267 - Posted: 24 May 2022, 23:19:07 UTC - in response to Message 106265.  

Does the .out file in c:\programdata\boinc\slots\[slot number here]\shared change?

Sorry, I didn't see this, but neither do I know what .out file I should look at, nor what slot it was running in, nor know if or how it might've changed.
Task aborted now - I assume the info has gone now?

To find the slot number, click on the task in the tasks column, then on Properties.

The info is gone shortly after the output files are uploaded and the task is reported as finished.

The probable change to look for is any change to the dates and size of the .out file.

If there is more than one .out file in the slot directory, look for changes in the dates or size in all of them.
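If you'd rather not eyeball the dates and sizes by hand, something like the sketch below could do it. It is only an illustration: the function names are made up, and the demo uses a temporary folder with a hypothetical task.out file rather than a real BOINC slot.

```python
import tempfile
from pathlib import Path


def snapshot_out_files(directory):
    """Map each *.out file name in directory to its (size, modification time)."""
    result = {}
    for path in Path(directory).glob("*.out"):
        st = path.stat()
        result[path.name] = (st.st_size, st.st_mtime_ns)
    return result


def changed_files(before, after):
    """Names that appeared, disappeared, or whose size or date differs."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))


def demo():
    with tempfile.TemporaryDirectory() as d:
        out = Path(d) / "task.out"          # hypothetical file name
        out.write_text("step 1\n")
        before = snapshot_out_files(d)
        with out.open("a") as fh:           # simulate the task writing more output
            fh.write("step 2\n")
        after = snapshot_out_files(d)
        return changed_files(before, after)


print(demo())  # prints ['task.out']
```

For a real task you would call snapshot_out_files on the slot's shared folder, wait a few minutes, call it again, and pass both results to changed_files. An empty list would suggest the task has stopped writing output.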
ID: 106267 · Rating: 0
kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 106268 - Posted: 24 May 2022, 23:21:45 UTC
Last modified: 24 May 2022, 23:22:30 UTC

You can copy the .out file twice, waiting several minutes between copies, and then compare the two copies with WinMerge.
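If WinMerge isn't installed, the same copy-twice-and-compare idea can be sketched with Python's standard difflib module instead. The function name and the task.out file below are hypothetical, and the demo fakes the wait by appending a line between the two copies.

```python
import difflib
import shutil
import tempfile
from pathlib import Path


def diff_copies(first_copy, second_copy):
    """Return unified-diff lines between two copies of a text file."""
    a = Path(first_copy).read_text(errors="replace").splitlines()
    b = Path(second_copy).read_text(errors="replace").splitlines()
    return list(difflib.unified_diff(a, b, "first copy", "second copy", lineterm=""))


def demo():
    with tempfile.TemporaryDirectory() as d:
        out = Path(d) / "task.out"              # hypothetical file name
        out.write_text("step 1\n")
        copy1 = Path(shutil.copy(out, Path(d) / "copy1.out"))
        with out.open("a") as fh:               # stands in for waiting several minutes
            fh.write("step 2\n")
        copy2 = Path(shutil.copy(out, Path(d) / "copy2.out"))
        return diff_copies(copy1, copy2)


for line in demo():
    print(line)
```

Lines starting with "+" are ones that only exist in the second copy, so any such lines mean the task was still writing output. An empty diff means nothing changed between the copies.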
ID: 106268 · Rating: 0
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 106269 - Posted: 25 May 2022, 0:19:29 UTC

Looks like Rosetta 4.2 just got a batch of miniprotein in, grab them while they iz hot.
Front page job queue went up by millions.
ID: 106269 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org