Problems and Technical Issues with Rosetta@home

Sid Celery

Joined: 11 Feb 08
Posts: 1693
Credit: 31,654,546
RAC: 19,869
Message 101411 - Posted: 21 Apr 2021, 0:08:50 UTC - in response to Message 101390.  

Using the number of tasks In Progress as a proxy for how successful people are at downloading tasks
In March, the figure was 550k
When all the problems began, the figure dropped to around 318k - a loss of 41%
Today the figure is around 360k - loss reduced to 34.5%

Currently 384k in progress - loss reduced to 30%
<guessing> maybe back-up project tasks are being replaced by Rosetta? Every little helps
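The percentages above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: it treats the March figure of 550k as an exact baseline, which is an assumption, since all the posted figures are rounded. Computed that way, the 318k figure comes out nearer 42% than 41%.

```python
def loss_percent(baseline: int, current: int) -> float:
    """Return the drop from baseline to current as a percentage of baseline."""
    return (baseline - current) / baseline * 100


BASELINE = 550_000  # "In Progress" figure quoted for March (assumed exact)

for label, in_progress in [
    ("when the problems began", 318_000),
    ("at the earlier post", 360_000),
    ("currently", 384_000),
]:
    print(f"{label}: {loss_percent(BASELINE, in_progress):.1f}% below March")
```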
ID: 101411
Sid Celery

Joined: 11 Feb 08
Posts: 1693
Credit: 31,654,546
RAC: 19,869
Message 101412 - Posted: 21 Apr 2021, 0:39:34 UTC - in response to Message 101382.  

I've been in contact with Project admins and this was a deliberate change, not a misconfiguration.
It's been looked at more closely and brought down to a figure nearer 4GB - hopefully we see the result of that soon.
I note In Progress tasks are edging up, but let's see how that pans out.

There was obviously a need for that change, but I don't know what it is.
I've asked if a brief note can be posted to explain what they're working on that requires the increase.
No idea when or if that will happen.

I noticed the dud tasks have stopped coming down. Well done for getting them removed.

I thought the increased memory and disk space requirement was deliberate. The project clearly think they'll have some work that needs that much memory and/or disk space. Pity for the machines that don't have more than 4GB, but I guess it can't be helped unless they want to split tasks into small and large types with separate queues of work. That would probably be a lot of work on the project side for not much gain. I've taken my 4GB Pi 4s out of my Pi cluster.

There had been some talk of larger tasks for more capable machines in the past. You may well be right that it was an attempt to provide them.
But machines with lower available resources seem to get nothing at all, rather than being offered only low-resource-requirement tasks.
And now it seems <everything> needs large resources.

I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet.
Perhaps it needs info from the host requesting tasks first. But I'm guessing again.
ID: 101412
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1074
Credit: 12,210,727
RAC: 23,052
Message 101414 - Posted: 21 Apr 2021, 7:33:34 UTC - in response to Message 101391.  

I've had a couple of miniprotein_relax8_ error out after a while with a similar error message
Haven't all those tasks been aborted by the server now?
They were still going through yesterday, but given the low percentage of errors i didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system.
Grant
Darwin NT
ID: 101414
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1074
Credit: 12,210,727
RAC: 23,052
Message 101415 - Posted: 21 Apr 2021, 7:41:06 UTC - in response to Message 101403.  

SSD Endurance Experiment
I've read many articles complaining that SSDs last nowhere near as long as HDDs. A few HDDs do fail unexpectedly, but SSDs wear out, because they have a finite number of writes. They cannot possibly last longer than that time.
And as i indicated with that link i posted, you are talking about decades for normal drives under normal usage conditions.
Depends what you mean by normal. Mine has a security camera recording onto it, two graphics cards and a 24 core CPU doing Boinc, I record TV to it, .... I guess there are some people who just play solitaire and use email, those might last that long.
The disk I/O from BOINC projects is bugger all as a factor of DWPD (Drive Writes Per Day), even for a system with 64 cores/128 threads all in use.
And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed.
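For a rough sense of scale, endurance is usually quoted either as a TBW (terabytes written) rating or as DWPD. Below is a minimal sketch of the arithmetic. Every number in it is a made-up assumption for illustration, not a figure from any real drive or from this thread.

```python
def drive_writes_per_day(daily_writes_gb: float, capacity_gb: float) -> float:
    """Fraction of the drive's full capacity that is written each day."""
    return daily_writes_gb / capacity_gb


def years_until_rated_writes(tbw_rating_tb: float, daily_writes_gb: float) -> float:
    """Years needed to reach the rated write total at a constant daily rate."""
    return tbw_rating_tb * 1000 / daily_writes_gb / 365


# Hypothetical drive and workload: 1 TB capacity, 600 TBW rating, 50 GB/day.
CAPACITY_GB = 1000
TBW_RATING_TB = 600
DAILY_WRITES_GB = 50

print(f"DWPD: {drive_writes_per_day(DAILY_WRITES_GB, CAPACITY_GB):.2f}")
print(f"Years to rated writes: "
      f"{years_until_rated_writes(TBW_RATING_TB, DAILY_WRITES_GB):.1f}")
```

Swap in the rating from your own drive's spec sheet and the daily write total your system actually reports to see where you stand.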
Grant
Darwin NT
ID: 101415
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1074
Credit: 12,210,727
RAC: 23,052
Message 101416 - Posted: 21 Apr 2021, 7:50:04 UTC - in response to Message 101412.  
Last modified: 21 Apr 2021, 8:08:58 UTC

I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet.
Perhaps it needs info from the host requesting tasks first. But I'm guessing again.
There's a simple quick & dirty method that would be easy for the project to implement.
The present application is v 4.2x
The project compiles another copy, exactly the same, and calls it v5.2x and uses that one for processing large RAM requirement Tasks.

In the Rosetta@home preferences they give the option of which version to run. The default for current & new users is v4.2x
People can choose to also process large RAM tasks by selecting v5.2x

eg
Default settings
                                         Run only the selected applications Rosetta v4: yes
                                                                            Rosetta v5: no
If no work for selected applications is available, accept work from other applications? no


Settings for those that choose to run large RAM Tasks.
                                         Run only the selected applications Rosetta v4: yes
                                                                            Rosetta v5: yes
If no work for selected applications is available, accept work from other applications? no

People can also choose to run just the one type, but do the other type if their preferred type isn't available at the time they request work, by setting the bottom line "If no work..." to yes.

When a Work Unit is created, the researcher flags which application needs to be used to process it- Regular or large RAM requirement. That way any Task that requires large amounts of RAM, will only go to systems that are capable of handling it (if the user pays attention to the requirements before selecting the option to do those types of Tasks....).


Of course when they move beyond v4, they'd need to go to v6 for regular Tasks, and v7 for large RAM Tasks, and update the Rosetta preferences page, and let people know what's happening before hand.
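The selection logic described above could be sketched roughly as follows. This is not how the BOINC scheduler is actually implemented. The application names, the Preferences class and pick_task are all hypothetical, made up purely to show the idea.

```python
from dataclasses import dataclass

# Hypothetical application names standing in for the "v4" / "v5" split.
REGULAR = "rosetta_v4"
LARGE_RAM = "rosetta_v5"


@dataclass(frozen=True)
class Preferences:
    selected_apps: frozenset         # applications the user ticked
    accept_other_apps: bool = False  # the "If no work..." setting


DEFAULT_PREFS = Preferences(frozenset({REGULAR}))
LARGE_RAM_PREFS = Preferences(frozenset({REGULAR, LARGE_RAM}))


def pick_task(prefs, queue):
    """Return the first queued (task_name, app) pair this user may run, or None.

    Tasks for a selected application come first. Tasks for other applications
    are offered only when no selected application has work queued and the
    user has opted in.
    """
    for task in queue:
        if task[1] in prefs.selected_apps:
            return task
    if prefs.accept_other_apps and queue:
        return queue[0]
    return None


# The researcher flags each Work Unit with the application it needs.
queue = [("big_job_1", LARGE_RAM), ("small_job_1", REGULAR)]
print(pick_task(DEFAULT_PREFS, queue))    # default users skip the large task
print(pick_task(LARGE_RAM_PREFS, queue))  # opted-in users can take it
```

With the default settings a host is never handed a large RAM Task, even when that is the only work in the queue.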
Grant
Darwin NT
ID: 101416
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 0
Message 101420 - Posted: 21 Apr 2021, 17:28:44 UTC - in response to Message 101405.  

From Sid Celery 31 Mar9 Apr

I've regularly found my own PCs have rebooted overnight due to these faulty tasks.


I've never considered that being the cause of a reboot before...hmmmmm light bulb going off icon needed!!!


The only reboots I've had is that criminally auto-rebooting Windows 10. I've thwarted that though. My updates are "managed by my organisation" or so it thinks.


That's funny....you actually thinking MS gives a crap about what YOU, or your organization, wants to do with THEIR software. I hope it works for you I really really do but past history suggests MS just ups the priority of their updates and you get unwanted ones anyway because it serves their tracking needs.
It's my computer and they can't make me do anything, including pay for it.
ID: 101420
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 0
Message 101421 - Posted: 21 Apr 2021, 17:30:35 UTC - in response to Message 101411.  

Using the number of tasks In Progress as a proxy for how successful people are at downloading tasks
In March, the figure was 550k
When all the problems began, the figure dropped to around 318k - a loss of 41%
Today the figure is around 360k - loss reduced to 34.5%

Currently 384k in progress - loss reduced to 30%
<guessing> maybe back-up project tasks are being replaced by Rosetta? Every little helps
It could also be people manually doing other things. I sometimes like to concentrate on one project. If that runs out of work, I'll pick another and might not be back for a while. Somebody just knocked me into 3rd place elsewhere, this will not do, back in a week....
ID: 101421
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 0
Message 101422 - Posted: 21 Apr 2021, 17:32:32 UTC - in response to Message 101415.  

SSD Endurance Experiment
I've read many articles complaining that SSDs last nowhere near as long as HDDs. A few HDDs do fail unexpectedly, but SSDs wear out, because they have a finite number of writes. They cannot possibly last longer than that time.
And as i indicated with that link i posted, you are talking about decades for normal drives under normal usage conditions.
Depends what you mean by normal. Mine has a security camera recording onto it, two graphics cards and a 24 core CPU doing Boinc, I record TV to it, .... I guess there are some people who just play solitaire and use email, those might last that long.
The disk I/O from BOINC projects is bugger all as a factor of DWPD (Drive Writes Per Day), even for a system with 64 cores/128 threads all in use.
And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed.
I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new!
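As a toy illustration of the 80%/20% scenario, the sketch below compares a drive that only rotates writes among its free blocks with one that also relocates long-lived ("cold") data, which is generally called static wear levelling. It is not a model of any real controller: the block count, the threshold and the one-erase relocation cost are all assumptions.

```python
def simulate(writes, total_blocks=10, static_blocks=8,
             static_levelling=True, threshold=10):
    """Return per-block erase counts after `writes` host writes.

    Blocks 0..static_blocks-1 start out holding data that never changes
    ("cold" data); the remaining blocks are free for new writes.
    """
    erase_counts = [0] * total_blocks
    cold = set(range(static_blocks))
    for _ in range(writes):
        free = [b for b in range(total_blocks) if b not in cold]
        # Dynamic wear levelling: always write to the least-worn free block.
        target = min(free, key=lambda b: erase_counts[b])
        erase_counts[target] += 1
        if static_levelling:
            freshest_cold = min(cold, key=lambda b: erase_counts[b])
            if erase_counts[target] - erase_counts[freshest_cold] > threshold:
                # Static wear levelling: move the cold data onto the worn
                # block so the barely used block under it takes new writes.
                erase_counts[target] += 1  # assumed cost of the relocation
                cold.remove(freshest_cold)
                cold.add(target)
    return erase_counts


without = simulate(1000, static_levelling=False)
with_levelling = simulate(1000, static_levelling=True)
print("max erases, no static levelling:  ", max(without))
print("max erases, with static levelling:", max(with_levelling))
```

In this toy model the first case wears out two blocks while the other eight stay untouched, and the second spreads the same workload over all ten.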
ID: 101422
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 0
Message 101423 - Posted: 21 Apr 2021, 17:35:34 UTC - in response to Message 101416.  
Last modified: 21 Apr 2021, 17:35:54 UTC

I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet.
Perhaps it needs info from the host requesting tasks first. But I'm guessing again.
There's a simple quick & dirty method that would be easy for the project to implement.
The present application is v 4.2x
The project compiles another copy, exactly the same, and calls it v5.2x and uses that one for processing large RAM requirement Tasks.

In the Rosetta@home preferences they give the option of which version to run. The default for current & new users is v4.2x
People can choose to also process large RAM tasks by selecting v5.2x

eg
Default settings
                                         Run only the selected applications Rosetta v4: yes
                                                                            Rosetta v5: no
If no work for selected applications is available, accept work from other applications? no


Settings for those that choose to run large RAM Tasks.
                                         Run only the selected applications Rosetta v4: yes
                                                                            Rosetta v5: yes
If no work for selected applications is available, accept work from other applications? no

People can also choose to run just the one type, but do the other type if their preferred type isn't available at the time they request work, by setting the bottom line "If no work..." to yes.

When a Work Unit is created, the researcher flags which application needs to be used to process it- Regular or large RAM requirement. That way any Task that requires large amounts of RAM, will only go to systems that are capable of handling it (if the user pays attention to the requirements before selecting the option to do those types of Tasks....).


Of course when they move beyond v4, they'd need to go to v6 for regular Tasks, and v7 for large RAM Tasks, and update the Rosetta preferences page, and let people know what's happening before hand.
An excellent idea, although doesn't the server know how much RAM I have? It does, I just checked in the likes of this:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3746264
It could only send large tasks to people who have say at least 16GB of RAM, or even xGB RAM per core.
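A server-side check along those lines might look something like this. It is only a sketch: the function name is hypothetical, the 16GB default is the figure suggested above, and the per-core limit is left as a parameter because no value has been proposed for it. Requiring both rules when a per-core limit is given is a design choice made here, not something the project has said.

```python
from typing import Optional


def eligible_for_large_tasks(ram_gb: float, cores: int,
                             min_total_gb: float = 16.0,
                             min_gb_per_core: Optional[float] = None) -> bool:
    """Decide whether a host should be offered large RAM tasks.

    The total-RAM rule always applies. The per-core rule applies only when
    a limit is passed in.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if ram_gb < min_total_gb:
        return False
    if min_gb_per_core is not None and ram_gb / cores < min_gb_per_core:
        return False
    return True


print(eligible_for_large_tasks(ram_gb=4, cores=4))    # a 4GB Pi 4
print(eligible_for_large_tasks(ram_gb=32, cores=8))   # a mid-range desktop
print(eligible_for_large_tasks(ram_gb=16, cores=64, min_gb_per_core=2.0))
```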
ID: 101423
adrianxw
Joined: 18 Sep 05
Posts: 616
Credit: 10,661,208
RAC: 4,664
Message 101425 - Posted: 21 Apr 2021, 20:31:20 UTC

I've had a couple of work units crash out after 30-40 seconds this afternoon. Exit status 0x00000001.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 101425
mikey
Joined: 5 Jan 06
Posts: 1863
Credit: 5,980,047
RAC: 123
Message 101427 - Posted: 21 Apr 2021, 22:52:55 UTC - in response to Message 101420.  

From Sid Celery 31 Mar9 Apr

I've regularly found my own PCs have rebooted overnight due to these faulty tasks.


I've never considered that being the cause of a reboot before...hmmmmm light bulb going off icon needed!!!


The only reboots I've had is that criminally auto-rebooting Windows 10. I've thwarted that though. My updates are "managed by my organisation" or so it thinks.


That's funny....you actually thinking MS gives a crap about what YOU, or your organization, wants to do with THEIR software. I hope it works for you I really really do but past history suggests MS just ups the priority of their updates and you get unwanted ones anyway because it serves their tracking needs.


It's my computer and they can't make me do anything, including pay for it.


LOL!!! They are Borg and you will be assimilated!!!
ID: 101427
mrhastyrib

Joined: 18 Feb 21
Posts: 87
Credit: 1,976,106
RAC: 10,453
Message 101428 - Posted: 21 Apr 2021, 23:21:00 UTC - in response to Message 101416.  

let people know what's happening before hand.

HA HA HA HA HA!! [wipes tears from eyes] HOO HEE HA HA HA!!!
ID: 101428
mrhastyrib

Joined: 18 Feb 21
Posts: 87
Credit: 1,976,106
RAC: 10,453
Message 101429 - Posted: 21 Apr 2021, 23:29:04 UTC - in response to Message 101422.  

wear levelling


I don't know exactly how they work,

It was literally right there in the post that you quoted.
ID: 101429
Sid Celery

Joined: 11 Feb 08
Posts: 1693
Credit: 31,654,546
RAC: 19,869
Message 101433 - Posted: 22 Apr 2021, 1:01:40 UTC - in response to Message 101414.  

I've had a couple of miniprotein_relax8_ error out after a while with a similar error message
Haven't all those tasks been aborted by the server now?
They were still going through yesterday, but given the low percentage of errors i didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system.

You make a good point tbf.
I'm getting even more errors atm, but without the rebooting of the PC.
Something bad definitely going on with my machine, but with everything else happening it's been hard for me to determine the cause up to now.
Time for some tweaking #brb
ID: 101433
robertmiles

Joined: 16 Jun 08
Posts: 1085
Credit: 12,386,752
RAC: 2,130
Message 101434 - Posted: 22 Apr 2021, 1:13:22 UTC - in response to Message 101425.  
Last modified: 22 Apr 2021, 1:22:41 UTC

I've had a couple of work units crash out after 30-40 seconds this afternoon. Exit status 0x00000001.

These tasks?

https://boinc.bakerlab.org/rosetta/results.php?hostid=3117659&offset=0&show_names=0&state=6&appid=

If so, the task logs appear to show that attempting to extract one input file each from the database failed, probably because they weren't in the database.
ID: 101434
Sid Celery

Joined: 11 Feb 08
Posts: 1693
Credit: 31,654,546
RAC: 19,869
Message 101436 - Posted: 22 Apr 2021, 1:48:09 UTC - in response to Message 101416.  

I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet.
Perhaps it needs info from the host requesting tasks first. But I'm guessing again.
There's a simple quick & dirty method that would be easy for the project to implement.
The present application is v 4.2x
The project compiles another copy, exactly the same, and calls it v5.2x and uses that one for processing large RAM requirement Tasks.

In the Rosetta@home preferences they give the option of which version to run. The default for current & new users is v4.2x
People can choose to also process large RAM tasks by selecting v5.2x

eg

Neat idea. I'll let you tell them...
ID: 101436
Sid Celery

Joined: 11 Feb 08
Posts: 1693
Credit: 31,654,546
RAC: 19,869
Message 101437 - Posted: 22 Apr 2021, 1:49:24 UTC - in response to Message 101433.  

I've had a couple of miniprotein_relax8_ error out after a while with a similar error message
Haven't all those tasks been aborted by the server now?
They were still going through yesterday, but given the low percentage of errors i didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system.

You make a good point tbf.
I'm getting even more errors atm, but without the rebooting of the PC.
Something bad definitely going on with my machine, but with everything else happening it's been hard for me to determine the cause up to now.
Time for some tweaking #brb

And seconds after posting, my PC blue-screened. Almost like it knew I was talking about it.
Tweak done - let's see how it goes.
ID: 101437
robertmiles

Joined: 16 Jun 08
Posts: 1085
Credit: 12,386,752
RAC: 2,130
Message 101438 - Posted: 22 Apr 2021, 4:44:14 UTC - in response to Message 101416.  

I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet.
Perhaps it needs info from the host requesting tasks first. But I'm guessing again.
There's a simple quick & dirty method that would be easy for the project to implement.
The present application is v 4.2x
The project compiles another copy, exactly the same, and calls it v5.2x and uses that one for processing large RAM requirement Tasks.

In the Rosetta@home preferences they give the option of which version to run. The default for current & new users is v4.2x
People can choose to also process large RAM tasks by selecting v5.2x

I'd prefer to see the large RAM tasks marked by adding a letter to the application name, so that it does not interfere with the version numbering. This allows adding a different letter for yet another RAM size. Otherwise, a good idea.
ID: 101438
adrianxw
Joined: 18 Sep 05
Posts: 616
Credit: 10,661,208
RAC: 4,664
Message 101439 - Posted: 22 Apr 2021, 6:24:47 UTC - in response to Message 101434.  

>>> These tasks?

Err, yes, err, obviously...

>>> If so, the task logs appear to show that attempting to extract one input file each from the database failed, probably because they weren't in the database.

Indeed, it was a problem with Rosetta@home, which is why I mentioned it in the thread called "Problems and Technical Issues with Rosetta@home"...
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 101439
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1074
Credit: 12,210,727
RAC: 23,052
Message 101441 - Posted: 22 Apr 2021, 7:01:55 UTC - in response to Message 101422.  
Last modified: 22 Apr 2021, 7:09:19 UTC

And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed.
I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new!
Of course if you were to treat a HDD the way you described, you would considerably shorten its life expectancy as well.
HDDs that spend all their time thrashing tend to die very young.

You did see where i wrote about leaving sufficient free space? At least 30%? If there is only 20% free space on the drive (SSD or HDD) it is for all intents and purposes full and should be replaced with a much larger unit, or more space freed up.
Grant
Darwin NT
ID: 101441


©2021 University of Washington
https://www.bakerlab.org