Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 101422 - Posted: 21 Apr 2021, 17:32:32 UTC - in response to Message 101415.  

SSD Endurance Experiment
I've read many articles complaining that SSDs last nowhere near as long as HDDs. A few HDDs do fail unexpectedly, but SSDs wear out, because they have a finite number of writes. They cannot possibly last longer than that time.
And as I indicated with that link I posted, you are talking about decades for normal drives under normal usage conditions.
Depends what you mean by normal. Mine has a security camera recording onto it, two graphics cards and a 24-core CPU doing BOINC, I record TV to it, .... I guess there are some people who just play solitaire and use email; those might last that long.
The disk I/O from BOINC projects is bugger all as a factor of DWPD (Drive Writes Per Day), even for a system with 64 cores/128 threads all in use.
And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed.
I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new!
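
For anyone who wants rough numbers on the write-endurance point, here's a minimal back-of-the-envelope sketch. The TBW rating, daily write volume and drive size below are illustrative assumptions, not figures for any particular drive:

# Rough SSD lifetime estimate from a TBW (terabytes written) endurance rating.
# All numbers are assumed for illustration only.
tbw_rating_tb = 600        # e.g. a plausible rating for a 1 TB consumer SSD (assumed)
daily_writes_gb = 50       # camera + TV recording + BOINC scratch writes per day (assumed)

days_to_rated_wear = (tbw_rating_tb * 1000) / daily_writes_gb
print(f"~{days_to_rated_wear / 365:.0f} years to reach the rated TBW")   # ~33 years

At those assumed figures the rated endurance is roughly 33 years away, which is the "decades under normal usage" point being made above.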

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 101423 - Posted: 21 Apr 2021, 17:35:34 UTC - in response to Message 101416.  
Last modified: 21 Apr 2021, 17:35:54 UTC

I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet.
Perhaps it needs info from the host requesting tasks first. But I'm guessing again.
There's a simple quick & dirty method that would be easy for the project to implement.
The present application is v4.2x.
The project compiles another copy, exactly the same, calls it v5.2x, and uses that one for processing Tasks with large RAM requirements.

In the Rosetta@home preferences they give the option of which version to run. The default for current & new users is v4.2x.
People can choose to also process large RAM tasks by selecting v5.2x.

eg
Default settings
                                         Run only the selected applications Rosetta v4: yes
                                                                            Rosetta v5: no
If no work for selected applications is available, accept work from other applications? no


Settings for those that choose to run large RAM Tasks.
                                         Run only the selected applications Rosetta v4: yes
                                                                            Rosetta v5: yes
If no work for selected applications is available, accept work from other applications? no

People can also choose to run just the one type, but do the other type if their preferred type isn't available at the time they request work, by setting the bottom line "If no work..." to yes.

When a Work Unit is created, the researcher flags which application needs to be used to process it: regular or large RAM requirement. That way any Task that requires large amounts of RAM will only go to systems that are capable of handling it (if the user pays attention to the requirements before selecting the option to do those types of Tasks....).


Of course when they move beyond v4, they'd need to go to v6 for regular Tasks and v7 for large RAM Tasks, update the Rosetta preferences page, and let people know what's happening beforehand.
An excellent idea, although doesn't the server know how much RAM I have? It does; I just checked in the likes of this:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3746264
It could send large tasks only to people who have, say, at least 16GB of RAM, or even x GB of RAM per core.
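
To make that concrete, here is a minimal sketch of the kind of server-side check being suggested. This is purely illustrative logic; the function and threshold names are hypothetical, and it is not BOINC's actual scheduler code (which works from the memory bound set on each work unit and the RAM the host reports with every request):

# Hypothetical sketch: only hand out a large-RAM task if the host can hold it.
# The thresholds (16 GB total, 2 GB per core) are example values, not project policy.
def can_take_large_ram_task(host_ram_gb: float, host_cores: int, task_ram_gb: float,
                            min_total_gb: float = 16.0, min_gb_per_core: float = 2.0) -> bool:
    if host_ram_gb < min_total_gb:
        return False
    if host_ram_gb / host_cores < min_gb_per_core:
        return False
    return host_ram_gb >= task_ram_gb

# Example: a 16 GB, 8-core host offered a 4 GB task
print(can_take_large_ram_task(16, 8, 4))   # True
print(can_take_large_ram_task(8, 8, 4))    # False - under the 16 GB floor

The host RAM is already in the scheduler request (that's what the host detail page above is showing), so the missing piece is only a per-Work-Unit flag for its memory requirement, which is what the two-application idea works around.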

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 2
Message 101425 - Posted: 21 Apr 2021, 20:31:20 UTC

I've had a couple of work units crash out after 30-40 seconds this afternoon. Exit status 0x00000001.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,217,610
RAC: 822
Message 101427 - Posted: 21 Apr 2021, 22:52:55 UTC - in response to Message 101420.  

From Sid Celery, 31 Mar / 9 Apr:

I've regularly found my own PCs have rebooted overnight due to these faulty tasks.


I've never considered that as the cause of a reboot before... hmmmm, light bulb going off icon needed!!!


The only reboots I've had are from that criminally auto-rebooting Windows 10. I've thwarted that though. My updates are "managed by my organisation", or so it thinks.


That's funny.... you actually thinking MS gives a crap about what YOU, or your organization, wants to do with THEIR software. I hope it works for you, I really really do, but past history suggests MS just ups the priority of their updates and you get unwanted ones anyway, because it serves their tracking needs.


It's my computer and they can't make me do anything, including pay for it.


LOL!!! They are Borg and you will be assimilated!!!

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 101428 - Posted: 21 Apr 2021, 23:21:00 UTC - in response to Message 101416.  

let people know what's happening beforehand.

HA HA HA HA HA!! [wipes tears from eyes] HOO HEE HA HA HA!!!

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 101429 - Posted: 21 Apr 2021, 23:29:04 UTC - in response to Message 101422.  

wear levelling


I don't know exactly how they work,

It was literally right there in the post that you quoted.

Sid Celery
Joined: 11 Feb 08
Posts: 2146
Credit: 41,570,180
RAC: 8,210
Message 101433 - Posted: 22 Apr 2021, 1:01:40 UTC - in response to Message 101414.  

I've had a couple of miniprotein_relax8_ error out after a while with a similar error message
Haven't all those tasks been aborted by the server now?
They were still going through yesterday, but given the low percentage of errors I didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system.

You make a good point tbf.
I'm getting even more errors atm, but without the rebooting of the PC.
Something bad is definitely going on with my machine, but with everything else happening it's been hard for me to determine the cause up to now.
Time for some tweaking #brb

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 826
Message 101434 - Posted: 22 Apr 2021, 1:13:22 UTC - in response to Message 101425.  
Last modified: 22 Apr 2021, 1:22:41 UTC

I've had a couple of work units crash out after 30-40 seconds this afternoon. Exit status 0x00000001.

These tasks?

https://boinc.bakerlab.org/rosetta/results.php?hostid=3117659&offset=0&show_names=0&state=6&appid=

If so, the task logs appear to show that attempting to extract one input file each from the database failed, probably because they weren't in the database.

Sid Celery
Joined: 11 Feb 08
Posts: 2146
Credit: 41,570,180
RAC: 8,210
Message 101436 - Posted: 22 Apr 2021, 1:48:09 UTC - in response to Message 101416.  

I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet.
Perhaps it needs info from the host requesting tasks first. But I'm guessing again.
There's a simple quick & dirty method that would be easy for the project to implement.
The present application is v4.2x.
The project compiles another copy, exactly the same, calls it v5.2x, and uses that one for processing Tasks with large RAM requirements.

In the Rosetta@home preferences they give the option of which version to run. The default for current & new users is v4.2x.
People can choose to also process large RAM tasks by selecting v5.2x.

eg

Neat idea. I'll let you tell them...

Sid Celery
Joined: 11 Feb 08
Posts: 2146
Credit: 41,570,180
RAC: 8,210
Message 101437 - Posted: 22 Apr 2021, 1:49:24 UTC - in response to Message 101433.  

I've had a couple of miniprotein_relax8_ error out after a while with a similar error message
Haven't all those tasks been aborted by the server now?
They were still going through yesterday, but given the low percentage of errors I didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system.

You make a good point tbf.
I'm getting even more errors atm, but without the rebooting of the PC.
Something bad is definitely going on with my machine, but with everything else happening it's been hard for me to determine the cause up to now.
Time for some tweaking #brb

And seconds after posting, my PC blue-screened. Almost like it knew I was talking about it.
Tweak done - let's see how it goes.

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 826
Message 101438 - Posted: 22 Apr 2021, 4:44:14 UTC - in response to Message 101416.  

I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet.
Perhaps it needs info from the host requesting tasks first. But I'm guessing again.
There's a simple quick & dirty method that would be easy for the project to implement.
The present application is v4.2x.
The project compiles another copy, exactly the same, calls it v5.2x, and uses that one for processing Tasks with large RAM requirements.

In the Rosetta@home preferences they give the option of which version to run. The default for current & new users is v4.2x.
People can choose to also process large RAM tasks by selecting v5.2x.

I'd prefer to see the large RAM tasks marked by adding a letter to the application name, so that it does not interfere with the version numbering. This allows adding a different letter for yet another RAM size. Otherwise, a good idea.

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 2
Message 101439 - Posted: 22 Apr 2021, 6:24:47 UTC - in response to Message 101434.  

>>> These tasks?

Err, yes, err, obviously...

>>> If so, the task logs appear to show that attempting to extract one input file each from the database failed, probably because they weren't in the database.

Indeed, it was a problem with Rosetta@home, which is why I mentioned it in the thread called "Problems and Technical Issues with Rosetta@home"...
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 101441 - Posted: 22 Apr 2021, 7:01:55 UTC - in response to Message 101422.  
Last modified: 22 Apr 2021, 7:09:19 UTC

And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed.
I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new!
Of course if you were to treat an HDD the way you described, you would considerably shorten its life expectancy as well.
HDDs that spend all their time thrashing tend to die very young.

You did see where I wrote about leaving sufficient free space? At least 30%? If there is only 20% free space on the drive (SSD or HDD) it is for all intents and purposes full, and should be replaced with a much larger unit, or more space freed up.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 101442 - Posted: 22 Apr 2021, 7:07:56 UTC - in response to Message 101425.  

I've had a couple of work units crash out after 30-40 seconds this afternoon. Exit status 0x00000001.

I've had 1 of the pre_helical_bundles_round1_attempt1_ Tasks do that.

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_9pf3ry4f.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_9pf3ry4f.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_9pf3ry4f.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2417999
Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: [ERROR] Unable to open constraints file: e8132c30c9ee547672281ce157b2ec8d_0001.MSAcst
ERROR:: Exit from: ......\src\core\scoring\constraints\ConstraintIO.cc line: 457
BOINC:: Error reading and gzipping output datafile: default.out
13:08:50 (9888): called boinc_finish(1)

</stderr_txt>
]]>

I've had another 50+ that completed & Validated without issue.
Grant
Darwin NT

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 101447 - Posted: 22 Apr 2021, 17:45:30 UTC - in response to Message 101427.  

That's funny.... you actually thinking MS gives a crap about what YOU, or your organization, wants to do with THEIR software. I hope it works for you, I really really do, but past history suggests MS just ups the priority of their updates and you get unwanted ones anyway, because it serves their tracking needs.
It's my computer and they can't make me do anything, including pay for it.
LOL!!! They are Borg and you will be assimilated!!!
Gates looks like one.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 101448 - Posted: 22 Apr 2021, 17:47:36 UTC - in response to Message 101429.  

wear levelling
I don't know exactly how they work,
It was literally right there in the post that you quoted.
It did not say how the drive internally replaces broken sectors. But if it was done well, the disk would last for decades, gradually getting smaller.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 101449 - Posted: 22 Apr 2021, 17:48:47 UTC - in response to Message 101433.  

I've had a couple of miniprotein_relax8_ error out after a while with a similar error message
Haven't all those tasks been aborted by the server now?
They were still going through yesterday, but given the low percentage of errors I didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system.

You make a good point tbf.
I'm getting even more errors atm, but without the rebooting of the PC.
Something bad is definitely going on with my machine, but with everything else happening it's been hard for me to determine the cause up to now.
Time for some tweaking #brb
Never ever overclock. A lot of pain and frustration for 10% more power and 50% less lifespan.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 101450 - Posted: 22 Apr 2021, 17:50:46 UTC - in response to Message 101441.  

And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed.
I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new!
Of course if you were to treat an HDD the way you described, you would considerably shorten its life expectancy as well.
HDDs that spend all their time thrashing tend to die very young.

You did see where I wrote about leaving sufficient free space? At least 30%? If there is only 20% free space on the drive (SSD or HDD) it is for all intents and purposes full, and should be replaced with a much larger unit, or more space freed up.
Nah, I wait until it goes red in the file manager, then get round to cleaning it up sometime within the next week or two. It's not full until it's full.

Sid Celery
Joined: 11 Feb 08
Posts: 2146
Credit: 41,570,180
RAC: 8,210
Message 101452 - Posted: 23 Apr 2021, 7:44:23 UTC - in response to Message 101449.  

I've had a couple of miniprotein_relax8_ error out after a while with a similar error message
Haven't all those tasks been aborted by the server now?
They were still going through yesterday, but given the low percentage of errors I didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system.

You make a good point tbf.
I'm getting even more errors atm, but without the rebooting of the PC.
Something bad is definitely going on with my machine, but with everything else happening it's been hard for me to determine the cause up to now.
Time for some tweaking #brb
Never ever overclock. A lot of pain and frustration for 10% more power and 50% less lifespan.

I happen to like pain and frustration...
I know what you mean, but I'm finding a lot more improvement than that. And given my last PC lasted about 7yrs while running OC'd pretty much 24/7 throughout, I don't think it's true that it reduces the longevity of the CPU.
In fact, when I've been experimenting and the oc gets knocked out, I'm seeing what base clock looks like and it's not a pretty sight.
Each to their own, but I'm staying weird on this one

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,217,610
RAC: 822
Message 101456 - Posted: 23 Apr 2021, 12:36:08 UTC - in response to Message 101450.  

And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed.
I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new!
Of course if you were to treat an HDD the way you described, you would considerably shorten its life expectancy as well.
HDDs that spend all their time thrashing tend to die very young.

You did see where I wrote about leaving sufficient free space? At least 30%? If there is only 20% free space on the drive (SSD or HDD) it is for all intents and purposes full, and should be replaced with a much larger unit, or more space freed up.


Nah, I wait until it goes red in the file manager, then get round to cleaning it up sometime within the next week or two. It's not full until it's full.


So in essence you are causing your hard drives to die sooner by not taking care of them properly? Why would you wait until it gets into the red, when you know that causes a lot of thrashing as bits of files are spread out everywhere across the drive, causing a lot of wear and tear on your drives? With the process you've described earlier about what it takes to get a new drive and machine up and running, why not just use a bigger hard drive and give it years and years of use before you have to then go to an even bigger one or clean it up? You could even set up a swap drive on an SSD or fast regular hard drive and let all the thrashing take place over there. In a perfect world you could even set up a swap drive in RAM, but that would take a lot of money in today's world of expensive DDR4 memory. Even DDR3 memory is not cheap when you get up into the 8GB sticks.