Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new! The disk I/O from BOINC projects is bugger all as a factor of DWPD (Drive Writes Per Day), even for a system with 64 cores/128 threads all in use. Depends what you mean by normal. Mine has a security camera recording onto it, two graphics cards and a 24 core CPU doing Boinc, I record TV to it, .... I guess there are some people who just play solitaire and use email, those might last that long. And as i indicated with that link i posted, you are talking about decades for normal drives under normal usage conditions. SSD Endurance Experiment I've read many articles complaining that SSDs last nowhere near as long as HDDs. A few HDDs do fail unexpectedly, but SSDs wear out, because they have a finite number of writes. They cannot possibly last longer than that time. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
An excellent idea, although doesn't the server know how much RAM I have? It does, I just checked in the likes of this: I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet. There's a simple quick & dirty method that would be easy for the project to implement. https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3746264 It could only send large tasks to people who have say at least 16GB of RAM, or even xGB RAM per core. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 4 |
I've had a couple of work units crash out after 30-40 seconds this afternoon. Exit status 0x00000001. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,047 RAC: 1,450 |
From Sid Celery 31 Mar 9 Apr LOL!!! They are Borg and you will be assimilated!!! |
mrhastyrib Send message Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
let people know what's happening beforehand. HA HA HA HA HA!! [wipes tears from eyes] HOO HEE HA HA HA!!! |
mrhastyrib Send message Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
wear levelling It was literally right there in the post that you quoted. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
They were still going through yesterday, but given the low percentage of errors i didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system. I've had a couple of miniprotein_relax8_ error out after a while with a similar error message. Haven't all those tasks been aborted by the server now? You make a good point tbf. I'm getting even more errors atm, but without the rebooting of the PC. Something bad definitely going on with my machine, but with everything else happening it's been hard for me to determine the cause up to now. Time for some tweaking #brb |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
I've had a couple of work units crash out after 30-40 seconds this afternoon. Exit status 0x00000001. These tasks? https://boinc.bakerlab.org/rosetta/results.php?hostid=3117659&offset=0&show_names=0&state=6&appid= If so, the task logs appear to show that attempting to extract one input file each from the database failed, probably because they weren't in the database. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet. There's a simple quick & dirty method that would be easy for the project to implement. Neat idea. I'll let you tell them... |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
They were still going through yesterday, but given the low percentage of errors i didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system. I've had a couple of miniprotein_relax8_ error out after a while with a similar error message. Haven't all those tasks been aborted by the server now? And seconds after posting, my PC blue-screened. Almost like it knew I was talking about it. Tweak done - let's see how it goes. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
I'm sure there's a better way of implementing the provision of appropriately-sized tasks, but no-one's hit on it yet. There's a simple quick & dirty method that would be easy for the project to implement. I'd prefer to see the large RAM tasks marked by adding a letter to the application name, so that it does not interfere with the version numbering. This allows adding a different letter for yet another RAM size. Otherwise, a good idea. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 4 |
>>> These tasks? Err, yes, err, obviously... >>> If so, the task logs appear to show that attempting to extract one input file each from the database failed, probably because they weren't in the database. Indeed, it was a problem with Rosetta@home, which is why I mentioned it in the thread called "Problems and Technical Issues with Rosetta@home"... Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,380,064 RAC: 20,136 |
Of course if you were to treat a HDD the way you described, you would considerably shorten its life expectancy as well. And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed. I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new! HDDs that spend all their time thrashing tend to die very young. You did see where i wrote about leaving sufficient free space? At least 30%? If there is only 20% free space on the drive (SSD or HDD) it is for all intents and purposes full and should be replaced with a much larger unit, or more space freed up. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,380,064 RAC: 20,136 |
I've had a couple of work units crash out after 30-40 seconds this afternoon. Exit status 0x00000001. I've had 1 of the pre_helical_bundles_round1_attempt1_ Tasks do that. <core_client_version>7.16.11</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_9pf3ry4f.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_9pf3ry4f.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_9pf3ry4f.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2417999 Using database: database_357d5d93529_n_methylminirosetta_database ERROR: [ERROR] Unable to open constraints file: e8132c30c9ee547672281ce157b2ec8d_0001.MSAcst ERROR:: Exit from: ......srccorescoringconstraintsConstraintIO.cc line: 457 BOINC:: Error reading and gzipping output datafile: default.out 13:08:50 (9888): called boinc_finish(1) </stderr_txt> ]]> I've had another 50+ that completed & Validated without issue. Grant Darwin NT |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Gates looks like one. LOL!!! They are Borg and you will be assimilated!!! That's funny....you actually thinking MS gives a crap about what YOU, or your organization, wants to do with THEIR software. I hope it works for you I really really do but past history suggests MS just ups the priority of their updates and you get unwanted ones anyway because it serves their tracking needs. It's my computer and they can't make me do anything, including pay for it. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
It did not say how the drive internally replaces broken sectors. But if it was done well, the disk would last for decades, gradually getting smaller. It was literally right there in the post that you quoted. wear levelling I don't know exactly how they work, |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Never ever overclock. A lot of pain and frustration for 10% more power and 50% less lifespan. They were still going through yesterday, but given the low percentage of errors i didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system. I've had a couple of miniprotein_relax8_ error out after a while with a similar error message. Haven't all those tasks been aborted by the server now? |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Nah, I wait until it goes red in the file manager, then get round to cleaning it up sometime within the next week or two. It's not full until it's full. Of course if you were to treat a HDD the way you described, you would considerably shorten its life expectancy as well. And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed. I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new! |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
Never ever overclock. A lot of pain and frustration for 10% more power and 50% less lifespan. They were still going through yesterday, but given the low percentage of errors i didn't consider them to be an issue. That you did have such a high number of errors indicated that there was something going on with your system. I've had a couple of miniprotein_relax8_ error out after a while with a similar error message. Haven't all those tasks been aborted by the server now? I happen to like pain and frustration... I know what you mean, but I'm finding a lot more improvement than that. And given my last PC lasted about 7yrs while running oc pretty much 24/7 throughout, I don't think it's true about how it reduces the longevity of the CPU. In fact, when I've been experimenting and the oc gets knocked out, I'm seeing what base clock looks like and it's not a pretty sight. Each to their own, but I'm staying weird on this one |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,047 RAC: 1,450 |
Of course if you were to treat a HDD the way you described, you would considerably shorten its life expectancy as well. And SSDs used for recording video streams 24/7 will also last just as long if they have plenty of free space (30% or more) to allow for garbage collection & wear levelling to occur as needed. I don't know exactly how they work, but let's say I have mine 80% full of stuff that remains there. Then I repeatedly write to the remaining 20%. That 20% will wear out. When there aren't enough spare bits to reallocate, won't it just say "I'm now a smaller disk"? 80% of the drive is pretty much as new! So in essence you are causing your hard drives to die sooner by not taking care of them properly? Why would you wait until it gets into the red when you know that causes a lot of thrashing as bits of files are spread out everywhere across the drive, causing a lot of wear and tear on your drives? With the process you've described earlier about what it takes to get a new drive and machine up and running, why not just use a bigger hard drive and give it years and years of use before you have to then go to an even bigger one or clean it up? You could even set up a swap drive on an SSD or fast regular hard drive and let all the thrashing take place over there. In a perfect world you could even set up a swap drive in RAM, but that would take a lot of money in today's world of expensive DDR4 memory. Even DDR3 memory is not cheap when you get up into the 8GB sticks. |
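
The RAM-based gating suggested earlier in the thread (only send large tasks to hosts with, say, at least 16GB of RAM, or some amount of RAM per core) could look roughly like the sketch below. This is not the BOINC scheduler's API or anything Rosetta@home is known to run: `Host`, `eligible_for_large_task`, and both thresholds are hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3  # bytes in one GiB


@dataclass
class Host:
    """Hypothetical record of what a host reports about itself."""
    ram_bytes: int
    cores: int


def eligible_for_large_task(
    host: Host,
    min_total_gib: float = 16.0,               # assumed threshold from the thread's "say 16GB"
    min_gib_per_core: Optional[float] = None,  # assumed optional per-core rule ("xGB per core")
) -> bool:
    """Return True if the host passes the total-RAM and (optional) per-core checks."""
    if host.cores <= 0:
        return False
    total_gib = host.ram_bytes / GIB
    if total_gib < min_total_gib:
        return False
    if min_gib_per_core is not None and total_gib / host.cores < min_gib_per_core:
        return False
    return True


if __name__ == "__main__":
    print(eligible_for_large_task(Host(ram_bytes=16 * GIB, cores=8)))
    print(eligible_for_large_task(Host(ram_bytes=32 * GIB, cores=24), min_gib_per_core=2.0))
```

A per-core rule is stricter for many-core machines: a 24-core host with 32GB passes the flat 16GB check but fails a 2GB-per-core check. Hosts may also report a little less than their nominal RAM, so a real threshold would probably need some slack.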
©2024 University of Washington
https://www.bakerlab.org