Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,692,114
RAC: 7,360
Message 101369 - Posted: 19 Apr 2021, 18:01:08 UTC - in response to Message 101351.  

The 6.5GB problem goes away on an 8GB machine if you set it to use 100% memory. It never actually uses 100% since everything overestimates. I just changed my old Boinc-only machines [1] and Rosettas downloaded and ran
This is actually a good point.
[Double take] I made a good point?

I found 95% is usually sufficient on my laptop as it happens, but for those for whom it isn't, it's worth going the whole hog
It's a pity Boinc won't accept 150%. Which isn't as silly as it sounds, if you have an NVME for the swapfile, you might not mind it dipping into that occasionally, so you can get another task in.
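A rough check of why raising the memory limit helps. This is a sketch under a simplifying assumption, not BOINC's actual scheduling logic: it supposes a task is accepted only if its memory bound fits within total RAM multiplied by the configured percentage. The 7,000,000,000-byte bound is the rsc_memory_bound figure reported elsewhere in this thread; the 8 GiB RAM size and the function name are illustrative.

```python
def min_memory_percent(bound_bytes: float, ram_bytes: float) -> float:
    """Smallest 'use at most X% of memory' setting that would admit the task."""
    return 100.0 * bound_bytes / ram_bytes

BOUND = 7_000_000_000   # rsc_memory_bound reported in this thread (bytes)
RAM = 8 * 1024**3       # assumed: a machine that reports a full 8 GiB

needed = min_memory_percent(BOUND, RAM)
print(f"{BOUND / 1024**3:.2f} GiB bound needs at least {needed:.1f}% of {RAM / 1024**3:.0f} GiB")
```

Machines usually report a little less than their nominal RAM, so the real threshold would sit somewhat higher than this figure.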

I know you have your arsey moods, but that doesn't mean I don't take your points on their merit.
Then again, I'm probably wrong...
I don't have moods, people who can't handle my facts or opinions have moods. They're usually American as they're quite soft over there. I just got banned from a forum for pointing out the fact that the average American IQ is only 98, whereas the UK is 100 and Japan is 106.
ID: 101369 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,422,922
RAC: 13,431
Message 101372 - Posted: 19 Apr 2021, 19:39:40 UTC - in response to Message 101369.  

The 6.5GB problem goes away on an 8GB machine if you set it to use 100% memory. It never actually uses 100% since everything overestimates. I just changed my old Boinc-only machines [1] and Rosettas downloaded and ran
This is actually a good point.
[Double take] I made a good point?
I found 95% is usually sufficient on my laptop as it happens, but for those for whom it isn't, it's worth going the whole hog
It's a pity Boinc won't accept 150%. Which isn't as silly as it sounds, if you have an NVME for the swapfile, you might not mind it dipping into that occasionally, so you can get another task in.

I know you have your arsey moods, but that doesn't mean I don't take your points on their merit.
Then again, I'm probably wrong...
I don't have moods, people who can't handle my facts or opinions have moods. They're usually American as they're quite soft over there. I just got banned from a forum for pointing out the fact that the average American IQ is only 98, whereas the UK is 100 and Japan is 106.

You don't have moods?!
Not only do you have moods, sometimes they're arsey - that is, more than one.
Never mind, though. I wouldn't want you to get moody over my facts and opinions... lol

Let's go back to you making a good point - then everyone's happy
ID: 101372 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,422,922
RAC: 13,431
Message 101373 - Posted: 19 Apr 2021, 21:01:21 UTC - in response to Message 100924.  

From Brian Nixon, 31 Mar
I've had no issues with insufficient disk space or memory.
This points to a misconfiguration of the new batch of work units, as it seems unlikely it would be the project’s intention to cut off a third of its capacity…

Look in client_state.xml for the rsc_memory_bound and rsc_disk_bound settings of the new work units: they used to be 1,800,000,000 each; to yield the errors people are reporting they must now be set to 7,000,000,000 and 9,000,000,000.

Brian, I looked at my client_state.xml file and, as you speculated(?), those are the figures showing there.
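For anyone wanting to check the same values without reading the file by hand, here is a minimal Python sketch. It assumes the work units appear as <workunit> elements containing <name>, <rsc_memory_bound> and <rsc_disk_bound>; the sample text, the work unit name and the function name are made up for illustration. A real client_state.xml is much larger and may need more tolerant parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the <workunit> entries in client_state.xml.
SAMPLE = """<client_state>
  <workunit>
    <name>example_wu_1</name>
    <rsc_memory_bound>7000000000.000000</rsc_memory_bound>
    <rsc_disk_bound>9000000000.000000</rsc_disk_bound>
  </workunit>
</client_state>"""

def workunit_bounds(xml_text):
    """Return (name, memory_bound_bytes, disk_bound_bytes) for each work unit."""
    root = ET.fromstring(xml_text)
    rows = []
    for wu in root.iter("workunit"):
        name = wu.findtext("name", default="?")
        mem = float(wu.findtext("rsc_memory_bound", default="0"))
        disk = float(wu.findtext("rsc_disk_bound", default="0"))
        rows.append((name, mem, disk))
    return rows

for name, mem, disk in workunit_bounds(SAMPLE):
    print(f"{name}: memory {mem / 1e9:.1f} GB, disk {disk / 1e9:.1f} GB")
```

To try it on a real file, read the file's text and pass it to workunit_bounds in place of SAMPLE.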

I've been in contact with Project admins and this was a deliberate change, not a misconfiguration.
It's been looked at more closely and brought down to a figure nearer 4GB - hopefully we see the result of that soon.
I note In Progress tasks are edging up, but let's see how that pans out.

There was obviously a need for that change, but I don't know what it is.
I've asked if a brief note can be posted to explain what they're working on that requires the increase.
No idea when or if that will happen.

But small victories - thanks for your pointer. Well spotted. I didn't appreciate the significance of it at the time you posted.
ID: 101373 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 101374 - Posted: 19 Apr 2021, 21:09:09 UTC - in response to Message 101373.  

I've asked if a brief note can be posted to explain what they're working on that requires the increase.
No idea when or if that will happen.
That will be getting blood out of a turnip. It must be their policy not to comment.
There is probably a good reason for it, but it is not entirely apparent to me what it is.
ID: 101374 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,422,922
RAC: 13,431
Message 101377 - Posted: 19 Apr 2021, 21:49:14 UTC - in response to Message 101374.  

I've asked if a brief note can be posted to explain what they're working on that requires the increase.
No idea when or if that will happen.
That will be getting blood out of a turnip. It must be their policy not to comment.
There is probably a good reason for it, but it is not entirely apparent to me what it is.

You've been here longer than me - I can't say anything...

I speculated that the change might have been a test that got left in the defaults, so asked if it could revert back to what it was.
But it was a change for a reason, so while it could be fine-tuned it still couldn't go back all the way.

When the project started working on SARS-CoV-2 there were some big changes in the size of tasks that didn't always go through successfully, but for all the errors it threw up for us they got significant results too.
None of us have any idea what this change relates to, hence my request.
If they tell us, it'll be understandable to everyone.
I made the point that their technical posts always go down very well, so it's worth taking the time.
Whether they do or not is out of our hands. We wait.
ID: 101377 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,422,922
RAC: 13,431
Message 101378 - Posted: 19 Apr 2021, 21:56:27 UTC - in response to Message 101373.  

From Brian Nixon, 31 Mar
I've had no issues with insufficient disk space or memory.
This points to a misconfiguration of the new batch of work units, as it seems unlikely it would be the project’s intention to cut off a third of its capacity…

Look in client_state.xml for the rsc_memory_bound and rsc_disk_bound settings of the new work units: they used to be 1,800,000,000 each; to yield the errors people are reporting they must now be set to 7,000,000,000 and 9,000,000,000.

Brian, I looked at my client_state.xml file and, as you speculated(?), those are the figures showing there.

I've been in contact with Project admins and this was a deliberate change, not a misconfiguration.
It's been looked at more closely and brought down to a figure nearer 4GB - hopefully we see the result of that soon.
I note In Progress tasks are edging up, but let's see how that pans out.

There was obviously a need for that change, but I don't know what it is.
I've asked if a brief note can be posted to explain what they're working on that requires the increase.
No idea when or if that will happen.

But small victories - thanks for your pointer. Well spotted. I didn't appreciate the significance of it at the time you posted.

In addition, tasks with the names "miniprotein_relax8" and "_abinitio_1_abinitio_" have been deleted from the queue, along with another bad batch they had noticed before we informed them of these two.
Hopefully we'll all see a lot fewer crashes than we have recently.
I've regularly found my own PCs have rebooted overnight due to these faulty tasks.

If any new ones arise, note the names and they can be looked into if they haven't already noticed them
ID: 101378 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 101379 - Posted: 19 Apr 2021, 23:41:32 UTC - in response to Message 101378.  

From Sid Celery 9 Apr

I've regularly found my own PCs have rebooted overnight due to these faulty tasks.


I've never considered that being the cause of a reboot before...hmmmmm light bulb going off icon needed!!!
ID: 101379 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,422,922
RAC: 13,431
Message 101380 - Posted: 20 Apr 2021, 0:25:18 UTC - in response to Message 101379.  

From Sid Celery 9 Apr

I've regularly found my own PCs have rebooted overnight due to these faulty tasks.


I've never considered that being the cause of a reboot before... hmmmmm light bulb going off icon needed!!!

It could be a lot of things, but when I check the start of the Event log I'm finding something like 44 tasks uploaded and a few coming down, and online they all report Computation errors at that time.
It may be different for others, but it's been taking out every task of mine, good or bad, and crashing the whole PC.

If everything's good tomorrow morning, it'll be because the Server aborted all those tasks today. Let's see if I'm right.
ID: 101380 · Rating: 0
PorkyPies
Joined: 6 Apr 20
Posts: 45
Credit: 1,650,779
RAC: 0
Message 101382 - Posted: 20 Apr 2021, 5:34:11 UTC - in response to Message 101373.  
Last modified: 20 Apr 2021, 5:37:27 UTC

I've been in contact with Project admins and this was a deliberate change, not a misconfiguration.
It's been looked at more closely and brought down to a figure nearer 4GB - hopefully we see the result of that soon.
I note In Progress tasks are edging up, but let's see how that pans out.

There was obviously a need for that change, but I don't know what it is.
I've asked if a brief note can be posted to explain what they're working on that requires the increase.
No idea when or if that will happen.

I noticed the dud tasks have stopped coming down. Well done for getting them removed.

I thought the increased memory and disk space requirement was deliberate. The project clearly thinks they'll have some work that needs that much memory and/or disk space. Pity for the machines that don't have more than 4GB, but I guess it can't be helped unless they want to split tasks into small and large types and have different queues of work. That's probably a lot of work on the project side to implement for not much gain. I've taken my 4GB Pi 4s out of my Pi cluster.
MarksRpiCluster
ID: 101382 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1481
Credit: 14,583,258
RAC: 14,559
Message 101385 - Posted: 20 Apr 2021, 7:06:52 UTC - in response to Message 101368.  

SSD Endurance Experiment
I've read many articles complaining that SSDs last nowhere near as long as HDDs. A few HDDs do fail unexpectedly, but SSDs wear out, because they have a finite number of writes. They cannot possibly last longer than that time.
And as I indicated with that link I posted, you are talking about decades for normal drives under normal usage conditions.
Just as some HDDs fail before their time, so too do some SSDs.

For all of the articles that complain about SSD failures, there would be just as many about HDD failures.

SSD vs HDD: Which One is More Reliable?
But in terms of data security, evidence of flash wear appeared after 200TB of writes for TechReport’s Solid State Drives, when their Samsung 840 Series started logging reallocated sectors. As the only TLC candidate in the bunch, this drive was expected to show the first cracks. The 840 Series didn’t encounter actual problems until 300TB, when it failed a hash check during the setup for an unpowered data retention test. The drive went on to pass that test and continue writing, but it recorded a rash of uncorrectable errors around the same time. Uncorrectable errors can compromise data integrity and system stability, so I’d recommend taking drives out of service the moment they appear.

Recalculating the limit until data becomes compromised at 300TB, an SSD like the Samsung 840 Series is theoretically reliable up to 21.4 years. Compare that to the fact that an HD drive is 50% likely to fail after 6 years.
I'll take an SSD over a HDD any day.
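The quoted article does not say here what daily write volume its 21.4-year figure assumes, but the two numbers it gives imply one. A small Python sketch of that arithmetic, using decimal units (1 TB = 1000 GB); the 100 GB/day heavy-use figure and the function name are purely hypothetical:

```python
TOTAL_WRITES_TB = 300          # writes before the 840 Series hit real problems
QUOTED_LIFETIME_YEARS = 21.4   # lifetime quoted in the article

# Write rate implied by the two quoted figures.
tb_per_year = TOTAL_WRITES_TB / QUOTED_LIFETIME_YEARS
gb_per_day = tb_per_year * 1000 / 365

def years_until(total_tb: float, gb_written_per_day: float) -> float:
    """Years to reach total_tb of writes at a steady daily write volume."""
    return total_tb * 1000 / gb_written_per_day / 365

print(f"Implied write rate: {tb_per_year:.1f} TB/year, {gb_per_day:.1f} GB/day")
print(f"At a hypothetical 100 GB/day: {years_until(TOTAL_WRITES_TB, 100):.1f} years")
```

So the lifetime scales inversely with how hard the drive is written to: a machine writing several times the implied daily volume reaches the same total in a correspondingly shorter time.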
Grant
Darwin NT
ID: 101385 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,422,922
RAC: 13,431
Message 101387 - Posted: 20 Apr 2021, 9:51:37 UTC - in response to Message 101380.  

From Sid Celery 9 Apr
I've regularly found my own PCs have rebooted overnight due to these faulty tasks.

I've never considered that being the cause of a reboot before... hmmmmm light bulb going off icon needed!!!

It could be a lot of things, but when I check the start of the Event log I'm finding something like 44 tasks uploaded and a few coming down, and online they all report Computation errors at that time.
It may be different for others, but it's been taking out every task of mine, good or bad, and crashing the whole PC.

If everything's good tomorrow morning, it'll be because the Server aborted all those tasks today. Let's see if I'm right.

Partly right.
No re-boot, but my entire cache showing Computation errors and a message in the Event log saying:
20/04/2021 10:05:11 | Rosetta@home | [error] Signature verification failed for database_357d5d93529_n_methyl.zip

and a back-off from re-contacting the server for 24hrs
2 steps forward, one step back...
ID: 101387 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1481
Credit: 14,583,258
RAC: 14,559
Message 101389 - Posted: 20 Apr 2021, 10:07:52 UTC

I'd back off any overclocks for memory & CPU and let things run at stock for a while.
Some of the errors could be due to internet/AV issues, e.g.
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>database_357d5d93529_n_methyl.zip</file_name>
  <error_code>-120 (RSA key check failed for file)</error_code>
  <error_message>signature verification failed</error_message>
</file_xfer_error>
</message>
]]>

But the tasks that start and then error out after a while, e.g.
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol fr_cart_fast.xml @fr_flags_bcov2 -in:file:silent miniprotein_relax9_SAVE_ALL_OUT_IGNORE_THE_REST_9fk2oh9e.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip miniprotein_relax9_SAVE_ALL_OUT_IGNORE_THE_REST_9fk2oh9e.zip @miniprotein_relax9_SAVE_ALL_OUT_IGNORE_THE_REST_9fk2oh9e.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3225505
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000000000000004 

Engaging BOINC Windows Runtime Debugger...
indicate some other issue.
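The decimal exit code and the hex value in that report are the same number, and 0xC0000005 is the Windows status code for an access violation, which matches the "Access Violation" reason in the log. A short Python sketch of the conversion (the function name is illustrative):

```python
def describe_exit_code(code: int) -> str:
    """Show a process exit code as unsigned decimal, hex, and signed 32-bit."""
    unsigned = code & 0xFFFFFFFF
    # Values with the top bit set are negative when read as a signed 32-bit int.
    signed = unsigned - 0x1_0000_0000 if unsigned >= 0x8000_0000 else unsigned
    return f"{unsigned} = 0x{unsigned:08x} (signed 32-bit: {signed})"

print(describe_exit_code(3221225477))
# prints: 3221225477 = 0xc0000005 (signed 32-bit: -1073741819)
```

Some tools report the signed form, so the same crash can show up under either number.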

I've had a couple of miniprotein_relax8_ error out after a while with a similar error message
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol fr_cart_fast.xml @fr_flags_bcov2 -in:file:silent miniprotein_relax8_SAVE_ALL_OUT_IGNORE_THE_REST_5mm6sc7p.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip miniprotein_relax8_SAVE_ALL_OUT_IGNORE_THE_REST_5mm6sc7p.zip @miniprotein_relax8_SAVE_ALL_OUT_IGNORE_THE_REST_5mm6sc7p.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1040802
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF736388316 read attempt to address 0xFFFFFFFF

Engaging BOINC Windows Runtime Debugger...
but 95% or more of them have completed without issue.


And while a few pre_helical_bundles_round1_attempt1_ error out in seconds
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_7tc3qf4n.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_7tc3qf4n.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_7tc3qf4n.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3386203
Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: [ERROR] Unable to open constraints file: d13b0a13bd57de6e8dc1565c1b82259f_0001.MSAcst
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\ConstraintIO.cc line: 457
BOINC:: Error reading and gzipping output datafile: default.out
10:12:15 (5600): called boinc_finish(1)

</stderr_txt>
]]>

But once again, the vast majority have completed ok.

I've gone from over 150 errors to just 5.
Grant
Darwin NT
ID: 101389 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,422,922
RAC: 13,431
Message 101390 - Posted: 20 Apr 2021, 10:31:41 UTC - in response to Message 101373.  

From Brian Nixon, 31 Mar
I've had no issues with insufficient disk space or memory.
This points to a misconfiguration of the new batch of work units, as it seems unlikely it would be the project’s intention to cut off a third of its capacity…

Look in client_state.xml for the rsc_memory_bound and rsc_disk_bound settings of the new work units: they used to be 1,800,000,000 each; to yield the errors people are reporting they must now be set to 7,000,000,000 and 9,000,000,000.

Brian, I looked at my client_state.xml file and, as you speculated(?), those are the figures showing there.

I've been in contact with Project admins and this was a deliberate change, not a misconfiguration.
It's been looked at more closely and brought down to a figure nearer 4GB - hopefully we see the result of that soon.
I note In Progress tasks are edging up, but let's see how that pans out.

After 1 day (a very short amount of time) it appears I'm being too optimistic.

Using the number of tasks In Progress as a proxy for how successful people are at downloading tasks
In March, the figure was 550k
When all the problems began, the figure dropped to around 318k - a loss of about 42%
Today the figure is around 360k - loss reduced to 34.5%
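The loss percentages can be recomputed from the In Progress counts. A minimal Python sketch, using the rounded figures given in this post (the function name is illustrative); since the counts are approximate, the results are only good to about a percentage point:

```python
BASELINE = 550_000  # In Progress tasks in March

def loss_percent(current: int, baseline: int = BASELINE) -> float:
    """Percentage drop in In Progress tasks relative to the baseline."""
    return 100.0 * (baseline - current) / baseline

for label, count in [("when the problems began", 318_000), ("today", 360_000)]:
    print(f"{label}: {count:,} in progress, loss {loss_percent(count):.1f}%")
```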

Usually it's a good thing to have a large queue of tasks to run. A week ago this figure increased to over 20m tasks.
After the 2 or 3 rogue task-types that were causing all the crashes were removed, this dropped to 19m.
Now it seems like the change to RAM & Disk requirements will only take effect for new tasks added to the queue - the amounts showing in my client_state.xml are largely the same as before.
It may take 7 or 8 weeks for 19m tasks in the current queue to be ploughed through to see the (slightly) lower resource demands. June 2021...
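The 7 or 8 week estimate implies a rate at which the queue is worked through. The post does not state that rate, so the Python sketch below only backs it out from the two figures given (19m tasks, 7 to 8 weeks); the function name is illustrative:

```python
QUEUE_TASKS = 19_000_000  # queued tasks, from the post

def implied_tasks_per_day(weeks: float, queue: int = QUEUE_TASKS) -> float:
    """Daily completion rate needed to clear the queue in the given time."""
    return queue / (weeks * 7)

for weeks in (7, 8):
    print(f"{weeks} weeks -> about {implied_tasks_per_day(weeks):,.0f} tasks/day")
```

If the real completion rate is higher than this, the lower resource demands would show up sooner than June.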

This is me speculating after just 1 day. Hopefully I'm wrong and it's quicker than that.
I'm working on the basis that "bad news early" is better than no news at all.
ID: 101390 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,422,922
RAC: 13,431
Message 101391 - Posted: 20 Apr 2021, 10:59:10 UTC - in response to Message 101389.  

I'd back off any overclocks for memory & CPU and let things run at stock for a while.
Some of the errors could be due to internet/AV issues, e.g.
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>database_357d5d93529_n_methyl.zip</file_name>
  <error_code>-120 (RSA key check failed for file)</error_code>
  <error_message>signature verification failed</error_message>
</file_xfer_error>
</message>
]]>

Is this directed at me?
If so, yes, I've assumed some of my problems are of my own making. I'm edging things down every couple of days and I've got a particular setting I'm looking to move down a lot the next chance I get.
My temps are abnormally high atm, so I have to fix that.

I've had a couple of miniprotein_relax8_ error out after a while with a similar error message

Haven't all those tasks been aborted by the server now?

And while a few pre_helical_bundles_round1_attempt1_ error out in seconds
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_7tc3qf4n.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_7tc3qf4n.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_7tc3qf4n.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3386203
Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: [ERROR] Unable to open constraints file: d13b0a13bd57de6e8dc1565c1b82259f_0001.MSAcst
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\ConstraintIO.cc line: 457
BOINC:: Error reading and gzipping output datafile: default.out
10:12:15 (5600): called boinc_finish(1)

</stderr_txt>
]]>

But once again, the vast majority have completed ok.

I've gone from over 150 errors to just 5.

I've reported that as well. Some crash out within 20 secs with a Computation error, while others stop short after 7 or 8 mins but validate as if nothing went wrong.
But both report errors, which is weird.
ERROR: [ERROR] Unable to open constraints file: e1096e175045f039d630a9b7543a561f_0001.MSAcst
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\ConstraintIO.cc line: 457

ID: 101391 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,692,114
RAC: 7,360
Message 101400 - Posted: 20 Apr 2021, 17:40:30 UTC - in response to Message 101372.  

You don't have moods?!
Not only do you have moods, sometimes they're arsey - that is, more than one.
Never mind, though. I wouldn't want you to get moody over my facts and opinions... lol

Let's go back to you making a good point - then everyone's happy
I'm a very calm person actually. The only mood I get in here is amused when people get upset over nothing.
ID: 101400 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,692,114
RAC: 7,360
Message 101401 - Posted: 20 Apr 2021, 17:43:01 UTC - in response to Message 101378.  

In addition, tasks with the names "miniprotein_relax8" and "_abinitio_1_abinitio_" have been deleted from the queue, along with another bad batch they had noticed before we informed them of these two.
Hopefully we'll all see a lot fewer crashes than we have recently.
I've regularly found my own PCs have rebooted overnight due to these faulty tasks.
That's odd, I've never had a computer crash due to a faulty task from any project. A whole machine going down from one program error, that's a Windows XP problem isn't it?
ID: 101401 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,692,114
RAC: 7,360
Message 101402 - Posted: 20 Apr 2021, 17:44:13 UTC - in response to Message 101379.  

From Sid Celery 9 Apr

I've regularly found my own PCs have rebooted overnight due to these faulty tasks.


I've never considered that being the cause of a reboot before...hmmmmm light bulb going off icon needed!!!
The only reboots I've had are from that criminally auto-rebooting Windows 10. I've thwarted that though. My updates are "managed by my organisation" or so it thinks.
ID: 101402 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,692,114
RAC: 7,360
Message 101403 - Posted: 20 Apr 2021, 17:46:48 UTC - in response to Message 101385.  
Last modified: 20 Apr 2021, 17:47:09 UTC

SSD Endurance Experiment
I've read many articles complaining that SSDs last nowhere near as long as HDDs. A few HDDs do fail unexpectedly, but SSDs wear out, because they have a finite number of writes. They cannot possibly last longer than that time.
And as I indicated with that link I posted, you are talking about decades for normal drives under normal usage conditions.
Depends what you mean by normal. Mine has a security camera recording onto it, two graphics cards and a 24 core CPU doing Boinc, I record TV to it, .... I guess there are some people who just play solitaire and use email, those might last that long.
ID: 101403 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 101405 - Posted: 20 Apr 2021, 21:36:21 UTC - in response to Message 101402.  

From Sid Celery 9 Apr

I've regularly found my own PCs have rebooted overnight due to these faulty tasks.


I've never considered that being the cause of a reboot before...hmmmmm light bulb going off icon needed!!!


The only reboots I've had are from that criminally auto-rebooting Windows 10. I've thwarted that though. My updates are "managed by my organisation" or so it thinks.


That's funny....you actually thinking MS gives a crap about what YOU, or your organization, wants to do with THEIR software. I hope it works for you I really really do but past history suggests MS just ups the priority of their updates and you get unwanted ones anyway because it serves their tracking needs.
ID: 101405 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,422,922
RAC: 13,431
Message 101410 - Posted: 21 Apr 2021, 0:02:40 UTC - in response to Message 101401.  

In addition, tasks with the names "miniprotein_relax8" and "_abinitio_1_abinitio_" have been deleted from the queue, along with another bad batch they had noticed before we informed them of these two.
Hopefully we'll all see a lot fewer crashes than we have recently.
I've regularly found my own PCs have rebooted overnight due to these faulty tasks.
That's odd, I've never had a computer crash due to a faulty task from any project. A whole machine going down from one program error, that's a Windows XP problem isn't it?

It never did with my previous PC - and after the removal of these tasks it didn't happen last night either - but while those particular tasks were running and crashing, they took out every other task of any type and the whole PC with it.
Maybe it's just me.

Anyway, it seems to have stopped now
ID: 101410 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org