Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 101285 - Posted: 13 Apr 2021, 16:17:37 UTC - in response to Message 101263.  

/edit. Just to add a datapoint. While it's not conclusive, all the Miniprotein_relax8 units I'm getting that run long do "complete" and show as valid, even after going 10 hours over. Of these units that run over, many are "seconds" sent to me from other machines that failed to process the WU. My machine is running OSX and completes them fine (beyond running 10hrs over). All the failed machines are Windows or Linux based. That said, I know Macs make up a small percentage of computers on this project, so I might have just not gotten a resend from a Mac in my small sample.
It's possible workunits are failing on random machines. Universe sends out tasks that occasionally fail on an Android device. They almost always work fine on the next one, and there's no pattern to which devices cause it to fail. Unfortunately with those, "fail" means to run forever, with the % looping back to 0 again. So I have to manually spot one that's taken too long and cancel it.
ID: 101285 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 101286 - Posted: 13 Apr 2021, 16:21:43 UTC - in response to Message 101264.  

you failed to quote enough text so I knew what the conversation was about.
There was enough for you to recognize that I was replying to you, but not enough for you to remember what we were talking about, from a conversation within the past 24 hours, even though you knew it was you. Got it.
Of course. Seeing a sentence that I've read before makes it very easy to know who was talking to who. But that doesn't mean I can remember what it was in reply to. You either have a better memory than me or are in fewer conversations at once. I'm in several project forums, the main BOINC forum, games forums, Windows forums, pet forums, DIY forums, along with general purpose forums like newsgroups, Reddit, Quora, etc.

Just between us girls
Speak for yourself, I'm above average size.

isn't the real issue here the same as the one with "dood" and "@": you're immensely irritated at some features of my posting style. Including quoting only the essence of an exchange.
No, I just can't remember everything that's ever been written.

I think Letterman said it best: "An old man in a bathrobe on his front porch, shaking his fist at passing cars."
I think you'll find it's me on the receiving end of shaking fists when driving.
ID: 101286 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 101287 - Posted: 13 Apr 2021, 16:23:18 UTC - in response to Message 101268.  

I've thought of a possible reason why some tasks are set to ask for 6 GB of memory. Quite a bit more is loaded to produce a core dump if they fail, but isn't needed if they don't fail. Not the best idea, but possible.
You say "some", but until I changed my 8GB machines to use 100% RAM, which let the Rosettas in, I got zero tasks for Rosetta at all.
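
As a rough illustration of why raising the RAM limit could let these tasks in: the sketch below assumes a task is only accepted when its declared memory requirement fits inside the share of RAM that BOINC is allowed to use. That rule is an assumption for the sake of the example, not something confirmed in this thread, and the function name and the fractions tried are hypothetical.

```python
GB = 1_000_000_000  # decimal gigabytes, to match the "6 GB" / "8GB" figures quoted above


def fits_in_ram_budget(total_ram_bytes, ram_fraction, task_memory_bound_bytes):
    """Return True if the task's declared memory bound fits the RAM BOINC may use.

    ram_fraction stands in for the "use at most N% of memory" preference.
    """
    usable = total_ram_bytes * ram_fraction
    return task_memory_bound_bytes <= usable


total = 8 * GB   # the 8GB machine mentioned in the post
bound = 6 * GB   # the 6 GB request mentioned in the quoted post

# Illustrative fractions only: a 50% limit versus the 100% setting described above.
for fraction in (0.5, 1.0):
    print(f"limit {fraction:.0%}: fits = {fits_in_ram_budget(total, fraction, bound)}")
```

Under that assumption a 50% limit leaves 4 GB, which is below the 6 GB request, while 100% leaves the full 8 GB, which would match getting no Rosetta work until the limit was raised.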
ID: 101287 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 101288 - Posted: 13 Apr 2021, 16:27:29 UTC - in response to Message 101272.  

My Aunt doesn't play games. She finds 4GB (Hewlett Packard actually sold her a laptop with such a stupidly pitiful amount, which could not be upgraded!) unusable, and 8GB ok if she only runs one program at a time, 12GB was needed just to use email and a photo editor.
The issue is the photo editor.
That was just an example. Even MS Word and MS Mail can't run at once without severe (get a cup of coffee) swap file activity.

I know several people running Windows 10 systems with 4GB of RAM with no issues (I was one for quite some time myself). Of course if you use software that requires huge amounts of RAM to do the work it needs to do- such as photo editing- then you need a system with the appropriate amount of RAM. That has always been the case.
Actually she uses an old JASC Paintshop Pro 7 I gave her, which is not RAM hungry. When Corel took it over they bloated it. On my Ryzen 9 3900XT with an SSD and 64GB RAM, the latest Paintshop Pro takes 30 seconds just to open! When I complained in their forums, they thought that was an acceptable time!

It also helps (a massive amount) if you have a SSD and not a HDD.
That's a workaround, although using the SSD heavily for a swapfile will wear it out very quickly.
ID: 101288 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 101289 - Posted: 13 Apr 2021, 16:28:49 UTC - in response to Message 101275.  

Still "- _abinitio_1_abinitio_" wus error.
Please, stop these wus
1) They're not listening, they don't read this forum.
2) Does it matter? They'll see the errors when they look at the results.
3) They might be learning what's wrong from us returning lots of errors.

So I'm just letting it do what it likes.
ID: 101289 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 101290 - Posted: 13 Apr 2021, 16:29:50 UTC - in response to Message 101276.  

My Aunt doesn't play games. She finds 4GB (Hewlett Packard actually sold her a laptop with such a stupidly pitiful amount, which could not be upgraded!) unusable, and 8GB ok if she only runs one program at a time, 12GB was needed just to use email and a photo editor.
The issue is the photo editor.
I know several people running Windows 10 systems with 4GB of RAM with no issues (I was one for quite some time myself). Of course if you use software that requires huge amounts of RAM to do the work it needs to do- such as photo editing- then you need a system with the appropriate amount of RAM. That has always been the case.
It also helps (a massive amount) if you have a SSD and not a HDD.
I’ve just upgraded my Lenovo L520 Win10 laptop from 2GB to its max of 4GB and whilst it’s slightly faster it still runs fine with Firefox, boinctasksjs and LibreOffice Calc as its normal workload. My one failure has been to get MS Teams to access the built-in mic - it sees it ok but I cannot get any volume from it.
You must be a lot more patient than either me or my Aunt.
ID: 101290 · Rating: 0
CIA
Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 101292 - Posted: 13 Apr 2021, 17:11:02 UTC - in response to Message 101270.  

/edit. Just to add a datapoint. While it's not conclusive, all the Miniprotein_relax8 units I'm getting that run long do "complete" and show as valid, even after going 10 hours over. Of these units that run over, many are "seconds" sent to me from other machines that failed to process the WU. My machine is running OSX and completes them fine (beyond running 10hrs over). All the failed machines are Windows or Linux based. That said, I know Macs make up a small percentage of computers on this project, so I might have just not gotten a resend from a Mac in my small sample.

I am also running on a Mac. The Miniprotein_relax8 units also do complete after ~18.7 hours and provide credit; however, the credit is in the "two-hundred" range for 67,000+ seconds of work. So, I've gone in and aborted all of the "ready to start" Miniprotein_relax8 units and now I have all pre-helical-bundles_round1_attempt1 queued up.



Yea, I suppose I could do that but I'm honestly here for the science and if Reddit has taught me anything, internet points aren't worth anything. 8-). The long WU's are producing results, and that might be helpful to researchers. So I let them run.

I've got a few long units now going on my 36hr boxes, they are coming up on the 46hour cutoff, I wonder what results they will provide.
ID: 101292 · Rating: 0
Jim Martin
Joined: 9 Oct 05
Posts: 23
Credit: 1,443,682
RAC: 585
Message 101295 - Posted: 14 Apr 2021, 0:47:59 UTC

No WU's will start. Some examples: Pre-helical-bundles
TMWFY3V

Have tried aborting the first batch, but the second one, of 15 WU's, also did not run.

This has happened once before, a couple of weeks ago.

jm
ID: 101295 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,014
Message 101297 - Posted: 14 Apr 2021, 1:49:13 UTC

Several tasks failed in under 30 seconds each.

All have abinitio_1_abinitio in their task names.

Each gives the error message:

ERROR: ERROR: FragmentIO: could not open file 00001.500.6mers

It looks like the project should cancel all of the workunits with 00001.500.6mers in their command lines, then rebuild all of those workunits to send file 00001.500.6mers along with the workunit instead of trying to extract it from the database.
ID: 101297 · Rating: 0
mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 101300 - Posted: 14 Apr 2021, 7:17:07 UTC - in response to Message 101286.  

Just between us girls
Speak for yourself, I'm above average size.

The frequency with which you talk about "size" in your posts to me -- and only me -- is disturbing.

I think you'll find it's me on the receiving end


This is a detail about you that I didn't really need to know.
ID: 101300 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,047
RAC: 1,450
Message 101302 - Posted: 14 Apr 2021, 12:32:12 UTC - in response to Message 101288.  

My Aunt doesn't play games. She finds 4GB (Hewlett Packard actually sold her a laptop with such a stupidly pitiful amount, which could not be upgraded!) unusable, and 8GB ok if she only runs one program at a time, 12GB was needed just to use email and a photo editor.
The issue is the photo editor.
That was just an example. Even MS Word and MS Mail can't run at once without severe (get a cup of coffee) swap file activity.

I know several people running Windows 10 systems with 4GB of RAM with no issues (I was one for quite some time myself). Of course if you use software that requires huge amounts of RAM to do the work it needs to do- such as photo editing- then you need a system with the appropriate amount of RAM. That has always been the case.
Actually she uses an old JASC Paintshop Pro 7 I gave her, which is not RAM hungry. When Corel took it over they bloated it. On my Ryzen 9 3900XT with an SSD and 64GB RAM, the latest Paintshop Pro takes 30 seconds just to open! When I complained in their forums, they thought that was an acceptable time!

It also helps (a massive amount) if you have a SSD and not a HDD.


Peter Hucker wrote: That's a workaround, although using the SSD heavily for a swapfile will wear it out very quickly.


They are cheap nowadays, get a cheap 250GB one and just replace it once it's done its 100 thousand/million hours or whatever it is, the next one will be even cheaper
ID: 101302 · Rating: 0
CIA
Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 101304 - Posted: 14 Apr 2021, 16:18:26 UTC - in response to Message 101295.  

No WU's will start. Some examples: Pre-helical-bundles
TMWFY3V

Have tried aborting the first batch, but the second one, of 15 WU's, also did not run.

This has happened once before, a couple of weeks ago.

jm


This happened to me. Initially they appeared to be hung on "waiting to start" but I let them sit for awhile (about 15 minutes) and they did eventually start on their own. Let them sit for a bit and see if the same happens to you.

When I say "I let them sit for awhile" I mean I tinkered with them doing all the normal diagnostics (suspend, resume, change to run always, etc.). After tinkering apparently did nothing to help, I put all my settings back at my normal defaults. As I was pondering what to do next I got distracted by something unrelated and walked away from my machine. When I returned about 15 minutes later they had all started up on their own. So I guess patience might be the trick.
ID: 101304 · Rating: 0
PMH_UK
Joined: 9 Aug 08
Posts: 16
Credit: 1,243,749
RAC: 0
Message 101308 - Posted: 15 Apr 2021, 16:41:51 UTC - in response to Message 101304.  

I had that with a task.
I checked its requirements in client_state.xml and it wanted 4000000000 bytes of memory (the PC had only 4GB).
So it would only have run alone.
I shut down BOINC and edited the value to 2000000000, and it ran with a maximum of about 3.5GB alongside other tasks.
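
For anyone who wants to see which queued tasks carry a large memory requirement before editing anything, here is a minimal read-only sketch. It assumes the figure lives in an `rsc_memory_bound` element inside each `workunit` block of client_state.xml and that the file parses as ordinary XML; both are assumptions, and the sample document and task names below are made up for illustration. It only reports values and changes nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for client_state.xml.
# A real file holds far more fields; the task names here are invented.
SAMPLE = """<client_state>
  <workunit>
    <name>example_task_a</name>
    <rsc_memory_bound>4000000000.000000</rsc_memory_bound>
  </workunit>
  <workunit>
    <name>example_task_b</name>
    <rsc_memory_bound>2000000000.000000</rsc_memory_bound>
  </workunit>
</client_state>"""


def memory_hungry_workunits(xml_text, limit_bytes):
    """List (name, bound) for workunits whose memory bound is at or above limit_bytes."""
    root = ET.fromstring(xml_text)
    hits = []
    for wu in root.iter("workunit"):
        name = wu.findtext("name", default="?")
        # Assumed tag name; a missing element is treated as a bound of 0.
        bound = float(wu.findtext("rsc_memory_bound", default="0"))
        if bound >= limit_bytes:
            hits.append((name, bound))
    return hits


for name, bound in memory_hungry_workunits(SAMPLE, 4_000_000_000):
    print(f"{name}: {bound / 1e9:.1f} GB requested")
```

To try it on a real machine, read the file contents into a string and pass that instead of `SAMPLE`, ideally working on a copy and with BOINC shut down, as described above.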

Paul.
ID: 101308 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 101309 - Posted: 15 Apr 2021, 17:30:18 UTC - in response to Message 101292.  

Yea, I suppose I could do that but I'm honestly here for the science and if Reddit has taught me anything, internet points aren't worth anything. 8-). The long WU's are producing results, and that might be helpful to researchers. So I let them run.

I've got a few long units now going on my 36hr boxes, they are coming up on the 46hour cutoff, I wonder what results they will provide.
I've set no new work for the other projects, and increased the buffer to half a day. This seems to make all my 90 CPU cores run Rosetta continuously. When I had a 0+3 hour buffer, I was running out of Rosetta and it was refusing to get any more. I think BOINC backs off if it sees a computation error, so it doesn't flood a server with mistakes.
ID: 101309 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 101310 - Posted: 15 Apr 2021, 17:37:13 UTC - in response to Message 101300.  

Just between us girls
Speak for yourself, I'm above average size.

The frequency with which you talk about "size" in your posts to me -- and only me -- is disturbing.
Don't get your hopes up, it just happens you're the only one that called me a girl.

I think you'll find it's me on the receiving end


This is a detail about you that I didn't really need to know.
Taking things out of context for comic effect? Don't give up the day job.
ID: 101310 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 101311 - Posted: 15 Apr 2021, 17:38:05 UTC - in response to Message 101302.  
Last modified: 15 Apr 2021, 17:38:26 UTC

It also helps (a massive amount) if you have a SSD and not a HDD.


Peter Hucker wrote: That's a workaround, although using the SSD heavily for a swapfile will wear it out very quickly.


They are cheap nowadays, get a cheap 250GB one and just replace it once it's done its 100 thousand/million hours or whatever it is, the next one will be even cheaper

Replacing a disk is a nuisance when you lose files and/or have to set up the OS again. RAM is way faster anyway.
ID: 101311 · Rating: 0
Bryn Mawr
Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 6,222
Message 101312 - Posted: 15 Apr 2021, 21:51:09 UTC

To whoever took the database down and removed all the ****, thank you.
ID: 101312 · Rating: 0
PorkyPies
Joined: 6 Apr 20
Posts: 45
Credit: 1,650,779
RAC: 0
Message 101313 - Posted: 16 Apr 2021, 2:29:34 UTC
Last modified: 16 Apr 2021, 3:24:35 UTC

A new issue.

Pi4 8GB running 3 Rosetta and 2 Einstein tasks. It has swapped the 2nd Rosetta and an Einstein out with a message "Waiting for memory". The currently running Rosetta tasks are using 1746MB and 665MB and the two Einsteins are 205MB each. According to top it's still got 3GB of free memory. When I spotted it, it was running 1 Rosetta and 2 Einstein with the 2nd Rosetta swapped out.

I wonder if it's related to their requirement for 6.6GB free memory and BOINC is using that rather than the actual memory usage. It's BOINC 7.16.11.
top - 12:21:34 up 12 days, 13:21,  1 user,  load average: 3.12, 3.04, 3.01
Tasks: 107 total,   4 running, 103 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.1 sy, 75.0 ni, 25.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7863.2 total,   3338.0 free,   2422.8 used,   2102.4 buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.   5302.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
16113 boinc     39  19  205.0m 193.2m   4.9m R 100.0   2.5 121:50.61 einsteinb+
16093 boinc     39  19 1746.9m   1.6g 113.8m R  99.7  20.8 141:56.59 rosetta_4+
16262 boinc     39  19  665.5m 547.2m 105.0m R  99.7   7.0   8:20.99 rosetta_4+
  377 boinc     30  10  118.0m  18.9m  11.0m S   0.0   0.2  53:44.23 boinc
16125 boinc     39  19  205.0m 193.2m   5.0m S   0.0   2.5  92:55.58 einsteinb+


Update: A while later the 1st Rosetta task failed, so it's back to normal. The Windows host that also ran it errored out, so it looks like a dud work unit. It was a miniprotein_relax8.
MarksRpiCluster
ID: 101313 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,014
Message 101314 - Posted: 16 Apr 2021, 4:17:53 UTC - in response to Message 101311.  

It also helps (a massive amount) if you have a SSD and not a HDD.


Peter Hucker wrote: That's a workaround, although using the SSD heavily for a swapfile will wear it out very quickly.


They are cheap nowadays, get a cheap 250GB one and just replace it once it's done its 100 thousand/million hours or whatever it is, the next one will be even cheaper

Replacing a disk is a nuisance when you lose files and/or have to set up the OS again. RAM is way faster anyway.

RAM loses its data every time you turn the power off.
ID: 101314 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 101315 - Posted: 16 Apr 2021, 6:54:17 UTC - in response to Message 101288.  

It also helps (a massive amount) if you have a SSD and not a HDD.
That's a workaround, although using the SSD heavily for a swapfile will wear it out very quickly.
No, it won't. Even with a ridiculously excessive amount of swap file usage, the actual volume of writes as a percentage of the drive's total capacity will only be a small fraction of the drive's rated DWPD (Drive Writes Per Day), and as long as there is plenty of free space on the drive the drive controller's wear levelling will also significantly prolong the life of the drive even further.

Yes, after several decades the drive will fail from all those writes due to the lack of system RAM, but most people don't keep their systems in use for that length of time.
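
A quick worked example of the arithmetic behind that claim. All the figures are hypothetical and chosen only for illustration: a 250 GB drive with a rated endurance of 150 TB written, and a deliberately heavy 20 GB of swap writes every day. Real ratings vary by model, and write amplification is ignored here.

```python
def years_until_rated_endurance(tbw_terabytes, writes_gb_per_day):
    """Years of writing at a steady daily rate before reaching the rated TBW figure."""
    days = (tbw_terabytes * 1000) / writes_gb_per_day  # TB -> GB, then GB / (GB per day)
    return days / 365.25


# Hypothetical figures: 150 TBW rating, 20 GB of swap writes per day.
years = years_until_rated_endurance(150, 20)
print(f"about {years:.1f} years to reach the rated endurance")
```

With those assumed numbers the rated endurance is reached after roughly two decades, which is the order of magnitude described above.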
Grant
Darwin NT
ID: 101315 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org