Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
/edit. Just to add a datapoint. While it's not conclusive, all the Miniprotein_relax8 units I'm getting that run long do "complete" and show as valid, even after going 10 hours over. Of these units that run over, many are "seconds" sent to me from other machines that failed to process the WU. My machine is running OSX and completes them fine (beyond running 10hrs over). All the failed machines are Windows or Linux based. That said, I know Macs make up a small percentage of computers on this project, so I might have just not gotten a resend from a Mac in my small sample. It's possible workunits are failing on random machines. Universe sends out tasks that occasionally fail on an Android device. They almost always work fine on the next one, and there's no pattern to which devices cause it to fail. Unfortunately with those, "fail" means to run forever, with the % looping back to 0 again. So I have to manually spot one that's taken too long and cancel it. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
Of course. Seeing a sentence that I've read before makes it very easy to know who was talking to who. But that doesn't mean I can remember what it was in reply to. You either have a better memory than me or are in less conversations at once. I'm in several project forums, the main Boinc forum, games forums, Windows forums, pet forums, DIY forums, along with general purpose forums like newsgroups, Reddit, Quora, etc. you failed to quote enough text so I knew what the conversation was about. There was enough for you to recognize that I was replying to you, but not enough for you to remember what we were talking about, from a conversation within the past 24 hours, even though you knew it was you. Got it. "Just between us girls" Speak for yourself, I'm above average size. isn't the real issue here the same as the one with "dood" and "@": you're immensely irritated at some features of my posting style. Including quoting only the essence of an exchange. No, I just can't remember everything that's ever been written. I think Letterman said it best: "An old man in a bathrobe on his front porch, shaking his fist at passing cars." I think you'll find it's me on the receiving end of shaking fists when driving. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
I've thought of a possible reason why some tasks are set to ask for 6 GB of memory. Quite a bit more is loaded to produce a core dump if they fail, but isn't needed if they don't fail. Not the best idea, but possible. You say "some", but until I changed my 8GB machines to use 100% RAM, which let the Rosettas in, I got zero tasks for Rosetta at all. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
That was just an example. Even MS Word and MS Mail can't run at once without severe (get a cup of coffee) swap file activity. My Aunt doesn't play games. She finds 4GB (Hewlett Packard actually sold her a laptop with such a stupidly pitiful amount, which could not be upgraded!) unusable, and 8GB ok if she only runs one program at a time, 12GB was needed just to use email and a photo editor. The issue is the photo editor. I know several people running Windows 10 systems with 4GB of RAM with no issues (I was one for quite some time myself). Of course if you use software that requires huge amounts of RAM to do the work it needs to do, such as photo editing, then you need a system with the appropriate amount of RAM. That has always been the case. Actually she uses an old JASC Paintshop Pro 7 I gave her, which is not RAM hungry. When Corel took it over they bloated it. On my Ryzen 9 3900XT with an SSD and 64GB RAM, the latest Paintshop Pro takes 30 seconds just to open! When I complained in their forums, they thought that was an acceptable time! It also helps (a massive amount) if you have a SSD and not a HDD. That's a workaround, although using the SSD heavily for a swapfile will wear it out very quickly. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
Still "- _abinitio_1_abinitio_" wus error. 1) They're not listening, they don't read this forum. 2) Does it matter? They'll see the errors when they look at the results. 3) They might be learning what's wrong from us returning lots of errors. So I'm just letting it do what it likes. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
You must be a lot more patient than either me or my Aunt. I’ve just upgraded my Lenovo L520 Win10 laptop from 2GB to its max of 4GB and whilst it’s slightly faster it still runs fine with Firefox, boinctasksjs and LibreOffice Calc as its normal workload. My one failure has been to get MS Teams to access the built-in mic - it sees it ok but I cannot get any volume from it. My Aunt doesn't play games. She finds 4GB (Hewlett Packard actually sold her a laptop with such a stupidly pitiful amount, which could not be upgraded!) unusable, and 8GB ok if she only runs one program at a time, 12GB was needed just to use email and a photo editor. The issue is the photo editor. |
CIA Send message Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
/edit. Just to add a datapoint. While it's not conclusive, all the Miniprotein_relax8 units I'm getting that run long do "complete" and show as valid, even after going 10 hours over. Of these units that run over, many are "seconds" sent to me from other machines that failed to process the WU. My machine is running OSX and completes them fine (beyond running 10hrs over). All the failed machines are Windows or Linux based. That said, I know Macs make up a small percentage of computers on this project, so I might have just not gotten a resend from a Mac in my small sample. Yea, I suppose I could do that but I'm honestly here for the science and if Reddit has taught me anything, internet points aren't worth anything. 8-). The long WU's are producing results, and that might be helpful to researchers. So I let them run. I've got a few long units now going on my 36hr boxes, they are coming up on the 46hour cutoff, I wonder what results they will provide. |
Jim Martin Send message Joined: 9 Oct 05 Posts: 23 Credit: 1,443,682 RAC: 713 |
No WU's will start. Some examples: Pre-helical-bundles TMWFY3V Have tried aborting the first batch, but the second one, of 15 WU's, also did not run. This has happened once before, a couple of weeks ago. jm |
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,456 |
Several tasks failed in under 30 seconds each. All have abinitio_1_abinitio in their task names. Each gives the error message: ERROR: ERROR: FragmentIO: could not open file 00001.500.6mers It looks like the project should cancel all of the workunits with 00001.500.6mers in their command lines, then rebuild all of those workunits to send file 00001.500.6mers along with the workunit instead of trying to extract it from the database. |
mrhastyrib Send message Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
"Just between us girls" Speak for yourself, I'm above average size. The frequency with which you talk about "size" in your posts to me -- and only me -- is disturbing. I think you'll find it's me on the receiving end This is a detail about you that I didn't really need to know. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,047 RAC: 1,768 |
That was just an example. Even MS Word and MS Mail can't run at once without severe (get a cup of coffee) swap file activity. My Aunt doesn't play games. She finds 4GB (Hewlett Packard actually sold her a laptop with such a stupidly pitiful amount, which could not be upgraded!) unusable, and 8GB ok if she only runs one program at a time, 12GB was needed just to use email and a photo editor. The issue is the photo editor. They are cheap nowadays, get a cheap 250GB one and just replace it once it's done its 100 thousand/million hours or whatever it is, the next one will be even cheaper. |
CIA Send message Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
No WU's will start. Some examples: Pre-helical-bundles This happened to me. Initially they appeared to be hung on "waiting to start" but I let them sit for a while (about 15 minutes) and they did eventually start on their own. Let them sit for a bit and see if the same happens to you. When I say "I let them sit for a while" I mean I tinkered with them doing all the normal diagnostics (Suspend, resume, change to run always etc). After tinkering apparently did nothing to help, I put all my settings back at my normal defaults. As I was pondering what to do next I got distracted by something unrelated and walked away from my machine. When I returned about 15 minutes later they had all started up on their own. So I guess patience might be the trick. |
PMH_UK Send message Joined: 9 Aug 08 Posts: 16 Credit: 1,243,749 RAC: 0 |
I had that with a task. Checked its requirements in client_state.xml and it wanted 4000000000 memory (PC had only 4G). So it would only have run alone. I shut down BOINC and edited to 2000000000 and it ran with max about 3.5G alongside other tasks. Paul. |
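For anyone wanting to check the same thing without scrolling through the file by hand, here is a rough sketch of how the memory requests could be listed. It assumes each `<workunit>` entry in client_state.xml carries a `<rsc_memory_bound>` value in bytes; the sample XML and the workunit names are made up for illustration, and the helper `memory_hungry_workunits` is hypothetical, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed excerpt in the style of client_state.xml.
# The element names are an assumption about the file layout.
SAMPLE = """<client_state>
  <workunit>
    <name>example_wu_a</name>
    <rsc_memory_bound>4000000000.000000</rsc_memory_bound>
  </workunit>
  <workunit>
    <name>example_wu_b</name>
    <rsc_memory_bound>900000000.000000</rsc_memory_bound>
  </workunit>
</client_state>"""


def memory_hungry_workunits(xml_text, limit_bytes):
    """Return (name, bound) pairs for workunits requesting >= limit_bytes."""
    root = ET.fromstring(xml_text)
    hits = []
    for wu in root.iter("workunit"):
        name = wu.findtext("name", default="?")
        bound = float(wu.findtext("rsc_memory_bound", default="0"))
        if bound >= limit_bytes:
            hits.append((name, bound))
    return hits


if __name__ == "__main__":
    # List every workunit that asks for 2 GB or more.
    for name, bound in memory_hungry_workunits(SAMPLE, 2e9):
        print(f"{name}: {bound / 1e9:.1f} GB requested")
```

To run it against a real file, read the file's text and pass it in place of `SAMPLE`. This only reads the values; as in the post above, any hand edit should be done with BOINC shut down.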
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
Yea, I suppose I could do that but I'm honestly here for the science and if Reddit has taught me anything, internet points aren't worth anything. 8-). The long WU's are producing results, and that might be helpful to researchers. So I let them run. I've set no new work for the other projects, and increased the buffer to half a day. This seems to make all my 90 CPU cores run Rosetta continuously. When I had a 0+3 hour buffer, I was running out of Rosetta and it was refusing to get any more. I think Boinc backs off if it sees a computation error, so it doesn't flood a server with mistakes. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
Don't get your hopes up, it just happens you're the only one that called me a girl. "Just between us girls" Speak for yourself, I'm above average size. Taking things out of context for comic effect? Don't give up the day job. I think you'll find it's me on the receiving end |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
It also helps (a massive amount) if you have a SSD and not a HDD. Replacing a disk is a nuisance when you lose files and/or have to set up the OS again. RAM is way faster anyway. |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 398 Credit: 12,294,748 RAC: 7,588 |
To whoever took the database down and removed all the ****, thank you. |
PorkyPies Send message Joined: 6 Apr 20 Posts: 45 Credit: 1,650,779 RAC: 0 |
A new issue. Pi4 8GB running 3 Rosetta and 2 Einstein tasks. It has swapped the 2nd Rosetta and an Einstein out with a message "Waiting for memory". The currently running Rosetta tasks are using 1746MB and 665MB and the two Einsteins are 205MB each. According to top it's still got 3GB of free memory. When I spotted it, it was running 1 Rosetta and 2 Einstein with the 2nd Rosetta swapped out. I wonder if it's related to their requirement for 6.6GB free memory and BOINC is using that rather than the actual memory usage. It's BOINC 7.16.11. top - 12:21:34 up 12 days, 13:21, 1 user, load average: 3.12, 3.04, 3.01 Tasks: 107 total, 4 running, 103 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 0.1 sy, 75.0 ni, 25.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st MiB Mem : 7863.2 total, 3338.0 free, 2422.8 used, 2102.4 buff/cache MiB Swap: 100.0 total, 100.0 free, 0.0 used. 5302.1 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 16113 boinc 39 19 205.0m 193.2m 4.9m R 100.0 2.5 121:50.61 einsteinb+ 16093 boinc 39 19 1746.9m 1.6g 113.8m R 99.7 20.8 141:56.59 rosetta_4+ 16262 boinc 39 19 665.5m 547.2m 105.0m R 99.7 7.0 8:20.99 rosetta_4+ 377 boinc 30 10 118.0m 18.9m 11.0m S 0.0 0.2 53:44.23 boinc 16125 boinc 39 19 205.0m 193.2m 5.0m S 0.0 2.5 92:55.58 einsteinb+ Update: A while later the 1st Rosetta task failed, so it's back to normal. The Windows host that also ran it errored out, so it looks like a dud work unit. It was a miniprotein_relax8. MarksRpiCluster |
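As a quick sanity check on the numbers in that top output, the sketch below adds up the resident memory (the RES column) of the boinc-owned processes and compares it with the machine's total. The rows are copied from the post; the helper `res_to_mib` is hypothetical, and treating a value with no suffix as KiB is an assumption about how top was displaying it. It says nothing about how BOINC itself decides when to show "Waiting for memory".

```python
# Process rows copied from the top output in the post: (command, RES column).
ROWS = [
    ("einsteinb+", "193.2m"),
    ("rosetta_4+", "1.6g"),
    ("rosetta_4+", "547.2m"),
    ("boinc", "18.9m"),
    ("einsteinb+", "193.2m"),
]
TOTAL_MIB = 7863.2  # from the "MiB Mem : 7863.2 total" line


def res_to_mib(text):
    """Convert a top RES value such as '547.2m' or '1.6g' to MiB."""
    factors = {"m": 1.0, "g": 1024.0}
    suffix = text[-1].lower()
    if suffix in factors:
        return float(text[:-1]) * factors[suffix]
    return float(text) / 1024  # no suffix: assumed to be KiB


def resident_total_mib(rows):
    """Sum the resident memory of all listed processes, in MiB."""
    return sum(res_to_mib(res) for _, res in rows)


if __name__ == "__main__":
    used = resident_total_mib(ROWS)
    print(f"boinc processes resident: {used:.1f} MiB "
          f"({100 * used / TOTAL_MIB:.0f}% of {TOTAL_MIB} MiB)")
```

The resident total comes out far below the installed memory, which fits the suspicion in the post that the wait was triggered by something other than what the tasks were actually using at the time.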
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,456 |
It also helps (a massive amount) if you have a SSD and not a HDD. RAM loses its data every time you turn the power off. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,378,164 RAC: 20,578 |
No, it won't. Even with a ridiculously excessive amount of swap file usage, the actual volume of writes as a percentage of the drive's total capacity will only be a small fraction of the drive's rated DWPD (Drive Writes Per Day), and as long as there is plenty of free space on the drive the drive controller's wear levelling will also significantly prolong the life of the drive even further. It also helps (a massive amount) if you have a SSD and not a HDD. That's a workaround, although using the SSD heavily for a swapfile will wear it out very quickly. Yes, after several decades the drive will fail from all those writes due to the lack of system RAM, but most people don't keep their systems in use for that length of time. Grant Darwin NT |
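The arithmetic behind that argument can be sketched in a few lines. Only the 250GB drive size comes from the thread; the endurance rating and the daily swap volume below are invented placeholders, not figures for any real drive, so plug in the numbers from a drive's own spec sheet and your own measured writes.

```python
def drive_writes_per_day(daily_writes_gb, capacity_gb):
    """Fraction of the drive's capacity written per day (the DWPD actually used)."""
    return daily_writes_gb / capacity_gb


def years_to_rated_endurance(rated_tbw, daily_writes_gb):
    """Years until the rated terabytes-written figure is reached."""
    days = rated_tbw * 1000 / daily_writes_gb  # 1 TB taken as 1000 GB
    return days / 365


if __name__ == "__main__":
    capacity_gb = 250      # drive size mentioned earlier in the thread
    rated_tbw = 150        # hypothetical endurance rating, not a real spec
    daily_writes_gb = 20   # hypothetical heavy swap-file day

    print(f"DWPD used: {drive_writes_per_day(daily_writes_gb, capacity_gb):.2f}")
    print(f"Years to rated endurance: "
          f"{years_to_rated_endurance(rated_tbw, daily_writes_gb):.1f}")
```

The rated figure is a warranty-style limit rather than a point of guaranteed failure, so this is only a way of comparing a write load against a rating, not a prediction of when a drive will die.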
©2024 University of Washington
https://www.bakerlab.org