Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 101086 - Posted: 5 Apr 2021, 22:44:26 UTC - in response to Message 101083.  

Isn't there a board or site somewhere that explains what "they" are doing when changes occur in the project that affect the donors in significant ways?
There is. You’re on it. But it seems that Rosetta@home has been so successful for so long that “they” no longer feel any need to come here. The supply of computing resources appears to be taken for granted. Researchers simply throw tasks over the wall; as far as we know they do something with the results afterwards. That a batch of work can get sent out configured in a way that cuts off a third of the project’s capacity, without anybody noticing or taking any corrective action even though the problem was instantly noticed and reported by participants, leaves it quite evident that nobody is actively monitoring this project.
ID: 101086 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1902
Credit: 35,032,850
RAC: 4,767
Message 101087 - Posted: 6 Apr 2021, 3:03:44 UTC - in response to Message 101082.  

[snip]
I wish that were true, but it isn't.
I have a 4-core i3-8350K with 16GB RAM and lots of disk space but it hasn't been able to download or run any Rosetta tasks - only WCG backup project tasks, of which it now has a dozen.
While my main desktop, 16-core with 32GB RAM, hasn't had any problem.
If things work, it's like no problem exists. If they don't, it's like nothing will fix it.

Have you checked how much of the RAM is reserved for the operating system (Windows, Linux, etc.)?

Have you checked your settings for how much RAM and how much disk space BOINC is allowed to use?

I'm sure you're right about both those things, but this is an unattended PC in a room close to where some building work has been done over 2 or 3 months and when I got to it last month it was so covered in cr*p the display no longer works, so I can neither check nor modify it. I'll bring it home with me when I visit next week and try to clean it out and resolve the problem when I can dedicate some proper time to it.
It's running WCG just fine for now, so it's not the worst thing in the world. I'll get there eventually. Thanks for offering help.
ID: 101087 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1380
Credit: 13,693,695
RAC: 291
Message 101088 - Posted: 6 Apr 2021, 6:24:05 UTC - in response to Message 101087.  

[snip]
I wish that were true, but it isn't.
I have a 4-core i3-8350K with 16GB RAM and lots of disk space but it hasn't been able to download or run any Rosetta tasks - only WCG backup project tasks, of which it now has a dozen.
While my main desktop, 16-core with 32GB RAM, hasn't had any problem.
If things work, it's like no problem exists. If they don't, it's like nothing will fix it.

Have you checked how much of the RAM is reserved for the operating system (Windows, Linux, etc.)?

Have you checked your settings for how much RAM and how much disk space BOINC is allowed to use?
I'm sure you're right about both those things, but this is an unattended PC in a room close to where some building work has been done over 2 or 3 months and when I got to it last month it was so covered in cr*p the display no longer works, so I can neither check nor modify it.
Unless you have set it to use local settings, it will use whatever you have set in your account's Computing preferences section.
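
For anyone wanting to check which set of preferences a host is actually using, here is a minimal sketch. It assumes that local preferences are stored in a file named global_prefs_override.xml in the BOINC data directory, and that the absence of that file means the web (account) preferences apply. The function name and the data directory argument are hypothetical; the directory location varies by operating system and install.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Assumption: local preferences are kept in this file in the BOINC data directory.
OVERRIDE_FILE = "global_prefs_override.xml"


def read_local_override(data_dir):
    """Return locally overridden preferences as a dict.

    Returns None when no override file exists, which (under the assumption
    above) means the host follows the account's web preferences.
    """
    path = Path(data_dir) / OVERRIDE_FILE
    if not path.is_file():
        return None
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        # An empty or damaged file overrides nothing we can read.
        return {}
    # Keep only elements that carry a non-empty value.
    return {
        child.tag: child.text.strip()
        for child in root
        if child.text and child.text.strip()
    }
```

If the function returns None, editing the Computing preferences on the website should reach the host the next time it contacts the project.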
Grant
Darwin NT
ID: 101088 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1380
Credit: 13,693,695
RAC: 291
Message 101089 - Posted: 6 Apr 2021, 6:28:38 UTC - in response to Message 101077.  

It seems the RAM problem is not yet solved. I just updated Rosetta now and got the same complaint about needing 6 GB of RAM and only having 3 GB.
It's not a problem, it's just some tasks needing more RAM.
It is a problem when a Work Unit says it will need 6+GB of RAM, when it really only needs 300MB (or less), as that results in almost a third of the project's computing resources becoming unavailable.
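
To illustrate why an inflated memory figure locks hosts out, here is a simplified sketch of the check. It assumes work is only sent to a host whose usable RAM (total RAM times the allowed percentage) covers the memory the Work Unit declares, regardless of what the task uses once running. The function names, the 4 GB host and the 75% setting are illustrative assumptions, not the project's actual scheduler code.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def usable_ram(total_ram_bytes, ram_max_used_pct):
    """RAM that BOINC may use, given the host's preference percentage."""
    return total_ram_bytes * ram_max_used_pct / 100.0


def can_receive(declared_bound_bytes, total_ram_bytes, ram_max_used_pct):
    """True if the declared memory requirement fits in the usable RAM."""
    return declared_bound_bytes <= usable_ram(total_ram_bytes, ram_max_used_pct)


# Hypothetical host: 4 GB installed, BOINC allowed 75% of it (3 GB usable).
host_total = 4 * GB
host_pct = 75

# A task declaring 6 GB is refused, although it would run in about 300 MB.
refused = not can_receive(6 * GB, host_total, host_pct)
would_fit = can_receive(300 * MB, host_total, host_pct)
```

Under this model the host is excluded purely because of the declared figure, which matches the "needs 6 GB, only 3 GB available" message reported above.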
Grant
Darwin NT
ID: 101089 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1380
Credit: 13,693,695
RAC: 291
Message 101090 - Posted: 6 Apr 2021, 6:31:50 UTC
Last modified: 6 Apr 2021, 6:34:01 UTC

And to add to the lack of new work, we had another group of Tasks that had many which crashed and burned in a matter of seconds.
And those that didn't error out only needed 3 hours to reach their end.

No wonder that last batch of new work went so quickly.
Grant
Darwin NT
ID: 101090 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1902
Credit: 35,032,850
RAC: 4,767
Message 101091 - Posted: 6 Apr 2021, 7:00:48 UTC - in response to Message 101088.  

[snip]
I wish that were true, but it isn't.
I have a 4-core i3-8350K with 16GB RAM and lots of disk space but it hasn't been able to download or run any Rosetta tasks - only WCG backup project tasks, of which it now has a dozen.
While my main desktop, 16-core with 32GB RAM, hasn't had any problem.
If things work, it's like no problem exists. If they don't, it's like nothing will fix it.

Have you checked how much of the RAM is reserved for the operating system (Windows, Linux, etc.)?

Have you checked your settings for how much RAM and how much disk space BOINC is allowed to use?
I'm sure you're right about both those things, but this is an unattended PC in a room close to where some building work has been done over 2 or 3 months and when I got to it last month it was so covered in cr*p the display no longer works, so I can neither check nor modify it.
Unless you have set it to use local settings, it will use whatever you have set in your account's Computing preferences section.

Oh! That's a good idea. It does continue to attempt connections, so definitely worth a try.
I have no idea whether I've set it to local or web preferences (probably local tbh) but I've now edited my web preferences both here and at WCG just in case.
ID: 101091 · Rating: 0
yo2020
Joined: 2 Jan 21
Posts: 2
Credit: 111,170
RAC: 0
Message 101092 - Posted: 6 Apr 2021, 7:45:31 UTC - in response to Message 101086.  

Isn't there a board or site somewhere that explains what "they" are doing when changes occur in the project that affect the donors in significant ways?
There is. You’re on it. But it seems that Rosetta@home has been so successful for so long that “they” no longer feel any need to come here. The supply of computing resources appears to be taken for granted. Researchers simply throw tasks over the wall; as far as we know they do something with the results afterwards. That a batch of work can get sent out configured in a way that cuts off a third of the project’s capacity, without anybody noticing or taking any corrective action even though the problem was instantly noticed and reported by participants, leaves it quite evident that nobody is actively monitoring this project.


Yeah, and now the queue is empty and nobody cares to give an explanation or at least say "yeah, we know, we're working on it".
ID: 101092 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1663
Credit: 6,521,035
RAC: 840
Message 101093 - Posted: 6 Apr 2021, 7:50:02 UTC - in response to Message 101090.  

No wonder that last batch of new work went so quickly.

WUs with bugs, empty queues, lack of communication.
This is why I reduce my CPU time in this project...
ID: 101093 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 2,043
Message 101094 - Posted: 6 Apr 2021, 11:44:59 UTC - in response to Message 101093.  

This is why I reduce my CPU time in this project...
I used to run Rosetta by itself. Now I run SiDock too (both at 100%) and don't have to worry about switching anything.
Of course, Rosetta then gets less total CPU time, but it appears that they don't need it/can't use it anyway.
ID: 101094 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1663
Credit: 6,521,035
RAC: 840
Message 101096 - Posted: 6 Apr 2021, 12:01:23 UTC - in response to Message 101094.  

Of course, Rosetta then gets less total CPU time, but it appears that they don't need it/can't use it anyway.

It doesn't take much to keep the volunteers...
ID: 101096 · Rating: 0
Tom
Joined: 17 May 20
Posts: 2
Credit: 82,463
RAC: 0
Message 101097 - Posted: 6 Apr 2021, 13:11:21 UTC

I must agree with Brian, pretty disappointing that I have yet to see a project admin come in even acknowledging there is an issue. I'll give it to the end of the week. If it doesn't appear anyone is working on the problem I'll most likely drop Rosetta and look for a different project to donate processing to.
ID: 101097 · Rating: 0
Greger
Joined: 3 May 14
Posts: 10
Credit: 44,023,781
RAC: 8,384
Message 101100 - Posted: 6 Apr 2021, 18:11:25 UTC

<message>
process exited with code 1 (0x1, -255)</message>
ERROR: ERROR: FragmentIO: could not open file 00001.500.6mers
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 233
BOINC:: Error reading and gzipping output datafile: default.out
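
For comparing failures across many tasks, a small sketch like the following can pull the useful parts out of a task's stderr text: the file that could not be opened and the source location where the application exited. The regular expressions are based only on the three error lines quoted above, so other failure types will not match; the function name is hypothetical.

```python
import re

# Patterns derived from the error lines quoted in this post only.
MISSING_FILE = re.compile(r"could not open file (\S+)")
EXIT_SITE = re.compile(r"Exit from: (\S+) line: (\d+)")

SAMPLE = """ERROR: ERROR: FragmentIO: could not open file 00001.500.6mers
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 233
BOINC:: Error reading and gzipping output datafile: default.out"""


def summarize_stderr(text):
    """Collect missing input files and exit locations from stderr text."""
    summary = {"missing_files": [], "exit_sites": []}
    for line in text.splitlines():
        missing = MISSING_FILE.search(line)
        if missing:
            summary["missing_files"].append(missing.group(1))
        site = EXIT_SITE.search(line)
        if site:
            summary["exit_sites"].append((site.group(1), int(site.group(2))))
    return summary


if __name__ == "__main__":
    print(summarize_stderr(SAMPLE))
```

If every failed task reports the same missing fragment file, that points to a problem with the batch rather than with the host.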
ID: 101100 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1902
Credit: 35,032,850
RAC: 4,767
Message 101101 - Posted: 6 Apr 2021, 20:13:08 UTC - in response to Message 101076.  

Returning to my anecdote about a remote PC I have being unable to download any Rosetta tasks, so running its backup project, WCG, 24/7, my local laptop is also doing weird things. It refuses to run a particular Rosetta task, so it's running those it has room for - a combination of WCG and later Rosetta tasks, but only 3 on 4 cores. Now I know it's definitely happening, I've set NNT and suspended all running tasks except for the one problem Rosetta task. It still refuses to run, even as the only task. No tasks are running in my experiment!

So, maintaining NNT, I've found some combination of WCG and Rosetta tasks that'll run together on all 4 cores. I'll work my way through my small cache until all are completed bar the problem task and see if it runs then. If not, I'll finally abort it and just grab fresh tasks.

Bit of a weird one. Even attempting to micromanage tasks doesn't entirely work. No wonder that graph is running so much lower than it was, if I'm any example

Finally got to the end of this.
Last night I had 3 WCG tasks running (2 of which were Africa Rainfall Project tasks, which use slightly more RAM but in fact were only using 300MB each) and my one weird Rosetta nip* task reporting "waiting for memory" on my 4-core laptop.
Looking at my Event log, it was only when the last ARP was wrapping up that sufficient RAM was available for the Rosetta task to begin running. The last ARP task completed 3 minutes later and now the Rosetta task is the only task running.
Looking at the task's properties, it's only using between 271MB and 292MB of RAM, while earlier complaining that it needed something like 6.6GB RAM to begin.
I'm going to wait for completion before dragging any more tasks down. Hopefully there are some new tasks available to download at that time.

The task in question is this one

With 30 minutes to go, I've allowed new tasks and 11 have come down. Stage one successful.
3 of the new tasks attempt to start. Stage two successful.
2 of the new tasks are waiting for memory... Oh

I'm going out for a while. When I return the older task will have completed and I'll see if the new tasks all run ok.
The journey continues...

And the answer is... no.
Still two new Rosetta tasks running and two more new ones waiting for memory...

It seems we're a way from having a solution or correction

The saga continues after a few tasks completed.
The same two tasks refuse to run and other tasks run in their place - 2 out of 4 cores.
I suspended all tasks except the ones waiting for memory and still they refused to run [Leave non-GPU tasks in memory while suspended - checked]

I decide to abort the 2 tasks. Two new tasks run in their place and all 4 cores now in use.
That took more messing around than I expected. Let's see how it goes from now on
ID: 101101 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1902
Credit: 35,032,850
RAC: 4,767
Message 101102 - Posted: 6 Apr 2021, 20:44:38 UTC

Seeing as it seems to be whining season for the excessively-entitled it might be worth making myself as popular as usual by stating the obvious that I can't be the only one aware of.

1) Looking at my main PC, I've downloaded 54 tasks today across 10 separate downloads. Even my weirdly-running laptop managed 6 more. And my Android 4 more.
For all the comments about it (none) it seems like I'm the only one.
Of course, I'm not.

2) There are people here from all over the world, and a similar range of nationalities and creeds at UW from what I've observed, so some people may not be aware that UW is in the USA, which is a largely Christian country.
Traditionally their institutions and work-places close for the Easter holidays, which have been taking place over the last 4 days, from "Good Friday" to "Easter Monday", even if some of the people working there don't personally celebrate.

But, of course, during this holiday period people here expect to demand the creation and issue of sufficient work to serve at least a third of a million tasks per day to the world at large, with no respite.
And the amazing thing is, a fair few do seem to have come down. Maybe my caches aren't quite completely full, but near enough.

And as thanks, I see the usual levels of appreciation here, to wit, "if you don't supply what I need 24hrs a day, even during the holiday season, and provide chapter and verse on progress to tell us what we already know, loyalty will be shown by reducing contribution levels or leaving altogether"

Maybe you should have the occasional day off from your disgusting levels of personal entitlement too.
Though tbf, there's never any shortage of that.

Yeah, save it. I heard last time too. And the time before that. And the time before that etc
ID: 101102 · Rating: 0
mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,499,739
RAC: 160
Message 101103 - Posted: 6 Apr 2021, 21:53:59 UTC
Last modified: 6 Apr 2021, 22:18:15 UTC

We had an interruption in power here last night (it happens), and when I got up this AM I found one of my hosts was completely dead. So, I restarted it and when I got into BOINC, I am finding that every task that I download aborts for "Computational Error" within about 20 seconds.

While this was going on, I tried to add SiDock as another project, with the intent to diversify as some have suggested (currently running Rosetta only). That produced an error that said that I could not join because I am required to agree to the terms of service. Which I did. I tried it a couple more times to confirm it.

Now I'm wondering if the power failure messed up something in BOINC, or if the problems are coincidental and not related to me.

Any suggestions?

{EDIT} I fixed the problems with joining SiDock. But all of the tasks from that project error out just like the ones for Rosetta. So I've suspended both projects until I can figure out a solution.
ID: 101103 · Rating: 0
mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,499,739
RAC: 160
Message 101104 - Posted: 6 Apr 2021, 22:04:55 UTC - in response to Message 101102.  
Last modified: 6 Apr 2021, 22:06:45 UTC

Maybe you should have the occasional day off from your disgusting levels of personal entitlement too.


You said it yourself: there's a lot of diversity here. Some people are easygoing; others are more anal for whatever reason.

Me, I have obsessive-compulsive tendencies, so it rubs me the wrong way when I can't get the idealized version of Rosetta participation that I envision. Some people might find that disgusting. I know that others dislike the use of "@" and the slang use of "dude." People have bad days sometimes, and it's easy to take it out on an online project that seems to be more machine than human.

Speaking of Easter and bad days, just for kicks, I bought some cool egg decorating kits, which contained markers, stickers, stencils, etc. I was trying to do an elaborate floral pattern on my eggs, to give it some "wow" cred, but the stencil kept slipping. I became frustrated, and then distraught.

More distraught than I am now, with one of my hosts not producing anything. More distraught than I had any right to be. But no matter how carefully I worked at it, I couldn't get the stencil to stay in the right place.

I guess that you could say I was having an eggs 'n stencil crisis.
ID: 101104 · Rating: 0
Falconet
Joined: 9 Mar 09
Posts: 348
Credit: 989,292
RAC: 2
Message 101105 - Posted: 6 Apr 2021, 22:11:45 UTC - in response to Message 101103.  

I received 5 Rosetta work units on my laptop today. They all errored out.

The desktop also received 8 work units, 6 of which errored out.
ID: 101105 · Rating: 0
Bryn Mawr
Joined: 26 Dec 18
Posts: 333
Credit: 9,366,911
RAC: 705
Message 101106 - Posted: 6 Apr 2021, 22:17:47 UTC - in response to Message 101103.  

We had an interruption in power here last night (it happens), and when I got up this AM I found one of my hosts was completely dead. So, I restarted it and when I got into BOINC, I am finding that every task that I download aborts for "Computational Error" within about 20 seconds.

While this was going on, I tried to add SiDock as another project, with the intent to diversify as some have suggested (currently running Rosetta only). That produced an error that said that I could not join because I am required to agree to the terms of service. Which I did. I tried it a couple more times to confirm it.

Now I'm wondering if the power failure messed up something in BOINC, or if the problems are coincidental and not related to me.

Any suggestions?


At the moment every Rosetta task I get is bombing out with a Computation Error within 25 seconds of starting.
ID: 101106 · Rating: 0
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 101107 - Posted: 6 Apr 2021, 22:18:06 UTC - in response to Message 101103.  
Last modified: 6 Apr 2021, 23:10:02 UTC

Ignore the …_abinitio_1_abinitio_… tasks; lots of people are reporting problems with those.

For the others, this is a clue that all is not well:
couldn't start app: Input file rosetta_4.20_x86_64-pc-linux-gnu missing or invalid: file missing
Try Reset project under Project commands (Simple view) or on the Projects tab (Advanced view).
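
On a headless machine the same reset can be issued from the command line. The sketch below wraps the boinccmd tool's `--project URL reset` operation; the project URL shown is an assumption and should be checked against the one listed on your Projects tab, and the helper names are hypothetical. It defaults to a dry run that only returns the command, since a reset discards the project's tasks in progress.

```python
import subprocess

# Assumption: confirm this against the URL shown in your own BOINC Manager.
ROSETTA_URL = "https://boinc.bakerlab.org/rosetta/"


def reset_command(project_url):
    """Build the boinccmd argument list that resets one project."""
    return ["boinccmd", "--project", project_url, "reset"]


def reset_project(project_url, dry_run=True):
    """Reset a project; with dry_run=True only return the command."""
    cmd = reset_command(project_url)
    if not dry_run:
        # Requires boinccmd on PATH and a running BOINC client.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `reset_project(ROSETTA_URL, dry_run=False)` on the affected host would then re-download the application files, assuming the client is running and boinccmd is allowed to connect to it.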

Not sure about the SiDock issue; try the forums over there.
{EDIT, though we’re way off topic here} You’re missing some prerequisite system libraries (glibc-⁠2.27) which are newer than your version of Linux. It might be possible to install those alongside your system glibc, or it might be easier to update the whole OS. SiDock uses much more modern software than Rosetta; I have the same problem that I can’t run it on the ancient version of Windows on my crunchers…
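
A quick way to confirm a glibc mismatch like this on a Linux host is to compare the installed version with the required one. The sketch below uses Python's standard `platform.libc_ver()`; the 2.27 default is the figure mentioned above, the helper names are hypothetical, and the example versions in the comments are illustrative only.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.27' into a comparable tuple."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_sufficient(installed, required="2.27"):
    """True if the installed glibc version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def installed_glibc():
    """Return the glibc version of this interpreter's C library, or None."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None


if __name__ == "__main__":
    found = installed_glibc()
    if found is None:
        print("glibc not detected (not a glibc-based Linux system?)")
    else:
        print(found, "is sufficient:", glibc_is_sufficient(found))
```

Versions are compared as integer tuples rather than strings, so that 2.3 is correctly treated as older than 2.27.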
ID: 101107 · Rating: 0
mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,499,739
RAC: 160
Message 101108 - Posted: 6 Apr 2021, 23:39:21 UTC - in response to Message 101107.  

You’re missing some prerequisite system libraries (glibc-⁠2.27)


Woah, dude, you can tell that even though my system was turned off?

Did you find the horse porn folder too? That was from...uh, my kid brother.
ID: 101108 · Rating: 0



©2022 University of Washington
https://www.bakerlab.org