Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Sid Celery

Joined: 11 Feb 08
Posts: 2126
Credit: 41,253,494
RAC: 7,932
Message 91554 - Posted: 15 Jan 2020, 17:54:50 UTC - in response to Message 91549.  

Bump. Hand to mouth over the last 24hrs.
Though tbf this is pretty typical for January
New tasks are a bit stop-start today. No tasks earlier today, then a batch of new ones came through, but it's empty again right now.

ID: 91554 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91555 - Posted: 15 Jan 2020, 18:06:22 UTC - in response to Message 91554.  
Last modified: 15 Jan 2020, 18:08:06 UTC

Now that you mention it, I have one core free too. A manual update did not get anything.
It appears that the Christmas holiday has come late (or early) this year.
ID: 91555 · Rating: 0
premier

Joined: 30 Dec 05
Posts: 14
Credit: 23,872,868
RAC: 0
Message 91557 - Posted: 16 Jan 2020, 7:31:43 UTC - in response to Message 91555.  

Same here. 11 machines are idle ATM. Not getting new jobs.
ID: 91557 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2126
Credit: 41,253,494
RAC: 7,932
Message 91573 - Posted: 18 Jan 2020, 1:35:00 UTC - in response to Message 91554.  

Bump. Hand to mouth over the last 24hrs.
Though tbf this is pretty typical for January
New tasks are a bit stop-start today. No tasks earlier today, then a batch of new ones came through, but it's empty again right now.

A fair few tasks became available today, but it seems our buffers are so empty they all got taken and we're back to zero again.

Some effort seems to have been made, but more is still needed.
ID: 91573 · Rating: 0
Igor Kurpis
Joined: 13 Jan 20
Posts: 2
Credit: 103,514
RAC: 0
Message 91601 - Posted: 23 Jan 2020, 21:55:57 UTC

Hello!

I was looking at my results recently and there is something odd in the output of a Rosetta task. I don't know if I should care or if there is a way to resolve this.

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
ter: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
[...]
** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
======================================================
DONE ::     1 starting structures    28302 cpu seconds
This process generated     41 decoys from      41 attempts
======================================================
BOINC :: WS_max 3.91049e+08

BOINC :: Watchdog shutting down...
03:06:19 (14253): called boinc_finish(0)

</stderr_txt>
]]>


1118219840

The task itself completed successfully and has been validated, but those errors still don't look normal. In MiniRosetta there are no such errors.
ID: 91601 · Rating: 0
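
For anyone who wants to check their own task output for the same symptom, here is a minimal Python sketch. It counts the glibc `free(): invalid pointer` reports in a saved stderr text and checks whether the run still ended with `boinc_finish(0)`. The function name `summarize_stderr` and the shortened sample text are hypothetical; the two patterns are copied from the log quoted above.

```python
import re

# Patterns copied from the stderr output quoted in the post above.
FREE_ERROR = re.compile(r"free\(\): invalid pointer: (0x[0-9a-fA-F]+)")
FINISH = re.compile(r"called boinc_finish\((-?\d+)\)")


def summarize_stderr(text):
    """Count 'invalid pointer' reports and extract the boinc_finish status."""
    addresses = FREE_ERROR.findall(text)
    finish = FINISH.search(text)
    return {
        "free_errors": len(addresses),
        "distinct_addresses": sorted(set(addresses)),
        # None means no boinc_finish line was found in the text.
        "exit_status": int(finish.group(1)) if finish else None,
    }


# Hypothetical, shortened sample in the same shape as the quoted log.
sample = """\
*** Error in `rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
*** Error in `rosetta_4.08_x86_64-pc-linux-gnu': free(): invalid pointer: 0x0000000005f08e93 ***
03:06:19 (14253): called boinc_finish(0)
"""

summary = summarize_stderr(sample)
print(summary)
```

Many reports with a single distinct address and an exit status of 0 would match what was posted here: the same pointer reported over and over while the task still finishes.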
Sid Celery

Joined: 11 Feb 08
Posts: 2126
Credit: 41,253,494
RAC: 7,932
Message 91630 - Posted: 31 Jan 2020, 1:25:06 UTC
Last modified: 31 Jan 2020, 1:26:04 UTC

Validators have been down for about 12 hours

Anyone around to give bwsrv2 a kick?

93k tasks awaiting validation - 30 of them mine
ID: 91630 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2126
Credit: 41,253,494
RAC: 7,932
Message 91634 - Posted: 31 Jan 2020, 14:12:36 UTC - in response to Message 91630.  

Validators have been down for about 12 hours

Anyone around to give bwsrv2 a kick?

93k tasks awaiting validation - 30 of them mine

No change after 24hrs - bwsrv2 still down

185,069 tasks awaiting validation
ID: 91634 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91635 - Posted: 31 Jan 2020, 15:18:57 UTC - in response to Message 91634.  

Thanks for reporting. I normally would not notice. I trust it is not a big deal, but maybe maintenance on a server or something.
However, it helps the crunchers to have a Plan B in mind.
ID: 91635 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2126
Credit: 41,253,494
RAC: 7,932
Message 91639 - Posted: 31 Jan 2020, 23:05:20 UTC - in response to Message 91635.  

Thanks for reporting. I normally would not notice. I trust it is not a big deal, but maybe maintenance on a server or something.
However, it helps the crunchers to have a Plan B in mind.

No shortage of tasks throughout, just awarding credit

But all solved now and no tasks awaiting validation - all caught up, thanks
ID: 91639 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,150
Message 91659 - Posted: 7 Feb 2020, 19:32:46 UTC

Is your download server having problems?

My computer has been trying to download a rather small input file for many hours, and fails every time.

10v1nmgb_c724_10mer_gb_000434.zip

It looks like it won't download any more tasks until after it gets this input file.
ID: 91659 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,150
Message 91660 - Posted: 7 Feb 2020, 19:48:42 UTC

2/7/2020 12:22:52 PM | | Project communication failed: attempting access to reference site
2/7/2020 12:22:52 PM | Rosetta@home | Temporarily failed download of 10v1nmgb_c724_10mer_gb_000434.zip: transient HTTP error
2/7/2020 12:22:52 PM | Rosetta@home | Backing off 03:13:23 on download of 10v1nmgb_c724_10mer_gb_000434.zip
2/7/2020 12:22:54 PM | | Internet access OK - project servers may be temporarily down.
ID: 91660 · Rating: 0
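
To spot stuck downloads like this one without reading the whole event log, a short Python sketch can pull the "Backing off" lines out of a saved log. The function name `stuck_downloads` is hypothetical, and the pattern assumes the message wording shown in the log lines above.

```python
import re
from datetime import timedelta

# Matches event-log lines such as:
#   Backing off 03:13:23 on download of <file name>
BACKOFF = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")


def stuck_downloads(log_text):
    """Map each file name to the longest back-off interval seen for it."""
    longest = {}
    for hours, minutes, seconds, name in BACKOFF.findall(log_text):
        delay = timedelta(
            hours=int(hours), minutes=int(minutes), seconds=int(seconds)
        )
        if delay > longest.get(name, timedelta(0)):
            longest[name] = delay
    return longest


log = """\
2/7/2020 12:22:52 PM | Rosetta@home | Temporarily failed download of 10v1nmgb_c724_10mer_gb_000434.zip: transient HTTP error
2/7/2020 12:22:52 PM | Rosetta@home | Backing off 03:13:23 on download of 10v1nmgb_c724_10mer_gb_000434.zip
"""

result = stuck_downloads(log)
for name, delay in result.items():
    print(name, delay)
```

A file that keeps showing up with back-off intervals of several hours is a reasonable candidate for a manual retry.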
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91661 - Posted: 7 Feb 2020, 22:29:32 UTC - in response to Message 91659.  

It looks like it won't download any more tasks until after it gets this input file.

If it is holding up your machine, I think I would let the current tasks finish, detach, and try again.
ID: 91661 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,150
Message 91666 - Posted: 8 Feb 2020, 23:05:51 UTC - in response to Message 91661.  

It looks like it won't download any more tasks until after it gets this input file.

If it is holding up your machine, I think I would let the current tasks finish, detach, and try again.

How am I supposed to do that if the only current Rosetta@Home task won't finish downloading so that it can start?

It's doing more for all the other BOINC projects I have selected that offer CPU tasks but no GPU tasks, though.
ID: 91666 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91667 - Posted: 9 Feb 2020, 1:16:23 UTC - in response to Message 91666.  
Last modified: 9 Feb 2020, 1:48:56 UTC

How am I supposed to do that if the only current Rosetta@Home task won't finish downloading so that it can start?

You detach and end its misery. Sometimes a reboot works though.
ID: 91667 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,150
Message 91668 - Posted: 9 Feb 2020, 4:18:28 UTC - in response to Message 91667.  

How am I supposed to do that if the only current Rosetta@Home task won't finish downloading so that it can start?

You detach and end its misery. Sometimes a reboot works though.

A restart followed by telling BOINC to retry the download finally helped. The file downloaded, and the task is now ready to start. Previously, telling BOINC to retry the download without the Windows restart didn't help.
ID: 91668 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91673 - Posted: 10 Feb 2020, 17:14:49 UTC - in response to Message 91668.  
Last modified: 10 Feb 2020, 17:29:19 UTC

I had the same problem with a stuck download, and a reboot fixed it for me too. But that practically never happens. So the fact that it is happening more often now indicates to me that their servers are overloaded.
I will take a machine off.

And if they want to tell us otherwise, I will listen.
ID: 91673 · Rating: 0
Mad_Max

Joined: 31 Dec 09
Posts: 209
Credit: 26,101,436
RAC: 16,911
Message 91676 - Posted: 12 Feb 2020, 7:46:37 UTC - in response to Message 91668.  
Last modified: 12 Feb 2020, 7:47:32 UTC

If retrying the download does not help, aborting the file transfer will usually work.
The corresponding task will fail, but BOINC is smart enough to abort such tasks without trying to run them.
So no computation is wasted.

P.S.
I have also had a few stuck files in the last few days (the previous such case was about a year ago).
I think one of the files was exactly the same file. BOINC also stopped getting new work from R@H until I noticed it today and aborted the stuck file transfer.

One of tasks with "stuck" downloads: https://boinc.bakerlab.org/rosetta/result.php?resultid=1121514493
ID: 91676 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91684 - Posted: 12 Feb 2020, 22:53:42 UTC - in response to Message 91676.  

I just had to abort one on my best machine, a Ryzen 3700x. A reboot did not fix it.
Rosetta is beginning to lose some of its attraction for me. It was always a set-and-forget project. The errors were minor, and did not hang anything up.

An explanation would be useful, as unlikely as that is.
ID: 91684 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,847,810
RAC: 7,763
Message 91694 - Posted: 13 Feb 2020, 22:29:02 UTC
Last modified: 13 Feb 2020, 22:35:59 UTC

I'm getting lots of downloads (always a 3kB zip file) that stick (on all 4 computers). A temporary workaround seems to be to abort the task (not the download), then update the project so the project acknowledges you don't want the task that you can't get. It will then get others instead. But it's happening quite a lot. Unless I'm on holiday, I have a permanent monitor beside me showing what all my computers are doing on BOINC (using BoincTasks), but I'm sure many people won't check their machines that often. And if that download failed for me, will it fail for the next person it gives it to, and so on?

Also, I seem to have quite a high percentage of "error while computing" on all 4 machines (about a third of tasks). Is this normal, or should I be trying to tweak something? I know with LHC@home an update to the virtual machine fixed it.
ID: 91694 · Rating: 0
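
The two workarounds described in this thread (abort the stuck file transfer, or abort the task and then update the project) can also be driven from the command line with `boinccmd`. Below is a hedged Python sketch that only builds and prints the commands by default. The helper names are hypothetical, `EXAMPLE_TASK_NAME` is a placeholder, and `PROJECT_URL` is an assumption: it must match the project URL your own client reports (for example via `boinccmd --get_project_status`).

```python
import subprocess

# Assumption: replace with the project URL shown by your own BOINC client.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def abort_transfer_cmd(filename, url=PROJECT_URL):
    """Command to abort one stuck file transfer."""
    return ["boinccmd", "--file_transfer", url, filename, "abort"]


def abort_task_cmd(task_name, url=PROJECT_URL):
    """Command to abort one task by name."""
    return ["boinccmd", "--task", url, task_name, "abort"]


def update_project_cmd(url=PROJECT_URL):
    """Command to contact the project so it learns about the aborted work."""
    return ["boinccmd", "--project", url, "update"]


def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


# Workaround 1: abort the stuck download (file name taken from this thread).
run(abort_transfer_cmd("10v1nmgb_c724_10mer_gb_000434.zip"))

# Workaround 2: abort the task, then update the project.
# EXAMPLE_TASK_NAME is a placeholder, not a real task name.
run(abort_task_cmd("EXAMPLE_TASK_NAME"))
run(update_project_cmd())
```

Keeping `dry_run=True` as the default means nothing is aborted until the printed commands have been checked against the task and file names shown in the BOINC Manager.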
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91695 - Posted: 13 Feb 2020, 23:35:02 UTC - in response to Message 91694.  

And if that download failed for me, will it fail for the next person it gives it to, and so on?

I am wondering whether it is related to the high memory requirements of some of the files recently.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13510

Probably they are two different things, but I will monitor the amount of available memory the next time I see one stuck.
ID: 91695 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org