Problems and Technical Issues with Rosetta@home

Sid Celery

Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 90539 - Posted: 19 Mar 2019, 23:53:13 UTC - in response to Message 90535.  

Sorry to be a bit late on this, but I did notice around 13th March I had a task consuming 2.4Gb and 14Gb of my 16Gb (total) RAM being in use to run 8 tasks.

I can't recall the tasks involved. Right now I'm back to my more usual level of 7.74Gb in use

As of this AM I have 16gb on the ryzen and it's currently showing 81% free memory but that's with no Rosetta as no WUs have come down since early yesterday.

I'll monitor going forward and report back.

So you've got your extra RAM installed already? If it was a RAM issue (with 8Gb) you'll be fine now.

I was only indicating there were some rogue tasks around last week that may have tripped you up back then. Hopefully new tasks play nicer as standard.

Your original question was to ask if there was anything you could do - there probably wasn't at that time and you've more than covered yourself now under normal conditions.
ID: 90539 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 90540 - Posted: 19 Mar 2019, 23:57:12 UTC - in response to Message 90536.  

As of this AM I have 16gb on the ryzen and it's currently showing 81% free memory but that's with no Rosetta as no WUs have come down since early yesterday.

Yes, we are back to 8086 tasks ready to send according to the server status page, which actually means 0 tasks ready to send. Maybe the admins should investigate what those 8086 tasks are and whether they might be causing the issues.

Up to 10 minutes ago it was still showing those 8086 so that doesn't sound right.

However, I'm here to say a whole load of tasks just came down and the server status page has just changed to show an additional 20k Rosetta tasks in progress and 15k still unsent. No idea how long that will last, but there is some progress.
ID: 90540 · Rating: 0
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 23,229,863
RAC: 6,747
Message 90542 - Posted: 21 Mar 2019, 4:47:11 UTC
Last modified: 21 Mar 2019, 4:47:43 UTC

I was watching when a couple of the Rosetta WUs failed.

They computed properly until the TIME REMAINING was zero seconds and the compute time was 8 hours and a few minutes. Instead of reporting completion, the WU was marked as WAITING with zero seconds remaining. When the WU restarted, it reported a COMPUTE ERROR with the message "finish file present too long". The 34 failing WUs all seemed to fail at the end and were 4.08 Linux WUs.

https://boinc.bakerlab.org/rosetta/result.php?resultid=1063704662
ID: 90542 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 90544 - Posted: 21 Mar 2019, 8:18:56 UTC

We had a good run, but no tasks left to download (and that mysterious 8086 ready to send again, whatever that is)
ID: 90544 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 6,222
Message 90546 - Posted: 21 Mar 2019, 14:26:12 UTC - in response to Message 90542.  

I was watching when a couple of the Rosetta WUs failed.

They computed properly until the TIME REMAINING was zero seconds and the compute time was 8 hours and a few minutes. Instead of reporting completion, the WU was marked as WAITING with zero seconds remaining. When the WU restarted, it reported a COMPUTE ERROR with the message "finish file present too long". The 34 failing WUs all seemed to fail at the end and were 4.08 Linux WUs.

https://boinc.bakerlab.org/rosetta/result.php?resultid=1063704662


That sounds very similar to mine.

I did notice that a few of mine showed n decoys and then appeared to restart and showed a session with 1 decoy before failing.
ID: 90546 · Rating: 0
bcavnaugh
Joined: 7 Dec 13
Posts: 7
Credit: 2,389,640
RAC: 0
Message 90547 - Posted: 21 Mar 2019, 17:54:38 UTC
Last modified: 21 Mar 2019, 17:54:52 UTC

Not getting any Tasks on this Host https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3112116
A T630 Server but my other T630 is getting them fine https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3282035
Both running Server 2012 R2
ID: 90547 · Rating: 0
bcavnaugh
Joined: 7 Dec 13
Posts: 7
Credit: 2,389,640
RAC: 0
Message 90548 - Posted: 21 Mar 2019, 19:01:45 UTC - in response to Message 90547.  

Looks OK now https://boinc.bakerlab.org/rosetta/results.php?hostid=3112116

ID: 90548 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 6,222
Message 90551 - Posted: 22 Mar 2019, 18:44:01 UTC - in response to Message 90548.  

Not getting any Tasks on this Host https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3112116
A T630 Server but my other T630 is getting them fine https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3282035
Both running Server 2012 R2

Looks OK now https://boinc.bakerlab.org/rosetta/results.php?hostid=3112116


I suspect that was the last splutterings as the pool was draining, project status is showing 0 tasks unsent (but, as has been said, 8086 tasks ready to send).
ID: 90551 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 90560 - Posted: 23 Mar 2019, 1:52:23 UTC - in response to Message 90551.  

Not getting any Tasks on this Host https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3112116
A T630 Server but my other T630 is getting them fine https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3282035
Both running Server 2012 R2

Looks OK now https://boinc.bakerlab.org/rosetta/results.php?hostid=3112116


I suspect that was the last splutterings as the pool was draining, project status is showing 0 tasks unsent (but, as has been said, 8086 tasks ready to send).

Maybe they're tasks for pre-80386 machines? ...
ID: 90560 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 6,222
Message 90567 - Posted: 23 Mar 2019, 19:53:39 UTC

Despite having a 6 hour limit set I am currently processing a batch of Rosetta 4.08 WUs that have been running for 8 hours and are showing an estimated 2 hours remaining.

They all have names starting :-

rb_03_21_2022_2162_ab_t000__robetta_cstwt_5.0_FT

Is this normal or are they likely to error out?
ID: 90567 · Rating: 0
Admin
Project administrator

Joined: 1 Jul 05
Posts: 5144
Credit: 0
RAC: 0
Message 90569 - Posted: 24 Mar 2019, 0:43:53 UTC - in response to Message 90567.  

This seems odd but I would continue to let it run since it is a relatively large protein to model.
ID: 90569 · Rating: 0
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 90574 - Posted: 24 Mar 2019, 9:17:59 UTC - in response to Message 90569.  

This seems odd but I would continue to let it run since it is a relatively large protein to model.

Besides that, the limit is CPU-hours, so depending on what else the CPU has to do, the runtime can be a lot longer.
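
To illustrate the difference between CPU time and elapsed time mentioned above, here is a minimal Python sketch. It is not Rosetta or BOINC code; the `measure` function and its parameters are hypothetical names made up for this example. It simply does a little work, then sits idle (standing in for time the CPU spends on other things), and reports both clocks.

```python
import time


def measure(work, idle_seconds):
    """Return (cpu_seconds, elapsed_seconds) for a busy loop plus idle time."""
    cpu_start = time.process_time()    # CPU time used by this process
    wall_start = time.perf_counter()   # elapsed (wall-clock) time

    total = 0
    for i in range(work):              # the actual computation
        total += i * i

    # Stand-in for time when the CPU is busy elsewhere (assumption for
    # illustration): the clock keeps running, but no CPU time accrues here.
    time.sleep(idle_seconds)

    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return cpu, elapsed


cpu, elapsed = measure(200_000, 0.2)
print(f"CPU time: {cpu:.3f}s, elapsed: {elapsed:.3f}s")
```

The elapsed figure comes out larger than the CPU figure, which is the same effect as a task with a CPU-hour target running longer by the wall clock on a busy machine.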
ID: 90574 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 6,222
Message 90575 - Posted: 24 Mar 2019, 11:37:54 UTC - in response to Message 90569.  

This seems odd but I would continue to let it run since it is a relatively large protein to model.


After 10 hours (elapsed and CPU) 2 of them (1064201222 and 1064201281) errored out with the same symptoms I’ve been seeing.

Interestingly, the 4 that succeeded (1064201216, 1064201223, 1064201224 and 1064201283) also had the "default.out.gz exist, stream information inconsistent" error, so that is also a red herring.
ID: 90575 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 90685 - Posted: 17 Apr 2019, 22:25:50 UTC

Validation seems to be offline for the last half hour
ID: 90685 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 90686 - Posted: 18 Apr 2019, 0:24:03 UTC - in response to Message 90685.  

Validation seems to be offline for the last half hour

And back about 30mins ago, I think
ID: 90686 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 6,222
Message 90706 - Posted: 21 Apr 2019, 20:44:07 UTC - in response to Message 90686.  

Validation seems to be offline for the last half hour

And back about 30mins ago, I think


And off again since 04:00 this morning
ID: 90706 · Rating: 0
Trevor ct

Joined: 7 Oct 14
Posts: 2
Credit: 23,386,023
RAC: 0
Message 90964 - Posted: 2 Aug 2019, 15:19:23 UTC

Rosetta 4.07 work tasks are reporting 'computational error' as soon as they are opened on one of my two computers. The only known difference is that the affected computer runs BOINC version 7.14.2 and the unaffected one runs the earlier version 7.6.33.

I cannot discover how to revert version as a trial.

Trevor ct
ID: 90964 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 90965 - Posted: 2 Aug 2019, 16:14:58 UTC - in response to Message 90964.  

Rosetta 4.07 work tasks are reporting 'computational error' as soon as they are opened on one of my two computers. The only known difference is that the affected computer runs BOINC version 7.14.2 and the unaffected one runs the earlier version 7.6.33.

It is not the BOINC version. I see the errors too (one on each of two Ubuntu machines), and they are both running BOINC 7.14.2.
ID: 90965 · Rating: 0
LarryMajor

Joined: 1 Apr 16
Posts: 22
Credit: 31,533,212
RAC: 0
Message 90966 - Posted: 2 Aug 2019, 20:47:19 UTC - in response to Message 90965.  

Yeah, I got a bunch of these on different machines, and they all fail when they are resent to someone else.
It's the work units, not your computer.
ID: 90966 · Rating: 0
Trevor ct

Joined: 7 Oct 14
Posts: 2
Credit: 23,386,023
RAC: 0
Message 90968 - Posted: 3 Aug 2019, 21:30:12 UTC - in response to Message 90966.  

Thank you. It's comforting to know I am not running a rogue program.
ID: 90968 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org