Posts by Team TMR

1) Message boards : Number crunching : RAC dropping, BOINC dropping comms (Message 31703)
Posted 27 Nov 2006 by Team TMR
Post:
Is the boinc.exe task still running when this happens?
2) Message boards : Number crunching : Problems with Rosetta version 5.40 (Message 31118)
Posted 14 Nov 2006 by Team TMR
Post:
I woke up this morning to find that over 20 WUs failed overnight. It's good to see the cause has already been found though.
3) Message boards : Number crunching : Report Problems with Rosetta Version 5.25 (Message 21613)
Posted 2 Aug 2006 by Team TMR
Post:
I've had at least 5 WUs fail in the last week because they ran for over 12 hours (default 3 hour target CPU time in effect), and another is going to fail within the next hour (it's already up to 12.5 hours). They're not getting credit either.
4) Message boards : Number crunching : Report stuck & aborted WU here please - II (Message 13381)
Posted 10 Apr 2006 by Team TMR
Post:
WU 16856997 was aborted after 7 hours, when it was stuck on about 1.36%.

I have 3 more that seem to be stuck near 1% after an hour, but I won't abort them until they pass 2 hours or so.
5) Message boards : Number crunching : Report stuck & aborted WU here please (Message 13078)
Posted 5 Apr 2006 by Team TMR
Post:
I've got two more somewhere that are stuck on 1.04% after 2 hours and 1.5 hours.
6) Message boards : Number crunching : Report stuck & aborted WU here please (Message 13038)
Posted 4 Apr 2006 by Team TMR
Post:
This WU seems to be stuck: 13107954

Over 3 hours in and it's on 1.19%. Job CPU time is set to 2 hours.

Edit: Now at 4 hours and 1.30%.
7) Message boards : Number crunching : Report stuck & aborted WU here please (Message 12497)
Posted 22 Mar 2006 by Team TMR
Post:
Another one, aborted after 16 hours stuck on 1%

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11717839

This one was a bit different - it was stuck on 30.19% after 8 hours. After restarting BOINC, it reset back to 38 mins CPU time and 30.19% and got stuck again.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11743501

It's getting increasingly frustrating having to babysit this project all the time. Fingers crossed for those working on a fix.
8) Message boards : Number crunching : Report stuck & aborted WU here please (Message 12332)
Posted 20 Mar 2006 by Team TMR
Post:
And another.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11627337

This and the 4 I mentioned below were all stuck on 1%.
9) Message boards : Number crunching : Report stuck & aborted WU here please (Message 12322)
Posted 20 Mar 2006 by Team TMR
Post:
Just aborted 4 more. Really hope this gets fixed soon, we've just wasted over 5 days of CPU time! Good luck Rom.

8.8 hours
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11584596

18.2 hours
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11551106

41.4 hours
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11460182

71.0 hours
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11330309
10) Message boards : Number crunching : Report stuck & aborted WU here please (Message 12089)
Posted 16 Mar 2006 by Team TMR
Post:
Another one: http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11167215

Stuck on 1% after 12 hours.
11) Message boards : Number crunching : Report stuck & aborted WU here please (Message 12009)
Posted 14 Mar 2006 by Team TMR
Post:
Had 3 today that have been stuck on 1% after anything between 3-16 hours (runtime set to 2 hours).

3.1 hours:
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11019262

4.3 hours:
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11008654

16.9 hours:
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=10949277

All were aborted.
12) Message boards : Number crunching : Report stuck & aborted WU here please (Message 11557)
Posted 2 Mar 2006 by Team TMR
Post:
This one, WU 9696277, was stuck on 1% for 3 days! I've just aborted it.

No wonder my daily points have taken a hit.

Looking forward to getting the credit it...
13) Message boards : Number crunching : Report stuck & aborted WU here please (Message 11002)
Posted 20 Feb 2006 by Team TMR
Post:
Well it finished eventually, at 8hr 39mins. But it never did get off 1% as far as I could see.
14) Message boards : Number crunching : Report stuck & aborted WU here please (Message 10997)
Posted 20 Feb 2006 by Team TMR
Post:
Another one, 9442770

Over 8 hours in and still stuck on 1%. It's running rosetta 4.82 too, so I guess that didn't fix the 1% problem then. Max CPU setting is 2 hours.
15) Message boards : Number crunching : Report Maximum CPU Time Exceeded WU HERE (Message 10575)
Posted 8 Feb 2006 by Team TMR
Post:
And now the 3rd has failed.

Result 8433350

I hope we're going to get credit for these!
16) Message boards : Number crunching : Report Maximum CPU Time Exceeded WU HERE (Message 10574)
Posted 8 Feb 2006 by Team TMR
Post:
I also have 3 other ABINITIO WUs in progress that have been running over 12 hours (2 are on 2+ GHz PCs) which might be heading the same way.

One of them now has: Result 9027571
17) Message boards : Number crunching : Report Maximum CPU Time Exceeded WU HERE (Message 10569)
Posted 8 Feb 2006 by Team TMR
Post:
This one just timed out: WU 5610404, Result 9094342

I also have 3 other ABINITIO WUs in progress that have been running over 12 hours (2 are on 2+ GHz PCs) which might be heading the same way.

If these timed out WUs are of use, are you still giving credit for them?
18) Message boards : Number crunching : Shorter WU deadlines (Message 10007)
Posted 27 Jan 2006 by Team TMR
Post:
For us, we've had to rejig the projects that some PCs work on, taking some PCs off Rosetta completely.

We had Rosetta running on a few slow PCs, which take 1-2 days to complete a WU. When you factor in that those PCs are only on for less than 8 hours a day, it becomes 4-6 days to complete - and then a weekend arrives (PCs are off) and it takes over a week to complete.

If the short deadline WUs were smaller so they completed quicker it wouldn't be such a problem.
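
To make the arithmetic above concrete, here is a rough sketch of how part-time uptime stretches a WU's completion time. It is only an illustration under stated assumptions: the function name and parameters are hypothetical, the PC is assumed to crunch for the whole time it is switched on, and the WU is assumed to start at the beginning of a working week.

```python
import math

def wall_clock_days(cpu_hours, hours_on_per_day, workdays_per_week=5):
    """Estimate calendar days to finish a WU on a PC that only runs on workdays.

    Assumptions (not from the project itself): the PC crunches for every hour
    it is on, and the WU starts on the first workday of a week.
    """
    # Number of powered-on days needed to accumulate the required CPU time.
    working_days = math.ceil(cpu_hours / hours_on_per_day)
    if working_days == 0:
        return 0
    full_weeks, remainder = divmod(working_days, workdays_per_week)
    if remainder == 0:
        # Finishes on the last workday of a week, so the final weekend is not counted.
        return full_weeks * 7 - (7 - workdays_per_week)
    # Each full working week also costs its weekend while the PC is off.
    return full_weeks * 7 + remainder

# A WU needing 24 CPU hours on a PC that is on 8 hours a day.
print(wall_clock_days(24, 8))
# A WU needing 48 CPU hours on the same PC: it spills past a weekend.
print(wall_clock_days(48, 8))
```

With fewer than 8 hours of uptime per day the estimates grow further, which is why a WU that needs 1-2 days of CPU time can end up taking over a week of calendar time.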
19) Message boards : Number crunching : huge WU (Message 9799)
Posted 25 Jan 2006 by Team TMR
Post:
Thanks! I obviously didn't click through enough pages.
20) Message boards : Number crunching : huge WU (Message 9797)
Posted 25 Jan 2006 by Team TMR
Post:
I know these ABINITIO WUs are big and take longer, but is anyone else finding they disappear when they're reported?

We've just uploaded two of these WUs, each took 18 hours to complete, but both are missing from our list of results. I've checked back, and of the 5 ABINITIO WUs that we've completed this morning only 1 is present in our results list. Every other type of WU seems to appear in the results list immediately.

I'm fairly sure we're getting credit (our credit increased by the expected amount the last time we reported one of these WUs), I'm just curious why they're missing from the reports list.




