Major problems with granted credit

Message boards : Number crunching : Major problems with granted credit

Martin P.
Joined: 26 May 06
Posts: 38
Credit: 168,333
RAC: 0
Message 58595 - Posted: 7 Jan 2009, 8:43:49 UTC

I am experiencing major problems with granted credit. On two occasions the claimed credit was 48.9 (https://boinc.bakerlab.org/rosetta/result.php?resultid=217748955) and 90.4 (https://boinc.bakerlab.org/rosetta/result.php?resultid=217572433), while the granted credit was only 2 and 8. Run times are in line with previous work units that were credited correctly. What is going on?

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 58616 - Posted: 7 Jan 2009, 14:39:39 UTC

I thought that tasks ending in _0 were original to the specific user and the only other test would have been on RALPH. So how can another RAH user have the same task with that exact random start point assigned to them, unless it errored out on another machine?

Didn't tasks with Zinc get a low credit rating? But I wouldn't expect as low as 8 credits on a high-end processor with no errors. On the other hand, if you convert his run time into hours, he had a 2.79-hour run, and perhaps for that time frame the credit is correct. I wonder if it would have been better with a 4-6 hour run time?

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58619 - Posted: 7 Jan 2009, 15:50:55 UTC - in response to Message 58616.  

I thought that tasks ending in _0 were original to the specific user and the only other test would have been on RALPH. So how can another RAH user have the same task with that exact random start point assigned to them, unless it errored out on another machine?


Yes, this is true, but many machines will work on models for the same protein and same solution method, each with a different starting point. So these are what get averaged together. Basically, look at the WU name and the batch number that precedes the _0. Many thousands of tasks will be generated with the same name and batch. Each has a unique random seed and will produce unique models.

So, for example, the task first mentioned is:
cc2_1_8_native_cen_cst_hb_t313__IGNORE_THE_REST_1RY6A_3_5845_79_0

5845 is the batch number
79 is the task within that batch
0 is the replication level so far
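
As a rough illustration of the breakdown above, the three trailing numbers can be split off a task name from the right. This is only a sketch: the `TaskName` type and `parse_task_name` function are hypothetical names made up here, and it assumes every task name ends in `_<batch>_<task>_<replication>` as in the example.

```python
from typing import NamedTuple


class TaskName(NamedTuple):
    description: str  # everything before the three trailing numbers
    batch: int        # batch number
    task: int         # task number within that batch
    replication: int  # replication level so far


def parse_task_name(name: str) -> TaskName:
    """Split a task name into its description and trailing numeric fields.

    Assumes the name ends in _<batch>_<task>_<replication>.
    """
    description, batch, task, replication = name.rsplit("_", 3)
    return TaskName(description, int(batch), int(task), int(replication))


parsed = parse_task_name(
    "cc2_1_8_native_cen_cst_hb_t313__IGNORE_THE_REST_1RY6A_3_5845_79_0"
)
print(parsed.batch, parsed.task, parsed.replication)
```

Splitting from the right avoids having to understand the descriptive part of the name, which varies from batch to batch.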

But if you had a list of all the tasks for batch 5845, you would see some for proteins other than 1RY6A... or is it t313? I don't have a perfect understanding of the names either :)

It wouldn't make sense to average runtimes from protein "A" with runtimes from protein "B". So, within a batch, the specific protein being studied is the basis of what gets averaged as results come in.
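
One way to picture the per-protein averaging described above is a running average kept separately for each (batch, protein) pair. This is a sketch under assumptions, not the project's server code: the `GroupAverages` class, the choice of claimed credit per model as the averaged quantity, and the sample numbers are all invented for illustration.

```python
from collections import defaultdict


class GroupAverages:
    """Running average of claimed credit per model, kept per (batch, protein)."""

    def __init__(self) -> None:
        # (batch, protein) -> [sum of claimed credit, number of models]
        self._totals = defaultdict(lambda: [0.0, 0])

    def report(self, batch: int, protein: str, claimed: float, models: int) -> float:
        """Record one returned result and return the credit granted for it.

        The grant is the group's average credit per model (including this
        result) multiplied by the number of models in this result.
        """
        if models <= 0:
            raise ValueError("a result must contain at least one model")
        entry = self._totals[(batch, protein)]
        entry[0] += claimed
        entry[1] += models
        return entry[0] / entry[1] * models


averages = GroupAverages()
print(averages.report(5845, "1RY6A", claimed=40.0, models=10))
print(averages.report(5845, "1RY6A", claimed=20.0, models=10))
print(averages.report(5845, "other_protein", claimed=50.0, models=5))
```

Because each key has its own totals, results for one protein never shift the average used for another, even inside the same batch.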
Rosetta Moderator: Mod.Sense

sarha1
Joined: 23 Sep 05
Posts: 5
Credit: 6,339,735
RAC: 0
Message 58620 - Posted: 7 Jan 2009, 15:59:14 UTC

I recently noticed that a flat rate of credit is being granted: exactly 2 credits per decoy. It is really strange.

david @ TPS
Joined: 26 Nov 06
Posts: 3
Credit: 881,762
RAC: 0
Message 58622 - Posted: 7 Jan 2009, 16:26:18 UTC
Last modified: 7 Jan 2009, 16:33:57 UTC

Thanks for the explanations.

Due to the CUDA fiasco, I am looking for a new home for my farm. I have been crunching Rosetta with a few of my older boxes but it does not sound like the big horsepower would be well served by this system. I might move a dual core and see how it likes it -- then decide.

David

PS: I have not looked at my computers for a while, and I see the credit numbers are MUCH better than they used to be! Here are a few that mirror Martin's, I think.

Task ID    Work unit ID  Sent                      Time reported             Server state  Outcome  Client state  CPU time (sec)  Claimed credit  Granted credit
218028581  198676953     31 Dec 2008 10:21:34 UTC  5 Jan 2009 17:20:55 UTC   Over          Success  Done          18,349.98       36.14           10.53
218028580  198676951     31 Dec 2008 10:21:34 UTC  7 Jan 2009 11:33:44 UTC   Over          Success  Done          18,927.20       37.28           2.00
218028579  198676945     31 Dec 2008 10:21:34 UTC  7 Jan 2009 11:33:44 UTC   Over          Success  Done          8,364.75        16.47           6.00
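
To make the gap in these three results easier to see, here is a small sketch that computes granted credit as a share of claimed credit and per CPU hour. It assumes the last three numbers on each line are CPU time in seconds, claimed credit, and granted credit; the `summarize` helper is a made-up name.

```python
rows = [
    # (task id, CPU seconds, claimed credit, granted credit)
    (218028581, 18349.98, 36.14, 10.53),
    (218028580, 18927.20, 37.28, 2.00),
    (218028579, 8364.75, 16.47, 6.00),
]


def summarize(cpu_seconds: float, claimed: float, granted: float) -> tuple[float, float]:
    """Return (granted / claimed, granted credit per CPU hour)."""
    cpu_hours = cpu_seconds / 3600.0
    return granted / claimed, granted / cpu_hours


for task_id, cpu_seconds, claimed, granted in rows:
    ratio, per_hour = summarize(cpu_seconds, claimed, granted)
    print(f"{task_id}: granted {ratio:.0%} of claimed, "
          f"{per_hour:.2f} credits per CPU hour")
```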

Think it's time to start re-tasking some horsepower............

bono_vox
Joined: 5 Dec 05
Posts: 8
Credit: 371,092
RAC: 0
Message 58623 - Posted: 7 Jan 2009, 16:50:37 UTC

All recent granted credits are integer values. Strange...

Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,184,495
RAC: 10,704
Message 58628 - Posted: 7 Jan 2009, 17:46:45 UTC - in response to Message 58623.  

All recent granted credits are integer values. Strange...

Ditto

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58629 - Posted: 7 Jan 2009, 17:47:56 UTC

I'm seeing the same. 2.0 credits per model. I've EMailed the Project Team.
Rosetta Moderator: Mod.Sense

Aegis Maelstrom
Joined: 29 Oct 08
Posts: 61
Credit: 2,137,555
RAC: 0
Message 58630 - Posted: 7 Jan 2009, 18:12:22 UTC

Hi Martin, sarha1, Sid and our brave Mod!

The same here. At first I thought it was just a single case, but it keeps repeating: a flat two points per decoy - see here, here and here.

All three abinitio norelax homfrag end with _0 just like Greg said, however the ones reported previously (on 6th of Jan) were rated more reasonably...
If I were to bet I would say some rating module on the server side broke during the latest crash.

I hope it will be fixed soon - I know I don't have any significant crunching power on this lappy but we do our best :) and it's a kind of slap.

----

On the side: I had another problem with another WU from the same set. See the other topic.

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 58634 - Posted: 7 Jan 2009, 18:54:31 UTC

What is the difference for the names of these two tasks:
abinitio_norelax_homfrag_129_B_1a8oA_SAVE_ALL_OUT_4626_4562_0
abinitio_norelax_homfrag_129_B_1bq9A_SAVE_ALL_OUT_4626_4562_0

The only difference I see is the 1a8oA vs. 1bq9A part of the name, so are these two different proteins, or starting points, or what?

On the first task I lost 20 credits (128 granted vs. 148 claimed), and on the other task I gained credit (130 granted vs. 99 claimed). But here is another interesting thing: the run times were different, and I never modified them as far as I know, since the system has been down and no communication is possible. The first task ran 6 hrs and the other ran 4 hrs. How can this be if no settings were changed?

On the credit point, the first task generated exactly 2 points per decoy for 64 decoys and the second came out to 1.61 for 81 decoys.

That seems kind of low for credit per decoy.
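
The per-decoy figures above are just granted credit divided by decoy count, which also gives a quick way to spot the flat 2-credits-per-decoy symptom. A minimal sketch, assuming the rounded totals quoted in this post (so the second result may come out slightly different from the 1.61 quoted); `credit_per_decoy` and `looks_flat_rate` are hypothetical helper names.

```python
def credit_per_decoy(granted: float, decoys: int) -> float:
    """Granted credit divided by the number of decoys (models) returned."""
    if decoys <= 0:
        raise ValueError("decoy count must be positive")
    return granted / decoys


def looks_flat_rate(granted: float, decoys: int,
                    rate: float = 2.0, tolerance: float = 1e-9) -> bool:
    """True if the grant works out to exactly `rate` credits per decoy."""
    return abs(credit_per_decoy(granted, decoys) - rate) <= tolerance


tasks = [
    # (protein tag from the task name, granted credit, decoys)
    ("1a8oA", 128, 64),
    ("1bq9A", 130, 81),
]

for tag, granted, decoys in tasks:
    per_decoy = credit_per_decoy(granted, decoys)
    flag = "flat rate" if looks_flat_rate(granted, decoys) else "variable"
    print(f"{tag}: {per_decoy:.2f} credits per decoy ({flag})")
```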

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 58635 - Posted: 7 Jan 2009, 18:57:28 UTC

the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now.

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 58636 - Posted: 7 Jan 2009, 18:59:21 UTC - in response to Message 58635.  

the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now.



thanks for the update. hope you got everything else working ok now.
been quite a trying few days for you i see.
thanks for your hard work.

Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,184,495
RAC: 10,704
Message 58656 - Posted: 7 Jan 2009, 21:42:42 UTC - in response to Message 58635.  

the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now.

Confirmed, thanks. Pity about those lost ~300 credits, but I'll make it up now that I've upped my runtime to 4 hours and all those lockfile errors disappeared.

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 58657 - Posted: 7 Jan 2009, 21:43:56 UTC - in response to Message 58656.  

the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now.

Confirmed, thanks. Pity about those lost ~300 credits, but I'll make it up now that I've upped my runtime to 4 hours and all those lockfile errors disappeared.



4 hrs is a good runtime setting. It gives you long enough to complete at least 1 complicated model before it's done.

ConflictingEmotions
Joined: 5 Jun 08
Posts: 10
Credit: 3,081,990
RAC: 0
Message 58658 - Posted: 7 Jan 2009, 21:46:32 UTC - in response to Message 58635.  

the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now.


So when will we see the updates to the affected WU that were already completed?

Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,184,495
RAC: 10,704
Message 58660 - Posted: 7 Jan 2009, 21:49:16 UTC - in response to Message 58657.  

4 hrs is a good runtime setting. It gives you long enough to complete at least 1 complicated model before it's done.

More likely, it reduces the amount of complaining about long running models... <oops>

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 58662 - Posted: 8 Jan 2009, 1:00:47 UTC - in response to Message 58660.  

4 hrs is a good runtime setting. It gives you long enough to complete at least 1 complicated model before it's done.

More likely, it reduces the amount of complaining about long running models... <oops>

lol Sid... I was thinking of something different in the long model area, but yeah, you're right... it should in theory reduce the complaints. But then again, that's up to our friends in Seattle to get the code right.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 114,410,350
RAC: 54,579
Message 58667 - Posted: 8 Jan 2009, 10:58:55 UTC - in response to Message 58662.  

4 hrs is a good runtime setting. It gives you long enough to complete at least 1 complicated model before it's done.

More likely, it reduces the amount of complaining about long running models... <oops>

lol Sid... I was thinking of something different in the long model area, but yeah, you're right... it should in theory reduce the complaints. But then again, that's up to our friends in Seattle to get the code right.

Big models take longer - isn't the problem that some computers aren't finishing a single decoy in the run-time and that's why they're overrunning? No problem with the code, just the model size.

I expect they'd like to send out much bigger models than they currently do. I think it'd be useful if they could be more selective in how much resource the different models require and only send those to adequate computers. It'd also be useful if there were a selection of large and small tasks so, for example, a quad might be able to do up to two large and two small as a maximum.

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 58668 - Posted: 8 Jan 2009, 11:37:22 UTC - in response to Message 58667.  

4 hrs is a good runtime setting. It gives you long enough to complete at least 1 complicated model before it's done.

More likely, it reduces the amount of complaining about long running models... <oops>

lol Sid... I was thinking of something different in the long model area, but yeah, you're right... it should in theory reduce the complaints. But then again, that's up to our friends in Seattle to get the code right.

Big models take longer - isn't the problem that some computers aren't finishing a single decoy in the run-time and that's why they're overrunning? No problem with the code, just the model size.

I expect they'd like to send out much bigger models than they currently do. I think it'd be useful if they could be more selective in how much resource the different models require and only send those to adequate computers. It'd also be useful if there were a selection of large and small tasks so, for example, a quad might be able to do up to two large and two small as a maximum.



It would help if the scheduler, or whatever program stores the info about our systems, could be made intelligent enough to read through the database of systems and their settings and say, "Oh, here's a system with run times of 6 hours or longer, let's send a large-model protein to it," and then do the same for lower run times, memory, etc.
But I suppose such a program would take a lot of time to develop, or is not possible at this time. That would take care of the overruns, I would think.
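
Just to make the idea concrete, here is a toy sketch of that kind of matching. It is not how the BOINC scheduler works: the `Host` and `Model` records, their fields, and the rule "send the largest model that fits the host's run-time preference and memory" are all assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    runtime_pref_hours: float  # the user's target run time
    memory_mb: int             # memory available to the task


@dataclass
class Model:
    name: str
    est_hours_per_decoy: float  # rough time to finish one decoy
    memory_mb: int              # memory the model needs


def can_run(host: Host, model: Model) -> bool:
    """A host qualifies if one decoy fits in its run time and its memory."""
    return (model.est_hours_per_decoy <= host.runtime_pref_hours
            and model.memory_mb <= host.memory_mb)


def pick_model(host: Host, models: list[Model]) -> Optional[Model]:
    """Return the largest model the host can handle, or None if none fit."""
    eligible = [m for m in models if can_run(host, m)]
    return max(eligible, key=lambda m: m.est_hours_per_decoy, default=None)


models = [Model("small", 0.5, 256), Model("large", 5.0, 1024)]
for host in (Host("laptop", 3.0, 512), Host("quad", 6.0, 2048)):
    chosen = pick_model(host, models)
    print(host.name, "->", chosen.name if chosen else "nothing suitable")
```

Even in this simplified form, the server would need a reliable estimate of time per decoy for each model on each kind of machine, which is probably the hard part.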

Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,184,495
RAC: 10,704
Message 58717 - Posted: 11 Jan 2009, 0:56:12 UTC - in response to Message 58667.  

...it should in theory reduce the complaints. But then again, that's up to our friends in Seattle to get the code right.

Big models take longer - isn't the problem that some computers aren't finishing a single decoy in the run-time and that's why they're overrunning? No problem with the code, just the model size.

Not entirely sure about that. Yes, there are bigger models coming through, but there also seems to be an issue of some taking unreasonably long, but not returning anything like the credit a model of that size should warrant. That's what they seem to be trying to pick up.



©2024 University of Washington
https://www.bakerlab.org