Major problems with granted credit

Author	Message
Martin P. Joined: 26 May 06 Posts: 38 Credit: 168,333 RAC: 0	Message 58595 - Posted: 7 Jan 2009, 8:43:49 UTC
I am seeing major problems with granted credit. On 2 occasions the claimed credit was 48.9 (https://boinc.bakerlab.org/rosetta/result.php?resultid=217748955) and 90.4 (https://boinc.bakerlab.org/rosetta/result.php?resultid=217572433), while the granted credit was only 2 and 8. Run times are in line with previous, correct work units. What is going on?

Greg_BE Joined: 30 May 06 Posts: 5774 Credit: 6,139,760 RAC: 0	Message 58616 - Posted: 7 Jan 2009, 14:39:39 UTC
I thought that tasks ending in _0 were original to the specific user and the only other test would have been on RALPH. So how can another RAH user have the same task with that exact random start point assigned to them, unless it errored out on another machine? Didn't tasks with Zinc get a low credit rating? But I wouldn't think as low as 8 credits on a high end processor with no errors. On the other hand, if you break his run time out into hours, he had a 2.79 hour run, and perhaps for that time frame the credit is correct. Wonder if it would have been better at a 4-6 hr run time?

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 58619 - Posted: 7 Jan 2009, 15:50:55 UTC - in response to Message 58616.
"I thought that tasks ending in _0 were original to the specific user and the only other test would have been on RALPH. So how can another RAH user have the same task with that exact random start point assigned to them, unless it errored out on another machine?"
Yes, this is true, but many machines will work on models for the same protein and the same solution method, each with different starting points. These are what get averaged together. Basically, look at the WU names and the batch number that precedes the _0. Many thousands of tasks will be generated with the same name and batch. Each has a unique random seed and will produce unique models. So, for example, the task first mentioned is:
cc2_1_8_native_cen_cst_hb_t313__IGNORE_THE_REST_1RY6A_3_5845_79_0
5845 is the batch number
79 is the task within that batch
0 is the replication level so far
But if you had a list of all the tasks for batch 5845, you would see some for proteins other than 1RY6A... or is it t313? I don't have a perfect understanding of the names either :) It wouldn't make sense to average runtimes from protein "A" with runtimes from protein "B". So, within a batch, the specific protein being studied is the basis of what gets averaged as results come in.
Rosetta Moderator: Mod.Sense
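
Mod.Sense's reading of the task name can be illustrated with a short sketch. It assumes only what the post states: the last three underscore-separated fields are the batch number, the task within that batch, and the replication level. The function name `parse_task_name` and the dictionary keys are hypothetical, and nothing here interprets the rest of the name, which the post itself says is not fully understood.

```python
def parse_task_name(name: str) -> dict:
    """Split a work unit name into the three trailing fields described in the post.

    Assumption: the name ends in _<batch>_<task>_<replication>.
    Everything before those fields is returned untouched as 'prefix'.
    """
    # rsplit from the right so underscores inside the prefix are left alone
    prefix, batch, task, replication = name.rsplit("_", 3)
    return {
        "prefix": prefix,
        "batch": int(batch),
        "task": int(task),
        "replication": int(replication),
    }


info = parse_task_name(
    "cc2_1_8_native_cen_cst_hb_t313__IGNORE_THE_REST_1RY6A_3_5845_79_0"
)
print(info["batch"], info["task"], info["replication"])  # 5845 79 0
```

Under the same assumption, two tasks belong to the same batch when their `batch` values match, even if the protein named in the prefix differs.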

sarha1 Joined: 23 Sep 05 Posts: 5 Credit: 6,339,735 RAC: 0	Message 58620 - Posted: 7 Jan 2009, 15:59:14 UTC
I recently noticed a flat rate of credit being granted: exactly 2 credits per decoy counted. It is really strange.

david @ TPS Joined: 26 Nov 06 Posts: 3 Credit: 881,762 RAC: 0	Message 58622 - Posted: 7 Jan 2009, 16:26:18 UTC Last modified: 7 Jan 2009, 16:33:57 UTC
Thanks for the explanations. Due to the CUDA fiasco, I am looking for a new home for my farm. I have been crunching Rosetta with a few of my older boxes, but it does not sound like the big horsepower would be well served by this system. I might move a dual core and see how it likes it -- then decide. David
PS: I have not looked at my computers for a while, and I see the credit numbers are MUCH better than they used to be! Here are a few that mirror Martin's, I think.
218028581	198676953	31 Dec 2008 10:21:34 UTC	5 Jan 2009 17:20:55 UTC	Over	Success	Done	18,349.98	36.14	10.53
218028580	198676951	31 Dec 2008 10:21:34 UTC	7 Jan 2009 11:33:44 UTC	Over	Success	Done	18,927.20	37.28	2.00
218028579	198676945	31 Dec 2008 10:21:34 UTC	7 Jan 2009 11:33:44 UTC	Over	Success	Done	8,364.75	16.47	6.00
Think it's time to start re-tasking some horsepower............

bono_vox Joined: 5 Dec 05 Posts: 8 Credit: 371,092 RAC: 0	Message 58623 - Posted: 7 Jan 2009, 16:50:37 UTC
All recent granted credits are integer values. Strange...

Sid Celery Joined: 11 Feb 08 Posts: 2591 Credit: 47,220,881 RAC: 3	Message 58628 - Posted: 7 Jan 2009, 17:46:45 UTC - in response to Message 58623.
"All recent granted credits are integer values. Strange..."
Ditto

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 58629 - Posted: 7 Jan 2009, 17:47:56 UTC
I'm seeing the same: 2.0 credits per model. I've emailed the Project Team.
Rosetta Moderator: Mod.Sense

Aegis Maelstrom Joined: 29 Oct 08 Posts: 61 Credit: 2,137,555 RAC: 0	Message 58630 - Posted: 7 Jan 2009, 18:12:22 UTC
Hi Martin, sarha1, Sid and our brave Mod! Same here. At first I thought it was just a single case, but it keeps repeating: a flat two points per decoy (see here, here and here). All three abinitio norelax homfrag tasks end with _0, just like Greg said; however, the ones reported previously (on 6 Jan) were rated more reasonably... If I were to bet, I would say some rating module on the server side broke during the latest crash. I hope it will be fixed soon. I know I don't have any significant crunching power on this lappy, but we do our best :) and it feels like a bit of a slap.
----
On the side: I had another problem with another WU from the same set. See the other topic.

Greg_BE Joined: 30 May 06 Posts: 5774 Credit: 6,139,760 RAC: 0	Message 58634 - Posted: 7 Jan 2009, 18:54:31 UTC
What is the difference between the names of these two tasks?
abinitio_norelax_homfrag_129_B_1a8oA_SAVE_ALL_OUT_4626_4562_0
abinitio_norelax_homfrag_129_B_1bq9A_SAVE_ALL_OUT_4626_4562_0
The only difference I see is the 1a or 1b name, so are these two different proteins, or starting points, or what? On the first task I lost 20 credits (128 granted vs 148 claimed), and on the other task I gained credit (99 vs 130). But here is another interesting thing: the run times were different, and I never modified them as far as I know, since the system has been down and no communication has been possible. The first task ran 6 hrs and the other ran 4 hrs. How can this be if no settings were changed? On the credit point, the first task generated exactly 2 points per decoy for 64 decoys, and the second came out to 1.61 for 81 decoys. That seems kind of low for credit per decoy.
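
The per-decoy figures Greg quotes can be checked with simple division. This is only a sketch of the arithmetic, not the project's credit formula. It assumes the granted credit was 128 for the first task (stated in the post) and roughly 130 for the second (an assumption based on the "99 vs 130" figure, since the post does not say which number is which). The helper name `credit_per_decoy` is hypothetical.

```python
def credit_per_decoy(granted_credit: float, decoys: int) -> float:
    """Return granted credit divided by the number of decoys (models) produced."""
    if decoys <= 0:
        raise ValueError("decoys must be a positive integer")
    return granted_credit / decoys


# First task: 128 granted credits over 64 decoys (figures from the post)
first = credit_per_decoy(128, 64)

# Second task: assumed 130 granted credits over 81 decoys
second = credit_per_decoy(130, 81)

print(f"{first:.2f} credits per decoy")
print(f"{second:.2f} credits per decoy")
```

The first value is exactly 2.0, matching the flat rate other posters report. The second lands near 1.6, close to the 1.61 Greg quotes; the small gap would be consistent with the page showing a rounded credit value, though that is a guess.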

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 58635 - Posted: 7 Jan 2009, 18:57:28 UTC
The credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now.

Greg_BE Joined: 30 May 06 Posts: 5774 Credit: 6,139,760 RAC: 0	Message 58636 - Posted: 7 Jan 2009, 18:59:21 UTC - in response to Message 58635.
"The credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now."
Thanks for the update. Hope you got everything else working ok now. Been quite a trying few days for you, I see. Thanks for your hard work.

Sid Celery Joined: 11 Feb 08 Posts: 2591 Credit: 47,220,881 RAC: 3	Message 58656 - Posted: 7 Jan 2009, 21:42:42 UTC - in response to Message 58635.
"The credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now."
Confirmed, thanks. Pity about those lost ~300 credits, but I'll make it up now that I've upped my runtime to 4 hours and all those lockfile errors have disappeared.

Greg_BE Joined: 30 May 06 Posts: 5774 Credit: 6,139,760 RAC: 0	Message 58657 - Posted: 7 Jan 2009, 21:43:56 UTC - in response to Message 58656.
"The credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now."
"Confirmed, thanks. Pity about those lost ~300 credits, but I'll make it up now that I've upped my runtime to 4 hours and all those lockfile errors have disappeared."
4 hrs is a good runtime setting. Gives you long enough to complete at least 1 complicated model before it's done.

ConflictingEmotions Joined: 5 Jun 08 Posts: 10 Credit: 3,081,990 RAC: 0	Message 58658 - Posted: 7 Jan 2009, 21:46:32 UTC - in response to Message 58635.
"The credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now."
So when will we see the updates to the affected WUs that were already completed?

Sid Celery Joined: 11 Feb 08 Posts: 2591 Credit: 47,220,881 RAC: 3	Message 58660 - Posted: 7 Jan 2009, 21:49:16 UTC - in response to Message 58657.
"4 hrs is a good runtime setting. Gives you long enough to complete at least 1 complicated model before it's done."
More likely, it reduces the amount of complaining about long running models... <oops>

Greg_BE Joined: 30 May 06 Posts: 5774 Credit: 6,139,760 RAC: 0	Message 58662 - Posted: 8 Jan 2009, 1:00:47 UTC - in response to Message 58660.
"4 hrs is a good runtime setting. Gives you long enough to complete at least 1 complicated model before it's done."
"More likely, it reduces the amount of complaining about long running models... <oops>"
lol Sid... I was thinking of something different in the long model area, but yeah, you're right: it should in theory reduce the complaints. But then again, that's up to our friends in Seattle to get the code right.

dcdc Joined: 3 Nov 05 Posts: 1836 Credit: 124,981,563 RAC: 3	Message 58667 - Posted: 8 Jan 2009, 10:58:55 UTC - in response to Message 58662.
"4 hrs is a good runtime setting. Gives you long enough to complete at least 1 complicated model before it's done."
"More likely, it reduces the amount of complaining about long running models... <oops>"
"lol Sid... I was thinking of something different in the long model area, but yeah, you're right: it should in theory reduce the complaints. But then again, that's up to our friends in Seattle to get the code right."
Big models take longer - isn't the problem that some computers aren't finishing a single decoy in the run-time and that's why they're overrunning? No problem with the code, just the model size. I expect they'd like to send out much bigger models than they currently do. I think it'd be useful if they could be more selective in how much resource the different models require and only send those to adequate computers. It'd also be useful if there were a selection of large and small tasks so, for example, a quad might be able to do up to two large and two small as a maximum.

Greg_BE Joined: 30 May 06 Posts: 5774 Credit: 6,139,760 RAC: 0	Message 58668 - Posted: 8 Jan 2009, 11:37:22 UTC - in response to Message 58667.
"4 hrs is a good runtime setting. Gives you long enough to complete at least 1 complicated model before it's done."
"More likely, it reduces the amount of complaining about long running models... <oops>"
"lol Sid... I was thinking of something different in the long model area, but yeah, you're right: it should in theory reduce the complaints. But then again, that's up to our friends in Seattle to get the code right."
"Big models take longer - isn't the problem that some computers aren't finishing a single decoy in the run-time and that's why they're overrunning? No problem with the code, just the model size. I expect they'd like to send out much bigger models than they currently do. I think it'd be useful if they could be more selective in how much resource the different models require and only send those to adequate computers. It'd also be useful if there were a selection of large and small tasks so, for example, a quad might be able to do up to two large and two small as a maximum."
It would help if the scheduler, or whatever program stores the info about our systems, could be made intelligent enough to read through the database of systems and their settings and say, "oh, here's a system with 6 hour run times or longer, let's send a large model protein to it", and then do the same for lower run times, memory, etc. But I suppose such a program would take a lot of time to develop or is not possible at this time. That would take care of the overruns, I would think.
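
The matching idea floated in the last two posts could look something like the sketch below. This is purely illustrative and is not how the Rosetta@home or BOINC scheduler works. The `Host` class, the function names, and the memory threshold are all hypothetical; only the "6 hour run times or longer" cutoff comes from Greg's post.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    runtime_hours: float  # the user's target run time preference
    memory_mb: int        # memory available to the task


# Cutoff taken from the post: "6 hour run times or longer"
LARGE_MIN_RUNTIME_HOURS = 6.0
# Hypothetical memory threshold, chosen only for illustration
LARGE_MIN_MEMORY_MB = 1024


def eligible_for_large(host: Host) -> bool:
    """A host qualifies for large models only if it meets both thresholds."""
    return (
        host.runtime_hours >= LARGE_MIN_RUNTIME_HOURS
        and host.memory_mb >= LARGE_MIN_MEMORY_MB
    )


def pick_model_size(host: Host) -> str:
    """Return 'large' for hosts that qualify, otherwise 'small'."""
    return "large" if eligible_for_large(host) else "small"


hosts = [
    Host("quad", runtime_hours=6.0, memory_mb=4096),
    Host("lappy", runtime_hours=3.0, memory_mb=2048),
    Host("old-box", runtime_hours=8.0, memory_mb=512),
]
for host in hosts:
    print(host.name, pick_model_size(host))
```

A real scheduler would also have to handle dcdc's suggestion of a per-host mix (for example, up to two large and two small tasks on a quad), which this sketch leaves out.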

Sid Celery Joined: 11 Feb 08 Posts: 2591 Credit: 47,220,881 RAC: 3	Message 58717 - Posted: 11 Jan 2009, 0:56:12 UTC - in response to Message 58667.
"...it should in theory reduce the complaints. But then again, that's up to our friends in Seattle to get the code right."
"Big models take longer - isn't the problem that some computers aren't finishing a single decoy in the run-time and that's why they're overrunning? No problem with the code, just the model size."
Not entirely sure about that. Yes, there are bigger models coming through, but there also seems to be an issue of some taking unreasonably long without returning anything like the credit a model of that size should warrant. That's what they seem to be trying to pick up.