Message boards : Number crunching : Report long-running models here
Author | Message |
---|---|
William Timbrook Send message Joined: 2 Nov 05 Posts: 3 Credit: 11,623,185 RAC: 0 |
William, parts of what you describe are normal and expected, and some parts are not. I've moved your posts here to this thread because you appear to have a 3hr runtime (the default) configured for that host, and so the 8hrs you report is well beyond that. Thanks for the update. I had another one like that which was experiencing the same thing. Seeing the 10hrs of CPU time just didn't look that comforting. I wanted to finish the jobs but... some other hosts can pick those 2 up. I was on 5.10.45 (?) and upgraded that host to 6.2.18, but it doesn't seem to have made a difference. |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
This one took 4 hrs to complete the first model (and only checkpoint). AA2A_6_modeling_1_AA2A_1_AA2A_2RH1_align_4492_20784_0 Similar WU name is already over 2hrs and hasn't completed model 1. AA2A_7_modeling_1_AA2A_1_AA2A_2RH1_align_4493_31406_0 Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
hombench_mtyka_looprelax_test_full_2_looprelax_t326__IGNORE_THE_REST_1A9XB_17_4531_8_0 using minirosetta version 134 Running almost 9 hours and still on model 1 Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Two models in nearly 18hrs of crunchtime. AA2A_7_modeling_1_AA2A_1_AA2A_2RH1_align_4493_97149_0 ...and this one ...and this one ...and this one ...and this one all AA2A's, all 2 models completed in ~62,000 seconds. this one took over 12hrs to do just one model. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Nothing But Idle Time Send message Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
...and 30 credits for that 12 hours work; so generous. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
hombench_mtyka_foldcst_boinc_test3_foldcst_simple_t328___4598_485 on a machine that has 3 hour wu times ran for well over 7 hours. For a normal 3 hour wu it claims and gets 50-60 credit, this wu claimed 148 and was granted 20. There is something wrong here. Windows XP SP3, Intel Q6600, CC 5.10.30, MR 1.34.
I really don't want to drop Rosetta from my portfolio, I have been here from the start, but recently there have been a near continual series of issues with the production quality project which a good deal of the Beta and some of the Alphas do not have. This, in spite of a dedicated Beta test project. Not good enough? Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Semunozg Send message Joined: 23 Nov 05 Posts: 2 Credit: 15,659,163 RAC: 0 |
hombench_mtyka_foldcst_boinc_test3_foldcst_simple_t328___4598_485 on a machine that has 3 hour wu times ran for well over 7 hours. For a normal 3 hour wu it claims and gets 50-60 credit, this wu claimed 148 and was granted 20. There is something wrong here. Same here, a bunch of workunits that ran way over my set limit claimed 1xx+ credits, and received 90 or less. I don't mind having to crunch a WU for longer, but getting credit when it's due would be nice. I'm not going to drop the project because of this... but it would be nice if they fixed this or at least kept the public informed about current problems... challenges... etc. A little more participation. For example, SETI@Home has excellent forums and the team is constantly posting updates with diverse news, from server issues to Tflops goals. The Rosetta@Home team didn't even acknowledge the server outage that occurred a few nights ago (a forum Mod said it was a kernel panic, but this was posted in the forums... and very few people actually read the forums). |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
I'm not going to drop the project because of this... As I said in my report, it is not just this issue. Recently I have had to post a number of times with different, unrelated problems. This one, just short-changing on credit, is pretty minor, and goes along with the reasonably high number of simple wu crashes. Others involving locking out cores or whole machines are much more serious. This from a production status project with a separate Beta tester. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Hubington Send message Joined: 3 Feb 06 Posts: 24 Credit: 127,236 RAC: 0 |
minirosetta 1.34: hombench_mtyka_foldcst_boinc_test3_foldcst_simple_t328___4598_724 Usually it takes between 2-3 hours for a work unit to complete for me; however, the unit listed above is currently on 9 hours at 98.2% complete with 9 mins 50 secs remaining. I first noticed its runtime earlier today when it was at about 6.5 hours and 97.3% with 9 mins 50 secs remaining. Now I know the remaining times are estimates, but there are estimates, there's what Windows estimates when you go to copy a file, and then there is this. Basically I'm worried that the work unit is just wasting cycles and wondered if anyone has any thoughts on it. Based on a 60 second sampling I just took, it is notching up 0.001% of progress every 20 seconds, so in theory it should complete in about 9-10 hours' time. That is assuming that it just contains a lot more work than normal rather than just spinning its wheels. Any comments welcomed |
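The 9-10 hour estimate in the post above can be checked with a few lines of arithmetic. This is only a sketch under the poster's own assumption that progress stays linear at 0.001% per 20 seconds; the function name `remaining_seconds` is made up for illustration and is not part of BOINC or Rosetta.

```python
def remaining_seconds(percent_done, percent_per_step, seconds_per_step):
    """Linear extrapolation: how long until 100% at the observed rate."""
    remaining_percent = 100.0 - percent_done
    steps_left = remaining_percent / percent_per_step
    return steps_left * seconds_per_step


# Figures taken from the post: 98.2% done, 0.001% gained every 20 seconds.
eta = remaining_seconds(98.2, 0.001, 20.0)
print(f"Estimated time left: {eta / 3600:.1f} hours")
```

The result lands at roughly 10 hours, which agrees with the poster's figure. It says nothing about whether the task is actually making useful progress, only what the progress counter implies.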
Hubington Send message Joined: 3 Feb 06 Posts: 24 Credit: 127,236 RAC: 0 |
In fine accordance with Murphy's law, it just finished. If someone could find out why the run time was over 3 times the norm, though, it could be useful, as others may kill off the work units thinking they had died. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Read this thread https://boinc.bakerlab.org/forum_thread.php?id=4388 In fine accordance with Murphy's law, it just finished. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
In fine accordance with Murphy's law, it just finished. Your machines are hidden so I can't look at your results. Curious to see how the claimed/granted credit you got from that wu compares to the same machine's regular performance. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
abinitio_nohomfrag_70_A_2hx5A_4482_62740_0 did 6 models in 25hrs. Not surprisingly, his brothers abinitio_nohomfrag_70_A_2hx5A_4482_58343_0 abinitio_nohomfrag_70_A_2hx5A_4482_59391_0 both did 5 models in just over 24hrs and 19.5hrs. abinitio_nohomfrag_70_A_2hx5A_4482_46233_1 did 4 models in 18hrs. abinitio_nohomfrag_70_A_2hcmA_4482_36776_0 did 5 models in 26hrs. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
this AA2A has been running for 10hrs and not even checkpointed yet. So it must still be on model one. AA2A_20_modeling_1_AA2A_1_AA2A_2VTA_SAVE_ALL_OUT_align_4600_37939_0 Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Hubington Send message Joined: 3 Feb 06 Posts: 24 Credit: 127,236 RAC: 0 |
In fine accordance with Murphy's law, it just finished. yeah I'm paranoid :) Here is a smattering of the surrounding results for that WU though
CPU Time | Claimed | Granted
9,912.30 | 37.37 | 33.71
9,435.89 | 32.95 | 34.51
33,868.45 | 127.69 | 22.29
10,209.95 | 35.65 | 33.69
10,327.84 | 36.06 | 34.75
5,979.70 | 20.88 | 23.02
9,766.56 | 34.10 | 25.06
(the formatting gets messed up so I've separated the columns with | marks)
New one on the way, incidentally. minirosetta 1.34: hombench_mtyka_foldcst_boinc_test3_foldcst_simple_t286___4580_1561_0 has currently been running for 36 hours & 5 mins! 99.540% complete
OK, I just noticed something VERY worrying while trying to see how long it took to click over 0.001%: the run time jumped back 6 mins?!?!?! and now it has lost 0.001% from the progress, taking it back to 99.539 |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
9,933.34 | 55.85 | 57.86
9,569.86 | 53.81 | 56.64
11,264.72 | 63.34 | 71.97
10,648.98 | 59.88 | 64.31
26,326.86 | 148.03 | 20.86 <------
10,750.00 | 60.44 | 63.28
9,991.64 | 56.18 | 58.85
They really stand out don't they. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
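For anyone who wants to pick such rows out of a longer results list automatically, here is a minimal sketch. The rows are copied from the post above (CPU seconds, claimed, granted); the 0.5 cut-off on the granted/claimed ratio and the name `flag_short_changed` are assumptions chosen for illustration, not anything the project defines.

```python
# (cpu_seconds, claimed_credit, granted_credit), copied from the post above.
rows = [
    (9933.34, 55.85, 57.86),
    (9569.86, 53.81, 56.64),
    (11264.72, 63.34, 71.97),
    (10648.98, 59.88, 64.31),
    (26326.86, 148.03, 20.86),
    (10750.00, 60.44, 63.28),
    (9991.64, 56.18, 58.85),
]

# Assumed cut-off: flag a result when less than half the claim was granted.
RATIO_THRESHOLD = 0.5


def flag_short_changed(results, threshold=RATIO_THRESHOLD):
    """Return the results whose granted/claimed ratio is below the threshold."""
    return [r for r in results if r[2] / r[1] < threshold]


flagged = flag_short_changed(rows)
for cpu, claimed, granted in flagged:
    print(f"{cpu:,.2f}s  claimed {claimed:.2f}  granted {granted:.2f}  "
          f"ratio {granted / claimed:.2f}")
```

With these rows only the 26,326.86 second result is flagged. Every other row was granted slightly more than it claimed.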
Hubington Send message Joined: 3 Feb 06 Posts: 24 Credit: 127,236 RAC: 0 |
Well, I imagine that the claim is based on cycles used, so I imagine that your processor puts out more power than mine, which is why you generate more credits per hour than I do. But then the granted credit is probably result based rather than based on effort put in. The theory being that X amount of effort usually yields Y amount of results. Which is why you get small variances between the claimed and granted, usually being granted less than claimed but seemingly not always. Also I suspect certain sub-projects of WU yield more/less results per hour than others. In the case of these work units, though, the system is seemingly using a lot of cycles but producing little or no results for it, and so the claim going in is much higher than what's being granted. Just an observation though |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
26,326.86 | 148.03 | 20.86 <------ Credit claimed represents how much time your computer put into it, as compared to the benchmarks for that computer. The credit granted is based on the work you actually completed and is the average of the claims of others that did similar work. So, it looks to me as though that task took you 2.5 times longer than normal. And everyone else completed the models in "normal" time. Hence, you have a long-running model there that required dramatically more work than others for the same protein. And hence, this thread: to identify such occurrences so the team can track down what caused it to run for so long. Rosetta Moderator: Mod.Sense |
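The distinction between claimed and granted credit described above can be put into a rough sketch. This is not the project's actual credit code. It assumes the claim scales linearly with CPU time at a per-host rate (consistent with the claimed/CPU-time figures quoted in this thread) and that the grant is the number of models completed times the average per-model claim from other hosts. The function names and the `others_claims_per_model` values are hypothetical and made up purely for illustration.

```python
from statistics import mean


def claimed_credit(cpu_seconds, credit_per_second):
    """Claim grows with CPU time, scaled by a per-host benchmark rate."""
    return cpu_seconds * credit_per_second


def granted_credit(models_completed, others_claims_per_model):
    """Grant is per model: the average of what other hosts claimed per model."""
    return models_completed * mean(others_claims_per_model)


# Per-host rate derived from one normal result quoted in the thread:
# 55.85 credits claimed for 9,933.34 CPU seconds.
rate = 55.85 / 9933.34

claim = claimed_credit(26326.86, rate)

# HYPOTHETICAL per-model claims from other hosts, not real project data.
others_claims_per_model = [20.0, 21.5, 21.0]
grant = granted_credit(1, others_claims_per_model)

print(f"claimed about {claim:.2f}, granted about {grant:.2f}")
```

Under these assumptions the claim comes out near the 148 quoted above, because it only reflects time spent. The grant stays small because one slow model is still worth only what other hosts typically claimed for one model.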
Mark1212 Send message Joined: 23 Sep 05 Posts: 1 Credit: 1,952,358 RAC: 0 |
You think you have it bad? What a waste of time and energy this one was: 194952967 26 Sep 2008 18:51:19 UTC Over Client error Compute error 61,849.00 562.03 --- |
©2024 University of Washington
https://www.bakerlab.org