Posts by The Gas Giant

1) Message boards : Number crunching : Best Current Target CPU Run Time? (Message 20375)
Posted 17 Jul 2006 by The Gas Giant
Post:
With the short deadlines of CASP7 submissions what is the "best" target cpu run time for the project to complete work most effectively? I have just rejoined Rosetta on one of my hosts and want to do the "most" for the project with it. I currently have it set to 12hrs.

Live long and crunch.
2) Message boards : Number crunching : How long to 1%? (Message 11187)
Posted 22 Feb 2006 by The Gas Giant
Post:
Many thanks for the reply. I now feel more comfortable and will hit resume as soon as I get home in a couple of hours.

Paul.
3) Message boards : Number crunching : How long to 1%? (Message 11177)
Posted 22 Feb 2006 by The Gas Giant
Post:
With the recent release of 4.82 how long should it take to get to 1% with this style of wu -> PRODUCTION_ABINITIO_INCREASECYCLES50_1tul__312_1645. I had some initial wu's that were quite quick, but I have seen a change.

My 3.2GHz HT machine is currently over 1hr and still at 1%.

I also have the same style of wu on the same machine at 53.22% (BOINC had been stopped and started so this looks like a checkpoint) and it is at over 5hrs cpu time. The %complete also did not increase while the cpu time was increasing once I restarted BOINC (thanks windows updates - grr).

I am also seeing the total memory usage (real and virtual) hitting 1.2GB with 2 wu's running and at around 700MB with 1 wu running. Is this also "normal"?

I have suspended Rosetta for the moment.

Live long and crunch.
4) Message boards : Number crunching : Help us solve the 1% bug! (Message 10570)
Posted 8 Feb 2006 by The Gas Giant
Post:
This wu http://boinc.bakerlab.org/rosetta/workunit.php?wuid=7601894 was stuck at 1% for over 3hrs. I followed the guide right at the bottom to get the following command to be run in the terminal window on XP. Within a few minutes the progress was at 10%.

C:\Program Files\BOINC\projects\boinc.bakerlab.org_rosetta>rosetta_4.81_windows_intelx86.exe aa 2tif _ -abrelax -stringent_relax -more_relax_cycles -relax_score_filter -output_chi_silent -vary_omega -sim_aneal -rand_envpair_res_wt -rand_SS_wt -farlx -ex1 -ex2 -silent -barcode_from_fragments -barcode_from_fragments_length 10 -ssblocks -barcode_mode 3 -omega_weight 0.5 -jitter_frag -jitter_variation gauss -max_frags 400 -number_3mer_frags 200 -number_9mer_frags 100 -output_silent_gz -paths frags400.txt -filter1 -90 -filter2 -115 -nstruct 10 -constant_seed -jran 1373221

Hope this helped a little.

Live long and crunch.
5) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 9891)
Posted 26 Jan 2006 by The Gas Giant
Post:
Fairly easy to search for the affected results, just do a search in the stderr out section of the result for max cpu time exceeded. It might take a bucket load of hrs to complete the search, but then we have contributed a bucket load of cpu time in good faith. David is being very silent on this point.
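The search described above could look roughly like the sketch below. It assumes each result is available as a dictionary with an `id` and a `stderr_out` text field; that record layout, the function name, and the sample messages are all hypothetical and are not taken from the project's actual database or scripts.

```python
def find_max_cpu_time_results(results, phrase="max cpu time exceeded"):
    """Return the ids of results whose stderr text contains the phrase.

    The match is case-insensitive. The record layout (id, stderr_out)
    is an assumption made for this sketch.
    """
    needle = phrase.lower()
    return [r["id"] for r in results if needle in r.get("stderr_out", "").lower()]


# Hypothetical sample records, made up for illustration only.
sample = [
    {"id": 1, "stderr_out": "ERROR: Max CPU time exceeded"},
    {"id": 2, "stderr_out": "finished normally"},
    {"id": 3, "stderr_out": "aborted: max cpu time exceeded after restart"},
    {"id": 4},  # a result with no stderr text at all
]

affected = find_max_cpu_time_results(sample)
```

A plain substring scan like this is slow over a large archive, which matches the "bucket load of hrs" caveat, but it only has to be run once.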

Live long and crunch.

Paul.
6) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 9612)
Posted 23 Jan 2006 by The Gas Giant
Post:
So it looks as though no credit will be issued for a problem caused by Rosetta that has resulted in a lot of wasted cpu time and wasted effort by our machines, thanks for telling us guys!

Live long and crunch.
7) Message boards : Number crunching : Stuck at 1% and restarting BOINC. (Message 9159)
Posted 16 Jan 2006 by The Gas Giant
Post:
I just read (I had read it before, but not in this context so it clicked) that if you have a stuck wu at 1% and restart BOINC the wu is restarted with a different initial random seed and this may be the reason why the wu then completes successfully.

So even though you think the wu has completed successfully it is starting with a different set of parameters so it is not the exact same result.

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=899#8908

To specifically help the project and not potentially waste lots of cpu time in the future please read the initial post in the link above.

Live long and crunch.
8) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8891)
Posted 13 Jan 2006 by The Gas Giant
Post:
David, something needs to be done about this.


I have asked again for specifics... I don't expect an answer at this point to be today, but probably tomorrow. Handling the "cpu time exceeded" cases is likely to be more difficult than ones where the original issue date is known; the error message is not readily accessible. I don't know if this _can_ be done, it may be that to grant credit for these, credit would have to be given for every failed WU regardless of reason... and I don't know how big a problem that could cause. Someone WILL give more info as soon as it's available.

Be sure to look at the text file for credits and not just the results web page.


I checked the 4.2MB text file and found I received about 120c (one nice wu of 111, the rest being the small variety). Only another ~1900 to go...lol!

It would help if BOINC increased the DCF when a wu errored out on max_cpu_time_exceeded. I also understand why it is there since I had to stop and restart BOINC yesterday morning just prior to leaving for work as I had a stuck wu at 1% for 4hrs. I couldn't get the stdout info as I was running a little late. I lost 4hrs of cpu time, but at least the wu then completed OK.

So overall there are two problems;

1. WU's get stuck at 1%.

2. WU's progress OK but run longer than typical and error out due to max_cpu_time_exceeded, even though they would have completed if left to run.

So if we get rid of problem #1 we can relax the settings that cause #2.

Paul.
9) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8881)
Posted 12 Jan 2006 by The Gas Giant
Post:
I just asked David Kim about the whole credits issue, and he said he has had "backend stuff" that has tied him up, that he would try to do something today if possible, and would post when he was done. (Servers have been down a couple of times today.) I have not seen the script he's going to be running, so I don't know exactly what is covered.



David has just finished awarding credits to recently returned jobs, and will have gone through all of the archived jobs within the next two days.


Ah, but the credit granted was not for max-cpu-time-exceeded. We have a major problem here. BOINC/Rosetta is not capable of handling some of the versions of wu's that have been released when the cpu time exceeds the estimated time by something like 20%. Under normal circumstances of BOINC operation a wu hitting this limit is a regular occurrence.

David, something needs to be done about this. I have confirmed that if I manually alter the DCF a wu that has an extended completion time does complete normally. Maybe for these wu's you need to increase the number of estimated flops and iops.

I know of 2 fairly large crunchers who have left this project because of this issue and the lost credit.

Paul.
10) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8808)
Posted 11 Jan 2006 by The Gas Giant
Post:
11) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8707)
Posted 10 Jan 2006 by The Gas Giant
Post:
Just to reiterate a point or two here. None of the wu's that I have listed have been stuck at 1%! They were all at least 50% of the way through and some as high as 90% when they were automatically aborted due to the maximum cpu time. The only wu that was stuck at 1%, I aborted after 9hrs of crunching (overnight) and is the only wu to have exceeded 6 or 7hrs and not automatically abort. I have since read that if you stop and restart boinc there is a very high likelihood that the wu will complete successfully for wu's that have been stuck at 1%.

Live long and crunch.

Paul.
12) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8614)
Posted 9 Jan 2006 by The Gas Giant
Post:
Looks like some of the wu's are now getting deleted. Is anything going to happen with this?
13) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8573)
Posted 8 Jan 2006 by The Gas Giant
Post:
And BOINC doesn't increase the estimated cpu time when a wu errors out at the higher cpu time which doesn't help.
14) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8326)
Posted 4 Jan 2006 by The Gas Giant
Post:
At the risk of opening myself up to ridicule and derision over the amount of optimisation and oc and hence higher bm's my machines have, below is a list of wu's that have errored out,

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4346998
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4471098
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4016382
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4099470
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4151210
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4151110
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4151020
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4428403
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4209811
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4363700
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4428129
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=4444790
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=3760889

On quite a few of these wu's I am not the only person to run into trouble.

This is a lot of cpu time that has been lost.

Live long and crunch.
15) Message boards : Number crunching : BOINC - known issues - so why use it? (Message 8246)
Posted 3 Jan 2006 by The Gas Giant
Post:
Talking of hijacking the thread (matrix of optimal project combinations on multi-cpu machines).

I recently had 2 CPDN wu's running on my HT machine. 1 sulphur, 1 standard. On completion of the standard wu the sec/Ts dropped on the sulphur unit from 4.2298 to 4.2033. Not a lot I know, but it is better than it increasing like it normally does and it is about 1% better than where it would have been at that timestep based on previous sec/Ts increases. The other cpu was running a combination of seti and rosetta. I was lucky enough/unlucky enough to be able to keep the sulphur unit always running due to it having a problem with its deadline, so BOINC is in EDF mode due to the sulphur wu. I might do some trials while BOINC is in EDF mode and always running CPDN on 1 cpu with the NNW setting for seti and rosetta and see if there are any other changes.

Live long and crunch.
16) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8245)
Posted 3 Jan 2006 by The Gas Giant
Post:
I went into the client_state file and increased the DCF figure for Rosetta and the problem has gone away for the moment on the machine in question. Another machine currently has a wu in progress (INCREASE_CYCLES_10_1dtj_226_4391_1) that is at 2hrs23min cpu time and is only 30% done with another 3hr54min to completion. Prior to this wu I had a few wu's take around 2hrs only, so it looks as though this may error out as well, but I'll keep an eye on it.

I tend to agree with Snake Doctor. Only error the wu out if the progress % is not increasing/has not increased after the cpu time reaches say 20% of the estimated To Completion time.
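A minimal sketch of that rule, assuming the client can see the cpu time so far, the estimated completion time, and two successive progress readings. The function and parameter names are hypothetical, and this is not how BOINC or Rosetta actually decide to abort a wu.

```python
def should_abort(cpu_time_s, estimated_time_s, progress_now,
                 progress_at_last_check, grace_fraction=0.20):
    """Abort only if the wu is past the grace period AND progress has stalled.

    grace_fraction=0.20 mirrors the "say 20%" figure suggested above;
    all names here are made up for illustration.
    """
    past_grace = cpu_time_s >= grace_fraction * estimated_time_s
    stalled = progress_now <= progress_at_last_check
    return past_grace and stalled


HOUR = 3600

# A wu stuck at 1% after 4hrs against a 6hr estimate would be aborted.
stuck = should_abort(4 * HOUR, 6 * HOUR, 1.0, 1.0)

# A slow wu that is still advancing is left alone, however long it runs.
slow_but_moving = should_abort(5 * HOUR, 6 * HOUR, 53.22, 50.0)

# A wu that has only just started gets the benefit of the doubt.
just_started = should_abort(600, 6 * HOUR, 1.0, 1.0)
```

The point of the design is that a long-running wu is never killed on elapsed time alone, only when the progress % has also stopped moving.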

Live long and crunch.

ps. It's good to see the background on my sig came back.
17) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8132)
Posted 1 Jan 2006 by The Gas Giant
Post:
Just had another wu error out at 5hrs 10min on my 3.2GHz@3.64GHz P4 HT machine. This is the first time this machine has done Rosetta in a while due to a problem with cpdn deadlines and an early sulphur wu. The wu that error'd out was the 2nd or 3rd wu it had completed and true to Bill's comments the first ones were fairly quick and the DCF had dropped a fair amount. I must admit I am using the optimised BOINC client as well as I am also running the optimised SETI@home application. So your comment about the max cpu time being associated with the bm's and DCF makes sense.

This is an interesting conundrum for folks doing multiple projects with one of them being SETI@home and using the optimised client/app. I'll have to think about whether it's worth while to continue running Rosetta from both the projects' and my perspective. I don't want good wu's erroring out and I don't want to lose credit, but I do want to continue running Rosetta. Can the project maybe look at increasing the max cpu figure and ignore the DCF since wu times are all over the place?
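To show why higher bm's and a lower DCF would make this worse, here is a toy model. It assumes the estimated time is an estimated operation count divided by the benchmark score and scaled by the DCF, and that the cpu limit is a larger operation bound divided by the same benchmark score. Whether the DCF also scales the limit is a guess based on the observations in this thread, so it is an optional argument. All names and numbers are hypothetical and this is not the BOINC client's actual code.

```python
def estimated_cpu_time(fpops_est, benchmark_flops, dcf=1.0):
    """Estimated run time in seconds under the assumed model."""
    return fpops_est / benchmark_flops * dcf


def cpu_time_limit(fpops_bound, benchmark_flops, dcf=1.0):
    """Assumed abort threshold in seconds.

    Passing dcf != 1.0 models the (unconfirmed) idea that the DCF
    also moves the limit.
    """
    return fpops_bound / benchmark_flops * dcf


# Made-up numbers: the same wu on a stock and on an "optimised" benchmark.
FPOPS_BOUND = 3.6e13
stock_limit = cpu_time_limit(FPOPS_BOUND, 2e9)      # 18000 s, i.e. 5 hrs
optimised_limit = cpu_time_limit(FPOPS_BOUND, 3e9)  # 12000 s, i.e. 3 hrs 20 min

# If the DCF does scale the limit, a DCF that has dropped shrinks it further.
low_dcf_limit = cpu_time_limit(FPOPS_BOUND, 3e9, dcf=0.5)
```

Under these assumptions the wu does the same amount of real work on both machines, but the one reporting the higher bm is given less cpu time before it is aborted.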

Live long and crunch and a very happy New Year to all,

ps. Where did the background in my sig disappear to?
18) Message boards : Number crunching : Maximum CPU time Exceeded...How about some granted credit! (Message 8099)
Posted 1 Jan 2006 by The Gas Giant
Post:
I've had a few wu's recently where the computation errors out with Maximum CPU time exceeded. These have typically taken 6 hrs or so and I have basically lost about two days of cpu time because of it (these wu's were not all the _205_ batch of wu's).

Any chance of getting credit for these wu's?

Live long and crunch.
19) Message boards : Number crunching : Why do bad wu's keep getting sent out. (Message 7232)
Posted 22 Dec 2005 by The Gas Giant
Post:
I have noticed that Rosetta keeps sending out wu's that have come back with a client error like this one;
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=3799981
and this one
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=3728658

I understand the result generation process so please no comments on that. I am just pointing out that the devs should consider limiting the number of results that can be generated on a bad wu to 3 only.
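The suggested cap could be sketched as below, assuming the server keeps a list of outcomes for each wu. The outcome strings and the function name are hypothetical; only the limit of 3 comes from the suggestion above.

```python
def should_reissue(outcomes, max_error_results=3):
    """Return True while the wu has fewer client errors than the cap.

    `outcomes` is an assumed list of outcome strings for results already
    returned; "client_error" is a made-up label for this sketch.
    """
    errors = sum(1 for outcome in outcomes if outcome == "client_error")
    return errors < max_error_results


# Two failures so far: another result would still be generated.
after_two_errors = should_reissue(["client_error", "client_error"])

# Three failures: the wu is treated as bad and no more results go out.
after_three_errors = should_reissue(["client_error"] * 3)
```

Successful results do not count against the cap, so a good wu that happens to hit one flaky host is not cut off early.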

Live long and crunch.

20) Message boards : Cafe Rosetta : irc.freenode.net #rosetta@home (Message 3646)
Posted 19 Nov 2005 by The Gas Giant
Post:
#boinc has a large number of active users, where we discuss all things boinc and nearly everything else under the sun and beyond!

Live long and crunch.





©2024 University of Washington
https://www.bakerlab.org