Posts by sslickerson

1) Message boards : Number crunching : Wall clock vs CPU clock (Message 61611)
Posted 7 Jun 2009 by sslickerson
Post:
Credit given in this project is based on the actual amount of work done and not the amount of CPU Time. Credit for Rosetta is given according to the number of decoys generated in the task. More information on the credit system can be found here.


Ok I understand that but it brings up another question then.

Typically my laptop is far slower than my desktop as shown here: laptop, desktop.

Both of the workunits above completed 99 decoys and were thus stopped by the watchdog. My desktop did it about 3 times faster than the laptop but was granted about 1.5 times the credit. I understand that different WUs get different granted credit, but if all else is equal and time is not a factor (i.e. OS and BOINC version do not matter), shouldn't these both get the same credit even though they are of different types? They both did 99 decoys, that is to say they both did the same amount of work, yet it seems that my slow laptop is getting more credit than it deserves and my faster desktop is getting less.

Sure there might be parity among all computers doing the same type of wu but it seems there is not parity among all types of workunit which then leads to inflated/deflated credit based on someone simply placing a number of indeterminate value on a wu (because each wu is valued in comparison to itself not all the other dozens of wu.)

I don't know, perhaps it all evens out in the end, but it appears there is too much guesswork going on here.

Thanks,

Timothy

Edit: Between the time I wrote this and the time I posted it, my link for the laptop went dead. The one that is there now is close to the one I originally referenced, though not identical, and I still think the point holds.
2) Message boards : Number crunching : Wall clock vs CPU clock (Message 61606)
Posted 7 Jun 2009 by sslickerson
Post:
I've recently noticed that BOINC manager displays elapsed time as the time it took for the workunit to finish including time in which the CPU was being used for some other task.

For instance, this workunit shows a CPU time of about 86400 seconds (my runtime is 24 hours) but BOINC manager showed over 29 hours to complete or about 105,000 seconds.

The RDCF for this laptop is 4.8. I manually changed the RDCF to 1 and uploaded two WUs, which in turn changed it back to 4.8.

This computer claimed 381 credits for that workunit and was granted 147! This is typical for all workunits on this computer.

Is there something going on here, or can I expect this to continue? It really stings when I can expect granted credit to be only 40% of claimed credit.

Timothy
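
The figures in this post can be checked with a few lines of Python. This is only a sketch of the arithmetic using the numbers quoted above (381 claimed, 147 granted, about 86400 s of CPU time, about 105,000 s elapsed); the function names are made up for illustration and nothing here reflects how the project itself computes credit.

```python
def granted_ratio(claimed: float, granted: float) -> float:
    """Fraction of the claimed credit that was actually granted."""
    return granted / claimed


def non_cpu_seconds(elapsed_s: float, cpu_s: float) -> float:
    """Wall-clock seconds during which the task was not accumulating CPU time."""
    return elapsed_s - cpu_s


# Numbers taken from the post above.
ratio = granted_ratio(claimed=381, granted=147)
gap = non_cpu_seconds(elapsed_s=105_000, cpu_s=86_400)

print(f"granted/claimed = {ratio:.1%}")                         # 38.6%
print(f"elapsed - CPU   = {gap:.0f} s ({gap / 3600:.1f} h)")    # 18600 s (5.2 h)
```

So the "only 40%" estimate is about right, and the workunit spent roughly five hours on the clock without using the CPU.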
3) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58812)
Posted 14 Jan 2009 by sslickerson
Post:
Thanks Paul and everyone else. I'll give these suggestions a try sometime this week (besides Rosie is out of work for the time being).

In lieu of any direct reply, I note that every recent job for sslickerson has completed successfully.

Looks like Boinc 6.4.5 answers at least one person's problems with MiniRosetta WUs. Worth thinking about for anyone with otherwise persistent problems, it seems.


Hey there, sorry about not replying. Actually, the Rosetta Wu's you are looking at are on my desktop (BOINC 6.4.5) which *typically* does not have issues with minirosetta. I have not allowed work on my laptop (BOINC 6.5.0) since the last batch of errors, so I am uncertain if the update would have fixed the issue.

I am going to reattach to RALPH for awhile and hopefully if there are errors we can get them fixed over there.

Timothy
4) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58499)
Posted 4 Jan 2009 by sslickerson
Post:
Thanks Paul and everyone else. I'll give these suggestions a try sometime this week (besides Rosie is out of work for the time being).

Also check to see if processor usage is set to 100% ...

I saw a note on EaH that with Windows and the processor usage not set to 100% this is a common error. In that this killed about 20 models here for me ... I am interested if this is really the case ... I know Rosetta runs well on OS-X in that I have not had any failures there ...

On Win XP I got 10 failures out of about 20 tries ... which is when *I* gave up again on RaH ...

I had set usage to 99% to give me a little more head room and that may have been enough to farble things up ...

Anyone up for the test?

This is addressed to the "Can't acquire lock-file" problem only ...

5) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58486)
Posted 4 Jan 2009 by sslickerson
Post:
Just for the fun of it I checked my desktop (AMD 4200+) for any errors; typically this one is and has been rock solid for years. Lo and behold, there was one error there that occurred in the past few hours with the same error as my Vista laptop. So the error is not machine or CPU specific (AMD vs Intel, XP vs Vista); it has happened on each (as far as my setup goes, at least).

AMD 4200:218361409
6) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58480)
Posted 4 Jan 2009 by sslickerson
Post:
@greg_be & robertmiles

Thanks for looking into this. I let rosetta run last night with increased runtimes and I left the application in memory but I see that 1 wu did fail: 218380754 for the same reason as before.

Also of note, there were 20 that failed because of client error while downloading--couldn't get input files, MD5 check failed: 218580846 for instance.

On this computer, I have set Rosetta to no new work and I had to abort the remaining wu's. I really want to attach here but the problems are far too severe at the moment. Perhaps I'll try again in 6 months, but I must say, this is getting a bit old...
7) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58445)
Posted 4 Jan 2009 by sslickerson
Post:
@greg_be

No, I am running stock. I lowered my runtime to 1 hour (thus no switching of apps) and of the 4 MR tasks that have completed, all look like they will validate. Is there causation here? IDK, but I would be interested to know.

It seems like the 4 or 5 times that I have come back to Rosetta with this setup (64bit Vista) everything works well until the runtime is increased to greater than 1 hour. Perhaps I will increase the runtime but switch to "leave app in memory" to see if there is any change...

I've had a fairly consistent failure rate for the mini-Rosetta app on my 64bit Vista computer for several months now (hence the reason why it is rarely crunching here). I thought I saw some light at the end so I attached again yesterday only to find 3 more tasks that have failed. All have error code:

-1073741819 (0xc0000005)

The workunits are as follows:

218380490
218380489
218380488

I do hope project staff will look into these. I would really like to get back over to ROSETTA on this machine but I can't waste the cycles without the fix. I can run some RALPH WUs if this is needed to track it down. Also, all three WUs had messages reporting that the "Output file was missing" prior to failure.

Edit Added: Paul Buck mentioned a few posts ago that his tasks that failed were possibly suspended and I know for a fact that the tasks that failed on my computer were indeed suspended and were not left in memory after the suspension.


Quick question: are you OC'd at all?
This looks like what I had when my OC speed was too high.
I lowered it and all was OK.


8) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58425)
Posted 3 Jan 2009 by sslickerson
Post:
I've had a fairly consistent failure rate for the mini-Rosetta app on my 64bit Vista computer for several months now (hence the reason why it is rarely crunching here). I thought I saw some light at the end so I attached again yesterday only to find 3 more tasks that have failed. All have error code:

-1073741819 (0xc0000005)

The workunits are as follows:

218380490
218380489
218380488

I do hope project staff will look into these. I would really like to get back over to ROSETTA on this machine but I can't waste the cycles without the fix. I can run some RALPH WUs if this is needed to track it down. Also, all three WUs had messages reporting that the "Output file was missing" prior to failure.

Edit Added: Paul Buck mentioned a few posts ago that his tasks that failed were possibly suspended and I know for a fact that the tasks that failed on my computer were indeed suspended and were not left in memory after the suspension.
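
For reference, the two forms of the error code quoted above are the same 32-bit value: -1073741819 is the signed reading of 0xc0000005, which Windows defines as an access violation (STATUS_ACCESS_VIOLATION). A minimal Python sketch of the conversion follows; the helper name to_unsigned32 is hypothetical and is not part of BOINC or Rosetta.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


exit_code = -1073741819  # as reported for the failed tasks
print(hex(to_unsigned32(exit_code)))  # 0xc0000005
```

This only shows that the decimal and hex values match; it says nothing about what triggered the violation in these tasks.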
9) Message boards : Number crunching : Minirosetta v1.32 bug thread (Message 55593)
Posted 7 Sep 2008 by sslickerson
Post:
I bought a new laptop a few days ago. I'm running Vista (x64) and I seem to be getting a lot of errors. This error has shown up a couple of times (can't acquire lock file). There are also 4 or 5 of these: Error code: 200.
10) Message boards : Number crunching : Minirosetta v1.28 bug thread (Message 54305)
Posted 8 Jul 2008 by sslickerson
Post:
Mine was able to complete 2 WUs, but the deadline for submission is Dec13,19001 and it won't upload the finished WUs; they have been in the "ready to report" state for days now. It won't even download more work units.


Check and make sure your system clock is set to the right date and time.
11) Message boards : Number crunching : WU time limiting - how works? (Message 54173)
Posted 4 Jul 2008 by sslickerson
Post:
The WU in question is here

His runtime preference is 57600 seconds but the task finished at 117945 seconds, a full 16 more hours than required. To me, this means that the WU completed 10 decoys before his runtime preference was reached; however, the application thought it could do one more during the allotted time. It started that decoy and did not complete it for at least 16 hours. It seems that something very wrong is going on here.
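
The overrun can be worked out directly from the two figures quoted above. A short Python sketch of that arithmetic, with variable names chosen here purely for illustration:

```python
runtime_pref_s = 57_600   # runtime preference quoted above (16 hours)
actual_s = 117_945        # reported run time of the task

overrun_s = actual_s - runtime_pref_s
print(f"overrun: {overrun_s} s = {overrun_s / 3600:.1f} h")  # 60345 s = 16.8 h
```

The task therefore ran a little over twice its 16-hour target.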
12) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 54170)
Posted 4 Jul 2008 by sslickerson
Post:


As long as the CPU is running, the WU is alive and well.

My largest WU needed 46 hours to complete (and 12 hours default runtime of course) - the time needed depends on your computer. So if you're not too impatient...

Finally finished at 24:09:10 CPU

Claimed Credit 189.06
Granted Credit 97.39

not as good credit/work as normal but thanks for the confidence to let it complete

But as you can see the credit discrepancy is huge. This is actually a problem as far as I am concerned. The project staff should look into why this is happening.

The application would not do say 5 decoys, decide it had time to do one other and then take *14 hours* to do so. This is a huge problem and I hope the project staff are looking into it.

Tim
13) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 54129)
Posted 1 Jul 2008 by sslickerson
Post:
Here is a really bad WU, even though it validated: 174615818

My runtime preference is 7200 seconds, but this one ran over 29000 seconds. Here is the kicker: claimed credit 102.7, granted credit 13.4.

and another with the same problem: 174641989

3-5 hours of wasted credit is a huge problem!
14) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 54085)
Posted 30 Jun 2008 by sslickerson
Post:
Here is a very fast error running version 5.98 on Windows XP: 174404220. It looks like it failed on at least one other host in the same manner.

15) Message boards : Rosetta@home Science : FoldIt 404 (Message 54036)
Posted 28 Jun 2008 by sslickerson
Post:
Perhaps you were using the URL from before it went public.


I was. When I updated to Firefox 3, my bookmarks had to be reloaded from the delicious website, and the old URL was loaded instead of the new URL. I think...
16) Message boards : Rosetta@home Science : FoldIt 404 (Message 53975)
Posted 24 Jun 2008 by sslickerson
Post:
oh, i just realized the URL I have in my bookmarks is different than the current one, this must have recently changed.
17) Message boards : Rosetta@home Science : FoldIt 404 (Message 53965)
Posted 24 Jun 2008 by sslickerson
Post:
great! it was down for almost a day on my end, weird...
18) Message boards : Rosetta@home Science : FoldIt 404 (Message 53957)
Posted 24 Jun 2008 by sslickerson
Post:
Anyone else seeing a 404 error at the foldit main page?
19) Message boards : Number crunching : Problems with version 5.96 (Message 53905)
Posted 21 Jun 2008 by sslickerson
Post:
Are you absolutely sure it was hung up?


Yes, before I suspended Rosetta, none of my tasks of version 5.96 went to completion at 100%. All were hung prior to 100%. Task manager in windows showed exactly 50% system idle process. One core was always idle.
20) Message boards : Number crunching : Problems with version 5.96 (Message 53889)
Posted 21 Jun 2008 by sslickerson
Post:
I posted minirosetta version 1.29 and rosetta_beta 5.97 on ralph which both include a fix for this bad bug that stalls clients. The problem was a possible infinite loop in the boinc api when an access violation caused by our t405 job was caught after the job completed. Hopefully the tests running on ralph will confirm the fix.


Thanks David! I really need to get back to Rosetta. Let's hope this works.

Tim





©2024 University of Washington
https://www.bakerlab.org