Posts by David Emigh

21) Message boards : Number crunching : teraFLOPS estimate? (Message 53796)
Posted 18 Jun 2008 by David Emigh
Post:
Over the past 8 days, the estimated teraFLOPS for the project has declined by 20%.

This is not a blip, but a week-long trend that shows no sign of bottoming out.

What is happening?
22) Message boards : Number crunching : Shorter WU time (Message 53770)
Posted 18 Jun 2008 by David Emigh
Post:
Does Rosetta give other tasks (other apps or other molecules) if you change your runtime preference to 1 hour, for example?

What are the (pros and) cons of taking short WU times?


I don't think the project makes a distinction based on runtime preference. The only distinction (to my knowledge) is available RAM.

From personal experience, some types of errors can be avoided by setting a shorter runtime preference. There was a bad stretch with one of the rosetta minis (1.20 or something like that) where two of my crunchers were reliably crashing between 12 and 16 hours into a task. By cutting my runtime preference below that threshold, I was able to keep those crunchers online and contributing to the project.

The downside of this is that some of the proteins we study are so complex that it may take a very long time to complete a single model.

If the time to complete a single model exceeds your preferred runtime by more than a factor of four, the watchdog will kill the workunit.

As an example: one of the crunchers I described above recently completed a workunit in which a single model took over 15 hours. If the runtime preference on that computer had been set to 3 hours or less, the watchdog would have killed the workunit.
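The rule above can be sketched in a few lines. This is only an illustration of the arithmetic as I described it: the factor of four comes from this post, not from the Rosetta source, and the function name is made up for the example.

```python
# Hypothetical sketch of the watchdog rule as described in this post.
# The factor of four is taken from the post, not from the Rosetta source code.
WATCHDOG_FACTOR = 4


def watchdog_would_kill(model_hours, preferred_runtime_hours, factor=WATCHDOG_FACTOR):
    """Return True if one model runs longer than factor * preferred runtime."""
    return model_hours > factor * preferred_runtime_hours


# The 15-hour model from the example, checked against several runtime preferences.
for pref in (1, 3, 4, 6):
    print(f"preference {pref} h -> killed: {watchdog_would_kill(15, pref)}")
```

With a 15 hour model, a preference of 3 hours gives a limit of 12 hours, so the workunit would be killed, while a preference of 4 hours gives a limit of 16 hours and the workunit would survive.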

I hope you will be able to find a way to stay with the project.
23) Message boards : Number crunching : teraFLOPS estimate? (Message 53732)
Posted 17 Jun 2008 by David Emigh
Post:
It looks to me as if the decline has taken place in just the last 6 days.

Reviewing the 60 day history, this looks like a pretty serious drop.
24) Message boards : Number crunching : Problems with version 5.96 (Message 53725)
Posted 16 Jun 2008 by David Emigh
Post:
Compute error here at about 20 minutes CPU time:

resultid=171712340

Large debugger output at the link.

I was the wingman, both computers had similar issues.
25) Message boards : Number crunching : What happened to minirosetta? (Message 53590)
Posted 8 Jun 2008 by David Emigh
Post:
{...}
we are using the older code for problems where the two are pretty much equivalent.
{...}


Thank you for the reply :)
26) Message boards : Number crunching : Problems with version 5.96 (Message 53587)
Posted 7 Jun 2008 by David Emigh
Post:
And here we have a 24 hour validate error:

resultid=169434889

The stderr_txt has basically nothing to say as to why this might be invalid...
27) Message boards : Number crunching : What happened to minirosetta? (Message 53548)
Posted 5 Jun 2008 by David Emigh
Post:
Is it my imagination, or has the project not sent out any mini work units in several days?

I have five cores (soon to be six) crunching away, and none of them has been assigned a mini for half a week or so.

Has anyone else noticed this?
28) Message boards : Number crunching : minirosetta v1.25 bug thread (Message 53524)
Posted 2 Jun 2008 by David Emigh
Post:
Here's a crash at 53000+ seconds:

Maximum disk usage exceeded

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x77100004

Engaging BOINC Windows Runtime Debugger...


A large and detailed debugger message follows the above, available at the link.

My success rate with mini 1.25 stands at 11/14 at this time.
29) Message boards : Number crunching : minirosetta v1.25 bug thread (Message 53437)
Posted 29 May 2008 by David Emigh
Post:
Compute error at 25000+ seconds

resultid=166515872

Moderately sized debugger report at the link.

I'm 3/5 with Mini 1.25 so far.
30) Message boards : Number crunching : minirosetta v1.25 bug thread (Message 53367)
Posted 27 May 2008 by David Emigh
Post:
This one failed in 0 seconds.

resultid=166731460

I was the wingman. This task failed for the original recipient also.
31) Message boards : Number crunching : minirosetta v1.24 bug thread (Message 53354)
Posted 26 May 2008 by David Emigh
Post:
This one would be amusing if it weren't such a waste.

resultid=166286977

It failed at about 12 hours, approximately half of my runtime preference for the host on which it was running, so it's a bona fide crash.

The error message looks fairly tame at first: "can not open psipred_ss2 file tt"

But that line is followed by four or five thousand repetitions of the line: "Error writing"
32) Message boards : Number crunching : Problems with version 5.96 (Message 53325)
Posted 25 May 2008 by David Emigh
Post:
OUCH!

Yes, boys and girls, this is a 24 hour validate error.

resultid=165956131

I hate it when this happens...
33) Message boards : Rosetta@home Science : My BOINC no longer working... (Message 53237)
Posted 21 May 2008 by David Emigh
Post:
Hello Warren.

Communication Deferred is a method that BOINC uses to prevent the project servers from being overloaded with requests.

If your BOINC client contacts the server at a time when the server is too busy, it will get a Communication Deferred signal for a random amount of time. When that time has passed, it will try again. If the server is still too busy, it gets another Communication Deferred signal, but for a longer random time. This process repeats until the server is able to handle your BOINC client's request.

You don't have to do anything to resolve it. The process is automatic.
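For the curious, the retry pattern described above looks roughly like the sketch below. Every number in it (the 60 second starting delay, the doubling, the 4 hour cap) is an assumption chosen for illustration, and the function name is made up. These are not BOINC's actual parameters.

```python
import random


def deferral_delays(attempts, base_seconds=60.0, cap_seconds=4 * 3600.0, rng=None):
    """Return one random wait per failed contact, each drawn from a longer range.

    All parameters are illustrative assumptions, not values used by BOINC.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The upper bound doubles after every failed attempt, up to the cap.
        upper = min(cap_seconds, base_seconds * 2 ** attempt)
        # Draw from the top half of the range so later waits are never shorter.
        delays.append(rng.uniform(upper / 2, upper))
    return delays


for n, wait in enumerate(deferral_delays(5), start=1):
    print(f"attempt {n}: deferred for about {wait:.0f} seconds")
```

Randomizing the wait keeps thousands of clients from all retrying at the same moment, which is the point of the deferral in the first place.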

If you have questions like this in the future, you may wish to post them in the Number crunching forum, as you will probably get faster answers.
34) Message boards : Number crunching : Lots of workunit failures... (Message 53169)
Posted 19 May 2008 by David Emigh
Post:
I have also discovered the workaround of decreasing the runtime preference.
35) Message boards : Number crunching : Claimed credit vs grant credit (Message 53083)
Posted 16 May 2008 by David Emigh
Post:
Claimed credit is based on the benchmarks your computer runs when you set up the BOINC client. The factors that affect it include IntegerOps/second, FloatingPointOps/second, and so forth.

For Rosetta, granted credit is based on the number of models/decoys that your computer actually builds during the time it is running the task, irrespective of the amount of time it takes to build each model.

The very first person to report a particular type of task gets exactly the same granted credit as claimed credit.

For each person who reports after the first, the granted credit is the number of models they completed multiplied by the running average of claimed credit per model over all previous reports.

The following is my understanding of how the system works. I could be wrong. I humbly submit to correction by anyone better informed than myself.

Example:

The first person to report completes 10 models and claims 100 credits. They get exactly what they claimed. Those models are worth 10 credits each, regardless of how long it took to make them.

The second person to report completes 8 models and claims 170 credits. They get 80 credits, because the models were worth 10 credits each when they reported. However, the models will be worth a little more to the next person.

At this point, 18 models have been completed and 270 credits have been claimed. The models are now worth 15 credits each.

The third person to report completes 12 models and claims 90 credits. They get 180 credits, because models were worth 15 credits each when they reported. However, the models will be worth a little less to the next person.

At this point, 30 models have been completed and 360 credits have been claimed. The models are now worth 12 credits each.

And so on...


Again, I remind you that the above is my understanding of the process, and my understanding may be flawed.
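The same arithmetic as a short sketch, using the three reports from the example. It only encodes my understanding as stated above, and the function name is made up for illustration. It is not the project's actual credit code.

```python
def grant_credits(reports):
    """reports: list of (models_completed, claimed_credit), in reporting order.

    Returns the credit granted to each report under the running-average
    scheme described in this post (an assumption, not the real server code).
    """
    total_models = 0
    total_claimed = 0.0
    granted = []
    for models, claimed in reports:
        if total_models == 0:
            # The first reporter gets exactly what they claimed.
            granted.append(float(claimed))
        else:
            # Later reporters are paid per model at the average so far.
            per_model = total_claimed / total_models
            granted.append(models * per_model)
        # This report's claim then changes the average for the next reporter.
        total_models += models
        total_claimed += claimed
    return granted


print(grant_credits([(10, 100), (8, 170), (12, 90)]))  # [100.0, 80.0, 180.0]
```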
36) Message boards : Number crunching : minirosetta v1.19 bug thread (Message 53079)
Posted 15 May 2008 by David Emigh
Post:
{...}
Just throwing my thoughts around, but I think it's pointless to post out of memory errors since the problem is already fixed in v1.2.


"Pointless" only for those who: 1) Participate in RALPH@home, 2) Have long runtime preferences, 3) Run Windows operating systems, and 4) Agree with the conclusion that the problem is solved.
37) Message boards : Number crunching : minirosetta v1.19 bug thread (Message 53040)
Posted 13 May 2008 by David Emigh
Post:
Error number 4, at 77,900+ CPU seconds.

Reason: Access Violation (0xc0000005) at address 0x005C1E7C write attempt to address 0x00000024

Large and detailed debugger report available at the link, if anyone is reading those things at this point.

The host that received the above error is 1/4 on mini 1.19 tasks that have a runtime preference in excess of 12 hours, but is 8/8 on mini 1.19 tasks with a runtime preference of 12 hours or less.
38) Message boards : Rosetta@home Science : DSN @ Home : Project cannot launch : no USB CD-ROM drive or MacMini to replace MiniPC (Message 53012)
Posted 12 May 2008 by David Emigh
Post:
{...}
If you do this, I will bid on the computer. My eBay ID is davide405. Please feel free to check my profile and feedback.

I am not promising to be the purchaser of the computer, only that I will bid on it to meet (or exceed) the reserve price as specified above. If someone else outbids me, we all win.
{...}


The above offer is now withdrawn.
39) Message boards : Number crunching : minirosetta v1.19 bug thread (Message 53001)
Posted 12 May 2008 by David Emigh
Post:
A third "Compute Error", this one on 73,000+ seconds of CPU time.

The system cannot find the path specified. (0x3) - exit code 3 (0x3)
40) Message boards : Number crunching : minirosetta v1.19 bug thread (Message 52993)
Posted 11 May 2008 by David Emigh
Post:
A second "Compute Error", this one on 85,000+ seconds of CPU time:

Reason: Access Violation (0xc0000005) at address 0x005C3051 write attempt to address 0x00000024

There is a large and detailed debugger message.

This error, and the one I reported earlier in this thread, have the same signature as the errors I was getting with mini 1.15, errors which crippled two stable and reliable crunchers until I discovered a workaround.

The only difference now is that the mini 1.19 workunits take about twice as long to crash, resulting in twice as much wasted CPU time...





©2024 University of Washington
https://www.bakerlab.org