Posts by Honza

1) Message boards : Number crunching : New credit system now being tested at RALPH@home (Message 22130)
Posted 9 Aug 2006 by Honza
Post:
Note:

CPDN's RACs were really inflated across the board before their new credit system went into effect.
Note that CPDN had fixed credit per model since the very beginning, IIRC. It was Einstein that somehow followed years later (SETI uses a different method).

AMD_is_logical - what you say is logical :-)
The thing is - where is the method for assigning credit fairly...

A server driven credit estimation a-la Einstein can be considered quite fair, IMO.
(we can't simply use the good old one from CPDN because of the quite numerous result types).
2) Message boards : Number crunching : Report stuck & aborted 5.01 WU here please - III (Message 14673)
Posted 26 Apr 2006 by Honza
Post:
This ResultsID got only 1.356 after 15 hours on X2, manual abort in place I guess.
http://boinc.bakerlab.org/rosetta/result.php?resultid=18214281
On another machine that had the same WU, it got automatically aborted with -161 error in less than one hour...
3) Message boards : Number crunching : Report Maximum CPU Time Exceeded WU HERE (Message 10697)
Posted 12 Feb 2006 by Honza
Post:
Had a WU taking ~70 hours which has not errored out due to long processing
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=8363123
4) Message boards : Number crunching : Report stuck & aborted WU here please (Message 10696)
Posted 12 Feb 2006 by Honza
Post:
Mercifully killed the WU after ~70 hours on a Pentium D; the other resultID (also Pentium D) went fine.
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=8363123
5) Message boards : Number crunching : @ Dave Baker (Deadlines causes EDF) (Message 10362)
Posted 2 Feb 2006 by Honza
Post:
@ JM7 and Paul.
I'm happy that some BOINC old-timers with in-depth insight into BOINC feel that the idea I was trying to express is not nonsense.

How about simply sorting those slots, with 'priority' as the key?
Some slots may serve as an 'express line'.

I see that the scheduler was not designed as a smart one. It may serve pretty well for simple SETI, where all WUs are the same (size, type and priority/deadline) and where x100k WUs are sent/received every day.

I remember Carl (of CPDN) was trying to improve the scheduler so that it at least considers the amount of memory of a host (and sends the less demanding Slab or the more demanding Sulphur). This should be useful for life-science projects as well, where different WU types are present. Same for BURP.
Just another example to have 'smarter' scheduler...
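
A minimal sketch of that memory-aware idea, purely illustrative and not taken from the BOINC or CPDN scheduler code. The `WorkType` class, the `pick_work_type` function and the RAM thresholds are all assumptions made up for the example; only the names Slab and Sulphur come from the post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkType:
    name: str
    min_ram_mb: int  # hypothetical requirement, not a real project figure


# Ordered from most to least demanding; the numbers are illustrative only.
WORK_TYPES = [
    WorkType("sulphur", 512),
    WorkType("slab", 256),
]


def pick_work_type(host_ram_mb: int) -> Optional[WorkType]:
    """Return the most demanding work type the host has enough RAM for."""
    for work_type in WORK_TYPES:
        if host_ram_mb >= work_type.min_ram_mb:
            return work_type
    return None  # host is too small for any known work type
```

A host reporting 256 MB would be offered the lighter type here, while one with 1 GB would get the heavier one.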

Perhaps BOINC 6.x ?
6) Message boards : Number crunching : @ Dave Baker (Deadlines causes EDF) (Message 10288)
Posted 31 Jan 2006 by Honza
Post:
The BOINC client uses earliest deadline first not FIFO. It may appear FIFO when all the tasks for a given project have the same initial deadline.
Well, I was talking about the server scheduler, not the BOINC client! [It is apparent that once WUs are downloaded, there is not much room to handle WUs other than by deadline, EDF etc.]
AFAIK, the scheduler is a simple pool: you put some WUs in the queue and when a machine asks for more WUs, the scheduler provides them - old ones first. It ignores machine characteristics that help return WUs faster - Average turnaround time in the first place.
Again, it should be more effective to provide WUs with a shorter deadline to machines with a low average turnaround time (more reliable), a small WU cache, a high % uptime etc. If these machine characteristics were considered, 'important' WUs could be returned within hours.
Does this make sense?
7) Message boards : Number crunching : @ Dave Baker (Deadlines causes EDF) (Message 10222)
Posted 30 Jan 2006 by Honza
Post:
I have suggested this several times: why not use a smarter scheduler instead of the primitive FIFO that has been running since BOINC started?
Each host has several useful characteristics. Among others, like the % of time BOINC is running, there is "Average turnaround time". Send WUs where you need results soon to those machines with (i) a fast CPU, (ii) a high % of BOINC running, (iii) a low Average turnaround time, and you get results the same day.

The data on all host characteristics is there - why not use it?
An improved scheduler on the server side - FIFO was good in the 60's...we should do better now.
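
To make the suggestion concrete, here is a rough sketch of how urgent WUs could be steered to quick and reliable hosts. It is an assumption-laden illustration, not how the BOINC server works: the `Host` fields, the `pick_urgent_hosts` function and the 0.5 uptime cut-off are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    host_id: int
    cpu_speed: float            # any benchmark-style figure, higher is faster
    uptime_fraction: float      # share of time BOINC is running, 0..1
    avg_turnaround_days: float  # "Average turnaround time" of the host


def pick_urgent_hosts(hosts: List[Host], n: int = 1,
                      min_uptime: float = 0.5) -> List[Host]:
    """Pick up to n hosts for an urgent WU.

    Hosts that run BOINC too rarely are dropped; the rest are ordered by
    lowest average turnaround, with a faster CPU breaking ties.
    """
    candidates = [h for h in hosts if h.uptime_fraction >= min_uptime]
    candidates.sort(key=lambda h: (h.avg_turnaround_days, -h.cpu_speed))
    return candidates[:n]
```

The plain FIFO behaviour would correspond to ignoring all three fields and handing the WU to whichever host asks first.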
8) Message boards : Number crunching : BOINC over a cluster (Message 7732)
Posted 27 Dec 2005 by Honza
Post:
It is up to you how many CPUs BOINC can use.
Just go to the General preferences ( http://boinc.bakerlab.org/rosetta/prefs.php?subset=global ) and set "On multiprocessors, use at most ... processors"
It should be fine with 8 if you get to a quad dual-core Opteron machine for example.

When running a farm, I would suggest using BOINCView ( http://boincview.amanheis.de/ ) to monitor/control machines.

You can run BOINC over a network, from a Ramdrive or even from a flash disk [only CPDN is not suitable for this since it is quite HD space and I/O demanding].
9) Message boards : Number crunching : 32 GB stderr.txt (no, that is NOT a typo, I said gigabyte!) (Message 6813)
Posted 19 Dec 2005 by Honza
Post:
The other slot is SETI. I was solely running SETI until their servers stopped handing out work because they were overloaded. I now run at least two projects per machine just to maximize CPU usage.
Thanks for making it clear.

For those who are curious, and to prevent further downloads of this quite large file, I'm attaching the first lines of the 32GB file.
Now it's quite clear that Rosetta is involved in this issue.


# =====================================
# random seed: 793501
# =====================================
# =====================================
# random seed: 801521
# =====================================

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C911E58 read attempt to address 0x402118E8

1: 12/18/05 12:09:11
1: SymGetLineFromAddr(): GetLastError = 126
1: SymGetLineFromAddr(): GetLastError = 126

[last line repeating ad infinitum].
10) Message boards : Number crunching : PURGE facility please (Message 6809)
Posted 19 Dec 2005 by Honza
Post:
although I think this can be controlled by smart allocation of jobs in the first place - send urgent jobs to quick/reliable crunchers.
Correct - that's what I meant to say.

@ Ingleside - it is evident that "smart" scheduling doesn't work easily when there is no "smart" time-to-complete estimate.

Take into account that finishing a WU takes about the same time on similar machines. Once a first result is in place, the time-to-complete is better known [when it errors out, a lower time-to-complete is known].
Resending a WU to slow machines with infrequent internet connections that do not run 24/7 (just to name other aspects of quick/reliable) is not "smart" anyway.

If there are WUs that take dozens of hours or even days, such a project should take some knowledge from CPDN and implement trickles - a partial result/progress upload that sends a message that the WU is still being processed.
This is another way to help make scheduling smart.
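
The point about a better time-to-complete once a first result is in can be sketched as a simple rescaling. This is only an illustration under the assumption that run time scales inversely with some host speed figure; the function name and parameters are hypothetical and nothing here is taken from BOINC itself.

```python
def estimate_cpu_hours(first_result_hours: float,
                       first_host_speed: float,
                       target_host_speed: float) -> float:
    """Scale the CPU time of the first returned result to another host.

    Assumes run time is inversely proportional to host speed, which is
    only a rough approximation for real applications.
    """
    if first_host_speed <= 0 or target_host_speed <= 0:
        raise ValueError("host speeds must be positive")
    return first_result_hours * first_host_speed / target_host_speed
```

With such an estimate, a resend could be checked against the remaining deadline before it goes to a slow host.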

11) Message boards : Number crunching : 32 GB stderr.txt (no, that is NOT a typo, I said gigabyte!) (Message 6804)
Posted 19 Dec 2005 by Honza
Post:
I see from your screenshot that you have 2 slot folders but only a single-core, single-CPU machine. This indicates that you are or have been attached to more than one project (Rosetta).
So, this may be a rare BOINC bug as well.

Still not nice.
12) Message boards : Number crunching : PURGE facility please (Message 6776)
Posted 19 Dec 2005 by Honza
Post:
A purge job is not a new idea. Actually, it has been implemented on CPDN already.
There is a slight difference: the purge job on CPDN SpinUp terminates an ongoing model (200 years or 3,000 CPU hours is a long WU). This so-called trickle-down is meant to terminate a model upon the core team's decision when they see that a particular model does not need to be finished in full (e.g. 150 years is enough, because it has already gone stable).

I agree that there are some aspects of BOINC schedulers that could be improved in order to make computing efficient.
Another approach to this problem might be that the scheduler doesn't simply act like a FIFO queue but sends WUs to hosts according to their Average turnaround time.
This might be used in several cases:
- when results are needed really soon, send them to hosts with a low Average turnaround time
- send test WUs to such hosts so that the team knows quickly whether a new application/WU works
- if a WU fails, re-send it to such a host.
Such an approach might even shorten the "to validate" queue and hence lower disk space usage.
It may also prevent users from waiting too long for credit.

I think that such approach is suitable for any project with or without WUs redundancy.
13) Message boards : Rosetta@home Science : Dont allow to do another works with the computer....... (Message 6743)
Posted 19 Dec 2005 by Honza
Post:
Hi Levent,

it seems that your computers are capable of running Rosetta.
What version of BOINC are you running?

In your profile, under general preferences, you can:
Change "Do work while computer is in use?" to No if it helps.
I would set "Leave applications in memory while preempted?" to yes.
Perhaps it is better to tell your host with only 256MB RAM to use only 1 CPU ("On multiprocessors, use at most ... processors").

Any antivirus running? I would exclude the BOINC folder from scanning [It is a known issue that AV may crash a computation by locking some files; spotted on CPDN].

Any overclocking? Heating issue?
Have you performed some stability tests like Prime, Memtest etc.?

EDIT: I'm a bit confused by the name of the thread - would you like to try to solve the problem or to stop downloading more work for Rosetta?
I would first try to solve it anyway...in case you do not succeed, choose "No more work" for Rosetta on the project tab of BOINC Manager.
14) Message boards : Rosetta@home Science : Where we are and where we are going. (Message 6419)
Posted 16 Dec 2005 by Honza
Post:
Thanks for the update, appreciated.
15) Message boards : Number crunching : How to have the best BOINC project. (Message 6344)
Posted 15 Dec 2005 by Honza
Post:
honestly whatever works to help the science work better is what we want.. i like the list and agree comunication isnt listed enough...

Agree.

How about next point:
20. finish what has been started/promised.
Just put together "honestly whatever works to help the science work better is what we want.." and "code release and redundancy" thread.
We thought it would be good to give out the code because we thought 1) people would be interested in seeing it, 2) compilation and code performance on a much wider array of platforms than we have in house could be optimized and 3) experts could experiment with variations on search strategies. But because of the many concerns I am reconsidering this--keeping all of you happy is clearly critical!

A one-month lack of communication on this topic from the core team (until a participant raised his voice) can be seen in contrast with points #1-3. I understand the connection with point #4 (platform support) and #9 but, unfortunately, this topic has not moved much further.
I understand that doing science (and programming as well) is a continuous process, so there may not be any point of "finish". That is where feedback on progress (see points #1-3) comes in.

This is not to state Rosetta is a bad project. Actually, I rate Rosetta very high among other BOINC or DC projects (and I have participated in many of them since last century). Any project should follow this "How to...".
This is just to make it a bit more balanced.
[Hope my English has not let me down].
16) Message boards : Rosetta@home Science : Not getting enough work (Message 5935)
Posted 12 Dec 2005 by Honza
Post:
I'm also getting "no work" on one Windows machine, albeit Scheduler status says "Queued: 85,690"
17) Message boards : Rosetta@home Science : Graphics (Message 5819)
Posted 10 Dec 2005 by Honza
Post:
Graphics seem to be down today? Am I the only one or is this a Boinc problem?
Same here - just attached a new machine to the project and downloaded rosetta_4.80_windows_intelx86.exe
But the graphics are there - it is just not named "graphics" since, I assume, the application with graphics has become the standard version.
18) Message boards : Rosetta@home Science : Unanswered quetions about rosetta@home (Message 5686)
Posted 9 Dec 2005 by Honza
Post:
3) What's the difference/relation of rosetta@home with folding@home and predictor@home? Aren't you just duplicating research here?

I'm not an expert, but I at least remember such a discussion here before:
How Is Rosetta Different Than Folding@home? - http://boinc.bakerlab.org/rosetta/forum_thread.php?id=63. I'm sure you may find more on this topic throughout the forum.

Topic 1) and 2) were already mentioned...and at least partially answered.
As they are spread across the forum, perhaps it's better to search for posts from Rosetta project members (David Kim, David Baker, Jack Schonbrun etc.).
After that, your questions should be answered quite well.
19) Message boards : Number crunching : code release and redundancy (Message 5356)
Posted 7 Dec 2005 by Honza
Post:
CPDN, with the "bonus" for completing models and flat rate granting of credit per trickle in fact gives a higher "return" per second.
Yes, fixed credit per trickle/model, that's right - but what "bonus" are you talking about?
CPDN is not the only project that gives fixed credit per WU type. I think Primegrid is following the same pattern.
20) Message boards : Number crunching : code release and redundancy (Message 5301)
Posted 6 Dec 2005 by Honza
Post:
Come on, guys - wasn't the thread originally not about credit but about a promised code release? Or is the credit system a restraint on releasing it?
I understand that redundancy is connected with the issue as there might be a problem of validity.
For me, it's not about credit...rather, as Jack pointed out, "the goal is to have as many productive cpu cycles as possible.". Code release and its optimization is the way to make CPUs more productive.
Credit doesn't make a CPU more productive...but may "motivate" some to migrate from one project to another. This is, IMO, not the best way to attract people. A well-optimized application, graphics (where Rosetta has one of the best), active science team members on the forum etc. - this is what charms many users. I mean - [some] credit is on every project but some provide more...





©2024 University of Washington
https://www.bakerlab.org