Posts by Yeti

1) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50680)
Posted 14 Jan 2008 by Yeti
Post:
And one word from me:

Please discuss things like Rosetta versus Ralph in a different thread. I restarted crunching Rosetta with 5.93 and was looking for anything relevant about errors with 5.93, but I had to read through your whole discussion.

Yes, the content of this discussion is okay, but for me this thread is definitely the wrong place for it.
2) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50679)
Posted 14 Jan 2008 by Yeti
Post:
Here is a 5.93 WU that errored with exit status -1073741819 (0xc0000005):

http://boinc.bakerlab.org/rosetta/result.php?resultid=133082830

The box is a dual quad-core Xeon running Windows Server 2003 64-bit with 8 GB of memory.
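
For readers wondering how the two forms of the exit status quoted above relate: the negative decimal number is the same 32-bit value as the hex code, read as a signed integer. On Windows, 0xC0000005 is the NTSTATUS code for an access violation. A minimal sketch of the conversion (the function name `to_unsigned32` is only illustrative):

```python
def to_unsigned32(status):
    """Map a signed 32-bit exit status to its unsigned 32-bit value."""
    return status & 0xFFFFFFFF


# Exit status as reported for the failed result in this post.
status = -1073741819
print(hex(to_unsigned32(status)))  # 0xc0000005
```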

3) Message boards : Number crunching : Help us solve the 1% bug! (Message 8933)
Posted 13 Jan 2006 by Yeti
Post:
Easy to answer, is anyone getting 1% errors commonly NOT running PPAH?

Hm, if I remember right, I saw the 1% problem on boxes that had no PPAH units.
4) Message boards : Number crunching : Boinc/Rosetta not working anymore (Message 8923)
Posted 13 Jan 2006 by Yeti
Post:
Depending on the versions of BOINC:

Port 1043
Port 31416
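
To see which of the two ports is actually accepting connections on a given box, something like the following could be used. This is only a sketch: it assumes the BOINC client is running on the same machine (127.0.0.1) and just tests whether a TCP connection succeeds; it does not speak the BOINC protocol. The helper name `port_is_open` is made up for illustration.

```python
import socket

# Ports named in the post; which one applies depends on the BOINC version.
CANDIDATE_PORTS = (1043, 31416)


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or blocked (for example by a personal firewall).
        return False


if __name__ == "__main__":
    for port in CANDIDATE_PORTS:
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"port {port}: {state}")
```

If both ports show as closed while the client is running, a local firewall blocking the connection is one possible explanation.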

5) Message boards : Number crunching : Help us solve the 1% bug! (Message 8921)
Posted 13 Jan 2006 by Yeti
Post:
David,

I keep coming back to a different possible cause :-(

Could it be that the problem is related to the change from one WU to the next, with Rosetta not clearing all internal buffers / variables / tmp ... ?

As you can easily see in my sig, I'm running a lot of different projects. When other projects had something similar, the cause could be found in the project client. Since last summer I haven't seen such a problem with a non-Rosetta application, so I guess it must have something to do with Rosetta ...
6) Message boards : Number crunching : Help us solve the 1% bug! (Message 8842)
Posted 12 Jan 2006 by Yeti
Post:
David,

For our understanding: if a WU is stuck at 1% and BOINC (and Rosetta) are restarted, is the random seed normally the same, or does this number change on restart?

7) Message boards : Number crunching : Report stuck work units here (Message 8715)
Posted 10 Jan 2006 by Yeti
Post:
Wow, I got one :-)

WU-Name: INCREASE_CYCLES_10_1hz6_226_6922_0

The last 20 lines of stdout:

Size: 3 NUMBER OF FRAGS FOR POS: 53 200
Size: 3 NUMBER OF FRAGS FOR POS: 54 200
Size: 3 NUMBER OF FRAGS FOR POS: 55 200
Size: 3 NUMBER OF FRAGS FOR POS: 56 200
Size: 3 NUMBER OF FRAGS FOR POS: 57 200
Size: 3 NUMBER OF FRAGS FOR POS: 58 200
Size: 3 NUMBER OF FRAGS FOR POS: 59 200
[T/F OPT]New TRUE value for [-jitter_frag]
[REAL OPT]Default value for [-jitter_amount] 2
[STR OPT]New value for [-jitter_variation] gauss.
score0 done: (best, low) rms
2 0 10.9471054
---------------------------------------------------------
score1 done: (best, low) rms (best,low)
-4.41134071 -17.6313858 8.64116001 4.5229826
standard trials: 20000 accepts: 585 %: 2.925
-----------------------------------------------------
Alternate score2/score5...
kk score2 score5 low_score n_low_accept rms rms_min low_rms
0 -27.503 -12.841 -27.503 28 4.523 3.605 4.523


I have saved the whole BOINC directory, so if you are interested, I can zip it and put it on one of my servers for you to download.

The WU should normally take 6 hours on this machine; so far it has run for 6 hours and still says 1%.

8) Message boards : Number crunching : No New Work (Message 7308)
Posted 23 Dec 2005 by Yeti
Post:
12/22/2005 7:06:50 PM|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
12/22/2005 7:06:50 PM|rosetta@home|Reason: Requested by user
12/22/2005 7:06:50 PM|rosetta@home|Note: not requesting new work or reporting results
12/22/2005 7:06:55 PM|rosetta@home|Scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi succeeded

All seems to be fine!

The ResourceShare on your client says that it is not yet time for Rosetta to get work; you can hit the update button as often as you want, you won't get new work from Rosetta.

You have given Rosetta 3% ResourceShare. Your options are:


  • wait, until the client asks really for work from Rosetta
  • raise the ResourceShare for Rosetta



Your local scheduler will keep an eye on your projects, and when it thinks it is time to fetch work from Rosetta, it will do so.
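
To get a feeling for how little a 3% share is, here is a rough back-of-the-envelope sketch. It assumes the share is a plain percentage of the machine's total crunching time and that the box runs around the clock; the real client scheduler is more involved than this, and the function name is only illustrative.

```python
def expected_daily_hours(share_percent, cores=1, hours_per_day=24.0):
    """Rough CPU hours per day a project would get for a given share."""
    return share_percent / 100.0 * cores * hours_per_day


# 3% on a single-core box that is on all day: roughly 0.72 hours (about 43 minutes).
print(round(expected_daily_hours(3), 2))
```

With a WU that needs several hours, it is plausible that the client waits a long time before asking for the next one.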

9) Message boards : Number crunching : Rosetta alpha ?? (Message 7283)
Posted 22 Dec 2005 by Yeti
Post:
RALPH seems to be a good idea; you can count me in.
10) Message boards : Number crunching : Please abort WUs with (Message 7062)
Posted 21 Dec 2005 by Yeti
Post:
Thanks for the open information, Jack :-)

Is it known whether all of the "bad" WUs are now out of the "download sequence" and we are back to business as usual, or are these faulty WUs still in there?
11) Message boards : Number crunching : Please abort WUs with (Message 7045)
Posted 21 Dec 2005 by Yeti
Post:
At the moment, this problem seems to start a deadly cycle ...

Boxes ask for work more often:


  • more traffic on your side
  • scheduler not fast enough



I have several boxes that didn't get work in the last 60 minutes, although they haven't reached the daily quota.

And I see boxes that have delayed / interrupted downloads.

Please, keep an eye on the servers ...

12) Message boards : Number crunching : Please abort WUs with (Message 7040)
Posted 21 Dec 2005 by Yeti
Post:
Hm, I never looked at how big the Rosetta WUs are to download ...

16 of my 17 boxes are on an internet connection where I have to pay for each GB, but it's not too much. So, if the WUs are not too big, I'll let them keep downloading ...

Any hints for me?
13) Message boards : Rosetta@home Science : Has anyone created a very simplified (1) page explaination? Pls Read (Message 6770)
Posted 19 Dec 2005 by Yeti
Post:
Hi J.

Yeti is saying he creates a new user account in Windows that BOINC will run under, rather than running it under the currently logged in user.

As part of Windows security you can set permissions on exactly what folders can be accessed by programs running under this account, so Yeti sets the BOINC folder as the only one accessible to this account. That means that in the VERY unlikely event of a virus etc being downloaded and started, the virus would only have access to the BOINC folder.

It's another level of security that's available.

HTH
Danny

Thanks, Danny, you described it better than I could have.
14) Message boards : Number crunching : BOINC Manager is not able to connect to a BOINC client (Message 6729)
Posted 18 Dec 2005 by Yeti
Post:
If you have problems with port 1043, another possibility is to run the command-line client(s).

Surely it will take some time to configure them, but it should work ...

15) Message boards : Number crunching : BOINC Manager is not able to connect to a BOINC client (Message 6649)
Posted 18 Dec 2005 by Yeti
Post:
I guess he will not even see a project.

Which client are you using? There is a known problem when running in "normal mode" on boxes with a personal firewall.

At the moment, the best solution is to use BOINC 5.2.14.

If the problem still appears, exit BOINC Manager, then go to Task Manager and end BOINC.EXE. If you then start BOINC Manager again, it should work.

Or install BOINC as a service; then this problem doesn't occur.

The developers are working hard on this problem, and it has become better with 5.2.14, but they are still working on it.
16) Message boards : Number crunching : How to have the best BOINC project. (Message 6585)
Posted 17 Dec 2005 by Yeti
Post:
Uuh, this seems to be a collection of oldtimers ;-)

Maybe we should open a club ;-)
17) Message boards : Number crunching : Something wrong with Server-Side-Scheduler (Message 6423)
Posted 16 Dec 2005 by Yeti
Post:
@Ingleside

When calculating how much work to fetch or send, it is important to take into account which projects are suspended on the client or set to "no new work".

I have 5 - 10 projects attached to my clients; several are suspended or switched to "no new work". Normally, I let my clients crunch two main projects and 1 or 2 backup projects.

This looks like:

Rosetta 48%
LHC 48%
P@H 1%
E@H 1%
...

So, LHC often has no work. When Rosetta then has no work either, I often saw messages from the project schedulers along the lines of: won't finish in time, project gets only 1% of ResourceShare.

That may have been fine with the old 4.x clients, but with the new client scheduler this should be changed. The client scheduler now keeps an eye on the deadlines and, as far as I can see, it does so very well!
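
To make the point concrete, here is a small sketch of what the shares look like once projects without work are left out. It is only an illustration of the argument, not how any BOINC scheduler actually computes things: it simply renormalizes the four shares listed above (the list in this post is longer, only the values shown are used) over the projects that currently have work. The function name `effective_shares` is made up.

```python
def effective_shares(shares, idle=()):
    """Renormalize resource shares over projects that are not idle.

    shares: mapping of project name -> configured share
    idle:   names of projects that are suspended or have no work
    """
    active = {name: s for name, s in shares.items() if name not in idle}
    total = sum(active.values())
    if total == 0:
        return {}
    return {name: 100.0 * s / total for name, s in active.items()}


shares = {"Rosetta": 48, "LHC": 48, "P@H": 1, "E@H": 1}

# Both main projects are out of work: the backups share the whole machine.
print(effective_shares(shares, idle={"Rosetta", "LHC"}))
# {'P@H': 50.0, 'E@H': 50.0}
```

So a backup project configured with 1% can in practice have half the machine, which is why a server-side estimate based on the bare 1% is too pessimistic.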
18) Message boards : Rosetta@home Science : Has anyone created a very simplified (1) page explaination? Pls Read (Message 6422)
Posted 16 Dec 2005 by Yeti
Post:
On the point of safety:

Meanwhile, you can easily run BOINC in a sandbox (on Windows systems).

Before installing BOINC, create a dedicated user account that only has rights to the BOINC directory. Once that is done, install BOINC as a service running under this user account only.

Then you have the maximum safety, because even if you got malicious code, it could only reach the BOINC directory.

I run all my normal boxes as described above; it works great and without problems. The only drawback is that we can't see the graphics.
19) Message boards : Number crunching : Report stuck work units here (Message 6378)
Posted 16 Dec 2005 by Yeti
Post:
If I see similar things, shall I collect the data as with this one, or shall I wait until you have had a look at this one?
20) Message boards : Number crunching : Report stuck work units here (Message 6377)
Posted 16 Dec 2005 by Yeti
Post:
I'm not sure if your WU is still a candidate for being hung, perhaps you are having a different problem.

I only had really hung WUs several weeks ago, when I started with Rosetta. In the last week I noticed several times that WUs stay at 1% for a very long time, but normally they jump to 10% after 1 / 2 / 3 hours. After this, they progress much faster. (The jump from 1% to 10% was at 1:35; now the WU is at 2:05 and says 20%.)

Until we figure out a good way for you to give us the whole slot, if you could go into the active Slot and post the content of stderr.txt, as well as the first 10 and last 10 lines of stdout.txt, that would help a lot.
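
The excerpt requested above (first 10 and last 10 lines of stdout.txt) could be pulled out with a small script like this one. It is just a sketch: the file name comes from the request, the slot path has to be supplied by whoever runs it, and the helper names are made up.

```python
from collections import deque
from itertools import islice


def head_and_tail(lines, n=10):
    """Return (first n lines, last n of the remaining lines).

    If the input has fewer than 2 * n lines, no line appears twice.
    """
    it = iter(lines)
    head = list(islice(it, n))
    tail = list(deque(it, maxlen=n))
    return head, tail


def excerpt_file(path, n=10):
    """Read a text file and return its first and last n lines."""
    with open(path, "r", errors="replace") as fh:
        return head_and_tail((line.rstrip("\n") for line in fh), n)
```

Calling `excerpt_file("stdout.txt")` from inside the slot directory would return the two lists, which can then be pasted into a post together with the content of stderr.txt.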

I have made a copy of the whole slot and saved it; I can zip or rar it and e-mail it to you. If you don't want to post an e-mail address, send it to me via my registered e-mail address.

Or I can give you an address where you can download the slot from one of my servers ...

The WU-Name: 1ogw_topology_sample_88400_0




©2024 University of Washington
https://www.bakerlab.org