Posts by HA-SOFT, s.r.o.

1) Message boards : Number crunching : Client errors (Message 75293)
Posted 27 Mar 2013 by HA-SOFT, s.r.o.
Post:
I can confirm there are no more client errors.

Nice job Jacob.
2) Message boards : Number crunching : Daily quota (Message 73711)
Posted 28 Aug 2012 by HA-SOFT, s.r.o.
Post:
I tried it without WCG. Only rosetta and gpugrid together, but no luck.

GPUGRID, that's running on the GPU, isn't it? Try without that. Some of your machines are completing Rosetta tasks successfully. Are those the ones without GPU crunching?


These are Windows machines, where the situation is not so bad. I think it's related to Linux in combination with the 7.xx client. I know that Rosetta is worse on Linux than on Windows.

I'll try Rosetta without GPUGRID, but I think the problem is not in the crunching but in sending/validating.
3) Message boards : Number crunching : Daily quota (Message 73706)
Posted 27 Aug 2012 by HA-SOFT, s.r.o.
Post:
Do the errors occur only or mostly on machines also running GPU tasks? Or also on machines without GPUs used for crunching? Some people reported that in the other thread. BTW, minirosetta 3.41 is out, you could try that.


I tried it without WCG. Only Rosetta and GPUGRID together, but no luck. Client error again. If I look into the logs of jobs with a client error, they look normal, with a successful BOINC finish. I think there may be a problem with interpreting results on the server side.
4) Message boards : Number crunching : Daily quota (Message 73697)
Posted 26 Aug 2012 by HA-SOFT, s.r.o.
Post:
Summary of my experience with Rosetta over the last 6 months:

1. BOINC 7.xx and Windows XP (32-bit) as service = no problems
2. BOINC 7.xx and Windows Server 2008 R2 x64 as service = no problems
3. BOINC 6.xx and 7.xx with Windows 7 64-bit (Intel i5) = 50% errors, 50% OK
(regardless of BOINC version and service/non-service install)
4. BOINC 6.xx and Ubuntu 11.04, 11.10 x64 = no problems
5. BOINC 7.xx and Ubuntu 12.04 x64 = first week after install OK, next week all tasks with client error

Other projects I'm running:

1. WCG CPU OK (6.xx and 7.xx, Windows and Ubuntu)
2. PrimeGrid GPU OK (6.xx and 7.xx, Ubuntu)
3. GPUGRID OK (6.xx and 7.xx, Ubuntu)
5) Message boards : Number crunching : Daily quota (Message 73682)
Posted 23 Aug 2012 by HA-SOFT, s.r.o.
Post:

You have two problems...1st is you are using an unapproved version of Boinc here at Rosetta, they have NOT approved the version 7.X.X series yet and you are using version 7.0.31. 2nd you are also gpu crunching with your pc, that is another deal breaker for the Rosetta workunits, they do not share well with others! Rosetta has consistently said they are happy with the settings they have and are not going to be changing them.


I'm using 7.0.27 with Ubuntu 12.04 LTS, and this is the main problem. The GPU project (GPUGRID) is OK with Rosetta (1 year without problems with the 6.x version).

I upgraded the PCs to Ubuntu 12.04 one or two weeks ago. I will disable Rosetta on these computers and wait for 7.x support.

Thanks
Zdenek
6) Message boards : Number crunching : Daily quota (Message 73680)
Posted 23 Aug 2012 by HA-SOFT, s.r.o.
Post:
Since your computers are hidden I can only guess, but you are probably returning only errors (on an 8-core machine). Unhide your computers, then we may be able to see the reason.


I did, and yes, there are many errored tasks with a client error. I missed it. According to the log, all tasks are calculated OK and finish successfully. I don't know why there is a client error or what this error means.

Thanks
Zdenek
7) Message boards : Number crunching : Daily quota (Message 73677)
Posted 22 Aug 2012 by HA-SOFT, s.r.o.
Post:
Hello,

I have got a message:

reached daily quota of 8 results

and no tasks sent. Is it a new rule on rosetta? I have never seen it before.

Zdenek
8) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58754)
Posted 12 Jan 2009 by HA-SOFT, s.r.o.
Post:
StdErr is empty or contains a message about an access violation (0xc0000005). The application hangs with 3 MB of RAM and does nothing. For example, I have about 10 minirosetta apps in memory that do nothing. When I kill them, there is no stderr or any other file in the slots directory.

greb_be and all,

When there is a new version of minirosetta, we usually put a Windows debug symbol image in a downloadable location. So when a WU crashes, it should provide a backtrace of how the error was caused (this does not work every time, and that makes our debugging very hard). If it is an error from the Minirosetta program or a bad command line/input file setup, the stdout or stderr will usually print a message as a hint, for example, the hbond NAN problem in the previous versions. Also, we should see a significantly higher error rate among either all or certain batches of WUs running. If it is caused by interfacing with the host's hardware or software, we will usually see that certain client hosts keep encountering errors or failures. We wish we could tell what has gone wrong in every scenario when an error occurs; however, most of us Rosetta developers are far from being experts on computer software/hardware, and we can only hope to trap errors locally on our testing machines to continue with debugging.

Thank you all for voluntarily helping us with this project, and sorry about any inconvenience/trouble caused on your computers. Please continue to report problems and/or possible fixes you have found, as every bit of such information will certainly help us to improve R@H stability and resolve hidden bugs/problems sooner or later. Happy holidays to everyone and happy crunching!


9) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58132)
Posted 23 Dec 2008 by HA-SOFT, s.r.o.
Post:
I have the same problem, only on 64-bit Windows Server 2008, for all Minirosetta tasks. Minirosetta 1.45 had this problem too. All my other PCs (32-bit, XP 64-bit) have no problems.

Zdenek


Chu,

Thanks for replying. I suspect it's my overclock speed. If you have that many clients returning good tasks, and I see the last one I posted went through to another client OK, I have to assume my speed is too high for these tasks on RAH.
I dropped the speed by 10 MHz to see if that corrects the problem; if it continues, then I will drop it some more until things become steady. As of a week ago I could run at the faster speed with no problems. But this week the majority die.

Normally when my speed is too high, the tasks fail immediately. So I don't understand how a task can run eight thousand seconds and then crash. I had another one that ran to within 10 minutes of completion and died.

Can you tell me how to see the difference between an error due to Windows or OC speed vs. a program error that triggers a Windows dump with '-1073741819 (0xc0000005)'?

Thanks again for the reply.
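
For reference, the two numbers in '-1073741819 (0xc0000005)' are the same 32-bit value printed two ways: the first is the exit status read as a signed integer, the second is the same bits in hexadecimal. 0xC0000005 is the Windows status code for an access violation. The code by itself does not say whether the cause was unstable hardware or a program bug. A minimal sketch of the conversion follows; the function names are made up for illustration and are not part of BOINC or Rosetta.

```python
# Hypothetical helper names, for illustration only.

STATUS_ACCESS_VIOLATION = 0xC0000005  # Windows status code for an access violation


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def to_signed32(status: int) -> int:
    """Reinterpret an unsigned 32-bit status as a signed 32-bit value."""
    status &= 0xFFFFFFFF
    # If the top bit is set, the signed reading is negative.
    return status - 0x100000000 if status & 0x80000000 else status


def describe(exit_code: int) -> str:
    """Format an exit code the way it appears in the task output."""
    return f"{exit_code} (0x{to_unsigned32(exit_code):08x})"


if __name__ == "__main__":
    print(describe(-1073741819))
    print(to_unsigned32(-1073741819) == STATUS_ACCESS_VIOLATION)
```

The same masking works for any negative exit status shown in a task's stderr, so the hexadecimal form can be looked up in the Windows status code list.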


10) Message boards : Number crunching : Minirosetta v1.40 bug thread (Message 57272)
Posted 27 Nov 2008 by HA-SOFT, s.r.o.
Post:
I have a problem on Windows Server 2008 64-bit, where all Minirosetta tasks hang at 0.00 progress. Rosetta beta works OK. BOINC 6.2.19

Zdenek





