Client error with Rosetta Mini 3.19

Message boards : Number crunching : Client error with Rosetta Mini 3.19


Rayburner

Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 71888 - Posted: 25 Dec 2011, 23:59:34 UTC

Hello,

I am having problems with a new host I just attached to rosetta.

All WUs I report show the outcome "client error".

For example task ID 472808156 (https://boinc.bakerlab.org/rosetta/result.php?resultid=472808156)

However, judging by the stderr output (see below), I don't have a clue what the problem is.

This specific host is new, not overclocked, and crunching successfully for several other projects (Einstein, SETI, WCG, LHC, Primegrid).

Has anybody a clue why this happens?

I stopped rosetta on this host for now.

Best Regards,
Rayburner

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
[2011-12-25 20:14:32:] :: BOINC:: Initializing ... ok.
[2011-12-25 20:14:32:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev46494.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
# cpu_run_time_pref: 14400
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004
Starting work on structure: _00005
Starting work on structure: _00006
Starting work on structure: _00007
Starting work on structure: _00008
Starting work on structure: _00009
Starting work on structure: _00010
Starting work on structure: _00011
Starting work on structure: _00012
Starting work on structure: _00013
Starting work on structure: _00014
Starting work on structure: _00015
Starting work on structure: _00016
======================================================
DONE :: 1 starting structures 13863.3 cpu seconds
This process generated 16 decoys from 16 attempts
======================================================
BOINC :: WS_max 4.09907e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>


JKitterman

Joined: 21 Oct 05
Posts: 11
Credit: 814,463
RAC: 0
Message 71889 - Posted: 26 Dec 2011, 1:35:40 UTC

It looks like they are failing validation, if my guess is correct. I didn't see any other issues and your output looks comparable to my successfully completed workunits. It may be a validation error or you can try resetting your project on this computer.
JKitterman

Joined: 21 Oct 05
Posts: 11
Credit: 814,463
RAC: 0
Message 71890 - Posted: 26 Dec 2011, 1:55:56 UTC

I took a second look at your results. I find it odd that your results are spending more CPU time, about 13,000 seconds, compared to mine at a little over 10,000 seconds. You claim a lot more credit, about 125, while I claim about 75. I would expect your computer to be faster than mine.
Is your BOINC setup completely stock, or are you using custom parameters or something?
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 71892 - Posted: 26 Dec 2011, 14:52:23 UTC

The line in the output here:
# cpu_run_time_pref: 14400
just indicates that Rosetta preferences have been set to indicate that the preferred runtime is 14400 seconds (4 hrs. rather than the default of 3). So this is why the tasks are running a bit longer than 3 hours, and also why they tend to claim more credit for 4 hours of work rather than 3.
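As a rough illustration of that arithmetic, here is a minimal Python sketch that pulls the relevant numbers out of the stderr text posted earlier in this thread. The function name `summarize_stderr` and the regular expressions are assumptions made for this example; they are not part of BOINC or Rosetta.

```python
import re

# Lines copied from the stderr output posted earlier in this thread.
STDERR_EXCERPT = """\
# cpu_run_time_pref: 14400
DONE :: 1 starting structures 13863.3 cpu seconds
This process generated 16 decoys from 16 attempts
"""


def summarize_stderr(text):
    """Extract the run-time preference, CPU seconds and decoy count from log text.

    Hypothetical helper for illustration only; it assumes all three
    lines are present in the text.
    """
    pref = int(re.search(r"cpu_run_time_pref:\s*(\d+)", text).group(1))
    cpu_seconds = float(re.search(r"([\d.]+) cpu seconds", text).group(1))
    decoys = int(re.search(r"generated (\d+) decoys", text).group(1))
    return {
        "pref_hours": pref / 3600,            # preferred runtime in hours
        "cpu_hours": cpu_seconds / 3600,      # CPU time actually used, in hours
        "seconds_per_decoy": cpu_seconds / decoys,
    }


summary = summarize_stderr(STDERR_EXCERPT)
print(summary)
```

For the task quoted in this thread, the preference works out to 4 hours, and the task used a little under that in CPU time.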

Several of the tasks have the following shown after the number of decoys summary:
BOINC :: WS_max 3.97197e+008

I'm not positive what this tells us. I don't see any other errors in the messages.
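If WS_max is a peak working set size reported in bytes (this is only an assumption; nothing in this thread confirms it), the values can be converted to a more readable unit. A minimal Python sketch, with `ws_max_to_mib` as a hypothetical helper name:

```python
def ws_max_to_mib(line):
    """Convert a 'BOINC :: WS_max <value>' log line to MiB.

    Assumes (unconfirmed) that the value is a size in bytes.
    """
    value = float(line.split("WS_max")[1])  # float() accepts the e+008 notation
    return value / (1024 * 1024)


for line in ("BOINC :: WS_max 3.97197e+008", "BOINC :: WS_max 4.09907e+008"):
    print(line, "->", round(ws_max_to_mib(line), 1), "MiB")
```

Under that assumption, both values come out at a few hundred MiB.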

Some additional observations: the host has never been granted credit. The host has 8 CPUs and 8 GB of memory, and is running Win7 with BOINC 6.12.34.

There have been several reports of problems with displaying the graphic. So I would suggest not using BOINC as your screensaver, and not displaying the graphic, just to see if this makes any difference.
Rosetta Moderator: Mod.Sense
Rayburner

Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 71893 - Posted: 26 Dec 2011, 16:01:08 UTC

I have detached and reattached to the project, with the run time set to 1 hour; the screensaver is not active.

The result is still the same. The outcome is client error.

My second host in the meantime is generating credits.

Regards,
Rayburner
Rayburner

Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 71899 - Posted: 27 Dec 2011, 15:34:40 UTC

It looks like the missing credits were granted by the project (automatically??), but all newly returned results from this host are still marked with the outcome client error. As credits were granted, I assume my returned results are valuable to the project.

So how are we going to proceed? I think I have checked everything on my side. Is it a problem on the server side? Does it make sense to delete this host on the server and let it create a new id by contacting the server again?

Regards,
Rayburner
JKitterman

Joined: 21 Oct 05
Posts: 11
Credit: 814,463
RAC: 0
Message 71900 - Posted: 27 Dec 2011, 20:45:49 UTC - in response to Message 71899.  

It looks like you are now running the beta BOINC client:
<core_client_version>7.0.3</core_client_version>

Looking at some previous results on this bad host, none of them show an application version at the end of the Task Details online.



Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 71901 - Posted: 27 Dec 2011, 21:10:17 UTC

Detaching has reloaded everything on your end, so I wouldn't do anything more there. Sorta sounds like the problem is on the server side in the validator.

Returned results are ALWAYS valuable to the project, and that is part of why they wrote a script to grant credit in cases where BOINC defaults would not. If nothing else, the result is revealing a problem in the validator. But from what I could see in the output, it looked like your machine was crunching successful models as well, so those will definitely be useful.
Rosetta Moderator: Mod.Sense
Rayburner

Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 71902 - Posted: 27 Dec 2011, 21:45:12 UTC - in response to Message 71900.  

It looks like you are now running the beta BOINC client:
<core_client_version>7.0.3</core_client_version>

Looking at some previous results on this bad host, none of them show an application version at the end of the Task Details online.





Right. I installed the beta BOINC client hoping that it would help...

However, the problem was the same with 6.12.34.

As Mod.Sense has written, it looks like the validator has a problem analyzing my returned results. I guess that is why no application version is displayed.

Regards,
Rayburner
Rayburner

Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 71913 - Posted: 28 Dec 2011, 19:17:43 UTC

Now this host has reached a daily quota of 8 because it always returns "bad" results.

It looks like I have to take this host out of Rosetta. It doesn't make sense to keep it attached as long as this problem persists.

Regards,
Rayburner




©2024 University of Washington
https://www.bakerlab.org