Client Errors

A.M.
Joined: 13 Jun 06
Posts: 12
Credit: 954,586
RAC: 0
Message 72319 - Posted: 16 Feb 2012, 6:23:48 UTC

https://boinc.bakerlab.org/rosetta/results.php?hostid=1518514

Brand new computer, fresh install of Win7, etc. No problems with Einstein or Collatz, but every single one of the Rosetta WUs this computer has been allowed to finish has been flagged as invalid. Why?

Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72346 - Posted: 17 Feb 2012, 17:13:55 UTC

It looks like everything finished fine on your end (from the clean shutdowns in the stderr reports and the "Exit status: 0 (0x0)" line), but something went wrong in getting it back to the server. It's not necessarily a validation error, as you're getting "Outcome: Client error" rather than "Outcome: Validation error" or something like that. The curious thing is that the application version isn't being reported - this is something others have seen, so it doesn't look like you're the only one experiencing this particular issue.

We're looking into things to see if it's some intermittent error on our end, but in the meantime the recommendation would be to try detaching and reattaching the Rosetta@home project from your BOINC client, to see if that might fix things.

A.M.
Joined: 13 Jun 06
Posts: 12
Credit: 954,586
RAC: 0
Message 72359 - Posted: 18 Feb 2012, 20:29:19 UTC

I have done as you suggested, with the same outcome as before.
https://boinc.bakerlab.org/rosetta/result.php?resultid=485001771

Digital Savior
Joined: 16 Jul 06
Posts: 1
Credit: 191,042
RAC: 0
Message 72397 - Posted: 26 Feb 2012, 2:52:49 UTC

Yea, I had about 140 WUs that ended in client error. =/ Wish I noticed sooner...

Sky King
Joined: 28 Feb 12
Posts: 11
Credit: 15,912
RAC: 0
Message 72422 - Posted: 1 Mar 2012, 15:36:44 UTC

Is there any update to this? I too just recently switched my stock Win7 i7 over to Rosetta after years of being an F@H contributor, and without exception, every unit results in "client error."

If I'm just generating heat and not contributing to science, I need to switch to another project, so I am hoping to hear an update on this.

Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72425 - Posted: 1 Mar 2012, 19:11:27 UTC - in response to Message 72422.  
Last modified: 11 Mar 2012, 20:16:48 UTC

Unfortunately, I don't have any progress to report. We're still not sure why these runs look like they're successfully completing, but then resulting in errors while reporting back to the server.

If you're up to it, something might reveal itself if people encountering these errors turned on the extra debugging information in the cc_config.xml file.
The relevant portion to focus on is the post-result reporting. Log flags like file_xfer_debug, http_debug, http_xfer_debug, network_status_debug, proxy_debug are ones to try. If anything looks off/wonky in clients who have these issues, then that's a lead we can follow up on in debugging. (That's not to say that the error will necessarily show itself on the client end - but it may be worth a shot.)
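
For reference, a minimal sketch of what that could look like, assuming the usual cc_config.xml layout (a cc_config root with a log_flags section, each flag set to 1). The script only builds the file text for the flags listed above; build_cc_config is a made-up helper name, and where you save the output depends on where your BOINC data directory lives.

```python
# Log flags named above; a value of 1 turns a flag on.
FLAGS = [
    "file_xfer_debug",
    "http_debug",
    "http_xfer_debug",
    "network_status_debug",
    "proxy_debug",
]


def build_cc_config(flags):
    """Return cc_config.xml text with every given flag enabled in <log_flags>."""
    lines = ["<cc_config>", "  <log_flags>"]
    lines += [f"    <{name}>1</{name}>" for name in flags]
    lines += ["  </log_flags>", "</cc_config>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file contents; redirect to cc_config.xml to save them.
    print(build_cc_config(FLAGS), end="")
```

Save the output as cc_config.xml in the BOINC data directory and restart the client so the flags take effect. Setting the flags back to 0 afterwards keeps the log from filling up.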

Edit: Others on the forum have indicated that the issue may be GPU related. (Rosetta@home doesn't use GPUs, but that's not to say a GPU-related setting might not be involved.) You can also try (temporarily) turning off GPU crunching, to see if that fixes things - even if you turn GPUs back on for non-Rosetta projects, posting that the issue was fixed on your machine with a GPU setting change will help us in tracking down the issue.

Markus Elfring
Joined: 10 Jun 06
Posts: 17
Credit: 3,610,273
RAC: 0
Message 72454 - Posted: 5 Mar 2012, 14:21:01 UTC - in response to Message 72422.  

[...], and without exception, every unit results in "client error."

Are you interested in any improvements for a topic like "Statistics for computation errors"?

AlphaLaser
Joined: 19 Aug 06
Posts: 52
Credit: 3,327,939
RAC: 0
Message 72466 - Posted: 7 Mar 2012, 21:38:12 UTC
Last modified: 7 Mar 2012, 21:39:11 UTC

I have been experiencing the same problem with this host. I was able to complete work in the past but now nothing is validating. The BOINC version is 6.10.58.

I will try out the log flags when I clear out work from other projects; maybe I can get to it by this weekend.

wbblakemore
Joined: 18 Dec 07
Posts: 33
Credit: 4,181
RAC: 0
Message 72488 - Posted: 11 Mar 2012, 2:24:20 UTC

I've reported this in other threads, so I apologize in advance if anyone is offended by my multiple posts ...

I'm having the same problem as others in this thread have reported. It doesn't look like a validation error, since most of my stderr messages are similar to this one:

BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 55 starting structures 10824.1 cpu seconds
This process generated 55 decoys from 55 attempts
======================================================
BOINC :: WS_max 3.04529e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly

finchna
Joined: 21 Oct 10
Posts: 2
Credit: 1,078,913
RAC: 0
Message 72524 - Posted: 15 Mar 2012, 21:31:11 UTC

i'm also experiencing this problem -- going back at least 100 tasks. 2010 PowerMac -- not new but not old and seti and milkyway are running fine.

Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72525 - Posted: 15 Mar 2012, 22:00:04 UTC - in response to Message 72524.  

If you're referring to this computer, it looks to be a different issue (not the same as above).

From the stderr reports for the task, you're getting "Process creation failed: errno=13". According to this post, errno=13 is a permissions error. Others have also experienced this with the update to 3.24. Best I can tell, this is an issue with BOINC 7.0, rather than with Rosetta@home itself. (BOINC 7 is currently development code, and not recommended for general use - at least not with Rosetta@home.)
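
As a quick cross-check of that reading, the Python standard library can translate the number into its symbolic name. This is only an illustrative lookup, not anything BOINC itself runs, and describe_errno is a made-up helper name.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, OS message) for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(13)
    # 13 maps to EACCES, the "permission denied" family of errors.
    print(name, "-", message)
```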

finchna
Joined: 21 Oct 10
Posts: 2
Credit: 1,078,913
RAC: 0
Message 72526 - Posted: 15 Mar 2012, 23:27:36 UTC - in response to Message 72525.  

If you're referring to this computer, it looks to be a different issue (not the same as above).

Best I can tell, this is an issue with Boinc 7.0, rather than with Rosetta@home itself. (Boinc 7 is currently development code, and not recommended for general use - at least not with Rosetta@home)


That is the computer and sorry to hear that 7 has a problem as the 6.12.35 client flat out fails on that machine and the BOINC folks suggested trying the 7 client which now makes some things work but, unfortunately, not Rosetta@home.

Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72529 - Posted: 16 Mar 2012, 16:52:59 UTC - in response to Message 72526.  

You can try manually changing the permissions on the projects/boinc.bakerlab.org_rosetta/minirosetta_3.24_i686-apple-darwin file (which looks to be the file with the permissions error in your case), to see if that helps. pvh in the other thread said that it worked for him.
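
A rough sketch of one way to do that, assuming the problem is a missing execute bit on the binary (the stderr only points to a permissions error, so that part is a guess). BOINC_DATA_DIR is a placeholder for wherever your BOINC data directory actually is, and make_executable is a hypothetical helper that does roughly what running chmod +x by hand would do.

```python
import os
import stat
import tempfile

# Placeholder: replace with the real BOINC data directory on your machine.
BOINC_DATA_DIR = "/path/to/BOINC Data"
# Relative path taken from the post above.
APP = "projects/boinc.bakerlab.org_rosetta/minirosetta_3.24_i686-apple-darwin"


def make_executable(path):
    """Add an execute bit for every class (user/group/other) that can read the file."""
    mode = os.stat(path).st_mode
    exec_bits = (mode & 0o444) >> 2  # shift each read bit into the execute position
    os.chmod(path, mode | exec_bits)
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    # Try it on a throwaway file first rather than on the real binary.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        demo = tmp.name
    os.chmod(demo, 0o644)
    print(make_executable(demo))
    os.remove(demo)
    # For the real file: make_executable(os.path.join(BOINC_DATA_DIR, APP))
```

Stop the BOINC client before touching the file, and note the old permissions first so the change can be undone if it makes no difference.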

The other suggestion is to talk with the BOINC 7 folks, to see if you can troubleshoot why it gave this permission error in the first place.

Good Luck

AlphaLaser
Joined: 19 Aug 06
Posts: 52
Credit: 3,327,939
RAC: 0
Message 72531 - Posted: 16 Mar 2012, 20:38:48 UTC
Last modified: 16 Mar 2012, 20:43:29 UTC

Hi. I enabled the suggested log flags. I'm not sure which part is helpful, since I get a "[network_status_debug] status: online" or "status: don't need connection" message every second, but here is where I start to upload the task id=491825324, which just gave "Client error".


16-Mar-2012 16:23:39 [rosetta@home] Computation for task T0575_boinc_rosetta_cm_abrelax_cmiles_SAVE_ALL_OUT_44670_137_0 finished
16-Mar-2012 16:23:39 [rosetta@home] Starting if3dimer_fold_and_dock_if3design12_SAVE_ALL_OUT_44597_8465_0
16-Mar-2012 16:23:51 [rosetta@home] Starting task if3dimer_fold_and_dock_if3design12_SAVE_ALL_OUT_44597_8465_0 using minirosetta version 324
16-Mar-2012 16:23:52 [---] [network_status_debug] woke up after 14.552832 seconds
16-Mar-2012 16:23:52 [---] [network_status_debug] status: don't need connection
16-Mar-2012 16:23:53 [---] [network_status_debug] status: don't need connection
16-Mar-2012 16:23:53 [rosetta@home] [fxd] starting upload, upload_offset -1
16-Mar-2012 16:23:53 [---] [http_debug] HTTP_OP::libcurl_exec(): ca-bundle 'C:\Program Files (x86)\BOINC\ca-bundle.crt'
16-Mar-2012 16:23:53 [---] [http_debug] HTTP_OP::libcurl_exec(): ca-bundle set
16-Mar-2012 16:23:53 [---] [proxy_debug] HTTP_OP::no_proxy_for_url(): http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler
16-Mar-2012 16:23:53 [---] [proxy_debug] returning false
16-Mar-2012 16:23:53 [rosetta@home] Started upload of T0575_boinc_rosetta_cm_abrelax_cmiles_SAVE_ALL_OUT_44670_137_0_0
16-Mar-2012 16:23:53 [rosetta@home] [file_xfer_debug] URL: http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler
16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Info: timeout on name lookup is not supported
16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Info: About to connect() to srv4.bakerlab.org port 80 (#0)
16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Info: Trying 128.95.160.145...
16-Mar-2012 16:23:54 [---] [network_status_debug] status: online
16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Info: Connected to srv4.bakerlab.org (128.95.160.145) port 80 (#0)
16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Sent header to server: POST /rosetta_cgi/file_upload_handler HTTP/1.1

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Sent header to server: User-Agent: BOINC client (windows_x86_64 6.10.58)

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Sent header to server: Host: srv4.bakerlab.org

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Sent header to server: Accept: */*

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Sent header to server: Accept-Encoding: deflate, gzip

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Sent header to server: Content-Type: application/x-www-form-urlencoded

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Sent header to server: Content-Length: 318

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Sent header to server:

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Received header from server: HTTP/1.1 200 OK

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Received header from server: Date: Fri, 16 Mar 2012 20:24:08 GMT

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Received header from server: Server: Apache/2.2.3 (Red Hat)

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Received header from server: Connection: close

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Received header from server: Transfer-Encoding: chunked

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Received header from server: Content-Type: text/plain; charset=UTF-8

16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Received header from server:

16-Mar-2012 16:23:54 [---] [http_xfer_debug] [ID#201] HTTP: wrote 93 bytes
16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Info: Closing connection #0
16-Mar-2012 16:23:55 [---] [network_status_debug] status: online
16-Mar-2012 16:23:55 [rosetta@home] [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
16-Mar-2012 16:23:55 [rosetta@home] [file_xfer_debug] parsing upload response: <data_server_reply>
<status>0</status>
<file_size>0</file_size>
</data_server_reply>
16-Mar-2012 16:23:55 [rosetta@home] [file_xfer_debug] parsing status: 0
16-Mar-2012 16:23:55 [rosetta@home] [fxd] starting upload, upload_offset 0
16-Mar-2012 16:23:55 [---] [http_debug] HTTP_OP::libcurl_exec(): ca-bundle set
16-Mar-2012 16:23:55 [---] [proxy_debug] HTTP_OP::no_proxy_for_url(): http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler
16-Mar-2012 16:23:55 [---] [proxy_debug] returning false
16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Info: timeout on name lookup is not supported
16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Info: About to connect() to srv4.bakerlab.org port 80 (#0)
16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Info: Trying 128.95.160.145...
16-Mar-2012 16:23:56 [---] [network_status_debug] status: online
16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Info: Connected to srv4.bakerlab.org (128.95.160.145) port 80 (#0)
16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Sent header to server: POST /rosetta_cgi/file_upload_handler HTTP/1.1

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Sent header to server: User-Agent: BOINC client (windows_x86_64 6.10.58)

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Sent header to server: Host: srv4.bakerlab.org

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Sent header to server: Accept: */*

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Sent header to server: Accept-Encoding: deflate, gzip

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Sent header to server: Content-Type: application/x-www-form-urlencoded

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Sent header to server: Content-Length: 8955

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Sent header to server: Expect: 100-continue

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Sent header to server:

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server: HTTP/1.1 100 Continue

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server: HTTP/1.1 200 OK

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server: Date: Fri, 16 Mar 2012 20:24:10 GMT

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server: Server: Apache/2.2.3 (Red Hat)

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server: Connection: close

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server: Transfer-Encoding: chunked

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server: Content-Type: text/plain; charset=UTF-8

16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server:

16-Mar-2012 16:23:56 [---] [http_xfer_debug] [ID#201] HTTP: wrote 64 bytes
16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Info: Closing connection #0
16-Mar-2012 16:23:57 [---] [network_status_debug] status: online
16-Mar-2012 16:23:57 [rosetta@home] [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
16-Mar-2012 16:23:57 [rosetta@home] [file_xfer_debug] parsing upload response: <data_server_reply>
<status>0</status>
</data_server_reply>
16-Mar-2012 16:23:57 [rosetta@home] [file_xfer_debug] parsing status: 0
16-Mar-2012 16:23:57 [rosetta@home] [file_xfer_debug] file transfer status 0
16-Mar-2012 16:23:57 [rosetta@home] Finished upload of T0575_boinc_rosetta_cm_abrelax_cmiles_SAVE_ALL_OUT_44670_137_0_0
16-Mar-2012 16:23:57 [rosetta@home] [file_xfer_debug] Throughput 6782 bytes/sec


Here is the second example for task id=491825190.


16-Mar-2012 16:25:20 [---] [network_status_debug] status: don't need connection
16-Mar-2012 16:25:21 [---] [network_status_debug] status: don't need connection
16-Mar-2012 16:25:22 [---] [network_status_debug] status: don't need connection
16-Mar-2012 16:25:25 [rosetta@home] Computation for task if3dimer_design9monomer_abinitio_SAVE_ALL_OUT_44611_2681_0 finished
16-Mar-2012 16:25:25 [rosetta@home] Starting T0600_boinc_rosetta_cm_abrelax_cmiles_SAVE_ALL_OUT_44692_155_0
16-Mar-2012 16:25:28 [rosetta@home] Starting task T0600_boinc_rosetta_cm_abrelax_cmiles_SAVE_ALL_OUT_44692_155_0 using minirosetta version 324
16-Mar-2012 16:25:28 [---] [network_status_debug] status: don't need connection
16-Mar-2012 16:25:29 [---] [network_status_debug] status: don't need connection
16-Mar-2012 16:25:29 [rosetta@home] [fxd] starting upload, upload_offset -1
16-Mar-2012 16:25:29 [---] [http_debug] HTTP_OP::libcurl_exec(): ca-bundle 'C:\Program Files (x86)\BOINC\ca-bundle.crt'
16-Mar-2012 16:25:29 [---] [http_debug] HTTP_OP::libcurl_exec(): ca-bundle set
16-Mar-2012 16:25:29 [---] [proxy_debug] HTTP_OP::no_proxy_for_url(): http://srv3.bakerlab.org/rosetta_cgi/file_upload_handler
16-Mar-2012 16:25:29 [---] [proxy_debug] returning false
16-Mar-2012 16:25:29 [rosetta@home] Started upload of if3dimer_design9monomer_abinitio_SAVE_ALL_OUT_44611_2681_0_0
16-Mar-2012 16:25:29 [rosetta@home] [file_xfer_debug] URL: http://srv3.bakerlab.org/rosetta_cgi/file_upload_handler
16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Info: timeout on name lookup is not supported
16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Info: About to connect() to srv3.bakerlab.org port 80 (#0)
16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Info: Trying 128.95.160.144...
16-Mar-2012 16:25:30 [---] [network_status_debug] status: online
16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Info: Connected to srv3.bakerlab.org (128.95.160.144) port 80 (#0)
16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Sent header to server: POST /rosetta_cgi/file_upload_handler HTTP/1.1

16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Sent header to server: User-Agent: BOINC client (windows_x86_64 6.10.58)

16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Sent header to server: Host: srv3.bakerlab.org

16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Sent header to server: Accept: */*

16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Sent header to server: Accept-Encoding: deflate, gzip

16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Sent header to server: Content-Type: application/x-www-form-urlencoded

16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Sent header to server: Content-Length: 314

16-Mar-2012 16:25:30 [---] [http_debug] [ID#202] Sent header to server:

16-Mar-2012 16:25:31 [---] [http_debug] [ID#202] Received header from server: HTTP/1.1 200 OK

16-Mar-2012 16:25:31 [---] [http_debug] [ID#202] Received header from server: Date: Fri, 16 Mar 2012 20:25:44 GMT

16-Mar-2012 16:25:31 [---] [http_debug] [ID#202] Received header from server: Server: Apache/2.2.3 (Red Hat)

16-Mar-2012 16:25:31 [---] [http_debug] [ID#202] Received header from server: Connection: close

16-Mar-2012 16:25:31 [---] [http_debug] [ID#202] Received header from server: Transfer-Encoding: chunked

16-Mar-2012 16:25:31 [---] [http_debug] [ID#202] Received header from server: Content-Type: text/plain; charset=UTF-8

16-Mar-2012 16:25:31 [---] [http_debug] [ID#202] Received header from server:

16-Mar-2012 16:25:31 [---] [http_xfer_debug] [ID#202] HTTP: wrote 93 bytes
16-Mar-2012 16:25:31 [---] [http_debug] [ID#202] Info: Expire cleared
16-Mar-2012 16:25:31 [---] [http_debug] [ID#202] Info: Closing connection #0
16-Mar-2012 16:25:31 [---] [network_status_debug] status: online
16-Mar-2012 16:25:31 [rosetta@home] [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
16-Mar-2012 16:25:31 [rosetta@home] [file_xfer_debug] parsing upload response: <data_server_reply>
<status>0</status>
<file_size>0</file_size>
</data_server_reply>
16-Mar-2012 16:25:31 [rosetta@home] [file_xfer_debug] parsing status: 0
16-Mar-2012 16:25:31 [rosetta@home] [fxd] starting upload, upload_offset 0
16-Mar-2012 16:25:31 [---] [http_debug] HTTP_OP::libcurl_exec(): ca-bundle set
16-Mar-2012 16:25:31 [---] [proxy_debug] HTTP_OP::no_proxy_for_url(): http://srv3.bakerlab.org/rosetta_cgi/file_upload_handler
16-Mar-2012 16:25:31 [---] [proxy_debug] returning false
16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Info: timeout on name lookup is not supported
16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Info: About to connect() to srv3.bakerlab.org port 80 (#0)
16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Info: Trying 128.95.160.144...
16-Mar-2012 16:25:32 [---] [network_status_debug] status: online
16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Info: Connected to srv3.bakerlab.org (128.95.160.144) port 80 (#0)
16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Sent header to server: POST /rosetta_cgi/file_upload_handler HTTP/1.1

16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Sent header to server: User-Agent: BOINC client (windows_x86_64 6.10.58)

16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Sent header to server: Host: srv3.bakerlab.org

16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Sent header to server: Accept: */*

16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Sent header to server: Accept-Encoding: deflate, gzip

16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Sent header to server: Content-Type: application/x-www-form-urlencoded

16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Sent header to server: Content-Length: 14156

16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Sent header to server: Expect: 100-continue

16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Sent header to server:

16-Mar-2012 16:25:32 [---] [http_debug] [ID#202] Received header from server: HTTP/1.1 100 Continue

16-Mar-2012 16:25:33 [---] [http_debug] [ID#202] Received header from server: HTTP/1.1 200 OK

16-Mar-2012 16:25:33 [---] [http_debug] [ID#202] Received header from server: Date: Fri, 16 Mar 2012 20:25:46 GMT

16-Mar-2012 16:25:33 [---] [http_debug] [ID#202] Received header from server: Server: Apache/2.2.3 (Red Hat)

16-Mar-2012 16:25:33 [---] [http_debug] [ID#202] Received header from server: Connection: close

16-Mar-2012 16:25:33 [---] [http_debug] [ID#202] Received header from server: Transfer-Encoding: chunked

16-Mar-2012 16:25:33 [---] [http_debug] [ID#202] Received header from server: Content-Type: text/plain; charset=UTF-8

16-Mar-2012 16:25:33 [---] [http_debug] [ID#202] Received header from server:

16-Mar-2012 16:25:33 [---] [http_xfer_debug] [ID#202] HTTP: wrote 64 bytes
16-Mar-2012 16:25:33 [---] [http_debug] [ID#202] Info: Expire cleared
16-Mar-2012 16:25:33 [---] [http_debug] [ID#202] Info: Closing connection #0
16-Mar-2012 16:25:33 [---] [network_status_debug] status: online
16-Mar-2012 16:25:33 [rosetta@home] [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
16-Mar-2012 16:25:33 [rosetta@home] [file_xfer_debug] parsing upload response: <data_server_reply>
<status>0</status>
</data_server_reply>
16-Mar-2012 16:25:33 [rosetta@home] [file_xfer_debug] parsing status: 0
16-Mar-2012 16:25:33 [rosetta@home] [file_xfer_debug] file transfer status 0
16-Mar-2012 16:25:33 [rosetta@home] Finished upload of if3dimer_design9monomer_abinitio_SAVE_ALL_OUT_44611_2681_0_0
16-Mar-2012 16:25:33 [rosetta@home] [file_xfer_debug] Throughput 11894 bytes/sec


And then I reported the tasks.


16-Mar-2012 16:33:27 [rosetta@home] Sending scheduler request: Requested by user.
16-Mar-2012 16:33:27 [rosetta@home] Reporting 2 completed tasks, not requesting new tasks
16-Mar-2012 16:33:27 [---] [http_debug] HTTP_OP::init_post(): http://srv4.bakerlab.org/rosetta_cgi/cgi
16-Mar-2012 16:33:27 [---] [http_debug] HTTP_OP::libcurl_exec(): ca-bundle set
16-Mar-2012 16:33:27 [---] [proxy_debug] HTTP_OP::no_proxy_for_url(): http://srv4.bakerlab.org/rosetta_cgi/cgi
16-Mar-2012 16:33:27 [---] [proxy_debug] returning false
16-Mar-2012 16:33:27 [---] [network_status_debug] woke up after 17.175983 seconds
16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Info: timeout on name lookup is not supported
16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Info: About to connect() to srv4.bakerlab.org port 80 (#0)
16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Info: Trying 128.95.160.145...
16-Mar-2012 16:33:27 [---] [network_status_debug] status: online
16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Info: Connected to srv4.bakerlab.org (128.95.160.145) port 80 (#0)
16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Sent header to server: POST /rosetta_cgi/cgi HTTP/1.1

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 6.10.58)

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Sent header to server: Host: srv4.bakerlab.org

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Sent header to server: Accept: */*

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Sent header to server: Content-Length: 17602

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Sent header to server: Expect: 100-continue

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Sent header to server:

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Received header from server: HTTP/1.1 100 Continue

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Received header from server: HTTP/1.1 200 OK

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Received header from server: Date: Fri, 16 Mar 2012 20:33:41 GMT

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Received header from server: Server: Apache/2.2.3 (Red Hat)

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Received header from server: Connection: close

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Received header from server: Transfer-Encoding: chunked

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Received header from server: Content-Type: text/xml

16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Received header from server:

16-Mar-2012 16:33:27 [---] [http_xfer_debug] [ID#1] HTTP: wrote 1216 bytes
16-Mar-2012 16:33:27 [---] [http_xfer_debug] [ID#1] HTTP: wrote 1813 bytes
16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Info: Expire cleared
16-Mar-2012 16:33:27 [---] [http_debug] [ID#1] Info: Closing connection #0
16-Mar-2012 16:33:28 [rosetta@home] Scheduler request completed
16-Mar-2012 16:33:28 [---] [network_status_debug] status: online
16-Mar-2012 16:33:29 [---] [network_status_debug] status: online
16-Mar-2012 16:33:30 [---] [network_status_debug] status: online


I am saving logs for a few more tasks that are still in progress.
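
In case it helps anyone else sift through the same kind of output, here is a small sketch that pulls the upload-related status values out of a log like the ones above. The regular expressions are based only on the line formats visible in this post, it assumes uploads are logged one at a time, and summarize_uploads is a hypothetical name.

```python
import re

STARTED = re.compile(r"Started upload of (\S+)")
FINISHED = re.compile(r"Finished upload of (\S+)")
XFER_STATUS = re.compile(r"\[file_xfer_debug\] file transfer status (-?\d+)")
HTTP_STATUS = re.compile(r"Received header from server: HTTP/1\.1 (\d{3})")

# A few lines copied from the first log excerpt above.
SAMPLE = """\
16-Mar-2012 16:23:53 [rosetta@home] Started upload of T0575_boinc_rosetta_cm_abrelax_cmiles_SAVE_ALL_OUT_44670_137_0_0
16-Mar-2012 16:23:54 [---] [http_debug] [ID#201] Received header from server: HTTP/1.1 200 OK
16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server: HTTP/1.1 100 Continue
16-Mar-2012 16:23:56 [---] [http_debug] [ID#201] Received header from server: HTTP/1.1 200 OK
16-Mar-2012 16:23:57 [rosetta@home] [file_xfer_debug] file transfer status 0
16-Mar-2012 16:23:57 [rosetta@home] Finished upload of T0575_boinc_rosetta_cm_abrelax_cmiles_SAVE_ALL_OUT_44670_137_0_0
"""


def summarize_uploads(lines):
    """Map each uploaded file to its HTTP codes, transfer status, and finished flag."""
    uploads = {}
    current = None  # file whose upload is in progress (one at a time assumed)
    for line in lines:
        match = STARTED.search(line)
        if match:
            current = match.group(1)
            uploads[current] = {"finished": False, "xfer_status": None, "http": []}
            continue
        if current is None:
            continue
        match = HTTP_STATUS.search(line)
        if match:
            uploads[current]["http"].append(int(match.group(1)))
        match = XFER_STATUS.search(line)
        if match:
            uploads[current]["xfer_status"] = int(match.group(1))
        match = FINISHED.search(line)
        if match and match.group(1) in uploads:
            uploads[match.group(1)]["finished"] = True
    return uploads


if __name__ == "__main__":
    for name, info in summarize_uploads(SAMPLE.splitlines()).items():
        print(name, info)
```

Anything other than a transfer status of 0, an HTTP code outside 100/200, or an upload that never shows as finished would be worth posting here.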

In Memory of Kimsey M Fowler Sr
Joined: 10 Mar 12
Posts: 26
Credit: 39,033,222
RAC: 0
Message 72543 - Posted: 18 Mar 2012, 17:05:28 UTC
Last modified: 18 Mar 2012, 17:15:35 UTC

I've been experiencing this problem for a week now on a new machine. I'm running two nearly identical machines: the new one is experiencing this problem and the older one is not. I want to document the similarities and differences between these machines so that others might find something in common that may yield a good clue:

Old Machine (no problem):
i7-3930K processor, family 6, model 45, stepping 6
Win7 Home Premium x64, SP1
Solid state hard drive
Motherboard ASUS X-79 Sabertooth TUF with CPU overclocked to 4.7 GHz
CPU running Rosetta@Home full time
2x EVGA 560Ti GPU's running Folding@Home full time

New Machine (problem):
i7-3930K processor, family 6, model 45, stepping 7 <==delta
Win7 Home Premium x64, SP1
Solid state hard drive
Motherboard ASUS X-79 Sabertooth TUF with CPU overclocked to 4.7 GHz
CPU running Rosetta@Home for the failing 8 WU's/day, remaining time to F@H.
2x EVGA 580 GPU's running Folding@Home full time <==delta

Things I have tried incrementally:
1) setting the GPU Activity button to "Suspend GPU"
2) uninstalling the F@H SMP core to get same s/w configuration as old machine
3) uninstalling and reinstalling BOINC and Rosetta

Planned test for Monday night:
1) completely shut down F@H during the download, execution, and upload of next 8 R@H WU's.

Please respond if you see any commonality.

Note: All GPU's are using the same driver version 285.62; BOINC is version 6.12.34(x64).

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 72544 - Posted: 18 Mar 2012, 21:01:11 UTC - in response to Message 72543.  

Hi.

First thing to do is put it back to stock speed and see if you still get errors, if you haven't done that already.

Not every rig will run the same science application stably.

my 2c worth.

Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72548 - Posted: 18 Mar 2012, 23:14:27 UTC

AlphaLaser - Thanks for posting the log. I can't see anything obviously wrong from the log, but hopefully it will help us rule out what it isn't.

In Memory of Kimsey M Fowler Sr - Thanks for posting the system delta, and a double thanks for your efforts in troubleshooting.

I don't think it's been mentioned yet, but from what I've seen so far, the issue looks to be confined to Windows 7 machines (most commonly SP1 x64), although I can't say if it's due to the operating system, or rather the type of machines which typically run Win7.

In Memory of Kimsey M Fowler Sr
Joined: 10 Mar 12
Posts: 26
Credit: 39,033,222
RAC: 0
Message 72559 - Posted: 20 Mar 2012, 19:36:16 UTC

This is a report on my troubleshooting activities. Last night on the problem machine I ran 8 Rosetta WU's with the following concurrent changes to the system:
1) Folding@Home was stopped on CPU and GPU's,
2) All applications except BOINC/Rosetta were terminated,
3) All nonessential processes were terminated with Task Manager,
4) CPU was not overclocked,
5) screen saver and desktop image were not used, only blue background used.

The results were the same. All WU's failed due to client error, invalid, and application version not reported (showing only three dashes).

A.M. reported via e-mail that their machine uses an EVGA 550Ti GPU, previously with driver 285.62 and now with 295.73. It would be useful to hear from William Blakemore, Alpha Laser, Sky King, and Digital Savior if they are running EVGA's, how many, what model, and if they have been running Folding@Home. Here is a summary of the machines from this thread that appear to be dealing with the same problem:

a) my problem machine: i7-3930K, 3.2GHz, Family 6, model 45, stepping 7
b) William Blakemore: i7-2700K, 3.5GHz, Family 6, model 42, stepping 7
c) Alpha Laser: i7-Q740 CPU, 1.7GHz, Family 6, model 30, stepping 5
d) Sky King: i7-920 CPU, 2.7GHz, Family 6, model 26, stepping 4
e) Digital Savior: i7-2600K, 3.4GHz, Family 6, model 42, stepping 7
f) A.M.: i7-2600K, 3.4GHz, Family 6, model 42, stepping 7

I think all of the stepping 7 processors are the latest CPU version. Suspicious, but I found other i7-3930K's stepping 7 without our issue (computer ID's 1520085 & 1524601).
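
For what it's worth, a quick tally of the list above, as a sketch only: the tuples just restate items a) through f), and group_by_stepping is a made-up helper name.

```python
from collections import defaultdict

# (owner, CPU, model, stepping) as listed in items a) through f) above.
MACHINES = [
    ("my problem machine", "i7-3930K", 45, 7),
    ("William Blakemore", "i7-2700K", 42, 7),
    ("Alpha Laser", "i7-Q740", 30, 5),
    ("Sky King", "i7-920", 26, 4),
    ("Digital Savior", "i7-2600K", 42, 7),
    ("A.M.", "i7-2600K", 42, 7),
]


def group_by_stepping(machines):
    """Group owner names by the CPU stepping of their affected machine."""
    groups = defaultdict(list)
    for owner, _cpu, _model, stepping in machines:
        groups[stepping].append(owner)
    return dict(groups)


if __name__ == "__main__":
    for stepping, owners in sorted(group_by_stepping(MACHINES).items()):
        print(f"stepping {stepping}: {len(owners)} machine(s): {', '.join(owners)}")
```

Four of the six affected machines are stepping 7, but two are not, so stepping alone does not separate the failing machines from the working ones.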

I suspect there are plenty more machines out there with this problem, but either the users haven't noticed it yet, or they haven't bothered to report it.

For tonight's continued troubleshooting I'm considering uninstalling & physically removing the EVGA GPU's and installing a low end GPU of another type. Other suggestions/approaches would be appreciated if anyone has any ideas.

AlphaLaser
Joined: 19 Aug 06
Posts: 52
Credit: 3,327,939
RAC: 0
Message 72560 - Posted: 20 Mar 2012, 20:32:39 UTC
Last modified: 20 Mar 2012, 20:34:26 UTC

Hi, my system is a Dell XPS laptop. The specs are:

CPU: Core i7 740QM 1.73GHz
GPU: Nvidia Geforce 435M (2GB DDR3 Video RAM)
Driver: 285.62
RAM: 6GB DDR3-1333
OS: Windows 7 Home Premium SP1 64-bit
HDD: 640 GB 7200 RPM

I don't run folding but I do run GPUGrid on my GPU.

Rayburner
Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 72562 - Posted: 20 Mar 2012, 21:39:28 UTC
Last modified: 20 Mar 2012, 21:45:37 UTC

Hi!

You can add my system to the list:

CPU: i7-2600K CPU @ 3.40GHz Family 6 Model 42 Stepping 7
RAM: 8GB PC3-12800
GPU: NVIDIA GeForce GTX 570 (2560MB)
Graphics driver: 296.10
OS: MS Windows 7 Professional x64 Edition, SP 1

I run GPUGrid and PrimeGrid on the GPU.

EDIT: Are all your hosts running NVIDIA GPUs?
I have another i7 960 system with an AMD RADEON 6970 which is crunching Rosetta just fine.

Regards,
Rayburner

©2024 University of Washington
https://www.bakerlab.org