Client errors

Message boards : Number crunching : Client errors

Menno

Joined: 13 Jan 13
Posts: 2
Credit: 4,610
RAC: 0
Message 74930 - Posted: 18 Jan 2013, 23:40:12 UTC

Hello,

All my work keeps getting reported as a client error.
I'm new, so I searched the board for solutions and found similar cases. I made sure that work stays in memory when suspended and checked my NVIDIA driver (306.97), but my latest work still came up as a client error.
The results can be found here: https://boinc.bakerlab.org/rosetta/results.php?userid=467186
I'm running the latest BOINC client. My laptop has an i7 3010QM with 16GB of memory and a 2GB DDR5 GT650M NVIDIA GPU. I'm running Windows 7 Home Premium x64 SP1.
Is there anything I can do now or do I have to suspend new work until the issue is resolved?

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 74933 - Posted: 19 Jan 2013, 12:32:11 UTC - in response to Message 74930.  
Last modified: 19 Jan 2013, 12:33:36 UTC

You have already done everything others have done to get theirs to work. It might just be time to find another project; there are A LOT of them!!

Mad_Max
Joined: 31 Dec 09
Posts: 207
Credit: 23,319,717
RAC: 11,503
Message 74934 - Posted: 19 Jan 2013, 13:11:17 UTC

Hmm. The errors look like the same "nvidia GPU bug" discussed in other topics.
But R@H used to work OK with the 306.97 drivers.

So the driver seems to be only one part of the NV GPU problem...
Perhaps the problem is not even in the driver itself. For the people who went from 310.xx back to an older version and saw the error stop, it may be that reinstalling the drivers simply reset some graphics settings, or changed the resources used by the card/GPU driver.

So my suggestion: try reinstalling the drivers anyway (still 306.xx). If that does not help, then it is time to choose another DC project, because apparently this error will not be fixed in the near future...

Menno
Joined: 13 Jan 13
Posts: 2
Credit: 4,610
RAC: 0
Message 74944 - Posted: 21 Jan 2013, 20:57:14 UTC

Thanks for the help. It did not seem to work however. It is unfortunate because I really liked the project.

Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 74951 - Posted: 22 Jan 2013, 14:49:14 UTC

It's a frustrating problem. I have a feeling that the project is losing quite a few participants because of this. I tried to break my installation by installing the latest nVidia drivers (I normally do anyway, I'm somewhat of a gamer) but no matter what I do I can't recreate the problem that others are having. Rosetta is the only project I run.

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 74956 - Posted: 23 Jan 2013, 12:42:18 UTC - in response to Message 74951.  

It's a frustrating problem. I have a feeling that the project is losing quite a few participants because of this. I tried to break my installation by installing the latest nVidia drivers (I normally do anyway, I'm somewhat of a gamer) but no matter what I do I can't recreate the problem that others are having. Rosetta is the only project I run.


I wonder if it is because you are using Linux?

Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 74957 - Posted: 23 Jan 2013, 13:32:52 UTC - in response to Message 74956.  

It's a frustrating problem. I have a feeling that the project is losing quite a few participants because of this. I tried to break my installation by installing the latest nVidia drivers (I normally do anyway, I'm somewhat of a gamer) but no matter what I do I can't recreate the problem that others are having. Rosetta is the only project I run.


I wonder if it is because you are using Linux?


I have a couple of Windows machines as well. The i7 box is dual boot, though lately I generally run BOINC on Linux. I never had any problems when running 6.x or 7.x on Windows 7 x86_64. I can't find a pattern in this at all.

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 74958 - Posted: 23 Jan 2013, 18:18:37 UTC

Can any of you point me to specific results/computers?

trigggl
Joined: 20 Apr 09
Posts: 4
Credit: 102,177
RAC: 0
Message 74961 - Posted: 24 Jan 2013, 12:11:35 UTC

I had one Linux pc that was working. However, last Saturday I ran updates and now I'm getting nothing but errors even though the tasks appear to be completing cleanly.

I'm running:
boinc 7.0.29
nvidia driver 304.64
linux kernel 3.6.11-gentoo

Pre-update task:
https://boinc.bakerlab.org/rosetta/result.php?resultid=556137155

Post-update task:
https://boinc.bakerlab.org/rosetta/result.php?resultid=557936373

I notice the failing task doesn't list the app version.

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 74967 - Posted: 24 Jan 2013, 18:17:04 UTC - in response to Message 74961.  

I had one Linux pc that was working. However, last Saturday I ran updates and now I'm getting nothing but errors even though the tasks appear to be completing cleanly.

I'm running:
boinc 7.0.29
nvidia driver 304.64
linux kernel 3.6.11-gentoo

Pre-update task:
https://boinc.bakerlab.org/rosetta/result.php?resultid=556137155

Post-update task:
https://boinc.bakerlab.org/rosetta/result.php?resultid=557936373

I notice the failing task doesn't list the app version.



Very interesting. What specific updates did you run?

JAMES DORISIO
Joined: 25 Dec 05
Posts: 15
Credit: 187,864,195
RAC: 88,583
Message 74970 - Posted: 24 Jan 2013, 21:41:21 UTC

I just set this computer to allow new tasks, run time preference 6 hours. I believe from reading other threads that it has this problem and it is the nvidia driver that causes the problem.

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1485068

This computer was running Ubuntu 10.04 amd64, nvidia driver 304.**, and BOINC 6.10.17, and was OK. I upgraded it to Ubuntu Linux 12.04 amd64, nvidia driver 310.14, and BOINC 7.0.27. Since then all work units complete OK but show a client error. Before the change all work units were OK and valid, but that was a month ago and they have been removed from the history.

It was and still is running GPU work from GPUGrid and WCG on a GTS450. The only change was the upgrade to Ubuntu 12.04, along with the new versions of BOINC and the nvidia drivers that came with it. It is successfully completing WCG Human Proteome Folding Phase 2 work units, which use Rosetta software.

I will let this machine run for a few days and add another one with this problem to see if this can help.

Thanks Jim


trigggl
Joined: 20 Apr 09
Posts: 4
Credit: 102,177
RAC: 0
Message 74971 - Posted: 24 Jan 2013, 21:59:17 UTC - in response to Message 74967.  

...
nvidia driver 304.64
linux kernel 3.6.11-gentoo
...


Very interesting. What specific updates did you run?

The nvidia driver and linux kernel were two of them. Still running the same boinc version.

JAMES DORISIO
Joined: 25 Dec 05
Posts: 15
Credit: 187,864,195
RAC: 88,583
Message 74976 - Posted: 25 Jan 2013, 12:52:19 UTC

This computer also seems to be affected by this problem: Intel i7-3770, Ubuntu Linux 12.04 amd64, nvidia driver 310.14, BOINC 7.0.27, all downloaded from the Ubuntu repository. I have the run time preference set at 6 hours.

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1579123

This machine was built in Nov of 2012. When built, using only the Intel built-in graphics, it completed all work units as valid. After I installed a GTX 650 Ti graphics card and nvidia driver 310.14, all work units show a client error. I have tried not using the GPU for BOINC and there was no difference: still all client errors. I did not uninstall the nvidia driver, I just set no new work for the GPU projects.

Hope this helps
Jim

ETQuestor
Joined: 13 Nov 12
Posts: 8
Credit: 957,206
RAC: 0
Message 74980 - Posted: 25 Jan 2013, 15:38:03 UTC - in response to Message 74958.  

Can any of you point me to specific results/computers?


I recently upgraded my NVIDIA driver (Linux x86_64 running Fedora 18) from 310.19 to 310.32 and suddenly started getting the client errors (computation finished normally, I get an invalid result). I am not using the GPU for Rosetta (used for GPUGrid). I will try going back to 310.19 and see what happens.

Sample hosts:

https://boinc.bakerlab.org/rosetta/results.php?hostid=1578095
https://boinc.bakerlab.org/rosetta/results.php?hostid=1578078

Sample WUs:

https://boinc.bakerlab.org/rosetta/result.php?resultid=557936169
https://boinc.bakerlab.org/rosetta/result.php?resultid=557755898


-Andrew

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 74982 - Posted: 25 Jan 2013, 20:53:35 UTC - in response to Message 74980.  
Last modified: 25 Jan 2013, 20:54:12 UTC

Can any of you point me to specific results/computers?


I recently upgraded my NVIDIA driver (Linux x86_64 running Fedora 18) from 310.19 to 310.32 and suddenly started getting the client errors (computation finished normally, I get an invalid result). I am not using the GPU for Rosetta (used for GPUGrid). I will try going back to 310.19 and see what happens.

Sample hosts:

https://boinc.bakerlab.org/rosetta/results.php?hostid=1578095
https://boinc.bakerlab.org/rosetta/results.php?hostid=1578078

Sample WUs:

https://boinc.bakerlab.org/rosetta/result.php?resultid=557936169
https://boinc.bakerlab.org/rosetta/result.php?resultid=557755898


-Andrew


Can anyone who is experiencing this issue do what David Anderson recommends (see below)?

It is odd that the BOINC client is returning a client error status while our Rosetta application's return status is okay and the stderr seems normal. These jobs are being marked as invalid because your clients are returning a client error state.

Thanks! With your help, I think we can track down this bug.

David K


Email response from David Anderson:

Not sure.
You can ask the users to set <task_debug/> in their cc_config.xml,
and send you their message log when the behavior occurs.
-- David


On 25-Jan-2013 12:30 PM, dekim wrote:
Hi David,

Hope all is well.

I had a quick question. What would cause the boinc client to return a client
state of 3 (error) from a job that seemed to have completed successfully
(exit status 0 with normal stderr and stdout)?

I'm trying to debug a weird error some users are reporting and some suggest
it's related to having Nvidia gpu's even though we don't have a gpu app.

thanks,

David Kim

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 74985 - Posted: 26 Jan 2013, 15:46:02 UTC

HERE is another error message:

BOINC:: Error reading and gzipping output datafile: default.out

NO credits for the unit!!!

https://boinc.bakerlab.org/rosetta/result.php?resultid=558345592

Doesn't appear to be MY pc's problem as gzip is a part of the Rosetta program itself!!

Rayburner
Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 74988 - Posted: 26 Jan 2013, 19:35:08 UTC

David

here is the information you requested

However, the exit code is 0, not 3.

Work unit is https://boinc.bakerlab.org/rosetta/result.php?resultid=558582819

The log from BOINC:

26.01.2013 20:11:05 | rosetta@home | [task] Process for P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 exited, exit code 0, task state 1
26.01.2013 20:11:05 | rosetta@home | [task] task_state=EXITED for P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 from handle_exited_app
26.01.2013 20:11:05 | rosetta@home | Computation for task P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 finished
26.01.2013 20:11:05 | rosetta@home | [task] result state=FILES_UPLOADING for P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 from CS::app_finished
26.01.2013 20:11:07 | rosetta@home | Started upload of P2_1_s4_f5_abinitio_design_y024_005_72943_231_0_0
26.01.2013 20:11:13 | rosetta@home | Finished upload of P2_1_s4_f5_abinitio_design_y024_005_72943_231_0_0
26.01.2013 20:11:13 | rosetta@home | [task] result state=FILES_UPLOADED for P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 from CS::update_results
26.01.2013 20:13:58 | rosetta@home | Sending scheduler request: To report completed tasks.
26.01.2013 20:13:58 | rosetta@home | Reporting 1 completed tasks
26.01.2013 20:13:58 | rosetta@home | Not requesting tasks: "no new tasks" requested via Manager
26.01.2013 20:14:01 | rosetta@home | Scheduler request completed


Regards
Rayburner
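
For anyone else collecting logs, here is a minimal Python sketch (not part of BOINC; the function name `parse_task_exits` is made up for illustration) that pulls the "[task] Process for ... exited" lines out of a message log. The pattern is modeled only on the lines posted in this thread, so a different client version may format them differently and need an adjusted pattern.

```python
import re

# Pattern modeled on the "[task] Process for ... exited" lines posted in this
# thread. The timestamp is kept as plain text because its format depends on
# the client's locale.
EXIT_RE = re.compile(
    r"^(?P<ts>[^|]+?) \| (?P<project>[^|]+?) \| "
    r"\[task\] Process for (?P<task>\S+) exited, "
    r"exit code (?P<code>-?\d+), task state (?P<state>\d+)\s*$"
)


def parse_task_exits(log_text):
    """Return one dict per 'Process for ... exited' line found in log_text."""
    exits = []
    for line in log_text.splitlines():
        match = EXIT_RE.match(line.strip())
        if match:
            exits.append({
                "time": match["ts"],
                "project": match["project"],
                "task": match["task"],
                "exit_code": int(match["code"]),
                "task_state": int(match["state"]),
            })
    return exits
```

Calling `parse_task_exits(open("log.txt").read())` on a saved copy of the message log (the file name is just an example) gives a list you can compare against the results page, for instance to list tasks whose application exit code was 0 but which the server still shows as a client error.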

Rayburner
Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 74990 - Posted: 26 Jan 2013, 21:40:10 UTC - in response to Message 74988.  

Hi

I forgot to add the specs of my box:

Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]

Microsoft Windows 7
Professional x64 Edition, Service Pack 1, (06.01.7601.00)

NVIDIA GeForce GTX 570 (2560MB) driver: 31090

Regards,
Rayburner

David

here is the information you requested

However, the exit code is 0, not 3.

Work unit is https://boinc.bakerlab.org/rosetta/result.php?resultid=558582819

The log from BOINC:

26.01.2013 20:11:05 | rosetta@home | [task] Process for P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 exited, exit code 0, task state 1
26.01.2013 20:11:05 | rosetta@home | [task] task_state=EXITED for P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 from handle_exited_app
26.01.2013 20:11:05 | rosetta@home | Computation for task P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 finished
26.01.2013 20:11:05 | rosetta@home | [task] result state=FILES_UPLOADING for P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 from CS::app_finished
26.01.2013 20:11:07 | rosetta@home | Started upload of P2_1_s4_f5_abinitio_design_y024_005_72943_231_0_0
26.01.2013 20:11:13 | rosetta@home | Finished upload of P2_1_s4_f5_abinitio_design_y024_005_72943_231_0_0
26.01.2013 20:11:13 | rosetta@home | [task] result state=FILES_UPLOADED for P2_1_s4_f5_abinitio_design_y024_005_72943_231_0 from CS::update_results
26.01.2013 20:13:58 | rosetta@home | Sending scheduler request: To report completed tasks.
26.01.2013 20:13:58 | rosetta@home | Reporting 1 completed tasks
26.01.2013 20:13:58 | rosetta@home | Not requesting tasks: "no new tasks" requested via Manager
26.01.2013 20:14:01 | rosetta@home | Scheduler request completed


Regards
Rayburner



robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,844,503
RAC: 1,768
Message 74992 - Posted: 27 Jan 2013, 0:37:59 UTC - in response to Message 74982.  
Last modified: 27 Jan 2013, 0:38:19 UTC

Email response from David Anderson:

Not sure.
You can ask the users to set <task_debug/> in their cc_config.xml,
and send you their message log when the behavior occurs.
-- David
David Kim


You might mention more on how to set <task_debug/>, so that more of us can do it.
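
A minimal sketch of what that could look like, assuming the usual cc_config.xml layout in which logging flags sit inside a <log_flags> element. The file goes in the BOINC data directory; if one already exists there, add the flag to its existing <log_flags> section rather than replacing the file.

```xml
<cc_config>
  <log_flags>
    <!-- 1 turns on the extra [task] lines in the message log -->
    <task_debug>1</task_debug>
  </log_flags>
</cc_config>
```

The client has to re-read its config file (or be restarted) before the flag takes effect; after that the [task] lines should show up in the message log.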



©2024 University of Washington
https://www.bakerlab.org