High error rate

Message boards : Number crunching : High error rate


USTL-FIL (Lille Fr)

Joined: 20 Jan 10
Posts: 5
Credit: 244,371,291
RAC: 858
Message 97245 - Posted: 5 Jun 2020, 16:46:46 UTC

Hello,

I noticed a high error rate today on my Linux hosts:
https://boinc.bakerlab.org/rosetta/results.php?userid=367342&offset=0&show_names=0&state=6&appid=

Some errors look like this:
"<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
finish file present too long</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol predictor_v11_boinc--fuse--covid_at3_design_boinc_v1.xml @flags_covid_site3 -in:file:silent Mini_Protein_binds_COVID-19_boinc_site3_8_SAVE_ALL_OUT_IGNORE_THE_REST_4wp2gq2j.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip Mini_Protein_binds_COVID-19_boinc_site3_8_SAVE_ALL_OUT_IGNORE_THE_REST_4wp2gq2j.zip @Mini_Protein_binds_COVID-19_boinc_site3_8_SAVE_ALL_OUT_IGNORE_THE_REST_4wp2gq2j.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1497416
Using database: database_357d5d93529_n_methyl/minirosetta_database
======================================================
DONE :: 35 starting structures 7153.19 cpu seconds
This process generated 35 decoys from 35 attempts
======================================================
BOINC :: WS_max 3.47357e+08
18:23:28 (27269): called boinc_finish(0)

</stderr_txt>
]]>"

Others look like this:

"<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>
000067bd783 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 ***
*** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 ***
.
.
.
"

Thanks
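One way to see how a host's failures break down is to copy each failed task's output and count its `<message>` lines. The output is the text shown on the task's page, in the same form as the logs quoted above. Below is a minimal Python sketch. It assumes the outputs are pasted in as plain strings. `parse_task_output` and `tally_errors` are hypothetical helper names and are not part of BOINC.

```python
import re
from collections import Counter

# Patterns for the two fields seen in the task outputs quoted above.
VERSION_RE = re.compile(r"<core_client_version>(.*?)</core_client_version>")
MESSAGE_RE = re.compile(r"<message>\s*(.*?)\s*</message>", re.DOTALL)


def parse_task_output(text):
    """Return (client_version, message) for one task's output; None if absent."""
    version = VERSION_RE.search(text)
    message = MESSAGE_RE.search(text)
    return (
        version.group(1).strip() if version else None,
        message.group(1).strip() if message else None,
    )


def tally_errors(outputs):
    """Count how often each <message> text occurs across several failed tasks."""
    counts = Counter()
    for text in outputs:
        _, message = parse_task_output(text)
        counts[message or "(no message)"] += 1
    return counts
```

Example: suppose `a` is one "finish file present too long" output and `b` is one "process got signal 11" output. Then `tally_errors([a, b, b])` counts 2 signal 11 failures and 1 finish-file failure.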
mikey

Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 97329 - Posted: 11 Jun 2020, 12:32:51 UTC - in response to Message 97245.  

A signal 11 error means:
"Signal 11 is a segmentation fault (SIGSEGV). It points to a problem with your memory or virtual memory (page file), or to a bad batch of tasks. However, if you're the only one returning these tasks as errors, and this happens consistently across two or more projects, you had best look into the RAM or page file on that computer."
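The rule of thumb above can be written down as a small check. Below is a minimal Python sketch. The standard `signal` module confirms that signal 11 is SIGSEGV on Linux. `segfault_projects` and `looks_like_host_problem` are hypothetical helpers. The `(project, signal_number)` input format and the `others_also_failed` flag are assumptions about how you might record your own failures; BOINC does not report them in this form.

```python
import signal

# On Linux, signal number 11 is SIGSEGV (segmentation fault).
print(signal.Signals(11).name)  # prints "SIGSEGV" on Linux


def segfault_projects(failures):
    """Projects in which this computer had at least one SIGSEGV failure.

    failures: iterable of (project_name, signal_number) pairs for one computer.
    """
    return {project for project, signum in failures if signum == signal.SIGSEGV}


def looks_like_host_problem(failures, others_also_failed, min_projects=2):
    """Rule of thumb from the post above; a hint, not a diagnosis."""
    if others_also_failed:
        # Other computers failing the same work points to a bad batch of tasks.
        return False
    # Segfaults on this machine alone, in two or more projects: suspect RAM/page file.
    return len(segfault_projects(failures)) >= min_projects
```

The check returns True only when no other computers failed the same work and SIGSEGV appears in at least `min_projects` different projects.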
Eugene

Joined: 11 Jan 16
Posts: 1
Credit: 1,901,430
RAC: 0
Message 97373 - Posted: 14 Jun 2020, 7:29:42 UTC

Regarding "finish file present too long": that is a known BOINC client issue:
https://github.com/BOINC/boinc/issues/3017
It has been fixed in recent client versions.
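To tell whether a host is affected, compare its client version (7.9.3 in the logs above) with the release that contains the fix. The thread does not say which release that is. `fixed_in` below is therefore a value you would look up from the issue yourself, and "7.16.0" in the usage line is only an illustrative input. Below is a minimal Python sketch with hypothetical helper names. Its point is to compare versions numerically, because plain string comparison puts "7.9.3" after "7.16.0".

```python
def parse_version(text):
    """'7.9.3' -> (7, 9, 3), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def client_predates_fix(client_version, fixed_in):
    """True if client_version is older than fixed_in.

    fixed_in is a placeholder: look up the release that contains the fix.
    """
    return parse_version(client_version) < parse_version(fixed_in)
```

Example: `client_predates_fix("7.9.3", "7.16.0")` returns True, whereas the string comparison `"7.9.3" < "7.16.0"` is False.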




©2024 University of Washington
https://www.bakerlab.org