Quite a few signal 11 errors on a Linux host - what does it mean?

Message boards : Number crunching : Quite a few signal 11 errors on a Linux host - what does it mean?

wolfman1360

Joined: 18 Feb 17
Posts: 72
Credit: 18,450,036
RAC: 0
Message 96083 - Posted: 5 May 2020, 2:07:18 UTC
Last modified: 5 May 2020, 2:08:11 UTC

Hello.
I was browsing through my machines on the site today and came across this. I'm not sure what the logs mean.
https://boinc.bakerlab.org/rosetta/results.php?hostid=4295358&offset=0&show_names=0&state=6&appid=

Log for a task:
<message>
process got signal 11
</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol jhr_boinc_v4.xml @flags -in:file:silent Junior_HalfRoid_design5_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_1dy7ly8k.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip Junior_HalfRoid_design5_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_1dy7ly8k.zip @Junior_HalfRoid_design5_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_1dy7ly8k.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1075497
Using database: database_357d5d93529_n_methyl/minirosetta_database

</stderr_txt>

Is this a hardware error? Temperatures are fine and the machine doesn't seem to be struggling for RAM. Or is it running a version of BOINC that is too old?
I tried to update using ppa:costamagnagianfranco/boinc, but received an error. I might try again if this persists.

Any help is appreciated. I hate wasting tasks and resources like this.
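
For anyone reading the log above: on x86-64 Linux, signal 11 is SIGSEGV (segmentation fault). The number alone does not say whether hardware or software is at fault. Here is a minimal Python sketch that pulls the number out of a BOINC-style message and looks up its name; the helper names are made up for illustration, and signal numbering is platform-specific, so the lookup reflects whatever system the script runs on.

```python
import re
import signal


def parse_signal_number(message):
    """Extract the signal number from text like 'process got signal 11'.

    Returns an int, or None if the text does not match that pattern.
    """
    match = re.search(r"process got signal (\d+)", message)
    return int(match.group(1)) if match else None


def describe_signal(number):
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a named signal on this platform.
        return "unknown signal {}".format(number)
```

Calling describe_signal(parse_signal_number("process got signal 11")) on the affected host would show the name that goes with the number in the task log.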
Tom M

Joined: 20 Jun 17
Posts: 85
Credit: 5,047,302
RAC: 51,969
Message 96106 - Posted: 5 May 2020, 12:32:33 UTC - in response to Message 96083.  


Which computer?
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....
William Albert

Joined: 22 Mar 20
Posts: 23
Credit: 1,061,020
RAC: 0
Message 96117 - Posted: 5 May 2020, 16:25:07 UTC - in response to Message 96083.  

Looks like this is your host in question: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4295358

Your computer is running an AMD Opteron 6128 HE, which is based on the AMD K10 architecture.

AMD K10-based processors have a known issue with Rosetta on Linux: the application assumes that the SSSE3 instruction set is available, which it is not on AMD K10. See this bug report for more details.

Hopefully this will be fixed soon. Until then, I would recommend either running Windows (which doesn't have this problem) or using this machine for other projects that you might be interested in.
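
If you want to check a host yourself, the kernel lists supported CPU features on the "flags" line of /proc/cpuinfo, where SSSE3 appears as "ssse3" (plain SSE3 is listed as "pni"). Below is a rough Python sketch, assuming an x86 Linux host; the function names are only for illustration and are not part of BOINC or Rosetta.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text.

    Only the first 'flags' line is read; on x86 Linux every core
    normally reports the same list.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_flag(cpuinfo_text, flag):
    """True if the given flag (for example 'ssse3') is listed."""
    return flag.lower() in cpu_flags(cpuinfo_text)


def host_has_ssse3(path="/proc/cpuinfo"):
    """Check the running host; returns None if the file cannot be read."""
    try:
        with open(path) as handle:
            return has_flag(handle.read(), "ssse3")
    except OSError:
        return None
```

On a host where host_has_ssse3() returns False, the CPU does not report SSSE3, which would be consistent with the issue described above. Non-x86 systems use a different key name in /proc/cpuinfo, so this sketch would report no flags there.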
wolfman1360

Joined: 18 Feb 17
Posts: 72
Credit: 18,450,036
RAC: 0
Message 96132 - Posted: 5 May 2020, 20:58:16 UTC - in response to Message 96117.  

Thank you so much. This helps a great deal.
I'll yank it off here and perhaps give WCG or TN-Grid a go. Or maybe I should just retire it.
spRocket

Joined: 23 Mar 20
Posts: 22
Credit: 3,008,018
RAC: 0
Message 96139 - Posted: 5 May 2020, 21:57:54 UTC - in response to Message 96132.  

Both my homebrew NAS and an old machine I unretired turned out to be bitten by this bug as well. I've put those two onto WCG, and they've been happily crunching away.

©2024 University of Washington
https://www.bakerlab.org