Computation errors

Message boards : Number crunching : Computation errors

Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 7,395,798
RAC: 4,782
Message 93136 - Posted: 3 Apr 2020, 0:11:41 UTC
Last modified: 3 Apr 2020, 0:12:32 UTC

I just had a whole batch of jobs error out. All of them had the error "Too many total results".

name: rb_04_01_20095_19938_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_03_06_904919_9
application: Rosetta
created: 2 Apr 2020, 0:50:45 UTC
minimum quorum: 1
initial replication: 1
max # of error/total/success tasks: 1, 1, 1
errors: Too many total results


Stderr output
<core_client_version>7.17.0</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.12_x86_64-pc-linux-gnu @rb_04_01_20095_19938_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -psipred_ss2 t000_.spider3_ss2 -kill_hairpins t000_.nobuformat.spider3_ss2 -jumps:pairing_file t000_.fasta.bbcontacts.jumps -abinitio::use_filters false -skip_convergence_check -jumps:overlap_chainbreak -seq_sep_stages 1 1 1 -ramp_chainbreaks -sep_switch_accelerate 0.8 -jumps:random_sheets 7 2 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_04_01_20095_19938_ab_t000__robetta.zip -frag3 rb_04_01_20095_19938_ab_t000__robetta.200.3mers.index.gz -fragA rb_04_01_20095_19938_ab_t000__robetta.200.6mers.index.gz -fragB rb_04_01_20095_19938_ab_t000__robetta.200.3mers.index.gz -nstruct 10000 -cpu_run_time 57600 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3752306
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
BOINC:: CPU time: 18587.7s, 14400s + 3600s[2020- 4- 2 16:44:30:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 18587.7 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
16:44:30 (10035): called boinc_finish(0)

</stderr_txt>
]]>

What does this indicate? I do not want to spend all my time running jobs only to have them error out at the end.

Sid Celery

Joined: 11 Feb 08
Posts: 1965
Credit: 38,174,417
RAC: 10,123
Message 93142 - Posted: 3 Apr 2020, 1:15:53 UTC - in response to Message 93136.  

I just had a whole batch of jobs error out. All of them had the error "Too many total results".

errors Too many total results

<message>
process got signal 11</message>

BOINC:: CPU time: 18587.7s, 14400s + 3600s[2020- 4- 2 16:44:30:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 18587.7 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================

What does this indicate? I do not want to spend all my time running jobs only to have them error out at the end.

I don't know, but could the message really mean 'too many error results to list'?

Signal 11 gets reported a lot. Is that memory again? (But I see you've got loads.)
Stream information inconsistent - no idea.
And the watchdog had to cut in and shut the task down after 1hr runtime + 4hrs watchdog.

Certainly lots going wrong
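
For anyone who wants to check the timing arithmetic, here is a minimal sketch that pulls the numbers out of the "BOINC:: CPU time" line in the stderr. It assumes the two figures after the comma are the runtime target and the watchdog allowance (the log does not label them), and the function name parse_cpu_line is only an illustration, not part of BOINC or Rosetta.

```python
import re

# Line copied from the stderr output quoted in this thread.
LINE = "BOINC:: CPU time: 18587.7s, 14400s + 3600s[2020- 4- 2 16:44:30:] :: BOINC"

def parse_cpu_line(line):
    """Return (cpu_seconds_used, sum_of_the_two_limits) from a CPU time line."""
    m = re.search(r"CPU time:\s*([\d.]+)s,\s*(\d+)s\s*\+\s*(\d+)s", line)
    if m is None:
        raise ValueError("not a BOINC CPU time line")
    used = float(m.group(1))
    # Assumption: the two integers are a runtime target and a watchdog allowance.
    limit = int(m.group(2)) + int(m.group(3))
    return used, limit

used, limit = parse_cpu_line(LINE)
print(f"used {used:.1f}s, limit {limit}s, over by {used - limit:.1f}s")
```

If that reading is right, the task ran roughly ten minutes past the combined limit before it was stopped.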
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93145 - Posted: 3 Apr 2020, 1:44:33 UTC - in response to Message 93136.  

It shows you are on BOINC 7.17.0 (the current recommended version is 7.4.22).

It also indicates that the task did not complete the first model within the 1hr preferred runtime plus 4hr watchdog timeout. The task was created in a way that causes it not to go out to another host for validation. So the one error was "too many", and the WU (which, sometimes, could be more than just the task that went to you) was ended. And then I guess as the watchdog went to end the task, it found no output file.

In a nutshell, you hit a long running model against the smallest possible runtime, and it was ended for you.
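
To make the "one error was too many" point concrete, here is a rough sketch of how the limits shown on the work unit page (max error/total/success tasks 1, 1, 1, minimum quorum 1) could lead to that outcome. This is a simplified model for illustration only, not BOINC's actual server code; the function name and the exact order of the checks are assumptions.

```python
def workunit_error(max_error, max_total, max_success,
                   n_error, n_success, min_quorum):
    """Return the reason a work unit would be ended, or None (simplified model)."""
    n_total = n_error + n_success
    if n_error > max_error:
        return "Too many error results"
    if n_success > max_success:
        return "Too many success results"
    # Replacement tasks are needed while successes fall short of the quorum.
    needed = max(0, min_quorum - n_success)
    # Assumption: issuing the replacements must not push the total past max_total.
    if n_total + needed > max_total:
        return "Too many total results"
    return None

# One task sent, it failed, and no second task may be created.
print(workunit_error(1, 1, 1, n_error=1, n_success=0, min_quorum=1))
# The same work unit if the single task had succeeded.
print(workunit_error(1, 1, 1, n_error=0, n_success=1, min_quorum=1))
```

Under this model the failed task cannot be reissued because the total is capped at one, which matches the "Too many total results" label on the work unit.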
Rosetta Moderator: Mod.Sense
Tom Rinehart

Joined: 28 Mar 20
Posts: 7
Credit: 1,637,467
RAC: 0
Message 93147 - Posted: 3 Apr 2020, 2:11:27 UTC
Last modified: 3 Apr 2020, 2:12:15 UTC

I posted this on ralph@home, but this is probably a better place for it. All the 4.12 work units on my Mac quickly end in a computation error:

<core_client_version>7.14.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
command: rosetta_4.12_x86_64-apple-darwin -run:protocol jd2_scripting -parser:protocol predictor_v11_boinc--fuse--covid_spike_design_boinc_v1.xml @flags_jhr_cv -in:file:silent 3xc3uf2h_Junior_HalfRoid_vs_COVID-19_design1.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 3xc3uf2h_Junior_HalfRoid_vs_COVID-19_design1.zip @3xc3uf2h_Junior_HalfRoid_vs_COVID-19_design1.flags -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937
Starting watchdog...
Watchdog active.
error: zipfile probably corrupt (illegal instruction)

</stderr_txt>
]]>


One of my other Macs is getting this error:

<core_client_version>7.14.3</core_client_version>
<![CDATA[
<message>
process got signal 4</message>
<stderr_txt>

</stderr_txt>
]]>

I have two Linux boxes running Debian Buster that are working fine, so it looks like a Mac app problem.
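
For reference, the signal numbers that appear in these stderr dumps can be turned into names with Python's standard signal module. This is just a small lookup sketch (signal_name is an illustrative helper, not part of BOINC); on Linux and macOS, 4 is SIGILL (illegal instruction) and 11 is SIGSEGV (segmentation fault).

```python
import signal

def signal_name(number):
    """Return the symbolic name for a signal number, or 'unknown' if none exists."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The running platform does not define a signal with this number.
        return "unknown"

# 4 is from "process got signal 4", 11 from "process got signal 11".
for number in (4, 11):
    print(number, signal_name(number))
```

Signal 4 lines up with the "illegal instruction" text in the first Mac error above.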
Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 7,395,798
RAC: 4,782
Message 93170 - Posted: 3 Apr 2020, 6:59:58 UTC - in response to Message 93145.  

It shows you are on BOINC 7.17.0 (the current recommended version is 7.4.22).

It also indicates that the task did not complete the first model within the 1hr preferred runtime plus 4hr watchdog timeout. The task was created in a way that causes it not to go out to another host for validation. So the one error was "too many", and the WU (which, sometimes, could be more than just the task that went to you) was ended. And then I guess as the watchdog went to end the task, it found no output file.

In a nutshell, you hit a long running model against the smallest possible runtime, and it was ended for you.



Okay, thanks. I just wanted to make sure it was not a hardware issue, so I can prevent problems in the future. This round was a lot of wasted computation time.

adrianxw
Joined: 18 Sep 05
Posts: 650
Credit: 11,637,805
RAC: 799
Message 93173 - Posted: 3 Apr 2020, 7:39:15 UTC

>>> (the current recommended version is 7.4.22)

7.14.2.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Keith Myers
Joined: 29 Mar 20
Posts: 95
Credit: 289,903
RAC: 0
Message 93282 - Posted: 3 Apr 2020, 20:46:48 UTC - in response to Message 93145.  

It shows you are on BOINC 7.17.0 (the current recommended version is 7.4.22).

That information is incorrect. It only reflects that no official developer at BOINC has compiled a current version. Linux BOINC has been neglected for the past 6 years, with no official BOINC distribution, because there has been no active Linux developer since 2014.
Current Linux BOINC versions can be compiled by the end user, or the user can install one from the official PPAs or from their Linux distribution's repository. That is usually at least version 7.9.3 in most Debian distros and up to version 7.14.2 in others.

©2024 University of Washington
https://www.bakerlab.org