Problems with version 5.90/5.91

Message boards : Number crunching : Problems with version 5.90/5.91



Astro

Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50134 - Posted: 28 Dec 2007, 14:42:03 UTC
Last modified: 28 Dec 2007, 14:49:34 UTC

Here's a list of ALL the watchdog-ended tasks for all hosts/OSes using 5.90 or 5.91 since Nov 26th 2007. Do they have something in common?

1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12692_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12604_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12581_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12309_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12265_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12266_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12225_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12206_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11882_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_10396_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_10466_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_14658_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_115770_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_115774_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_121452_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_156769_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_161145_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_183748_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_36172_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_148361_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_144472_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_144384_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_143544_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11173_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11055_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11186_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11059_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12138_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_15985_1
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_74554_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_74522_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_10255_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_10233_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_10229_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_73415_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_73402_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_73393_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_50756_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_13658_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_13656_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_13652_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11899_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11965_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11959_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11976_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_10875_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_10838_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_10826_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_10090_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_11281_0
1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_30600_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12760_0
1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12740_0

There are NO watchdog-ended tasks for app versions earlier than 5.90/5.91.
Luuklag

Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 50140 - Posted: 28 Dec 2007, 18:45:34 UTC

The thing in common is the batch number, I guess: 2470 and/or 2477.
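To check the batch-number guess against the list above, the number can be pulled out of each task name and tallied. A minimal Python sketch, assuming (as guessed in this thread, not confirmed by the project) that names end in "__<batch>_<workunit>_<replica>"; the helpers batch_of and count_batches are hypothetical:

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumption: task names end in "__<batch>_<workunit>_<replica>",
# e.g. "...-1zpy_-native__2470_12692_0". Field meanings are a guess.
TASK_SUFFIX = re.compile(r"__(\d+)_(\d+)_(\d+)$")


def batch_of(task_name: str) -> Optional[int]:
    """Return the assumed batch number, or None if the name doesn't fit the pattern."""
    match = TASK_SUFFIX.search(task_name)
    return int(match.group(1)) if match else None


def count_batches(task_names: Iterable[str]) -> Counter:
    """Tally how many task names fall into each assumed batch."""
    return Counter(b for b in map(batch_of, task_names) if b is not None)


if __name__ == "__main__":
    watchdog_ended = [
        "1zpy__BOINC_TWIST_RINGS_SYMM_FOLD_AND_DOCK-1zpy_-native__2470_12692_0",
        "1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_115770_0",
        "1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_15985_1",
    ]
    print(sorted(count_batches(watchdog_ended).items()))  # [(2470, 1), (2477, 2)]
```

Over the full list above, every name falls into batch 2470 or 2477; the tally only confirms the pattern, not its cause.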
M.L.

Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 50142 - Posted: 28 Dec 2007, 20:28:35 UTC

Task ID 129178883
Name 1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_254910_0
Workunit 117460024
Created 26 Dec 2007 7:04:33 UTC
Sent 26 Dec 2007 7:10:52 UTC
Received 28 Dec 2007 17:42:00 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 510574
Report deadline 5 Jan 2008 7:10:52 UTC
CPU time 13935.09
stderr out <core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3231021
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -106.099 for 900 seconds
**********************************************************************
GZIP SILENT FILE: .xx1zpy.out

</stderr_txt>
]]>


Validate state Valid
Claimed credit 57.7086028469705
Granted credit 77.8723488576529
application version 5.90
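The stderr above shows the watchdog ending a run whose score sat at -106.099 for 900 seconds. As an illustration only, here is a minimal Python sketch of a stuck-score check of that kind. It is NOT Rosetta's actual code: the class name ScoreWatchdog and the change tolerance are assumptions, and it covers only the "stuck" case, not "going too long":

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScoreWatchdog:
    """Illustrative stuck-score watchdog (hypothetical, not Rosetta's code).

    Reports "stuck" once the score has stayed within `tolerance` of the last
    changed value for at least `stuck_seconds`.
    """
    stuck_seconds: float = 900.0  # threshold seen in the stderr above
    tolerance: float = 1e-3       # assumption: smaller changes count as no change
    _last_score: Optional[float] = field(default=None, init=False)
    _last_change: float = field(default=0.0, init=False)

    def update(self, score: float, now: float) -> bool:
        """Record a score sampled at time `now` (seconds); return True if stuck."""
        if self._last_score is None or abs(score - self._last_score) > self.tolerance:
            # Score moved (or first sample): restart the stuck timer.
            self._last_score = score
            self._last_change = now
            return False
        return now - self._last_change >= self.stuck_seconds


if __name__ == "__main__":
    wd = ScoreWatchdog()
    for t in (0.0, 300.0, 600.0, 900.0):
        print(t, wd.update(-106.099, t))
```

A check like this can only tell that the score stopped moving; it cannot tell a genuinely converged trajectory from a loop, which is why a stuck run still ends and reports whatever it has.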




Greg_BE

Joined: 30 May 06
Posts: 5573
Credit: 5,565,689
RAC: 916
Message 50148 - Posted: 28 Dec 2007, 21:39:17 UTC

I have 1zpy tasks from batches 2469 and 2474, and the 2474 one ran OK.
The 2469 one is running now and seems OK so far at 5%.
pieface

Joined: 20 Sep 05
Posts: 17
Credit: 797,661
RAC: 0
Message 50149 - Posted: 28 Dec 2007, 22:12:37 UTC

Something is still fishy with 5.90. I just had two units run for 24 hours without ever finishing a single decoy. It sounds like some kind of loop-de-loop is going on, or it just doesn't know when to call a decoy 'complete'. Good luck to the next boxes running these two:

wu 117532625
wu 117529650
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 64,598,137
RAC: 4,603
Message 50163 - Posted: 29 Dec 2007, 14:58:20 UTC - in response to Message 50149.  

I continue to get computation errors with 5.90.

Windows XP, Q6600 with 2 GB RAM. It looks like a failure rate of about 20%.

I noticed one failure after 3+ hours of calculation time.
Thx!

Paul

hugothehermit

Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 50171 - Posted: 29 Dec 2007, 21:11:17 UTC

117198862

Exit status -1073741819 (0xc0000005)
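The two numbers in that exit status are the same 32-bit value: when a Windows process dies from an unhandled exception, its exit code is the exception's NTSTATUS code, and the result page shows it as a signed integer followed by its hex form. 0xC0000005 is STATUS_ACCESS_VIOLATION, i.e. the application touched memory it was not allowed to access. A small Python sketch of the conversion (the lookup table holds only this one code, and the function names are illustrative):

```python
# Windows uses an unhandled exception's NTSTATUS code as the process exit
# status; here it appears as a signed 32-bit integer.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def as_unsigned32(exit_status: int) -> int:
    """Reinterpret a signed 32-bit exit status as its unsigned bit pattern."""
    return exit_status & 0xFFFFFFFF


def describe(exit_status: int) -> str:
    """Format the status as 0x-prefixed hex plus a name if it is known."""
    code = as_unsigned32(exit_status)
    name = KNOWN_STATUS.get(code, "unknown")
    return f"{code:#010x} {name}"


if __name__ == "__main__":
    print(describe(-1073741819))  # 0xc0000005 STATUS_ACCESS_VIOLATION
```

The arithmetic behind it: 2^32 - 1073741819 = 3221225477 = 0xC0000005.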
dcdc

Joined: 3 Nov 05
Posts: 1828
Credit: 107,065,606
RAC: 8,400
Message 50207 - Posted: 31 Dec 2007, 12:12:11 UTC
Last modified: 31 Dec 2007, 12:12:35 UTC

I had no net connection last night and got loads of errors on all running PCs, e.g.:
https://boinc.bakerlab.org/rosetta/result.php?resultid=130123580

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
Too many restarts with no progress. Keep application in memory while preempted.
======================================================
DONE :: 1 starting structures 0 cpu seconds
This process generated 0 decoys from 0 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>e003_1_NMRREF_CCR19_id_model_02IGNORE_THE_REST_idl_2479_4179_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
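The log above shows the application exiting five times on a missing heartbeat, producing 0 decoys, and then giving up with "Too many restarts with no progress." A minimal Python sketch of that kind of give-up rule, assuming progress is counted in completed decoys and a limit of 5 starts (the real limit is not stated in the log, and RestartGuard is a hypothetical name, not a BOINC or Rosetta API):

```python
from dataclasses import dataclass


@dataclass
class RestartGuard:
    """Illustrative give-up rule for restarts without progress (hypothetical).

    Assumptions: progress is the number of completed decoys, and the app
    gives up once more than `max_restarts` consecutive starts made none.
    """
    max_restarts: int = 5
    starts_without_progress: int = 0
    last_decoys: int = 0

    def on_start(self, decoys_done: int) -> bool:
        """Call at each (re)start; return False when the app should give up."""
        if decoys_done > self.last_decoys:
            # New decoys since the last start: reset the counter.
            self.last_decoys = decoys_done
            self.starts_without_progress = 0
        else:
            self.starts_without_progress += 1
        return self.starts_without_progress <= self.max_restarts


if __name__ == "__main__":
    guard = RestartGuard()
    # As in the log above: every start ends on a missing heartbeat with 0 decoys.
    print([guard.on_start(0) for _ in range(6)])
```

Keeping the application in memory while preempted, as the log itself suggests, avoids these restarts in the first place.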
Astro

Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50208 - Posted: 31 Dec 2007, 12:41:34 UTC

resultid=128931062 left me with a Windows crash dialog asking if I wanted to report it to microshaft.
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 64,598,137
RAC: 4,603
Message 50212 - Posted: 31 Dec 2007, 13:37:37 UTC - in response to Message 50208.  

I have gone about 2 days without a single problem. I have no idea what changed, but all of my work units appear to be running fine now.
Thx!

Paul

Clare Jarvis

Joined: 14 Dec 05
Posts: 8
Credit: 874,698
RAC: 0
Message 50220 - Posted: 31 Dec 2007, 16:09:49 UTC

I am running rosetta_beta_5.91_i686-pc-linux-gnu. The problem I am having is that tasks show 100% completion but never move on to the next task. I will go away for the weekend and come back to find that the machines have been stuck doing nothing.

If I suspend the task, it never reports, and if I resume the task, it never completes.
My only option is to abort the task and lose the credits.

Help!


Mike.Gibson

Joined: 3 Nov 07
Posts: 19
Credit: 237,720
RAC: 0
Message 50227 - Posted: 31 Dec 2007, 22:27:19 UTC

Like Paul, I have had no problems for the last 2 days. The current series of WUs (Structural Genomics Target) seems fine.

Cheers

Mike
enigma

Joined: 29 Dec 07
Posts: 1
Credit: 256
RAC: 0
Message 50228 - Posted: 1 Jan 2008, 1:52:26 UTC

OMFG, thanks to MS it's sometimes possible to push that 5.90 thing through.
What a "new year memory mess"... Superb calcs, I guess, but what happens if that "app" accesses data from my torrent client's address space? Mmmhh...
THE pain in the ass.
*DROPPED*
Thomas Leibold

Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 50231 - Posted: 1 Jan 2008, 4:37:33 UTC - in response to Message 50220.  

Your problem report would be more useful if you mentioned what kind of workunits you have problems with. Looking through your computers and their unfinished workunits, I see that the problem workunits appear to be of the 1zpy__BOINC_TWIST_RINGS...2477... variety that a lot of us had problems with.

I would abort them if they don't end on their own.
Team Helix
Luuklag

Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 50245 - Posted: 1 Jan 2008, 21:19:10 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=130251802

Another error.
Landroval

Joined: 21 Sep 06
Posts: 1
Credit: 773,911
RAC: 0
Message 50246 - Posted: 1 Jan 2008, 21:26:47 UTC

I am having problems with 5.90 on Windows. Specifically, the last several work units have errored out with unhandled exceptions.

Output from the workunits is listed here, here, here, and here.

I haven't made any changes to the operating environment (OS patches, antivirus changes, etc.) since before this started, at least none that I'm aware of. I've just downloaded another workunit on this machine; I'll post again if it crashes as well. Any advice is appreciated.

M.L.

Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 50248 - Posted: 1 Jan 2008, 21:32:08 UTC
Last modified: 1 Jan 2008, 21:33:51 UTC

application Rosetta Beta
created 19 Dec 2007 4:51:54 UTC
name 1eyvA_BOINC_ABINITIO_VF-S25-9-S3-3--1eyvA-vf__2450_2786
canonical result 127731681
granted credit 21.94
minimum quorum 1
initial replication 1
max # of error/total/success tasks 1, 2, 1
Task ID   | Computer | Sent                    | Time reported or deadline | Server state | Outcome      | Client state    | CPU time (sec) | Claimed credit | Granted credit
127731681 | 586364   | 19 Dec 2007 4:52:32 UTC | 29 Dec 2007 11:32:54 UTC  | Over         | Success      | Done            | 10,586.59      | 21.94          | 28.41
129795554 | 510574   | 29 Dec 2007 4:53:10 UTC | 1 Jan 2008 21:14:50 UTC   | Over         | Client error | Aborted by user | 0.00           | 0.00           | ---

This task had already been completed by another cruncher and credit had been granted, so I aborted it. I just noticed that the original ran under 5.89, but it was 5.90 on my PC.
transient

Joined: 30 Sep 06
Posts: 376
Credit: 10,836,395
RAC: 0
Message 50252 - Posted: 2 Jan 2008, 5:26:47 UTC

So, there was no real problem? You saw it had been crunched before and decided to abort it?
cnick6

Joined: 30 May 06
Posts: 29
Credit: 12,597,623
RAC: 0
Message 50283 - Posted: 3 Jan 2008, 12:25:01 UTC

FYI, my WinXP laptop (5.90) with 512 MB hit a memory wall with this workunit:

mgth-1-1t43_a_w012_MolecularReplacement_2482_7467

At 96% I got a 'waiting for memory' message.

This laptop has 768 MB, but the two mini-PCI cards need help seating properly so that they make full contact.

Once it was back up to the full 768 MB, Rosetta restarted the workunit (back to 0%), and it is processing normally.

I'll be curious to see whether it hits the memory wall again.
cnick6

Joined: 30 May 06
Posts: 29
Credit: 12,597,623
RAC: 0
Message 50287 - Posted: 3 Jan 2008, 17:27:13 UTC

FYI, I didn't hit the memory wall last night. The workunit completed fine.



©2022 University of Washington
https://www.bakerlab.org