Problems with version 5.90/5.91

Message boards : Number crunching : Problems with version 5.90/5.91

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 50140 - Posted: 28 Dec 2007, 18:45:34 UTC

The thing in common is the batch number, I guess:

the number 2470 and/or 2477.
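To check whether a given task belongs to one of the batches mentioned here, the batch number can be pulled out of the task name. This is a minimal sketch that assumes, based only on the names posted in this thread, that a task name ends in `_<batch>_<index>_<replica>` (e.g. `..._2477_254910_0`). That layout is an inference, not a documented format, and `batch_number` and `SUSPECT_BATCHES` are hypothetical names.

```python
def batch_number(task_name: str) -> int:
    """Return the batch number from a Rosetta task name.

    Assumption: the name ends in _<batch>_<index>_<replica>,
    so the batch is the third underscore-separated field from the end.
    """
    parts = task_name.rsplit("_", 3)
    if len(parts) != 4 or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"unexpected task name format: {task_name!r}")
    return int(parts[1])


# Batches reported as problematic in this thread.
SUSPECT_BATCHES = {2470, 2477}

name = "1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_254910_0"
print(batch_number(name), batch_number(name) in SUSPECT_BATCHES)  # 2477 True
```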

M.L.
Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 50142 - Posted: 28 Dec 2007, 20:28:35 UTC

Task ID 129178883
Name 1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_254910_0
Workunit 117460024
Created 26 Dec 2007 7:04:33 UTC
Sent 26 Dec 2007 7:10:52 UTC
Received 28 Dec 2007 17:42:00 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 510574
Report deadline 5 Jan 2008 7:10:52 UTC
CPU time 13935.09
stderr out <core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3231021
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -106.099 for 900 seconds
**********************************************************************
GZIP SILENT FILE: .xx1zpy.out

</stderr_txt>
]]>


Validate state Valid
Claimed credit 57.7086028469705
Granted credit 77.8723488576529
application version 5.90
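The watchdog message in this log ends the run once the score has not moved for 900 seconds. Below is a minimal sketch of that kind of stuck-score check. It is not Rosetta's code: the class name, the once-a-minute sampling and the tolerance are assumptions for illustration only.

```python
class ScoreWatchdog:
    """Hypothetical stuck-score detector, for illustration only."""

    def __init__(self, limit_seconds=900.0, tolerance=1e-6):
        self.limit = limit_seconds    # how long the score may stay flat
        self.tolerance = tolerance    # smaller changes count as "no change"
        self.last_score = None
        self.last_change = None       # time of the last real score change

    def check(self, score, now):
        """Record a score sampled at time `now` (seconds).

        Returns True once the score has stayed within `tolerance` of the
        last changed value for at least `limit` seconds.
        """
        if self.last_score is None or abs(score - self.last_score) > self.tolerance:
            self.last_score = score
            self.last_change = now
            return False
        return now - self.last_change >= self.limit


# Usage: a score frozen at -106.099, sampled once a minute.
wd = ScoreWatchdog(limit_seconds=900)
for t in range(0, 1000, 60):
    if wd.check(-106.099, t):
        print(f"Stuck at score {wd.last_score} for {t - wd.last_change} seconds")
        break
# prints: Stuck at score -106.099 for 900 seconds
```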




Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,005
RAC: 2,103
Message 50148 - Posted: 28 Dec 2007, 21:39:17 UTC

I have 2469 and 2474 zpy tasks, and the 2474 one ran OK.
The 2469 one is running now and seems to be OK so far at 5%.

pieface
Joined: 20 Sep 05
Posts: 17
Credit: 797,661
RAC: 0
Message 50149 - Posted: 28 Dec 2007, 22:12:37 UTC

Something is still fishy with 5.90. I just had two units run for 24 hours without ever finishing a single decoy. It sounds like some kind of loop-de-loop is going on, or it just doesn't know when to call a decoy 'complete'. Good luck to the next boxes running these two:

wu 117532625
wu 117529650

Paul
Joined: 29 Oct 05
Posts: 193
Credit: 65,742,447
RAC: 873
Message 50163 - Posted: 29 Dec 2007, 14:58:20 UTC - in response to Message 50149.  

Something is still fishy with 5.90. I just had two units run for 24 hours without ever finishing a single decoy. It sounds like some kind of loop-de-loop is going on, or it just doesn't know when to call a decoy 'complete'. Good luck to the next boxes running these two:

wu 117532625
wu 117529650


I continue to get computation errors with 5.90.

Windows XP, Q6600 with 2 GB RAM. It looks like a failure rate of about 20%.

I noticed one failure after 3+ hours of calculation time.
Thx!

Paul


hugothehermit
Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 50171 - Posted: 29 Dec 2007, 21:11:17 UTC

117198862

Exit status -1073741819 (0xc0000005)
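The two numbers in that exit status are the same 32 bits. Windows reports the code as an unsigned NTSTATUS value, here 0xC0000005 (STATUS_ACCESS_VIOLATION, an invalid memory access), and the decimal is that value read as a signed 32-bit integer: 3221225477 - 4294967296 = -1073741819. A small sketch of the conversion in both directions; the helper names are hypothetical.

```python
def exit_status_hex(status):
    """Format a signed 32-bit exit status the way it appears in parentheses."""
    return f"0x{status & 0xFFFFFFFF:x}"


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# 0xC0000005 is STATUS_ACCESS_VIOLATION on Windows.
print(exit_status_hex(-1073741819))  # 0xc0000005
print(exit_status_hex(0))            # 0x0
print(to_signed32(0xC0000005))       # -1073741819
```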

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,483,446
RAC: 55,198
Message 50207 - Posted: 31 Dec 2007, 12:12:11 UTC
Last modified: 31 Dec 2007, 12:12:35 UTC

I had no net connection last night and got loads of errors on all my running PCs:
e.g.: https://boinc.bakerlab.org/rosetta/result.php?resultid=130123580

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 14400
# random seed: 2821752
No heartbeat from core client for 31 sec - exiting
Too many restarts with no progress. Keep application in memory while preempted.
======================================================
DONE :: 1 starting structures 0 cpu seconds
This process generated 0 decoys from 0 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>e003_1_NMRREF_CCR19_id_model_02IGNORE_THE_REST_idl_2479_4179_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
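In this log the app exits five times on a missing heartbeat, restarts from the same random seed with no progress, and then gives up. Below is a sketch of that kind of restart limit, assuming the counter is persisted in a small state file between runs. The JSON file, `RestartGuard`, and the limit of 5 (chosen only to match the five starts visible in this log) are hypothetical, not the actual Rosetta/BOINC mechanism.

```python
import json
import tempfile
from pathlib import Path


class RestartGuard:
    """Hypothetical guard: give up after too many restarts without progress.

    State lives in a JSON file so it survives the process exiting
    (e.g. after "No heartbeat from core client ... exiting").
    """

    def __init__(self, state_file: Path, max_restarts: int = 5):
        self.state_file = state_file
        self.max_restarts = max_restarts

    def on_start(self, progress: int) -> bool:
        """Call at every start with the current checkpointed progress.

        Returns False when the run should be abandoned.
        """
        if not self.state_file.exists():
            # First start ever: nothing to compare against yet.
            state = {"restarts": 0, "progress": progress}
        else:
            state = json.loads(self.state_file.read_text())
            if progress > state["progress"]:
                # Real progress since last start: reset the counter.
                state = {"restarts": 0, "progress": progress}
            else:
                state["restarts"] += 1
        self.state_file.write_text(json.dumps(state))
        return state["restarts"] < self.max_restarts


# Usage: six starts in a row with no checkpointed progress.
with tempfile.TemporaryDirectory() as d:
    guard = RestartGuard(Path(d) / "restarts.json", max_restarts=5)
    print([guard.on_start(progress=0) for _ in range(6)])
    # prints: [True, True, True, True, True, False]
```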

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50208 - Posted: 31 Dec 2007, 12:41:34 UTC

resultid=128931062 left me with a windows crash box prompt asking if I wanted to report it to microshaft.

Paul
Joined: 29 Oct 05
Posts: 193
Credit: 65,742,447
RAC: 873
Message 50212 - Posted: 31 Dec 2007, 13:37:37 UTC - in response to Message 50208.  

resultid=128931062 left me with a windows crash box prompt asking if I wanted to report it to microshaft.


I have gone about 2 days without a single problem. I have no idea what changed but all of my work units appear to be running fine now.
Thx!

Paul


Clare Jarvis
Joined: 14 Dec 05
Posts: 8
Credit: 874,698
RAC: 0
Message 50220 - Posted: 31 Dec 2007, 16:09:49 UTC

I am running rosetta_beta_5.91_i686-pc-linux-gnu. The problem I am
having is showing completion of 100% and not moving on to the next task.
I will go away for the weekend and find that the machines have been stuck
doing nothing.

If I suspend the task, it never reports and if I resume the task it never completes.
My only option is to abort the task and lose the credits.

Help!


Mike.Gibson
Joined: 3 Nov 07
Posts: 19
Credit: 311,844
RAC: 0
Message 50227 - Posted: 31 Dec 2007, 22:27:19 UTC

Like Paul, I have had no problems for the last 2 days. The current series of WUs (Structural Genomics Target) seem fine.

Cheers

Mike

enigma
Joined: 29 Dec 07
Posts: 1
Credit: 256
RAC: 0
Message 50228 - Posted: 1 Jan 2008, 1:52:26 UTC

OMFG, thanks to MS it's possible to cheat that 5.90 thing through sometimes.
What a "new year memory mess"... superb calcs, I guess, but what happens if that "app" accesses data outside my torrent client's address space? Mmmhh...
THE pain in the ass.
*DROPPED*

Thomas Leibold
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 50231 - Posted: 1 Jan 2008, 4:37:33 UTC - in response to Message 50220.  

I am running rosetta_beta_5.91_i686-pc-linux-gnu. The problem I am
having is showing completion of 100% and not moving on to the next task.


Your problem report would be more useful if you mentioned what kind of workunits you have problems with. Looking through your computers and unfinished workunits for them the problem workunits appear to be of the 1zpy__BOINC_TWIST_RINGS...2477... variety that a lot of us had problems with.

I would abort them if they don't end on their own.
Team Helix

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 50245 - Posted: 1 Jan 2008, 21:19:10 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=130251802

another error

Landroval
Joined: 21 Sep 06
Posts: 1
Credit: 825,914
RAC: 0
Message 50246 - Posted: 1 Jan 2008, 21:26:47 UTC

I am having problems with 5.90 on Windows. Specifically, the last several work units have errored out with unhandled exceptions.

Output from the workunits is listed here, here, here, and here.

I've not made any recent changes to the operating environment (OS patches, antivirus changes, etc) since before this started; at least none that I'm aware of. I've just downloaded another workunit on this machine; I'll post again if it crashes as well. Any advice appreciated.

M.L.
Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 50248 - Posted: 1 Jan 2008, 21:32:08 UTC
Last modified: 1 Jan 2008, 21:33:51 UTC

application Rosetta Beta
created 19 Dec 2007 4:51:54 UTC
name 1eyvA_BOINC_ABINITIO_VF-S25-9-S3-3--1eyvA-vf__2450_2786
canonical result 127731681
granted credit 21.94
minimum quorum 1
initial replication 1
max # of error/total/success tasks 1, 2, 1
Task ID    Computer  Sent                     Time reported or deadline  Server state  Outcome       Client state     CPU time (sec)  Claimed credit  Granted credit
127731681  586364    19 Dec 2007 4:52:32 UTC  29 Dec 2007 11:32:54 UTC   Over          Success       Done             10,586.59       21.94           28.41
129795554  510574    29 Dec 2007 4:53:10 UTC  1 Jan 2008 21:14:50 UTC    Over          Client error  Aborted by user  0.00            0.00            ---

This task had already been completed by another cruncher, who received the credit, so I aborted it. I just noticed that the original ran under 5.89, but mine was 5.90.

cnick6
Joined: 30 May 06
Posts: 29
Credit: 12,597,623
RAC: 0
Message 50283 - Posted: 3 Jan 2008, 12:25:01 UTC

FYI, my WinXP laptop (5.90) with 512 MB hit a memory wall with this workunit:

mgth-1-1t43_a_w012_MolecularReplacement_2482_7467

At 96% I got a 'waiting for memory' message.

This laptop has 768 MB, but the two mini-PCI cards need help seating properly so that their contacts connect fully.

Once it was back up to the full 768 MB, Rosetta restarted the workunit (back to 0%) and it is processing normally.

I'm curious whether it hits another memory wall.

cnick6
Joined: 30 May 06
Posts: 29
Credit: 12,597,623
RAC: 0
Message 50287 - Posted: 3 Jan 2008, 17:27:13 UTC

FYI, my memory wall didn't happen last night. The workunit completed fine.

Tribaal
Joined: 6 Feb 06
Posts: 80
Credit: 2,754,607
RAC: 0
Message 50307 - Posted: 4 Jan 2008, 7:57:43 UTC

I am running rosetta_beta_5.91_i686-pc-linux-gnu. The problem I am
having is showing completion of 100% and not moving on to the next task.
I will go away for the weekend and find that the machines have been stuck
doing nothing.

If I suspend the task, it never reports and if I resume the task it never completes.
My only option is to abort the task and lose the credits.

Help!


I'm having the exact same issue.
It is quite a problem, since most of my machines run headless - I don't have the luxury to spend time logging into each one to abort tasks manually...
I use 5.91 on GNU/Linux (on all of my machines).

- Trib'

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 50315 - Posted: 4 Jan 2008, 16:31:30 UTC

If you have one PC with a monitor and BOINC, you can connect to the other computers from that one and abort their tasks. That shouldn't take more than half an hour, I guess, unless you have something like 40 PCs.
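For headless machines, the BOINC client's command-line tool can abort a task over the GUI RPC connection without a desktop session. The sketch below only builds the command lines (and prints them by default). It assumes the tool is `boinccmd` taking `--host`, `--passwd` and `--task <project url> <task name> abort` (older 5.x clients may ship it under a different name), that each remote client accepts RPC connections from your machine, and that `PROJECT_URL` matches the project URL your clients show. Host names, password and task names are placeholders.

```python
import subprocess

# Assumption: project URL as shown by the client; check yours before use.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def abort_commands(tasks_by_host, password, project_url=PROJECT_URL):
    """Build one boinccmd abort command (argv list) per (host, task) pair."""
    return [
        ["boinccmd", "--host", host, "--passwd", password,
         "--task", project_url, task, "abort"]
        for host, tasks in tasks_by_host.items()
        for task in tasks
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: keep going if one host is unreachable.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Hypothetical host and task name.
    run_all(abort_commands({"node1": ["example_task_0"]}, "secret"))
```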



©2024 University of Washington
https://www.bakerlab.org