Message boards : Number crunching : Problems with version 5.90/5.91
Author | Message |
---|---|
Luuklag Send message Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
The thing in common is the batch number, I guess: number 2470 and/or 2477 |
M.L. Send message Joined: 21 Nov 06 Posts: 182 Credit: 180,462 RAC: 0 |
Task ID 129178883 Name 1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_254910_0 Workunit 117460024 Created 26 Dec 2007 7:04:33 UTC Sent 26 Dec 2007 7:10:52 UTC Received 28 Dec 2007 17:42:00 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x0) Computer ID 510574 Report deadline 5 Jan 2008 7:10:52 UTC CPU time 13935.09 stderr out <core_client_version>5.10.30</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 3231021 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -106.099 for 900 seconds ********************************************************************** GZIP SILENT FILE: .xx1zpy.out </stderr_txt> ]]> Validate state Valid Claimed credit 57.7086028469705 Granted credit 77.8723488576529 application version 5.90 |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
i have 2469 and 2474 zpy and the 2474 ran ok. the 2469 is running now, seems to be ok so far at 5% |
pieface Send message Joined: 20 Sep 05 Posts: 17 Credit: 797,661 RAC: 0 |
Something still fishy with 5.90, I just had two units run for 24hrs without ever finishing a single decoy, sounds like some kinda loop-de-loop going on or it just doesn't know when to say a decoy is 'complete'. Good luck to the next boxes running these two: wu 117532625 wu 117529650 |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,607,712 RAC: 7,938 |
"Something still fishy with 5.90, I just had two units run for 24hrs without ever finishing a single decoy, sounds like some kinda loop-de-loop going on or it just doesn't know when to say a decoy is 'complete'. Good luck to the next boxes running these two:" I continue to get computation errors with 5.90. Windows XP, Q6600 with 2 GB RAM. It looks like a failure rate of about 20%. I noticed one failure after 3+ hours of calculation time. Thx! Paul |
hugothehermit Send message Joined: 26 Sep 05 Posts: 238 Credit: 314,893 RAC: 0 |
|
dcdc Send message Joined: 3 Nov 05 Posts: 1832 Credit: 119,860,059 RAC: 5,566 |
i had no net connection last night and got loads of errors on all running PCs: e.g.: https://boinc.bakerlab.org/rosetta/result.php?resultid=130123580 <core_client_version>5.10.13</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 2821752 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 14400 # random seed: 2821752 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 14400 # random seed: 2821752 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 14400 # random seed: 2821752 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 14400 # random seed: 2821752 No heartbeat from core client for 31 sec - exiting Too many restarts with no progress. Keep application in memory while preempted. ====================================================== DONE :: 1 starting structures 0 cpu seconds This process generated 0 decoys from 0 attempts 0 starting pdbs were skipped ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> <message> <file_xfer_error> <file_name>e003_1_NMRREF_CCR19_id_model_02IGNORE_THE_REST_idl_2479_4179_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
resultid=128931062 left me with a windows crash box prompt asking if I wanted to report it to microshaft. |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,607,712 RAC: 7,938 |
"resultid=128931062 left me with a windows crash box prompt asking if I wanted to report it to microshaft." I have gone about 2 days without a single problem. I have no idea what changed but all of my work units appear to be running fine now. Thx! Paul |
Clare Jarvis Send message Joined: 14 Dec 05 Posts: 8 Credit: 874,698 RAC: 0 |
I am running rosetta_beta_5.91_i686-pc-linux-gnu. The problem I am having is showing completion of 100% and not moving on to the next task. I will go away for the weekend and find that the machines have been stuck doing nothing. If I suspend the task, it never reports and if I resume the task it never completes. My only option is to abort the task and lose the credits. Help! |
Mike.Gibson Send message Joined: 3 Nov 07 Posts: 19 Credit: 311,844 RAC: 0 |
Like Paul, I have had no problems for the last 2 days. The current series of WUs (Structural Genomics Target) seem fine. Cheers Mike |
enigma Send message Joined: 29 Dec 07 Posts: 1 Credit: 256 RAC: 0 |
OMFG, thanks to MS it's possible to cheat that 5.90 thing through some times. What a "new year memory mess"... superb calcs i guess, but what happens if that "app" accesses data out of my torrent client's address space? Mmmhh... THE pain in the ass. *DROPPED* |
Thomas Leibold Send message Joined: 30 Jul 06 Posts: 55 Credit: 19,627,164 RAC: 0 |
I am running rosetta_beta_5.91_i686-pc-linux-gnu. The problem I am Your problem report would be more useful if you mentioned what kind of workunits you have problems with. Looking through your computers and unfinished workunits for them the problem workunits appear to be of the 1zpy__BOINC_TWIST_RINGS...2477... variety that a lot of us had problems with. I would abort them if they don't end on their own. Team Helix |
Luuklag Send message Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=130251802 another error |
Landroval Send message Joined: 21 Sep 06 Posts: 1 Credit: 825,914 RAC: 0 |
I am having problems with 5.90 on Windows. Specifically, the last several work units have errored out with unhandled exceptions. Output from the workunits is listed here, here, here, and here. I've not made any recent changes to the operating environment (OS patches, antivirus changes, etc) since before this started; at least none that I'm aware of. I've just downloaded another workunit on this machine; I'll post again if it crashes as well. Any advice appreciated. |
M.L. Send message Joined: 21 Nov 06 Posts: 182 Credit: 180,462 RAC: 0 |
application Rosetta Beta created 19 Dec 2007 4:51:54 UTC name 1eyvA_BOINC_ABINITIO_VF-S25-9-S3-3--1eyvA-vf__2450_2786 canonical result 127731681 granted credit 21.94 minimum quorum 1 initial replication 1 max # of error/total/success tasks 1, 2, 1 Task ID click for details Computer Sent Time reported or deadline explain Server state explain Outcome explain Client state explain CPU time (sec) claimed credit granted credit 127731681 586364 19 Dec 2007 4:52:32 UTC 29 Dec 2007 11:32:54 UTC Over Success Done 10,586.59 21.94 28.41 129795554 510574 29 Dec 2007 4:53:10 UTC 1 Jan 2008 21:14:50 UTC Over Client error Aborted by user 0.00 0.00 --- This task already completed by another cruncher, received credits so I aborted. Just noticed that original was under 5.89 but 5.90 on my PC. |
cnick6 Send message Joined: 30 May 06 Posts: 29 Credit: 12,597,623 RAC: 0 |
FYI, my WinXP laptop (5.90), running with only 512 MB, hit a memory wall with this workunit: mgth-1-1t43_a_w012_MolecularReplacement_2482_7467 At 96% I got a 'waiting for memory' message. This laptop has 768 MB, but the two mini-PCI cards need help seating properly so the contacts fully engage. Once it was back up to the full 768 MB, Rosetta restarted the workunit (back to 0%) and is processing normally. I'll be curious whether it hits another memory wall. |
cnick6 Send message Joined: 30 May 06 Posts: 29 Credit: 12,597,623 RAC: 0 |
FYI, my memory wall didn't happen last night. The workunit completed fine. |
Tribaal Send message Joined: 6 Feb 06 Posts: 80 Credit: 2,754,607 RAC: 0 |
I am running rosetta_beta_5.91_i686-pc-linux-gnu. The problem I am I'm having the exact same issue. It is quite a problem, since most of my machines run headless - I don't have the luxury to spend time logging into each one to abort tasks manually... I use 5.91 on GNU/linux (on all of my machines) - Trib' |
Luuklag Send message Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
If you have one PC with a monitor and BOINC, you can log in to the other computers from that computer and abort the tasks that way. That shouldn't take more than half an hour, I guess, unless you have something like 40 PCs. |
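
For headless machines like Tribaal's, the remote-control approach suggested in the last post can probably be scripted instead of done by hand. The following is only a minimal sketch, not something taken from the thread. It assumes the `boinccmd` command-line tool is installed on the controlling PC, that each remote client has been configured to accept remote GUI RPC connections with a known password, and that the project URL below matches the one the clients are attached to. The host names, password, and task name are hypothetical placeholders.

```python
import subprocess
from typing import List

# Assumption: the project URL the clients are attached to.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def build_abort_command(host: str, password: str, task_name: str,
                        project_url: str = PROJECT_URL) -> List[str]:
    """Build the boinccmd argument list that aborts one task on one host."""
    return [
        "boinccmd",
        "--host", host,          # remote BOINC client to control
        "--passwd", password,    # GUI RPC password of that client
        "--task", project_url, task_name, "abort",
    ]


def abort_on_hosts(hosts: List[str], password: str, task_name: str,
                   dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run is set, execute) one abort command per host.

    Returns the list of commands so they can be inspected or logged.
    """
    commands = [build_abort_command(h, password, task_name) for h in hosts]
    if not dry_run:
        for cmd in commands:
            # check=False: a host that is down should not stop the loop.
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and task name, for illustration only.
    for cmd in abort_on_hosts(["node1", "node2"], "secret",
                              "example_stuck_task_0"):
        print(" ".join(cmd))
```

With `dry_run=True` (the default here) nothing is sent to the clients; the commands are only printed, which makes it easy to check them before actually aborting anything.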