Problems with Rosetta version 5.98

Message boards : Number crunching : Problems with Rosetta version 5.98


Previous · 1 · 2 · 3 · 4 · 5 · 6 · 7 . . . 10 · Next

Author / Message
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4016
Credit: 0
RAC: 0
Message 54640 - Posted: 24 Jul 2008, 13:27:10 UTC - in response to Message 54624.  
Last modified: 24 Jul 2008, 13:32:19 UTC

...two previous Rosetta@home files were deleted when the BOINC Manager received a server request from Rosetta@home! I wondered whether this workunit's crash was connected with the deletion of those two files.


The deleted files should be the databases used by the "mini" version of Rosetta, so if this had been a mini task, a connection would be likely. Others have reported that the BOINC client did not seem to recognize that existing tasks still required those files. Since yours was not a mini task, I do not believe the deleted files are related to the problem you ran into.

Here is a link to DK's post about that over on Ralph.
Rosetta Moderator: Mod.Sense
Path7

Joined: 25 Aug 07
Posts: 128
Credit: 61,751
RAC: 0
Message 54679 - Posted: 27 Jul 2008, 0:04:27 UTC

Hello all,

I just got a compute error twice, Exit status 193 (0xc1), on my Ubuntu 7.10 x86 machine.
Link to results
BOINC gave me the following messages:
za 26 jul 2008 19:04:38 CEST|rosetta@home|Computation for task t499__BOINC_SYMMETRY_D2SYMM_FOLD_AND_DOCK_RELAX-t499_-_4244_6669_0 finished
za 26 jul 2008 19:04:38 CEST|rosetta@home|Output file t499__BOINC_SYMMETRY_D2SYMM_FOLD_AND_DOCK_RELAX-t499_-_4244_6669_0_0 for task t499__BOINC_SYMMETRY_D2SYMM_FOLD_AND_DOCK_RELAX-t499_-_4244_6669_0 absent

zo 27 jul 2008 00:10:28 CEST|rosetta@home|Computation for task t498__BOINC_SYMMETRY_D2SYMM_FOLD_AND_DOCK_RELAX-t498_-_4244_13407_0 finished
zo 27 jul 2008 00:10:28 CEST|rosetta@home|Output file t498__BOINC_SYMMETRY_D2SYMM_FOLD_AND_DOCK_RELAX-t498_-_4244_13407_0_0 for task t498__BOINC_SYMMETRY_D2SYMM_FOLD_AND_DOCK_RELAX-t498_-_4244_13407_0 absent

Have a nice day,
Path7.
Profile Robert Gammon

Joined: 9 Nov 07
Posts: 14
Credit: 969,848
RAC: 0
Message 54690 - Posted: 27 Jul 2008, 13:06:10 UTC

BOINC Client 5.10.45
Rosetta 5.98
WinXP SP2 - intermittently connected to Internet

This problem reproduces with almost any WU.

The scenario is that the laptop is connected to the Internet long enough to upload completed results and to request/download new work. The laptop then disconnects from the Internet to begin number crunching.

Rosetta processes the work unit to a variable point (I have seen 55%, 72%, 88%, and 97% completion), then, for one reason or another, BOINC shuts down (XP locks up and needs a reboot, the power fails, or it's time to shut down for the night).

Note that some of these BOINC shutdowns are orderly and others are not. The result is the same regardless of how we got there: Rosetta RESTARTS AT ZERO!! The WU gets reprocessed, redoing the work of 2-4 hours compute time.
Profile Robert Gammon

Joined: 9 Nov 07
Posts: 14
Credit: 969,848
RAC: 0
Message 54691 - Posted: 27 Jul 2008, 16:40:05 UTC - in response to Message 54690.  

BOINC Client 5.10.45
Rosetta 5.98
WinXP SP2 - intermittently connected to Internet

[snip]
The WU gets reprocessed, redoing the work of 2-4 hours compute time.


I just reproduced this again. I did an orderly shutdown to move the laptop. Rosetta was at 95.583% complete.

When BOINC restarted, Setiathome was the selected task. I let that run for about 5 minutes, then suspended Seti and allowed Rosetta to restart.

In a few moments, 0.00% complete, 3:55:20 to completion!!!

On computers with more than one project active, if this is NOT unique to my laptop, switching from Rosetta to another project and then back to Rosetta should show the same behavior. Note that the interval between switching tasks is a configuration item on every project's Account Info page. Mine is set to 3 hours.

I cannot test this, as I only have access to a single computer.
Profile Robert Gammon

Joined: 9 Nov 07
Posts: 14
Credit: 969,848
RAC: 0
Message 54696 - Posted: 27 Jul 2008, 21:35:01 UTC - in response to Message 54691.  

BOINC Client 5.10.45
Rosetta 5.98
WinXP SP2 - intermittently connected to Internet

[snip]
In a few moments, 0.00% complete, 3:55:20 to completion!!!

On computers with more than one project active, if this is NOT unique to my laptop, switching from Rosetta to another project and then back to Rosetta should show the same behavior. Note that the interval between switching tasks is a configuration item on every project's Account Info page. Mine is set to 3 hours.

I cannot test this, as I only have access to a single computer.


I tried again: I suspended the project, waited 30 minutes while I did some other work, then resumed it, and EUREKA, it WORKED! Execution continued from the spot where it left off when the Suspend was issued.

So it seems that the signal BOINC issues when the user EXITS the application leaves the Rosetta work unit in an unstable state, the same as an abort caused by a power failure. SUSPEND appears to act differently: Rosetta pauses the work in an orderly way.

Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4016
Credit: 0
RAC: 0
Message 54697 - Posted: 28 Jul 2008, 0:48:36 UTC

Robert, suspending (at least when you have BOINC set to leave applications in memory while suspended) is entirely different from BOINC shutting down.

What you are seeing is normal and expected. It will differ between types of work units: some checkpoint more frequently than others, and some complete models more frequently than others. Since this is not a problem specific to v5.98, please open a new thread if you would like to discuss it further.
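To illustrate why exiting BOINC (or losing power) can cost work while a suspend with the application kept in memory does not, here is a minimal C++ sketch of checkpoint-and-resume logic. It is not Rosetta's code: the checkpoint file, the `Progress` struct and the step counter are hypothetical, and a real task saves far more state, at points that vary by work unit type as described above.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical progress record; a real application would also save model state.
struct Progress {
    long steps_done = 0;
};

// Overwrite the checkpoint file with the current progress.
bool write_checkpoint(const std::string& path, const Progress& p) {
    std::ofstream out(path, std::ios::trunc);
    out << p.steps_done << '\n';
    return static_cast<bool>(out);
}

// Load the last checkpoint; a missing or unreadable file means "start from zero".
Progress read_checkpoint(const std::string& path) {
    Progress p;
    std::ifstream in(path);
    if (!(in >> p.steps_done)) p.steps_done = 0;
    return p;
}

// Run one session of a task with total_steps units of work, checkpointing every
// checkpoint_every steps. stop_after simulates BOINC exiting (or a power failure)
// after that many steps in this session; pass -1 to run to completion.
// Returns the number of steps computed in this session.
long run_session(const std::string& path, long total_steps,
                 long checkpoint_every, long stop_after) {
    assert(checkpoint_every > 0);
    Progress p = read_checkpoint(path);  // resume from the last checkpoint, if any
    long done_this_session = 0;
    while (p.steps_done < total_steps) {
        if (stop_after >= 0 && done_this_session == stop_after) {
            return done_this_session;  // killed: steps since the last checkpoint are lost
        }
        ++p.steps_done;  // one unit of work
        ++done_this_session;
        if (p.steps_done % checkpoint_every == 0) {
            write_checkpoint(path, p);
        }
    }
    write_checkpoint(path, p);  // final state
    return done_this_session;
}
```

In this sketch, stopping at step 25 with a 10-step checkpoint interval throws away 5 steps, and the next session resumes at step 20. A task that has not written its first checkpoint when it is stopped would start again from zero, which is consistent with the restart-at-zero reports above.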
Rosetta Moderator: Mod.Sense
Profile Korz53

Joined: 22 Apr 07
Posts: 2
Credit: 144,960
RAC: 0
Message 54711 - Posted: 29 Jul 2008, 0:07:57 UTC
Last modified: 29 Jul 2008, 0:16:34 UTC

No graphics appear when I click Show Graphics while Rosetta is running. CPU usage is high in kernel_task (36.5%) when minirosetta is running. BOINC shows as "Not Responding", possibly due to minirosetta, and I need to quit BOINC and restart it to reset it. BOINC "Not Responding" has been happening on and off.

Model Name: iMac
Model Identifier: iMac6,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2.33 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 4 MB
Memory: 3 GB
Bus Speed: 667 MHz
Boot ROM Version: IM61.0093.B07
SMC Version: 1.10f2
AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 54778 - Posted: 31 Jul 2008, 14:39:04 UTC

A couple of t498__BOINC_SYMMETRY_D2SYMM_FOLD_AND_DOCK_RELAX-t498_-_4244_ WUs ended with a segmentation violation on two different Linux computers. The stack trace looks similar in each case.

https://boinc.bakerlab.org/rosetta/result.php?resultid=180458759
https://boinc.bakerlab.org/rosetta/result.php?resultid=180399471
Profile Greg_BE

Joined: 30 May 06
Posts: 4873
Credit: 4,306,376
RAC: 2,125
Message 54782 - Posted: 31 Jul 2008, 17:38:43 UTC
Last modified: 31 Jul 2008, 17:40:10 UTC

t498__BOINC_SYMMETRY_D2SYMM_FOLD_AND_DOCK_RELAX-t498_-_4244_5892_0

50 minutes of computation, and then:
Exit status -1073741819 (0xc0000005)
CPU runtime 3013.89 secs
stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3449163


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0093E2E5 write attempt to address 0x1610F000

Engaging BOINC Windows Runtime Debugger...



Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C911D8F read attempt to address 0xFFFFFFF8

Engaging BOINC Windows Runtime Debugger...

# cpu_run_time_pref: 14400
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...


...and a lot of other stuff, mostly PDB symbols.

Is this going to become a common theme?
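The negative exit status in the log above and the hex code printed next to it are the same 32-bit value: 0xc0000005 is the Windows access-violation code (the "Access Violation (0xc0000005)" in the stderr output), which is 3,221,225,477 unsigned, and 3,221,225,477 - 4,294,967,296 = -1,073,741,819 when the same bits are read as a signed 32-bit integer. A small C++ sketch of that conversion; the function name is made up for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Reinterpret a signed 32-bit exit status as the unsigned Windows status code
// it came from and format it in hex, e.g. -1073741819 -> "0xc0000005".
std::string exit_status_hex(std::int32_t status) {
    // Conversion to an unsigned type is defined as modulo 2^32, so the bits are kept.
    const std::uint32_t code = static_cast<std::uint32_t>(status);
    char buf[11];  // "0x" + up to 8 hex digits + terminating '\0'
    const int n = std::snprintf(buf, sizeof buf, "0x%lx",
                                static_cast<unsigned long>(code));
    assert(n > 0 && n < static_cast<int>(sizeof buf));  // the text fit in buf
    return std::string(buf);
}
```

The same conversion maps the Exit status 193 reported earlier in the thread to 0xc1.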
Profile adrianxw

Joined: 18 Sep 05
Posts: 608
Credit: 8,997,172
RAC: 6,691
Message 54788 - Posted: 31 Jul 2008, 19:26:31 UTC

Had this one crash and put up the "Rosetta has encountered an error and needs to close" dialog box; I have not seen that for a while.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Profile UBT - The Prof....

Joined: 5 Nov 06
Posts: 1
Credit: 18,584
RAC: 0
Message 54789 - Posted: 31 Jul 2008, 20:36:12 UTC

I have had so many crashes in the last few days with "client error", etc., that I think I am going to crunch something else for a while until this gets properly debugged.
Profile Greg_BE

Joined: 30 May 06
Posts: 4873
Credit: 4,306,376
RAC: 2,125
Message 54790 - Posted: 31 Jul 2008, 21:03:53 UTC - in response to Message 54789.  

I have had so many crashes in the last few days with "client error", etc., that I think I am going to crunch something else for a while until this gets properly debugged.


You should post this as well, so they know what's going on...
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
Too many restarts with no progress. Keep application in memory while preempted.
======================================================
DONE :: 1 starting structures 96.3281 cpu seconds
This process generated 0 decoys from 0 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>h001__BOINC_CASP8_ABRELAX_RANGE_tvat_d2r__IGNORE_THE_REST-S25-6-S3-9--h001_-_4307_155_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


You got that how many times? 6 or so?
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4016
Credit: 0
RAC: 0
Message 54802 - Posted: 1 Aug 2008, 13:22:54 UTC

Too many restarts with no progress


Normally this would be due to ending BOINC before the tasks can checkpoint, or suspending a task before it can checkpoint without leaving suspended tasks in memory. However, I don't see the output lines showing how many times it started, and with what runtime preference, that I would normally expect to see if that were truly what had occurred.
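One way to picture the check behind that message: a counter, saved with the checkpoint, that goes up each time the task starts without any new checkpointed progress, plus a limit past which the task gives up instead of looping forever. The C++ sketch below works under that assumption; the names and the limit of 5 are hypothetical, not Rosetta's actual values.

```cpp
#include <cassert>

// Hypothetical bookkeeping, persisted alongside the checkpoint so it survives restarts.
struct RestartState {
    int starts_without_progress = 0;
    long last_checkpointed_step = 0;
};

// Hypothetical limit; the real application's threshold is not stated in this thread.
constexpr int kMaxStartsWithoutProgress = 5;

// Call once each time the task starts, with the step restored from its checkpoint.
// Returns false when the task should give up with "too many restarts".
bool on_task_start(RestartState& s, long restored_step) {
    assert(restored_step >= s.last_checkpointed_step);  // checkpoints only move forward
    if (restored_step > s.last_checkpointed_step) {
        // A checkpoint was written since the last start: real progress, reset the count.
        s.last_checkpointed_step = restored_step;
        s.starts_without_progress = 0;
    } else {
        // Starting again from the same point as last time (the first start counts too).
        ++s.starts_without_progress;
    }
    return s.starts_without_progress <= kMaxStartsWithoutProgress;
}
```

Under this model, a task that is stopped before each checkpoint climbs toward the limit, while keeping preempted applications in memory (as the message itself suggests) avoids the restarts caused by preemption.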

UBT, does your machine run 24/7? Do you run other projects? Some background may prove helpful.
Rosetta Moderator: Mod.Sense
Profile Greg_BE

Joined: 30 May 06
Posts: 4873
Credit: 4,306,376
RAC: 2,125
Message 54805 - Posted: 1 Aug 2008, 16:39:49 UTC

t499__BOINC_SYMMETRY_D2SYMM_FOLD_AND_DOCK_RELAX-t499_-_4244_11023_0

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3439032
# cpu_run_time_pref: 14400
# random seed: 3439032
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
======================================================
DONE :: 1 starting structures 12695.9 cpu seconds
This process generated 7 decoys from 7 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>

I LOST 7 points on this one... why would you lose points instead of breaking even?
Profile Greg_BE

Joined: 30 May 06
Posts: 4873
Credit: 4,306,376
RAC: 2,125
Message 54806 - Posted: 1 Aug 2008, 16:41:33 UTC

t499__BOINC_SYMMETRY_C4SYMM_FOLD_AND_DOCK_RELAX-t499_-_4244_16487_0
CPU time 14543.28
stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3423568
# cpu_run_time_pref: 14400
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
# cpu_run_time_pref: 14400
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
======================================================
DONE :: 1 starting structures 14543.1 cpu seconds
This process generated 7 decoys from 7 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>

0.40 credit LOST on this one.
Profile Greg_BE

Joined: 30 May 06
Posts: 4873
Credit: 4,306,376
RAC: 2,125
Message 54807 - Posted: 1 Aug 2008, 16:44:13 UTC

t498__BOINC_SYMMETRY_C3SYMM_FOLD_AND_DOCK_RELAX-t498_-_4244_17219_0
CPU time 14213.28
stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3447836
# cpu_run_time_pref: 14400
# cpu_run_time_pref: 14400
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
WARNING! Not sure non-ideal rotamers are compatible with symmetry yet...
======================================================
DONE :: 1 starting structures 14212.5 cpu seconds
This process generated 26 decoys from 26 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>

Gained some serious credit on this one, even with the errors.
Profile Greg_BE

Joined: 30 May 06
Posts: 4873
Credit: 4,306,376
RAC: 2,125
Message 54808 - Posted: 1 Aug 2008, 16:46:31 UTC - in response to Message 54802.  
Last modified: 1 Aug 2008, 16:47:24 UTC

Mod: check out all the errors in his profile. He has a lot of <error_code>-161</error_code> errors.

https://boinc.bakerlab.org/rosetta/results.php?hostid=837076

Too many restarts with no progress


Normally this would be due to ending BOINC before the tasks can checkpoint, or suspending a task before it can checkpoint without leaving suspended tasks in memory. However, I don't see the output lines showing how many times it started, and with what runtime preference, that I would normally expect to see if that were truly what had occurred.

UBT, does your machine run 24/7? Do you run other projects? Some background may prove helpful.
mike

Joined: 10 Dec 05
Posts: 1
Credit: 77,598
RAC: 0
Message 54897 - Posted: 3 Aug 2008, 22:32:38 UTC

I seem to have a problem with the minirosetta application not starting. I just installed the new version of BOINC. I keep getting Windows error messages telling me that there was a problem and asking me to send an error report to Microsoft. I tried aborting the work units to get more, but it still gives me error messages. It seems that when it tries to start a work unit, it goes immediately to 100% and the error message pops up.
Profile Greg_BE

Joined: 30 May 06
Posts: 4873
Credit: 4,306,376
RAC: 2,125
Message 54904 - Posted: 4 Aug 2008, 5:46:45 UTC - in response to Message 54897.  

I seem to have a problem with the minirosetta application not starting. I just installed the new version of BOINC. I keep getting Windows error messages telling me that there was a problem and asking me to send an error report to Microsoft. I tried aborting the work units to get more, but it still gives me error messages. It seems that when it tries to start a work unit, it goes immediately to 100% and the error message pops up.


You need to post this over in the 1.28 thread, not here in 5.98.
Profile adrianxw

Joined: 18 Sep 05
Posts: 608
Credit: 8,997,172
RAC: 6,691
Message 54926 - Posted: 5 Aug 2008, 16:00:36 UTC
Last modified: 5 Aug 2008, 16:05:00 UTC

Had this one crash on me today with an unhandled exception (while I was out, of course), and it put up the "Rosetta has encountered..." message box, which leaves that core dead until I return to click the OK button. It should not do this under ANY circumstances, yet this is the second time recently.

I have Rosetta on machines at remote sites which I don't visit often. If it happens out there, the core/machine is dead until I next get there.

try
{
     Rosetta
}
catch (...)
{
     Bomb out
}
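The suggestion above can be made a little more concrete. The C++ sketch below wraps the work in a top-level handler that logs the failure and returns an error code, so the process exits on its own rather than leaving a dialog on screen. `run_guarded` is a hypothetical name, and a real entry point would pass in whatever function does the actual computation. One caveat: standard `catch (...)` only sees C++ exceptions. The access violations (0xc0000005) reported in this thread are hardware faults, which it does not portably catch, so on Windows the error dialog would also have to be suppressed at the operating-system level (for example through the `SetErrorMode` API).

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>

// Run the work and turn any escaping C++ exception into a logged message and a
// nonzero exit code, so the process ends on its own instead of waiting for a user.
// Caveat: catch (...) does not portably catch hardware faults such as access
// violations or segmentation faults, which are not C++ exceptions.
int run_guarded(const std::function<void()>& work) {
    assert(work);  // an empty std::function means there is nothing to run
    try {
        work();
        return 0;
    } catch (const std::exception& e) {
        std::cerr << "fatal error: " << e.what() << '\n';
        return 1;
    } catch (...) {
        std::cerr << "fatal error: unknown exception\n";
        return 1;
    }
}
```

In `main`, this would read `return run_guarded(work);`, giving the BOINC client a nonzero exit status it could report as an error instead of a core left waiting on a dialog.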

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.




©2020 University of Washington
https://www.bakerlab.org