Rosetta@home

Minirosetta v1.32 bug thread

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 961
ID: 14
Credit: 2,369,109
RAC: 1,381
Message 54936 - Posted 5 Aug 2008 23:38:40 UTC

Please post bugs/issues with minirosetta v1.32 here.

toribio

Joined: Jul 26 07
Posts: 1
ID: 193617
Credit: 42,493
RAC: 0
Message 54950 - Posted 6 Aug 2008 8:34:39 UTC

I'm sorry if this is a server issue rather than something to do with the new version, but since the new version came out I have been unable to receive any WUs, and none of my settings have changed. Once again, I apologize if this is just the servers and not the version. It may be just a coincidence.

DJStarfox

Joined: Jul 19 07
Posts: 140
ID: 191721
Credit: 575,994
RAC: 722
Message 54971 - Posted 7 Aug 2008 3:41:46 UTC

This WU died on startup:
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=165026170

Terrasapiens

Joined: Apr 25 08
Posts: 15
ID: 254884
Credit: 241,058
RAC: 39
Message 54972 - Posted 7 Aug 2008 4:42:33 UTC
Last modified: 7 Aug 2008 4:46:48 UTC

The task below just failed on my machine. This is the first mini rosetta work unit to get processed since the new release and the error seems to be similar to the errors that I was repeatedly getting on version 1.28.

http://boinc.bakerlab.org/rosetta/result.php?resultid=182934444

BTW, version 5.98 WUs have been running successfully.

netwraith Profile

Joined: Sep 3 06
Posts: 80
ID: 109740
Credit: 13,483,227
RAC: 0
Message 54982 - Posted 7 Aug 2008 15:48:49 UTC

Task reported too late to validate ?????

Here are two tasks submitted by a new machine I just brought up. It's a slightly older 2.34 GLIBC machine using 5.8.16 for a client (All the newer ones want GLIBC 2.4+)

The task was obtained, crunched and submitted the next day without any abort or error indicated in the STDOUT.. But with 0.00 credit ??? (The machine is only crunching Rosetta at this point).

What is up with this ?? ... Could it be the machine ?? This is the first time I have seen this one... (and ... yes, the clocks are synchronized by NTP, so TOD should not be an issue)

http://boinc.bakerlab.org/rosetta/result.php?resultid=182943687

stderr out

<core_client_version>5.8.16</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 43200
======================================================
DONE :: 1 starting structures 43113 cpu seconds
This process generated 84 decoys from 84 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>

Validate state Task was reported too late to validate
Claimed credit 155.951746654641
Granted credit 0
application version 1.32

http://boinc.bakerlab.org/rosetta/result.php?resultid=182943208

stderr out

<core_client_version>5.8.16</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 43200
======================================================
DONE :: 1 starting structures 42875.1 cpu seconds
This process generated 123 decoys from 123 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>

Validate state Task was reported too late to validate
Claimed credit 155.091236560251
Granted credit 0
application version 1.32



____________
Looking for a team ??? Join BoincSynergy!!


Jim Profile

Joined: Oct 15 06
Posts: 22
ID: 119359
Credit: 5,127,547
RAC: 0
Message 54986 - Posted 7 Aug 2008 18:48:03 UTC - in response to Message ID 54936.

Please post bugs/issues with minirosetta v1.32 here.


Had the first one I got fail.
ERROR: Cannot find file 'minirosetta_database\chemical/residue_type_sets/centroid/CSD_ATOM_TYPE_SET fa_standard'
ERROR:: Exit from: ..\..\src\core\chemical\residue_io.cc line: 132
called boinc_finish

It is WU ID 165103113

____________

netwraith Profile

Joined: Sep 3 06
Posts: 80
ID: 109740
Credit: 13,483,227
RAC: 0
Message 54988 - Posted 7 Aug 2008 21:05:11 UTC - in response to Message ID 54982.

Found the problem with this one: I copied the directory from another cruncher and forgot to dump the client_state. My bad.

____________
Looking for a team ??? Join BoincSynergy!!


David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 961
ID: 14
Credit: 2,369,109
RAC: 1,381
Message 54990 - Posted 7 Aug 2008 22:41:33 UTC - in response to Message ID 54986.

Please post bugs/issues with minirosetta v1.32 here.


Had the first one I got fail.
ERROR: Cannot find file 'minirosetta_database\chemical/residue_type_sets/centroid/CSD_ATOM_TYPE_SET fa_standard'
ERROR:: Exit from: ..\..\src\core\chemical\residue_io.cc line: 132
called boinc_finish

It is WU ID 165103113



You can ignore these quick failures. They are old jobs that are getting reissued due to errors or past deadlines. These old jobs will fail because they use an incompatible database. There shouldn't be too many of these so please just ignore them.

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55002 - Posted 8 Aug 2008 18:31:46 UTC

Are there no new 1.32 work units?
I have only gotten 2 so far, and the rest are all 5.98.
Was the last download of 1.32 on the 6th, if the report date is the 16th?

BrnmccO1

Joined: Jun 26 07
Posts: 17
ID: 186323
Credit: 578,825
RAC: 0
Message 55013 - Posted 10 Aug 2008 5:17:04 UTC

Hey David, thanks for clearing that up for us. I was about to ask, "Are there some files missing from the database?"

Anyhow, just out of curiosity, I checked the Projects folder on all my machines, and it appears that the Mini-Rosetta rev 1.28 executable is gone but the old Database, "minirosetta_database_rev23035" is still there. The new 23513 is there as well. Will there be any more use for the 23035 that went with 1.28? Or is it safe to delete it?

Thanks, keep up the good work guys!

P.S. When does CASP8 wrap up?
____________

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55020 - Posted 10 Aug 2008 14:13:42 UTC
Last modified: 10 Aug 2008 14:25:25 UTC

I am now using BOINC Manager 6.2.16 and had to restart the project, as folders and files from the old 5.10.45, including tasks, were not being picked up.

The errors I post here are from a new batch of work:

http://boinc.bakerlab.org/rosetta/result.php?resultid=183864069
http://boinc.bakerlab.org/rosetta/result.php?resultid=183864068
http://boinc.bakerlab.org/rosetta/result.php?resultid=183862855
http://boinc.bakerlab.org/rosetta/result.php?resultid=183862854

There are many more errors from various 1.32 tasks that would take too much time to post, but the error message is more or less the same:

<core_client_version>6.2.16</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>

ERROR: in::file::zip minirosetta_database_rev23035.zip does not exist!
ERROR:: Exit from: ..\..\src\apps\public\boinc\minirosetta.cc line: 105
called boinc_finish
</stderr_txt>


Here is another random sample from a total of 30+ failed work units across 5.98 and 1.32:

http://boinc.bakerlab.org/rosetta/result.php?resultid=183857588
<core_client_version>6.2.16</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR: Option file open failed for: 062408_bblum_abrelax_flags

</stderr_txt>
]]>

0 secs run time

Speedy

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 521,019
RAC: 10
Message 55050 - Posted 12 Aug 2008 6:22:00 UTC - in response to Message ID 54990.


You can ignore these quick failures. They are old jobs that are getting reissued due to errors or past deadlines. These old jobs will fail because they use an incompatible database. There shouldn't be too many of these so please just ignore them.

What's up with this task?

Task ID 184003550
ERROR: Cannot find file 'minirosetta_database\chemical/residue_type_sets/fa_standard/CSD_ATOM_TYPE_SET fa_standard'
ERROR:: Exit from: ..\..\src\core\chemical\residue_io.cc line: 132
called boinc_finish

Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Validate state Invalid
Yet the wing man has completed it good as gold.
Is this normal?
Speedy
____________
Have a crunching good day!!

Sue

Joined: Aug 11 08
Posts: 1
ID: 273242
Credit: 146
RAC: 0
Message 55075 - Posted 14 Aug 2008 0:32:25 UTC

I had 3 that wouldn't run, and it crashed my boinc. I don't have a log since when it closed it erased it. I have no more work units left to be processed.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 961
ID: 14
Credit: 2,369,109
RAC: 1,381
Message 55078 - Posted 14 Aug 2008 4:27:31 UTC - in response to Message ID 55050.


You can ignore these quick failures. They are old jobs that are getting reissued due to errors or past deadlines. These old jobs will fail because they use an incompatible database. There shouldn't be too many of these so please just ignore them.

What's up with this task?

Task ID 184003550
ERROR: Cannot find file 'minirosetta_database\chemical/residue_type_sets/fa_standard/CSD_ATOM_TYPE_SET fa_standard'
ERROR:: Exit from: ..\..\src\core\chemical\residue_io.cc line: 132
called boinc_finish

Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Validate state Invalid
Yet the wing man has completed it good as gold.
Is this normal?
Speedy



The success was from version 1.28. Version 1.32 will fail on these older tasks, a few of which are still lingering around.

BrnmccO1

Joined: Jun 26 07
Posts: 17
ID: 186323
Credit: 578,825
RAC: 0
Message 55085 - Posted 14 Aug 2008 20:41:25 UTC
Last modified: 14 Aug 2008 20:42:00 UTC

Here's another Rosetta Mini unhandled exception error. I got quite a lot of these from 1.28, and this is the only one I have gotten from version 1.32 so far. It's always on just the one computer, and always from the Rosetta Minis. I've never gotten one of these errors from 5.82 or 5.98 on any system. Here's a link to the failed WU:

184522226
____________

BrnmccO1

Joined: Jun 26 07
Posts: 17
ID: 186323
Credit: 578,825
RAC: 0
Message 55090 - Posted 15 Aug 2008 20:53:37 UTC

Another one bites the dust: 184906376

What causes this access violation? And again, it only happens with the Minis, not Beta.
____________

eaglezero

Joined: Feb 27 08
Posts: 1
ID: 244294
Credit: 7,826
RAC: 0
Message 55094 - Posted 16 Aug 2008 6:11:38 UTC

Every morning for the last week, I find the computer frozen, and the mini rosetta on the task bar. I have detached from the project, to see what happens with the other projects.. just for info.

Max DesGeorges

Joined: Oct 1 05
Posts: 35
ID: 2201
Credit: 942,527
RAC: 0
Message 55125 - Posted 17 Aug 2008 9:22:30 UTC - in response to Message ID 55094.

When I close the graphics window, Vista tells me that there is an error with "minirosetta_graphics_1.30_windows_intelx86...". Even so, the WU keeps running normally.

Windows Vista 32, BOINC 6.3.8.
____________

OneMeanSloth

Joined: Jan 1 08
Posts: 1
ID: 231600
Credit: 109,239
RAC: 0
Message 55141 - Posted 17 Aug 2008 18:18:53 UTC - in response to Message ID 55094.

Every morning for the last week, I find the computer frozen, and the mini rosetta on the task bar. I have detached from the project, to see what happens with the other projects.. just for info.



I am having this same issue.

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 55155 - Posted 18 Aug 2008 11:20:54 UTC

This was a strange one. 185494746
The library was activated and an email was sent off. The work unit stopped responding. Boinc had to be shut down and restarted at which point the unit went to 100% and reported.

The stderr txt showed 'needs psipred_ss2 to run filters'
access violation

Despite this it was reported to be a success


____________

Philosopher2

Joined: Mar 28 06
Posts: 3
ID: 69257
Credit: 111,037
RAC: 0
Message 55156 - Posted 18 Aug 2008 11:43:06 UTC - in response to Message ID 54936.

Please post bugs/issues with minirosetta v1.32 here.


I downloaded and installed v1.32 and the wu came up!

WU 2reb_JUMPRELAX_PREDJUMP_FROMPREDFRAG_SAVE_ALL_OUT-2re_-_4420_740_0 has been running.

This WU completed 95 percent of processing in approximately 3 to 4 hours.

From 95.360 percent, I have observed that it has taken 9 hours of processing to move up to 98.809 percent!

This WU is due to complete on 8/22/08! At this rate of progress I wonder if it will.

Is this the expected behaviour?

The time to completion has moved from 00:09:51 to 00:09:54 during these last five days.

I am running two other BOINC applications, so time is made available to each application in turn, 50 minutes at a time.

Please advise: should I abort it, or let it carry on to the end, whatever that turns out to be?

____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3389
ID: 106194
Credit: 0
RAC: 0
Message 55157 - Posted 18 Aug 2008 12:26:57 UTC

Philosopher2

It all sounded normal up until you said it has been running for 5 days. But since you are running other projects, we would need to look at the CPU time used to understand how much of that time this task was actually running.

You see, the initial runtime is just an estimate. The actual runtime is related to the runtime preference in your Rosetta preferences (3 hours is the default). And this is a target, not a hard and fast limit. Rosetta will do its best to complete within that target time if possible, but that is not always possible. In cases where it is not possible to complete in your desired runtime, the time estimate will get down to about 11 minutes and then move exponentially slower until the task completes and jumps to 100%.

There is a watchdog thread that will check in on the tasks every 15 minutes or so and see if it thinks things are running normally or not.

I suggest the following:

If that task has 5 or more days of CPU time (120 hours), which is pretty unlikely, then abort it. If it has run for 9 hours, let it run.

If not, take a look at what your Rosetta preferences have configured for the venue of that PC, and post back here with the details of both your preference and the actual CPU time you see for the task. The watchdog will end the task if it runs longer than 5 times your runtime preference, so that would be 15 hours with the defaults. You should follow the same guideline.
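
To put numbers on that guideline, here is a small Python sketch. It only illustrates the arithmetic described above (5 times the runtime preference, with a 3 hour default); it is not the watchdog's actual code, and the function names are made up for the example.

```python
# Values taken from the explanation above; names are hypothetical.
DEFAULT_RUNTIME_PREF_HOURS = 3   # default Rosetta runtime preference
WATCHDOG_MULTIPLIER = 5          # task is ended after 5x the preference


def watchdog_limit_hours(runtime_pref_hours=DEFAULT_RUNTIME_PREF_HOURS):
    """CPU hours after which the task would be ended under the 5x guideline."""
    return WATCHDOG_MULTIPLIER * runtime_pref_hours


def should_abort(cpu_hours, runtime_pref_hours=DEFAULT_RUNTIME_PREF_HOURS):
    """True if the task has used more CPU time than the guideline allows."""
    return cpu_hours > watchdog_limit_hours(runtime_pref_hours)


print(watchdog_limit_hours())   # limit with the default preference
print(should_abort(9))          # a task with 9 CPU hours
print(should_abort(120))        # a task with 5 days of CPU time
```

Note that the comparison uses CPU time, not wall-clock time, which is why the CPU time of the task matters more than how many days it has been sitting on the machine.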

I would also suggest you review your computing preferences and check the box to keep tasks in memory while suspended. Since you are switching projects every 50 minutes, you will be losing a lot of work if you do not keep the tasks in memory.

Oh, and you shouldn't have to download anything manually. You said you downloaded 1.32. I wasn't positive if you meant that you did this manually, or if BOINC did this during the normal file transfers.
____________
Rosetta Moderator: Mod.Sense

lusvladimir

Joined: Oct 18 05
Posts: 12
ID: 5401
Credit: 1,784,854
RAC: 0
Message 55173 - Posted 19 Aug 2008 5:46:02 UTC - in response to Message ID 54936.
Last modified: 19 Aug 2008 5:52:15 UTC

Please post bugs/issues with minirosetta v1.32 here.


Debian Linux;Boinc Manager 6.2.14

errors from 1.32 tasks

http://boinc.bakerlab.org/rosetta/result.php?resultid=185499939
http://boinc.bakerlab.org/rosetta/result.php?resultid=185493038
http://boinc.bakerlab.org/rosetta/result.php?resultid=185493027
http://boinc.bakerlab.org/rosetta/result.php?resultid=185493026
http://boinc.bakerlab.org/rosetta/result.php?resultid=185489375


<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# cpu_run_time_pref: 3600
needs psipred_ss2 to run filters
needs psipred_ss2 to run filters
SIGSEGV: segmentation violation
Stack trace (19 frames):
[0x8926f8f]
[0x89514e0]
[0xb7f19400]
[0x880c924]
[0x834349c]
[0x88bcc81]
[0x880c5d6]
[0x829591c]
[0x85f20f6]
[0x8072b5d]
[0x807e2e7]
[0x8165bee]
[0x80abecc]
[0x80a9ea4]
[0x80d7044]
[0x80d8651]
[0x804b9f8]
[0x89acfdc]
[0x8048111]

Exiting...

</stderr_txt>
]]>
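
A note on reading the `<message>` line: the three numbers look like the same exit code written in decimal, in hex, and as the decimal value minus 256. That last part is only inferred from the messages pasted in this thread (193 is shown with -63, and 1 is shown with -255), not taken from BOINC documentation. A minimal Python sketch under that assumption, with a made-up function name:

```python
def describe_exit_code(code):
    """Format an exit code the way the <message> lines in this thread appear.

    Assumption: the third number is simply code - 256, which matches the
    two examples seen here (193 -> -63 and 1 -> -255).
    """
    return "process exited with code %d (0x%x, %d)" % (code, code, code - 256)


print(describe_exit_code(193))
print(describe_exit_code(1))
```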
____________

Philosopher2

Joined: Mar 28 06
Posts: 3
ID: 69257
Credit: 111,037
RAC: 0
Message 55174 - Posted 19 Aug 2008 5:59:36 UTC - in response to Message ID 55157.
Last modified: 19 Aug 2008 6:33:19 UTC

Thank you Moderator.
I have changed the preference to 120 minutes per application run.
The application will remain in memory as suggested.
This WU has already been running (CPU time) for 15 hours and it is only 98.900 done!
Should I let it go on - I am a bit curious whether it will complete by 22 Aug?
Take care.



____________

joergent

Joined: Feb 17 08
Posts: 1
ID: 242433
Credit: 32,031
RAC: 0
Message 55180 - Posted 19 Aug 2008 16:34:40 UTC - in response to Message ID 55094.

Every morning for the last week, I find the computer frozen, and the mini rosetta on the task bar. I have detached from the project, to see what happens with the other projects.. just for info.



Add to this that every time my screen saver is running with Minirosetta and the screen has turned black, the PC (Windows XP SP3) cannot be returned to its previous state. The screen with Rosetta appears, but the mouse is not working and I can barely use the keyboard to shut down the PC, which goes very slowly right up until Rosetta is killed.

Rosetta has been disabled on my PC !!!

Terrasapiens

Joined: Apr 25 08
Posts: 15
ID: 254884
Credit: 241,058
RAC: 39
Message 55182 - Posted 20 Aug 2008 3:18:51 UTC

I've been having a lot of WU errors with mini rosetta ever since version 1.28 and have now had 5 in the past two days. Here's the link to my WUs showing the recent failures: http://boinc.bakerlab.org/rosetta/results.php?userid=254884

I've also had to do a hard shutdown and reboot several times recently after the RAH screen saver apparently locked up the machine. Not sure if v1.32 or 5.98 was running at the time. This seemed to happen after I changed the options setting so the screen would go to black after a few minutes. I undid the setting and have had no application crashes since then, but I'm not totally sure the crashes were due to that change.

David Emigh Profile
Avatar

Joined: Mar 13 06
Posts: 158
ID: 65176
Credit: 417,178
RAC: 0
Message 55183 - Posted 20 Aug 2008 4:07:10 UTC - in response to Message ID 55182.

{...}
I've also had to do a hard shutdown and reboot several times recently after the RAH screen saver apparently locked up the machine. {...}


There is a workaround for this:

Press Ctrl + Shift + Esc to force the Task Manager open, then carefully move the mouse around until you can "find" it in the Task Manager window, at which point you can kill the screensaver process without having to shut down the computer. As always, YMMV, but I've not lost any crunching time using this method.

But the simplest solution is to not use the BOINC screensaver...

____________
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Terrasapiens

Joined: Apr 25 08
Posts: 15
ID: 254884
Credit: 241,058
RAC: 39
Message 55188 - Posted 20 Aug 2008 6:27:13 UTC - in response to Message ID 55183.
Last modified: 20 Aug 2008 6:27:51 UTC



But the simplest solution is to not use the BOINC screensaver...


I disabled the screensaver and will see if this minimizes the WUs crashing as well. Thanks for the info.

Terrasapiens

Joined: Apr 25 08
Posts: 15
ID: 254884
Credit: 241,058
RAC: 39
Message 55196 - Posted 21 Aug 2008 3:53:17 UTC
Last modified: 21 Aug 2008 4:01:07 UTC

I decided to disable the BOINC screensaver yesterday to see if that would have any effect on the number of failed WUs, but it didn't seem to make any difference. Two more failed today:
http://boinc.bakerlab.org/rosetta/result.php?resultid=186050178
http://boinc.bakerlab.org/rosetta/result.php?resultid=186005807

Jim Wilkins

Joined: Feb 5 08
Posts: 1
ID: 240244
Credit: 4,513
RAC: 0
Message 55201 - Posted 21 Aug 2008 13:19:56 UTC
Last modified: 21 Aug 2008 13:20:15 UTC

I successfully completed a 1.32 run but had a lot of this message in my stderr file:

needs psipred_ss2 to run filters

Is that a problem?
____________
Thanks,
Jim

AdeB Profile

Joined: Dec 12 06
Posts: 45
ID: 135244
Credit: 2,473,178
RAC: 1,976
Message 55207 - Posted 21 Aug 2008 18:26:55 UTC

Compute error in this workunit.

stderr out:
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
# cpu_run_time_pref: 43200

ERROR: NANs occured in hbonding!
ERROR:: Exit from: src/core/scoring/hbonds/hbonds_geom.cc line: 763
called boinc_finish

</stderr_txt>
]]>

____________

mitrichr

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55219 - Posted 22 Aug 2008 3:12:05 UTC

The graphic is freezing, meaning, I assume, that the WU is a dead fish. I am getting this on three machines so far. I have to abort too often.

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55222 - Posted 22 Aug 2008 8:14:04 UTC

Just found this one bombed in my results
http://boinc.bakerlab.org/rosetta/result.php?resultid=184820750
1ughI_BOINC_ABINITIO_IGNORE_THE_REST-S25-13-S3-3--1ughI-_4309_84_1
Exit status 1 (0x1)
CPU time 1.328125
stderr out <core_client_version>6.2.16</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>

ERROR: Cannot find file 'minirosetta_database\chemical/residue_type_sets/fa_standard/CSD_ATOM_TYPE_SET fa_standard'
ERROR:: Exit from: ..\..\src\core\chemical\residue_io.cc line: 132
called boinc_finish

</stderr_txt>
]]>

David Emigh Profile

Joined: Mar 13 06
Posts: 158
ID: 65176
Credit: 417,178
RAC: 0
Message 55226 - Posted 22 Aug 2008 13:44:49 UTC - in response to Message ID 55219.
Last modified: 22 Aug 2008 13:46:50 UTC

The graphic is freezing, meaning, I assume, that the WU is a dead fish. {...}


Not necessarily. Try this workaround.
____________
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

lusvladimir

Joined: Oct 18 05
Posts: 12
ID: 5401
Credit: 1,784,854
RAC: 0
Message 55233 - Posted 23 Aug 2008 7:37:11 UTC

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)

I observed that the Rosetta model I was processing failed with this error after an ntp daemon resynch on my Linux machine.
The system clock, when adjusted during a routine resynch, caused the running model to fail because its understanding of time steps changed outside of the model.
I have temporarily stopped the ntp daemon and no longer see this error.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3389
ID: 106194
Credit: 0
RAC: 0
Message 55234 - Posted 23 Aug 2008 10:53:01 UTC - in response to Message ID 55233.

I temporarily stop ntp daemon and not see this error.


Have you seen it fail during a resynch before? Consistently?

Am I correct to presume that if the resynch did not cause a change to the clock, then there is no problem?

Do you have any perspective on whether resynch caused failures on older BOINC releases?

Which Linux distribution are you running?

Do you configure the machine to run at 100% of CPU? Or less?
____________
Rosetta Moderator: Mod.Sense

lusvladimir

Joined: Oct 18 05
Posts: 12
ID: 5401
Credit: 1,784,854
RAC: 0
Message 55237 - Posted 23 Aug 2008 13:40:15 UTC - in response to Message ID 55234.

I temporarily stop ntp daemon and not see this error.


Have you seen it fail during a resynch before? Consistently?

Am I correct to presume that if the resynch did not cause a change to the clock, then there is no problem?

Do you have any perspective on whether resynch caused failures on older BOINC releases?

Which Linux distribution are you running?

Do you configure the machine to run at 100% of CPU? Or less?


Thanks, and sorry for my English; it's not my native language.

Debian Linux
Kernel is 2.6.26-1-686 SMP

Boinc Manager 6.2.14
Machine is configured to run at 100% CPU

In the Linux system log, the ntp time change:

Aug 23 03:09:12 alpha ntpd[13389]: time reset -0.175490 s
Aug 23 03:09:33 alpha ntpd[13389]: synchronized to 77.234.200.98, stratum 4
Aug 23 03:10:29 alpha ntpd[13389]: synchronized to 87.236.24.179, stratum 2

...and in the BOINC stderr.txt at the same time (I have task_debug turned on):

23-Aug-2008 03:07:27 [rosetta@home] Started download of boinc_homfrags_aa1pxuA03_05.200_v1_3.gz
23-Aug-2008 03:08:05 [rosetta@home] [task_debug] result abinitio_only62_A_1bq9A_4438_2605_0 checkpointed
23-Aug-2008 03:08:44 [rosetta@home] [task_debug] result abinitio_only62_A_1vcc__4438_3676_0 checkpointed
23-Aug-2008 03:08:45 [rosetta@home] [task_debug] result abinitio_only62_A_1vcc__4438_3676_0 checkpointed
23-Aug-2008 03:09:23 [rosetta@home] [task_debug] result abinitio_only62_A_2chf__4434_6914_0 checkpointed
23-Aug-2008 03:09:38 [rosetta@home] [task_debug] result abinitio_homfrag_71_A_2hboA_4443_1214_0 checkpointed
23-Aug-2008 03:09:52 [rosetta@home] Finished download of boinc_homfrags_aa1pxuA03_05.200_v1_3.gz
23-Aug-2008 03:09:52 [rosetta@home] Started download of boinc_homfrags_aa1pxuA09_05.200_v1_3.gz
23-Aug-2008 03:10:32 [rosetta@home] [task_debug] result abinitio_only62_A_1bq9A_4438_2605_0 checkpointed
23-Aug-2008 03:10:44 [rosetta@home] [task_debug] result abinitio_only62_A_1bq9A_4438_2605_0 checkpointed
23-Aug-2008 03:10:56 [rosetta@home] [task_debug] result abinitio_only62_A_1vcc__4438_3676_0 checkpointed
23-Aug-2008 03:11:28 [rosetta@home] Sending scheduler request: To fetch work. Requesting 3081 seconds of work, reporting 0 completed tasks
23-Aug-2008 03:11:31 [rosetta@home] [task_debug] result abinitio_only62_A_2chf__4434_6914_0 checkpointed
23-Aug-2008 03:11:33 [rosetta@home] Scheduler request succeeded: got 1 new tasks
23-Aug-2008 03:11:33 [rosetta@home] [task_debug] result state=NEW for abinitio_only62_A_1ptq__4438_5437_0 from handle_scheduler_reply
23-Aug-2008 03:11:34 [rosetta@home] [task_debug] result state=FILES_DOWNLOADING for abinitio_only62_A_1ptq__4438_5437_0 from CS::update_results
23-Aug-2008 03:12:00 [rosetta@home] [task_debug] result abinitio_homfrag_71_A_2hboA_4443_1214_0 checkpointed
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] Process for abinitio_only62_A_2chf__4434_6914_0 exited
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] task_state=EXITED for abinitio_only62_A_2chf__4434_6914_0 from handle_exited_app
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_2chf__4434_6914_0 from CS::report_result_error
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] exit status 193
23-Aug-2008 03:12:11 [rosetta@home] Computation for task abinitio_only62_A_2chf__4434_6914_0 finished
23-Aug-2008 03:12:11 [rosetta@home] Output file abinitio_only62_A_2chf__4434_6914_0_0 for task abinitio_only62_A_2chf__4434_6914_0 absent
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_2chf__4434_6914_0 from CS::app_finished
23-Aug-2008 03:12:11 [rosetta@home] Starting abinitio_only62_A_1pgx__4438_2667_0
23-Aug-2008 03:12:12 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 4030
23-Aug-2008 03:12:12 [rosetta@home] [task_debug] task_state=EXECUTING for abinitio_only62_A_1pgx__4438_2667_0 from start
23-Aug-2008 03:12:12 [rosetta@home] Starting task abinitio_only62_A_1pgx__4438_2667_0 using minirosetta version 132
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] Process for abinitio_homfrag_71_A_2hboA_4443_1214_0 exited
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] task_state=EXITED for abinitio_homfrag_71_A_2hboA_4443_1214_0 from handle_exited_app
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hboA_4443_1214_0 from CS::report_result_error
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] exit status 193
23-Aug-2008 03:12:13 [rosetta@home] Computation for task abinitio_homfrag_71_A_2hboA_4443_1214_0 finished
23-Aug-2008 03:12:13 [rosetta@home] Output file abinitio_homfrag_71_A_2hboA_4443_1214_0_0 for task abinitio_homfrag_71_A_2hboA_4443_1214_0 absent
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hboA_4443_1214_0 from CS::app_finished
23-Aug-2008 03:12:13 [rosetta@home] Starting abinitio_homfrag_71_A_2hl7A_4443_1633_0
23-Aug-2008 03:12:13 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 4042
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] task_state=EXECUTING for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from start
23-Aug-2008 03:12:13 [rosetta@home] Starting task abinitio_homfrag_71_A_2hl7A_4443_1633_0 using minirosetta version 132
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] Process for abinitio_only62_A_1pgx__4438_2667_0 exited
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] task_state=EXITED for abinitio_only62_A_1pgx__4438_2667_0 from handle_exited_app
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_1pgx__4438_2667_0 from CS::report_result_error
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] exit status 193
23-Aug-2008 03:12:17 [rosetta@home] Computation for task abinitio_only62_A_1pgx__4438_2667_0 finished
23-Aug-2008 03:12:17 [rosetta@home] Output file abinitio_only62_A_1pgx__4438_2667_0_0 for task abinitio_only62_A_1pgx__4438_2667_0 absent
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_1pgx__4438_2667_0 from CS::app_finished
23-Aug-2008 03:12:17 [rosetta@home] Starting abinitio_only62_A_1cc8A_4438_3695_0
23-Aug-2008 03:12:18 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 4061
23-Aug-2008 03:12:18 [rosetta@home] [task_debug] task_state=EXECUTING for abinitio_only62_A_1cc8A_4438_3695_0 from start
23-Aug-2008 03:12:18 [rosetta@home] Starting task abinitio_only62_A_1cc8A_4438_3695_0 using minirosetta version 132
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] Process for abinitio_homfrag_71_A_2hl7A_4443_1633_0 exited
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] task_state=EXITED for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from handle_exited_app
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from CS::report_result_error
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] exit status 193

Looking at my old Rosetta results (my own stats): with BOINC 5.10.45 and the 5.96 Rosetta client I had 4 errors in a month.
After upgrading BOINC to version 6.2.X and the new minirosetta app I see many more errors if ntp is on.
With the ntp daemon stopped, everything works fine without errors.
I tried running ntpdate manually (not the daemon, just a single sync with the time server): after the sync, two workunits failed and then everything worked again without errors.
I have been running Rosetta for 3 years and I do not know which part of my system is causing this: the kernel, BOINC manager, the science app or ntp.
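
For anyone who wants to tally these failures from their own client log, here is a minimal Python sketch. It assumes the log has the same [task_debug] line format as the excerpt above and that each "exit status" line belongs to the most recent "Process for ... exited" line; the function name exit_statuses is made up for illustration and is not part of BOINC.

```python
import re
from collections import Counter

# Patterns for the two [task_debug] lines seen in the log excerpt above.
EXITED = re.compile(r"\[task_debug\] Process for (\S+) exited")
STATUS = re.compile(r"\[task_debug\] exit status (-?\d+)")


def exit_statuses(lines):
    """Map each task name to the exit status logged right after it exited.

    Assumption: an 'exit status' line refers to the most recent
    'Process for <task> exited' line.
    """
    result = {}
    last_task = None
    for line in lines:
        m = EXITED.search(line)
        if m:
            last_task = m.group(1)
            continue
        m = STATUS.search(line)
        if m and last_task is not None:
            result[last_task] = int(m.group(1))
            last_task = None
    return result


sample = [
    "23-Aug-2008 03:12:11 [rosetta@home] [task_debug] Process for abinitio_only62_A_2chf__4434_6914_0 exited",
    "23-Aug-2008 03:12:11 [rosetta@home] [task_debug] task_state=EXITED for abinitio_only62_A_2chf__4434_6914_0 from handle_exited_app",
    "23-Aug-2008 03:12:11 [rosetta@home] [task_debug] exit status 193",
    "23-Aug-2008 03:12:13 [rosetta@home] [task_debug] Process for abinitio_homfrag_71_A_2hboA_4443_1214_0 exited",
    "23-Aug-2008 03:12:13 [rosetta@home] [task_debug] exit status 193",
]

statuses = exit_statuses(sample)
by_status = Counter(statuses.values())
print(statuses)
print(by_status)
```

Running it over a whole log with ntp on and then with ntp off would give a simple count to compare between the two setups.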

____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3389
ID: 106194
Credit: 0
RAC: 0
Message 55240 - Posted 23 Aug 2008 17:18:41 UTC
Last modified: 23 Aug 2008 17:20:23 UTC

lusvladimir, thank you for all the details. One more question, have you run any other projects when the time change is negative? I mean, do tasks from other projects have a similar problem?
____________
Rosetta Moderator: Mod.Sense

lusvladimir

Joined: Oct 18 05
Posts: 12
ID: 5401
Credit: 1,784,854
RAC: 0
Message 55243 - Posted 23 Aug 2008 17:41:40 UTC - in response to Message ID 55240.

lusvladimir, thank you for all the details. One more question, have you run any other projects when the time change is negative? I mean, do tasks from other projects have a similar problem?


I crunch rosetta@home only, but as an experiment to help resolve this problem
I will try attaching to another project for Linux platforms and report the results in 1-2 days. Thank you.
____________

lusvladimir

Joined: Oct 18 05
Posts: 12
ID: 5401
Credit: 1,784,854
RAC: 0
Message 55258 - Posted 24 Aug 2008 10:27:29 UTC - in response to Message ID 55240.

lusvladimir, thank you for all the details. One more question, have you run any other projects when the time change is negative? I mean, do tasks from other projects have a similar problem?


Mod.Sense, thank you for the advice about negative time!!!
I read more of the manual on time synchronization and was able to tune my system so that the time shift is very, very small (milliseconds per several hours) and still positive.
The NTP daemon now does not need to synchronize the time often, and Rosetta workunits run without errors.
I did not reproduce the error on another project (Einstein@Home), but too little time has passed.
I will continue to monitor the system and, if errors occur, will report how to reproduce them.
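
To check whether the wall clock is being stepped backwards on a machine, one option is to compare it against a monotonic clock. The Python sketch below is only an illustration of that idea: the function names and the 0.5 second tolerance are assumptions chosen for the example, and this is not how BOINC or minirosetta measure time.

```python
import time


def sample_clocks():
    """Take one (wall clock, monotonic clock) sample, both in seconds."""
    return (time.time(), time.monotonic())


def clock_steps(samples, tolerance=0.5):
    """Return the wall-clock jumps found between consecutive samples.

    For each pair of samples, the wall clock should advance by about the
    same amount as the monotonic clock. A difference larger than
    `tolerance` seconds is reported; a negative value means the wall
    clock was set backwards.
    """
    steps = []
    for (w0, m0), (w1, m1) in zip(samples, samples[1:]):
        drift = (w1 - w0) - (m1 - m0)
        if abs(drift) > tolerance:
            steps.append(drift)
    return steps


# Synthetic samples: the wall clock jumps back by 3 seconds between
# the second and third sample while the monotonic clock keeps going.
synthetic = [(100.0, 10.0), (101.0, 11.0), (99.0, 12.0), (100.0, 13.0)]
found = clock_steps(synthetic)
print(found)
```

In practice one would call sample_clocks() every few seconds, keep the samples in a list, and compare the times of any reported negative steps with the times of failed workunits.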
____________

BeemerBiker Profile
Avatar

Joined: May 7 07
Posts: 14
ID: 174398
Credit: 1,430,883
RAC: 1,600
Message 55261 - Posted 24 Aug 2008 13:35:11 UTC

Seeing segfaults periodically but all tasks seem to finish OK. How is that?

All results returned seem valid except some messages about not having a filter
http://boinc.bakerlab.org/rosetta/results.php?hostid=874157

I am running boinc 6.2.15 amd64 on Ubuntu 8.04.1 and looking in the logs I see a bunch of segfaults

jstateson@jyslinux3:/var/log$ grep -i segfault kern.log
kern.log:Aug 22 09:33:52 jyslinux3 kernel: [53889.135133] minirosetta_1.3[7041]: segfault at ff3fbff8 rip 89bd380 rsp ff3fbed8 error 6
kern.log:Aug 22 17:41:43 jyslinux3 kernel: [83133.993543] minirosetta_1.3[7372]: segfault at ff3fbff8 rip 89bd380 rsp ff3fbed8 error 6
kern.log:Aug 23 18:24:55 jyslinux3 kernel: [75871.459737] minirosetta_1.3[20077]: segfault at ff5fbff8 rip 89bd380 rsp ff5fbed8 error 6
kern.log:Aug 23 18:24:55 jyslinux3 kernel: [75871.559667] minirosetta_1.3[19621]: segfault at ff5fbff8 rip 89bd380 rsp ff5fbed8 error 6

I switched to boinc 6.2.15 from 5.15.45 after result time 21 Aug 2008 22:30:58
and those 4 segfaults occurred afterwards. Since all results were returned and all were valid, I am unsure what effect the segfaults had.
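
The kernel lines above can be pulled apart with a short script. This is a sketch in Python that assumes the exact line layout shown in the grep output; the function names are invented for the example. The decoding of the error value uses the low three bits of the x86 page-fault error code (bit 0: page present, bit 1: write access, bit 2: user mode), so "error 6" reads as a write from user mode to a page that was not present.

```python
import re

# Layout assumed from the kern.log lines quoted above.
SEGFAULT = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>\d+)"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # False: page was not present
        "write": bool(code & 2),         # True: faulting access was a write
        "user_mode": bool(code & 4),     # True: fault happened in user mode
    }


def parse_segfault(line):
    """Return the fields of one segfault line, or None if it does not match."""
    m = SEGFAULT.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["pid"] = int(info["pid"])
    info["err"] = int(info["err"])
    info["decoded"] = decode_error(info["err"])
    return info


line = (
    "kern.log:Aug 22 09:33:52 jyslinux3 kernel: [53889.135133] "
    "minirosetta_1.3[7041]: segfault at ff3fbff8 rip 89bd380 "
    "rsp ff3fbed8 error 6"
)
info = parse_segfault(line)
print(info)
```

Grouping the parsed lines by the rip value shows that all four segfaults above hit the same instruction address, 89bd380.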



____________
[img]http://www.boincstats.com/signature/user_610944.gif[/img]

Gray Handcock

Joined: Sep 26 05
Posts: 14
ID: 1255
Credit: 1,242,864
RAC: 0
Message 55263 - Posted 24 Aug 2008 17:52:29 UTC

hi

thus far what I have processed has been rated a success.

I am vaguely curious as to why there is a discrepancy between claimed credit and granted credit - I mean this in a spirit of enquiry only, as I am not in a position to compete with anyone: I crunch here and there as and when I can... :(

187039158 170829845 24 Aug 2008 14:40:04 UTC 24 Aug 2008 17:24:10 UTC Over Success Done 6,949.45 16.82 7.76
186859960 170670039 23 Aug 2008 20:48:21 UTC 24 Aug 2008 16:18:07 UTC Over Success Done 7,501.63 18.15 6.32
186500690 170342018 22 Aug 2008 8:35:39 UTC 24 Aug 2008 13:43:03 UTC Over Success Done 9,431.64 22.82 8.09
185665231 169581780 18 Aug 2008 22:28:03 UTC 22 Aug 2008 20:22:17 UTC Over Success Done 6,588.55 15.76 5.32
185659319 169576641 18 Aug 2008 21:46:50 UTC 19 Aug 2008 9:17:56 UTC Over Success Done 7,270.64 17.39 6.88
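
To put a number on the gap, the rows above can be reduced to a granted/claimed ratio. This Python sketch assumes that the last three fields of each row are CPU time in seconds, claimed credit and granted credit, which matches the description in this post but is not stated in the rows themselves; the function name is made up for the example.

```python
rows = [
    "187039158 170829845 24 Aug 2008 14:40:04 UTC 24 Aug 2008 17:24:10 UTC Over Success Done 6,949.45 16.82 7.76",
    "186859960 170670039 23 Aug 2008 20:48:21 UTC 24 Aug 2008 16:18:07 UTC Over Success Done 7,501.63 18.15 6.32",
    "186500690 170342018 22 Aug 2008 8:35:39 UTC 24 Aug 2008 13:43:03 UTC Over Success Done 9,431.64 22.82 8.09",
    "185665231 169581780 18 Aug 2008 22:28:03 UTC 22 Aug 2008 20:22:17 UTC Over Success Done 6,588.55 15.76 5.32",
    "185659319 169576641 18 Aug 2008 21:46:50 UTC 19 Aug 2008 9:17:56 UTC Over Success Done 7,270.64 17.39 6.88",
]


def credit_ratio(row):
    """Return (cpu_seconds, claimed, granted, granted/claimed) for one row.

    Assumption: the last three whitespace-separated fields are
    CPU time (sec), claimed credit and granted credit.
    """
    fields = row.split()
    cpu_seconds = float(fields[-3].replace(",", ""))
    claimed = float(fields[-2])
    granted = float(fields[-1])
    return cpu_seconds, claimed, granted, granted / claimed


ratios = [credit_ratio(r)[3] for r in rows]
for row, ratio in zip(rows, ratios):
    print(row.split()[0], "granted/claimed = %.2f" % ratio)
```

For these five results the granted credit works out to roughly a third to a half of the claimed credit.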

I underline again: this is NOT a major issue - I am merely curious.

I am running on a winXP box with SP3 and Boinc Manager 6.3.10

Gray
____________

Storeytime

Joined: Oct 10 06
Posts: 2
ID: 117574
Credit: 2,207,638
RAC: 0
Message 55272 - Posted 25 Aug 2008 2:34:11 UTC

Approximately 3 out of every 4 work units end in a computation error. It's gotten to the point where I only have a daily quota of 4 WUs. I have attached to other projects with no problems, so it's getting annoying. Some computers lock up.
____________

Pilgrim57

Joined: Jul 31 08
Posts: 2
ID: 271620
Credit: 1,602,807
RAC: 502
Message 55276 - Posted 25 Aug 2008 8:06:22 UTC

I am getting this reported in finished 1.32 WUs:
"needs psipred_ss2 to run filters". I also only get about 2/3 to 3/4 of claimed credit for all mini rosetta WUs!
186967182

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55280 - Posted 25 Aug 2008 10:22:29 UTC
Last modified: 25 Aug 2008 10:24:32 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=186255800
http://boinc.bakerlab.org/rosetta/result.php?resultid=186219940
http://boinc.bakerlab.org/rosetta/result.php?resultid=186175420
http://boinc.bakerlab.org/rosetta/result.php?resultid=186135295
http://boinc.bakerlab.org/rosetta/result.php?resultid=186097857
http://boinc.bakerlab.org/rosetta/result.php?resultid=186051535

abinitio_only62_A_1tif__4438_3601_0
abinitio_only62_A_1louA_4438_3430_0
abinitio_only62_A_1fna__4438_2176_0
abinitio_homfrag_71_A_1mjcA_4443_534_0
abinitio_only62_A_1iibA_4434_6107_0
abinitio_only62_A_1louA_4434_4589_0

These show the same thing as Pilgrim57's machine (needs psipred_ss2 to run filters):
full run time, granted credit above claimed as normal, and no other errors.

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55286 - Posted 25 Aug 2008 17:51:48 UTC - in response to Message ID 55280.

http://boinc.bakerlab.org/rosetta/result.php?resultid=186255800
http://boinc.bakerlab.org/rosetta/result.php?resultid=186219940
http://boinc.bakerlab.org/rosetta/result.php?resultid=186175420
http://boinc.bakerlab.org/rosetta/result.php?resultid=186135295
http://boinc.bakerlab.org/rosetta/result.php?resultid=186097857
http://boinc.bakerlab.org/rosetta/result.php?resultid=186051535

abinitio_only62_A_1tif__4438_3601_0
abinitio_only62_A_1louA_4438_3430_0
abinitio_only62_A_1fna__4438_2176_0
abinitio_homfrag_71_A_1mjcA_4443_534_0
abinitio_only62_A_1iibA_4434_6107_0
abinitio_only62_A_1louA_4434_4589_0

These show the same thing as Pilgrim57's machine (needs psipred_ss2 to run filters):
full run time, granted credit above claimed as normal, and no other errors.



Now the following show the same message
Task ID 186291840
Name abinitio_homfrag_71_A_1tzaA_4443_1287_0

Task ID 186332312
Name abinitio_homfrag_71_A_2hx5A_4443_1477_0

same result,full run time and perfect credit

Terrasapiens

Joined: Apr 25 08
Posts: 15
ID: 254884
Credit: 241,058
RAC: 39
Message 55292 - Posted 26 Aug 2008 1:56:36 UTC

Can someone take a look at my failed work units (link below) and tell me if there is any more debugging info I can provide to help fix the problems I've been having with the rosetta mini WUs? Is there a problem with the WUs themselves, or my machine or the mini app? Almost every one I've received in the past month or two has failed immediately. I don't think I've had any failures on the non-mini WUs nor on the Seti WUs I process as well.

http://boinc.bakerlab.org/rosetta/results.php?userid=254884

P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 55293 - Posted 26 Aug 2008 3:06:53 UTC - in response to Message ID 55292.

Can someone take a look at my failed work units (link below) and tell me if there is any more debugging info I can provide to help fix the problems I've been having with the rosetta mini WUs? Is there a problem with the WUs themselves, or my machine or the mini app? Almost every one I've received in the past month or two has failed immediately. I don't think I've had any failures on the non-mini WUs nor on the Seti WUs I process as well.

http://boinc.bakerlab.org/rosetta/results.php?userid=254884


G'day Terrasapiens.

Looking at your tasks, a couple of things I can think of.

1/ You have 1 Gig of RAM shared between two cores, which could be a problem with some tasks; it has been known to cause problems for some people.

2/ Are you using onboard graphics? The error code looks like it could be a hardware conflict.

3/ If it's not no. 2, can you run tests on your RAM or try some other sticks?

pete






____________


Terrasapiens

Joined: Apr 25 08
Posts: 15
ID: 254884
Credit: 241,058
RAC: 39
Message 55294 - Posted 26 Aug 2008 3:48:35 UTC - in response to Message ID 55293.

G'day Terrasapiens.

Looking at your tasks, a couple of things I can think of.

1/ You have 1 Gig of RAM shared between two cores, which could be a problem with some tasks; it has been known to cause problems for some people.

2/ Are you using onboard graphics? The error code looks like it could be a hardware conflict.

3/ If it's not no. 2, can you run tests on your RAM or try some other sticks?

pete


Pete, as far as graphics go, I have an ATI All-In-Wonder series w/ 128 MB of RAM. I didn't seem to have any problem with rosetta WUs crashing until v1.28 came out. Since then only the mini rosetta ones crash. All else runs fine. I had 2 RAM issues a while back with the machine that caused it to randomly reboot. I found out that when I removed and reset the RAM cards everything worked fine. I've had no apps crashing on this machine other than the MR. Maybe sometime in the next couple of days I'll try removing the RAM again and then running the RAM test program I have to see if anything shows up. I don't have other sticks to try. Maybe some time this year the box will get a full gutting and upgrade, but that could be a while.

Thanks




Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55296 - Posted 26 Aug 2008 5:48:01 UTC - in response to Message ID 55292.

Can someone take a look at my failed work units (link below) and tell me if there is any more debugging info I can provide to help fix the problems I've been having with the rosetta mini WUs? Is there a problem with the WUs themselves, or my machine or the mini app? Almost every one I've received in the past month or two has failed immediately. I don't think I've had any failures on the non-mini WUs nor on the Seti WUs I process as well.

http://boinc.bakerlab.org/rosetta/results.php?userid=254884



that link shows 'no access'

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3389
ID: 106194
Credit: 0
RAC: 0
Message 55301 - Posted 26 Aug 2008 12:59:52 UTC - in response to Message ID 55292.

Can someone take a look at my failed work units...


Terrasapiens, I see you are running BOINC 6.2.18. Do you have any history running mini on older versions of BOINC? Are you using BOINC as your screensaver?
____________
Rosetta Moderator: Mod.Sense

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55318 - Posted 27 Aug 2008 13:48:45 UTC

I have rather a serious problem that I'd appreciate some help or advice with.

I've just upgraded my computer to an AMD Phenom 9850 Quad Core running Vista 64-bit so I grabbed Boinc Mgr 6.2.18 for Windows 64-bit.

24/08/2008 18:23:59||Starting BOINC client version 6.2.18 for windows_x86_64
24/08/2008 18:23:59||log flags: task, file_xfer, sched_ops
24/08/2008 18:23:59||Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3
24/08/2008 18:23:59||Running as a daemon
24/08/2008 18:23:59||Data directory: C:\ProgramData\BOINC
24/08/2008 18:23:59||Running under account boinc_master
24/08/2008 18:23:59||Processor: 4 AuthenticAMD AMD Phenom(tm) 9850 Quad-Core Processor [AMD64 Family 16 Model 2 Stepping 3]
24/08/2008 18:23:59||Processor features: fpu tsc pae nx sse sse2 pni
24/08/2008 18:23:59||OS: Microsoft Windows Vista: Home Premium x64 Editon, Service Pack 1, (06.00.6001.00)
24/08/2008 18:23:59||Memory: 8.00 GB physical, 16.05 GB virtual
24/08/2008 18:23:59||Disk: 457.85 GB total, 378.16 GB free
24/08/2008 18:23:59||Local time is UTC +1 hours
24/08/2008 18:23:59||No coprocessors
24/08/2008 18:23:59|rosetta@home|URL: http://boinc.bakerlab.org/rosetta/; Computer ID: 878134; location: home; project prefs: default
24/08/2008 18:23:59||General prefs: from rosetta@home (last modified 11-Feb-2008 13:52:58)
24/08/2008 18:23:59||Computer location: home
24/08/2008 18:23:59||General prefs: no separate prefs for home; using your defaults
24/08/2008 18:23:59||Reading preferences override file
24/08/2008 18:23:59||Preferences limit memory usage when active to 4914.23MB
24/08/2008 18:23:59||Preferences limit memory usage when idle to 7780.86MB
24/08/2008 18:23:59||Preferences limit disk usage to 4.66GB

My problem is that 79 of my last 151 WU's (last 4 days only) failed with a "Compute Error" as shown here.

I noticed this a while back but saw the note above that old WUs were in the system, so I've left it for those to clear through.

Examining those WUs that failed, there were zero failures for Rosetta 5.98 files, though I've actually had very few in the last week. All the failures have been with Mini 1.32 - maybe 40% success rate, 60% failure. The majority of these failures come after considerable processing - sometimes even 80% of the way through.

Example of some errors:

Task 187626469 and
27/08/2008 11:18:36|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 11:18:36|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 11:19:18|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 11:19:18|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 11:19:18|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132
27/08/2008 11:19:59|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 11:19:59|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 11:20:40|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 11:20:40|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 11:20:40|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132

[And on repeatedly until finally...]

27/08/2008 12:25:39|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 12:25:39|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 12:25:40|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132
27/08/2008 12:26:20|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 12:26:20|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 12:26:20|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132
27/08/2008 12:26:39|rosetta@home|Task abinitio_homfrag_71_A_1prqA_4443_9291_0 exited with zero status but no 'finished' file
27/08/2008 12:26:39|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 12:27:01|rosetta@home|Computation for task abinitio_homfrag_71_A_2o7kA_4443_8695_0 finished

It's the same story with:
Task 187600159
Task 187437714
Task 187389606

All of them come up with the same errors:

needs psipred_ss2 to run filters and\or
Can't acquire lockfile - exiting

Those that have been "Aborted by User" were where I noticed the progress on tasks had stopped ticking up, saw the same old error messages and just aborted to move on hoping to have more luck with the next WU.

Thing is, some of these tasks have been taken on and completed by others:
Task 187600159 completed on an Intel Quad Core machine running XP SP3
Task 187389606 completed on an Intel Duo running XP SP2

I don't know whether this is an AMD issue, a Vista issue, a 64-bit issue or a MiniRosetta 1.32 issue - or whether it's some flakiness on my own machine. But I also don't know why all Rosetta 5.98 tasks run perfectly and 40% of Mini tasks go through ok.

Any help or advice gratefully received.
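
For anyone comparing notes, the restart loop in the messages above can be counted per task with a few lines of Python. This is a sketch that assumes the "timestamp|project|message" layout of the Windows client messages pasted in this post; the function name is invented for the example.

```python
from collections import Counter

# Message text copied from the client log lines quoted above.
NO_FINISHED = "exited with zero status but no 'finished' file"


def count_silent_exits(lines):
    """Count, per task, how often it exited without a 'finished' file.

    Assumption: each line is 'timestamp|project|message' as in the
    BOINC Manager messages pasted above.
    """
    counts = Counter()
    for line in lines:
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        message = parts[2]
        if message.startswith("Task ") and NO_FINISHED in message:
            task = message.split()[1]
            counts[task] += 1
    return counts


sample = [
    "27/08/2008 12:25:39|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file",
    "27/08/2008 12:25:39|rosetta@home|If this happens repeatedly you may need to reset the project.",
    "27/08/2008 12:25:40|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132",
    "27/08/2008 12:26:20|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file",
    "27/08/2008 12:26:39|rosetta@home|Task abinitio_homfrag_71_A_1prqA_4443_9291_0 exited with zero status but no 'finished' file",
    "27/08/2008 12:27:01|rosetta@home|Computation for task abinitio_homfrag_71_A_2o7kA_4443_8695_0 finished",
]

counts = count_silent_exits(sample)
for task, n in counts.most_common():
    print(task, n)
```

A task that shows up many times in this count is one that kept restarting before the client finally gave up on it.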
____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55319 - Posted 27 Aug 2008 13:58:12 UTC - in response to Message ID 55318.
Last modified: 27 Aug 2008 14:11:42 UTC

27/08/2008 11:19:59|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 11:19:59|rosetta@home|If this happens repeatedly you may need to reset the project.

One rather obvious note I forgot.

While waiting for those old WUs to pass through, I did go through the process of 'resetting the project' with no discernible improvement.

Edit again:
I just looked at a Mini WU that ran successfully 187651948 and it also came up with several 'needs psipred_ss2 to run filters' errors but not enough to make it fail.
____________

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 55321 - Posted 27 Aug 2008 14:32:42 UTC

needs psipred_ss2 to run filters and\or
I have just run a batch of min 1.32s on Ralph and none of them had the above message.
____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55323 - Posted 27 Aug 2008 17:01:04 UTC - in response to Message ID 55321.
Last modified: 27 Aug 2008 17:10:05 UTC

needs psipred_ss2 to run filters

I have just run a batch of min 1.32s on Ralph and none of them had the above message.

Thanks for the feedback, Evan. Appreciated.

I took a look at your setup (hope you don't mind) because I'm concerned it's to do with me having an AMD or Vista or 64bit before it's a Mini 1.32 problem. I see you run an Intel P4 with XP SP2 under Boinc 5.10.20. Maybe that's a difference.

But I looked at your last WU 187113607 at Rosetta and lo and behold you actually did get several "needs psipred_ss2 to run filters" errors, the same as me, but not enough to make the WU fall over - again like some of mine. In fact all your WUs show that error.

Because our machines, OS and software are so very different, this seems to point to it being a Mini 1.32 bug after all.

What you don't show is my "Can't acquire lockfile - exiting" error. This seems to be the clue that screws up my WUs altogether then.
____________

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55328 - Posted 27 Aug 2008 17:51:37 UTC - in response to Message ID 55323.

needs psipred_ss2 to run filters

I have just run a batch of min 1.32s on Ralph and none of them had the above message.

Thanks for the feedback, Evan. Appreciated.

I took a look at your setup (hope you don't mind) because I'm concerned it's to do with me having an AMD or Vista or 64bit before it's a Mini 1.32 problem. I see you run an Intel P4 with XP SP2 under Boinc 5.10.20. Maybe that's a difference.

But I looked at your last WU 187113607 at Rosetta and lo and behold you actually did get several "needs psipred_ss2 to run filters" errors, the same as me, but not enough to make the WU fall over - again like some of mine. In fact all your WUs show that error.

Because our machines, OS and software are so very different, this seems to point to it being a Mini 1.32 bug after all.

What you don't show is my "Can't acquire lockfile - exiting" error. This seems to be the clue that screws up my WUs altogether then.



Keep an eye on this thread to see if anyone posts answers.

I am getting a lot of those ss2 messages but everything completes normally.

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 55331 - Posted 27 Aug 2008 20:26:56 UTC

I took a look at your setup (hope you don't mind) because I'm concerned it's to do with me having an AMD or Vista or 64bit before it's a Mini 1.32 problem. I see you run an Intel P4 with XP SP2 under Boinc 5.10.20. Maybe that's a difference.


Without any information from the backroom boys I would take a guess that they have been working at a fix.


____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55343 - Posted 28 Aug 2008 2:20:04 UTC - in response to Message ID 55328.

What you don't show is my "Can't acquire lockfile - exiting" error. This seems to be the clue that screws up my WUs altogether then.

Keep an eye on this thread to see if anyone posts answers.

I am getting a lot of those ss2 messages but everything completes normally.

I've seen it now. It looks like it's an annoying message but not fatal and not related to my new machine\hardware\OS. I'll rest easy on that one.

It's the cause\meaning of the lockfile error then. Let's hope the backroom boys really are working on it and know where to point me (or themselves).

I'd appreciate an acknowledgement, but I guess the issue of the day is the lack of any WUs right now. I'm crunching my last one (a 5.98 one) and it's gone to 6 hours with no sign of completing yet for some reason. Better that than nothing I guess.
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 658
ID: 264600
Credit: 3,743,487
RAC: 7,157
Message 55351 - Posted 28 Aug 2008 12:54:18 UTC - in response to Message ID 55328.
Last modified: 28 Aug 2008 12:56:44 UTC

needs psipred_ss2 to run filters

I have just run a batch of min 1.32s on Ralph and none of them had the above message.

Thanks for the feedback, Evan. Appreciated.

I took a look at your setup (hope you don't mind) because I'm concerned it's to do with me having an AMD or Vista or 64bit before it's a Mini 1.32 problem. I see you run an Intel P4 with XP SP2 under Boinc 5.10.20. Maybe that's a difference.

But I looked at your last WU 187113607 at Rosetta and lo and behold you actually did get several "needs psipred_ss2 to run filters" errors, the same as me, but not enough to make the WU fall over - again like some of mine. In fact all your WUs show that error.

Because our machines, OS and software are so very different, this seems to point to it being a Mini 1.32 bug after all.

What you don't show is my "Can't acquire lockfile - exiting" error. This seems to be the clue that screws up my WUs altogether then.



Keep an eye on this thread to see if anyone posts answers.

I am getting a lot of those ss2 messages but everything completes normally.


I also get a lot of those ss2 messages, but on an AMD processor using Vista SP1 and BOINC 5.10.45. I haven't seen any of the lockfile messages. I wonder if some of the current workunits are missing the ss2 file since they don't need filtering, but 1.32 doesn't have a way built in to just turn off any attempts to use this file.

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55381 - Posted 29 Aug 2008 13:39:06 UTC - in response to Message ID 55226.

The graphic is freezing, meaning, I assume, that the WU is a dead fish. {...}


Not necessarily. Try this workaround.


David-

Yes, I know, that was what I had been doing and did not want to.

>>RSM

____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


(_KoDAk_) Profile

Joined: Jul 18 06
Posts: 109
ID: 100677
Credit: 1,859,263
RAC: 0
Message 55402 - Posted 30 Aug 2008 21:30:06 UTC

too many exit(0)s
http://boinc.bakerlab.org/rosetta/result.php?resultid=188137329
http://boinc.bakerlab.org/rosetta/result.php?resultid=187374554
http://boinc.bakerlab.org/rosetta/result.php?resultid=187233250
Watchdog shutting down...
http://boinc.bakerlab.org/rosetta/result.php?resultid=187377874
- exit code -1073741819 (0xc0000005)
http://boinc.bakerlab.org/rosetta/result.php?resultid=185996255
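
The negative exit code and the hex value in brackets are the same 32-bit number, just printed as signed and unsigned. A small Python sketch of the conversion follows; the function names and the one-entry lookup table are only for illustration. 0xC0000005 is the Windows status code STATUS_ACCESS_VIOLATION.

```python
# Illustrative lookup table with the single code seen in this post.
KNOWN_CODES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def describe(code):
    """Format an exit code as hex plus a name if it is in KNOWN_CODES."""
    value = to_unsigned32(code)
    return "0x%08X %s" % (value, KNOWN_CODES.get(value, "unknown"))


print(describe(-1073741819))
```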
____________

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55404 - Posted 30 Aug 2008 23:07:49 UTC
Last modified: 30 Aug 2008 23:09:59 UTC

So, here is the deal:

People are detaching from Rosetta, even though it may be the single most important project at BOINC. We have no choice but to endure these problems, or detach, because this mini runs along with whatever else is going.

Detach this buggy process from the rest of Rosetta. Run it in Ralph@home. Get some volunteers to run it if you cannot find the problems in the lab, and let us get back to crunching for Rosetta.

If in fact you get it to run properly, put the news in the RSS feed, so we know it is safe to go back in the water.

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


jasonwishart Profile
Avatar

Joined: Nov 7 05
Posts: 14
ID: 9994
Credit: 936,419
RAC: 0
Message 55406 - Posted 31 Aug 2008 3:51:58 UTC

It seems that since minirosetta 1.32, I have not had a completed job on my AMD 3000+ machine, yet it is fine on my other two machines?? All jobs now end, either slowly or quickly, with an "Output file absent" error, see output messages below....

8/31/2008 12:58:36 PM|rosetta@home|Computation for task abinitio_homfrag_71_A_1l6pA_4443_21507_0 finished
8/31/2008 12:58:36 PM|rosetta@home|Output file abinitio_homfrag_71_A_1l6pA_4443_21507_0_0 for task abinitio_homfrag_71_A_1l6pA_4443_21507_0 absent
8/31/2008 1:00:37 PM|rosetta@home|Computation for task abinitio_homfrag_71_A_1zd0A_4443_21518_0 finished
8/31/2008 1:00:37 PM|rosetta@home|Output file abinitio_homfrag_71_A_1zd0A_4443_21518_0_0 for task abinitio_homfrag_71_A_1zd0A_4443_21518_0 absent

This has been going on for a few weeks...I thought it was an irregularity and have put up with it. But now it is really getting to me. Is there any way to complete these jobs (future jobs) on this machine, or might I just as well abandon R@H on this machine??
____________
Will work for bandwidth!!

Terrasapiens

Joined: Apr 25 08
Posts: 15
ID: 254884
Credit: 241,058
RAC: 39
Message 55407 - Posted 31 Aug 2008 5:07:49 UTC - in response to Message ID 55301.
Last modified: 31 Aug 2008 5:10:48 UTC

Terrasapiens, I see you are running BOINC 6.2.18. Do you have any history running mini on older versions of BOINC? Are you using BOINC as your screensaver?


I think the mini WUs ran fine about two versions prior to the current one but I'm not sure exactly. I do know that the mini rosetta WUs didn't start failing until v1.28. I did turn off the BOINC screen saver a couple of weeks ago but that made no difference in terms of failed WUs.

Speedy
Avatar

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 521,019
RAC: 10
Message 55408 - Posted 31 Aug 2008 5:39:51 UTC
Last modified: 31 Aug 2008 5:43:07 UTC

For people having trouble, this may help you out. Hope it helps.
Cheers
Speedy
____________
Have a crunching good day!!

(_KoDAk_) Profile

Joined: Jul 18 06
Posts: 109
ID: 100677
Credit: 1,859,263
RAC: 0
Message 55409 - Posted 31 Aug 2008 6:32:45 UTC

too many exit(0)s
http://boinc.bakerlab.org/rosetta/result.php?resultid=188137329
____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55436 - Posted 31 Aug 2008 23:59:44 UTC - in response to Message ID 55402.

What you don't show is my "Can't acquire lockfile - exiting" error. This seems to be the clue that screws up my WUs altogether then.

I also get a lot of those ss2 messages, but on an AMD processor using Vista SP1 and BOINC 5.10.45. I haven't seen any of the lockfile messages. I wonder if some of the current workunits are missing the ss2 file since they don't need filtering, but 1.32 doesn't have a way built in to just turn off any attempts to use this file.

Thanks Robert. It seems you run an AMD Dual Core with Vista 32-bit.

too many exit(0)s
http://boinc.bakerlab.org/rosetta/result.php?resultid=188137329
http://boinc.bakerlab.org/rosetta/result.php?resultid=187374554
http://boinc.bakerlab.org/rosetta/result.php?resultid=187233250
Watchdog shutting down...
http://boinc.bakerlab.org/rosetta/result.php?resultid=187377874
- exit code -1073741819 (0xc0000005)
http://boinc.bakerlab.org/rosetta/result.php?resultid=185996255

Thanks (_KoDAk_). 1 of these is a 5.98 WU, one I can't see, but the other 3 (and the later 1) all show the same "Can't acquire lockfile - exiting" error that I get.

I note you run an Intel Quad core with the 64-bit version of Windows Server Enterprise.

Based on poor evidence of just one example I'd be inclined to look at an incompatibility between MiniRosetta 1.32 and any Windows 64-bit OS as a matter of urgency.

Speedy: I appreciate the suggestion, but a bug thread on v1.32 isn't served by avoiding running it altogether (though it may well be better sorted out on Ralph, I agree).

It's made worse at the moment by the fact nearly all WUs are 1.32s and hardly any 5.98s. Though, for what little it's worth, my failure rate has reduced from 79/151 (52%) to 50/115 (43%) since my last posts.
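
For anyone who wants to track their own failure rate the same way, the percentages above are simply failed tasks divided by total tasks. A minimal Python sketch (the function name is only illustrative, not part of BOINC):

```python
def failure_rate(failed: int, total: int) -> float:
    """Return the share of failed tasks as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


# Figures quoted in this post.
earlier = failure_rate(79, 151)
recent = failure_rate(50, 115)
print(f"earlier: {earlier:.0f}%, recent: {recent:.0f}%")  # earlier: 52%, recent: 43%
```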
____________

Heidi1 Profile
Avatar

Joined: Aug 11 07
Posts: 49
ID: 197494
Credit: 934,993
RAC: 0
Message 55437 - Posted 1 Sep 2008 1:02:46 UTC

I've been having the same screensaver/blackout problem that others have had. I don't know which WU it is, but it definitely is a Mini 1.32, as that's all my machine has right now (it's one of these two: abinitio_homfrag_71_A_2uzrA_4443_23491_0 ; abinitio_homfrag_71_A_2ib0A_4443_23507_0). I used to have BOINC be the Windows screensaver, then I changed it to an actual Windows one when the Mini started doing the black screen thing, then I later changed the screensaver to None when Mini was continuing to produce black screens. Sometimes using Ctrl+Alt+Del got me to where the taskbar would show, sometimes not; another trick I found out about is pressing the key for the Start button (my keyboard has it between the left Ctrl and Alt keys) would also allow me to see the taskbar. One of the open programs on the taskbar at that time is MiniRosetta, and I don't know if that is the screensaver that is trying to kick in. Sometimes the Mini screensaver kicks in way before the Windows screensaver is supposed to start (once the Mini one started after the computer was idle for a whopping 30 seconds!).

I have not paid attention as to what the WU crunching is doing, if it's still running during the black screen or if it's frozen up.
____________

Heidi1 Profile
Avatar

Joined: Aug 11 07
Posts: 49
ID: 197494
Credit: 934,993
RAC: 0
Message 55438 - Posted 1 Sep 2008 5:32:21 UTC

Update: It's also blacking out on this WU: abinitio_homfrag_71_A_2hh6A_4443_24307_0. It seems to be doing it at some of the checkpoints, but not all of them. What's more, the other WU that is running does show graphics okay but not this particular WU (it's just a black screen if it's able to show a separate window at all). And the main black screen issue doesn't care if you're doing anything else; even if you're watching a movie or moving your mouse or typing, it will supersede anything else on the screen. The Windows button has worked each time I've pressed it, a better success rate than Ctrl+Alt+Del. One additional thing of note: when exiting from the black screen and I see the taskbar, the icon for the Mini WU that's on the taskbar is that of "minirosetta_graphics_1.30_windows_intelx86.exe", not a BOINC or generic icon (it's a miniature version of Rosetta's logo).

I hope this helps you fix this bug. It's sure bugging me! :)
____________

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55443 - Posted 1 Sep 2008 11:29:54 UTC

To Mod,

Why has no one from the team or for that matter anyone from RAH in general said anything about these 'needs psipred_ss2 to run filters' errors?

This is what annoys a lot of people and causes them to leave: no news, no reply to posts about a common issue, no communication at all.

Could you give us an update on this issue or see if the team has something they can say in general about this?

Thanks

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 55444 - Posted 1 Sep 2008 13:51:57 UTC

Update: It's also blacking out on this WU: abinitio_homfrag

snap! with a variation on a theme.
It's blacking out on abinitio_homfrag_71_A_2flsA_4443_19927_0 and the graphics window has to be shut via Task Manager
____________

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 55449 - Posted 1 Sep 2008 16:01:03 UTC

The graphics have now come back on line. It would appear that it was being upset by a usb drive that I had just installed. The old hard drive I had installed was still set up as a slave drive which was causing the problems.
____________

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55461 - Posted 1 Sep 2008 18:06:51 UTC

I just re-attached a PIII and a Core 2 Duo, two machines which I had previously used to crunch for Rosetta. They both got WU's within minutes. After the WU started on each machine, I let it run as long as it stayed active. Within about seven minutes, each machine froze. So, I detached both again.

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Keith T.
Avatar

Joined: Mar 1 07
Posts: 37
ID: 150379
Credit: 12,959
RAC: 0
Message 55477 - Posted 2 Sep 2008 11:50:12 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=187758720 has been running on my AMD Athlon 2200+ since 28/08/08 08:45 (UTC+1).

So far it has run for 11:33:44 and is currently "waiting to run".

boinc_checkpoint_count.txt shows 97
boinc_init_count shows 1 87

The task appears to still be on the first decoy or model.

I have changed my runtime prefs while this task has been running, to try to get it to finish and grant some credit.

stderr.txt
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 28800
# cpu_run_time_pref: 28800
# cpu_run_time_pref: 28800
# cpu_run_time_pref: 28800
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 3600
# cpu_run_time_pref: 3600

I think it may have got stuck in a loop and is repeating the first model, as when I look at the graphics, the task does seem to be making progress.

How far past the required runtime does a task need to go before it gets stopped by the watchdog?

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3389
ID: 106194
Credit: 0
RAC: 0
Message 55482 - Posted 2 Sep 2008 14:30:11 UTC - in response to Message ID 55477.

How far past the required runtime does a task need to go before it gets stopped by the watchdog?

The messages in stderr indicate that this task has been removed from memory and resumed many times. If 5 such restarts occur with no progress being made (i.e. a checkpoint saved) the task will be ended.

Otherwise, if the task uses more than 4 times the runtime preference, the watchdog will end it. Since it is waiting to run, it probably has to begin running again for the watchdog to get any time. The watchdog checks on it every 15 minutes.

So, I would have expected it to have ended on the first restart where your runtime preference was lowered to 2 hours. Since it did not end, I guess I would abort that one. 11.5 hours without completing a single model, and not responding to the watchdog, sounds like something may be wrong there.
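
To make the two rules concrete, here is a rough Python sketch of the decision as described in this post. It is not the actual minirosetta watchdog code; the names are made up for illustration, and treating "5 such restarts" as "5 or more" is an assumption.

```python
from dataclasses import dataclass

MAX_RESTARTS_WITHOUT_PROGRESS = 5   # restarts with no checkpoint saved
RUNTIME_MULTIPLIER = 4              # limit is 4x the runtime preference
CHECK_INTERVAL_SECONDS = 15 * 60    # how often the watchdog looks (not used below)


@dataclass
class TaskState:
    cpu_seconds: float               # CPU time used so far
    runtime_pref_seconds: float      # the cpu_run_time_pref value
    restarts_without_checkpoint: int


def watchdog_verdict(task: TaskState) -> str:
    """Return why the task should end, or 'keep running'."""
    if task.restarts_without_checkpoint >= MAX_RESTARTS_WITHOUT_PROGRESS:
        return "end: too many restarts with no progress"
    if task.cpu_seconds > RUNTIME_MULTIPLIER * task.runtime_pref_seconds:
        return "end: exceeded 4x runtime preference"
    return "keep running"


# About 11.5 hours of CPU time against a 2 hour preference.
print(watchdog_verdict(TaskState(11.5 * 3600, 7200, 0)))
```

With those numbers, 41,400 CPU seconds is well past the 28,800 second limit, which is why the task would be expected to end once the watchdog gets a chance to run.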
____________
Rosetta Moderator: Mod.Sense

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3389
ID: 106194
Credit: 0
RAC: 0
Message 55483 - Posted 2 Sep 2008 14:42:18 UTC - in response to Message ID 55443.

To Mod,

Why has no one from the team or for that matter anyone from RAH in general said anything about these 'needs psipred_ss2 to run filters' errors?

This is what annoys a lot of people and causes them to leave: no news, no reply to posts about a common issue, no communication at all.

Could you give us an update on this issue or see if the team has something they can say in general about this?

Thanks


Greg, I can't post information I do not have. The message does not seem to adversely affect the running of the tasks. I'm sure it will be addressed in a future release.

Rest assured that the Project Team does review and react to the posts in these "problems with..." threads.
____________
Rosetta Moderator: Mod.Sense

JordanWeber

Joined: Apr 24 08
Posts: 4
ID: 254803
Credit: 716,009
RAC: 0
Message 55484 - Posted 2 Sep 2008 15:02:33 UTC

Still can't run mini on this 1 computer, but at least I get computation errors now on all the tasks, with a message, getting warmer :):
189123860


<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C91152A write attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...


</stderr_txt>
]]>
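
In case it helps anyone matching up the two numbers in that message: the negative exit code and the hex value are the same 32-bit number, shown signed and unsigned. 0xC0000005 is the Windows access violation status, and the write to address 0x00000000 in the log would be consistent with a null pointer, though that is only a guess from the output. A small Python sketch of the conversion (the helper name is just for illustration):

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Show a signed 32-bit exit code as its unsigned hexadecimal form."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_unsigned_hex(-1073741819))  # 0xc0000005
```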

Phil Hirons, Jr.

Joined: Jan 5 06
Posts: 1
ID: 47273
Credit: 44,233
RAC: 0
Message 55504 - Posted 3 Sep 2008 17:50:52 UTC

I've had 3 Mini WUs that go along fine until about 10 minutes are estimated to be left. They then take hours of CPU time to complete (>15).
____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55507 - Posted 3 Sep 2008 19:27:23 UTC - in response to Message ID 55483.

Rest assured that the Project Team does review and react to the posts in these "problems with..." threads.

That's all I need to know Mod.Sense. Thanks. I'll leave it in their hands.

32/64 of my most recent tasks have failed (32 passed) so my problem persists. I look forward to being more productive when a solution (or advice for me) appears.
____________

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55508 - Posted 3 Sep 2008 20:28:15 UTC - in response to Message ID 55507.

Rest assured that the Project Team does review and react to the posts in these "problems with..." threads.


Well, I sure would like to be back crunching for Rosetta.

Why can't they take this problem process and put it into Ralph and ask for volunteers? That would let those of us who just want to run WU's not need to constantly baby this thing along.

>>RSM


____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55510 - Posted 3 Sep 2008 21:40:33 UTC

I don't understand how you guys are getting failures so often.
I am crunching stuff with the h1_0953.45.xxxx and get no failures at all.
Only the Psp stuff, but that is just an annoyance. I still get the usual 80-90 claimed and 130-160 granted credit all the time. I run Windows XP Home with SP3 and an Intel dual core with some good memory, and I just chug along with no problems.

jasonwishart Profile
Avatar

Joined: Nov 7 05
Posts: 14
ID: 9994
Credit: 936,419
RAC: 0
Message 55511 - Posted 3 Sep 2008 21:53:56 UTC - in response to Message ID 55408.

For people having trouble: this may help you out. Hope it helps.
Cheers
Speedy


Thanks Speedy...this did indeed help me out. It took a few go's for me to get it right, but yes this did solve my problem.

Cheers to you as well.
____________
Will work for bandwidth!!

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55516 - Posted 4 Sep 2008 3:19:34 UTC - in response to Message ID 55510.

I don't understand how you guys are getting failures so often.
I am crunching stuff with the h1_0953.45.xxxx and get no failures at all.
Only the Psp stuff, but that is just an annoyance. I still get the usual 80-90 claimed and 130-160 granted credit all the time. I run Windows XP Home with SP3 and an Intel dual core with some good memory, and I just chug along with no problems.


In spite of everything, I do have one machine, a PIII with 512 megs of DRAM and a 1 gig processor, which has kept sailing right along in Rosetta.

So, that is compared to my three machines, 2 Core 2 Duo machines and 1 PIII that have failed to run this bugger.

I am delighted for those who are having no problems; but you can tell from this thread that things are not going well.

>>RSM

____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Speedy
Avatar

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 521,019
RAC: 10
Message 55525 - Posted 4 Sep 2008 8:33:59 UTC - in response to Message ID 55511.


Thanks Speedy...this did indeed help me out. It took a few go's for me to get it right, but yes this did solve my problem.

jasonwishart. Glad I could assist you

I've been running mini 1.32 as good as gold on my Dual Core AMD Opteron Processor 180 for the last day & a bit
Speedy
____________
Have a crunching good day!!

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55530 - Posted 4 Sep 2008 12:14:21 UTC - in response to Message ID 55525.


Thanks Speedy...this did indeed help me out. It took a few go's for me to get it right, but yes this did solve my problem.

jasonwishart. Glad I could assist you

I've been running mini 1.32 as good as gold on my Dual Core AMD Opteron Processor 180 for the last day & a bit
Speedy


you finally got out of the rut of bad tasks?
glad to see someone else is running good again

Jim_Clark Profile
Avatar

Joined: Sep 11 07
Posts: 7
ID: 204423
Credit: 38,439
RAC: 0
Message 55535 - Posted 4 Sep 2008 16:04:48 UTC

Minirosetta 1.28 WUs always crashed my PC, and v1.32 does the same. Updating BOINC to the latest version didn't help. Upgrading my Windows XP Pro to SP3 didn't change the situation, either. This happens on my AMD Athlon Dual Core.

My BOINC Manager is now configured in a protected mode, so Minirosetta v1.32 WUs can't lock up my PC, but the WUs fail with a 'compute error' after 5 hours of wasted work. I don't want to waste my PC on WUs that will fail, so I abort all the Minirosetta WUs when I see them, hoping that I will get Rosetta Beta WUs to replace them.

Some project sites such as PrimeGrid and World Community Grid give the user an option to choose which applications (sub-projects) to run. I would like to be able to opt out of Minirosetta until this bug is fixed.
____________

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55536 - Posted 4 Sep 2008 16:06:45 UTC

Kudos to Jim Clark for simply and eloquently explaining the pain.

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55545 - Posted 4 Sep 2008 18:41:14 UTC

can someone explain to papa_protien why this task granted only 9 points?

his question is here

Speedy
Avatar

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 521,019
RAC: 10
Message 55556 - Posted 5 Sep 2008 5:30:04 UTC - in response to Message ID 55530.

glad to see someone else is running good again

I'm not sure if it was luck but I haven't run into any bad work units. I hope the rest of you are crunching good work units now
Speedy
____________
Have a crunching good day!!

Speedy
Avatar

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 521,019
RAC: 10
Message 55575 - Posted 6 Sep 2008 6:40:24 UTC

This is the 1st task to fail on me in a long time: Task ID 189527757. Can anyone spot anything out of place with this task?
Speedy
____________
Have a crunching good day!!

leonari

Joined: Dec 11 05
Posts: 4
ID: 34777
Credit: 2,352,023
RAC: 1,411
Message 55581 - Posted 6 Sep 2008 16:16:23 UTC

For the third time this month Rosetta Mini 1.32 has "hung", stopping everything else from running, after completing at 100% - only releasing after the "Abort" button was pressed.
The last has only just occurred (@ circa 16:30, 06/09/08) running abinitio_homfrag_71_A_1prqA_4443_32677_0. Luckily I noticed that it was still running at 100% complete and was locking everything else out - the last time, it locked everything out for three days before it was noticed.
I am starting to get a bit sick and tired of Rosetta, with all of the problems I have experienced over the last year or so!
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 658
ID: 264600
Credit: 3,743,487
RAC: 7,157
Message 55585 - Posted 6 Sep 2008 18:46:48 UTC

Those of you who are having problems with minirosetta v1.32 may want to add a description of the machine where you were running it when it had problems in case the problem appears only on specific types of machines.

For example, I don't remember any problems recently; I'm running BOINC 5.10.45 under 32-bit Vista SP1 on an HP Compaq Presario SR5125CL PC with an AMD Athlon 64 X2 Dual Core 3600+ 1.90 GHz processor with 2 GB memory.

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55586 - Posted 6 Sep 2008 22:56:21 UTC - in response to Message ID 55585.

Those of you who are having problems with minirosetta v1.32 may want to add a description of the machine where you were running it when it had problems in case the problem appears only on specific types of machines.

For example, I don't remember any problems recently; I'm running BOINC 5.10.45 under 32-bit Vista SP1 on an HP Compaq Presario SR5125CL PC with an AMD Athlon 64 X2 Dual Core 3600+ 1.90 GHz processor with 2 GB memory.


With Rosetta problems, I am running two Core-2-Duo Vista machines with 2 gig of DRAM, and one Intel PIII machine with 512 megs of DRAM.

With no problem, one Intel PIII with 512 megs of DRAM.

Go figure!

>>RSM

____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Terrasapiens

Joined: Apr 25 08
Posts: 15
ID: 254884
Credit: 241,058
RAC: 39
Message 55587 - Posted 6 Sep 2008 23:25:40 UTC - in response to Message ID 55535.

Some project sites such as PrimeGrid and World Community Grid give the user an option to choose which applications (sub-projects) to run. I would like to be able to opt out of Minirosetta until this bug is fixed.


Since just about every mini WU I've had since v1.28 has failed on me right from the start I'd certainly like to have this option as well. The v5.98 WUs seem to run fine but ironically, the BOINC client doesn't seem to be requesting these lately. I guess I'll have to try the suggestion mentioned in post 55511 when I have time. Per a suggestion from Peter Leman that I check my RAM, I've removed and reset the RAM cards and run Memtest86+ (v2.01) on my machine but no errors were found. Check out the last 4 pages or so of results and see that there's nothing but failures:

http://boinc.bakerlab.org/rosetta/results.php?userid=254884

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55589 - Posted 7 Sep 2008 6:39:18 UTC - in response to Message ID 55587.
Last modified: 7 Sep 2008 6:41:11 UTC

i tried looking up that code, but it is hard to find relevant information.
one place i saw made mention of the screensaver in dc programs and another made mention of the graphics card. maybe there is something in those pages.

here is one of the pages: http://boinc.berkeley.edu/dev/forum_thread.php?id=163
http://boinc.bakerlab.org/rosetta/results.php?userid=254884, it is an old thread from 2005 but is the closest to the error you are getting.

there is also this boinc wiki thread that mentions climate at home was having issues, but again it does not go into great detail unless the links lead to something good. http://www.boinc-wiki.info/Unrecoverable_error_for_result_%27(result)%27_(_-_exit_code_-%27(number)%27_(%27(hex-number)%27))

lastly here is something mod posted in the forum area: http://boinc.bakerlab.org/rosetta/forum_thread.php?id=3372&nowrap=true#43480

from the looks of it, OC, overheat or the screensaver could be culprits.

This stuff is a good place to start....anyone else got ideas?


Some project sites such as PrimeGrid and World Community Grid give the user an option to choose which applications (sub-projects) to run. I would like to be able to opt out of Minirosetta until this bug is fixed.


Since just about every mini WU I've had since v1.28 has failed on me right from the start I'd certainly like to have this option as well. The v5.98 WUs seem to run fine but ironically, the BOINC client doesn't seem to be requesting these lately. I guess I'll have to try the suggestion mentioned in post 55511 when I have time. Per a suggestion from Peter Leman that I check my RAM, I've removed and reset the RAM cards and run Memtest86+ (v2.01) on my machine but no errors were found. Check out the last 4 pages or so of results and see that there's nothing but failures:


mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55590 - Posted 7 Sep 2008 13:10:51 UTC - in response to Message ID 55589.

With all due respect, this business of checking individual computer symptoms to root out this problem is just so much ca-ca.

I am running 8 projects, 9 projects, and 4 projects, respectively, on the computers having difficulty with Rosetta. I suspect that many others experiencing the problems are doing much the same.

To suspect any given machine, when all of the other projects on these machines are doing fine, especially and including all those who, like myself, find the screen savers useful, is silly.

The problem lies solely with the project. The managers of the project ought to realize this by now, and pull this away. That way, those of us who have been loyally trying to do our best for what is arguably the most important project at BOINC could proceed with this very valuable work. Instead, what they are doing is earning the enmity of some people who will detach and never come back.

>>RSM






____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


sslickerson Profile

Joined: Oct 14 05
Posts: 101
ID: 4578
Credit: 484,477
RAC: 0
Message 55593 - Posted 7 Sep 2008 17:12:53 UTC
Last modified: 7 Sep 2008 17:14:35 UTC

I bought a new laptop a few days ago. I'm running Vista (x64) and I seem to be getting a lot of errors. This error has shown up a couple of times (can't acquire lock file). There are also 4 or 5 of these: Error code: 200.
____________



Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55603 - Posted 8 Sep 2008 11:18:33 UTC - in response to Message ID 55590.

so how is it that my single machine with 2 cores and some corsair memory runs just fine and your multitude of machines does not?

With all due respect, this business of checking individual computer symptoms to root out this problem is just so much ca-ca.

I am running 8 projects, 9 projects, and 4 projects, respectively, on the computers having difficulty with Rosetta. I suspect that many others experiencing the problems are doing much the same.

To suspect any given machine, when all of the other projects on these machines are doing fine, especially and including all those who, like myself, find the screen savers useful, is silly.

The problem lies solely with the project. The managers of the project ought to realize this by now, and pull this away. That way, those of us who have been loyally trying to do our best for what is arguably the most important project at BOINC could proceed with this very valuable work. Instead, what they are doing is earning the enmity of some people who will detach and never come back.

>>RSM






mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55604 - Posted 8 Sep 2008 12:21:01 UTC - in response to Message ID 55603.

Greg-

Surely, I can not answer your question. But the fact is, my experience is typical. People are experiencing this problem on some machines and not on others, so it seems to me that the problem lies with the project.

It behooves the project managers to get this figured out. Rosetta is losing crunchers every day.

This is not just a WU freezing or anything like that. When the problem occurs, it freezes the whole computer necessitating a trip to Task Manager to clear out the process.

>>RSM

so how is it that my single machine with 2 cores and some corsair memory runs just fine and your multitude of machines does not?

With all due respect, this business of checking individual computer symptoms to root out this problem is just so much ca-ca.

I am running 8 projects, 9 projects, and 4 projects, respectively, on the computers having difficulty with Rosetta. I suspect that many others experiencing the problems are doing much the same.

To suspect any given machine, when all of the other projects on these machines are doing fine, especially and including all those who, like myself, find the screen savers useful, is silly.

The problem lies solely with the project. The managers of the project ought to realize this by now, and pull this away. That way, those of us who have been loyally trying to do our best for what is arguably the most important project at BOINC could proceed with this very valuable work. Instead, what they are doing is earning the enmity of some people who will detach and never come back.

>>RSM








____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


robertmiles Profile

Joined: Jun 16 08
Posts: 658
ID: 264600
Credit: 3,743,487
RAC: 7,157
Message 55606 - Posted 8 Sep 2008 12:26:12 UTC - in response to Message ID 55603.

There are different versions of the programs for different kinds of machines. That's why it's useful to describe what type of machine you have. For example, if the problem affected only machines running 64-bit operating systems, or only machines running 64-bit Windows Vista SP1, they could find which types of machines are affected faster if the people with problems mentioned what operating system they were running. Also, I've heard of problems affecting only machines running certain versions of BOINC, in which case it would be useful for the people with the problems to mention which version they were running.

By the way, what's corsair memory - a brand name without the usual capital letter?

so how is it that my single machine with 2 cores and some corsair memory runs just fine and your multitude of machines does not?

With all due respect, this business of checking individual computer symptoms to root out this problem is just so much ca-ca.

I am running 8 projects, 9 projects, and 4 projects, respectively, on the computers having difficulty with Rosetta. I suspect that many others experiencing the problems are doing much the same.

To suspect any given machine, when all of the other projects on these machines are doing fine, especially and including all those who, like myself, find the screen savers useful, is silly.

The problem lies solely with the project. The managers of the project ought to realize this by now, and pull this away. That way, those of us who have been loyally trying to do our best for what is arguably the most important project at BOINC could proceed with this very valuable work. Instead, what they are doing is earning the enmity of some people who will detach and never come back.

>>RSM







Keith T.
Avatar

Joined: Mar 1 07
Posts: 37
ID: 150379
Credit: 12,959
RAC: 0
Message 55607 - Posted 8 Sep 2008 12:32:26 UTC - in response to Message ID 55482.
Last modified: 8 Sep 2008 12:38:02 UTC

How far past the required runtime does a task need to go before it gets stopped by the watchdog?

The messages in stderr indicate that this task has been removed from memory and resumed many times. If 5 such restarts occur with no progress being made (i.e. a checkpoint saved) the task will be ended.

Otherwise, if the task uses more than 4 times the runtime preference, the watchdog will end it. Since it is waiting to run, it probably has to begin running again for the watchdog to get any time. The watchdog checks on it every 15 minutes.

So, I would have expected it to have ended on the first restart where your runtime preference was lowered to 2 hours. Since it did not end, I guess I would abort that one. 11.5 hours without completing a single model, and not responding to the watchdog, sounds like something may be wrong there.


I did not abort the task. I suspended all other tasks and then ran it again, deliberately stopping it 5 times before a checkpoint.

I also monitored the graphics and confirmed that the task was re-starting from model 0 at each restart. The task crashed after the 5th restart.

http://boinc.bakerlab.org/rosetta/result.php?resultid=187758720

Here are the details as that url will probably be gone in a few days:


Task ID 187758720
Name abinitio_homfrag_71_A_2ib0A_4443_10507_0
Workunit 171490091
Created 28 Aug 2008 4:31:58 UTC
Sent 28 Aug 2008 4:42:45 UTC
Received 2 Sep 2008 18:33:08 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 428259
Report deadline 7 Sep 2008 4:42:45 UTC
CPU time 43431.4
stderr out <core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 28800
# cpu_run_time_pref: 28800
# cpu_run_time_pref: 28800
# cpu_run_time_pref: 28800
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 3600
# cpu_run_time_pref: 3600
# cpu_run_time_pref: 3600
# cpu_run_time_pref: 3600
# cpu_run_time_pref: 3600
# cpu_run_time_pref: 3600
# cpu_run_time_pref: 3600
# cpu_run_time_pref: 3600
Too many restarts with no progress. Keep application in memory while preempted.
======================================================
DONE :: 1 starting structures 43430.7 cpu seconds
This process generated 0 decoys from 0 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>abinitio_homfrag_71_A_2ib0A_4443_10507_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Validate state Invalid
Claimed credit 117.814813845302
Granted credit 0
application version 1.32
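
The stderr output above can be summarised with a short script. This is only a sketch: it assumes that each `# cpu_run_time_pref:` line is written once per application start, which the output suggests but the thread does not confirm. The helper name `count_pref_lines` is made up for the example, and the sample text is rebuilt from the line counts in the listing above.

```python
from collections import Counter
import re

# Matches lines such as "# cpu_run_time_pref: 21600".
PREF_LINE = re.compile(r"^# cpu_run_time_pref: (\d+)$")


def count_pref_lines(stderr_text):
    """Count how often each runtime preference (in seconds) appears."""
    counts = Counter()
    for line in stderr_text.splitlines():
        match = PREF_LINE.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Same lines, in the same order, as in the stderr listing above.
sample = "\n".join(
    ["# cpu_run_time_pref: 21600"] * 5
    + ["# cpu_run_time_pref: 28800"] * 4
    + ["# cpu_run_time_pref: 7200"] * 6
    + ["# cpu_run_time_pref: 3600"] * 8
)

counts = count_pref_lines(sample)
total = sum(counts.values())
print(total, dict(counts))
```

Under that assumption the task was started 23 times in total, with the preference changed three times along the way.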


I hope that I may get some credit retrospectively for my > 12 hours of CPU time.

Keith.

[edit]
BTW the "wingman" who also used an AMD processor, got a "success".

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=171490091


CPU time 7905.563
stderr out <core_client_version>6.3.8</core_client_version>
<![CDATA[
<stderr_txt>
needs psipred_ss2 to run filters
needs psipred_ss2 to run filters
needs psipred_ss2 to run filters
needs psipred_ss2 to run filters
needs psipred_ss2 to run filters
needs psipred_ss2 to run filters
======================================================
DONE :: 1 starting structures 7905.28 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>
Validate state Valid
Claimed credit 25.8675057245436
Granted credit 20.8349439063512
application version 1.32

[/edit]

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3389
ID: 106194
Credit: 0
RAC: 0
Message 55610 - Posted 8 Sep 2008 13:22:13 UTC - in response to Message ID 55590.

With all due respect, this business of checking individual computer symptoms to root out this problem is just so much ca-ca.
...
To suspect any given machine, when all of the other projects on these machines are doing fine, especially and including all those who, like myself, find the screen savers useful, is silly.

The problem lies solely with the project.


mitrichr, I don't believe anyone intended to imply that your machine is a suspect. To use the crime analogy, it is not the "suspect", it is the "victim". And when the police investigate crimes, they ask the victims a lot of questions.

100,000 people carried out their day yesterday and were not mugged. This certainly doesn't mean that the person who was mugged did anything wrong. But knowing the details, and understanding the circumstances, can help prevent it from happening in the future.

There are new mini releases being tested now on Ralph. But I am not certain how many of the reported issues have been addressed.
____________
Rosetta Moderator: Mod.Sense

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55614 - Posted 8 Sep 2008 14:26:31 UTC - in response to Message ID 55590.

This comment is not helpful. Different platforms\OSs run a different application. The WU's that fail on one platform with one application run happily on another platform running another application. This indicates that the WUs themselves are mostly fine, but there's an issue somewhere with the application on that platform\OS - specifically the mini-rosetta one rather than the Rosetta beta. Detailing the specific WUs that fall over and comparing them to the WUs that succeed may allow the coders to pinpoint what the one WU specifically asks for that the other one doesn't.

This is an entirely normal process for bug\beta-testing, which I've been involved in to a greater or lesser degree since the early 90s on other platforms. Impatience doesn't help. If the problem was easy to solve it wouldn't have gone wrong in the first place - or it may just be a simple oversight that'll get cured in the next release.

For all the complaints, 235000 (mostly mini-WUs?) completed successfully on Rosetta in the last 24 hours. That being the case it's not unfair at all to look into potential software\machine conflicts with the rare individual machines having an issue.

If that weren't the case, this thread would have a thousand different people posting. And they're not, are they. There's about ten. So I'm grateful I'm getting any attention at all tbh.

Thanks again to Mod.Sense for a good post.

With all due respect, this business of checking individual computer symptoms to root out this problem is just so much ca-ca.

I am running 8 projects, 9 projects, and 4 projects, respectively, on the computers having difficulty with Rosetta. I suspect that many others experiencing the problems are doing much the same.

To suspect any given machine, when all of the other projects on these machines are doing fine, especially and including all those who, like myself, find the screen savers useful, is silly.

The problem lies solely with the project. The managers of the project ought to realize this by now, and pull this away. That way, those of us who have been loyally trying to do our best for what is arguably the most important project at BOINC could proceed with this very valuable work. Instead, what they are doing is earning the enmity of some people who will detach and never come back.

____________

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55615 - Posted 8 Sep 2008 14:36:35 UTC

Sid-

Please know that I mean no disrespect. My only concern is to be able to run Rosetta on all four of my machines as I did previously. Right now, I have one XP which has had nary a problem and continues to crunch whatever comes from Rosetta. I have an almost identical XP machine and two Core-2-Duo's that are off the project.

I think that Rosetta might be the single most important project at BOINC.

But I can not leave machines on the project when WUs totally freeze the computer and neither crunching on other projects nor other work that I do can go forward.

So, I am going to bow out of the debate. I hope that I will see notice of resolution in the RSS feed so that I can continue work which I deem very valuable.
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55619 - Posted 8 Sep 2008 17:55:08 UTC

mitrichr - only 2 of your computers have returned results. The others were detached or never replied. Based on that information it is kind of hard to see what the problems were with the other tasks that did not complete. I hope you are running the last of the Rosetta tasks so that the team and others can see what if any errors come up.

One of your tasks from computer 2 got sent to 2 other users. One of them had a file transfer error, and the other was running Linux on a 2.66 GHz machine and completed the task OK. Another user completed the other task that did not report from computer 2 and was running a 1.86 GHz dual core on Vista SP1.

Computer 4's one task got sent to an Intel Xeon machine running MS Server 2003, which completed the task OK.

That is 3 different machines with 3 different OS packages that completed ok of which 2 machines were older than yours.

---------

Keith T - the task you linked to got stopped too many times and Boinc Manager terminated it on your machine due to that. It does not matter if you stopped it or it restarted itself due to machine reboot or otherwise, a stoppage is a stoppage.

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55620 - Posted 8 Sep 2008 17:55:33 UTC
Last modified: 8 Sep 2008 17:55:59 UTC

**Sorry for the double post**


mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55621 - Posted 8 Sep 2008 19:52:28 UTC

Greg-

I was forced to detach on three computers, and did so probably about 7-10 days ago. I just could not keep minding Rosetta when I have work to do, stuff to read, or I am out cycling or hiking. That is a burden I can not assume. I have a bunch of projects on which I work, I have a job, etc.

I am surprised you are seeing anything current on two machines, except of course that I did not detach as much as 30 days ago.

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55624 - Posted 8 Sep 2008 21:05:25 UTC - in response to Message ID 55621.

Well, good luck with your other projects. I see you're also on my second project.

Greg-

I was forced to detach on three computers, and did so probably about 7-10 days ago. I just could not keep minding Rosetta when I have work to do, stuff to read, or I am out cycling or hiking. That is a burden I can not assume. I have a bunch of projects on which I work, I have a job, etc.

I am surprised you are seeing anything current on two machines, except of course that I did not detach as much as 30 days ago.

>>RSM

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55625 - Posted 9 Sep 2008 2:01:20 UTC - in response to Message ID 55624.

[quote]Well, good luck with your other projects. I see you're also on my second project.

[quote]

Sorry, I do not understand, what do you mean "...my second project...."?

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55626 - Posted 9 Sep 2008 3:10:28 UTC - in response to Message ID 55615.

Sid-

Please know that I mean no disrespect. My only concern is to be able to run Rosetta on all four of my machines as I did previously. Right now, I have one XP which has had nary a problem and continues to crunch whatever comes from Rosetta. I have an almost identical XP machine and two Core-2-Duo's that are off the project.

I think that Rosetta might be the single most important project at BOINC.

But I can not leave machines on the project when WUs totally freeze the computer and neither crunching on other projects nor other work that I do can go forward.

So, I am going to bow out of the debate. I hope that I will see notice of resolution in the RSS feed so that I can continue work which I deem very valuable.

No offence taken at all and I agree with everything you say - I run no other project. This is frustration coming out and if you were to look at my recent results I have nearly as much reason to be as frustrated as you (90% failure rate in the last couple of days after as much as 2 hours runtime). I could do with a few 5.98 WUs to go at too.

Can someone point me to how I can reduce the WU runtimes to 2 hours or less? I'm hoping to get a few more completed before they crash out.
____________

Speedy
Avatar

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 521,019
RAC: 10
Message 55627 - Posted 9 Sep 2008 3:42:22 UTC - in response to Message ID 55626.


Can someone point me to how I can reduce the WU runtimes to 2 hours or less? I'm hoping to get a few more completed before they crash out.

Log into your account, click on Rosetta@home preferences, and you can change your work unit runtime there. The way I understand it, the shorter the runtime, the more bandwidth you use up.
Speedy
____________
Have a crunching good day!!

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55634 - Posted 9 Sep 2008 12:22:44 UTC - in response to Message ID 55625.

[quote]Well, good luck with your other projects. I see you're also on my second project.

[quote]

Sorry, I do not understand, what do you mean "...my second project...."?

>>RSM


I meant to say the second BOINC project I am part of, Einstein@Home.

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55635 - Posted 9 Sep 2008 12:26:12 UTC - in response to Message ID 55626.

Maybe you should point out to the group that your failures are related to this error: Can't acquire lockfile - exiting

Can someone explain this message?


Sid-

Please know that I mean no disrespect. My only concern is to be able to run Rosetta on all four of my machines as I did previously. Right now, I have one XP which has had nary a problem and continues to crunch whatever comes from Rosetta. I have an almost identical XP machine and two Core-2-Duo's that are off the project.

I think that Rosetta might be the single most important project at BOINC.

But I can not leave machines on the project when WUs totally freeze the computer and neither crunching on other projects nor other work that I do can go forward.

So, I am going to bow out of the debate. I hope that I will see notice of resolution in the RSS feed so that I can continue work which I deem very valuable.

No offence taken at all and I agree with everything you say - I run no other project. This is frustration coming out and if you were to look at my recent results I have nearly as much reason to be as frustrated as you (90% failure rate in the last couple of days after as much as 2 hours runtime). I could do with a few 5.98 WUs to go at too.

Can someone point me to how I can reduce the WU runtimes to 2 hours or less? I'm hoping to get a few more completed before they crash out.

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55648 - Posted 9 Sep 2008 15:20:34 UTC - in response to Message ID 55627.

Can someone point me to how I can reduce the WU runtimes to 2 hours or less? I'm hoping to get a few more completed before they crash out.

Log into your account click on Rosetta@home preferences & you can change your work unit runtime in there. The way I understand it is the shorter the runtime the more bandwidth you use up.
Speedy

I was reading that in the FAQ last night and just went blind when it came to finding it. Just looked again now and it's obvious. Got there in the end - thanks.
Maybe you should point out to the group that your failures are related to this error: Can't acquire lockfile - exiting

I thought I did - in msgs: 55318, 55323, 55343 and 55436.

Peculiar thing is, as soon as I had a little moan about most WUs falling over I just had a great little run of successes, including one in excess of 3 hours. Go figure...
____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55649 - Posted 9 Sep 2008 15:38:12 UTC - in response to Message ID 55593.
Last modified: 9 Sep 2008 15:48:02 UTC

I bought a new laptop a few days ago. I'm running Vista (x64) and I seem to be getting a lot of errors. This error has shown up a couple of times (can't acquire lock file). There are also 4 or 5 of these Error code:200.

Overlooked this. An Intel Core2 Duo running Vista64 crashing out with too many exits after the same "Can't acquire lockfile - exiting" as I get on my AMD Quad Core Phenom running Vista64 - both with Boinc 6.2.18 and presumably the x64 version.

Vista64 is the common factor again.
____________

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55650 - Posted 9 Sep 2008 16:50:04 UTC

I forgot to note-

Sid said earlier that there are only about 10 people complaining among all the folks running Rosetta.

That 10 people may represent 10,000 who are having trouble, getting disillusioned and detaching. Not everyone with a problem comes here. Also, seeing all of the technical terminology, many might be put off.

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


PJCN88

Joined: Aug 11 07
Posts: 2
ID: 197379
Credit: 149,276
RAC: 0
Message 55654 - Posted 9 Sep 2008 21:10:04 UTC

I will also stop for the moment with Rosetta.
Over the weekend I reinstalled BOINC on my computer to change the data directory, and from that point I started having trouble with Minirosetta.

Either I get computation errors, or I have to abort because the task says it is running but nothing is happening.

system info :
09/09/08 20:18:35||Starting BOINC client version 6.2.18 for windows_x86_64
09/09/08 20:18:35||log flags: task, file_xfer, sched_ops
09/09/08 20:18:35||Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3
09/09/08 20:18:35||Running as a daemon
09/09/08 20:18:35||Data directory: D:\boinc\data
09/09/08 20:18:35||Running under account boinc_master
09/09/08 20:18:35||Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz [Intel64 Family 6 Model 23 Stepping 6]
09/09/08 20:18:35||Processor features: fpu tsc pae nx sse sse2 pni
09/09/08 20:18:35||OS: Microsoft Windows Vista: Ultimate x64 Editon, Service Pack 1, (06.00.6001.00)
09/09/08 20:18:35||Memory: 4.00 GB physical, 8.17 GB virtual
09/09/08 20:18:35||Disk: 368.10 GB total, 352.50 GB free
09/09/08 20:18:35||Local time is UTC +2 hours
09/09/08 20:18:35||No coprocessors

Patrick

PS : no problem with the 5.98 rosetta beta

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 55655 - Posted 9 Sep 2008 21:11:31 UTC

Sid - your message threads make mention of the error, but no one has answered it.

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55656 - Posted 9 Sep 2008 22:25:00 UTC - in response to Message ID 55648.
Last modified: 9 Sep 2008 22:26:34 UTC

Peculiar thing is, as soon as I had a little moan about most WUs falling over I just had a great little run of successes, including one in excess of 3 hours.

And the successes continue. The only change I made was to "Leave applications in memory while suspended?" I thought I had this as 'Yes' in my Boinc Manager settings, but it was marked as 'No' online. Hmm. And now 5.98 WUs are coming through too.

That 10 people may represent 10,000 who are having trouble, getting disillusioned and detaching.

It might do. Is there any evidence of that? The home page shows more users and more hosts each day and 239k successes in the last 24hours (up from 235k the previous time I mentioned it). These graphs support that. What's the basis of your assertion?

09/09/08 20:18:35||Starting BOINC client version 6.2.18 for windows_x86_64
[...]
09/09/08 20:18:35||OS: Microsoft Windows Vista: Ultimate x64 Editon, Service Pack 1, (06.00.6001.00)
[...]
PS : no problem with the 5.98 rosetta beta

I was about to highlight this for being another Vista64 issue, then I glanced at the error messages being given and I'm staying clear. Way out of my depth on that one!

Except I noticed all WUs succeeded prior to 7 Sept, which makes me wonder if my whole issue has been about leaving applications suspended in memory or not. I'll keep an eye on my progress now (with WU run time set to default 3 hours again).
____________

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55657 - Posted 9 Sep 2008 23:31:37 UTC - in response to Message ID 55656.

Not clear on "Leave applications in memory while suspended?". Mine are Yes, should I switch to no and try again?

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3389
ID: 106194
Credit: 0
RAC: 0
Message 55658 - Posted 9 Sep 2008 23:44:18 UTC - in response to Message ID 55657.

Not clear on "Leave applications in memory while suspended?". Mine are Yes, should I switch to no and try again?

>>RSM


...only if you wish to test whether undoing what apparently improved Sid's situation causes problems for you. That would put you on the settings that Sid thinks may have contributed to his problems.

In theory the setting will not affect whether a task runs properly or not. Sid may be building evidence that there is a flaw making the theory not match the reality.

In practice, you want to leave tasks in memory (virtual memory is where they are really) while suspended to preserve all the work possible. Otherwise you are shutting the task down (every hour by default) and it may not have had a chance to save a checkpoint for the work it has done, so the work is lost, and done again when it starts again later.
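
The effect described above can be shown with a toy model. This is a sketch under simplifying assumptions, not how BOINC or Rosetta actually account for work: checkpoints are assumed to be saved at fixed intervals of work, the task is assumed to be switched out after every time slice, and the function `retained_minutes` and its parameters are invented for the example.

```python
def retained_minutes(slices, slice_minutes, checkpoint_every, keep_in_memory):
    """Minutes of work a task still holds after a number of time slices.

    Toy model: when the task is not kept in memory, every switch-out
    rolls it back to the most recent checkpoint.
    """
    progress = 0
    for _ in range(slices):
        progress += slice_minutes
        if not keep_in_memory:
            # Drop any work done since the last checkpoint.
            progress -= progress % checkpoint_every
    return progress


# Five 60 minute slices, checkpoint only after 90 minutes of work.
kept = retained_minutes(5, 60, 90, keep_in_memory=True)
lost = retained_minutes(5, 60, 90, keep_in_memory=False)
# Same slices, but a checkpoint every 45 minutes of work.
partial = retained_minutes(5, 60, 45, keep_in_memory=False)

print(kept, lost, partial)
```

In the second case the task never reaches a checkpoint within a slice, so it starts from zero each time, which resembles the "restarting from model 0" behaviour reported earlier in the thread.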
____________
Rosetta Moderator: Mod.Sense

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55661 - Posted 10 Sep 2008 0:26:27 UTC - in response to Message ID 55657.

Mine are Yes, should I switch to no and try again?

Trawling back through other threads for clues, the advice seems to be to set it as 'Yes'. But your problems\symptoms seem very different to mine so I can't help, unfortunately.

I was commenting as much on the fact that the online setting was different to what I had in my Boinc Manager. I thought they synchronised on each Update.

In theory the setting will not affect whether a task runs properly or not. Sid may be building evidence that there is a flaw making the theory not match the reality.

Don't think for a minute I have any idea what I'm doing or that I have a plan - I don't! But until something else changes I could hardly make things worse than they were!

Maybe I've stumbled on some oversensitivity to one setting. I don't know. A couple of 2 hour WUs are going through now and 1 of 3 hours. Fingers crossed. I like to think I'm making a difference, even if I'm just deluding myself. (Probably the latter...)
____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55672 - Posted 10 Sep 2008 14:18:35 UTC - in response to Message ID 55661.

A couple of 2 hour WUs are going through now and 1 of 3 hours. Fingers crossed.

All those went through ok, but 2 further ones failed overnight. Now it's all 5.98 WUs and 100% successes as usual.
____________

leonari

Joined: Dec 11 05
Posts: 4
ID: 34777
Credit: 2,352,023
RAC: 1,411
Message 55699 - Posted 11 Sep 2008 22:43:32 UTC

I noticed at 09:30 BST today, the 11th of September 2008, that the Work Unit (WU) Rosetta Mini 1.32 abinitio_homfrag_71A_1jfvA_4443_45274_0 had 10 minutes to run at 95% complete (after circa five and a half hours of CPU run time). About 30 minutes later the WU had only completed another 1%, which could have been because of business work I was doing, but because of the problems I have had with Rosetta in the recent past, I suspended everything else to allow it to finish and to see what would happen.
About two hours later, the CPU run time had increased to eight hours forty-seven minutes, but the WU had still not finished at 98.139% complete, and the SETI application, which should run 75% of the time, had not moved. At that point I suspended the Rosetta application to allow SETI@home Enhanced 6.03 to run.

With the time now at 22:54 BST, SETI has apparently not done anything since (that is, no progress and no increase in CPU time). However, on checking Windows 2000 Task Manager, Rosetta Mini 1.32 is running at circa 30 to 90% CPU utilisation (even though it is allegedly suspended), and SETI is running at 0%. I also checked the graphics for both SETI and Rosetta; neither worked.
Question I asked myself: what is at fault here: Rosetta Mini 1.32, SETI 6.03 or BOINC 5.10.45?
To see what happens next I have reset Rosetta Mini. SETI has started, the SETI graphics now work, and Windows 2000 Task Manager shows SETI at 90 to 95% utilisation.
Comments, please?
By the way, am I just unlucky with Rosetta, or is this a common occurrence?
Also, by the way, I filed another problem with Rosetta Mini on this thread a few days ago but, although I am sure it appeared in the "Thread Record", it has since disappeared. Reasons?
____________

leonari

Joined: Dec 11 05
Posts: 4
ID: 34777
Credit: 2,352,023
RAC: 1,411
Message 55700 - Posted 11 Sep 2008 23:05:30 UTC - in response to Message ID 55699.

I noticed at 09:30 BST today, the 11th of September 2008, that the Work Unit (WU) Rosetta Mini 1.32 abinitio_homfrag_71A_1jfvA_4443_45274_0 had 10 minutes to run at 95% complete (after circa five and a half hours of CPU run time). About 30 minutes later the WU had only completed another 1%, which could have been because of business work I was doing, but because of the problems I have had with Rosetta in the recent past, I suspended everything else to allow it to finish and to see what would happen.
About two hours later, the CPU run time had increased to eight hours forty-seven minutes, but the WU had still not finished at 98.139% complete, and the SETI application, which should run 75% of the time, had not moved. At that point I suspended the Rosetta application to allow SETI@home Enhanced 6.03 to run.

With the time now at 22:54 BST, SETI has apparently not done anything since (that is, no progress and no increase in CPU time). However, on checking Windows 2000 Task Manager, Rosetta Mini 1.32 is running at circa 30 to 90% CPU utilisation (even though it is allegedly suspended), and SETI is running at 0%. I also checked the graphics for both SETI and Rosetta; neither worked.
Question I asked myself: what is at fault here: Rosetta Mini 1.32, SETI 6.03 or BOINC 5.10.45?
To see what happens next I have reset Rosetta Mini. SETI has started, the SETI graphics now work, and Windows 2000 Task Manager shows SETI at 90 to 95% utilisation.
Comments, please?
By the way, am I just unlucky with Rosetta, or is this a common occurrence?
Also, by the way, I filed another problem with Rosetta Mini on this thread a few days ago but, although I am sure it appeared in the "Thread Record", it has since disappeared. Reasons?


Stupid person that I am, I now see that some messages are hidden, so my last question on why my previous message had disappeared is of no consequence.
My Laptop is a Dell 2.2 GHz C640i running Windows 2000 5.00.2195 SP4 - fairly old but has no problems running SETI or Ralph (strangely)!

____________

BrnmccO1

Joined: Jun 26 07
Posts: 17
ID: 186323
Credit: 578,825
RAC: 0
Message 55724 - Posted 12 Sep 2008 18:09:16 UTC

Well, on this comp I've had quite a few 1.32 errors and some 1.28 errors as well. As for other people, it runs 5.98 WUs with 100% success.

191481586 is a typical example of the usual "Unhandled Exception Error" that bombs out the WU.

Hopefully 1.34 will be better! In any case, I for one won't be missing 1.32 RIP.
____________

Sid Celery

Joined: Feb 11 08
Posts: 806
ID: 241409
Credit: 10,030,156
RAC: 9,347
Message 55744 - Posted 14 Sep 2008 1:43:08 UTC
Last modified: 14 Sep 2008 1:43:49 UTC

Task ID 191460060

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00475A96 read attempt to address 0x00000044

Engaging BOINC Windows Runtime Debugger...

Someone else took on this WU and it didn't fare any better either.
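
For readers wondering how the two numbers in the exit code line relate: the negative decimal value and the hexadecimal value are the same 32 bit pattern, read as signed and as unsigned. A minimal sketch of the conversion follows; the helper name `to_unsigned32` is made up for the example.

```python
def to_unsigned32(code):
    """Reinterpret a signed 32 bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


exit_code = -1073741819  # value reported in the task output above
status_hex = f"0x{to_unsigned32(exit_code):08x}"

print(status_hex)  # 0xc0000005
```

On Windows, 0xC0000005 is the status code for an access violation, which matches the "Access Violation" reason given in the exception record.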
____________

Roger L. Cousins

Joined: Nov 5 05
Posts: 1
ID: 9494
Credit: 5,468,733
RAC: 3,404
Message 55822 - Posted 16 Sep 2008 23:55:20 UTC

MiniRosetta seems to be spawning multiple threads. I have run out of Page file several times. I see eighteen threads in process right now, and many of them are using up to 170 Meg. What's up with that? How do I terminate them, short of using Task Manager to stop them one by one?

R Cousins

Pepo
Avatar

Joined: Sep 28 05
Posts: 115
ID: 1676
Credit: 101,358
RAC: 0
Message 55830 - Posted 17 Sep 2008 12:35:42 UTC - in response to Message ID 55822.

MiniRosetta seems to be spawning multiple threads. I have run out of Page file several times. I see eighteen threads in process right now, and many of them are using up to 170 Meg.

Do you see it on your WinXP host 361486? Using which application? Single threads of any Windows process do not have their 'own' allocated memory (in the context described here), memory is allocated (and accessible) 'per process'. What's the total physical/virtual memory usage of the Minirosetta process? Your pagefile size?

What's up with that? How do I terminate them, short of using Task Manager to stop them one by one?

Task manager does not support terminating single threads. Are you sure you are seeing threads, not processes?

Peter
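
The point about memory being allocated per process rather than per thread can be illustrated in general terms. The sketch below is plain Python and says nothing about how Minirosetta or Windows manage threads internally; the names `shared` and `worker` are made up for the example.

```python
import threading

# One list, owned by the process and visible to every thread in it.
shared = []


def worker(n):
    # Each thread writes into the same process-wide object.
    shared.append(n)


# Eighteen threads, matching the count mentioned in the post.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(18)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))
```

All eighteen threads end up writing to the same list, which is why a memory figure attributed to one thread is really a figure for the whole process. Separate entries that each show their own memory usage are more likely to be separate processes.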

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55869 - Posted 18 Sep 2008 18:42:44 UTC

It looks as if I may have solved my particular problems with Rosetta by giving up BOINC screen savers. I switched to standard Windows screen savers and re-attached the three computers which I had been forced to detach from Rosetta.

The problem had been that something in Rosetta was rendering the three machines totally useless, pinning the CPU at 100% and generally making me miserable. I would get the machines back by using Task Manager and shutting down the errant application.

Once I got rid of the BOINC screen saver, everything seemed to go back to normal.

As I said, I re-attached to Rosetta, now about 36 hours ago. No machine has had any problems and I believe that I have results now in all four machines.

I do not remember seeing any discussion here of screen savers. Maybe I just missed something.

Let me say that I know that there are different philosophical positions on using the screen savers. I favor using them, many of them let me know what is going on in the 10 or so projects to which I am attached with a quick glance at the monitor.

Any comments?

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Mike Tyka

Joined: Oct 20 05
Posts: 96
ID: 5612
Credit: 2,190
RAC: 0
Message 55985 - Posted 23 Sep 2008 19:23:51 UTC - in response to Message ID 55869.

Weird - we'll look into this. Has anyone else experienced problems like this?


Once I got rid of the BOINC screen saver, everything seemed to go back to normal.
....
Any comments?

>>RSM


____________
http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

mitrichr
Avatar

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 55986 - Posted 23 Sep 2008 19:41:04 UTC

Mike-

Just to let you know, things are still going quite well on all four machines, I suppose you can look at my results.

Just yesterday I detached the two PIIIs to make room for another project which they can handle; but the two Core 2 Duos, which are really about 90% of my crunching ability, are of course still running Rosetta. I mean, the PIIIs only achieve what they do running 24/7, whereas the other two do not.

You guys have a major responsibility in that Rosetta because of its originating software may be the most important project running on BOINC software. At least, I believe it is Proteome at WCG which uses your software.

Best ever always.

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


leonari

Joined: Dec 11 05
Posts: 4
ID: 34777
Credit: 2,352,023
RAC: 1,411
Message 56375 - Posted 15 Oct 2008 11:56:09 UTC

This is about problems with Minirosetta v1.34, as there does not appear to be a "thread" for it!
As can be seen from the three incidents below, Rosetta sometimes continues to run regardless of the rules on how long it is allowed to run (may be a BOINC Manager problem?). It then "locks up" and continues to run at 100% stopping anything else from running!
Note: all of the message sequences below are sequential messages extracted from the "Messages" tab in BOINC.

Incident 1

05/10/2008 12:50:17|rosetta@home|Starting abinitio_nohomfrag_70_A_1ynvA_4466_27265_0
05/10/2008 12:50:35|rosetta@home|Starting task abinitio_nohomfrag_70_A_1ynvA_4466_27265_0 using minirosetta version 134

Rosetta locked up running at 100% - presumably for one and a half days!

Aborted at 09:34 07/10/2008
07/10/2008 09:34:17|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
07/10/2008 09:34:22|rosetta@home|Scheduler request succeeded: got 0 new tasks
07/10/2008 09:34:50|SETI@home|Resuming task 22au08ac.21313.9479.16.8.9_1 using setiathome_enhanced version 603


Incident 2

10/10/2008 11:29:03||Starting BOINC client version 5.10.45 for windows_intelx86
10/10/2008 11:29:03||log flags: task, file_xfer, sched_ops
10/10/2008 11:29:03||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
10/10/2008 11:29:03||Data directory: C:\Program Files\BOINC
10/10/2008 11:29:07||Processor: 1 GenuineIntel Mobile Intel(R) Pentium(R) 4 - M CPU 2.20GHz [x86 Family 15 Model 2 Stepping 7]
10/10/2008 11:29:07||Processor features: fpu tsc sse mmx
10/10/2008 11:29:07||OS: Microsoft Windows 2000: Professional Edition, Service Pack 4, (05.00.2195.00)
10/10/2008 11:29:07||Memory: 511.43 MB physical, 1.21 GB virtual
10/10/2008 11:29:07||Disk: 17.70 GB total, 2.26 GB free
10/10/2008 11:29:07||Local time is UTC +1 hours
10/10/2008 11:29:11|rosetta@home|URL: http://boinc.bakerlab.org/rosetta/; Computer ID: 97037; location: home; project prefs: default
10/10/2008 11:29:11|ralph@home|URL: http://ralph.bakerlab.org/; Computer ID: 1760; location: home; project prefs: default
10/10/2008 11:29:11|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 1960189; location: work; project prefs: default
10/10/2008 11:29:11||General prefs: from http://setiathome.ssl.berkeley.edu/ (last modified 08-Jun-2006 10:33:55)
10/10/2008 11:29:11||Host location: work
10/10/2008 11:29:11||General prefs: no separate prefs for work; using your defaults
10/10/2008 11:29:11||Reading preferences override file
10/10/2008 11:29:11||Preferences limit memory usage when active to 255.71MB
10/10/2008 11:29:11||Preferences limit memory usage when idle to 460.29MB
10/10/2008 11:29:11||Preferences limit disk usage to 2.26GB
10/10/2008 11:29:18|SETI@home|Restarting task 19au08ab.15460.9479.6.8.46_1 using setiathome_enhanced version 603
10/10/2008 11:33:21|SETI@home|Sending scheduler request: Requested by user. Requesting 36 seconds of work, reporting 1 completed tasks
10/10/2008 11:33:24|SETI@home|Scheduler request succeeded: got 1 new tasks
10/10/2008 11:33:27|SETI@home|Started download of 26au08ad.24455.4162.7.8.218
10/10/2008 11:33:38|SETI@home|Finished download of 26au08ad.24455.4162.7.8.218
10/10/2008 12:16:18|SETI@home|Computation for task 19au08ab.15460.9479.6.8.46_1 finished
10/10/2008 12:16:18|SETI@home|Starting 26au08ad.15112.2526.6.8.181_1
10/10/2008 12:16:18|SETI@home|Starting task 26au08ad.15112.2526.6.8.181_1 using setiathome_enhanced version 603
10/10/2008 12:16:20|SETI@home|Started upload of 19au08ab.15460.9479.6.8.46_1_0
10/10/2008 12:16:28|SETI@home|Finished upload of 19au08ab.15460.9479.6.8.46_1_0
10/10/2008 14:14:37|rosetta@home|Restarting task abinitio_nohomfrag_70_A_1unrA_4466_47644_0 using minirosetta version 134

17:31 on 10/10/2008 - Because Rosetta was still going at circa 85% but with no increase in either of the two SETI tasks (SETI should run 75% of the time), Rosetta was suspended.

10/10/2008 17:31:02|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
10/10/2008 17:31:07|SETI@home|Scheduler request succeeded: got 0 new tasks
10/10/2008 17:31:42|SETI@home|Resuming task 26au08ad.15112.2526.6.8.181_1 using setiathome_enhanced version 603

At 21:54 on 11/10/08, it was observed that Rosetta had increased to 100% complete even though it was still suspended. By the way, there was no message to report that it had restarted. Aborted Rosetta.

11/10/2008 21:54:48|SETI@home|Starting 26au08ad.24455.4162.7.8.218_0
11/10/2008 21:54:51|SETI@home|Starting task 26au08ad.24455.4162.7.8.218_0 using setiathome_enhanced version 603

21:58 on 11/10/08 - Rosetta was still going, even though it had been aborted, but SETI was still not – "Screen capture" available. Terminated the Rosetta task. SETI then started.


Incident 3

14/10/2008 11:53:21|rosetta@home|Restarting task abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 using minirosetta version 134
14/10/2008 12:38:32|SETI@home|Started download of 25au08af.7275.890.10.8.52
(Note: First SET
14/10/2008 12:38:53|SETI@home|Finished download of 25au08af.7275.890.10.8.52
14/10/2008 12:41:15|ralph@home|Finished download of looprelax_tex_cst_oneparam.looprelax_tex_cst.t328_.tex.boinc_files.zip
14/10/2008 12:47:32|rosetta@home|Finished download of foldcst_simple.foldcst_simple.t313_.mtyka.boinc_files.zip

15/10/08 - Aborted "abinitio_nohomfrag_70_A_1zd0A_4466_59245_0" after the task was running at 100% for over twelve hours and stopping anything else from working – "Screen print" available.
I also suspect that after this task first started, sometime on the 14th, no other task was allowed to start.

Note that Rosetta was still taking processing power before the "abort" – "Screen print" available.

15/10/2008 10:10:01|SETI@home|Starting 25au08af.7275.890.10.8.52_1
15/10/2008 10:10:04|SETI@home|Starting task 25au08af.7275.890.10.8.52_1 using setiathome_enhanced version 603
15/10/2008 10:10:07|rosetta@home|Computation for task abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 finished

Everything is now working as expected.
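
For anyone who wants to check their own "Messages" tab for the same pattern, here is a rough sketch in Python. It assumes the messages were copied out as plain text in the `DD/MM/YYYY HH:MM:SS|project|message` form shown above. The function names (`parse_line`, `unfinished_tasks`) are made up for this illustration and are not part of BOINC. It lists tasks that have a start/restart/resume line but no matching "Computation for task ... finished" line, and works out how long one of them had been sitting there.

```python
from datetime import datetime

STAMP_FORMAT = "%d/%m/%Y %H:%M:%S"  # assumption: day-first dates, as in the log above


def parse_line(line):
    """Split one 'timestamp|project|message' line into its three parts."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, STAMP_FORMAT), project, message


def unfinished_tasks(lines):
    """Return {task_name: first_start_time} for tasks that never log a finish."""
    running = {}
    for line in lines:
        if line.count("|") < 2:
            continue  # free-text notes between log lines
        try:
            when, _project, message = parse_line(line)
        except ValueError:
            continue  # first field was not a timestamp
        words = message.split()
        if message.startswith(("Starting task ", "Restarting task ", "Resuming task ")):
            running.setdefault(words[2], when)
        elif message.startswith("Computation for task ") and message.endswith(" finished"):
            running.pop(words[3], None)
    return running


# A few lines taken from Incidents 1 and 3 above.
log = """\
05/10/2008 12:50:35|rosetta@home|Starting task abinitio_nohomfrag_70_A_1ynvA_4466_27265_0 using minirosetta version 134
07/10/2008 09:34:50|SETI@home|Resuming task 22au08ac.21313.9479.16.8.9_1 using setiathome_enhanced version 603
14/10/2008 11:53:21|rosetta@home|Restarting task abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 using minirosetta version 134
15/10/2008 10:10:07|rosetta@home|Computation for task abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 finished
""".splitlines()

still_running = unfinished_tasks(log)
for name, started in still_running.items():
    print(started, name)

# Time between the Incident 1 start line and the scheduler request at the abort.
abort_time = datetime.strptime("07/10/2008 09:34:17", STAMP_FORMAT)
stuck_for = abort_time - still_running["abinitio_nohomfrag_70_A_1ynvA_4466_27265_0"]
print(stuck_for)
```

With these sample lines the Incident 1 task is reported as never finishing, and the gap between its start line and the abort comes out at roughly 44 hours and 44 minutes. A task with no finish line is not necessarily stuck (the SETI task resumed at the end of the sample also shows up), so this only narrows down where to look.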

____________

Pepo

Joined: Sep 28 05
Posts: 115
ID: 1676
Credit: 101,358
RAC: 0
Message 56376 - Posted 15 Oct 2008 12:11:10 UTC - in response to Message ID 56375.

This is about problems with Minirosetta v1.34, as there does not appear to be a "thread" for it!

Sure there is one ;-) --> Minirosetta v1.34 bug thread

Peter

mitrichr

Joined: May 23 07
Posts: 44
ID: 179558
Credit: 1,005,660
RAC: 0
Message 56383 - Posted 15 Oct 2008 16:55:22 UTC

Latest results are not good. With both 1.32 and 1.34 I had to abort WUs, but I still think that at least my problem relates to the screen saver locking everything up. If I use a different screen saver, I seem to have no problems.

>>RSM
____________
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings



Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC