Rosetta@home

Problems with Rosetta version 5.89

Ingemar

Joined: Feb 28 06
Posts: 20
ID: 61985
Credit: 1,680
RAC: 0
Message 49676 - Posted 13 Dec 2007 22:46:41 UTC

Please post any problems with rosetta 5.89 here. Thanks!
____________

David Addis

Joined: Dec 13 07
Posts: 3
ID: 226075
Credit: 8
RAC: 0
Message 49679 - Posted 14 Dec 2007 1:40:31 UTC

I only joined Rosetta today. I have been running seti since '99. When BOINC started on the Rosetta program I got the following error message "Windows Runtime C++ error, can't run program ...rosetta_beta_5.89_windows_intelx86.exe". I'm running Windows XP with BOINC 5.10.28. No problems running seti.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 49686 - Posted 14 Dec 2007 14:30:16 UTC

Welcome David. Your computers are hidden. Any chance you are running an ancient version of BOINC?
____________
Rosetta Moderator: Mod.Sense

j2satx

Joined: Sep 17 05
Posts: 97
ID: 253
Credit: 3,670,592
RAC: 0
Message 49690 - Posted 14 Dec 2007 18:12:01 UTC - in response to Message ID 49676.

Please post any problems with rosetta 5.89 here. Thanks!


Why is Rosetta running 5.89 if it is still under test at Ralph?

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 49692 - Posted 14 Dec 2007 19:58:42 UTC

Ralph is used to test new releases and new work units. The changes between 5.88 and 5.89 were minor and have both been tested on Ralph prior to the release of 5.89 on Rosetta. When new work units are created, they will be run on Ralph with the appropriate Rosetta version. So you may see work on Ralph 3 weeks from now using v5.89. It doesn't mean that 5.89 has not completed its testing there.
____________
Rosetta Moderator: Mod.Sense

Angus

Joined: Sep 17 05
Posts: 412
ID: 83
Credit: 321,053
RAC: 0
Message 49710 - Posted 15 Dec 2007 23:11:21 UTC - in response to Message ID 49692.
Last modified: 15 Dec 2007 23:12:11 UTC

Ralph is used to test new releases and new work units. The changes between 5.88 and 5.89 were minor and have both been tested on Ralph prior to the release of 5.89 on Rosetta. When new work units are created, they will be run on Ralph with the appropriate Rosetta version. So you may see work on Ralph 3 weeks from now using v5.89. It doesn't mean that 5.89 has not completed its testing there.


So this was one of those one day tests again, I see. Released on ralph on the 12th, and released here on the 13th.

Ludicrous to call that "testing".
____________
Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)

Ed Parker

Joined: May 8 07
Posts: 11
ID: 174742
Credit: 132,966
RAC: 0
Message 49716 - Posted 16 Dec 2007 2:03:10 UTC - in response to Message ID 49676.

Please post any problems with rosetta 5.89 here. Thanks!


So how does this version address the fact that the project doesn't function on machines with 256mb of memory, which is what your "System Requirements" page suggests?

Thomas Leibold

Joined: Jul 30 06
Posts: 55
ID: 102494
Credit: 19,627,164
RAC: 0
Message 49717 - Posted 16 Dec 2007 2:18:17 UTC - in response to Message ID 49710.


So this was one of those one day tests again, I see. Released on ralph on the 12th, and released here on the 13th.

Ludicrous to call that "testing".


I have one system participating in Ralph and it received the first 5.89 workunit late on the 14th. This machine had gotten 'no work from project' from Ralph since the 12th.

That very same server also participates in Rosetta and received the first 5.89 workunit already on the 13th!

I think I agree with the 'ludicrous' sentiment. I find it unacceptable to experience problems in Rosetta that proper testing in Ralph could have avoided.
____________
Team Helix

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 49718 - Posted 16 Dec 2007 3:38:31 UTC - in response to Message ID 49716.

Please post any problems with rosetta 5.89 here. Thanks!


So how does this version address the fact that the project doesn't function on machines with 256mb of memory, which is what your "System Requirements" page suggests?


Your statement is incorrect. The project runs on machines with 256MB of memory and the base system requirements have not changed. There are still tasks available for such systems.

Having said that, some people are seeing large virtual memory requirements (which are not specifically defined on the requirements page). BOINC should have detected and handled this, but apparently did not. That is why we have a thread such as this: to explain observations that cannot be seen from the returned results.

The difference in memory requirements is due to the task being performed... not the specific version of Rosetta.
____________
Rosetta Moderator: Mod.Sense

hedera

Joined: Jul 15 06
Posts: 66
ID: 100139
Credit: 2,486,188
RAC: 1,390
Message 49719 - Posted 16 Dec 2007 5:16:16 UTC

Ver. 5.89 seems to be better than the disastrous 5.85 but I'm still seeing very large memory requirements: right now the 2 Rosetta tasks I can run are taking up 220MB of memory between them, and the virtual machine sizes are respectively 426MB and 398MB. This makes me just a tad nervous although Win Task Manager shows memory use more moderate than it has been: peak memory is about 130MB.

What I'm concerned about is that my machine over the last few weeks has taken to suddenly slowing to a standstill, and when I check the task manager, I see huge memory figures for the Rosetta tasks. I'm considering reducing the memory allocation from 50% during machine use to 40% - would this impact my ability to run the tasks on a 1 GB machine??


____________
--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.
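
As a rough worked example of the question above: this is only a sketch, and it assumes the "memory allocation" preference is a plain percentage of physical RAM. How BOINC actually measures a task against that limit is not stated in this thread.

```python
def memory_budget_mb(total_ram_mb, percent_allowed):
    """Memory BOINC may use, ASSUMING the preference is a plain percentage of RAM."""
    return total_ram_mb * percent_allowed / 100.0

total_ram_mb = 1024      # the 1 GB machine from the post
tasks_in_use_mb = 220    # combined memory of the two Rosetta tasks, as reported

for percent in (50, 40):
    budget = memory_budget_mb(total_ram_mb, percent)
    headroom = budget - tasks_in_use_mb
    print(f"{percent}% -> budget {budget:.1f} MB, headroom {headroom:.1f} MB")
```

Under that assumption, 40% of 1 GB is roughly 410 MB, which is still above the 220 MB the two tasks were observed using, but below their combined virtual sizes (426 MB + 398 MB).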

David Addis

Joined: Dec 13 07
Posts: 3
ID: 226075
Credit: 8
RAC: 0
Message 49720 - Posted 16 Dec 2007 9:56:35 UTC - in response to Message ID 49686.

Welcome David. Your computers are hidden. Any chance you are running an ancient version of BOINC?

No I don't think so. 5.10.28 is pretty recent.

Ed Parker

Joined: May 8 07
Posts: 11
ID: 174742
Credit: 132,966
RAC: 0
Message 49723 - Posted 16 Dec 2007 16:06:07 UTC - in response to Message ID 49718.

Please post any problems with rosetta 5.89 here. Thanks!


So how does this version address the fact that the project doesn't function on machines with 256mb of memory, which is what your "System Requirements" page suggests?


Your statement is incorrect. The project runs on machines with 256MB of memory and the base system requirements have not changed. There are still tasks available for such systems.

Having said that, some people are seeing large virtual memory requirements (which are not specifically defined on the requirements page). BOINC should have detected and handled this, but apparently did not. That is why we have a thread such as this: to explain observations that cannot be seen from the returned results.

The difference in memory requirements is due to the task being performed... not the specific version of Rosetta.


You say the requirements haven't changed, but after the crash of a few weeks back, both of my 512mb machines stopped getting work because it said I didn't have enough memory. Once it got a work unit, it stopped after a while with a "waiting for memory" status. I look at the team stats for Primary Schools and see that RAC for everyone has dropped to the point that there is almost no activity. My guess is there IS no activity because of the memory requirements.

Angus

Joined: Sep 17 05
Posts: 412
ID: 83
Credit: 321,053
RAC: 0
Message 49725 - Posted 16 Dec 2007 16:20:28 UTC - in response to Message ID 49717.


So this was one of those one day tests again, I see. Released on ralph on the 12th, and released here on the 13th.

Ludicrous to call that "testing".


I have one system participating in Ralph and it received the first 5.89 workunit late on the 14th. This machine had gotten 'no work from project' from Ralph since the 12th.

That very same server also participates in Rosetta and received the first 5.89 workunit already on the 13th!

I think I agree with the 'ludicrous' sentiment. I find it unacceptable to experience problems in Rosetta that proper testing in Ralph could have avoided.


It appears that "testing" here is defined as: "The application compiles, and someone was able to struggle through at least one WU. Let's throw it over on Rosetta and see what breaks."

____________
Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)

Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 49728 - Posted 16 Dec 2007 17:16:03 UTC - in response to Message ID 49725.
Last modified: 16 Dec 2007 17:16:52 UTC


So this was one of those one day tests again, I see. Released on ralph on the 12th, and released here on the 13th.

Ludicrous to call that "testing".


I have one system participating in Ralph and it received the first 5.89 workunit late on the 14th. This machine had gotten 'no work from project' from Ralph since the 12th.

That very same server also participates in Rosetta and received the first 5.89 workunit already on the 13th!

I think I agree with the 'ludicrous' sentiment. I find it unacceptable to experience problems in Rosetta that proper testing in Ralph could have avoided.


It appears that "testing" here is defined as: "The application compiles, and someone was able to struggle through at least one WU. Let's throw it over on Rosetta and see what breaks."


thought that was ralph's job not rosie's, or else someone slipped him a mickey
i'm still days away from any 5.89 stuff and now i wonder if it will work or not on my minimal memory system.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 49731 - Posted 16 Dec 2007 19:33:33 UTC - in response to Message ID 49723.

You say the requirements haven't changed, but after the crash of a few weeks back, both of my 512mb machines stopped getting work because it said I didn't have enough memory. Once it got a work unit, it stopped after a while with a "waiting for memory" status. I look at the team stats for Primary Schools and see that RAC for everyone has dropped to the point that there is almost no activity. My guess is there IS no activity because of the memory requirements.


After an outage, it is normal for project servers to be overloaded trying to deliver work to everyone at once. BOINC's project code doesn't handle the different types of WUs very well and it becomes possible for the "short list" of available work to become depleted of "regular memory" work units. Possible, especially when under the stress of getting back up and running. This is why there was a period of time when you did not receive work. It was a very short-lived problem.

If your work was waiting for memory then BOINC is working to enforce your memory preferences as it runs. Others are saying it ignores these preferences. It has always been the case that some types of work use more memory than others, and some take longer to run than others. It has little to do with one Rosetta version over another.

I know the large WUs and large virtual memory requirements are causing some problems for some machines. But, rest assured, overall the results are looking good. I typically review the credit granted in the last 24 hours on the homepage. It has been near 60 TFLOPS for some time now. Here is a chart. You can see the blip when the server recovered from the outage, which throws off the scale, but the memory problems seem to be fairly limited in scope. Not that they are unimportant, but they are not as widespread as you fear.
____________
Rosetta Moderator: Mod.Sense

David Addis

Joined: Dec 13 07
Posts: 3
ID: 226075
Credit: 8
RAC: 0
Message 49735 - Posted 16 Dec 2007 21:48:46 UTC - in response to Message ID 49725.



It appears that "testing" here is defined as: "The application compiles, and someone was able to struggle through at least one WU. Let's throw it over on Rosetta and see what breaks."



Gee is this normal here in Rosetta? Things are much more polite in Seti.

AM

Joined: Jul 15 06
Posts: 7
ID: 100201
Credit: 311,022
RAC: 577
Message 49736 - Posted 16 Dec 2007 22:24:49 UTC

Dealing with a memory hog on the WU I am running.
____________

Thomas Leibold

Joined: Jul 30 06
Posts: 55
ID: 102494
Credit: 19,627,164
RAC: 0
Message 49737 - Posted 16 Dec 2007 22:35:23 UTC - in response to Message ID 49735.


Gee is this normal here in Rosetta? Things are much more polite in Seti.


I'm not sure what it is that you are considering impolite:
- Developers pushing insufficiently tested applications into the main project Rosetta without giving it enough exposure in the test project Ralph ?
- Project participants complaining about the insufficient testing ?

Mind you, I haven't seen anything yet that indicates that there are problems with 5.89. It is just that those quick releases without proper testing have caused problems before.
____________
Team Helix

Ed Parker

Joined: May 8 07
Posts: 11
ID: 174742
Credit: 132,966
RAC: 0
Message 49738 - Posted 17 Dec 2007 1:29:41 UTC - in response to Message ID 49731.

You say the requirements haven't changed, but after the crash of a few weeks back, both of my 512mb machines stopped getting work because it said I didn't have enough memory. Once it got a work unit, it stopped after a while with a "waiting for memory" status. I look at the team stats for Primary Schools and see that RAC for everyone has dropped to the point that there is almost no activity. My guess is there IS no activity because of the memory requirements.


After an outage, it is normal for project servers to be overloaded trying to deliver work to everyone at once. BOINC's project code doesn't handle the different types of WUs very well and it becomes possible for the "short list" of available work to become depleted of "regular memory" work units. Possible, especially when under the stress of getting back up and running. This is why there was a period of time when you did not receive work. It was a very short-lived problem.

If your work was waiting for memory then BOINC is working to enforce your memory preferences as it runs. Others are saying it ignores these preferences. It has always been the case that some types of work use more memory than others, and some take longer to run than others. It has little to do with one Rosetta version over another.

I know the large WUs and large virtual memory requirements are causing some problems for some machines. But, rest assured, overall the results are looking good. I typically review the credit granted in the last 24 hours on the homepage. It has been near 60 TFLOPS for some time now. Here is a chart. You can see the blip when the server recovered from the outage, which throws off the scale, but the memory problems seem to be fairly limited in scope. Not that they are unimportant, but they are not as widespread as you fear.


I've been doing @home projects since 1999, so I'm used to crashes and outages, and I get upset when something I do to my computers causes my crunching @home projects to stop. The two computers involved have had no changes since the last BOINC update to 5.10.28, my two other computers running that version have no problems with their projects (SETI, Einstein), and another computer doing ABC@home is also fine, so I will assume that BOINC is not the problem. All four of these units were at one time doing four projects and having no troubles. When the Einstein work units started taking close to a hundred hours per unit on the two slowest 512mb machines, one 1.2g AMD, one 2.4g Intel, I started dedicating each computer to a project and telling each which project not to get new work from.
The two slower machines finally finished their Rosetta units and each had several "Ready to report" units just sitting there.... for days. Finally I guess somebody realized something was broken, and the units were reported. Then, no new work because the "project is down". Then no work for several days because of lack of memory. Then finally a work unit that just stops with "waiting for memory". Now I really don't care what version of the program you're up to and unless it is automatically sent to me, the only thing I keep updated is BOINC, and I don't twiddle around with BOINC settings. Mine have been the same since I started.
If you guys want everyone to upgrade to supercomputers just so they can handle your project in their spare time, forget it.
Seti, Einstein and ABC are all running fine here, and Rosetta is now a project I used to crunch.

Dr Who Fan

Joined: May 28 06
Posts: 35
ID: 85050
Credit: 62,140
RAC: 99
Message 49743 - Posted 17 Dec 2007 9:20:51 UTC
Last modified: 17 Dec 2007 9:21:24 UTC

Outcome Validate error

http://boinc.bakerlab.org/rosetta/result.php?resultid=127005250

Client state Done
Exit status 0 (0x0)

CPU time 6608.081955
stderr out

<core_client_version>5.10.28</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 2330183
==
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 8.83546666063523
Granted credit 0
application version 5.89

AM

Joined: Jul 15 06
Posts: 7
ID: 100201
Credit: 311,022
RAC: 577
Message 49750 - Posted 17 Dec 2007 14:00:17 UTC

So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 49753 - Posted 17 Dec 2007 16:43:44 UTC - in response to Message ID 49750.

So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired.


I don't have any advance knowledge of future Rosetta releases. But my experience with the project tells me they are working on such modifications to the program, and so yes, I would assume both that other WUs will require less memory, and that future releases will improve the memory footprint.

It may be helpful if folks would post the WU name and their observations of memory usage. That will provide something concrete to compare against when a new release comes out.
____________
Rosetta Moderator: Mod.Sense

AM

Joined: Jul 15 06
Posts: 7
ID: 100201
Credit: 311,022
RAC: 577
Message 49754 - Posted 17 Dec 2007 17:18:02 UTC - in response to Message ID 49753.

So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired.


I don't have any advance knowledge of future Rosetta releases. But my experience with the project tells me they are working on such modifications to the program, and so yes, I would assume both that other WUs will require less memory, and that future releases will improve the memory footprint.

It may be helpful if folks would post the WU name and their observations of memory usage. That will provide something concrete to compare against when a new release comes out.


Sorry, here it is below...

1cc8AWHS_ETABLE_SVM_TESTS-1cc8A-frags83__2452_282
____________

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 49763 - Posted 17 Dec 2007 19:49:11 UTC

Can anyone explain......

Task ID 126999841
Name 1tul__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-9-S3-8--1tul_-vf__2434_88_0
Workunit 115452516
Created 15 Dec 2007 20:31:18 UTC
Sent 15 Dec 2007 20:32:33 UTC
Received 17 Dec 2007 16:47:46 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 510574
Report deadline 25 Dec 2007 20:32:33 UTC
CPU time 14202.75
stderr out <core_client_version>5.10.28</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 2330093
sin_cos_range ERROR: -1.0408722 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: -1.0408722 is outside of [-1,+1] sin and cos value legal range
# cpu_run_time_pref: 14400
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 14202.7 cpu seconds
This process generated 42 decoys from 42 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

I am curious as to what the sin_cos_range ERROR means, seeing that Rosie seems happy to accept this task and gave credit. This is the second task I have had with the same message; the other was under 5.82. Can anyone help?

</stderr_txt>
]]>


Validate state Valid
Claimed credit 58.8835581491606
Granted credit 53.3430028257213
application version 5.89
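
For readers wondering what kind of check produces a message like the one above, here is a minimal sketch. It is not Rosetta's actual code: the function name `checked_unit_value` and the tolerance are hypothetical, and the assumption is simply that a value meant to be a sine or cosine drifted outside [-1, +1] and was reported, then clamped, before being used.

```python
import math

def checked_unit_value(x, tolerance=1e-6):
    """Warn if x is outside [-1, 1] by more than tolerance, then clamp it.

    Hypothetical helper for illustration; not taken from Rosetta.
    """
    if x < -1.0 - tolerance or x > 1.0 + tolerance:
        print(f"sin_cos_range ERROR: {x} is outside of [-1,+1] "
              "sin and cos value legal range")
    return max(-1.0, min(1.0, x))

# math.acos(-1.0408722) would raise ValueError; the clamped value does not.
angle = math.acos(checked_unit_value(-1.0408722))
```

If the application handles the out-of-range value in some such way and carries on, that would be consistent with the task finishing, validating, and being granted credit despite the message.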

Thomas Leibold

Joined: Jul 30 06
Posts: 55
ID: 102494
Credit: 19,627,164
RAC: 0
Message 49767 - Posted 17 Dec 2007 21:52:16 UTC

Task ID 126856692
Name 1lis_WHS_ETABLE_SVM_TESTS-1lis_-frags83__2453_41_0
Workunit 115320625
Created 15 Dec 2007 4:37:53 UTC
Sent 15 Dec 2007 4:38:18 UTC
Received 15 Dec 2007 19:55:18 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 193 (0xc1)
Computer ID 687330
Report deadline 25 Dec 2007 4:38:18 UTC
CPU time 24587.680633
stderr out
<core_client_version>5.10.21</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 28800
# random seed: 1054640
SIGILL: illegal instruction
Stack trace (18 frames):
[0x8da95f7]
[0x8da43ec]
[0xffffe500]
[0x8c18d60]
[0x8775f6f]
[0x877651e]
[0x877a4cf]
[0x878b883]
[0x879128d]
[0x8cfb408]
[0x8b45e94]
[0x8b48f13]
[0x80d8c55]
[0x85f4d25]
[0x8732b67]
[0x8732c12]
[0x8e0d944]
[0x8048111]

Exiting...

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 94.8036591472954
Granted credit 0
application version 5.89

This one died after more than 6.5 hours (I run with 8 hours per workunit) when it was almost finished with an illegal instruction trap. Computer has two Quad-Core Opteron 2346HE processors and 8GB of memory running OpenSuSE 10.3 in 64-bit mode. No other error have been reported from this fairly new system as far as I can tell.
____________
Team Helix
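
The three numbers in "process exited with code 193 (0xc1, -63)" are the same byte written three ways. The sketch below only shows that arithmetic; the function name is hypothetical, and whether BOINC formats the message this way internally is an assumption.

```python
def describe_exit_code(code):
    """Show an exit status as unsigned decimal, hex, and signed 8-bit values."""
    unsigned = code & 0xFF                                   # keep the low byte
    signed = unsigned - 256 if unsigned >= 128 else unsigned  # two's complement
    return f"{unsigned} (0x{unsigned:x}, {signed})"

print(describe_exit_code(193))
```

So 193, 0xc1, and -63 are one value, not three separate error codes.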

marc zubrin

Joined: Nov 13 07
Posts: 1
ID: 220404
Credit: 6,392
RAC: 0
Message 49769 - Posted 18 Dec 2007 6:36:40 UTC - in response to Message ID 49676.

Please post any problems with rosetta 5.89 here. Thanks!


I had 4 "client errors" in the last few days among a dozen good results with Rosetta beta (version 5.89 I think).
None of my other clients has had any error during this lapse of time (crunching about 400 credits per day).
Are you sure there ain't anything wrong with either the software or the workunits you send us ?...

marc zubrin

Astro

Joined: Oct 2 05
Posts: 987
ID: 2322
Credit: 500,253
RAC: 0
Message 49774 - Posted 18 Dec 2007 13:00:48 UTC
Last modified: 18 Dec 2007 13:08:09 UTC

I have been alternating between Windows and Linux on 3 of my 6 machines for the last 4 days. Last night the three that were running Linux decided to stop working properly. I woke to find that the monitor program "gkrellm" showed NO CPU usage on either of my single cores or on one of both cpus on my dual core. Up until now these machines have run Rosetta flawlessly, and that's what makes this strange.

All are using 64b Boinc 5.10.21 (official). All have been working well since Nov 26th 2007. I have my run time pref to 2 hours(was 1 hour until 5 days ago), so I have plenty of samples of good work already processed.

The machines in question are:

1) hostid=692479 AMD64 2800 w/768M ram
2) hostid=692481 AMD64 3700 w/1G ram
3) hostid=692483 AMD64 X2 4800 w/2G ram

Why all the machines running linux would decide to break, while the other three running Windows kept working is a mystery. I have a 1 day cache and switch OSes every 12 hours (around every 12) so after these last days there should be "similar" work for both WIN and LIN. Two of the three froze with the same job type, while one didn't. The one that had the unique job also only had the one error. While the two with the identical job types had numerous errors overnite (which is unusual as well). You can see what the results page looks like for my AMD64 3700 below:



Here's what the results page for the AMD64 2800 looked like:



All were/are running Rosetta Beta 5.89 at this time. Many SIGSEGV faults are showing up for those compute errors. Such as:

process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 7200
# random seed: 2133824
No heartbeat from core client for 31 sec - exiting
SIGSEGV: segmentation violation

AM

Joined: Jul 15 06
Posts: 7
ID: 100201
Credit: 311,022
RAC: 577
Message 49776 - Posted 18 Dec 2007 13:25:03 UTC

Memory hog WU:

1b72__BOINC_TETHER_DURING_STAGE1_POSE_ABRELAX-1b72_-_2458_1145

Astro

Joined: Oct 2 05
Posts: 987
ID: 2322
Credit: 500,253
RAC: 0
Message 49777 - Posted 18 Dec 2007 14:03:29 UTC
Last modified: 18 Dec 2007 14:07:41 UTC

Looking deeper into the issues addressed in my last post, I don't see any WU-specific issues (i.e., it doesn't appear to be ONE particular WU). These are the compute errors from overnight.

AMD64 2800 wuid=115793636 1npsAWHS_ETABLE_SVM_TESTS-1npsA-frags83__2455_980
AMD64 2800 wuid=115810033 2chf__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-7-S3-6--2chf_-vf__2445_13
AMD64 3700 wuid=115907962 2chf__BOINC_ABINITIO_VF-S25-9-S3-3--2chf_-vf__2450_102
AMD64 3700 wuid=115457762 1ubi__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-6-S3-11--1ubi_-vf__2435_66 This one errored for both parties.
AMD64 3700 wuid=115881653 2vik__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-10-S3-11--2vik_-vf__2447_57
AMD64 3700 wuid=115795643 1shfAWHS_ETABLE_SVM_TESTS-1shfA-frags83__2455_984
AMD64 3700 wuid=115778690 1wit__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-16-S3-5--1wit_-vf__2442_38
AMD64 3700 wuid=115762923 1aiu_WHS_ETABLE_SVM_TESTS-1aiu_-frags83__2453_919

Dr Who Fan

Joined: May 28 06
Posts: 35
ID: 85050
Credit: 62,140
RAC: 99
Message 49782 - Posted 18 Dec 2007 18:56:57 UTC

This one crashed and burned almost immediately with a 161 exit error.
Message highlighted in red below says to keep in memory... All tasks are kept in memory when preempted.

w099_1_homologymodel_strictosidine_synthase_2352_99743_1

CPU time 1.402016
stderr out

<core_client_version>5.10.28</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3900258
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
# random seed: 3900258
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
Too many restarts with no progress. Keep application in memory while preempted.
======================================================
DONE :: 1 starting structures 1.35194 cpu seconds
This process generated 0 decoys from 0 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>w099_1_homologymodel_strictosidine_synthase_2352_99743_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid
Claimed credit 0.00186677223308905
Granted credit 0
application version 5.89
____________
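
A log like the one above is easier to compare across tasks if the repeated lines are counted. The following is a small sketch, not a BOINC tool: `summarize_stderr` is a hypothetical helper, and treating each "# cpu_run_time_pref:" line as one application start is an assumption based only on how the log reads.

```python
def summarize_stderr(stderr_text):
    """Count heartbeat timeouts and (assumed) app starts in a stderr excerpt."""
    lines = [line.strip() for line in stderr_text.splitlines()]
    return {
        "heartbeat_timeouts": sum(
            "No heartbeat from core client" in line for line in lines),
        # Assumption: the app prints this line once each time it starts.
        "starts": sum(line.startswith("# cpu_run_time_pref:") for line in lines),
        "gave_up": any(
            "Too many restarts with no progress" in line for line in lines),
    }

# Shortened excerpt of the log above, for illustration.
sample = """\
# cpu_run_time_pref: 7200
# random seed: 3900258
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
Too many restarts with no progress. Keep application in memory while preempted.
"""
print(summarize_stderr(sample))
```

Run against the full log in this post, the same counting gives six heartbeat timeouts before the "Too many restarts" message.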

Jon C Melusky

Joined: Nov 29 05
Posts: 12
ID: 23144
Credit: 154,502
RAC: 41
Message 49787 - Posted 18 Dec 2007 23:51:24 UTC

I've been getting errors with rosetta for years on my 384 ram box running xp home. Just thought it was normal. 1 out of every 5 Wu's fail. Spinhenge, MalariaControl, SETI, BOINCSIMAP, TANPAKU, and Leiden Classical all run fine with zero errors over the years. I wish Rosetta could actually meet or email those people in those other groups and ask them for help with virtual memory settings. Pretty much every week, I get a few virtual memory warnings down by the clock. Pretty much every week I get a few error windows (got one today) about Rosetta runtime. No big deal, you just click ok and they go away. They all say "client error", but I have no idea if I am the client or if Rosetta is the client or if Boinc is the client or something else ?

Task ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
127686448 | 116086135 | 18 Dec 2007 23:38:15 UTC | 28 Dec 2007 23:38:15 UTC | In Progress | Unknown | New | --- | --- | ---
127660442 | 116061746 | 18 Dec 2007 21:06:57 UTC | 18 Dec 2007 23:38:14 UTC | Over | Client error | Compute error | 1.28 | 0.00 | ---
126724871 | 115201235 | 14 Dec 2007 13:44:09 UTC | 18 Dec 2007 21:06:57 UTC | Over | Client error | Compute error | 1.09 | 0.00 | ---
126591603 | 115079314 | 13 Dec 2007 22:50:25 UTC | 14 Dec 2007 18:50:18 UTC | Over | Success | Done | 6,776.48 | 12.64 | 8.76
126190225 | 114717930 | 12 Dec 2007 1:53:00 UTC | 13 Dec 2007 22:50:25 UTC | Over | Client error | Done | 1.30 | 0.00 | ---
126093431 | 114461562 | 11 Dec 2007 16:00:14 UTC | 12 Dec 2007 1:53:00 UTC | Over | Client error | Compute error | 1.08 | 0.00 | ---

Jonathan
____________

Laurent BISSON

Joined: Nov 15 07
Posts: 1
ID: 220986
Credit: 238,951
RAC: 0
Message 49808 - Posted 19 Dec 2007 20:04:21 UTC

Hi, I've been in the Rosetta project since November this year, but BOINC Manager cannot fetch new work from the internet. What can I do?

(I'm on a Macintosh Intel Core 2 Duo, Mac OS X 10.4.11)

Thanks for the help

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 49811 - Posted 19 Dec 2007 20:15:56 UTC

Laurent, here is a thread with a number of ideas on things to check if you are not getting new work. If you have further questions about getting work, please post them in that thread.
____________
Rosetta Moderator: Mod.Sense

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 49812 - Posted 19 Dec 2007 20:24:17 UTC

1hz6A_BOINC_ABINITIO_VF-S25-9-S3-3--1hz6A-vf__2450_4023_0 hung at 0.470% on Intel iMac2 ( Mac OS X 10.4.11 ): aborting.



____________

Rhiju
Forum moderator

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 49834 - Posted 20 Dec 2007 20:11:15 UTC - in response to Message ID 49753.

Hey everybody -- we've been listening, and we've been especially concerned regarding the "memory hogs". We think we've fixed this problem and are updating Rosetta@home!

So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired.


I don't have any advance knowledge of future Rosetta releases. But my experience with the project tells me they are working on such modifications to the program, and so yes, I would assume both that other WUs will require less memory, and that future releases will improve the memory footprint.

It may be helpful if folks would post the WU name and their observations of memory usage. That will provide something concrete to compare against when a new release comes out.


____________

Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC