Problems with version 5.96

hedera
Joined: 15 Jul 06
Posts: 76
Credit: 5,153,404
RAC: 660
Message 52577 - Posted: 17 Apr 2008, 23:44:17 UTC

I now have my 4GB of memory installed (came this morning! whee!) so the memory problem is no longer bothering me, but:

work units https://boinc.bakerlab.org/rosetta/workunit.php?wuid=142644377
and https://boinc.bakerlab.org/rosetta/workunit.php?wuid=142636856

are still running and the peak memory usage for both of them was around 420K. Whatever these are, they are LARGE.


--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 52581 - Posted: 18 Apr 2008, 6:40:25 UTC
Last modified: 18 Apr 2008, 6:41:59 UTC

I have a FRA_t038 task running now. I had a look and it's using about 480 MB, which is not a problem. As a side note, it takes about 4 hours to do 1 model on these with my old P4s.

pete.

Venturini Dario[VENETO]
Joined: 25 May 07
Posts: 22
Credit: 245,028
RAC: 0
Message 52658 - Posted: 22 Apr 2008, 1:11:16 UTC

I forgot to copy the number, but I had a few WUs worth 800 MB of RAM + virtual space... is that normal?

DJStarfox
Joined: 19 Jul 07
Posts: 145
Credit: 1,239,073
RAC: 373
Message 52663 - Posted: 22 Apr 2008, 19:48:21 UTC - in response to Message 52658.  

I forgot to copy the number, but I had a few WUs worth 800 MB of RAM + virtual space... is that normal?


Don't count virtual space. But yes, that is OK as long as nothing else is wrong.

[KWSN]John Galt 007
Joined: 4 Aug 06
Posts: 6
Credit: 1,017,647
RAC: 0
Message 52710 - Posted: 25 Apr 2008, 16:28:36 UTC

Computational error

I think the first one this Mac has had...strange....

Paul
Joined: 29 Oct 05
Posts: 193
Credit: 65,745,312
RAC: 1,021
Message 52733 - Posted: 26 Apr 2008, 11:48:56 UTC - in response to Message 52710.  

I tried to get the R@H work units to run 100% in RAM. I set the use at most % of page file (swap space) to 0.00%. It looks like the WUs still use some swap space. Does anyone know how to keep 100% of the WU in RAM? I have plenty of RAM to keep one or 2 WUs in RAM all the time.

I have a thread on this topic but it is not getting very much attention.

thx
Thx!

Paul

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 52748 - Posted: 27 Apr 2008, 2:14:41 UTC - in response to Message 52733.  

I tried to get the R@H work units to run 100% in RAM. I set the use at most % of page file (swap space) to 0.00%. It looks like the WUs still use some swap space. Does anyone know how to keep 100% of the WU in RAM? I have plenty of RAM to keep one or 2 WUs in RAM all the time.

I have a thread on this topic but it is not getting very much attention.

thx

If you are running Windows, it may decide for its own reasons that you don't have enough RAM for that ...

What are your preference settings for BOINC memory usage? 90%, 90% ???

Thomas Leibold
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 52750 - Posted: 27 Apr 2008, 6:23:47 UTC

Workunit 144383830 and
Workunit 144815278 and
Workunit 144380616 failed with
<core_client_version>5.10.21</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 28800
# random seed: 1633461
ERROR:: Exit from: pack.cc line: 5278

</stderr_txt>
]]>
Team Helix

Thomas Leibold
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 52751 - Posted: 27 Apr 2008, 6:26:50 UTC

Workunit 144639617 failed with:
<core_client_version>5.10.21</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 28800
# random seed: 1637471
ERROR:: Exit from: minimize.cc line: 2088

</stderr_txt>
]]>

Team Helix

Thomas Leibold
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 52752 - Posted: 27 Apr 2008, 6:28:58 UTC

Workunit 144799448 crashed:

<core_client_version>5.10.21</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 28800
# random seed: 3471563
SIGSEGV: segmentation violation
Stack trace (25 frames):
[0x8e1b49b]
[0x8e15d8c]
[0xffffe500]
[0x85493e7]
[0x8c836d4]
[0x804c8c0]
[0x86579a0]
[0x87ac2be]
[0x87ac646]
[0x87ae444]
[0x87bcf50]
[0x87c5a86]
[0x865c88d]
[0x87c6c19]
[0x804e502]
[0x8d6dfef]
[0x89efd7e]
[0x866a2c7]
[0x88ff31d]
[0x89d141d]
[0x8628b0e]
[0x8768a2a]
[0x8768b4a]
[0x8e80034]
[0x8048111]

Exiting...

</stderr_txt>
]]>
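
For triage, the three kinds of failure quoted above (an exit from pack.cc, an exit from minimize.cc, and a SIGSEGV) can be pulled out of the stderr text mechanically. The following is a minimal Python sketch, not part of BOINC or Rosetta: the function name `summarize_stderr` and its return format are made up for illustration, and the regular expressions assume only the line shapes visible in these reports. It also checks an observation about the quoted messages: in each one, the third number in parentheses equals the exit code minus 256 (1 gives -255, 193 gives -63).

```python
import re

# Patterns based only on the line shapes quoted in this thread.
EXIT_RE = re.compile(r"process exited with code (\d+) \((0x[0-9a-fA-F]+), (-?\d+)\)")
ERROR_RE = re.compile(r"ERROR:: Exit from: (\S+) line: (\d+)")


def summarize_stderr(text):
    """Return exit code, error location and signal found in a stderr dump."""
    summary = {"exit_code": None, "renderings_agree": None,
               "location": None, "signal": None}
    m = EXIT_RE.search(text)
    if m:
        code = int(m.group(1))
        summary["exit_code"] = code
        # Observation from the quoted reports: hex value == code,
        # and the third number == code - 256.
        summary["renderings_agree"] = (
            int(m.group(2), 16) == code and int(m.group(3)) == code - 256
        )
    m = ERROR_RE.search(text)
    if m:
        summary["location"] = (m.group(1), int(m.group(2)))
    if "SIGSEGV" in text:
        summary["signal"] = "SIGSEGV"
    return summary


report = """process exited with code 1 (0x1, -255)
# cpu_run_time_pref: 28800
ERROR:: Exit from: pack.cc line: 5278
"""
print(summarize_stderr(report))
```

Grouping reports by the `location` tuple would show at a glance how many failures come from the same source line.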

Team Helix

Paul
Joined: 29 Oct 05
Posts: 193
Credit: 65,745,312
RAC: 1,021
Message 52760 - Posted: 27 Apr 2008, 20:25:55 UTC - in response to Message 52748.  

I have my preferences set at 80% memory when the system is in use and 100% when the system is idle. These systems are idle 99% of the time because they are dedicated crunchers. I now have 5 systems dedicated to R@H 24x7. I may need to add some A/C to my house.

I disabled the Windows swap file, but every system experienced an error without it, so I have enabled the swap files again.

Thanks for any help you can provide.

Paul
Thx!

Paul

CharlyD
Joined: 1 Dec 06
Posts: 5
Credit: 135,227
RAC: 0
Message 52776 - Posted: 28 Apr 2008, 17:44:33 UTC

Why does Rosetta stay in memory, although I set it to "off"???

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 52777 - Posted: 28 Apr 2008, 19:57:36 UTC - in response to Message 52776.  

Why does Rosetta stay in memory, although I set it to "off"???


Have you performed an update from the client since you last changed your preferences?

Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Ingleside
Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 52779 - Posted: 28 Apr 2008, 20:50:58 UTC - in response to Message 52776.  

Why does Rosetta stay in memory, although I set it to "off"???

A task that hasn't checkpointed since it started will be kept in memory.

This is to minimize the risk of a task getting stuck in a loop where it crunches for a couple of minutes, exits due to "user active" or something similar, and then repeats. The end result is a task that can crunch for many hours a day but never makes it to a checkpoint, so in practice it has just wasted those hours of CPU time. Of course, this won't help if the computer or BOINC is shut down before the task reaches a checkpoint...

I'm not sure how often Rosetta@home checkpoints, but AFAIK for some WUs it can be over 30 minutes between checkpoints.
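
To see why this matters, here is a toy Python model of the loop described above. It is only a sketch under stated assumptions, not BOINC's scheduler: the function `useful_minutes` and all the numbers (a 30-minute checkpoint interval, 5-minute bursts of crunching) are invented for illustration. If an interrupted task is dropped from memory, it restarts from its last checkpoint; if it is kept in memory, it resumes where it stopped.

```python
def useful_minutes(burst, bursts, checkpoint_every, keep_in_memory):
    """Minutes of progress saved in checkpoints after `bursts` runs of `burst` minutes.

    Toy model: the task is interrupted after every burst. If it is not
    kept in memory, everything since the last checkpoint is lost.
    """
    checkpointed = 0   # progress safely written to disk
    in_progress = 0    # progress made since the last checkpoint
    for _ in range(bursts):
        in_progress += burst
        # Write a checkpoint each time a full interval has accumulated.
        while in_progress >= checkpoint_every:
            checkpointed += checkpoint_every
            in_progress -= checkpoint_every
        if not keep_in_memory:
            in_progress = 0  # removed from memory: uncheckpointed work is gone
    return checkpointed


# 24 bursts of 5 minutes = 2 hours of CPU time, checkpoint every 30 minutes.
print(useful_minutes(5, 24, 30, keep_in_memory=False))  # 0
print(useful_minutes(5, 24, 30, keep_in_memory=True))   # 120
```

In this model, two hours of crunching in short bursts saves nothing unless the task stays in memory between bursts.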




"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,703,329
RAC: 2,182
Message 52786 - Posted: 29 Apr 2008, 9:45:22 UTC
Last modified: 29 Apr 2008, 9:46:21 UTC

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 52787 - Posted: 29 Apr 2008, 12:10:47 UTC
Last modified: 29 Apr 2008, 12:15:49 UTC

Failure on two crunchers, mine, and computer 773014



t033_1_NMRREF_1_t033_1_id_model_14_idlIGNORE_THE_REST_core_3089_6470

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 1508931
ERROR:: Exit from: .pack.cc line: 5278

</stderr_txt>
]]>

Matthias Lehmkuhl
Joined: 20 Nov 05
Posts: 10
Credit: 2,115,357
RAC: 41
Message 52791 - Posted: 29 Apr 2008, 19:33:16 UTC

I also got a validate error on resultid=158368356

stderr out
<core_client_version>6.1.0</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 1637406
# cpu_run_time_pref: 10800
===
</stderr_txt>
]]>
Validate state Invalid
It looks like the "stderr_txt" was written incorrectly.

Two other results on the same computer finished valid.

My wingman finished valid.

my wingman's stderr out

<core_client_version>5.10.28</core_client_version>
<![CDATA[
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 10800
# random seed: 1637406
======================================================
DONE :: 1 starting structures 7973.24 cpu seconds
This process generated 2 decoys from 2 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>
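
The difference between the two outputs can be spotted mechanically: the wingman's stderr contains the `DONE` summary and `called boinc_finish`, while the invalid one stops after a short `===` line. Here is a rough Python heuristic that assumes only the markers visible in the two outputs quoted above. The function name `looks_complete` is hypothetical, and this is not how the project's validator decides.

```python
def looks_complete(stderr_text):
    """Heuristic: a finished run logged both a DONE summary and boinc_finish."""
    return "DONE ::" in stderr_text and "called boinc_finish" in stderr_text


truncated = """# cpu_run_time_pref: 10800
# random seed: 1637406
# cpu_run_time_pref: 10800
===
"""
finished = """# random seed: 1637406
DONE :: 1 starting structures 7973.24 cpu seconds
This process generated 2 decoys from 2 attempts
called boinc_finish
"""
print(looks_complete(truncated))  # False
print(looks_complete(finished))   # True
```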

Matthias

Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 52811 - Posted: 30 Apr 2008, 14:26:15 UTC

My f1_atpase_beta_relax_3104_42290_0 on Linux was forcibly stopped by watchdog:

<core_client_version>5.10.43</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
# random seed: 1356934
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
CPU time: 60899.9 seconds. Greater than 4X preferred time: 14400 seconds
**********************************************************************
GZIP SILENT FILE: ./aaf1Fp.out
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (17 frames):
[0x8e1b49b]
[0x8e15d8c]
[0x8e86c38]
[0x8859f41]
[0x885df60]
[0x8861e36]
[0x8868f3d]
[0x886a8f9]
[0x853bb26]
[0x853bfee]
[0x8b89395]
[0x8b8c203]
[0x8629ccb]
[0x8768a9f]
[0x8768b4a]
[0x8e80034]
[0x8048111]

Exiting...

</stderr_txt>
]]>
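
The watchdog message above is consistent with simple arithmetic: the preferred run time was 14400 seconds, four times that is 57600 seconds, and the task had used 60899.9 seconds. As a small Python check (the function name `exceeds_watchdog_limit` is hypothetical, and the factor of 4 is taken from the message text rather than from the Rosetta source):

```python
def exceeds_watchdog_limit(cpu_seconds, preferred_seconds, factor=4):
    """True if CPU time is greater than `factor` times the preferred run time."""
    return cpu_seconds > factor * preferred_seconds


limit = 4 * 14400
print(limit)                                   # 57600
print(exceeds_watchdog_limit(60899.9, 14400))  # True
print(round(60899.9 / 3600, 1))                # 16.9 (hours of CPU time)
```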


Peter

Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 52812 - Posted: 30 Apr 2008, 14:37:35 UTC - in response to Message 52733.  

I tried to get the R@H work units to run 100% in RAM. I set the use at most % of page file (swap space) to 0.00%. It looks like the WUs still use some swap space. Does anyone know how to keep 100% of the WU in RAM?

You cannot force a process to use just RAM and no swap. The application would have to be programmed that way, which is probably not viable for a typical application that uses third-party runtime libraries.

BTW, the "use at most % of page file (swap space)" option can at most prevent a task from being started or from continuing to run. On Windows, usually all (or most) of a process's used memory is mapped to swap anyway. So the only way is to limit or completely remove the swap, which requires more installed RAM.

I have a thread on this topic but it is not getting very much attention.

Check your thread.

Peter

ramostol
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 52831 - Posted: 2 May 2008, 8:16:13 UTC - in response to Message 52811.  

My f1_atpase_beta_relax_3104_42290_0 on Linux was forcibly stopped by watchdog:
........
# cpu_run_time_pref: 14400
.....
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
CPU time: 60899.9 seconds. Greater than 4X preferred time: 14400 seconds
**********************************************************************


Similar situation on Macintosh:

This f1_atpase_beta_relax_31904_9833_1 is still alive and kicking after 45 1/2 hours of computing. The graphics are folding, the steps are increasing (presently at about model 1 step 2435), but it is enjoying the experience too much to complete. I confess my curiosity is roused, but now I have to evaluate the situation...

By the way: I observe that 2 of the 4 remaining f1_atpase WUs queued up on my computer have previously crashed (quite early in the computing process) on other computers.

©2024 University of Washington
https://www.bakerlab.org