Minirosetta v1.47 bug thread.

robertmiles
Joined: 16 Jun 08
Posts: 1030
Credit: 11,254,756
RAC: 4,084
Message 58011 - Posted: 18 Dec 2008, 21:57:24 UTC - in response to Message 58010.  

Question on memory size..

Greetings!
First of all, thanks to all of the developers for debugging the code.

I have a question about the memory size and page fault rate for mini-rosetta 1.47.

I was looking at the memory size and page fault rate in the Windows (XP) Task Manager.

I admit that I do not know what it all means, and would like to ask the forum for an explanation that would help me.

Environment: Here is what BOINC says:
Processor: 2 GenuineIntel Intel(R) Core(TM) Duo CPU T2300 @ 1.66GHz [x86 Family 6 Model 14 Stepping 12]
Processor features: fpu tsc pae nx sse sse2 mmx
OS: Microsoft Windows XP: Professional x86 Editon, Service Pack 3, (05.01.2600.00)
Memory: 2.00 GB physical, 4.87 GB virtual
Disk: 107.41 GB total, 78.57 GB free

Here is what the Task Manager is showing for mini-rosetta 1.47:
Mem usage: 184,944K (varying between 170,000K and 247,000K while I watched)
PF delta: 3,228 (in a three-second period)
VM size: 199,344K (and moving up to 243,000K)

I was running 2 BOINC projects at once: Rosetta and WCG-clean energy.
If I suspend all others so that only Rosetta is running, the page faults are more sporadic: mostly zero, then up to 6,375 in a three-second period.

With BOINC running only the Rosetta task, the Task Manager says:

Commit charge (K)
total: 788748
limit: 5107808
peak: 1319708

Physical Memory (K)
total: 2,095,532
available: 1,127,112
System cache: 838,252



Bottom line: I assume that the PF rate is not good.
Do you know of anything I can tweak to help?

THANK YOU!!
Jay E.



Can you afford to add more physical memory to that machine? That should at least decrease the page fault rate, although I don't know if it's the cheapest way to do this.

Here's a good place to find out what memory fits that machine, and how much it can hold:

http://www.crucial.com/

However, note that your 32-bit version of Windows has a limit on how much of the installed memory it can actually use, probably about 3.5 GB.
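To put the Task Manager figures quoted above in proportion, here is a small sketch that works out what fraction of physical memory and of the commit limit was in use. The numbers are copied from the readout; the variable and function names are made up for illustration, and the result is only arithmetic on one snapshot, not a diagnosis of the page fault rate.

```python
# Figures copied from the Task Manager readout quoted above (all in K).
commit_total_kb = 788_748
commit_limit_kb = 5_107_808
physical_total_kb = 2_095_532
physical_available_kb = 1_127_112


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Physical memory in use is whatever is not reported as available.
physical_in_use_kb = physical_total_kb - physical_available_kb
physical_in_use_pct = percent(physical_in_use_kb, physical_total_kb)
commit_pct = percent(commit_total_kb, commit_limit_kb)

print(f"Physical memory in use: {physical_in_use_kb:,} K ({physical_in_use_pct}%)")
print(f"Commit charge vs. limit: {commit_pct}%")
```

On these figures a bit under half of physical memory is in use, so it may be worth checking whether the faults are hard faults before buying more RAM.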

ID: 58011

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 58012 - Posted: 18 Dec 2008, 22:00:40 UTC

This WU had a validate error:

normal_relax_rlbd_1ynv_IGNORE_THE_REST_DECOY_5565_171_0

It looks from the stderr file like it crunched normally for 16 hours (my current preference) with no error. However, it was then marked "Invalid" with no explanation. The only other thing I see is that it crunched an unusually high number of decoys (8777 decoys). Does that cause problems with the validator?
ID: 58012

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,075,596
RAC: 1,892
Message 58013 - Posted: 18 Dec 2008, 22:28:12 UTC
Last modified: 18 Dec 2008, 22:37:08 UTC

Jay, RE: page faults...

If you change the view you can add a column to display the number of faults since the task started. I have long runtimes, but currently have two tasks from Ralph that topped 100,000,000 page faults. One in 15hrs and the other in 19hrs. This is the highest fault rate I've ever seen. Indeed, I recall the days when I thought that 1M per hour of runtime was excessive.

The only solace I can offer is that not all faults are hard faults to disk. Some recorded faults are "soft". Perhaps someone else can further elaborate on the concepts.
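For a sense of scale, the fault counts mentioned in this thread can be converted to a common faults-per-second rate. This is a minimal sketch using only the numbers quoted above; the function name is hypothetical, and the two Ralph figures are lower bounds, since those tasks "topped" 100,000,000 faults.

```python
def faults_per_second(fault_count: int, interval_seconds: float) -> float:
    """Average page fault rate over an interval."""
    return fault_count / interval_seconds


HOUR = 3600

# Jay's "PF delta" of 3,228 over a three-second Task Manager refresh.
jay_rate = faults_per_second(3_228, 3)

# Two Ralph tasks that each topped 100,000,000 faults, in 15 and 19 hours.
ralph_15h_rate = faults_per_second(100_000_000, 15 * HOUR)
ralph_19h_rate = faults_per_second(100_000_000, 19 * HOUR)

# The older rule of thumb: 1M faults per hour of runtime seemed excessive.
old_excessive_rate = faults_per_second(1_000_000, HOUR)

for label, rate in [
    ("Jay, 3 s sample", jay_rate),
    ("Ralph task, 15 h", ralph_15h_rate),
    ("Ralph task, 19 h", ralph_19h_rate),
    ("1M per hour", old_excessive_rate),
]:
    print(f"{label:18s} {rate:8.0f} faults/s")
```

The rate says nothing about how many of those faults were soft and how many went to disk; Task Manager's counter does not separate the two.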
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 58013

Stephen
Joined: 26 Apr 08
Posts: 32
Credit: 429,286
RAC: 0
Message 58024 - Posted: 19 Dec 2008, 4:07:51 UTC
Last modified: 19 Dec 2008, 4:35:03 UTC

A WU will get to around 85% complete, then its progress stays the same and the time to completion stays around 10 minutes. If I suspend all tasks and then resume, the "stuck" WUs complete.

Edited: doing this also rolls back the "CPU time spent" to around 30 minutes.
ID: 58024

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58027 - Posted: 19 Dec 2008, 5:47:30 UTC
Last modified: 19 Dec 2008, 5:49:36 UTC

Stephen, this may be part of why you are having problems keeping all 8 CPUs busy. Suggest you just let BOINC manage the machine for the next 12 hours or so. Don't abort, suspend, update, anything at all.

Some tasks will take longer than 3 hours to run, and their % complete progress bar will not move steadily. Rather than tell you the task has -30 minutes left, they reflect the situation by making time move very slowly after the task gets to 10 minutes remaining.

It's simply a problem with the estimate, not the work being done.
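As an illustration only, a display rule with the behaviour described above could look like the sketch below. This is not Rosetta's or BOINC's actual code: the function name, the 10-minute floor parameter, and the particular slowing formula are all assumptions chosen to show how an estimate can avoid going negative while still creeping toward zero.

```python
def displayed_remaining(elapsed_s: float, target_s: float, floor_s: float = 600.0) -> float:
    """Hypothetical 'time remaining' display, in seconds.

    Above floor_s the plain estimate (target - elapsed) is shown.
    Below it, the value decays slowly toward zero instead of going negative.
    """
    remaining = target_s - elapsed_s
    if remaining >= floor_s:
        return remaining
    # Seconds that have passed since the plain estimate reached floor_s.
    overrun = floor_s - remaining
    # Continuous at the boundary, always positive, shrinks ever more slowly.
    return floor_s * floor_s / (floor_s + overrun)


three_hours = 3 * 3600
for elapsed_min in (0, 170, 180, 210):
    shown = displayed_remaining(elapsed_min * 60, three_hours)
    print(f"elapsed {elapsed_min:3d} min -> shows {shown / 60:6.1f} min remaining")
```

With this rule a task that is 30 minutes past a 3-hour target still shows a small positive time remaining, which matches the "stuck near 10 minutes" appearance even though the work itself is progressing.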
Rosetta Moderator: Mod.Sense
ID: 58027

Greg_BE
Joined: 30 May 06
Posts: 4875
Credit: 4,472,466
RAC: 465
Message 58029 - Posted: 19 Dec 2008, 8:52:37 UTC

How do you "lose credit" on a task?
On this task I claimed 83 and got 68 for 4 hrs of runtime. That is just weird when most of the other work I have been running always comes out on the plus side for granted credit.
ID: 58029

rochester new york
Joined: 2 Jul 06
Posts: 2799
Credit: 1,828,962
RAC: 521
Message 58030 - Posted: 19 Dec 2008, 9:54:17 UTC

ID: 58030

Greg_BE
Joined: 30 May 06
Posts: 4875
Credit: 4,472,466
RAC: 465
Message 58031 - Posted: 19 Dec 2008, 12:26:59 UTC - in response to Message 58030.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=213832280


You didn't have to reboot your computer a few times during the task's run, did you?
That will kill a task.
ID: 58031

rochester new york
Joined: 2 Jul 06
Posts: 2799
Credit: 1,828,962
RAC: 521
Message 58032 - Posted: 19 Dec 2008, 14:14:26 UTC - in response to Message 58031.  
Last modified: 19 Dec 2008, 14:15:09 UTC

Yes I did... thanks for that info. A Microsoft upgrade required a reboot.



ID: 58032

Greg_BE
Joined: 30 May 06
Posts: 4875
Credit: 4,472,466
RAC: 465
Message 58033 - Posted: 19 Dec 2008, 14:44:26 UTC - in response to Message 58032.  


Here's a tip: before rebooting (because you never know how many times Windows will want you to do that when you install an update), go to the Activity tab of BOINC Manager and suspend all activity. Wait for your hard drive to stop grinding away with all the saving, and then you can reboot. Also be sure to have the "leave jobs/tasks in memory" option turned on. Then you will not lose your position in the task. Suspend seems to save everything to the hard drive, and you can reboot all you want without losing any data for the task.


ID: 58033

rochester new york
Joined: 2 Jul 06
Posts: 2799
Credit: 1,828,962
RAC: 521
Message 58034 - Posted: 19 Dec 2008, 14:50:09 UTC - in response to Message 58033.  




Thanks again... I'll do that next time.








ID: 58034

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58035 - Posted: 19 Dec 2008, 15:18:35 UTC
Last modified: 19 Dec 2008, 17:09:18 UTC

I do not agree with Greg's comments about preservation of work and the reasons why, but would prefer to take them up in another thread if you'd like to discuss further.

[edit]
We're discussing this under a new thread here.
Rosetta Moderator: Mod.Sense
ID: 58035

rochester new york
Joined: 2 Jul 06
Posts: 2799
Credit: 1,828,962
RAC: 521
Message 58037 - Posted: 19 Dec 2008, 15:30:07 UTC - in response to Message 58035.  


OK, I just want to know what to do.




ID: 58037

kr12
Joined: 6 Dec 07
Posts: 2
Credit: 85,902
RAC: 0
Message 58044 - Posted: 19 Dec 2008, 20:25:15 UTC

"graphic viewer" hangs with this task
cs_noe_fullw_nolin_homo_bench_cs_noe_abrelax_cs_mth1598_olange_5607_11086_0
(https://boinc.bakerlab.org/rosetta/result.php?resultid=215720373)
ID: 58044

stewjack
Joined: 23 Apr 06
Posts: 39
Credit: 95,871
RAC: 0
Message 58050 - Posted: 20 Dec 2008, 4:43:14 UTC - in response to Message 58044.  

"graphic viewer" hangs with this task
cs_noe_fullw_nolin_homo_bench_cs_noe_abrelax_cs_mth1598_olange_5607_11086_0


I had the same thing happen with this similar WU.

cs_noe_fullw_nolin_homo_bench_cs_noe_abrelax_cs_nsp1_olange_5608_14752_0

Note: I didn't have time to mess with this one - so I just aborted it.

ID: 58050

rhb
Joined: 19 Jan 07
Posts: 5
Credit: 277,050
RAC: 0
Message 58052 - Posted: 20 Dec 2008, 7:14:45 UTC

I had a computation error. Running Ubuntu Linux 6.06, BOINC 5.4.9.
This is the first error I've seen in the last two weeks.

https://boinc.bakerlab.org/rosetta/result.php?resultid=215760302

Task ID 215760302
Name cs_noe_fullw_nolin_homo_bench_cs_noe_abrelax_cs_nsp1_olange_5608_24330_0
Workunit 196639962

<core_client_version>5.4.9</core_client_version>
<message>
process exited with code 193 (0xc1)
</message>
<stderr_txt>
*** glibc detected *** double free or corruption (!prev): 0x0bd2d980 ***
SIGABRT: abort called


ID: 58052

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 58068 - Posted: 20 Dec 2008, 20:53:45 UTC

Hi.

This one has problems, it's failed twice.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=194507659

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (15 frames):
[0x8b979b7]
[0x8bc20b0]
[0xffffe500]
[0x84c0863]
[0x85ddf0a]
[0x85df32e]
[0x85e65b8]
[0x819a650]
[0x818d3b7]
[0x818ee89]
[0x8127771]
[0x8129a1a]
[0x804b9c8]
[0x8c1dbac]
[0x8048111]

Exiting...

</stderr_txt>

pete.
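The line "process exited with code 193 (0xc1, -63)" appears to show one value three ways. The sketch below checks the arithmetic: 193 in hexadecimal is 0xc1, and the same byte read as a signed 8-bit number is -63. Reading the third number as a signed-byte view is an assumption based on that arithmetic, not something the log states, and the helper name is made up for illustration.

```python
def as_signed_byte(code: int) -> int:
    """Interpret the low 8 bits of an exit status as a two's-complement byte."""
    low = code & 0xFF
    return low - 256 if low >= 128 else low


exit_code = 193
print(f"decimal: {exit_code}")
print(f"hex:     {hex(exit_code)}")
print(f"signed:  {as_signed_byte(exit_code)}")
```

This only decodes the number; it does not say what the status means to BOINC or to the Rosetta application.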

ID: 58068

svincent
Joined: 30 Dec 05
Posts: 217
Credit: 9,696,863
RAC: 4,162
Message 58084 - Posted: 21 Dec 2008, 3:19:52 UTC

I'm seeing problems when attempting to show graphics on workunits with names such as cs_noe* on Mac OS X 10.4.11. It seems like several other people are seeing similar problems.

The first time Show graphics is pressed the graphics app starts and displays a blank window. Moving the mouse causes the graphics app to crash.

The second and subsequent times Show graphics is pressed the graphics app starts and displays a blank window along with the spinning rainbow beach ball. The graphics app is frozen and you can't even force quit in the normal way: it's necessary to quit via the Activity Monitor.
ID: 58084

lusvladimir
Joined: 18 Oct 05
Posts: 12
Credit: 1,784,854
RAC: 0
Message 58087 - Posted: 21 Dec 2008, 9:38:41 UTC
Last modified: 21 Dec 2008, 9:41:39 UTC

Running Debian Linux, BOINC 6.2.14.

https://boinc.bakerlab.org/result.php?resultid=215464278

Task ID 215464278
Name cc_nonideal_1_3_nocst4_hb_t286__IGNORE_THE_REST_1VYHA_6_5693_20_0
Workunit 196380006

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# cpu_run_time_pref: 3600
*** glibc detected *** double free or corruption (!prev): 0x0e13a4f0 ***
SIGABRT: abort called
Stack trace (23 frames):
ID: 58087

NewtonianRefractor
Joined: 29 Sep 08
Posts: 19
Credit: 2,350,860
RAC: 0
Message 58088 - Posted: 21 Dec 2008, 10:17:10 UTC

The graphics for one of my Minirosetta 1.47 work units crash. If I click on the Show graphics button in BOINC, a window is launched, but it remains black, and to close it I have to manually end the unresponsive process. The work unit runs fine, though. This is under BOINC 6.2.19.
ID: 58088



©2021 University of Washington
https://www.bakerlab.org