32 GB stderr.txt (no, that is NOT a typo, I said gigabyte!)

Franklin Bowen
Joined: 11 Dec 05
Posts: 4
Credit: 13,591
RAC: 0
Message 6783 - Posted: 19 Dec 2005, 16:53:04 UTC

How do I keep Rosetta@Home from producing such a large stderr.txt file?

I placed a zipped version of the stderr.txt and a screen shot of windows explorer on my web site at:

http://fmbbowen.com:39353/misc

See files:
stderr.zip
RosettaProblem.jpg

Thanks!

ID: 6783 · Rating: 0
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,693,680
RAC: 38
Message 6787 - Posted: 19 Dec 2005, 17:08:33 UTC

Someone from Staff will have to look at that file. I would look, but I doubt I'd be able to come to any conclusions, and there is no point in slamming your bandwidth with a lot of people downloading the file.

Obviously something went into a tight error loop, generating many errors/second...

ID: 6787 · Rating: 0
Honza
Joined: 18 Sep 05
Posts: 48
Credit: 173,517
RAC: 0
Message 6804 - Posted: 19 Dec 2005, 19:27:34 UTC

I see from your screenshot that you have 2 slot folders but only a single-core, single-CPU machine. This indicates that you are or have been attached to more than one project (Rosetta).
So this may be a rare BOINC bug as well.

Still not nice.
ID: 6804 · Rating: 0
Franklin Bowen
Joined: 11 Dec 05
Posts: 4
Credit: 13,591
RAC: 0
Message 6807 - Posted: 19 Dec 2005, 19:41:15 UTC - in response to Message 6804.  

I see from your screenshot that you have 2 slot folders but only a single-core, single-CPU machine. This indicates that you are or have been attached to more than one project (Rosetta).
So this may be a rare BOINC bug as well.

Still not nice.


The other slot is SETI. I was solely running SETI until their servers stopped handing out work because they were overloaded. I now run at least two projects per machine just to maximize CPU usage.

ID: 6807 · Rating: 0
Honza
Joined: 18 Sep 05
Posts: 48
Credit: 173,517
RAC: 0
Message 6813 - Posted: 19 Dec 2005, 21:35:33 UTC - in response to Message 6807.  

The other slot is SETI. I was solely running SETI until their servers stopped handing out work because they were overloaded. I now run at least two projects per machine just to maximize CPU usage.
Thanks for making it clear.

For the curious, and to prevent further downloads of this quite large file, I'm attaching the first lines of the 32 GB file.
Now it's quite clear that Rosetta is involved in this issue.


# =====================================
# random seed: 793501
# =====================================
# =====================================
# random seed: 801521
# =====================================

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C911E58 read attempt to address 0x402118E8

1: 12/18/05 12:09:11
1: SymGetLineFromAddr(): GetLastError = 126
1: SymGetLineFromAddr(): GetLastError = 126

[last line repeating ad infinitum].
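
For anyone who wants to look at a file like this without opening all of it, here is a minimal Python sketch. It reads only a bounded number of lines and collapses consecutive duplicates into a count. The function name `summarize_log` and the `max_lines` limit are illustrative assumptions, not part of BOINC or Rosetta.

```python
from itertools import islice

def summarize_log(path, max_lines=1000):
    """Read at most max_lines lines and collapse consecutive duplicates into (text, count) pairs."""
    runs = []
    # errors="replace" keeps the scan going even if the log contains odd bytes
    with open(path, "r", errors="replace") as handle:
        for line in islice(handle, max_lines):
            text = line.rstrip("\r\n")
            if runs and runs[-1][0] == text:
                runs[-1][1] += 1          # same line as the previous one: bump its count
            else:
                runs.append([text, 1])    # a new run starts here
    return [(text, count) for text, count in runs]
```

Because the file is streamed line by line and the read stops after `max_lines`, memory use stays small no matter how large the file has grown.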
ID: 6813 · Rating: 0
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,693,680
RAC: 38
Message 6815 - Posted: 19 Dec 2005, 21:52:28 UTC

Franklin, looking at your results, do you have "leave applications in memory when preempted" set to "no" by any chance? You're getting a very high number of errors... if this is "no", please change it to "yes". This may solve all your problems.

ID: 6815 · Rating: 0
dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 6826 - Posted: 20 Dec 2005, 1:09:56 UTC - in response to Message 6813.  


1: SymGetLineFromAddr(): GetLastError = 126
1: SymGetLineFromAddr(): GetLastError = 126

[last line repeating ad infinitum].

Hmm. SymGetLineFromAddr is a function involved in debugging. Bill Michael's suggestion may help cure the symptoms for you, but the underlying problem is a Rosetta problem. I'd want to punt this one to David Baker and co. and suggest that they look and see where they're looping while expecting a successful return value from this function.

Maybe looking at that logic (i.e. failing gracefully) when this error happens would be the correct solution to this problem.

-- Later --

Just have a look round on your system, see if you can find a DLL called DbgHelp.dll anywhere. Food for thought. winerror.h claims that 126 could be "ERROR_MOD_NOT_FOUND" which makes me wonder if DbgHelp.dll has gone AWOL. That could cause what we're seeing.
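
To illustrate what "failing gracefully" could mean here, a minimal Python sketch follows. It is not Rosetta or BOINC code; `lookup_line` is a hypothetical stand-in for a symbol lookup that always fails with error 126, and `max_failures` is an assumed limit. The point is only that a loop which gives up after a few consecutive failures writes a handful of lines instead of gigabytes.

```python
ERROR_MOD_NOT_FOUND = 126  # value from winerror.h, as mentioned in the post above

def lookup_line(address):
    """Hypothetical stand-in for a symbol lookup that always fails with error 126."""
    return None, ERROR_MOD_NOT_FOUND

def dump_stack(addresses, log, max_failures=3):
    """Log one line per frame, but stop once lookups fail max_failures times in a row."""
    failures = 0
    for address in addresses:
        line_info, error = lookup_line(address)
        if line_info is not None:
            failures = 0
            log.append(f"{address:#010x}: {line_info}")
            continue
        failures += 1
        log.append(f"{address:#010x}: line lookup failed, error = {error}")
        if failures >= max_failures:
            # Give up instead of logging the same error forever
            log.append("symbol lookup unavailable, abandoning stack dump")
            break
    return log
```

Called as `dump_stack(range(0x1000, 0x2000), [])`, this produces three failure lines plus one final note, however many addresses are passed in.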
ID: 6826 · Rating: 0
Franklin Bowen
Joined: 11 Dec 05
Posts: 4
Credit: 13,591
RAC: 0
Message 6830 - Posted: 20 Dec 2005, 2:57:12 UTC - in response to Message 6826.  


1: SymGetLineFromAddr(): GetLastError = 126
1: SymGetLineFromAddr(): GetLastError = 126

[last line repeating ad infinitum].


Just have a look round on your system, see if you can find a DLL called DbgHelp.dll anywhere. Food for thought. winerror.h claims that 126 could be "ERROR_MOD_NOT_FOUND" which makes me wonder if DbgHelp.dll has gone AWOL. That could cause what we're seeing.

I changed "leave applications in memory when preempted" to yes.

For DbgHelp.dll, see:

http://fmbbowen.com:39353/misc/DbgHelp.jpg

ID: 6830 · Rating: 0
Jack Schonbrun
Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 6838 - Posted: 20 Dec 2005, 6:20:39 UTC

It seems like this is probably a Rosetta issue, though I have never seen this bug before. There were a couple of other people who reported very large stderr.txt files appearing on their computers. Thanks for going to the trouble of making it available to us; it will make it easier for us to figure out what's going on. If you still have it, it would be useful to also see stdout.txt.

I apologize for this problem, we will look into it right away.

Thanks again for letting us know.
ID: 6838 · Rating: 0
tick
Joined: 21 Nov 05
Posts: 2
Credit: 1,456
RAC: 0
Message 7015 - Posted: 21 Dec 2005, 14:38:05 UTC - in response to Message 6838.  

Hi,

I have a similar problem here. The name of the workunit is "1n0u__topology_sample_197016". The size of the stderr.txt is about 2 GB (header see below). I also had "leave applications in memory when preempted" set to no.
When BOINC switched to another project, I noticed in Task Manager that the Rosetta process kept running (it was not idling).


# =====================================
# random seed: 1228961
# =====================================

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C921E58 read attempt to address 0x3F8FC1B0

1: 12/19/05 18:04:00
1: SymGetLineFromAddr(): GetLastError = 126
1: SymGetLineFromAddr(): GetLastError = 126
ID: 7015 · Rating: 0
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,693,680
RAC: 38
Message 7016 - Posted: 21 Dec 2005, 14:40:52 UTC - in response to Message 7015.  

I also had "leave applications in memory when preempted" set to no.


While this setting shouldn't cause THIS problem, there is a known bug that causes Rosetta errors if it is not left in memory...

ID: 7016 · Rating: 0
tick
Joined: 21 Nov 05
Posts: 2
Credit: 1,456
RAC: 0
Message 7020 - Posted: 21 Dec 2005, 15:05:04 UTC - in response to Message 7016.  

I edited the stderr.txt, leaving only the "random seed" header in it, because BOINC froze while trying to handle the large file. Now BOINC and Rosetta seem to work well again. The result was sent out and the workunit is now ready to report.
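
The same cleanup can be done without opening the file in an editor. The Python sketch below truncates a file in place after its first few lines. The function name `keep_header` and the default of three header lines are assumptions for illustration, and it assumes nothing else is writing to the file while it runs.

```python
def keep_header(path, header_lines=3):
    """Truncate the file in place so that only its first header_lines lines remain."""
    with open(path, "rb+") as handle:
        for _ in range(header_lines):
            if not handle.readline():
                break  # file is shorter than the requested header
        # Everything after the current position is discarded
        handle.truncate(handle.tell())
        return handle.tell()  # number of bytes kept
```

Only the header is ever read, so this finishes quickly even on a multi-gigabyte file.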
ID: 7020 · Rating: 0
Franklin Bowen
Joined: 11 Dec 05
Posts: 4
Credit: 13,591
RAC: 0
Message 7126 - Posted: 22 Dec 2005, 1:58:12 UTC - in response to Message 6838.  

If you still have it, it would be useful to also see stdout.txt

I apologize for this problem, we will look into it right away.

Thanks again for letting us know.



http://fmbbowen.com:39353/misc/stdout.zip

Rosetta has been running since the problem occurred. The file is about 180K unzipped.

NP
ID: 7126 · Rating: 0
Phil
Joined: 20 Dec 05
Posts: 1
Credit: 270,809
RAC: 0
Message 7683 - Posted: 26 Dec 2005, 22:45:06 UTC

Just a +1 message; same thing just happened to me.

I got a Windows message informing me that my (180 GB) hard drive was low on space and found that the stderr.txt file that Rosetta was using was listed as 150 GB. Unfortunately, my first thought after not being able to open the file was simply to close BOINC, delete the file, and restart it, so I can't add any more information.

I changed my "leave application in memory when suspended" preference to 'yes' and will see if it happens again.
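
For catching a runaway file before the disk fills up, a small scan like the following Python sketch could be run from time to time. The function name `find_large_files` and the 100 MB threshold are assumptions, and the folder to scan (`root`) has to be supplied by the user; nothing here is provided by BOINC itself.

```python
import os

def find_large_files(root, name="stderr.txt", limit_bytes=100 * 1024 * 1024):
    """Return (path, size) for every file called name under root that exceeds limit_bytes."""
    hits = []
    for folder, _subdirs, files in os.walk(root):
        if name in files:
            path = os.path.join(folder, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                hits.append((path, size))
    # Largest offenders first
    return sorted(hits, key=lambda item: item[1], reverse=True)
```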
ID: 7683 · Rating: 0
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,693,680
RAC: 38
Message 7692 - Posted: 26 Dec 2005, 23:46:45 UTC - in response to Message 7683.  

I changed my "leave application in memory when suspended" preference to 'yes' and will see if it happens again.


Thanks!

ID: 7692 · Rating: 0


