Message boards : Number crunching : 32 GB stderr.txt (no, that is NOT a typo, I said gigabyte!)
Author | Message |
---|---|
Franklin Bowen Joined: 11 Dec 05 Posts: 4 Credit: 13,591 RAC: 0 |
How do I keep Rosetta@Home from producing such a large stderr.txt file? I placed a zipped version of the stderr.txt and a screenshot of Windows Explorer on my web site at: http://fmbbowen.com:39353/misc See files: stderr.zip RosettaProblem.jpg Thanks! |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 10 |
Someone from Staff will have to look at that file. I would look, but I doubt I'd be able to come to any conclusions, and there is no point in slamming your bandwidth with a lot of people downloading the file. Obviously something went into a tight error loop, generating many errors/second... |
Honza Joined: 18 Sep 05 Posts: 48 Credit: 173,517 RAC: 0 |
I see from your screenshot that you have 2 slot folders but only a single-core, single-CPU machine. This indicates that you are or have been attached to more than one project (Rosetta). So this may be a rare BOINC bug as well. Still not nice. |
Franklin Bowen Joined: 11 Dec 05 Posts: 4 Credit: 13,591 RAC: 0 |
Honza wrote: "I see from your screenshot that you have 2 slot folders but only a single-core, single-CPU machine. This indicates that you are or have been attached to more than one project (Rosetta)." The other slot is SETI. I was solely running SETI until their servers stopped handing out work because they were overloaded. I now run at least two projects per machine just to maximize CPU usage. |
Honza Joined: 18 Sep 05 Posts: 48 Credit: 173,517 RAC: 0 |
Franklin Bowen wrote: "The other slot is SETI. I was solely running SETI until their servers stopped handing out work because they were overloaded. I now run at least two projects per machine just to maximize CPU usage." Thanks for making it clear. For the curious, and to prevent further downloads of this quite large file, I'm attaching the first lines of the 32 GB file. Now it's quite clear that Rosetta is involved in this issue.
[last line repeating ad infinitum]. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 10 |
Franklin, looking at your results, do you have "leave applications in memory when preempted" set to "no" by any chance? You're getting a very high number of errors... if this is "no", please change it to "yes". This may solve all your problems. |
dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
Honza wrote: "[last line repeating ad infinitum]." Hmm. SymGetLineFromAddr is a function involved in debugging. Bill Michael's suggestion may help cure the symptoms for you, but the underlying problem is a Rosetta problem. I'd want to punt this one to David Baker and co. and suggest that they look and see where they're looping while expecting a successful return value from this function. Maybe reworking that logic (i.e. failing gracefully when this error happens) would be the correct solution to this problem. -- Later -- Just have a look round on your system and see if you can find a DLL called DbgHelp.dll anywhere. Food for thought: winerror.h claims that 126 could be "ERROR_MOD_NOT_FOUND", which makes me wonder if DbgHelp.dll has gone AWOL. That could cause what we're seeing. |
Franklin Bowen Joined: 11 Dec 05 Posts: 4 Credit: 13,591 RAC: 0 |
dgnuff wrote: "Just have a look round on your system, see if you can find a DLL called DbgHelp.dll anywhere. Food for thought. winerror.h claims that 126 could be 'ERROR_MOD_NOT_FOUND' which makes me wonder if DbgHelp.dll has gone AWOL. That could cause what we're seeing." I changed "leave applications in memory when preempted" to yes. For DbgHelp.dll, see: http://fmbbowen.com:39353/misc/DbgHelp.jpg |
Jack Schonbrun Joined: 1 Nov 05 Posts: 115 Credit: 5,954 RAC: 0 |
It seems like this is probably a Rosetta issue, though I have never seen this bug before. There were a couple of other people who reported very large stderr.txt files appearing on their computers. Thanks for going through the trouble to make it available to us; it will make it easier for us to figure out what's going on. If you still have it, it would be useful to also see stdout.txt. I apologize for this problem; we will look into it right away. Thanks again for letting us know. |
tick Joined: 21 Nov 05 Posts: 2 Credit: 1,456 RAC: 0 |
Hi, I have a similar problem here. The name of the workunit is "1n0u__topology_sample_197016". The size of the stderr.txt is about 2 GB (header below). I also had "leave applications in memory when preempted" set to no. When BOINC switched to another project, I noticed in Task Manager that the Rosetta process kept running (it was not idle). # ===================================== # random seed: 1228961 # ===================================== ***UNHANDLED EXCEPTION**** Reason: Access Violation (0xc0000005) at address 0x7C921E58 read attempt to address 0x3F8FC1B0 1: 12/19/05 18:04:00 1: SymGetLineFromAddr(): GetLastError = 126 1: SymGetLineFromAddr(): GetLastError = 126 |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 10 |
tick wrote: "I also had 'leave applications in memory when preempted' set to no." While this setting shouldn't cause THIS problem, there is a known bug that causes Rosetta errors if it is not left in memory... |
tick Joined: 21 Nov 05 Posts: 2 Credit: 1,456 RAC: 0 |
I edited the stderr.txt, leaving only the "random seed" in it, because BOINC froze while trying to handle the large file. Now BOINC and Rosetta seem to work well again. The result was sent out and the workunit is now ready to report. |
Franklin Bowen Joined: 11 Dec 05 Posts: 4 Credit: 13,591 RAC: 0 |
Jack Schonbrun wrote: "If you still have it, it would be useful to also see stdout.txt" http://fmbbowen.com:39353/misc/stdout.zip Rosetta has been running since the problem occurred. The file is about 180K unzipped. NP |
Phil Joined: 20 Dec 05 Posts: 1 Credit: 270,809 RAC: 0 |
Just a +1 message; same thing just happened to me. I got a Windows message informing me that my (180 GB) hard drive was low on space and found that the stderr.txt file that Rosetta was using was listed as 150 GB. Unfortunately, my first thought after not being able to open the file was simply to close BOINC, delete the file, and restart it, so I can't add any more information. I changed my "leave application in memory when suspended" preference to 'yes' and will see if it happens again. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 10 |
Phil wrote: "I changed my 'leave application in memory when suspended' preference to 'yes' and will see if it happens again." Thanks! |
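
Several posters could not open their oversized stderr.txt at all, and Honza posted only its first lines to spare others the download. A minimal Python sketch of that idea follows: it reads just the first few lines of a file and reports its size without loading the rest. The function name `peek_head` and its parameters are hypothetical, chosen for illustration; nothing here is part of BOINC or Rosetta.

```python
import os
from itertools import islice


def peek_head(path, max_lines=20):
    """Return (size in bytes, first max_lines lines) of a text file.

    Hypothetical helper for illustration. Lines are read lazily, so only
    the requested lines are pulled from disk even if the file is huge.
    """
    size_bytes = os.path.getsize(path)
    # errors="replace" keeps reading even if the log contains stray bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        head = [line.rstrip("\r\n") for line in islice(handle, max_lines)]
    return size_bytes, head
```

Calling `peek_head("stderr.txt", max_lines=10)` from the slot folder would return the file size and the first ten lines, which is usually enough to see which message is repeating. The path is an assumption; adjust it to wherever the file actually lives.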
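
dgnuff's suggestion above was that the application should fail gracefully instead of looping on the same failing call. One generic way to keep such a loop from filling the disk is to cap how often an identical message is written. The Python sketch below illustrates that technique only; it is not how Rosetta or BOINC actually log, and the class name `CappedErrorLog` and its `max_repeats` parameter are hypothetical.

```python
import sys


class CappedErrorLog:
    """Write error lines to a stream, but stop repeating an identical line.

    Hypothetical illustration: after max_repeats consecutive copies of the
    same message, further copies are counted instead of written.
    """

    def __init__(self, stream=None, max_repeats=3):
        self.stream = stream if stream is not None else sys.stderr
        self.max_repeats = max_repeats
        self._last = None   # most recent message seen
        self._count = 0     # consecutive occurrences of that message

    def error(self, message):
        if message == self._last:
            self._count += 1
        else:
            # A new message ends the previous run, so summarize it first.
            self.flush_summary()
            self._last = message
            self._count = 1
        if self._count <= self.max_repeats:
            self.stream.write(message + "\n")

    def flush_summary(self):
        """Report how many copies of the previous message were suppressed."""
        suppressed = self._count - self.max_repeats
        if self._last is not None and suppressed > 0:
            self.stream.write(
                f"(previous line repeated {suppressed} more times)\n"
            )
        self._last = None
        self._count = 0
```

With `max_repeats=2`, logging the same `SymGetLineFromAddr(): GetLastError = 126` line a thousand times and then calling `flush_summary()` writes three lines in total: the message twice, then a note that it repeated 998 more times. Capping the output only limits the damage; the loop that keeps retrying the failing call would still need to be fixed.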
©2024 University of Washington
https://www.bakerlab.org