1)
Message boards :
Number crunching :
minirosetta v1.19 bug thread
(Message 53029)
Posted 13 May 2008 by Rom Walton (BOINC) Post: I'll throw in a bit more about the no heartbeat message. At least once per release cycle we try to resolve this issue; so far the attempts have led to crashes within the core client. DNS resolution is done through libcurl, and using either libcurl's native async-dns solution or the c-ares library hasn't resolved the issue. We haven't found a way to reproduce this issue in a lab environment, and so we haven't been able to give the libcurl guys enough information to get it fixed. So until we can get more info to the libcurl guys, who can then fix it, the no heartbeat message is better than a crash. |
2)
Message boards :
Number crunching :
minirosetta v1.19 bug thread
(Message 52979)
Posted 10 May 2008 by Rom Walton (BOINC) Post:
In this particular case there isn't anything that any of us can do; I've passed the info on to the MiniRosetta devs. Basically MiniRosetta is a 32-bit process, and generally 32-bit processes are limited to 2GB of user-mode memory. MiniRosetta hit that limit, so when it asked for more the OS said NO, leading to the crash. The sign that this sort of problem has occurred is: LoadLibraryA( dbghelp.dll ): GetLastError = 8 and - Virtual Memory Usage - Sorry for not explaining the situation sooner, I was heading for bed and I started thinking about how I was going to help the devs debug this problem in the wild if they are unable to reproduce this issue in the lab. At present there isn't anything in the BOINC application framework that'll help them debug this in the wild. |
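The limit described in this post can be sketched as simple arithmetic. This is only an illustrative model, not BOINC or MiniRosetta code: the names `USER_MODE_LIMIT` and `try_allocate` are hypothetical, and the 2GB figure is the general 32-bit user-mode limit mentioned above. Windows error code 8 is ERROR_NOT_ENOUGH_MEMORY, which matches the "GetLastError = 8" line.

```python
# Illustrative model only; these names are hypothetical, not BOINC APIs.
GIB = 1024 ** 3
USER_MODE_LIMIT = 2 * GIB        # typical user-mode limit for a 32-bit process
ERROR_NOT_ENOUGH_MEMORY = 8      # Windows error code reported as "GetLastError = 8"


def try_allocate(in_use: int, request: int, limit: int = USER_MODE_LIMIT) -> int:
    """Return 0 if the request fits under the limit, else error code 8."""
    if in_use + request > limit:
        return ERROR_NOT_ENOUGH_MEMORY
    return 0
```

For example, a process already using nearly 2GB gets error 8 back even for a small request such as loading one more DLL, which is roughly what the dbghelp.dll line in the crash dump indicates.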
3)
Message boards :
Number crunching :
minirosetta v1.19 bug thread
(Message 52961)
Posted 10 May 2008 by Rom Walton (BOINC) Post:
All those crashes are a result of an out of memory error. |
4)
Message boards :
Number crunching :
BOINC Q&A
(Message 26908)
Posted 16 Sep 2006 by Rom Walton (BOINC) Post: Click on the comments link at the bottom of the article. |
5)
Message boards :
Number crunching :
BOINC Q&A
(Message 26778)
Posted 14 Sep 2006 by Rom Walton (BOINC) Post: In an effort to improve communication between the BOINC project and the community about the future of the BOINC project I'll be holding a weekly Q&A on my blog. I'm fielding this as an experiment right now, as a way to find out the kinds of things the community is interested in knowing. If there is a lot of interest in this sort of thing maybe the guys who publish bunc will be willing to pick it up as part of their newsletter. What do you all think? |
6)
Message boards :
Number crunching :
Simple boinc installer
(Message 24233)
Posted 22 Aug 2006 by Rom Walton (BOINC) Post: Sorry, that was me. :) Forgot to switch back to my account before posting. |
7)
Message boards :
Number crunching :
Rosetta@Home Presentation
(Message 20286)
Posted 16 Jul 2006 by Rom Walton (BOINC) Post: I don't know if this has already been posted or not but I happened to watch this on TV and thought you all would be interested. http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=449 It is a presentation David Baker gave to the computer science department at the University of Washington. Enjoy. |
8)
Message boards :
Number crunching :
Report Problems with Rosetta Version 5.24
(Message 19254)
Posted 24 Jun 2006 by Rom Walton (BOINC) Post:
Actually it is. There is only enough space in the feeder queue for 1,000 workunits. When the scheduler connects up to the feeder queue to get work it cycles through all 1,000 slots looking for available work. When all 1,000 queue slots are filled up with large jobs that is what the server returns. Splitting the queue up equally is supported with different applications. If this is really a big problem we could set things up in such a way that the project believes it has more than one application and 50% of the queue is saved for each application. |
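The queue behaviour described in this post can be sketched roughly as follows. This is a hypothetical model, not the BOINC server code: the job dictionaries, the `app` and `size` keys, and all function names are assumptions made for illustration. Only the 1,000-slot figure and the idea of reserving an equal share per application come from the post.

```python
# Hypothetical sketch of a fixed-size feeder queue; not the BOINC server code.
from typing import Dict, List, Optional

QUEUE_SLOTS = 1000  # slot count given in the post


def fill_queue(pending: List[Dict], slots: int = QUEUE_SLOTS) -> List[Dict]:
    """Single shared queue: take the first `slots` pending jobs, whatever their app."""
    return pending[:slots]


def fill_queue_split(pending: List[Dict], apps: List[str],
                     slots: int = QUEUE_SLOTS) -> List[Dict]:
    """Reserve an equal share of the slots for each application name in `apps`."""
    share = slots // len(apps)
    queue: List[Dict] = []
    for app in apps:
        queue.extend([job for job in pending if job["app"] == app][:share])
    return queue


def find_work(queue: List[Dict], max_size: int) -> Optional[Dict]:
    """Scan every slot and return the first job small enough for the host, if any."""
    for job in queue:
        if job["size"] <= max_size:
            return job
    return None
```

With 1,500 large jobs ahead of 500 small ones, `fill_queue` yields a queue of large jobs only, so a host that can only take small jobs finds nothing. `fill_queue_split(pending, ["large", "small"])` keeps 500 slots for each, so the same host gets work.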
9)
Message boards :
Number crunching :
Miscellaneous Work Unit Errors
(Message 12797)
Posted 29 Mar 2006 by Rom Walton (BOINC) Post: Report all Work Unit errors on this thread that are NOT - That error code usually means the machine ran out of memory during the execution of the workunit. Since you only have 512MB of RAM and one instance of Rosetta can use up to 250MB of RAM, I would recommend turning off HT. |
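The reasoning behind that recommendation can be checked with a quick calculation. This sketch uses the two figures from the post (512MB total, up to 250MB per instance) and assumes that with HT enabled the machine runs two Rosetta instances at once. The function name `headroom_mb` is hypothetical.

```python
# Back-of-the-envelope check using the figures from the post.
TOTAL_RAM_MB = 512
ROSETTA_PEAK_MB = 250  # upper bound for one instance, per the post


def headroom_mb(instances: int, total_mb: int = TOTAL_RAM_MB,
                per_instance_mb: int = ROSETTA_PEAK_MB) -> int:
    """RAM left for the OS and other programs when every instance hits its peak."""
    return total_mb - instances * per_instance_mb
```

Under that assumption, two instances leave `headroom_mb(2)` = 12MB for everything else, while a single instance leaves `headroom_mb(1)` = 262MB.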
10)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12747)
Posted 28 Mar 2006 by Rom Walton (BOINC) Post: Contact me offline and I'll let you know where to send it. ----- Rom |
11)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12741)
Posted 28 Mar 2006 by Rom Walton (BOINC) Post: What is the size of your BOINC directory? How many days' worth of workunits do you have? Which projects are attached? Would you be willing to make a copy of the directory, abort all of the workunits in the copy except the one that is stalling, zip everything up, and send it to me? |
12)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12478)
Posted 22 Mar 2006 by Rom Walton (BOINC) Post: A new version of Rosetta has been posted in the RALPH@Home project. Release Notes For those who are so inclined, please help us track down the issue by running RALPH@Home and if/when you find a workunit with the '1% bug' feel free to abort it and call it out in this thread. Thanks in advance for any help you can provide. ----- Rom |
13)
Message boards :
Number crunching :
Miscellaneous Work Unit Errors
(Message 12304)
Posted 20 Mar 2006 by Rom Walton (BOINC) Post: Ummmmm, that is the bug, or rather the manifestation of the bug that you are seeing. I wish I had a good answer for ya, all I can say is this issue will become a thing of the past over the next day or two. |
14)
Message boards :
Number crunching :
Miscellaneous Work Unit Errors
(Message 12277)
Posted 19 Mar 2006 by Rom Walton (BOINC) Post: Ummmmm, that is the bug, or rather the manifestation of the bug that you are seeing. The bug I fixed is the stay in memory or crash bug. ----- Rom |
15)
Message boards :
Number crunching :
Miscellaneous Work Unit Errors
(Message 12245)
Posted 19 Mar 2006 by Rom Walton (BOINC) Post: David Baker asked me to post a more detailed write-up on what we have been able to track down thus far. I have posted the additional information to my blog since it can handle tables. Think of it as a bird's-eye view of the project. ----- Rom |
16)
Message boards :
Number crunching :
Report stuck & aborted WU here please
(Message 12230)
Posted 19 Mar 2006 by Rom Walton (BOINC) Post: |
17)
Message boards :
Number crunching :
Report stuck & aborted WU here please
(Message 12228)
Posted 19 Mar 2006 by Rom Walton (BOINC) Post: |
18)
Message boards :
Number crunching :
Report stuck & aborted WU here please
(Message 12225)
Posted 18 Mar 2006 by Rom Walton (BOINC) Post:
The ERR_NESTED_UNHANDLED_EXCEPTION_DETECTED no longer appears on the list at all, and the 0xC0000005 only accounts for 6 of the 49 errors reported in the last 24 hours. If the data from Ralph is any indication of how the application is going to behave on the public project, it should result in a 60%-70% reduction in error rate for the public project. ----- Rom |
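For reference, the share of errors quoted in this post works out as below. This is just the arithmetic on the two counts given (6 and 49). The helper name `share_percent` is hypothetical, and the sketch does not attempt to reproduce the 60%-70% estimate, which depends on data not shown here.

```python
def share_percent(count: int, total: int) -> float:
    """Percentage of `total` accounted for by `count`, rounded to one decimal place."""
    return round(100 * count / total, 1)


# 0xC0000005 errors as a share of all errors reported in the last 24 hours
access_violation_share = share_percent(6, 49)
```

So the 0xC0000005 errors make up roughly 12% of the errors reported on Ralph in that window.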
©2024 University of Washington
https://www.bakerlab.org