Message boards : Number crunching : Minirosetta v1.32 bug thread
Author | Message |
---|---|
mitrichr Send message Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
Greg- Surely, I cannot answer your question. But the fact is, my experience is typical. People are experiencing this problem on some machines and not on others, so it seems to me that the problem lies with the project. It behooves the project managers to get this figured out. Rosetta is losing crunchers every day. This is not just a WU freezing or anything like that. When the problem occurs, it freezes the whole computer, necessitating a trip to Task Manager to clear out the process. >>RSM so how is it that my single machine with 2 cores and some corsair memory runs just fine and your multitude of machines does not? http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
There are different versions of the programs for different kinds of machines. That's why it's useful to describe what type of machine you have. For example, if the problem affected only machines running 64-bit operating systems, or only machines running 64-bit Windows Vista SP1, they could find which types of machines are affected faster if the people with problems mentioned what operating system they were running. Also, I've heard of problems affecting only machines running certain versions of BOINC, in which case it would be useful for the people with the problems to mention which version they were running. By the way, what's corsair memory - a brand name without the usual capital letter? so how is it that my single machine with 2 cores and some corsair memory runs just fine and your multitude of machines does not? |
Keith T. Send message Joined: 1 Mar 07 Posts: 58 Credit: 34,135 RAC: 0 |
How far past the required runtime does a task need to go before it gets stopped by the watchdog? I did not abort the task; I suspended all other tasks and then ran it again, deliberately stopping it 5 times before a checkpoint. I also monitored the graphics and confirmed that the task was restarting from model 0 at each restart. The task crashed after the 5th restart. https://boinc.bakerlab.org/rosetta/result.php?resultid=187758720 Here are the details as that url will probably be gone in a few days:
I hope that I may get some credit retrospectively for my > 12 hours of CPU time. Keith. [edit] BTW the "wingman" who also used an AMD processor, got a "success". https://boinc.bakerlab.org/rosetta/workunit.php?wuid=171490091
[/edit] |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
With all due respect, this business of checking individual computer symptoms to root out this problem is just so much ca-ca. mitrichr, I don't believe anyone intended to imply that your machine is a suspect. To use the crime analogy, it is not the "suspect", it is the "victim". And when the police investigate crimes, they ask the victims a lot of questions. 100,000 people carried out their day yesterday and were not mugged. This certainly doesn't mean that the person who was mugged did anything wrong. But knowing the details, and understanding the circumstances, can help prevent it from happening in the future. There are new mini releases being tested now on Ralph. But I am not certain how many of the reported issues have been addressed. Rosetta Moderator: Mod.Sense |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
This comment is not helpful. Different platforms/OSs run a different application. The WUs that fail on one platform with one application run happily on another platform running another application. This indicates that the WUs themselves are mostly fine, but there's an issue somewhere with the application on that platform/OS - specifically the mini-rosetta one rather than the Rosetta beta. Detailing the specific WUs that fall over and comparing them to the WUs that succeed may allow the coders to pinpoint what the one WU specifically asks for that the other one doesn't. This is an entirely normal process for bug/beta-testing, which I've been involved in to a greater or lesser degree since the early 90s on other platforms. Impatience doesn't help. If the problem was easy to solve it wouldn't have gone wrong in the first place - or it may just be a simple oversight that'll get cured in the next release. For all the complaints, 235000 (mostly mini-WUs?) completed successfully on Rosetta in the last 24 hours. That being the case it's not unfair at all to look into potential software/machine conflicts with the rare individual machines having an issue. If that weren't the case, this thread would have a thousand different people posting. And they're not, are they. There's about ten. So I'm grateful I'm getting any attention at all tbh. Thanks again to Mod.Sense for a good post. With all due respect, this business of checking individual computer symptoms to root out this problem is just so much ca-ca. |
mitrichr Send message Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
Sid- Please know that I mean no disrespect. My only concern is to be able to run Rosetta on all four of my machines as I did previously. Right now, I have one XP which has had nary a problem and continues to crunch whatever comes from Rosetta. I have an almost identical XP machine and two Core-2-Duos that are off the project. I think that Rosetta might be the single most important project at BOINC. But I cannot leave machines on the project when WUs totally freeze the computer and no other crunching on other projects or other work that I do can go forward. So, I am going to bow out of the debate. I hope that I will see notice of resolution in the RSS feed so that I can continue work which I deem very valuable. http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
mitrichr - only 2 of your computers have returned results. The others were detached or never replied. Based on that information it is kind of hard to see what the problems were with the other tasks that did not complete. I hope you are running the last of the Rosetta tasks so that the team and others can see what, if any, errors come up. One of your tasks from computer 2 got sent to 2 other users, 1 of which had a file transfer error; the other was running Linux on a 2.66 GHz machine and completed the task ok. Another user completed the other task that did not report from computer 2 and was running a 1.86 GHz dual core on Vista SP1. Computer 4's one task got sent to an Intel Xeon machine running MS Server 2003, which completed the task ok. That is 3 different machines with 3 different OS packages that completed ok, of which 2 machines were older than yours. --------- Keith T - the task you linked to got stopped too many times and BOINC Manager terminated it on your machine due to that. It does not matter if you stopped it or it restarted itself due to machine reboot or otherwise, a stoppage is a stoppage. |
mitrichr Send message Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
Greg- I was forced to detach on three computers, and did so probably about 7-10 days ago. I just could not keep minding Rosetta when I have work to do, stuff to read, or I am out cycling or hiking. That is a burden I can not assume. I have a bunch of projects on which I work, I have a job, etc. I am surprised you are seeing anything current on two machines, except of course that I did not detach as much as 30 days ago. >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Well, good luck with your other projects. I see you're also on my second project. Greg- |
mitrichr Send message Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
[quote]Well, good luck with your other projects. I see you're also on my second project.[/quote] Sorry, I do not understand, what do you mean "...my second project...."? >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
Sid- No offence taken at all and I agree with everything you say - I run no other project. This is frustration coming out and if you were to look at my recent results I have nearly as much reason to be as frustrated as you (90% failure rate in the last couple of days after as much as 2 hours runtime). I could do with a few 5.98 WUs to go at too. Can someone point me to how I can reduce the WU runtimes to 2 hours or less? I'm hoping to get a few more completed before they crash out. |
Speedy Send message Joined: 25 Sep 05 Posts: 163 Credit: 808,337 RAC: 1 |
Log into your account, click on Rosetta@home preferences, and you can change your work unit runtime in there. The way I understand it, the shorter the runtime, the more bandwidth you use up. Speedy Have a crunching good day!! |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
[quote]Well, good luck with your other projects. I see you're also on my second project.[/quote] I meant to say the second BOINC project I am part of, Einstein@Home |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Maybe you should point out to the group that your failures are related to this error: Can't acquire lockfile - exiting Can someone explain this message? Sid- |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
Can someone point me to how I can reduce the WU runtimes to 2 hours or less? I'm hoping to get a few more completed before they crash out. I was reading that in the FAQ last night and just went blind when it came to finding it. Just looked again now and it's obvious. Got there in the end - thanks. Maybe you should point out to the group that your failures are related to this error: Can't acquire lockfile - exiting I thought I did - in msgs: 55318, 55323, 55343 and 55436. Peculiar thing is, as soon as I had a little moan about most WUs falling over I just had a great little run of successes, including one in excess of 3 hours. Go figure... |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
I bought a new laptop a few days ago. I'm running Vista (x64) and I seem to be getting a lot of errors. This error has shown up a couple of times (can't acquire lock file). There are also 4 or 5 of these Error code:200. Overlooked this. An Intel Core2 Duo running Vista64 crashing out with too many exits after the same "Can't acquire lockfile - exiting" as I get on my AMD Quad Core Phenom running Vista64 - both with BOINC 6.2.18 and presumably the x64 version. Vista64 is the common factor again. |
mitrichr Send message Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
I forgot to note- Sid said earlier that there are only about 10 people complaining among all the folks running Rosetta. Those 10 people may represent 10,000 who are having trouble, getting disillusioned and detaching. Not everyone with a problem comes here. Also, seeing all of the technical terminology, many might be put off. >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
PJCN88 Send message Joined: 11 Aug 07 Posts: 2 Credit: 149,276 RAC: 0 |
I will also stop with Rosetta for the moment. Over the weekend I reinstalled BOINC on my computer to change the data directory, and from that point on I have had trouble with Minirosetta: either I get computation errors, or I have to abort because it says running but nothing is happening. System info: 09/09/08 20:18:35||Starting BOINC client version 6.2.18 for windows_x86_64 09/09/08 20:18:35||log flags: task, file_xfer, sched_ops 09/09/08 20:18:35||Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3 09/09/08 20:18:35||Running as a daemon 09/09/08 20:18:35||Data directory: D:boincdata 09/09/08 20:18:35||Running under account boinc_master 09/09/08 20:18:35||Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz [Intel64 Family 6 Model 23 Stepping 6] 09/09/08 20:18:35||Processor features: fpu tsc pae nx sse sse2 pni 09/09/08 20:18:35||OS: Microsoft Windows Vista: Ultimate x64 Editon, Service Pack 1, (06.00.6001.00) 09/09/08 20:18:35||Memory: 4.00 GB physical, 8.17 GB virtual 09/09/08 20:18:35||Disk: 368.10 GB total, 352.50 GB free 09/09/08 20:18:35||Local time is UTC +2 hours 09/09/08 20:18:35||No coprocessors Patrick PS: no problem with the 5.98 rosetta beta |
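
Several posts in this thread ask people to state their operating system and BOINC version when reporting a failure. As a rough illustration only, the sketch below pulls those fields out of startup messages like the ones in the last post. The `timestamp||message` layout is assumed purely from that sample, and `summarize_startup_log` is a hypothetical helper, not part of BOINC or Rosetta.

```python
import re

# Sample lines copied from the post above. The "timestamp||message" layout is
# an assumption based on this sample alone, not on any BOINC specification.
SAMPLE_LOG = """\
09/09/08 20:18:35||Starting BOINC client version 6.2.18 for windows_x86_64
09/09/08 20:18:35||Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz [Intel64 Family 6 Model 23 Stepping 6]
09/09/08 20:18:35||OS: Microsoft Windows Vista: Ultimate x64 Editon, Service Pack 1, (06.00.6001.00)
09/09/08 20:18:35||Memory: 4.00 GB physical, 8.17 GB virtual
"""


def summarize_startup_log(text):
    """Hypothetical helper: collect the fields most useful in a bug report."""
    summary = {}
    for line in text.splitlines():
        # Drop the timestamp; ignore lines that lack the "||" separator.
        _, sep, message = line.partition("||")
        if not sep:
            continue
        match = re.match(r"Starting BOINC client version (\S+) for (\S+)", message)
        if match:
            summary["boinc_version"], summary["platform"] = match.groups()
            continue
        # Remaining lines of interest look like "Key: value".
        key, colon, value = message.partition(": ")
        if colon and key in ("Processor", "OS", "Memory"):
            summary[key.lower()] = value.strip()
    return summary


if __name__ == "__main__":
    for field, value in summarize_startup_log(SAMPLE_LOG).items():
        print(f"{field}: {value}")
```

Pasting the resulting few lines alongside a failed task link would give the same information the moderators and other posters were asking for, without the full message log.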
©2024 University of Washington
https://www.bakerlab.org