Message boards : Number crunching : Problems with Rosetta stable version 5.69 and beta version 5.77
Author | Message |
---|---|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Something goes wrong with 5.77 on my machine. It gets down to saying 00:09:57 and then stays there. So for the second time I am about to abort a task. Suspect this old PC just isn't capable or something. Been chugging along with RAH for over a year, I guess, but maybe it's time to quit. Jerry, have you ever tried just letting them run? They are probably doing just fine. Rosetta has no way to know ahead of time exactly how long it will take to crunch a given model, so the estimate is... well... just an estimate. Once the estimate gets to within 10 minutes of your target runtime, it reduces the time remaining by exponentially smaller steps, the idea being that the time to completion will generally still be going down. The watchdog is always there looking over your tasks, prepared to abort them if it deems necessary. Rosetta Moderator: Mod.Sense |
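Editor's illustration: the post above does not give the actual formula, so the following is only a minimal sketch of how a "time remaining" display could count down normally and then shrink by ever smaller steps once it is within 10 minutes of the target runtime. The function name `displayed_remaining` and the `decay_seconds` parameter are hypothetical and are not taken from Rosetta or BOINC.

```python
import math

TAIL_SECONDS = 600.0  # the 10-minute threshold mentioned in the post


def displayed_remaining(elapsed, target_runtime, decay_seconds=3600.0):
    """Hypothetical 'time remaining' estimate, in seconds.

    Far from the target runtime, the estimate is simply target minus elapsed.
    Inside the last 10 minutes it decays exponentially instead, so it keeps
    going down, but more and more slowly, and never reaches zero by itself.
    decay_seconds is an assumed tuning constant, not a real setting.
    """
    linear = target_runtime - elapsed
    if linear > TAIL_SECONDS:
        return linear
    # Seconds spent since the estimate first reached the 10-minute mark.
    overrun = elapsed - (target_runtime - TAIL_SECONDS)
    return TAIL_SECONDS * math.exp(-overrun / decay_seconds)
```

Under these assumptions, a task with a 3-hour target shows roughly ten minutes remaining from the 2 h 50 min mark onward and only creeps downward after that, which would look like a stuck timer even though the task is still running.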
Warren B. Rogers Joined: 3 Oct 05 Posts: 5 Credit: 1,127,824 RAC: 0 |
Something goes wrong with 5.77 on my machine. It gets down to saying 00:09:57 and then stays there. So for the second time I am about to abort a task. Suspect this old PC just isn't capable or something. Been chugging along with RAH for over a year, I guess, but maybe it's time to quit. Good day all. I've also noticed this problem with version 5.77, but it doesn't happen all of the time, only occasionally. When I notice that this is happening I usually suspend that WU and let something else run. The one time I just let it run, it sat at 09:57 for about 3 hours and the countdown timer was not moving. Also, I saw the rate of progress drop to a thousandth of a percent per second, which was considerably slower than it was for the first 96% of the WU. I didn't want to abort the WU and just hoped that if it did something else for a while and worked its way back to the WU it would finish it properly. Well, it did take about 1 hour to finish, but that was much better than the rate it was moving at. I hope my experience is helpful. Warren Rogers |
Winkle Joined: 22 May 06 Posts: 88 Credit: 1,354,930 RAC: 0 |
Hi. On Beta 5.77, I suspended the Rosetta project through BOINC Manager and the project showed as suspended, but Task Manager still says Rosetta is running at approx. 90% CPU time. Is this a known bug? Ian |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Winkle, I hear of such things on Linux off and on, but I see all of your machines are Windows. Which one is seeing this occur? What BOINC version is installed there? Rosetta Moderator: Mod.Sense |
Warren B. Rogers Joined: 3 Oct 05 Posts: 5 Credit: 1,127,824 RAC: 0 |
Yes, it did take about 19K seconds to complete, and I had another one that took 20K to complete as well. The problem is that it takes about 1 1/2 to 2 hours to get to the 10-minute mark, then it sort of hangs there for about 4 hours unless I suspend the project, let something else run, and let the BOINC manager work its way back to the WU. Thanks, Warren |
DerAndreas Joined: 21 Jan 07 Posts: 2 Credit: 110,247 RAC: 0 |
Hello to all, Both of my machines have download problems. On the one running 5.69, this message is shown: |Sending scheduler request: Requested by user |Reporting 1 tasks |Scheduler RPC succeeded |Message from server: Server can't open log file (../log_boinc/cgi.log) |Deferring communication for 1 hr 0 min 0 sec |Reason: project is down On the other machine, which is running 5.77, there is this message: |[file_xfer] Started upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1iibA-_filters_1782_486910_0_0 |[error] Error on file upload: can't open log file |[file_xfer] Temporarily failed upload of CNTRL_01ABRELAX_SAVE_ALL_OUT_-1iibA-_filters_1782_486910_0_0: transient upload error |Backing off 5 min 38 sec on upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1iibA-_filters_1782_486910_0_0 Sometimes the second message appears on the first machine too. Both machines are running BOINC 5.10.20; the jobs and the errors were the same with my previous BOINC version, 5.8.15. What can I do? -- Greetings from Germany |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
Hello to all, Nothing but wait for now :) |
Winkle Joined: 22 May 06 Posts: 88 Credit: 1,354,930 RAC: 0 |
Sorry for the late reply. The BOINC version is 5.4.9. I just got back from Fiji, and left it running while I was away. All the other machines were fine, but on this one BOINC had locked up. I rebooted and had to kill the tasks, and now it operates normally. The machine number is 225833. Regards, Ian Winkle, I hear of such things on Linux off and on, but I see all of your machines are Windows. Which one is seeing this occur? What BOINC version is installed there? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Winkle, looks like you are running BOINC version 5.4.9 on that machine. Suggest you start by downloading and installing a more current BOINC version. Rosetta Moderator: Mod.Sense |
Winkle Joined: 22 May 06 Posts: 88 Credit: 1,354,930 RAC: 0 |
Thanks, will do. I hadn't realised it all had changed so much. The computers simply sit there and crunch away. Winkle, looks like you are running BOINC version 5.4.9 on that machine. Suggest you start by downloading and installing a more current BOINC version. |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
I don't know if this is a good place to post, but I see some admins posting here, so I hope to get some attention. Please read my post about the ABRELAX WUs; the topic title is "abbrelax", by the way ;) I have 6 failed WUs, most of them about a wrong sin/cos value, so it would be good if someone could have a look at it. Luuklag |
Mike Francis Joined: 24 Nov 05 Posts: 8 Credit: 623,519 RAC: 0 |
For quite a while I have had no problems with any of the work units I have been sent. Today, when I got home, there was one unit that had already been sent in and had a compute error; I don't know what caused it. While I was looking at it, one unit finished prematurely. The error message I got was: 10/3/2007 5:49:23 PM|rosetta@home|Reason: Unrecoverable error for result CNTRL_01ABRELAX_SAVE_ALL_OUT_-1a19A-_filters_1782_524938_0 ( - exit code -1073741819 (0xc0000005)) 10/3/2007 5:49:23 PM|rosetta@home|Computation for task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1a19A-_filters_1782_524938_0 finished Hope this is of use. Mike F. |
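Editor's illustration: the log above prints the same exit code twice, once as a signed decimal and once in hexadecimal. A small sketch of the conversion is below; the helper name `to_unsigned32` is made up for this example. On Windows, 0xC0000005 is the status code for an access violation, meaning the application touched memory it was not allowed to access.

```python
def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


# The signed value reported in the BOINC message log above.
code = -1073741819
print(hex(to_unsigned32(code)))  # prints 0xc0000005
```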
Jmarks Joined: 16 Jul 07 Posts: 132 Credit: 98,025 RAC: 0 |
I do not know if this post is needed, because I have a separate Number crunching post about it validating but not showing up on BOINCstats with the 2 other Rosetta WUs that my PC completed yesterday. Those were 5.80 WUs. Since this one was a 5.69 WU, maybe you need to look at it: 109142983. If you need to look at my full post, here is the link: Msg3638. Jmarks |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Problem with this one, got stuck. app 5.69. CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_490977_0_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=100428228 Pete. |
[XTBA>XTC] ZeuZ Joined: 4 Jun 06 Posts: 2 Credit: 19,725 RAC: 0 |
Hello everybody. Some crunchers seem to have a problem with the granted credit: the claimed credit is higher than the granted credit on a lot of WUs. We don't know what is happening, and it's a bit annoying. For example: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=99953073 https://boinc.bakerlab.org/rosetta/results.php?hostid=587047&offset=40 https://boinc.bakerlab.org/rosetta/results.php?hostid=603973 Why do these computers have that problem while the others don't? Thank you very much. ZeuZ @ L'Alliance Francophone - XTC Mini TEAM |
[AF>EDLS>Physique] Pas93 Joined: 28 Sep 05 Posts: 3 Credit: 1,436,260 RAC: 0 |
Yes, I have the same problem: https://boinc.bakerlab.org/rosetta/results.php?hostid=591113 :( |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I thought you guys would be happy with more credit. Rosie is rewarding your computer for finishing faster than the average of the other computers that have run this model. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I thought you guys would be happy with more credit. Actually, ZeuZ's point was that they are awarded LESS than claimed. But Greg is on to the right cause: your machine took longer to complete the task than the BOINC benchmarks would have predicted. The benchmarks don't require much memory or much L2 cache, and so they don't test enough to give a good prediction of how long a given machine will take to complete a given amount of work. The credit claimed is just based on your machine's benchmark rating for the time period it worked on the task. The credit granted is based on an average of the claims of others that have also worked on models from the same batch of work. Rosetta Moderator: Mod.Sense |
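Editor's illustration: a rough sketch of why granted credit can come out lower than claimed credit under the scheme described above. The exact formulas are not given in the thread, so both functions, their names, and the per-model averaging are assumptions made for this example only.

```python
def claimed_credit(benchmark_credit_per_hour, cpu_seconds):
    """Claim = benchmark-derived rate x CPU time spent (assumed form)."""
    return benchmark_credit_per_hour * cpu_seconds / 3600.0


def granted_credit(models_completed, other_claims_per_model):
    """Grant = models completed x average per-model claim from other hosts
    that worked on the same batch (assumed form)."""
    average = sum(other_claims_per_model) / len(other_claims_per_model)
    return models_completed * average


# Hypothetical host: benchmarks suggest 10 credits/hour and it runs 3 hours,
# but it is memory constrained and completes only 20 models in that time.
claim = claimed_credit(10.0, 10800)            # 30.0
grant = granted_credit(20, [0.5, 1.0, 1.5])    # 20.0
print(claim, grant)
```

In this made-up case the host claims 30 credits but is granted 20, because other hosts needed less benchmark-weighted time per model. A host that produces models faster than its benchmark predicts would see the opposite.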
[XTBA>XTC] ZeuZ Joined: 4 Jun 06 Posts: 2 Credit: 19,725 RAC: 0 |
Thank you for your replies, guys. So the benchmark is a bit falsified on some of our machines, OK, but there is a problem all the same: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=99953073 The benchmark seems to be right; a Core 2 Quad is a fast CPU, and 2 points for 11,000 seconds of calculation is very strange. I think it's a problem with some WUs, because the other WUs calculated are right: https://boinc.bakerlab.org/rosetta/results.php?hostid=573215&offset=40 |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Thank you for your replies guys David, Rhiju, check this out. The result shows two! completion sections. One shows 29 decoys and 10494 seconds of CPU, the other shows 30 decoys and 10861.7 seconds of CPU. ...and either one would normally have been granted more than 2 credits. It's almost like it completed the task once and then later ran another model on it. This is Windows XP Pro on an Intel Core 2 Quad. With only 1GB of memory for 4 CPUs, I would say this machine is probably memory constrained. From what I can tell, more of the machines reporting Linux problems are memory constrained as well. ZeuZ, I didn't mean to say any of the benchmarks were falsified (although some people do that, and that is a big part of why Rosetta made a more independent credit system). I only meant that the work measured in the benchmark is trivial (simple) when compared to running Rosetta. So it is possible for one machine to show benchmarks twice as high as another, and yet not get twice as much work done. Rosetta Moderator: Mod.Sense |
©2024 University of Washington
https://www.bakerlab.org