Message boards : Number crunching : Computational Error
Author | Message |
---|---|
Betting Slip Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
Rosetta returns a computational error when BOINC CC 4.45 runs its CPU benchmarks. This is probably due to Rosetta being removed from memory while BOINC runs the benchmarks. |
Ocean Archer Joined: 22 Sep 05 Posts: 32 Credit: 49,302 RAC: 0 |
This reinforces my comment that Rosetta does not like to share - regardless of the reason. LevelStake --- do you have the option to leave the program in memory selected? |
Betting Slip Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
"This reinforces my comment that Rosetta does not like to share - regardless of the reason. LevelStake --- do you have the option to leave the program in memory selected?" Yes, all the BOINC projects I run are set to stay in memory, but BOINC CC seems to throw them out when doing an automatic benchmark. |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
To make the system consistent for measurement purposes, you have to unload the system as much as possible. Thus, halting the science applications and removing them from memory. Now here is a question: what happens if BOINC is halted while the application is in memory? Does the work equally abend when restarted? Or is there some subtle difference in the way the application responds to the two unloads? |
Ocean Archer Joined: 22 Sep 05 Posts: 32 Credit: 49,302 RAC: 0 |
Paul -- I cannot speak for others, but on my small, old, slow machines running Windows ME or Windows 2000, I do not have the same problem with LHC, SETI, Einstein, Predictor or PrimeGrid. Since my machines are old and slow, one would think they would be the first to show problems; such is not the case. A couple of my machines have been upgraded to BOINC 5.x.y, so I cannot retest all machines with all combinations unless I trash the upgrade and go back to version 4.45. Excluding Climate Prediction (I don't run it), the bottom line is that Rosetta remains the only project that I cannot run in conjunction with other BOINC projects. As for stopping and restarting Rosetta, I have found on my machines that even when power to the computer is interrupted and the system is later restarted, the project WU does not fail, and the system simply picks up and continues to process ... (edited) |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Ocean Archer, The "unload"/Client error problem *IS* with the Rosetta science application, no question about that in anyone's mind. Why this is so is the burning question. And the answer is, right now, no one knows. If you rummage through my account you can see I have had one. But my machines are almost always higher end than most people's (I have not bought or run a system with less than 1G of RAM for several years). So, I am not finding fault, and I will be delighted if we can figure this one out. I know David Kim is looking into the problem ... |
Ocean Archer Joined: 22 Sep 05 Posts: 32 Credit: 49,302 RAC: 0 |
Paul and David -- I'm not the expert in this area, but I'm willing to break into any of my machines and configure them for testing purposes. How can I be of service?? |
Jord Joined: 16 Sep 05 Posts: 41 Credit: 204,120 RAC: 0 |
That science applications are unloaded from memory when making a benchmark is fixed in the next version of BOINC. Using 5.1.8, I haven't found any problems with Rosetta being in memory together with SETI, Einstein, SETI Beta, BOINC Alpha and two other projects, or with Rosetta being in memory at the time of the automated benchmark or a forced manual one. So in my opinion, if everyone may need to get 5.2.0 in about a week's time anyway, does time need to be spent on fixing it for older versions of BOINC? In that case, you'd best find the "leave in memory while benchmarking" fix in the CVS and build a couple of older BOINC versions with that fix. ;) |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
"That science applications are unloaded from memory when making a benchmark is fixed in the next version of BOINC." Can anyone else confirm this (just wondering if this holds for other platforms/Windows versions)? I guess people with limited memory may still want to have apps removed from memory when running multiple projects. I am wondering how the other projects dealt with this issue/bug. Does anyone know? |
Jord Joined: 16 Sep 05 Posts: 41 Credit: 204,120 RAC: 0 |
As far as I know it is for all platforms, David. But look: 08/10/2005 03:19:15||Suspending computation and network activity - running CPU benchmarks 08/10/2005 03:19:15|rosetta@home|Pausing result 1cfyA_abrelax_no_cst_10632_1 (left in memory) 08/10/2005 03:19:17||Running CPU benchmarks 08/10/2005 03:20:16||Benchmark results: 08/10/2005 03:20:16|| Number of CPUs: 1 08/10/2005 03:20:16|| 1217 double precision MIPS (Whetstone) per CPU 08/10/2005 03:20:16|| 2423 integer MIPS (Dhrystone) per CPU 08/10/2005 03:20:16||Finished CPU benchmarks 08/10/2005 03:20:17||Resuming computation and network activity 08/10/2005 03:20:17||request_reschedule_cpus: Resuming activities 08/10/2005 03:20:17|rosetta@home|Resuming result 1cfyA_abrelax_no_cst_10632_1 using rosetta version 477 (Link to the fix in CVS: here .. excuse me, was linking to the wrong fix first. Linking to the correct one of the 21st of September 2005 now.) |
Aurora Borealis Joined: 7 Oct 05 Posts: 15 Credit: 352,300 RAC: 0 |
I can't leave the project in memory because it causes other problems with Windows 98. The main difficulty is that you end up with several projects apparently running at the same time, and getting very long crunch time to nowhere. Any ideas!! Questions? Answers are in the BOINC Wiki. Boinc V6.12.41 Win 7 i5 GPU Nvidia 470 |
Webmaster Yoda Joined: 17 Sep 05 Posts: 161 Credit: 162,253 RAC: 0 |
I can't leave the project in memory because it causes other problems with Windows 98. Any ideas!! I had a look at the specs of your system. With only 223MB of RAM you will likely continue to have problems running Rosetta, at least until such time that the devs can sort this problem out. I'd suggest running something else for the time being, or running only Rosetta. *** Join BOINC@Australia today *** |
Solblekt Joined: 27 Sep 05 Posts: 8 Credit: 3,302 RAC: 0 |
I can report that on my two computers Rosetta crashes after about 80% is done. Before the 80% mark it swaps between projects without any problem. One computer has 256 MB, the other 512 MB. The crash occurs when there is a swap from Rosetta to any other project. I do not leave the projects in memory. I have noticed that Rosetta at this stage uses a lot of memory, over 160 MB. Could the problem have something to do with memory allocation? For a few seconds, the message in BOINC says that something went wrong in a calculation. I do not believe that it is a good idea to leave all projects in memory, not as long as it uses that much memory anyway. You tend to end up with a computer that spends its time paging instead. For now I have stopped crunching for Rosetta. I hope they will give us a hint on the front page when the problem has been solved. |
kb7rzf Joined: 7 Oct 05 Posts: 16 Credit: 35,427 RAC: 0 |
10/7/2005 8:51:28 PM|rosetta@home|Pausing result 1cfyA_abrelax_16893_0 (removed from memory) 10/7/2005 8:51:30 PM|rosetta@home|Unrecoverable error for result 1cfyA_abrelax_16893_0 ( - exit code -1073741819 (0xc0000005)) 10/7/2005 8:51:31 PM||request_reschedule_cpus: process exited 10/7/2005 8:51:31 PM|rosetta@home|Deferring communication with project for 1 minutes and 0 seconds 10/7/2005 8:51:31 PM|rosetta@home|Computation for result 1cfyA_abrelax_16893_0 finished I had one error come up with mine and, as Solblekt said, it happened after 80% was done. I currently have one WU for this project that is not going any higher: it's sitting at 83.33%, with the time to complete slowly going up and the percentage not moving. Jeremy |
Aurora Borealis Joined: 7 Oct 05 Posts: 15 Credit: 352,300 RAC: 0 |
I can't leave the project in memory because it causes other problems with Windows 98. Any ideas!! Thanks for the reply. I've temporarily suspended the other projects and set Rosetta to NO NEW WORK until I run through my current queue. Hopefully a solution will be found and I can reactivate this project in the future. Questions? Answers are in the BOINC Wiki. Boinc V6.12.41 Win 7 i5 GPU Nvidia 470 |
Webmaster Yoda Joined: 17 Sep 05 Posts: 161 Credit: 162,253 RAC: 0 |
For what it's worth, I just had one crash on me after 3.75 hours of crunching. It happened when BOINC did its automatic benchmarks, and it is the first one with this problem on this machine that I'm aware of. 3.4GHz Pentium 4 with Windows SBS 2003 Server, 1GB RAM, HT enabled with Rosetta on one logical CPU, CPDN on the other. 8/10/2005 11:08:04 PM 720 Suspending computation and network activity - running CPU benchmarks 8/10/2005 11:08:04 PM 721 Pausing result 3i29_000185019_0 (removed from memory) 8/10/2005 11:08:04 PM 722 Pausing result 1cfyA_abrelax_09628_0 (removed from memory) 8/10/2005 11:08:05 PM 723 Unrecoverable error for result 1cfyA_abrelax_09628_0 ( - exit code -1073741819 (0xc0000005)) No problem with the CPDN work unit, only Rosetta. However... At approximately the same time, my laptop (2.8GHz Mobile Pentium 4, Win XP Pro, 512MB RAM, no HT) and my other PC (overclocked Athlon XP 3000+, Win2K Pro, 512MB RAM) ran benchmarks as well, and had NO problem. So it seems the benchmarking issue is isolated to multiple-CPU machines, whether logical (HT) or physical (dual core/dual processors). *** Join BOINC@Australia today *** |
Betting Slip Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
"To make the system consistent for measurement purposes, you have to unload the system as much as possible. Thus, halting the science applications and removing them from memory." I have noticed that if you exit BOINC and reboot the computer, Rosetta carries on as normal. Rosetta only has problems during benchmarking, when it is removed from memory by the CC; if it is left in memory, there is no problem. |
Fuzzy Hollynoodles Joined: 7 Oct 05 Posts: 234 Credit: 15,020 RAC: 0 |
Another one: 10/8/2005 10:59:40 PM|rosetta@home|Unrecoverable error for result 1cfyA_abrelax_13371_1 ( - exit code -1073741819 (0xc0000005)) 10/8/2005 10:59:41 PM||request_reschedule_cpus: process exited 10/8/2005 10:59:41 PM|rosetta@home|Deferring communication with project for 1 minutes and 0 seconds 10/8/2005 10:59:41 PM|rosetta@home|Computation for result 1cfyA_abrelax_13371_1 finished It was this WU, and I can see it's getting ready to be sent again. I don't want it! Maybe it's just bad! "I'm trying to maintain a shred of dignity in this world." - Me |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Holly, No. The first death was related to a problem with Rosetta@Home and some versions of OS X. The person with that computer will never complete a work unit successfully. Your error is MOST LIKELY the problem that David Kim is tearing his hair out over. Rosetta@Home seems to have problems when it is removed from memory. There are variations, but they seem to center on the running of the benchmarks and on pausing with removal from memory. The solution for these is to let the applications stay in memory. If your computer is memory limited, you may want to put RAH on hold for a bit. Another issue has to do with checkpointing, and that is being looked into so that the work is saved more often. Last point: BOINC policy is to never send a failed work unit to the same account. Not just computer, account (unless they changed the rules on me behind my back, in which case Ingleside will usually yell at me ...). So, if it dies, you should never see it again. In contrast, classic SETI@Home never made these kinds of tests, so the overall science had the potential of being compromised. |
The Pirate Joined: 22 Sep 05 Posts: 20 Credit: 7,090,933 RAC: 0 |
|
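
Several of the logs in this thread report exit code -1073741819 (0xc0000005). The two numbers are the same 32-bit value, read once as signed and once as unsigned; on Windows, 0xC0000005 is the status code for an access violation. The following is a minimal Python sketch, assuming the pipe-separated message format quoted in the posts, that converts the code and lists results that failed after being removed from memory. The helper names `to_ntstatus` and `failures_after_unload` are hypothetical and are not part of BOINC.

```python
import re

# Log lines copied from the posts in this thread (pipe-separated client messages).
LOG = """\
10/7/2005 8:51:28 PM|rosetta@home|Pausing result 1cfyA_abrelax_16893_0 (removed from memory)
10/7/2005 8:51:30 PM|rosetta@home|Unrecoverable error for result 1cfyA_abrelax_16893_0 ( - exit code -1073741819 (0xc0000005))
08/10/2005 03:19:15|rosetta@home|Pausing result 1cfyA_abrelax_no_cst_10632_1 (left in memory)
08/10/2005 03:20:17|rosetta@home|Resuming result 1cfyA_abrelax_no_cst_10632_1 using rosetta version 477
"""

PAUSE = re.compile(r"Pausing result (\S+) \((removed from|left in) memory\)")
ERROR = re.compile(r"Unrecoverable error for result (\S+) \( - exit code (-?\d+)")


def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex value."""
    return "0x{:08x}".format(exit_code & 0xFFFFFFFF)


def failures_after_unload(log):
    """Return (result name, hex exit code) for results that errored
    after the client reported removing them from memory."""
    unloaded = set()
    failed = []
    for line in log.splitlines():
        pause = PAUSE.search(line)
        if pause:
            name, how = pause.groups()
            if how == "removed from":
                unloaded.add(name)
            else:
                unloaded.discard(name)
            continue
        error = ERROR.search(line)
        if error and error.group(1) in unloaded:
            failed.append((error.group(1), to_ntstatus(int(error.group(2)))))
    return failed


print(to_ntstatus(-1073741819))
print(failures_after_unload(LOG))
```

Run against a longer message log, a scan like this would only flag results paused with "removed from memory" and leave those marked "left in memory" alone, which matches the pattern reported in the posts.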
©2024 University of Washington
https://www.bakerlab.org