Message boards : Number crunching : Minirosetta v1.32 bug thread
Author | Message |
---|---|
mitrichr Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
The graphic is freezing, meaning, I assume, that the WU is a dead fish. {...} David- Yes, I know, that was what I had been doing and did not want to. >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
too many exit(0)s https://boinc.bakerlab.org/rosetta/result.php?resultid=188137329 https://boinc.bakerlab.org/rosetta/result.php?resultid=187374554 https://boinc.bakerlab.org/rosetta/result.php?resultid=187233250 Watchdog shutting down... https://boinc.bakerlab.org/rosetta/result.php?resultid=187377874 - exit code -1073741819 (0xc0000005) https://boinc.bakerlab.org/rosetta/result.php?resultid=185996255 |
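The two forms of the exit code quoted above are the same 32-bit value, once read as a signed integer and once as unsigned hex. The following is a minimal sketch of that conversion; the helper name `to_unsigned_hex` is hypothetical and not part of BOINC or Rosetta. On Windows, 0xC0000005 is the status code for an access violation.

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_unsigned_hex(-1073741819))  # prints 0xc0000005
```

This only shows that the decimal and hex values in the result page agree; it says nothing about what caused the crash.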
mitrichr Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
So, here is the deal: People are detaching from Rosetta, even though it may be the single most important project at BOINC. We have no choice but to endure these problems or detach, because this mini runs along with whatever else is going. Detach this buggy process from the rest of Rosetta and run it in Ralph@home. Get some volunteers to run it if you cannot find the problems in the lab, and let us get back to crunching for Rosetta. If in fact you get it to run properly, put the news in the RSS feed, so we know it is safe to go back in the water. >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
Joined: 7 Nov 05 Posts: 14 Credit: 936,419 RAC: 0 |
It seems that since minirosetta 1.32, I have not had a completed job on my AMD 3000+ machine, yet it is fine on my other two machines. All jobs now end, either slowly or quickly, with an "Output file absent" error; see the output messages below. 8/31/2008 12:58:36 PM|rosetta@home|Computation for task abinitio_homfrag_71_A_1l6pA_4443_21507_0 finished 8/31/2008 12:58:36 PM|rosetta@home|Output file abinitio_homfrag_71_A_1l6pA_4443_21507_0_0 for task abinitio_homfrag_71_A_1l6pA_4443_21507_0 absent 8/31/2008 1:00:37 PM|rosetta@home|Computation for task abinitio_homfrag_71_A_1zd0A_4443_21518_0 finished 8/31/2008 1:00:37 PM|rosetta@home|Output file abinitio_homfrag_71_A_1zd0A_4443_21518_0_0 for task abinitio_homfrag_71_A_1zd0A_4443_21518_0 absent This has been going on for a few weeks. I thought it was an irregularity and have put up with it, but now it is really getting to me. Is there any way to complete these jobs (and future jobs) on this machine, or might I just as well abandon R@H on this machine? Will work for bandwidth!! |
Terrasapiens Joined: 25 Apr 08 Posts: 15 Credit: 368,919 RAC: 0 |
Terrasapiens, I see you are running BOINC 6.2.18. Do you have any history running mini on older versions of BOINC? Are you using BOINC as your screensaver? I think the mini WUs ran fine about two versions prior to the current one, but I'm not sure exactly. I do know that the mini rosetta WUs didn't start failing until v1.28. I did turn off the BOINC screen saver a couple of weeks ago, but that made no difference in terms of failed WUs. |
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 826,597 RAC: 0 |
For people having trouble: this may help you out. Hope it helps. Cheers, Speedy. Have a crunching good day!! |
Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
too many exit(0)s https://boinc.bakerlab.org/rosetta/result.php?resultid=188137329 |
Sid Celery Joined: 11 Feb 08 Posts: 2474 Credit: 46,499,576 RAC: 3,223 |
What you don't show is my "Can't acquire lockfile - exiting" error. This seems to be the clue that screws up my WUs altogether then. Thanks Robert. It seems you run an AMD Dual Core with Vista 32-bit. too many exit(0)s Thanks (_KoDAk_). 1 of these is a 5.98 WU, one I can't see, but the other 3 (and the later 1) all show the same "Can't acquire lockfile - exiting" error that I get. I note you run an Intel Quad core with the 64-bit version of Windows Server Enterprise. Based on the poor evidence of just one example, I'd be inclined to look at an incompatibility between MiniRosetta 1.32 and any Windows 64-bit OS as a matter of urgency. Speedy: I appreciate the suggestion, but a bug thread on v1.32 isn't served by avoiding running it altogether (though it may well be better sorted out on Ralph, I agree). It's made worse at the moment by the fact that nearly all WUs are 1.32s and hardly any are 5.98s. Though, for what little it's worth, my failure rate has reduced from 79/151 (52%) to 50/115 (43%) since my last posts. |
Joined: 11 Aug 07 Posts: 49 Credit: 1,786,248 RAC: 0 |
I've been having the same screensaver/blackout problem that others have had. I don't know which WU it is, but it definitely is a Mini 1.32, as that's all my machine has right now (it's one of these two: abinitio_homfrag_71_A_2uzrA_4443_23491_0 ; abinitio_homfrag_71_A_2ib0A_4443_23507_0). I used to have BOINC be the Windows screensaver, then I changed it to an actual Windows one when the Mini started doing the black screen thing, then I later changed the screensaver to None when Mini was continuing to produce black screens. Sometimes using Ctrl+Alt+Del got me to where the taskbar would show, sometimes not; another trick I found out about is pressing the key for the Start button (my keyboard has it between the left Ctrl and Alt keys) would also allow me to see the taskbar. One of the open programs on the taskbar at that time is MiniRosetta, and I don't know if that is the screensaver that is trying to kick in. Sometimes the Mini screensaver kicks in way before the Windows screensaver is supposed to start (once the Mini one started after the computer was idle for a whopping 30 seconds!). I have not paid attention as to what the WU crunching is doing, if it's still running during the black screen or if it's frozen up. |
Joined: 11 Aug 07 Posts: 49 Credit: 1,786,248 RAC: 0 |
Update: It's also blacking out on this WU: abinitio_homfrag_71_A_2hh6A_4443_24307_0. It seems to be doing it at some of the checkpoints, but not all of them. What's more, the other WU that is running does show graphics okay but not this particular WU (it's just a black screen if it's able to show a separate window at all). And the main black screen issue doesn't care if you're doing anything else; even if you're watching a movie or moving your mouse or typing, it will supersede anything else on the screen. The Windows button has worked each time I've pressed it, a better success rate than Ctrl+Alt+Del. One additional thing of note: when exiting from the black screen and I see the taskbar, the icon for the Mini WU that's on the taskbar is that of "minirosetta_graphics_1.30_windows_intelx86.exe", not a BOINC or generic icon (it's a miniature version of Rosetta's logo). I hope this helps you fix this bug. It's sure bugging me! :) |
Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 1 |
To Mod, Why has no one from the team, or for that matter anyone from RAH in general, said anything about these 'needs psipred_ss2 to run filters' errors? This is what annoys a lot of people and causes them to leave: no news, no reply to posts about a common issue, no communication at all. Could you give us an update on this issue or see if the team has something they can say in general about this? Thanks |
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
Update: It's also blacking out on this WU: abinitio_homfrag Snap! With a variation on a theme. It's blacking out on abinitio_homfrag_71_A_2flsA_4443_19927_0 and the graphics window has to be shut via Task Manager |
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
The graphics have now come back online. It would appear that it was being upset by a USB drive that I had just installed. The old hard drive I had installed was still set up as a slave drive, which was causing the problems. |
mitrichr Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
I just re-attached a PIII and a Core 2 Duo, two machines which I had previously used to crunch for Rosetta. They both got WUs within minutes. After the WU started on each machine, I let it run as long as it stayed active. Within about seven minutes, each machine froze. So, I detached both again. >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
Keith T. Joined: 1 Mar 07 Posts: 58 Credit: 34,135 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=187758720 has been running on my AMD Athlon 2200+ since 28/08/08 08:45 (UTC+1). So far it has run for 11:33:44 and is currently "waiting to run". boinc_checkpoint_count.txt shows 97 boinc_init_count shows 1 87 The task appears to still be on the first decoy or model. I have changed my runtime prefs while this task has been running, to try to get it to finish and grant some credit. stderr.txt # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 28800 # cpu_run_time_pref: 28800 # cpu_run_time_pref: 28800 # cpu_run_time_pref: 28800 # cpu_run_time_pref: 7200 # cpu_run_time_pref: 7200 # cpu_run_time_pref: 7200 # cpu_run_time_pref: 7200 # cpu_run_time_pref: 7200 # cpu_run_time_pref: 7200 # cpu_run_time_pref: 3600 # cpu_run_time_pref: 3600 I think it may have got stuck in a loop and is repeating the first model, as when I look at the graphics, the task does seem to be making progress. How far past the required runtime does a task need to go before it gets stopped by the watchdog? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
How far past the required runtime does a task need to go before it gets stopped by the watchdog? The messages in stderr indicate that this task has been removed from memory and resumed many times. If 5 such restarts occur with no progress being made (i.e. no checkpoint saved), the task will be ended. Otherwise, if the task uses more than 4 times the runtime preference, the watchdog will end it. Since it is waiting to run, it probably has to begin running again for the watchdog to get any time. The watchdog checks on it every 15 minutes. So, I would have expected it to have ended on the first restart where your runtime preference was lowered to 2 hours. Since it did not end, I guess I would abort that one. 11.5 hours without completing a single model, and not responding to the watchdog, sounds like something may be wrong there. Rosetta Moderator: Mod.Sense |
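The two watchdog rules described in the post above can be sketched as follows. This is an illustration only, not Rosetta's actual watchdog code: the names `TaskState` and `watchdog_verdict` are hypothetical, and the thresholds (5 restarts without a checkpoint, 4 times the runtime preference, a check every 15 minutes) are taken from the moderator's description.

```python
from dataclasses import dataclass

# Thresholds as described in the post; the constant names are assumptions.
MAX_RESTARTS_WITHOUT_CHECKPOINT = 5
RUNTIME_MULTIPLIER = 4
CHECK_INTERVAL_SECONDS = 15 * 60  # the watchdog looks at the task this often


@dataclass
class TaskState:
    cpu_seconds: float              # CPU time used so far
    runtime_pref_seconds: float     # the user's cpu_run_time_pref
    restarts_since_checkpoint: int  # restarts with no new checkpoint saved


def watchdog_verdict(task: TaskState) -> str:
    """Return what a watchdog following the two described rules would decide."""
    if task.restarts_since_checkpoint >= MAX_RESTARTS_WITHOUT_CHECKPOINT:
        return "end: restarts without progress"
    if task.cpu_seconds > RUNTIME_MULTIPLIER * task.runtime_pref_seconds:
        return "end: exceeded runtime limit"
    return "keep running"


# The task from the question: 11:33:44 of CPU time, preference lowered to 2 hours.
used = 11 * 3600 + 33 * 60 + 44
print(watchdog_verdict(TaskState(used, 7200, 0)))
```

Under these assumptions the limit at a 2 hour preference is 8 hours of CPU time, which the task in question had already passed. That matches the moderator's expectation that it should have been ended once it resumed running.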
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
To Mod, Greg, I can't post information I do not have. The message does not seem to adversely affect the running of the tasks. I'm sure it will be addressed in a future release. Rest assured that the Project Team does review and react to the posts in these "problems with..." threads. Rosetta Moderator: Mod.Sense |
JordanWeber Joined: 24 Apr 08 Posts: 4 Credit: 716,009 RAC: 0 |
Still can't run mini on this 1 computer, but at least I get computation errors now on all the tasks, with a message, getting warmer :): 189123860 |
Phil Hirons, Jr. Joined: 5 Jan 06 Posts: 1 Credit: 44,233 RAC: 0 |
I've had 3 Mini WUs that go along fine until about 10 minutes are estimated to be left. They then take hours of CPU time to complete (>15) |
©2025 University of Washington
https://www.bakerlab.org