21)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 79349)
Posted 2 Jan 2016 by Snags Post: A few things to consider:

Where did you set your preferences? Changes made in the BOINC Manager will override any web-based settings.

Double check the wording. In my version of BOINC Manager a box must be checked to keep tasks running while the computer is in use, whereas you must select the "no" radio button to achieve the same thing using web-based prefs.

"What I'm puzzled about is that BOINC is starting new tasks when older ones still are Waiting to Run..."

This can happen if there isn't enough memory to continue running a particular task. BOINC will set that one aside and try another. Rosetta tasks are among the most memory-hungry tasks you will encounter in the BOINC world. So how much memory per core do you have and, more importantly, how much is BOINC allowed to use?

Could computer (not BOINC) sleep/hibernation settings be coming into play?

Best, Snags
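To put rough numbers on the memory question above, here is a minimal sketch. The function name and the per-task memory figure are assumptions made up for illustration, not values published by the project; substitute what your own tasks actually use.

```python
def tasks_that_fit(total_ram_mb, boinc_ram_percent, per_task_mb, cores):
    """Estimate how many tasks can run at once within BOINC's memory allowance.

    per_task_mb is a hypothetical figure; check the real memory use of
    your own tasks before relying on the result.
    """
    allowed_mb = total_ram_mb * boinc_ram_percent / 100
    by_memory = int(allowed_mb // per_task_mb)
    # Never more tasks than cores, never fewer than zero.
    return max(0, min(cores, by_memory))


if __name__ == "__main__":
    # Hypothetical machine: 4 GB RAM, 4 cores, BOINC allowed 50% of memory,
    # each task assumed to need about 700 MB.
    print(tasks_that_fit(4096, 50, 700, 4))
```

If the result is lower than the number of cores, some tasks would be left waiting while others run, which is consistent with the behaviour described in the post.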
22)
Message boards :
Number crunching :
Getting tired of this error
(Message 77860)
Posted 27 Jan 2015 by Snags Post: Could you clarify something? When you write that rosetta@home resets the project, do you mean that all the files get deleted without any manual intervention on your part? I had assumed you meant you had clicked the "reset project" command from within the BOINC Manager. If in fact the files are being deleted without your having clicked that button (or the button below labeled "remove") then I suggest you refocus your troubleshooting on the activities of your security software. The pattern in your tasks list (first successful completions followed by unsuccessful completions, all from tasks downloaded at the same time) supports the hypothesis that security software running on an automated schedule is deleting files. Timo also suggested that you check your timezone/date/time settings. I vaguely recall some old, possibly ancient issue with the Windows system clock which caused it to reset every day, causing (I believe) the checksum error. As I mentioned earlier, I don't run Windows machines and I have no idea if it is still an issue. Best, Snags
23)
Questions and Answers :
Macintosh :
Rosetta not running
(Message 77831)
Posted 16 Jan 2015 by Snags Post: Rosetta stopped running in my Boinc manager about one week ago. Hi Douggie, The project is definitely not on hiatus which I can confirm by checking my own computer (no break in activity), the server status box on the right side on the home page, or the message boards where no one else is reporting related complaints. A quick check of your tasks list shows that you have been assigned and presumably downloaded tasks in the last week although you haven't returned any since the 11th. Can you confirm that these tasks have been successfully downloaded to your computer? The other possibility is that you have inadvertently suspended calculation of rosetta tasks. In order to check either of these possibilities you will need to be looking at the advanced view of BOINC manager. From there you can see what tasks have been downloaded to your computer, whether or not they have been suspended, and examine the event log (found in the Advanced dropdown menu) for more clues. The update command doesn't start calculations on your machine; it initiates contact with the rosetta@home servers (the server's responses to that contact can be found in the event log). The graphics only appear when your computer is actively calculating rosetta tasks. Post back with some more information (or questions if you are still not sure where to find the information) and perhaps I or another volunteer can help get you crunching again. HTH, Snags |
24)
Message boards :
Number crunching :
Getting tired of this error
(Message 77820)
Posted 12 Jan 2015 by Snags Post: A spot check through your task lists shows other folks have successfully completed the workunits, which does suggest the problem originates at your end. I don't run Windows so I can't be of much help, but I did notice something while scrolling through the task list which may trigger a helpful tip from someone else.

A bunch of tasks were marked "client detached" on January 4th. I assume this is when you reset the project. Of the new tasks downloaded on the 4th, all were completed and returned successfully on the 6th and 7th of January. Of the new tasks assigned on the 6th, most were returned and completed successfully on the 7th and 8th. On the 9th only two workunits were returned successfully. The rest were completed successfully by your machine but were marked "client error" with this at the bottom of the stderr out:

======================================================
DONE :: 1460 starting structures 21586.7 cpu seconds
This process generated 1460 decoys from 1460 attempts
======================================================
BOINC :: WS_max 4.82271e+008
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>minirosetta_database_3d2618f.zip</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>
</message>

(These tasks were awarded credits, by the way, as you appear to have been able to send back usable information.)

Of the tasks assigned:
6 Jan 2015 7:41:50 UTC or earlier (after the reset on the 4th): completed and reported just fine.
6 Jan 2015 7:54:04 UTC: all completed models but reported with the signature verification error message.
7 Jan 2015 6:05:08 UTC: two were returned in the same fashion but the rest, and all subsequently received tasks, were reported with no work completed (zero CPU time used, no models run) along with the same error message.

Could this be triggered by firewall/security issues? HTH Snags
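If you want to scan many task reports for this kind of failure, the error block quoted above can be pulled apart with a few lines of Python. This is only a sketch: the function name is made up, and it assumes the `<file_xfer_error>` block always has the shape shown in the quoted stderr output.

```python
import re

# Sample text copied from the stderr output quoted in the post.
SAMPLE = """<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>minirosetta_database_3d2618f.zip</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>
</message>"""


def parse_xfer_errors(text):
    """Return one dict per <file_xfer_error> block found in text."""
    errors = []
    blocks = re.findall(r"<file_xfer_error>(.*?)</file_xfer_error>", text, re.S)
    for block in blocks:
        # Each field looks like <tag>value</tag>; collect them by tag name.
        fields = re.findall(r"<(\w+)>(.*?)</\1>", block, re.S)
        errors.append({tag: value.strip() for tag, value in fields})
    return errors


if __name__ == "__main__":
    for err in parse_xfer_errors(SAMPLE):
        print(err["file_name"], "|", err["error_code"])
```

Grouping the results by file name and date would show whether the same file keeps failing verification, which is the pattern the post describes.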
25)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 77499)
Posted 23 Sep 2014 by Snags Post: "Well, the changes I have made didn't fix the issue. I have reduced the number of processors available, increased the memory allocated to the BOINC software, and have increased the percentage of processor time to 75%. Still I get this:"

Look a bit closer at the link. The advice is to not use the BOINC throttling at all. In other words, you need to increase the "use at most % of CPU time" to 100. To forestall any heat issues this may create, you are then advised to reduce the % of processors BOINC is allowed to use to whatever number gives you the performance you are satisfied with.
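The trade-off between the two settings can be shown with simple arithmetic. The sketch below is an illustration only: the function name is hypothetical, and the rounding (whole cores, rounded down, at least one) is an assumption rather than a statement of how the BOINC client computes it.

```python
def effective_load(cores, cpu_time_percent, processors_percent):
    """Return (cores_in_use, total_load in core-equivalents).

    Assumes the processor percentage is rounded down to whole cores,
    with a minimum of one core.
    """
    cores_in_use = max(1, int(cores * processors_percent / 100))
    total_load = cores_in_use * cpu_time_percent / 100
    return cores_in_use, total_load


if __name__ == "__main__":
    # Throttled: all 4 cores, each run 75% of the time.
    print(effective_load(4, 75, 100))
    # Unthrottled: 3 of 4 cores, each run 100% of the time.
    print(effective_load(4, 100, 75))
```

Both configurations ask for the same three core-equivalents of work on a four-core machine, but only the second avoids the stop/start throttling that the advice says to turn off.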
26)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 77461)
Posted 15 Sep 2014 by Snags Post: Greetings, Copied and pasted from an earlier answer: On Rosetta this is usually solved by increasing the "use at most xxx% of CPU time" setting to 100. You may then want to reduce the "on multiprocessors, use at most xxx% of the processors" to something less than currently set. Most people find this handles the temperature regulation concerns (that the CPU throttling was designed to address) perfectly. Virus scanners are another possible cause; most folks exclude BOINC from those scans or set them to run only when BOINC isn't active. An explanation and more possible causes can be found here: BOINC FAQ Service. Please know that this only becomes a fatal error when it occurs 100 times to a particular task; at that point BOINC assumes the task will never be able to finish and gives up on it, ending it as a client error. If you see this message only occasionally it is safe to ignore it. Best, Snags
27)
Message boards :
Number crunching :
Minirosetta 3.50
(Message 76747)
Posted 18 May 2014 by Snags Post: i just started crunching a few days ago. completed one wu successfully with rosetta BOINC FAQ Service earlier post Hope this helps. Snags |
28)
Message boards :
Number crunching :
Current issues with 7+ boinc client
(Message 76452)
Posted 19 Feb 2014 by Snags Post: Not sure where to post this, but I hope you can help. Hi Sid, is the task still showing up in the transfers tab? When you tried aborting it, was that from the task tab or the transfers tab? As for the "some task is suspended via Manager" message I assume you double checked the resume/suspend button is showing as "suspend" for all the rosetta tasks and, after that, shut down and restarted BOINC to see if that would reset any errant instructions. I do have some vague memory of having to hunt down an orphaned task in a similar situation but I think that was a case of BOINC hanging on to a task it had in fact uploaded. It involved editing the state file though so hopefully your task won't require that. Best, Snags |
29)
Message boards :
Number crunching :
exited with zero status but no finished file.
(Message 76451)
Posted 19 Feb 2014 by Snags Post:

2/16/2014 12:19:52 PM | rosetta@home | Task 20140214_Exp131_119186_layer_sheet_3_surface_fold_156_0003_006_0001_S02_0016_fragments_fold_SAVE_ALL_OUT_142570_17_1 exited with zero status but no 'finished' file

The "exited with zero status but no 'finished' file" message occurs when some other task on your computer prevents the science app from communicating with BOINC. It is usually safe to ignore it, as it will have to happen 100 times to a task before the task will give up and error out. On the BOINC forum Jord (Ageless) makes the following suggestions:

Possible causes of the "Task exited with zero status but no 'finished' file" syndrome:
1. Make sure you exclude the BOINC directory and all subdirectories (or the BOINC Data directory and all subdirectories in BOINC 6 and 7) from being actively scanned by anti-virus and anti-spyware software. Only scan when you have exited BOINC.
2. Don't defrag your disk with BOINC on.
3. Don't run Scandisk with BOINC on.
4. Disable Drive Indexing.
5. Update your motherboard chipset drivers, specifically those for your IDE or SATA controllers.
6. Disable the Time synchronization in Windows XP/Vista. It is normally found under the clock (double click it in the system tray), third tab (Internet in English); uncheck the sync option.
7. When you use BOINC's CPU throttling function, you can run into the too many exit(0)s error. The advice here is to disable the BOINC throttling (set it to 100%) and reduce the number of CPUs/cores for BOINC to use.
** Use at most 100.0 percent of CPU time.
* In BOINC 7.0, this is done through the option On multiprocessors, use at most xxx% of the processors.
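To see how close any task is to the 100-occurrence limit mentioned above, you could count the messages in a saved copy of the event log. This is a rough sketch: it assumes the log lines look like the one quoted in the post, the function names are made up, and the task names in the usage example are placeholders.

```python
import re
from collections import Counter

# Matches the message as it appears in the event log line quoted above.
PATTERN = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")
FATAL_AFTER = 100  # the threshold cited in the post


def count_unfinished_exits(log_lines):
    """Count how many times each task logged the message."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def remaining_attempts(counts):
    """Map each task name to the occurrences left before the limit."""
    return {name: max(0, FATAL_AFTER - n) for name, n in counts.items()}


if __name__ == "__main__":
    # Placeholder log lines in the format shown in the post.
    log = [
        "2/16/2014 12:19:52 PM | rosetta@home | Task example_task_1 exited with zero status but no 'finished' file",
        "2/16/2014 12:19:53 PM | rosetta@home | Restarting task example_task_1",
        "2/16/2014 3:19:52 PM | rosetta@home | Task example_task_1 exited with zero status but no 'finished' file",
    ]
    print(remaining_attempts(count_unfinished_exits(log)))
```

A task with only a handful of occurrences is in the "safe to ignore" range; one creeping toward the limit is a sign that the checklist above is worth working through.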
30)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 76357)
Posted 17 Jan 2014 by Snags Post: "When this happens and the WU is restarted, does the computation begin anew for the WU, or does it pick up near the point where it exited?"

I see Danny has answered your question, so I'll just chime back in to say this doesn't cause a problem for rosetta@home; it just increases the computer cycles per workunit, causing a bit of inefficiency on your end. Eventually you will see a task error out when a model can't complete (after a hundred tries) but I doubt it will happen very often. What else changed around the time you updated BOINC? Maybe I'm fixated on the three hour interval, but it seems most likely to be caused by Windows or some software other than BOINC. As I don't run Windows I don't know how you can see what's happening every three hours. If no one here has a suggestion I would post on the BOINC message boards, where both BOINC and Windows gurus hang out, and see if they don't have some useful ideas. You might want to use "Task ... exited with zero status but no 'finished' file" as the message title, and be sure to give them the details of your troubleshooting efforts (Jord's checklist) in your first post. Good Luck. Let us know what you discover. Snags
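One way to check whether the message really recurs on a fixed schedule is to compute the gaps between its timestamps. The sketch below assumes timestamps in the "2/16/2014 12:19:52 PM" style seen in the BOINC event log; the function name and the sample times are invented for illustration.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # e.g. 2/16/2014 12:19:52 PM


def gaps_in_hours(timestamps, fmt=LOG_TIME_FORMAT):
    """Return the gaps, in hours, between consecutive sorted timestamps."""
    times = sorted(datetime.strptime(stamp, fmt) for stamp in timestamps)
    return [
        round((later - earlier).total_seconds() / 3600, 2)
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    # Hypothetical times at which the message appeared.
    seen = [
        "2/16/2014 12:19:52 PM",
        "2/16/2014 3:19:52 PM",
        "2/16/2014 6:19:52 PM",
    ]
    print(gaps_in_hours(seen))
```

Near-identical gaps point toward something scheduled (a scan, a backup, a sync job) rather than anything in the task itself, which is the kind of detail worth including when asking on the BOINC boards.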
31)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 76343)
Posted 12 Jan 2014 by Snags Post: "Please, restart Ralph server...."

Hi, Dave, boboviz's post wasn't in response to yours; he was trying to alert an admin that ralph@home was down. The "exited with zero status but no 'finished' file" message occurs when some other task on your computer prevents the science app from communicating with BOINC. It is usually safe to ignore it, as it will have to happen 100 times to a task before the task will give up and error out. Since it's happening to you at such regular intervals I suspect you recently set some scan to occur regularly in the background. On the BOINC forum Jord (Ageless) makes the following suggestions: Possible causes of the "Task exited with zero status but no 'finished' file" syndrome: This is obviously not a Rosetta-specific issue; it shows up on just about every project board at some time or another. Gary Roberts, the patient prince of einstein@home, explains what's happening in this post and the BOINC FAQ Service entry is here. Hope this helps. Snags
32)
Message boards :
Rosetta@home Science :
Principles for designing ideal protein structures published in the journal Nature
(Message 74260)
Posted 12 Nov 2012 by Snags Post: Thank you so much for posting here and coming back to answer our questions. In another thread someone asked what can be done to increase volunteer participation in rosetta@home. I think it's exactly this sort of information that can help. It's not just the announcement of papers published (which are difficult for most of us to read and understand) but a brief layman's explanation coupled with the sorts of details that help us place our contribution within the larger context. Best, Snags p.s. Please encourage your colleagues to post as well. They don't need to, in fact shouldn't, wait until they have a paper to publish to let us know what they/we are working on. |
33)
Message boards :
Number crunching :
Mini Rosetta Version 3.41.
(Message 74259)
Posted 12 Nov 2012 by Snags Post: More zdock problems:

Several ended quickly with client error/compute error:
2PCC_zdock_2PCC_cluster_selectcst_c.1.53_SAVE_ALL_OUT_63659_5
1YVB_zdock_1YVB_cluster_selectcst_c.0.77_SAVE_ALL_OUT_63621_6
1FLE_zdock_1FLE_cluster_selectcst_c.5.12_SAVE_ALL_OUT_63540_7
These ended with exit status -177, maximum disk usage exceeded, a long stderr out and "SIGPIPE: write on a pipe with no reader". My wingman on the second task received exit status 196 on a Windows machine.

1WEJ_zdock_1WEJ_cluster_selectcst_c.7.6_SAVE_ALL_OUT_63679_7
Both copies ended with "process exited with code 1" and:
ERROR: Cannot open PDB file "1WEJ_ppk_b_start.pdb"
ERROR:: Exit from: src/core/import_pose/import_pose.cc line: 198
BOINC:: Error reading and gzipping output datafile: default.out

Two more ended with validate errors and the odd, presumably tell-tale, 1201 cpu seconds:
2ABZ_zdock_2ABZ_cluster_selectcst_c.4.7_SAVE_ALL_OUT_63630_6
2H7V_zdock_2H7V_cluster_selectcst_c.16.0_SAVE_ALL_OUT_63641_5

Best, Snags
34)
Message boards :
Number crunching :
Mini Rosetta Version 3.41.
(Message 74222)
Posted 9 Nov 2012 by Snags Post: Hi. I think Polian is right and this is a problem for the project to solve. According to the BOINC FAQ Service it happens when "the amount of disk space that the task uses exceeds the amount of space specified in the <rsc_disk_bound>n</rsc_disk_bound> amount given to the task." Nothing has been run on ralph in a few weeks but if this is a simple typing error then it could have been caught by running a handful on an in-house computer before adding them to the rosetta queue. Perhaps this type of error doesn't happen frequently enough to warrant adding that step to the existing protocols. The tasks appear to error out almost immediately so they don't waste much of our time and the bulk of them have probably already made their way through the system. There will be a few stragglers showing up over the next couple of weeks (dependent on users' settings) but not enough to justify trying to preemptively delete the bad workunits. Best, Snags |
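The rule quoted from the BOINC FAQ Service can be pictured with a short sketch. This only mirrors the rule as stated in the post; it is not the BOINC client's code, and the workunit XML, the byte values and the function name are invented for illustration. Only the `<rsc_disk_bound>` element name comes from the quoted text.

```python
import xml.etree.ElementTree as ET

# Invented workunit fragment; only <rsc_disk_bound> comes from the quoted FAQ text.
EXAMPLE_WORKUNIT = """<workunit>
    <name>example_workunit</name>
    <rsc_disk_bound>100000000</rsc_disk_bound>
</workunit>"""


def exceeds_disk_bound(workunit_xml, bytes_used):
    """Return True if bytes_used is above the task's stated disk bound."""
    root = ET.fromstring(workunit_xml)
    bound = float(root.findtext("rsc_disk_bound"))
    return bytes_used > bound


if __name__ == "__main__":
    # A task writing 250 MB against a 100 MB bound would be over the limit.
    print(exceeds_disk_bound(EXAMPLE_WORKUNIT, 250_000_000))
```

If the bound were mistyped with a digit missing, every task in the batch would trip this check almost as soon as it started writing files, which would match the quick failures described above.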
35)
Message boards :
Number crunching :
exited with zero status but no 'finished' file
(Message 74197)
Posted 7 Nov 2012 by Snags Post: Most recently discussed here ? I assumed Mod.Sense suggested a new thread because svincent originally posted in the "Current issues with 7+ BOINC client" thread. I made a link to a different, slightly older thread (with the exact same title as this thread, "exited with zero status but no 'finished' file") simply because I didn't have time to summarize it. I will repost the link to the BOINC FAQ Service page which describes this long standing (since BOINC 5+) error message and the possible causes and solutions. If svincent or googloo (from the previous thread) still think it's related to the 7+ BOINC client then it would be most helpful if they post back detailing how they eliminated the other triggers. Best, Snags |
36)
Message boards :
Number crunching :
exited with zero status but no 'finished' file
(Message 74185)
Posted 6 Nov 2012 by Snags Post: Most recently discussed here Best, Snags |
37)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 74125)
Posted 29 Oct 2012 by Snags Post: hyb_ai_bench_4adyB_SAVE_ALL_OUT_IGNORE_THE_REST_58035_47

My Mac (BOINC 6.12.33) ended with Outcome: Success; Client state: Done; Exit status: 0(0x0) but the following in the stderr out:

BOINC:: CPU time: 36269.7s, 14400s + 21600s[2012-10-29 7:18:59:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001

The watchdog ended it and I received the default one model/20 credits. On my wingman's Windows machine the workunit ended with a client error within a few seconds of starting, though it should be noted that all 40 of his most recent tasks have failed, so his failure might not be related to the workunit. Best, Snags
38)
Message boards :
Number crunching :
exited with zero status but no 'finished' file
(Message 73870)
Posted 20 Sep 2012 by Snags Post: Seeing the same issue on a new Win7 machine that I cranked up with BOINC 7.0.28. Per your note, I've just switched from 75% CPU on four jobs (one per core) to 100% CPU, only two concurrent jobs. Waiting to see if that reduces the problem, and how the temperature settles out. I shouldn't think so. This error message was first added for BOINC 5. We saw quite a spate of posts about it a while ago, well before BOINC 7 was released. I haven't noticed any of the posts citing problems with BOINC 7 listing this as a symptom. ... Yes, but as Sid notes, most of the time it isn't worth fretting over: as a rare occurrence, the conflict would be difficult to track down and may be impossible to avoid. If it continues to happen frequently, click through to the BOINC FAQ Service and check out Jord's list of suggestions. The link in my previous post takes you straight to the relevant page. Best, Snags
39)
Message boards :
Number crunching :
exited with zero status but no 'finished' file
(Message 73818)
Posted 12 Sep 2012 by Snags Post: On Rosetta this is usually solved by increasing the "use at most xxx% of CPU time" setting to 100. You may then want to reduce the "on multiprocessors, use at most xxx% of the processors" to something less than currently set. Most people find this handles the temperature regulation concerns (that the CPU throttling was designed to address) perfectly. Virus scanners are another possible cause; most folks exclude BOINC from those scans or set them to run only when BOINC isn't active. An explanation and more possible causes can be found here: BOINC FAQ Service. Please know that this only becomes a fatal error when it occurs 100 times to a particular task; at that point BOINC assumes the task will never be able to finish and gives up on it, ending it as a client error. If you see this message only occasionally it is safe to ignore it. Best, Snags
40)
Message boards :
Number crunching :
Problems and Technical Issues with Rosetta@home
(Message 73538)
Posted 26 Jul 2012 by Snags Post: "I get an instant 'compute error' on all my work units for the last few days now. No problems with other projects from WCG."

There might be a simpler solution than downgrading. He's getting -185 errors with "couldn't start Input file minirosetta_3.31_windows_intelx86.exe missing or invalid: -123: -123". Perhaps simply rebooting the computer and/or possibly resetting rosetta will do the trick. Best, Snags
©2024 University of Washington
https://www.bakerlab.org