Posts by Snags

21) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 79349)
Posted 2 Jan 2016 by Snags
Post:
A few things to consider:

Where did you set your preferences? Changes made in the BOINC Manager will override any web-based settings.

Double check the wording. In my version of BOINC Manager a box must be checked to keep tasks running while the computer is in use, whereas with the web-based prefs you must select the “no” radio button to achieve the same thing.

What I'm puzzled about is that BOINC is starting new tasks when older ones still are Waiting to Run...

This can happen if there isn’t enough memory to continue running a particular task. BOINC will set that one aside and try another. Rosetta tasks are among the most memory hungry tasks you will encounter in the BOINC world. So how much memory per core do you have and, more importantly, how much is BOINC allowed to use?
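A rough way to get a feel for the memory question, as a sketch only. The machine figures below (8 GB of RAM, 50% allowed to BOINC, 4 tasks running) are made-up examples, not numbers from this thread, and the function name is hypothetical. BOINC does not actually split memory evenly between tasks; this just shows how quickly the per-task share shrinks.

```python
def memory_per_task_mb(total_ram_mb, allowed_pct, running_tasks):
    """Memory each task would get if BOINC's allowance were split evenly."""
    if running_tasks <= 0:
        raise ValueError("running_tasks must be positive")
    # Portion of RAM that the "use at most X% of memory" preference permits.
    allowed_mb = total_ram_mb * allowed_pct / 100.0
    return allowed_mb / running_tasks

# Hypothetical machine: 8 GB RAM, BOINC allowed 50%, 4 tasks at once.
print(memory_per_task_mb(8192, 50, 4))  # 1024.0
```

If the per-task share comes out small, either raising the memory percentage or running fewer tasks at once increases it.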

Could computer (not BOINC) sleep/hibernation settings be coming into play?

Best,
Snags
22) Message boards : Number crunching : Getting tired of this error (Message 77860)
Posted 27 Jan 2015 by Snags
Post:
Could you clarify something? When you write that rosetta@home resets the project do you mean that all the files get deleted without any manual intervention on your part? I had assumed you meant you had clicked the "reset project" command from within the BOINC Manager. If in fact the files are being deleted without your having clicked that button (or the button below labeled "remove") then I suggest you refocus your troubleshooting to the activities of your security software. The pattern in your tasks list (first successful completions followed by unsuccessful completions, all of tasks downloaded at the same time) supports the hypothesis that security software running on an automated schedule is deleting files.

Timo also suggested that you check your timezone/date/time settings. I vaguely recall some old, possibly ancient issue with the Windows system clock which caused it to reset every day causing (I believe) the checksum error. As I mentioned earlier, I don't run Windows machines and I have no idea if it is still an issue.

Best,
Snags
23) Questions and Answers : Macintosh : Rosetta not running (Message 77831)
Posted 16 Jan 2015 by Snags
Post:
Rosetta stopped running in my Boinc manager about one week ago.
The only other project I'm working on now is the malaria one. That's running OK.
I click update on Rosetta, it defers communication for about four minutes, but nothing loads in.
Show graphics is grayed out.
Is the project itself in hiatus, or just lost in limbo land?
Please advise

Hi Douggie,

The project is definitely not on hiatus, which I can confirm by checking my own computer (no break in activity), the server status box on the right side of the home page, or the message boards, where no one else is reporting related complaints.

A quick check of your tasks list shows that you have been assigned and presumably downloaded tasks in the last week although you haven't returned any since the 11th. Can you confirm that these tasks have been successfully downloaded to your computer?

The other possibility is that you have inadvertently suspended calculation of rosetta tasks.

In order to check either of these possibilities you will need to be looking at the advanced view of BOINC manager. From there you can see what tasks have been downloaded to your computer, whether or not they have been suspended, and examine the event log (found in the Advanced dropdown menu) for more clues.

The update command doesn't start calculations on your machine; it initiates contact with the rosetta@home servers (the server's responses to that contact can be found in the event log). The graphics only appear when your computer is actively calculating rosetta tasks. Post back with some more information (or questions if you are still not sure where to find the information) and perhaps I or another volunteer can help get you crunching again.


HTH,
Snags
24) Message boards : Number crunching : Getting tired of this error (Message 77820)
Posted 12 Jan 2015 by Snags
Post:
A spot check through your task lists shows other folks have successfully completed the workunits, which does suggest the problem originates at your end. I don't run Windows so I can't be of much help, but I did notice something while scrolling through the task list which may trigger a helpful tip from someone else.

A bunch of tasks were marked "client detached" on January 4th. I assume this is when you reset the project. Of the new tasks downloaded on the 4th all were completed and returned successfully on the 6th and 7th of January. Of the new tasks assigned on the 6th most were returned and completed successfully on the 7th and 8th. On the 9th only two workunits were returned successfully. The rest were completed successfully by your machine but were marked "client error" with this at the bottom of the stderr output:

======================================================
DONE :: 1460 starting structures 21586.7 cpu seconds
This process generated 1460 decoys from 1460 attempts
======================================================
BOINC :: WS_max 4.82271e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt><message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>minirosetta_database_3d2618f.zip</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

</message>

(These tasks were awarded credits, by the way, as you appear to have been able to send back usable information.)

Of the tasks assigned:
6 Jan 2015 7:41:50 UTC or earlier (after the reset on the 4th) completed and reported just fine.
6 Jan 2015 7:54:04 UTC all completed models but reported with the signature verification error message.
7 Jan 2015 6:05:08 UTC two were returned in the same fashion but the rest, and all subsequently received tasks were reported with no work completed (zero CPU time used, no models run) along with the same error message.


Could this be triggered by firewall/security issues?

HTH

Snags



25) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 77499)
Posted 23 Sep 2014 by Snags
Post:
Well, the changes I have made didn't fix the issue. I have reduced the number of processors available, increased the memory allocated to the BOINC software, and have increased the percentage of processor time to 75%. Still I get this:


And it goes on and on, pages of it. Any other suggestions folks?

Ron

Look a bit closer at the link. The advice is to not use the BOINC throttling at all. In other words, you need to increase the "use at most % of CPU time" to 100. To forestall any heat issues this may create, you are then advised to reduce the % of processors BOINC is allowed to use to whatever number gives you the performance you are satisfied with.
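To make the trade-off concrete, here is a small sketch of the arithmetic. It assumes, as a simplification, that heat roughly tracks the total amount of busy core time; the function name is hypothetical and not part of BOINC.

```python
import math

def equivalent_processor_pct(n_cores, cpu_time_pct):
    """Return (cores_to_use, processor_pct) that gives roughly the same total
    load as all n_cores throttled to cpu_time_pct, but with throttling off."""
    # Total load expressed as "fully busy cores".
    busy_cores = n_cores * cpu_time_pct / 100.0
    # Round down so the new setting never exceeds the old load; keep at least 1 core.
    cores_to_use = max(1, math.floor(busy_cores))
    processor_pct = 100.0 * cores_to_use / n_cores
    return cores_to_use, processor_pct

# Four cores throttled to 75% is about the same load as three cores at 100%.
print(equivalent_processor_pct(4, 75))  # (3, 75.0)
```

So on a four-core machine, "use at most 100% of CPU time" with 75% of the processors would be a reasonable starting point, to be adjusted after watching the temperatures.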
26) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 77461)
Posted 15 Sep 2014 by Snags
Post:
Greetings,

I am fairly new to rosetta (not new to BOINC) and have noted the error I am getting in several places in this thread, but no answers. Here is the error (not mine, copied from another user but it's the same error I am getting):

Task binding_helix_0472_disulf1_1_disulf2_2_disulf3_4_disulf4_2_disulf5_1_0004_abinitio_SAVE_ALL_OUT_73462_2878_0 exited with zero status but no 'finished' file
3/29/2013 8:25:05 AM | rosetta@home | If this happens repeatedly you may need to reset the project.

I have reset the project twice as the software suggests and this has not fixed the problem. I have also changed the WU swap time from 60 minutes to 24 hours, no effect, still getting the same error. I am getting credit for the CPU time, but that is not the reason I run BOINC, I am trying to help out. If my WU's are not producing then I need to find out how to start providing valid results. Any suggestions (nice ones please) would be appreciated.

Ron

Copied and pasted from an earlier answer:

On Rosetta this is usually solved by increasing the "use at most xxx% of CPU time" setting to 100. You may then want to reduce the "on multiprocessors, use at most xxx% of the processors" to something less than currently set. Most people find this handles the temperature regulation concerns (that the cpu throttling was designed to address) perfectly.

Another possible cause is virus scanners; most folks exclude BOINC from those scans or set them to run only when BOINC isn't active.

An explanation and more possible causes can be found here: BOINC FAQ Service

Please know that this only becomes a fatal error when it occurs 100 times to a particular task; at that point BOINC assumes the task will never be able to finish and gives up on it, ending it as a client error. If you see this message only occasionally it is safe to ignore it.


Best,
Snags

27) Message boards : Number crunching : Minirosetta 3.50 (Message 76747)
Posted 18 May 2014 by Snags
Post:
i just started crunching a few days ago. completed one wu successfully with rosetta, but got the following errors since:
/13/2014 3:20:27 AM | rosetta@home | Task aftimidv2_7_fold_SAVE_ALL_OUT_165014_1039_0 exited with zero status but no 'finished' file
5/13/2014 3:20:27 AM | rosetta@home | If this happens repeatedly you may need to reset the project.





All other projects are finishing ok.
Will try here again in a few weeks, and in the meantime look for answers in these message boards.


BOINC FAQ Service

earlier post

Hope this helps.
Snags
28) Message boards : Number crunching : Current issues with 7+ boinc client (Message 76452)
Posted 19 Feb 2014 by Snags
Post:
Not sure where to post this, but I hope you can help.

I just upgraded Boinc from v7.2.33 to v7.2.39 and just as it was installing a task was completing and managed to get itself stuck at the uploading stage, as shown in this murky image

Stuck Task

I've tried updating and even aborting it, but it doesn't seem to want to move on, so that instead of 8 tasks I'm only running 7.

[Edit: Oops, I am running 8 tasks, but the stuck task is still stuck]

Anyone have an idea how to fix this?


Hi Sid, is the task still showing up in the transfers tab? When you tried aborting it, was that from the task tab or the transfers tab?

As for the "some task is suspended via Manager" message I assume you double checked the resume/suspend button is showing as "suspend" for all the rosetta tasks and, after that, shut down and restarted BOINC to see if that would reset any errant instructions.

I do have some vague memory of having to hunt down an orphaned task in a similar situation but I think that was a case of BOINC hanging on to a task it had in fact uploaded. It involved editing the state file though so hopefully your task won't require that.

Best, Snags
29) Message boards : Number crunching : exited with zero status but no finished file. (Message 76451)
Posted 19 Feb 2014 by Snags
Post:
2/16/2014 12:19:52 PM | rosetta@home | Task 20140214_Exp131_119186_layer_sheet_3_surface_fold_156_0003_006_0001_S02_0016_fragments_fold_SAVE_ALL_OUT_142570_17_1 exited with zero status but no 'finished' file
2/16/2014 12:19:52 PM | rosetta@home | If this happens repeatedly you may need to reset the project.
2/16/2014 12:24:23 PM | rosetta@home | Task 20140214_Exp131_119186_layer_sheet_3_surface_fold_156_0003_006_0001_S02_0016_fragments_fold_SAVE_ALL_OUT_142570_17_1 exited with zero status but no 'finished' file
2/16/2014 12:24:23 PM | rosetta@home | If this happens repeatedly you may need to reset the project.
2/16/2014 12:31:11 PM | rosetta@home | Task 20140214_Exp131_119186_layer_sheet_3_surface_fold_156_0003_006_0001_S02_0016_fragments_fold_SAVE_ALL_OUT_142570_17_1 exited with zero status but no 'finished' file
2/16/2014 12:31:11 PM | rosetta@home | If this happens repeatedly you may need to reset the project.

I had been running Rosetta for months without issue on a Linux box but was forced to complete the work on that box and add Rosetta to the current box which is WIN7.

Not sure what is going on here.

Any ideas anyone.

[EDIT]
After resetting the project I have seen at least two more entries so I am not sure that a reset has accomplished anything.




The "exited with zero status but no 'finished' file" message occurs when some other task on your computer prevents the science app from communicating with BOINC. It is usually safe to ignore, as it will have to happen 100 times to a task before the task will give up and error out. On the BOINC forum Jord (Ageless) makes the following suggestions:

Possible causes of the "Task exited with zero status but no 'finished' file" syndrome:

1. Make sure you exclude the BOINC directory and all subdirectories (or the BOINC Data directory and all subdirectories in BOINC 6 and 7) from being actively scanned by anti-virus and anti-spyware software. Only scan when you have exited BOINC.

2. Don't defrag your disk with BOINC on.

3. Don't run Scandisk with BOINC on.

4. Disable Drive Indexing.

5. Update your motherboard chipset drivers, specifically those for your IDE or SATA controllers.

6. Disable the Time synchronization in Windows XP/Vista. Normally found under the clock (double click it in the system tray), third tab (Internet in English), uncheck the sync option.

7. When you use BOINC's CPU throttling function, you can run into the too many exit(0)s error. The advice here is to disable the BOINC throttling (set it to 100%) and reduce the amount of CPUs/cores for BOINC to use.
** Use at most 100.0 percent of CPU time.
* In BOINC 7.0, this is done through the option On multiprocessors, use at most xxx% of the processors.
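To judge whether a task is anywhere near the 100-occurrence limit mentioned above, the event log can be tallied per task. The following is a minimal sketch: the pattern is based only on the log lines quoted in this thread, the task name in the sample is a placeholder, and the function name is hypothetical. Keep in mind the event log may not go back far enough to show every occurrence.

```python
import re
from collections import Counter

# Pattern taken from the event-log lines quoted in this thread.
PATTERN = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")

def count_no_finished_file(log_lines):
    """Count 'no finished file' restarts per task name."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Placeholder lines in the same shape as the quoted log.
sample = [
    "2/16/2014 12:19:52 PM | rosetta@home | Task example_task_0 exited with zero status but no 'finished' file",
    "2/16/2014 12:19:52 PM | rosetta@home | If this happens repeatedly you may need to reset the project.",
    "2/16/2014 12:24:23 PM | rosetta@home | Task example_task_0 exited with zero status but no 'finished' file",
]
for task, n in count_no_finished_file(sample).items():
    print(task, n, "of 100 before the task errors out")
```

A handful of hits per task is the "safe to ignore" case; counts climbing steadily at regular intervals point to one of the causes in the list above.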
30) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 76357)
Posted 17 Jan 2014 by Snags
Post:
When this happens and the WU is restarted, does the computation begin anew for the WU, or does it pick up near the point where it exited?

Some types of task are able to checkpoint at certain intervals, and all tasks will checkpoint when a model is completed. A task (as you see them in BOINC Manager) can contain multiple decoys/models, so at the end of each of these there will effectively be a checkpoint.

So, if your computer reaches either a checkpoint, or completes a model and moves on to another one (still within the same task), then it will pick up from that point when Rosetta restarts. If it doesn't reach one of those points before being stopped then it will restart the task from the beginning.
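The restart rule described above can be sketched as follows. The checkpoint times are invented for illustration, completed models are treated as checkpoints as the explanation says, and the function name is hypothetical.

```python
def resume_point(checkpoint_times, stop_time):
    """Return the time a task resumes from: the latest checkpoint reached at or
    before stop_time, or 0 (the beginning) if none was reached."""
    reached = [t for t in checkpoint_times if t <= stop_time]
    return max(reached) if reached else 0

# Hypothetical task with checkpoints (including model completions) at 40 and 95 minutes.
checkpoints = [40, 95]
print(resume_point(checkpoints, 30))   # 0: stopped before any checkpoint, restarts from the beginning
print(resume_point(checkpoints, 100))  # 95: only the last 5 minutes are redone
```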


I see Danny has answered your question so I'll just chime back in to say this doesn't cause a problem for rosetta@home, it just increases the computer cycles per workunit, causing a bit of inefficiency on your end. Eventually you will see a task error out when a model can't complete (after a hundred tries) but I doubt it will happen very often.

What else changed around the time you updated BOINC? Maybe I'm fixated on the three hour interval, but it seems most likely to be caused by Windows or some software other than BOINC. As I don't run Windows I don't know how you can see what's happening every three hours. If no one here has a suggestion I would post on the BOINC message boards where both BOINC and Windows gurus hang out and see if they don't have some useful ideas. You might want to use "Task ... exited with zero status but no 'finished' file" as the message title and be sure and give them the details of your troubleshooting efforts (Jord's checklist) in your first post.

Good Luck. Let us know what you discover.

Snags
31) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 76343)
Posted 12 Jan 2014 by Snags
Post:
Please, restart Ralph server....


Issue remains. This particular WU has restarted 3 TIMES, at 3 hour intervals.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=572101126

If the above suggestion actually means resetting the project in the BOINC Mgr, that's not a solution, because:

1) It's been done and it didn't resolve anything, and
2) It addresses a symptom, not the cause

Is the cause bad WUs, or Rosetta 3.48? Also, what is the solution. Thanks


Hi, Dave, boboviz's post wasn't in response to yours; he was trying to alert an admin that ralph@home was down.

The "exited with zero status but no 'finished' file" message occurs when some other task on your computer prevents the science app from communicating with BOINC. It is usually safe to ignore, as it will have to happen 100 times to a task before the task will give up and error out. Since it's happening to you at such regular intervals I suspect you recently set some scan to occur regularly in the background. On the BOINC forum Jord (Ageless) makes the following suggestions:
Possible causes of the "Task exited with zero status but no 'finished' file" syndrome:

1. Make sure you exclude the BOINC directory and all subdirectories (or the BOINC Data directory and all subdirectories in BOINC 6 and 7) from being actively scanned by anti-virus and anti-spyware software. Only scan when you have exited BOINC.

2. Don't defrag your disk with BOINC on.

3. Don't run Scandisk with BOINC on.

4. Disable Drive Indexing.

5. Update your motherboard chipset drivers, specifically those for your IDE or SATA controllers.

6. Disable the Time synchronization in Windows XP/Vista. Normally found under the clock (double click it in the system tray), third tab (Internet in English), uncheck the sync option.

7. When you use BOINC's CPU throttling function, you can run into the too many exit(0)s error. The advice here is to disable the BOINC throttling (set it to 100%) and reduce the amount of CPUs/cores for BOINC to use.
** Use at most 100.0 percent of CPU time.
* In BOINC 7.0, this is done through the option On multiprocessors, use at most xxx% of the processors.


This is obviously not a Rosetta specific issue; it shows up on just about every project board at some time or another. Gary Roberts, the patient prince of einstein@home, explains what's happening in this post and the BOINC FAQ Service entry is here.

Hope this helps.

Snags
32) Message boards : Rosetta@home Science : Principles for designing ideal protein structures published in the journal Nature (Message 74260)
Posted 12 Nov 2012 by Snags
Post:
Thank you so much for posting here and coming back to answer our questions. In another thread someone asked what can be done to increase volunteer participation in rosetta@home. I think it's exactly this sort of information that can help. It's not just the announcement of papers published (which are difficult for most of us to read and understand) but a brief layman's explanation coupled with the sorts of details that help us place our contribution within the larger context.


Best,
Snags

p.s. Please encourage your colleagues to post as well. They don't need to, in fact shouldn't, wait until they have a paper to publish to let us know what they/we are working on.
33) Message boards : Number crunching : Mini Rosetta Version 3.41. (Message 74259)
Posted 12 Nov 2012 by Snags
Post:
More zdock problems:

Several ended quickly with client error/compute error

2PCC_zdock_2PCC_cluster_selectcst_c.1.53_SAVE_ALL_OUT_63659_5
1YVB_zdock_1YVB_cluster_selectcst_c.0.77_SAVE_ALL_OUT_63621_6
1FLE_zdock_1FLE_cluster_selectcst_c.5.12_SAVE_ALL_OUT_63540_7
Ended with exit status -177, maximum disk usage exceeded, a long stderr out and "SIGPIPE: write on a pipe with no reader". My wingman on the second task received exit status 196 on a Windows machine.

1WEJ_zdock_1WEJ_cluster_selectcst_c.7.6_SAVE_ALL_OUT_63679_7 both copies "process exited with code 1" and
ERROR: Cannot open PDB file "1WEJ_ppk_b_start.pdb"
ERROR:: Exit from: src/core/import_pose/import_pose.cc line: 198
BOINC:: Error reading and gzipping output datafile: default.out


Two more ended with validate errors and the odd, presumably tell-tale, 1201 cpu seconds

2ABZ_zdock_2ABZ_cluster_selectcst_c.4.7_SAVE_ALL_OUT_63630_6
2H7V_zdock_2H7V_cluster_selectcst_c.16.0_SAVE_ALL_OUT_63641_5

Best,
Snags




34) Message boards : Number crunching : Mini Rosetta Version 3.41. (Message 74222)
Posted 9 Nov 2012 by Snags
Post:
Hi.

I've increased Boinc disc limit to 10GB on both my rigs to see if that helps, as i've got a few of the zdock tasks in line to run on both rigs.




I think Polian is right and this is a problem for the project to solve. According to the BOINC FAQ Service it happens when "the amount of disk space that the task uses exceeds the amount of space specified in the <rsc_disk_bound>n</rsc_disk_bound> amount given to the task."

Nothing has been run on ralph in a few weeks but if this is a simple typing error then it could have been caught by running a handful on an in-house computer before adding them to the rosetta queue. Perhaps this type of error doesn't happen frequently enough to warrant adding that step to the existing protocols.

The tasks appear to error out almost immediately so they don't waste much of our time and the bulk of them have probably already made their way through the system. There will be a few stragglers showing up over the next couple of weeks (dependent on users' settings) but not enough to justify trying to preemptively delete the bad workunits.

Best,
Snags
35) Message boards : Number crunching : exited with zero status but no 'finished' file (Message 74197)
Posted 7 Nov 2012 by Snags
Post:
Most recently discussed here

Best,
Snags


Svincent was TOLD to start a new thread by Mod Sense.



?

I assumed Mod.Sense suggested a new thread because svincent originally posted in the "Current issues with 7+ BOINC client" thread. I made a link to a different, slightly older thread (with the exact same title as this thread, "exited with zero status but no 'finished' file") simply because I didn't have time to summarize it.

I will repost the link to the BOINC FAQ Service page which describes this long standing (since BOINC 5+) error message and the possible causes and solutions.

If svincent or googloo (from the previous thread) still think it's related to the 7+ BOINC client then it would be most helpful if they post back detailing how they eliminated the other triggers.

Best,
Snags
36) Message boards : Number crunching : exited with zero status but no 'finished' file (Message 74185)
Posted 6 Nov 2012 by Snags
Post:
Most recently discussed here


Best,
Snags
37) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 74125)
Posted 29 Oct 2012 by Snags
Post:
hyb_ai_bench_4adyB_SAVE_ALL_OUT_IGNORE_THE_REST_58035_47

My mac (BOINC 6.12.33) ended with Outcome: Success; Client state: Done; Exit status: 0(0x0) but the following in the stderr out:

BOINC:: CPU time: 36269.7s, 14400s + 21600s[2012-10-29 7:18:59:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001

The watchdog ended it and I received the default one model/20 credits.

On my wingman's windows machine the workunit ended with a client error within a few seconds of starting though it should be noted that all 40 of his most recent tasks have failed so his failure might not be related to the workunit.

Best,
Snags
38) Message boards : Number crunching : exited with zero status but no 'finished' file (Message 73870)
Posted 20 Sep 2012 by Snags
Post:
Seeing the same issue on a new Win7 machine that I cranked up with BOINC 7.0.28. Per your note, I've just switched from 75% CPU on four jobs (one per core) to 100% CPU, only two concurrent jobs. Waiting to see if that reduces the problem, and how the temperature settles out.

Question: Any correlation between this error and the "mismatch" between Rosetta and BOINC 7.x? Should I be planning my retreat to BOINC 6 because I'm seeing this failure?


I shouldn't think so. This error message was first added for BOINC 5. We saw quite a spate of posts about it a while ago well before BOINC 7 was released. I haven't noticed any of the posts citing problems with BOINC 7 listing this as a symptom.

...
Please know that this only becomes a fatal error when it occurs 100 times to a particular task; at that point BOINC assumes the task will never be able to finish and gives up on it, ending it as a client error. If you see this message only occasionally it is safe to ignore it.


Best,
Snags


Understood, but does the restart imply loss-of-work back to the previous checkpoint for the job?


Yes, but as Sid notes, most of the time it isn't worth fretting over: as a rare occurrence, the conflict would be difficult to track down and may be impossible to avoid. If it continues to happen frequently, click through to the BOINC FAQ Service and check out Jord's list of suggestions. The link in my previous post takes you straight to the relevant page.


Best,
Snags
39) Message boards : Number crunching : exited with zero status but no 'finished' file (Message 73818)
Posted 12 Sep 2012 by Snags
Post:
On Rosetta this is usually solved by increasing the "use at most xxx% of CPU time" setting to 100. You may then want to reduce the "on multiprocessors, use at most xxx% of the processors" to something less than currently set. Most people find this handles the temperature regulation concerns (that the cpu throttling was designed to address) perfectly.

Another possible cause is virus scanners; most folks exclude BOINC from those scans or set them to run only when BOINC isn't active.

An explanation and more possible causes can be found here: BOINC FAQ Service

Please know that this only becomes a fatal error when it occurs 100 times to a particular task; at that point BOINC assumes the task will never be able to finish and gives up on it, ending it as a client error. If you see this message only occasionally it is safe to ignore it.


Best,
Snags
40) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 73538)
Posted 26 Jul 2012 by Snags
Post:
I get an instant 'compute error' on all my work units for the last few days now. No problems with other projects from WCG.

I assume you mean this computer. See this thread. In short: you might need to downgrade to BOINC v6.12.34, which you use on your other computers (and as you see they don't have such issues).


There might be a simpler solution than downgrading. He's getting -185 errors with "couldn't start Input file minirosetta_3.31_windows_intelx86.exe missing or invalid: -123: -123". Perhaps simply rebooting the computer and/or possibly resetting rosetta will do the trick.

Best,
Snags

