Posts by Snags

1) Questions and Answers : Macintosh : Server Down? (Message 90067)
Posted 23 Dec 2018 by Snags
Post:
Is the Rosetta@Home server down? I'm getting a "no work available to process" notification.

There doesn't appear to be anything wrong with the server; rather, there are simply no new tasks in the pipeline. I'm sure there will be more but as it is the holidays we may have to wait until the new year. Just let your other projects have a go.
2) Message boards : Number crunching : Project web page support - information (Message 89862)
Posted 9 Nov 2018 by Snags
Post:
Do you ever notice that none of the cross-project stats links work? It's been a few years since you could use them.
It's getting harder to support something that doesn't give the number cruncher anything back for the effort we put in. It would be so nice not to have to go to other BOINC sites to find out how I am doing compared to other projects.


Do you mean on your account page? The only one that's not working for me right now is BOINCSynergy. All the others are fine.
3) Message boards : Rosetta@home Science : Run time. (Message 89861)
Posted 9 Nov 2018 by Snags
Post:
Your first assumption is the correct one. The program will complete as many decoys as it can in the time allotted. At the end of each decoy it checks whether it has time to run another; if not, it wraps up. This is why you may see a task complete in less time than you chose in your preferences. On the other hand, not all decoys take the same amount of time to run. Some will continue past your run time preference in order to complete the decoy. If one runs four hours over, the watchdog should cut in and end the task.
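
To make the scheduling concrete, here is a rough Python sketch of the loop as I understand it from watching tasks run; the names and the four-hour grace constant are illustrative, not taken from the actual Rosetta source:

    import time

    WATCHDOG_GRACE = 4 * 3600  # the watchdog ends a task 4 hours past the preference

    def run_task(run_time_pref, make_decoy, estimate_decoy_time):
        """Illustrative sketch: complete decoys until the runtime preference is used up."""
        start = time.time()
        while True:
            make_decoy()  # a single decoy may itself run past run_time_pref;
                          # the watchdog (not shown) would end the task at
                          # run_time_pref + WATCHDOG_GRACE
            elapsed = time.time() - start
            # at the end of each decoy, check whether another one fits
            if elapsed + estimate_decoy_time() > run_time_pref:
                break  # wrap up, possibly well under the preference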

Snags
4) Questions and Answers : Preferences : Hide Computers? (Message 89713)
Posted 11 Oct 2018 by Snags
Post:
Where is this elusive setting now?


Your account -> rosetta@home preferences -> Should rosetta@home show your computers on its website? Uncheck for no.


Snags
5) Message boards : Number crunching : ProtonMail "email address is invalid" (Message 87753)
Posted 26 Nov 2017 by Snags
Post:
I use a ProtonMail address with all my projects without a problem.
6) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81521)
Posted 4 May 2017 by Snags
Post:
I keep getting the message that 'Task XX exited with zero status but no finished file. If this happens repeatedly you may need to reset the project.' I have reset the project but continue to get the error. Is this an issue on my end or the project's end? Thanks!

Copied and pasted from an earlier answer:

On Rosetta this is usually solved by increasing the "use at most xxx% of CPU time" setting to 100. You may then want to reduce the "on multiprocessors, use at most xxx% of the processors" setting to something less than it is currently. Most people find this handles the temperature regulation concerns (which the CPU throttling was designed to address) perfectly.
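
If it helps to see where those live on disk: when you manage preferences locally rather than on the website, BOINC stores them in global_prefs_override.xml, and as I understand the file format the two settings above correspond to these elements (the percentages here are only an example):

    <global_preferences>
        <cpu_usage_limit>100</cpu_usage_limit>  <!-- "use at most xxx% of CPU time" -->
        <max_ncpus_pct>75</max_ncpus_pct>       <!-- "on multiprocessors, use at most xxx% of the processors" -->
    </global_preferences>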

Another possible cause is a virus scanner; most folks exclude BOINC from those scans or set the scans to run only when BOINC isn't active.

An explanation and more possible causes can be found here: BOINC FAQ Service

Please know that this only becomes a fatal error when it occurs 100 times to a particular task; at that point BOINC assumes the task will never be able to finish and gives up on it, ending it as a client error. If you see this message only occasionally it is safe to ignore it.


Best,
Snags
7) Questions and Answers : Macintosh : Rapidly Decreasing Credit (Message 81384)
Posted 28 Mar 2017 by Snags
Post:
Thanks but I had enough data that my computers never ran out. The credit has fallen another 100 credits today.
Louis Beatty
Colorado Mountain College


There are a number of tasks marked "aborted by user". This would have reduced your daily quota briefly. For instance, you received 24 tasks on the 23rd but only 12 each on the 25th and 26th. If it's been a long-standing practice to abort some tasks this obviously won't explain a steady reduction in daily credit, but if it's only a recent occurrence, coupled with the DNS hiccup, you may have to give it another week or so for the numbers to climb back up.

Out of curiosity, why abort so many tasks?
8) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 80660)
Posted 16 Sep 2016 by Snags
Post:
My mac is running well. Are others having mac client issues?

I'm not having any Mac-specific issues; the only time I haven't gotten new work seems to coincide with the same problems that affected everyone else.

I do have concerns about an "acourbet.10.design_S" unit that is claiming large amounts of working memory (3.08 GB right now, down from 3.2 GB). It's been running about 4.5 hours; the last checkpoint was about 2.5 hours ago. It's an older machine with only 4 GB of memory, but as it's not doing anything else at the moment BOINC is allowed to use all of it. I'll keep an eye on it but thought a heads-up might be in order.

edited to add: this is a resend after the previous cruncher's effort ended with an "out of memory" error.
workunit


Best,
Snags
9) Message boards : Cafe Rosetta : Personal Milestones (Message 80555)
Posted 22 Aug 2016 by Snags
Post:
Today is my ten year anniversary volunteering as RosettaMod.Sense. Ten years ago the project was running about 20 TFLOPS, and when Dr. Baker was asked how much computing power his lab could make good, effective use of, he said "well, let's just say if you each had 10 friends that could help..." Today the project is consistently over half a PetaFLOP. That's 25x growth in a decade. We, the R@h community and supporters, have all been a part of that. I'm sure over that decade he's found ever more uses for computational power to advance the science of protein structure prediction.

Today BakerLab continues to share its technology with other researchers and provides a platform where they can test their own changes and ideas in the program. They're tackling larger proteins than ever, as well as docking and synthetic protein structures that may some day be used for targeted drug delivery or diagnostic testing. They've been a leader at CASP over that entire period, which means they present to the other researchers how they devised a more accurate solution than the other labs. They've spawned a "home game" called Fold.it to spark imaginations and provide a new avenue to the research; and one of their graduating PhD students from the original team 10 years ago started Eterna to study RNA. They've opened the Institute for Protein Design at the University of Washington, where they "...aim to design a new world of proteins to address 21st century challenges in medicine, energy, and technology" (watch that "energy" space carefully).

These are huge accomplishments, and if you'll note, they are each a foundation for a new beginning, a new avenue, for what is yet to come. Congratulations to BakerLab and all of the PhD candidates and research assistants who have moved on to do their own research over the years. It is with great pride that I render whatever aid I personally can to further your efforts.

Congrats, Mod.Sense, on ten years! I get a thrill knowing I have been able to contribute, albeit in a small and passive way, to this exciting scientific endeavor. Undoubtedly, there are many crunchers who get to enjoy the same thrill at least in part due to your aid with troubleshooting, your patient corrections of myriad misunderstandings, and your clarifications of the intricacies of BOINC and rosetta@home for the benefit of experienced and inexperienced crunchers alike. Thank you for all your help.

Best,
Snags
10) Message boards : Number crunching : Stuck on "Ready to Report" (Message 80208)
Posted 20 Jun 2016 by Snags
Post:
What messages appear in the event log after you hit the update button?

Have you updated either the OS or BOINC recently?

I'm not seeing this problem on my Macs, but it's been a while since I've applied any updates.
Mac OS X 10.10.5, BOINC 7.6.22
Mac OS X 10.6.8, BOINC 7.4.42

Best, Snags
11) Message boards : Number crunching : Memory and CPU problems with Ubuntu 16.04? (Message 79938)
Posted 26 Apr 2016 by Snags
Post:
Sounds like a memory usage limit. EACH Rosetta WU uses on average AT LEAST 0.5 GB of RAM (I have 3 right now using 600+ MB). This RAM usage increases as the WU progresses up to a certain maximum. It doesn't use the maximum amount of RAM it'll eventually need right at the start... thus the slow increase in RAM usage.

This means that having 4 normal WUs running at the same time will show up in your system monitor as 50% RAM usage JUST from Rosetta.
Your BOINC preferences are probably set up to only allow 50-60% of the maximum RAM, so BOINC suspends WUs until the RAM usage is below this threshold.


Unless I have invisible friends, it sounds like a reasonable or at least sufficiently plausible explanation. If so, 16.04 is moving in the direction of bloatware, but that is certainly no surprise these days. Furthermore, it is plausible that few people are running similarly old machines and fewer of them are noticing the performance changes, which could explain the paucity of reports from other observers.

Then again, a lack of further comments from me may only indicate that I've given up and I'm running the machine under Windows 10. Much as I dislike Microsoft, I have to say at least this one isn't a flaming lemon. Yes, there are a couple of things I prefer doing from Linux, but nothing urgent right now.

Oh yeah and by the way, the menu bar problem seems to be widely reported over on Launchpad and they have consolidated most of the reports (including mine) into one giant thread there. Not clear how much progress they are making, but after reading most of it, I'd estimate the probability that it is related to this BOINC problem at under 25%. My own guess is that it involves a new dynamic menu feature that doesn't work correctly, but under Appearance settings I switched it back to static menus and I'm not seeing it now.


What Chilean is trying to point out is that if you have limited BOINC to no more than 0.5 GB per core, it is inevitable that you will at least occasionally run into the memory usage limit and see the behavior you have described. Previously you said Ubuntu indicated there was available memory at the same time BOINC was suspending tasks with the "waiting for memory" message. This suggests the BOINC preferences are the limiting factor, not Ubuntu.

At the beginning of the event log BOINC describes your machine and gives a few details of your preference settings. You should find these lines:
Sat Apr 16 12:50:05 2016 | [name of project] | General prefs: from [name of project] (last modified [date time])
and if you are using local preferences instead of web-based preferences:
Sat Apr 16 12:50:05 2016 | | Reading preferences override file
then:
Sat Apr 16 12:50:05 2016 | | max memory usage when active: xxx.xxMB
Sat Apr 16 12:50:05 2016 | | max memory usage when idle: xxx.xxMB
Sat Apr 16 12:50:05 2016 | | max disk usage: xxx.xxGB
Sat Apr 16 12:50:05 2016 | | max CPUs used: x

What are these values, and are they the same for both installations? If they are the same, the next step would be to look at the Activity Monitor, or whatever it's called in Ubuntu, and see precisely where your memory is being allocated.
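
As a back-of-the-envelope illustration of how that "max memory usage when active" line interacts with Chilean's numbers (my own sketch, not BOINC's actual code; the variable names are made up):

    ram_gb = 4.0                 # machine RAM
    ram_busy_pct = 50            # "use at most X% of memory when computer is in use"
    limit_gb = ram_gb * ram_busy_pct / 100   # -> 2.0 GB, the "when active" line in the log

    running_tasks_gb = [0.6, 0.6, 0.6, 0.6]  # four Rosetta tasks at 600+ MB each
    if sum(running_tasks_gb) > limit_gb:     # 2.4 GB > 2.0 GB
        print("BOINC suspends tasks: waiting for memory")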

I don't have a computer science degree but Troubleshooting 101 for every subject I've ever dealt with has included: Rule Out The Obvious.

Best,
Snags
12) Message boards : Number crunching : Memory and CPU problems with Ubuntu 16.04? (Message 79937)
Posted 26 Apr 2016 by Snags
Post:

The pattern I have now is to wait until it starts suspending work units. Then I suspend the running units; 4 other work units start running and memory drops back to around 30% -- and then usage starts creeping up. Eventually it stops running 4 units and I can repeat the process.

This suggests that you have "Leave applications in memory while suspended" unchecked. Each time you suspend a task it is removed from memory, and the work done since the last checkpoint is discarded.

I'm not sure that either Mod.Sense or Chilean have described rosetta's increasing need for memory with perfect precision. I think rosetta models may require more memory for subsequent stages of processing after the first, and that some models proceed through more stages of processing than other models within the same task. The caveat is that I haven't actually looked that closely at rosetta's memory behavior in quite a while, and I am vaguely aware that the rosetta team spent some time reexamining rosetta's use of memory in the somewhat recent past. Despite this, and the fact that Mod.Sense is almost always exactly right, I still think, given the variety of rosetta protocols, it likely that any task increasing its memory further after the initial setup is behaving appropriately, and that its need for more memory is not indicative of a memory leak or a bug.

It was clear from your second post that the symptoms you described were most likely the expected result of a memory usage limit, with a possible discrepancy between the Ubuntu and Windows installations. This could be checked by answering Link's question, then checking the event log (per rjs5's suggestion) to see if BOINC was reading the preferences the way you expected. It would also tell you from where BOINC is reading those preferences. You could compare the event logs of the Windows and Ubuntu installations to confirm the memory limit preferences are the same. Mod.Sense asked you about this in his first response to you.


Most of the responses have been from BOINC 101 with a sub-unit on rosetta. Mod.Sense's suggestion, to step back from the maze you've entered, look where everyone else is pointing, and double-check the basics, is from Troubleshooting 101.

Best,
Snags
13) Message boards : Number crunching : 300+ TeraFLOPS sustained! (Message 79847)
Posted 5 Apr 2016 by Snags
Post:
Thanks for the update. Wanted to point out that 24 hr work units, running more efficiently, will simply produce more models in as close to 24 hrs as they can.

I think I see what you are saying. You put as many apples of various sizes in the box as you can without overflowing. However, I have seen several tasks that run under 10,000 seconds on the above (and three other) machines in only two days. I think that is very rare, and after checking, it is only on the 3.73 tasks.

Also, if they are that short, you would think there would be plenty of room to fit another model in. So it seems that something is making the run times shorter than before, and preventing another model from being run. Maybe there is a limit on the total number of models?

It depends on the type of task. I'll just copy and paste what I wrote earlier, and perhaps Mod.Sense or DEK can correct and/or add detail as necessary:
If memory serves, the 99 model limit was enacted when some tasks created output files too large to be uploaded. The limit only applies to a particular type of task. Others use the preferred CPU time plus 4 hours method to determine when to end things. When a model is completed the task calculates whether it has time left to complete another model. If the answer is no, the task wraps things up despite there appearing (to the cruncher) to be hours left. If the answer is yes, the task will begin another model. All models aren't equal, however, even within the same task, so some will take longer than predicted. To ensure that otherwise good models aren't cut short just before completing (and to increase the odds that the task will complete at least one model), the task will continue past the preferred CPU time. At some point, though, you gotta cut your losses, and so at preferred CPU time plus 4 hours the watchdog cuts bait and the task goes home. (I'm curious about the average overtime; my totally uninformed guess is that it's less than an hour.)

There are other types of tasks in which filters are employed to cut off models early. If a model passes the filter the task will continue working on it to the end. This results in dramatically disparate counts, with one task generating hundreds of models while another task from the same batch generates only one, two, five, etc. Recently on Ralph a filter was used to remove models, resulting in a file transfer error upon upload. The stderr output listed 13 models from 2 attempts, but since the models had been erased the file meant to contain them didn't exist. I'm guessing, based on DEK's post, which I may well have misinterpreted, that the server, possibly as part of a validation check, automatically gives the file transfer error (client error, compute error) when this particular file isn't part of the upload.

All these different strategies result, from the cruncher's point of view, in varied behavior which we struggle to interpret. Is it a problem with my computer or a problem with rosetta? Is it a problem at all? BOINC is complicated enough for the computer savvy, much more so for the majority of crunchers who just want to maximize their participation in rosetta and end up massively tangled up in the BOINC settings. The variety of legitimate behaviors exhibited by rosetta tasks trips up the volunteers trying to help them become untangled. From the researcher's point of view everything may look fine, working as expected, and any issues a lone cruncher is having are most likely due to their particular setup. And they probably are, but the lack of information leaves the volunteers flailing.

I have long wished for a reference, a database of tasks, in which the tasks are divided into broad categories by the strategies employed (as above, with some info on how they "look" to the crunchers) and what, in a most basic way, is being asked (how does this particular protein fold, how do these two proteins interact, can we create a new protein to do x, etc.).


Best,
Snags
14) Message boards : Number crunching : Minirosetta 3.73-3.78 (Message 79806)
Posted 27 Mar 2016 by Snags
Post:


I think ALL Rosetta jobs are set up to just try 99 decoys. The cpu_run_time limit set by the user is checked after every decoy. If that time is exceeded, Rosetta wraps up the job and it is completed early, before the 99 decoy limit is reached.

... Jobs will only stop before the preference time IF and ONLY IF they complete the 99 decoys.

Not all. If you check any FFD_ tasks in your list you will see they generate many hundreds of models (I have several with over 1000 models generated).

If memory serves, the 99 model limit was enacted when some tasks created output files too large to be uploaded. The limit only applies to a particular type of task. Others use the preferred CPU time plus 4 hours method to determine when to end things. When a model is completed the task calculates whether it has time left to complete another model. If the answer is no, the task wraps things up despite there appearing (to the cruncher) to be hours left. If the answer is yes, the task will begin another model. All models aren't equal, however, even within the same task, so some will take longer than predicted. To ensure that otherwise good models aren't cut short just before completing (and to increase the odds that the task will complete at least one model), the task will continue past the preferred CPU time. At some point, though, you gotta cut your losses, and so at preferred CPU time plus 4 hours the watchdog cuts bait and the task goes home. (I'm curious about the average overtime; my totally uninformed guess is that it's less than an hour.)

There are other types of tasks in which filters are employed to cut off models early. If a model passes the filter the task will continue working on it to the end. This results in dramatically disparate counts, with one task generating hundreds of models while another task from the same batch generates only one, two, five, etc. Recently on Ralph a filter was used to remove models, resulting in a file transfer error upon upload. The stderr output listed 13 models from 2 attempts, but since the models had been erased the file meant to contain them didn't exist. I'm guessing, based on DEK's post, which I may well have misinterpreted, that the server, possibly as part of a validation check, automatically gives the file transfer error (client error, compute error) when this particular file isn't part of the upload.

All these different strategies result, from the cruncher's point of view, in varied behavior which we struggle to interpret. Is it a problem with my computer or a problem with rosetta? Is it a problem at all? BOINC is complicated enough for the computer savvy, much more so for the majority of crunchers who just want to maximize their participation in rosetta and end up massively tangled up in the BOINC settings. The variety of legitimate behaviors exhibited by rosetta tasks trips up the volunteers trying to help them become untangled. From the researcher's point of view everything may look fine, working as expected, and any issues a lone cruncher is having are most likely due to their particular setup. And they probably are, but the lack of information leaves the volunteers flailing.

I have long wished for a reference, a database of tasks, in which the tasks are divided into broad categories by the strategies employed (as above, with some info on how they "look" to the crunchers) and what, in a most basic way, is being asked (how does this particular protein fold, how do these two proteins interact, can we create a new protein to do x, etc.).

Best,
Snags

15) Message boards : Cafe Rosetta : how do I get back into other projects, and boincstats (Message 79742)
Posted 9 Mar 2016 by Snags
Post:
Not sure that BAM existed yet at that point in time. :(


Ah, well... which I seem to be saying a lot these days. Long shots and wild speculation are failing abysmally (and somewhat predictably; at least I should have noticed we'd run out of tasks!). Fanciful dreams must be next.
16) Message boards : Number crunching : Not getting anymore work units (Message 79705)
Posted 7 Mar 2016 by Snags
Post:
Your daily quota is reduced by one for each failed task, and it is possible that you could run out of work while the server refuses to send you more until the next "day" starts. If that were the case, you should have seen (before you restarted the computer) a message in the event log indicating as much. Something like "You have reached your daily quota of x tasks" (where x is some number). We can't see your task list since your computers are hidden, so I have no idea if this theory has merit in your case.
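
A toy sketch of the quota bookkeeping described above (illustrative only; the starting quota here is hypothetical and BOINC's real accounting is more involved):

    STARTING_DAILY_QUOTA = 24  # hypothetical per-host quota at the start of the "day"

    quota = STARTING_DAILY_QUOTA
    for task_failed in (True, True, False, True):  # results reported today
        if task_failed:
            quota -= 1  # each failed task reduces the daily quota by one

    # once `quota` tasks have been sent this "day", the scheduler replies with
    # "You have reached your daily quota of <quota> tasks" until the next day starts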

There is another possibility. Both of my computers received 24 hour back-offs after a single request for work resulted in this reply:
Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.

Steve, no one on the forum has access to your task list or your event log; we only have your forum post to go by. If you don't know how to access the information via your BOINC Manager, post back with your OS and your BOINC version number (if you know it), and hopefully someone will be able to walk you through it.

There have been a few mentions of this "Rosetta Mini for Android is not available for your type of computer." back-off, so I've posted a further response in the Minirosetta 3.71 thread.

Best,
Snags

edit: I see several others have posted since I started thinking about my reply (and posted in the other thread) : ) Did they actually run out of CPU tasks?
17) Message boards : Number crunching : Minirosetta 3.73-3.78 (Message 79704)
Posted 7 Mar 2016 by Snags
Post:
Both of my computers received 24 hour back-offs after a single request for work resulted in this reply:
Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.

*****Wild Speculation Alert*****
If the number of Android tasks exceeds the number of available devices by too great a margin, and/or they fail at too high a rate, then the new tasks/resends could be clogging the queue. As long as there are in fact plenty of CPU tasks to crunch, a 24 hour back-off would seem excessive.


I should add that I only became concerned because I recently reduced my preferred CPU runtime and my cache and set other projects to no new tasks (preparing for a possible imminent shutdown of computers for an indeterminate period of time), so this 24 hour back-off actually led to no tasks crunching at all. Otherwise I might have noticed, but not been concerned enough to explore the possible causes or to comment.

I only comment now on the possibility that this back-off interval could be changed to something shorter. I know the project doesn't want a bunch of computers asking every 5 minutes while there's a clog, but if it is a predictable clog and you can see how long it typically lasts, perhaps you could adjust the back-off accordingly. Would anything longer than the 6 hour default target runtime really be necessary? Although not a big deal in the overall scheme of things, would things run somewhat smoother, on both sides of the connection, if crunchers weren't left to go idle unnecessarily for such a long time?

Best,
Snags

edit: I just saw additional posts in this thread that suggest rosie really did run out of CPU tasks. Ah, well. I suppose I should see if I can find BOINC documentation on the back-off settings (documentation that I could actually understand, that is) : /
18) Message boards : Cafe Rosetta : how do I get back into other projects, and boincstats (Message 79680)
Posted 1 Mar 2016 by Snags
Post:
One other thought did pop into my head... since you mentioned BoincStats... did you ever use BAM? I'm pretty sure BAM uses the weak authenticator to attach to projects, so maybe searching for the BAM files would prove fruitful. A long shot, I know, but anything not to be singing the lost password blues...

Best,
Snags


oooh, did you see they discovered 13 new gamma-ray pulsars recently? (announced a week or two ago). Rosie is my first and enduring love but, yay Einstein!
19) Message boards : Cafe Rosetta : how do I get back into other projects, and boincstats (Message 79651)
Posted 28 Feb 2016 by Snags
Post:
The password and the authenticator are not the same, but if you can remember your old password and email address you can sign into your account on Einstein and retrieve your authenticator.

When you originally signed up for Einstein you would have chosen a password, and unless you chose a truly random group of letters, numbers, and symbols it might be worth going to the Einstein website and trying a few guesses. If you can get into your account on the website, go to your account page and scroll to the "account keys" line, click view, and voila! There's your authenticator and instructions on using it to attach your computer to Einstein.

Best of luck,
Snags
20) Message boards : Number crunching : Rosetta Badges (Message 79350)
Posted 2 Jan 2016 by Snags
Post:
I agree that the baker-lab is a little under-staffed, if anything.


Are you kidding? BakerLab is one of the bigger teams in the BOINC world. Most projects have two or three admins/developers...

Most of those folks are researchers, not admins. I believe they are the people DEK is referring to when, after a bad batch of tasks has come through upsetting the commentariat, he writes "I will speak to the person who submitted those tasks" (cue Jaws theme music).

It's the admin/researcher ratio that's relevant, and I share Timo's impression that the admin staff is stretched a bit, especially considering the amount and variety of work the lab conducts.

edited to add: It should be someone's assigned task to take the relevant items from the IPD's news feed and post them in the news forum here (which I believe is automatically added to rosetta@home's front page).

