Posts by Timo

1) Message boards : Rosetta@home Science : How large is the problem? (Message 98391)
Posted 31 Jul 2020 by Profile Timo
Post:
Very interesting thread. As someone who works in data science, I appreciate firsthand that we are, as a species, still very resource-constrained in terms of computational throughput. Protein folding is the type of problem that could literally use every single computer and chip on the planet and still not have enough resources to be a 'perfect solution finder'. Hence, many shortcuts, approximations and tricks are used to try to get a 'best guess' with the amount of compute available.

Someday, hopefully, we will evolve beyond this resource scarcity, either via transformational developments in quantum computing (perhaps soon, i.e. within the next couple of decades) or, on a longer time horizon, by moving up the Kardashev scale. In the meantime, it would be nice if a larger percentage of people would take an interest in contributing to efforts like Rosetta@Home to try to maximize the resource availability for the advancement of human knowledge.
2) Message boards : Number crunching : Tasks no longer downloading (Message 97903)
Posted 4 Jul 2020 by Profile Timo
Post:
There was an issue that crept up a few weeks back with one of the SSL certificates or something related, which was causing downloads to get 'stuck' for older versions of BOINC. The easy fix was to update BOINC to the latest release. Perhaps give that a try if this is still an issue for you.
3) Message boards : Number crunching : [Solved] Uploads and Downloads stopped (Message 97191)
Posted 3 Jun 2020 by Profile Timo
Post:
Hi everyone,

I ran into a problem on two of my boxes which had not yet been repointed to the https version of the project, where their uploads and downloads got stuck in '(Project backoff: retry in ...)' states despite internet connectivity working properly.

Luckily stumbled across this post, and found that, indeed once I upgraded BOINC to the latest version (7.16.7 at the time of writing this), the issue resolved itself.

I guess this just goes to show it's a good idea to update BOINC periodically as new releases come out.
4) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 96866)
Posted 30 May 2020 by Profile Timo
Post:
I'm seeing the same error message in the event log as of this morning and a number of tasks are stuck in 'uploading' status.
I suspect this is only affecting one of my machines which I switched to Rosetta@Home's new SSL URL, which makes sense.
Unfortunate timing for this to happen on a Saturday.
5) Message boards : Rosetta@home Science : Article about Rosetta@Home on HPCWire (Message 92246)
Posted 25 Mar 2020 by Profile Timo
Post:
Sharing a new article about Rosetta@Home posted on HPCWire:
https://www.hpcwire.com/2020/03/24/rosettahome-rallies-a-legion-of-computers-against-the-coronavirus/
6) Message boards : Rosetta@home Science : BOINC@TACC would love to offer computational resources (Message 92245)
Posted 25 Mar 2020 by Profile Timo
Post:
Hi, thanks for your offer of support. I'm not sure I understand why you need to ask if we need help? I'm pretty sure that the current spike in work (related to COVID-19) means this project will gladly take all the help they can get. To help, simply install and run the project on your compute resources. Thanks in advance from everyone in the community.
7) Message boards : Number crunching : File transfers. (Message 91671)
Posted 9 Feb 2020 by Profile Timo
Post:
Just a note to help others avoid having to 'abort transfer' (and thus inadvertently abort tasks that may then never get completed, which impacts research): I've found that closing the BOINC client, with the 'Stop running tasks when exiting the BOINC manager' checkbox checked, and then restarting it forces the downloads to retry, and they usually succeed.

Still, this is definitely a networking issue on the UW side. Hopefully someone reads this forum post.
8) Message boards : Rosetta@home Science : DRH_Curve (Message 89364)
Posted 1 Aug 2018 by Profile Timo
Post:
Hi Rosetta researchers,

I was curious about what the DRH_Curve jobs are modeling? There seems to be an endless stream of them for the past month or two.

Do these jobs all belong to one person? Are they all looking at a particular protein / family of proteins, or a particular type of situation or..? Just curious.
9) Message boards : Rosetta@home Science : Ginzu vs Structure (prediction 'methods') (Message 87947)
Posted 22 Dec 2017 by Profile Timo
Post:
I was perusing the robetta queue and noticed that mainly jobs fall under two prediction 'methods'.. one of them is 'structure' and the other is 'ginzu'. At a high level, what is Ginzu prediction?
10) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 87913)
Posted 16 Dec 2017 by Profile Timo
Post:
Seems like it's now fixed (only about 1 minute after I posted this!) Nice work :)
11) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 87912)
Posted 16 Dec 2017 by Profile Timo
Post:
No new workunits being downloaded on at least 1 of my boxes. Log says:
"Server Error: Feeder not running."

I suspect this just happened, as the server status page still shows the Feeder as running.
12) Message boards : News : Outage notice (Message 86745)
Posted 26 Jun 2017 by Profile Timo
Post:
I had an issue uploading on a couple of my boxes too, but after investigating I discovered it was due to an old 'host' file that had hard-coded DNS entries for the project servers -- something I did myself a little while back as a work-around when the project's DNS registrar was having problems. Simply removing the rah server entries from the host file fixed the issue.
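For anyone wanting to check for the same leftover work-around, here is a minimal Python sketch (not an official BOINC tool) that lists hosts-file entries naming a given domain. The function name `find_host_overrides`, the sample file contents, the host name `srv1.bakerlab.org` and the IP address are all made up for illustration; only `boinc.bakerlab.org` is a host name that appears in these posts.

```python
def find_host_overrides(hosts_text, domain):
    """Return (ip, names) pairs from hosts-file text that mention `domain`.

    Comment lines and trailing comments are ignored. A name matches if it
    equals `domain` or is a subdomain of it.
    """
    domain = domain.lower()
    matches = []
    for line in hosts_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments
        if not entry:
            continue
        fields = entry.split()
        ip, names = fields[0], fields[1:]
        if any(n.lower() == domain or n.lower().endswith("." + domain)
               for n in names):
            matches.append((ip, names))
    return matches


# Hypothetical hosts-file contents, for illustration only.
SAMPLE_HOSTS = """
127.0.0.1 localhost
# 192.0.2.9 boinc.bakerlab.org   (commented out, so ignored)
192.0.2.10 boinc.bakerlab.org srv1.bakerlab.org
"""

overrides = find_host_overrides(SAMPLE_HOSTS, "bakerlab.org")
for ip, names in overrides:
    print(ip, " ".join(names))
```

On a real machine the text would come from the system hosts file instead of `SAMPLE_HOSTS`; any entry the script reports is a candidate for removal.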
13) Message boards : Number crunching : Question for Researchers about waiting for results (Message 81503)
Posted 20 Apr 2017 by Profile Timo
Post:
Hi David and team.

Some quick background: Yesterday I spent almost all of my work day waiting. I write some 'big data' blending jobs in Hive & Pig (and am starting to learn Spark), but the transformations I'm working on involve many BILLIONS of records, so even with my company's 160+ node Hadoop cluster, some steps of my transformation take a couple of hours to crunch.

This waiting time really slows down my ability to iterate and test some aspects of my logic. Where possible I try to find a subset of data that can serve as a test case, but there are some use cases where this strategy cannot be applied.

So, my question for you is, with the multi-days/weeks long turn around times of rosetta jobs, how the heck do you manage to iterate in your experiments efficiently and perhaps more importantly how do you ensure that you don't spend a whole two weeks waiting for a run to complete only to find out that there was a typo in the input sequences somewhere? Secondly, what do you do while waiting for jobs to finish?
14) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81428)
Posted 14 Apr 2017 by Profile Timo
Post:
I also have a task that is stuck uploading. Will try to post more details about it tomorrow if it's still stuck when I wake up. I tried putting back the hosts information in case it's a DNS problem, and also tried flushing the DNS resolver cache and removing any host entries too. It doesn't seem to be DNS related.
15) Message boards : Number crunching : Ryzen (Message 81344)
Posted 18 Mar 2017 by Profile Timo
Post:
2 minutes on google led me to this R7 1700:

http://boinc.bakerlab.org/rosetta/results.php?hostid=2225133

and this R7 1800X:

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3181998

For the 1700, assuming it's running 16 threads, from the results returned the RAC is approx 15,210 and for the 1800X it works out to 12,268.

If they're not running 16 threads then the numbers above will be scaled too high. The granted credit-per-second is higher on the 1700 which suggests it's running a lower non-Rosetta load, or is overclocked quite a lot. Or maybe the 1800X gets powered down more frequently in that period. I'd guess it's due to load though.

If those numbers are correct, then they're really good at Rosetta and I want one. Or five.
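The thread-count caveat in the estimate above can be made concrete with a small Python sketch. It assumes the figures were obtained by scaling granted credit per CPU-second up to a full day across all threads, which the post does not actually state. The function name and the input numbers are invented for illustration and are not taken from either host's results.

```python
SECONDS_PER_DAY = 86_400


def estimated_daily_credit(granted_credit, cpu_seconds, threads):
    """Steady-state credit per day if every thread crunches around the clock.

    granted_credit / cpu_seconds is the credit rate of one thread; the
    result scales linearly with the assumed thread count.
    """
    if cpu_seconds <= 0 or threads <= 0:
        raise ValueError("cpu_seconds and threads must be positive")
    return granted_credit / cpu_seconds * SECONDS_PER_DAY * threads


# Made-up sample: 100 credits granted for 10,000 CPU-seconds of work.
for threads in (16, 8):
    print(threads, round(estimated_daily_credit(100, 10_000, threads)))
```

Halving the assumed thread count halves the estimate, which is why the numbers would be scaled too high if the hosts are not running all 16 threads.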


Great finds! Very exciting to see competition in the higher end again. Hopefully they sell enough of these to recoup their investment and keep up the momentum at their R&D labs in AMD.

A few things I've read about Ryzen have explained that finding the right memory configuration is key to getting the best performance out of these chips, as keeping all 16 threads fed requires lots of fast memory bandwidth. Furthermore, some people complained that it was very difficult to find a motherboard+memory combination that actually allowed the memory to run at full speed. I'm hoping that as the motherboard selection starts to mature a bit and more OEMs jump in the ring, these issues smooth themselves out. :)
16) Message boards : Number crunching : DNS Problems and Late Work Units (Message 81317)
Posted 12 Mar 2017 by Profile Timo
Post:
[quote] There seems to be a deep assumption in there somewhere that most of the clients are supposed to be running continuously for many hours at a time. (Some of mine do, and others don't.)

The client preferences (on the website side) have the option of creating multiple (separate) profile categories. You could set up one with shorter target run-times for your systems that run less frequently and another with longer run-times for systems that run more continuously. The available range is quite large, as low as 1 hour and as high as 24 hours.

Either way, it sounds like in your particular case the amount of work being cached may be set too high. The BOINC manager will adjust the cached amount over time to match the percentage of time BOINC is running per 24 hour period, but it takes time to adjust and any changes to the settings will take more than a few days to get applied. (This is the same reason why many people suggest that any changes in target run-times be made in smaller incremental steps).
17) Message boards : Number crunching : Ryzen (Message 81221)
Posted 24 Feb 2017 by Profile Timo
Post:
Very excited to see some competition in the higher end of the CPU market again. This is a huge win for high performance computing / BOINC / Rosetta etc. too

These chips look to pack an amazing overall amount of compute power inside a surprisingly energy efficient envelope.

AMD unveils Ryzen launch dates, clock speeds, performance, pricing [ExtremeTech.com]
18) Message boards : Number crunching : Unintended consequences of the new credit system? (Message 80931)
Posted 23 Dec 2016 by Profile Timo
Post:
I don't blame R@H for having deadlines at all. In fact, I keep a short work buffer (0.3 days on most boxes) and have WCG set up as a backup project with low priority so it will just kick in if R@H goes down. This gives everyone the best of both worlds: the average TAT for work on my machines is less than a day, and my boxes stay busy (with a fallback to WCG) even if R@H goes down.

Quick work turnaround is important to me because I am thinking first and foremost about the researchers using this platform. I work with some high performance computing clusters in my day job and I can attest firsthand that long turnaround times for queries/model runs lead directly to slower iteration and slower progress.
19) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 80786)
Posted 26 Oct 2016 by Profile Timo
Post:
[snip]

some of the simulations being run on R@H can complete in just a few days

DAYS? For one work unit? I've never read anywhere if the researchers prefer a long run-time-per-WU over a short run-time-per-WU so I've always (7 years now) had my preferences set to shorter WUs.
Thanks for the input Timo, I appreciate it.

I've read that they wanted longer workunits selected as a way to reduce the load on the server. I haven't seen whether they changed their minds on that, though.


To be clear, I was advocating for having a shorter work 'buffer', i.e. no need to have 10+ days of work buffered and thus drive up the average 'turn around time' of a given WU (even a short one) to many days because it gets downloaded onto the end of a really long queue.

By all means, set a longer WU target run length. What I think is probably not a good idea is a 10 day buffer/cache of 1 hour WUs. Basically, that means high server load coupled with long turn around time.

As someone who deals with crunching data daily at my day job, I appreciate firsthand the frustration of waiting for queries that take many hours to run. I can't imagine being asked to iterate and experiment in an environment where queries take many days or even weeks to complete, which is why I think a small cache is better for the project in terms of enabling faster iterative experimentation.
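The buffer argument in this post can be put into rough numbers. The Python sketch below is a back-of-the-envelope model, not how BOINC actually schedules work: it assumes a full first-in-first-out queue and a machine that crunches around the clock. Both function names are made up, and the 8 hour run time is an example value; the 10 day, 1 hour and 0.3 day figures come from the posts on this page.

```python
def estimated_turnaround_days(buffer_days, target_runtime_hours):
    """Days from download to report for a WU added to the end of the queue.

    Assumes the WU waits behind a full buffer, then runs for the target
    run time.
    """
    return buffer_days + target_runtime_hours / 24.0


def tasks_per_day(threads, target_runtime_hours):
    """Tasks a host fetches and returns per day, a rough proxy for server load."""
    return threads * 24.0 / target_runtime_hours


# (buffer in days, target run time in hours)
for buffer_days, runtime_hours in ((10, 1), (0.3, 8)):
    print(
        f"buffer={buffer_days}d runtime={runtime_hours}h: "
        f"turnaround={estimated_turnaround_days(buffer_days, runtime_hours):.2f}d, "
        f"tasks/day on 16 threads={tasks_per_day(16, runtime_hours):.0f}"
    )
```

Under these assumptions the 10 day buffer of 1 hour WUs is the worst of both cases: turnaround stays above 10 days while the host still requests hundreds of tasks per day.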
20) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 80778)
Posted 25 Oct 2016 by Profile Timo
Post:
@LC - Just checking the obvious here. I don't mean to be insulting by asking such a simple question, but it's happened to me before without my even noticing, so I thought I would ask: did you by chance hit the 'Show active tasks' button at the top left of the 'Tasks' pane of BOINC? (If so, it will currently say 'Show all tasks'. Try toggling it and seeing if you do indeed have more work buffered.)

Secondly, a 10 day buffer (if I read your post right, that is what you had set?) seems really really excessive, some of the simulations being run on R@H can complete in just a few days - why be the one guy slowing down pace of iteration/experimentation by making researchers wait a whole 10 days? I usually keep a 1 day buffer, and have 'Mapping Cancer Markers' (via World Community Grid) as my 'backup' project when R@H is out of work.. just a thought.

