Posts by Brian Nixon

21) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101029)
Posted 3 Apr 2021 by Brian Nixon
Post:
There does seem to be a backoff when a task fails. With sched_op_debug selected in your event log options, you should see it logged as
[sched_op] Deferring communication for …
[sched_op] Reason: Unrecoverable error for task …
But as that’s client-side I would have expected it to be reset if you manually Update a project.
22) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101028)
Posted 3 Apr 2021 by Brian Nixon
Post:
Who is the guilty party
We’re here to help with scientific research, not to point fingers at the people doing it. Somebody made a mistake, and an experiment failed. It happens; that’s how people learn.
23) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101015)
Posted 2 Apr 2021 by Brian Nixon
Post:
Does the panel know (a) how long these errors will continue (b) how many good tasks I need to return to get back into Rosetta’s good books?
(a) With 1.1 million jobs in the queue and a completion rate around 280,000 per day, I’d estimate at least 4 days…

(b) At just shy of 500 max per day you still are in Rosetta’s good books, so the number of tasks isn’t the issue. If it’s just backoff times you’re running into, either that’s set by the server and there’s nothing you can do about it, or you can try to force a connection by selecting Update on the Projects page.
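
For anyone wanting to redo the estimate in (a) as the queue changes, here is the arithmetic as a minimal sketch. It assumes the completion rate stays constant and no new work is added to the queue, neither of which is guaranteed; the function name is only illustrative.

```python
def days_to_clear(queued_jobs: int, completed_per_day: int) -> float:
    """Days needed to work through the queue at a constant completion rate."""
    if completed_per_day <= 0:
        raise ValueError("completion rate must be positive")
    return queued_jobs / completed_per_day


# Figures quoted above: 1.1 million queued, about 280,000 completed per day.
estimate = days_to_clear(1_100_000, 280_000)
print(f"{estimate:.1f} days")  # prints "3.9 days"
```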
24) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101014)
Posted 2 Apr 2021 by Brian Nixon
Post:
Bandwidth usage massively increased in March
This might be at least in part due to the current batch of work units suffering an unusually high failure rate, meaning you will be downloading a lot more tasks than normal in any given period. As an extreme example, your Threadripper has had over 300 failures in the last few days. As there’s no way to tell bad tasks from good before they’ve downloaded and started, there’s nothing we can do about it other than let them run their course (or stop running Rosetta until they’ve passed).

In BOINC Manager you can set a limit on the amount of data transferred in a given period. It’s not very sophisticated and only works per machine, so when you’ve got several the best you can do is set an allowance for each one as a proportion of your total limit based on the number of tasks you expect it to run. (And if you do set a limit you then need to keep an eye out for it being reached, at which point even small results files for completed tasks won’t be uploaded.)
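
A rough sketch of that proportional split, in case it helps. The host names, task counts and total allowance are invented for illustration, and the result still has to be entered by hand on each machine; nothing here talks to BOINC.

```python
def split_allowance(total_mb: float, expected_tasks: dict[str, int]) -> dict[str, float]:
    """Divide a total data allowance between hosts in proportion to
    the number of tasks each one is expected to run."""
    total_tasks = sum(expected_tasks.values())
    if total_tasks <= 0:
        raise ValueError("at least one host must expect to run tasks")
    return {host: total_mb * n / total_tasks for host, n in expected_tasks.items()}


# Hypothetical fleet: tasks expected per period on each host.
fleet = {"threadripper": 48, "laptop": 8, "pi": 4}
for host, mb in split_allowance(30_000, fleet).items():
    print(f"{host}: {mb:.0f} MB")
```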

Bad tasks aside, one way to reduce the overall amount of network traffic while performing the same amount of work is to increase the target run time for tasks in your project preferences. Even though a longer run time might increase the upload size needed for each task (due to the greater number of results), that is often far outweighed by the saving in download size (which is fixed for each task, however long it runs for). The credit per hour is more or less the same whatever target run time you choose.
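
To put illustrative numbers on that trade-off, here is a minimal sketch. It assumes the download size is fixed per task and that the upload grows linearly with run time; the 100 MB and 0.5 MB/hour figures are made up, not measured from real Rosetta tasks.

```python
def traffic_per_cpu_hour(download_mb: float, upload_mb_per_hour: float,
                         target_hours: float) -> float:
    """Network traffic per hour of computation for one task.

    The fixed download is spread over the whole run time, while the
    upload is assumed to grow in proportion to it.
    """
    if target_hours <= 0:
        raise ValueError("target run time must be positive")
    return download_mb / target_hours + upload_mb_per_hour


# Hypothetical task: 100 MB to download, 0.5 MB of results per hour run.
for hours in (4, 8, 16, 24):
    print(hours, "h:", round(traffic_per_cpu_hour(100.0, 0.5, hours), 2), "MB per CPU hour")
```

Under those assumptions, doubling the target run time roughly halves the traffic per hour of work, because the download dominates.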
25) Questions and Answers : Windows : Intel old i7 processor faster than newer i9? (Message 100992)
Posted 1 Apr 2021 by Brian Nixon
Post:
Don’t worry. The performance measurements from BOINC’s Whetstone benchmark show the i7 at 4.96 GFLOPS and the i9 at 5.95, and the i9 is already earning substantially more credit per task. Everything’s fine…
26) Questions and Answers : Windows : Intel old i7 processor faster than newer i9? (Message 100987)
Posted 1 Apr 2021 by Brian Nixon
Post:
The i7 appears to BOINC to be a new machine (as the i9 has inherited the older ID), and for new hosts the initial remaining time estimate is always off. Give it a few days and you should find that it too will show each task as needing 8 hours to run.
27) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100985)
Posted 1 Apr 2021 by Brian Nixon
Post:
Nobody was asking or expecting you to abort the jobs – but what’s done is done, and cannot be undone. It makes no difference to the project who runs them, so please don’t be dissuaded from participating. The ones that weren’t resends are already out to other hosts. My machines are out of Rosetta work primarily because of the way I chose to set them up, and I’m too lazy to go round and change them all just to work around a bug in the work unit configuration. It’s arguably better that machines capable of running the ‘big’ tasks don’t pick up the ‘small’ ones, so that less-powerful machines do have a chance to run something.
28) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100983)
Posted 1 Apr 2021 by Brian Nixon
Post:
Ho hum.
You’ve got 4 tasks running and 20 ready to start. That’s 24 more than a lot of other people…
29) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100980)
Posted 1 Apr 2021 by Brian Nixon
Post:
I have seen several periods of downtime where work units have not been deployed for days at a time. I don't mind having a Pi dedicated to the task so long as it's doing real work and not just heating the room.

I also get the idea of other projects and I looked into it, however my current setup is severely restricted in that department. I am currently using the Balena client image mostly out of convenience and I have not found another project compatible with that and my processor architecture.
If you have access to the BOINC Manager application, you might try adding World Community Grid. That reportedly has an ARM Linux application for its Open­Pandemics sub-project, and much smaller work units. Otherwise it might be worth getting in touch with Balena to explain the issue and see if they would consider adding something for a different project in the same way they did for Rosetta (though they may find it harder to convince IBM than Baker Lab to let them hack at their applications; SiDock is another similar project without an ARM build (yet) that might benefit from that kind of effort).

Do bear in mind that we are here to help the project, not the other way round. If they happen not to have any work that needs doing at any given time, it’s their choice not to make use of a resource that’s available to them, not a cause for us to complain.

I’m not sure which part of the U.K. you’re in where an idle Pi is useful for heating, but I think I’d like to move there… (I’ve got four 8-⁠cylinder Xeons pulling 500 W out the wall and barely keeping the place warm…)
30) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100977)
Posted 1 Apr 2021 by Brian Nixon
Post:
should I set up any firewall rules?
Assuming it’s the same as on Windows:

The only thing that requires Internet access is the client, and it only makes HTTP(S) connections to the project servers. So you need to open tcp/80 and/or tcp/443 outbound (plus udp/53 or whatever else your DNS needs if that’s not handled by a separate resolver); everything else can be blocked.
31) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100976)
Posted 1 Apr 2021 by Brian Nixon
Post:
Each task has 3 processes using the same amount of ram, but only one of those 3 is using the cpu, the other two are near zero cpu time. So, 6 tasks, 18 processes using 1-2gb each.
That doesn’t sound right, but as I don’t run BOINC on Linux I can’t add more…
32) Message boards : Cafe Rosetta : can't get new work (Message 100970)
Posted 1 Apr 2021 by Brian Nixon
Post:
With 2 GB, your machine does not have enough memory to be sent any of the current batch of work units, which are marked as requiring up to 6.6 GB (even if in practice they need much less). Until Rosetta has some smaller work units again, you might want to consider a different project.
33) Questions and Answers : Windows : Intel old i7 processor faster than newer i9? (Message 100969)
Posted 1 Apr 2021 by Brian Nixon
Post:
Rosetta@home tasks are fixed duration, not fixed work. They will all run to a target CPU time of 8 hours (or whatever you choose in your project preferences) regardless of CPU speed. The recent tasks on the i7 ran for 8 hours, not 5.
34) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100936)
Posted 31 Mar 2021 by Brian Nixon
Post:
Are you certain it’s Rosetta using all 30 GB? None of your recently-completed tasks shows anything close to that kind of memory usage.
35) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100933)
Posted 31 Mar 2021 by Brian Nixon
Post:
It’s largely luck of the draw. If you happen to contact the server when it has some ‘small’ tasks ready to send, you will get them. But if it only has ‘big’ ones ready to go at that moment, and your machine is outside the limits, you won’t. And in that case the client will back off (for longer and longer durations, up to 1½ days) before asking again.
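
As a picture of what “longer and longer, up to 1½ days” can look like, here is a minimal capped-doubling sketch. Only the 1½-day cap comes from the description above; the one-minute starting delay and the doubling are assumptions for illustration, and the real BOINC client’s algorithm may well differ (for instance by adding randomness).

```python
MAX_BACKOFF_S = 36 * 3600  # 1.5 days, the cap mentioned above


def backoff_seconds(empty_replies: int, base_s: float = 60.0) -> float:
    """Delay before the next request after a run of replies with no work.

    Assumed scheme: start at base_s and double each time, up to the cap.
    """
    if empty_replies < 1:
        return 0.0
    return min(base_s * 2 ** (empty_replies - 1), MAX_BACKOFF_S)


for n in (1, 2, 5, 12, 13):
    print(n, backoff_seconds(n) / 3600, "hours")
```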

Also there doesn’t appear to be any mechanism stopping ‘small’ tasks going to ‘big’ machines – so the more that happens the less likely it becomes that there is work available for others…
36) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100931)
Posted 31 Mar 2021 by Brian Nixon
Post:
the post dated info where it actually ran ok
For some examples, look at Grant’s recent valid tasks. There doesn’t seem to be anything unusual about their memory and disk usage.
37) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100926)
Posted 31 Mar 2021 by Brian Nixon
Post:
Still odd that with my number of cores/threads and available system RAM I haven't had issues.
It must be the case that the server only considers a host’s total available RAM and disk space (not per core) in deciding whether a task is suitable.

So if a task tells the server it might need 6.6 GB of RAM, the server will never send it to any host with less (even if in practice it would not need anywhere near that much), but it will happily send you 24 of them because they can run (just maybe not all at the same time).

There can’t be many machines with >6 GB RAM per core…
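
The behaviour guessed at above can be sketched as follows. This is my reading of what the server appears to do, not its actual code; the function names and the 32 GB example host are hypothetical, and 6.6 GB is the figure quoted above.

```python
GB = 1_000_000_000


def server_would_send(task_memory_bound: int, host_total_ram: int) -> bool:
    """Assumed server check: compare the task's bound with the host's
    total RAM, ignoring how many cores will be running tasks."""
    return host_total_ram >= task_memory_bound


def max_concurrent(task_memory_bound: int, host_total_ram: int) -> int:
    """How many such tasks fit in RAM at once if each really used its bound."""
    return host_total_ram // task_memory_bound


bound = 6_600_000_000                       # "might need 6.6 GB"
print(server_would_send(bound, 2 * GB))     # False: the 2 GB host gets nothing
print(server_would_send(bound, 32 * GB))    # True: a 32 GB host is eligible
print(max_concurrent(bound, 32 * GB))       # 4, however many threads it has
```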
38) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100924)
Posted 31 Mar 2021 by Brian Nixon
Post:
I've had no issues with insufficient disk space or memory.
This points to a misconfiguration of the new batch of work units, as it seems unlikely it would be the project’s intention to cut off a third of its capacity…

Look in client_state.xml for the rsc_memory_bound and rsc_disk_bound settings of the new work units: they used to be 1,800,000,000 each; to yield the errors people are reporting they must now be set to 7,000,000,000 and 9,000,000,000 respectively.
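
If you would rather not search the file by eye, something like the sketch below could list the values. It assumes each work unit appears as a workunit element with a name child alongside rsc_memory_bound and rsc_disk_bound, and that the file parses as well-formed XML, which BOINC does not strictly guarantee. The sample document and work unit names are invented.

```python
import xml.etree.ElementTree as ET


def workunit_bounds(xml_text: str) -> dict[str, tuple[float, float]]:
    """Map work unit name -> (rsc_memory_bound, rsc_disk_bound) in bytes."""
    root = ET.fromstring(xml_text)
    bounds = {}
    for wu in root.iter("workunit"):  # element name is an assumption
        name = wu.findtext("name", default="?")
        mem = wu.findtext("rsc_memory_bound")
        disk = wu.findtext("rsc_disk_bound")
        if mem is None or disk is None:
            continue  # skip entries without both settings
        bounds[name] = (float(mem), float(disk))
    return bounds


# Invented sample in the assumed layout, not copied from a real file.
SAMPLE = """
<client_state>
  <workunit>
    <name>old_example_wu</name>
    <rsc_memory_bound>1800000000.000000</rsc_memory_bound>
    <rsc_disk_bound>1800000000.000000</rsc_disk_bound>
  </workunit>
  <workunit>
    <name>new_example_wu</name>
    <rsc_memory_bound>7000000000.000000</rsc_memory_bound>
    <rsc_disk_bound>9000000000.000000</rsc_disk_bound>
  </workunit>
</client_state>
"""

for name, (mem, disk) in workunit_bounds(SAMPLE).items():
    print(f"{name}: memory {mem / 1e9:.1f} GB, disk {disk / 1e9:.1f} GB")
```

To try it on a real machine, pass in the text of client_state.xml from your BOINC data directory, whose location varies by platform.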

How big is your BOINC data directory now? Did the new batch need to download any unusually large files (such as a new protein database)? The issue I have is not so much disk space (though it will be a pain to have to repartition every machine) as download size, since I’m on a capped data plan.
39) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100922)
Posted 31 Mar 2021 by Brian Nixon
Post:
Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed.
You’re not alone. Look at the recent results graphs – ‘tasks in progress’ has dropped by around 200,000 (a third)…
40) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 100907)
Posted 30 Mar 2021 by Brian Nixon
Post:
Is that high memory size requirement normal ?
It wasn’t normal before today. Whether it becomes the new normal, we’ll have to wait and see. It’s also not yet clear whether these tasks do actually need all the memory and disk space they say they do, or whether it’s a misconfiguration.




