Question for Researchers about waiting for results

Message boards : Number crunching : Question for Researchers about waiting for results


Profile Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,644,940
RAC: 271
Message 81503 - Posted: 20 Apr 2017, 16:02:21 UTC

Hi David and team.

Some quick background: Yesterday I spent almost all of my work day waiting. I write some 'big data' blending jobs in Hive and Pig (and am starting to learn Spark), but the transformations I'm working on involve many BILLIONS of records, so even with my company's 160+ node Hadoop cluster, some steps of my transformation take a couple of hours to crunch.

This waiting time really slows down my ability to iterate and test some aspects of my logic. Where possible, I try to find a subset of data that can serve as a test case, but there are some use cases where this strategy cannot be applied.

So, my question for you is: with the multi-day or multi-week turnaround times of Rosetta jobs, how the heck do you manage to iterate in your experiments efficiently? And, perhaps more importantly, how do you ensure that you don't spend a whole two weeks waiting for a run to complete only to find out that there was a typo in the input sequences somewhere? Secondly, what do you do while waiting for jobs to finish?
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 81504 - Posted: 20 Apr 2017, 18:57:16 UTC - in response to Message 81503.  

"how the heck do you manage to iterate in your experiments efficiently?"

We typically submit large batches of jobs per iteration, and when we are satisfied with the results, we cancel the jobs that are still queued; jobs that are already on clients will continue to run. Having short turnaround times and machines that are continually crunching and networked would obviously make this more efficient.

"how do you ensure that you don't spend a whole two weeks waiting for a run to complete only to find out that there was a typo in the input sequences somewhere?"

We try to be careful :) and we almost never have to manually type sequences.

"what do you do while waiting for jobs to finish?"

There's always stuff to do. Depending on the researcher, one can prepare more jobs, analyze data, develop new methods, write, refactor, and debug code, think of and do other experiments (computational and/or wet lab), write papers, go to meetings, respond to forum posts, etc. etc. etc.

©2024 University of Washington
https://www.bakerlab.org