compression of files

Message boards : Number crunching : compression of files

Profile Runaway1956

Joined: 5 Nov 05
Posts: 19
Credit: 535,400
RAC: 0
Message 12649 - Posted: 25 Mar 2006, 2:54:45 UTC
Last modified: 25 Mar 2006, 3:00:54 UTC

I have 4 computers running, with a combined RAC of 850 (at the moment). I'm downloading a heckuva lot of bits and bytes - on a dial-up connection.

I realize that most people these days are on broadband, and they don't realize how BIG a 2 MB file is. But it takes minutes to download each file on 56k. In fact, most of my connection hours seem to be BOINC Rosetta downloads. :(

Would it be possible to compress the download files, so they download faster?

Even the really fast broadband users might benefit, considering that some have bandwidth restrictions in their contracts. (Exceed "x" gig download limit, pay a premium - that type of thing.)

I opened one of the 2 MB .gz files at random, using WinRAR. It extracted a 6 MB file, which I recompressed with WinRAR at its best compression setting.

hom001_aa1ten_09_05.200_v1_3.gz starts out at 2,152,755 bytes

hom001_aa1ten_09_05.200_v1_3 extracted is 6,581,169 bytes.

hom001_aa1ten_09_05.200_v1_3.rar is 918,056 bytes.

In other words, I can compress those files down to ~43% of the size they are being shipped at - meaning my downloads would only take ~43% of the time they are now taking on the dial-up.

Not to mention, some of the big league crunchers might save a couple dollars a month on their broadband connections.

Anyone can run the same test - just grab a file or six at random from your Rosetta project folder, and see what you can do with them. (copy them somewhere else to play with them - don't mess up your Rosetta folder. ;) )
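
If you'd rather script the comparison, here's a rough Python sketch of the same idea. It uses Python's standard-library bzip2 and LZMA compressors rather than WinRAR, so the numbers will differ from my RAR figure above, but the comparison against the shipped gzip size works the same way:

    import bz2, gzip, lzma, os

    # File name is just the example from above - copy the file out of the
    # Rosetta project folder first so the original stays untouched.
    src = "hom001_aa1ten_09_05.200_v1_3.gz"

    with gzip.open(src, "rb") as f:
        raw = f.read()                   # the ~6 MB uncompressed payload

    shipped = os.path.getsize(src)
    results = {
        "gzip (as shipped)": shipped,
        "bzip2, level 9": len(bz2.compress(raw, 9)),
        "xz/LZMA": len(lzma.compress(raw)),
    }
    for name, size in results.items():
        print(f"{name:18} {size:>10,} bytes  ({size / shipped:.0%} of the shipped .gz)")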

What do ya say, folks? Can we get better compression on those .gz files, please????

Note that if my RAC goes up much more, the dial-up connection won't keep up with the crunching. :(


ID: 12649
Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 12659 - Posted: 25 Mar 2006, 6:29:49 UTC

Already discussed Here


ID: 12659
Profile Runaway1956

Joined: 5 Nov 05
Posts: 19
Credit: 535,400
RAC: 0
Message 12680 - Posted: 25 Mar 2006, 13:11:59 UTC

Thanks Scribe.

I read the links you supplied, as well as doing some browsing before I found your post. ;)

I have gone into preferences, and set my target runtime to the highest - 1 day.

Probably won't see any difference for a day or two - maybe by Monday or Tuesday I can report that I'm using less bandwidth to do as much or more work.

I certainly hope that David and crew are considering a change to compression methods - Gzip looks like it is probably the best solution to that specific problem.

It's still not clear to me how that CPU runtime thing will reduce bandwidth - I'm off to read some more, lol
ID: 12680
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 12684 - Posted: 25 Mar 2006, 14:11:41 UTC - in response to Message 12680.  

......It's still not clear to me how that CPU runtime thing will reduce bandwidth......

It is designed to allow you to run any particular WU for a longer time. So instead of downloading, say, 10 WUs to get 4 days' work, you could download a single WU and run it for 4 days. The bandwidth of any single WU is the same; you just have to download fewer of them.
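
To put rough numbers on it - these are assumed figures for illustration (a ~2 MB download per WU, as in the example above), not official project sizes:

    # Back-of-the-envelope sketch: how the target-runtime preference changes
    # download volume for the same amount of CPU time. Both constants are
    # assumptions, not project figures.
    MB_PER_WU = 2.0      # rough download size per WU
    CPU_HOURS = 96       # four days of crunching on one core

    for target_hours in (2, 8, 24):
        wus = CPU_HOURS / target_hours
        print(f"{target_hours:>2} h per WU -> {wus:4.0f} WUs -> ~{wus * MB_PER_WU:.0f} MB downloaded")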

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 12684
Profile Runaway1956

Joined: 5 Nov 05
Posts: 19
Credit: 535,400
RAC: 0
Message 12688 - Posted: 25 Mar 2006, 15:14:26 UTC - in response to Message 12684.  

......So instead of downloading, say, 10 WUs to get 4 days' work, you could download a single WU and run it for 4 days. The bandwidth of any single WU is the same; you just have to download fewer of them.

OK, my first thought was that this would kill my RAC. But that's not true, as credit = computer time x benchmark, roughly speaking.
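
Roughly, in Python terms (the scale factor and benchmark value here are made up, just to show the proportionality - not BOINC's real formula):

    # Claimed credit scales with CPU time x benchmark, so the same total hours
    # earn about the same credit whether they come from many short WUs or a few
    # long ones. k and bench are arbitrary illustrative numbers.
    def claimed_credit(cpu_hours, benchmark, k=1.0):
        return k * cpu_hours * benchmark

    bench = 2.5
    many_short = sum(claimed_credit(2, bench) for _ in range(12))  # 12 x 2-hour WUs
    one_long = claimed_credit(24, bench)                           # 1 x 24-hour WU
    print(many_short, one_long)                                    # same total -> same credit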

Second thought, what's the purpose in crunching the same unit longer? I'd have to go back and read to get it exactly right - but I'm giving the computer more time to build more models - sort of double and triple checking the work. I'll read more, but that's the idea I get from it.

So, pushing the runtime up actually helps the science, and costs me nothing in credits, right?

I like it. ;)
ID: 12688
Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 12689 - Posted: 25 Mar 2006, 15:24:10 UTC - in response to Message 12688.  

......So, pushing the runtime up actually helps the science, and costs me nothing in credits, right?



....right! :thumb

ID: 12689
Profile dcdc

Joined: 3 Nov 05
Posts: 1831
Credit: 119,586,475
RAC: 9,872
Message 12696 - Posted: 25 Mar 2006, 19:15:58 UTC - in response to Message 12688.  

......Second thought, what's the purpose in crunching the same unit longer?......

You can run lots of tests on each work unit - the more you run on each, the fewer WUs you have to download ;)

HTH
Danny
ID: 12696
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 12701 - Posted: 25 Mar 2006, 20:18:13 UTC

At the moment, we participants are looking for 10,000 models for each of the WUs that are released. (When we get lots more participants, they'd like to bump that up to 100,000 for certain types of WUs which aren't being resolved very well with only 10,000 models.)

The project doesn't care if 1 model is returned from 10,000 machines, 10 models are returned from 1000 machines, 100 models are returned from 100 machines, or 1000 models are returned from 10 machines; as long as they all get returned in time.

Welcome aboard.. and have fun reading through all the discussions on science and project progress to get a good feel for what's going on here. :)
ID: 12701
Profile Runaway1956

Joined: 5 Nov 05
Posts: 19
Credit: 535,400
RAC: 0
Message 12761 - Posted: 28 Mar 2006, 16:58:26 UTC

Well, it seems to be working.

There was a glitch at first. The WUs didn't seem to want to run when the target time went from 2 hours up to 24. Setting the time back to 8 hours seemed to help, but I eventually had to reset two of the machines. They would just hang - and it didn't seem like that dreaded 1% hangup: one WU got to 12% and hung, another got to 78% and hung.

Restarting the machine would set that particular WU back to 0% - resulting in wasted time, and lost results.

It seems all the machines have finally settled in, doing one WU every 8 hours. And the family has stopped complaining about me hogging all the bandwidth.

I may try 24 hours again - but not for a little while.

Thanks guys - I've gained a lot of knowledge from this thread.
ID: 12761
