QX6700 or XEONs What should you choose for crunching

Message boards : Number crunching : QX6700 or XEONs What should you choose for crunching


Previous · 1 · 2 · 3 · Next

Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32528 - Posted: 12 Dec 2006, 17:40:57 UTC - in response to Message 32522.  

I upgraded my V8 (8 cores) computer to Windows Vista. No slowdown :)

The monster running Vista

It is getting back to the number one position; it will beat my overclocked quad core overnight, I think.

Still no saturation of the front side buses ... that does not happen, Mister Ruiz!
(yep! I have 2 FSBs on this motherboard! The snoop filters are more efficient than HyperTransport's aging protocol.)


who?


And again, Rosetta isn't going to saturate any bus on a machine with a decent L2 cache, so why are you going on about the FSB/HyperTransport - it's quite clear from previous posts that you DO understand that this is not an issue. It may be an issue for some other applications, but not for Rosetta, so Rosetta makes a very poor example for comparing these things, right?

--
Mats


Because some people are still saying that my front side bus is saturating, and that is not true either ... Got the point?

With the prefetcher, the FSB NEVER saturates; whoever says otherwise is a liar (except in a few known cases that are not realistic, because they are synthetic benchmarks).


who?
Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32529 - Posted: 12 Dec 2006, 17:47:45 UTC - in response to Message 32522.  


I just VTuned the workloads I was crunching on my quad core, and your statement is actually wrong: about 10% of the time is spent on memory loads and stores. It costs nothing on Core 2 because the prefetcher's success rate is high, and everything gets into the cache before the load unit needs it.
I know that Rosetta workloads vary a lot; I guess you misunderstood some part of the algorithm, and there is a bit of pointer chasing going on that requires the FSB or old HyperTransport ...

Remember, each Rosetta workload can be dramatically different, depending on the kind of structure you are processing.
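(For illustration only - this is not Rosetta's code, and the array size, access pattern and timing method are all assumptions - a minimal C microbenchmark of the point about the prefetcher: a sequential walk the hardware prefetcher can stream ahead of, versus a dependent pointer chase where every load waits for the previous one.)

#include <stdio.h>
#include <time.h>

#define N (1u << 22)   /* 4M elements: far bigger than any L2 cache of the era */

static float  a[N];
static size_t nxt[N];

int main(void) {
    /* Build a full-period permutation so each load depends on the previous one. */
    for (size_t i = 0; i < N; i++) { a[i] = 1.0f; nxt[i] = (i * 2654435761u + 1) % N; }

    clock_t t0 = clock();
    float s1 = 0.0f;
    for (size_t i = 0; i < N; i++)       /* sequential: prefetcher hides the latency */
        s1 += a[i];

    clock_t t1 = clock();
    float s2 = 0.0f;
    size_t j = 0;
    for (size_t i = 0; i < N; i++) {     /* pointer chase: full memory latency on each miss */
        s2 += a[j];
        j = nxt[j];
    }
    clock_t t2 = clock();

    printf("sequential %.3fs, chased %.3fs (sums %.0f %.0f)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, (double)s1, (double)s2);
    return 0;
}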

who?
Mats Petersson

Send message
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 32530 - Posted: 12 Dec 2006, 18:11:09 UTC - in response to Message 32529.  


Yeah, ok, there may be some memory traffic, but I've yet to see a single case where Rosetta is even close to saturating the memory bus, which is why I stated that Rosetta is a poor benchmark for whether the bus is "efficient" or not. Can we agree on that?

--
Mats
Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32532 - Posted: 12 Dec 2006, 19:16:52 UTC - in response to Message 32530.  
Last modified: 12 Dec 2006, 19:20:55 UTC


Well, by getting the cache lines into the load units in time, as Core 2 does, you get a 10% performance improvement. 10% is not a small number. HyperTransport inefficiency probably costs the K8 8 to 10% on Rosetta, which is not negligible. I am not even talking about the cost of crossing the NUMA bridge, which slows things down even more. CPU1 fetching data through CPU2's memory controller because of thread migration is the worst design case I have seen. The K8 plus the HyperTransport cross link has good bandwidth, but we learned with the Pentium 4 that bandwidth is not the only thing that matters: predicting and feeding the core with low latency is what matters.




got the point?

who?
Mats Petersson

Send message
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 32534 - Posted: 12 Dec 2006, 19:40:35 UTC - in response to Message 32532.  



1. Automatic memory prefetching has been in K7/K8 processors since the introduction of Athlon MP some 5 or 6 years ago.
2. I still don't see enough cache-misses/read/write requests on my (current sample) to motivate a 10% improvement in Rosetta... Have you actually compared with a K8-based system?


--
Mats

Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32547 - Posted: 12 Dec 2006, 22:48:59 UTC - in response to Message 32534.  



Are you saying that the Rosetta workload fits into 1Mb? I don't think so.

who?
Mats Petersson

Send message
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 32588 - Posted: 13 Dec 2006, 14:42:29 UTC - in response to Message 32547.  



At least sufficiently so as not to notice any delays [assuming you mean 1MB (Mb = megabit, MB = megabyte, in my way of writing things)].

I tried oprofile for "BU_FILL", which is essentially misses in both L1 and L2 caches, and I got around 10 interrupts per second on that [so about 13M fills per second; one fill = 64 bytes => ~800MB/s, which is about 10% of the bus capacity].

If we are only at around 10% usage on the bus (per core), I don't see how it can be improved by 10% - that would be an "infinite" improvement, which is usually not achievable in real-world scenarios.
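(A quick back-of-the-envelope check of that estimate, as a C sketch. oprofile reports sample counts rather than MB/s, so the events-per-sample value and the ~8 GB/s bus figure below are assumptions used only to reconstruct the arithmetic.)

#include <stdio.h>

int main(void) {
    double events_per_sample  = 1.3e6;  /* assumed oprofile counter reset value       */
    double samples_per_sec    = 10.0;   /* "around 10 interrupts per second"          */
    double fills_per_sec      = events_per_sample * samples_per_sec; /* ~13M fills/s  */
    double bytes_per_fill     = 64.0;   /* one cache line                             */
    double mbytes_per_sec     = fills_per_sec * bytes_per_fill / 1e6; /* ~830 MB/s    */
    double bus_mbytes_per_sec = 8000.0; /* assumed ~8 GB/s peak for the platform      */

    printf("fill traffic: %.0f MB/s, about %.0f%% of an assumed %.0f MB/s bus\n",
           mbytes_per_sec, 100.0 * mbytes_per_sec / bus_mbytes_per_sec, bus_mbytes_per_sec);
    return 0;
}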

--
Mats
Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32593 - Posted: 13 Dec 2006, 17:09:39 UTC - in response to Message 32588.  



I am very impressed; it sounds like HyperTransport has a compression algorithm or something, because it shows less memory traffic than any other processor I have used.

You probably want to review your figures again: either your measurement tool is broken, or you forgot to count the prefetched cache lines. (Most measurement tools exclude them; the AMD tool does, at least ...)

Now HyperTransport is magic ... cool :) That explains why Voodoo liked it so much before ...

who?
Mats Petersson

Send message
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 32595 - Posted: 13 Dec 2006, 18:03:48 UTC - in response to Message 32593.  



Ehm, first of all, did I even mention anything about HyperTransport in my post? Not so. If the memory management works right (Linux gets this right), the process should allocate memory on the local processor, which means that HyperTransport doesn't come into the picture. The memory controller involved is the local one, so HyperTransport shouldn't get involved for that. [Yes, there are obviously snoop messages for each cache line, but those should be fairly short compared to a cache line.] Of course, there is indeed no guarantee that the process is kept on the same processor, in which case cross-processor traffic starts to affect things.
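(As an aside, a minimal sketch of what "keeping it local" looks like on Linux - this is not something BOINC or Rosetta does by itself, and the CPU number is an arbitrary assumption: pin the process to one core, and the kernel's first-touch policy then keeps its pages behind that socket's memory controller, leaving HyperTransport with only the coherency snoops.)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                 /* CPU 0 is just an example choice */
    /* Pin the calling process; later allocations stay on the local node. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0\n");
    return 0;
}

(The same idea from the shell would be something like "numactl --cpunodebind=0 --membind=0 <command>", assuming numactl is installed.)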

I did not find any (different) way to measure the actual memory transfers, so you may be correct that there are prefetches there...

Just out of curiosity, what sort of memory transfers are you seeing on Rosetta (in MB/s for example)?

By the way, the working set where Rosetta is most active is an array of 300 x 6 x 4 bytes. It does reference a whole bunch of other variables in the function, but that's by far the largest one. 300 x 6 x 4 is MUCH smaller than 1MB.

There is lots of other data, but it's very rarely accessed in general.
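(For scale, the arithmetic on that hot array, using the 300 x 6 x 4 figure above; the cache sizes below are assumed typical values for the parts being discussed.)

#include <stdio.h>

int main(void) {
    size_t hot_bytes = 300 * 6 * 4;   /* 7,200 bytes, roughly 7 KB      */
    size_t l1_bytes  = 64 * 1024;     /* typical K8 L1 data cache       */
    size_t l2_bytes  = 1024 * 1024;   /* the 1 MB L2 mentioned earlier  */

    printf("hot array: %zu bytes = %.1f%% of a 64 KB L1, %.2f%% of a 1 MB L2\n",
           hot_bytes, 100.0 * hot_bytes / l1_bytes, 100.0 * hot_bytes / l2_bytes);
    return 0;
}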

--
Mats
Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32612 - Posted: 14 Dec 2006, 0:26:49 UTC - in response to Message 32595.  



So, I guess, the 155 MB around it are there just for fun (yep ... the executable allocates 140 to 150 MB ...)

I see a different story than yours. I am in the process of getting the source code; I'll let you know later.


who?
Mats Petersson

Send message
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 32633 - Posted: 14 Dec 2006, 10:48:50 UTC - in response to Message 32612.  



Yes, my memory usage is in the 100-150 MB range too. But a lot of that is rarely touched. Just the binary uses 18MB of static memory, and there's surely a bunch of dynamically allocated data as well. But as I say, much of that is never touched. When I call that the most active area, I mean I measured about 60% of the overall time being spent there. There are several other functions that take a fair number of cycles, but they aren't really spending much time on memory fetches...

--
Mats
Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32646 - Posted: 14 Dec 2006, 18:54:47 UTC - in response to Message 32633.  
Last modified: 14 Dec 2006, 19:40:24 UTC




In your opinion, what makes the K8 so slow on Rosetta?
In the meantime, when memory matters "for sure", HyperTransport does not improve anything: on SETI, the first AMD system with 4 sockets (and 4 memory controllers) sits at 79 on the top list.
Nobody can deny that the SETI workload uses more than the cache size ... changing your system's memory timings changes the RAC dramatically.
Conclusion: even when memory matters, HyperTransport is hyper-useless.

Who?

Mats Petersson

Send message
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 32694 - Posted: 15 Dec 2006, 12:22:33 UTC - in response to Message 32646.  




I haven't looked at SETI at all - it's not a project whose inner workings I took much interest in, and I'm not participating there at all any more.

When I compared machines of the same speed and architecture with different numbers of CPUs, the performance per MHz was very similar.

In my view, most of the time is spent waiting for the math processing to finish, so if K8 had a faster math unit, it would help performance. Obviously, there may be other parts that are important too. [And the actual behaviour may vary depending on the actual type of calculations performed, as some types of workunits perform different tasks and run different bits of code].

Of course, if you change the architecture, there's no doubt that a different architecture will perform differently (assuming it's not just a straight copy of an older architecture). There's no denying that the new Core 2 technology is good - I have never said otherwise. Compared to my 6.3 +/- 5% credits average per MHz per hour, your quad core is getting around 7.1 credits per MHz per hour (assuming it's running at 2.66GHz). The dual core is also close, hitting 7.0, assuming that you're still running at 4.0GHz.

That's a good 10% per GHz better performance.
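(For what it's worth, one plausible way to arrive at figures like those - the formula and the RAC numbers below are assumptions for illustration, not necessarily how these values were actually computed.)

#include <stdio.h>

/* Average credit per day, spread over hours, cores and clock speed. */
static double credit_per_ghz_hour(double rac, int cores, double ghz) {
    return rac / 24.0 / cores / ghz;
}

int main(void) {
    /* Both RAC values are hypothetical placeholders. */
    printf("K8 dual core at 2.4 GHz:  %.1f credits per GHz-hour per core\n",
           credit_per_ghz_hour(725.0, 2, 2.4));
    printf("Core 2 quad at 2.66 GHz:  %.1f credits per GHz-hour per core\n",
           credit_per_ghz_hour(1800.0, 4, 2.66));
    return 0;
}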

--
Mats


Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32731 - Posted: 16 Dec 2006, 1:28:53 UTC - in response to Message 32694.  

Yep ... and my machine just passed RAC = 2800 ... without HyperTransport ;-) :-P

who?

The_Bad_Penguin
Avatar

Send message
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 32732 - Posted: 16 Dec 2006, 3:01:42 UTC - in response to Message 32731.  

Well... with or without HTT, glad you're on Team Rosetta!


zombie67 [MM]
Avatar

Send message
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 25
Message 32757 - Posted: 16 Dec 2006, 16:15:19 UTC

@who?: When are you gonna try out your 8-way monster on SETI? I am very interested in seeing what it will do with an optimized app. Speaking of optimized app....any news on the one you were working on?
Reno, NV
Team: SETI.USA
Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32761 - Posted: 16 Dec 2006, 18:08:26 UTC - in response to Message 32757.  


I am done with my optimization for SETI; I am waiting for the library I wrote and use to become "officially" public ... You know ... license stuff, to ensure I get my cute little license and transfer the right to use it to Berkeley.
(Due to my position, I have to do this :( )
Intellectual property is a headache, especially when you want to authorize "anybody" to use it, because some usages are not OK ...
The SSSE3 version is up and running; if you dig, you could figure out that I did test it :D and it is screaming fast.
The SSE4 version for next summer is ready too. Get ready for another exciting ride on this one :)

I am afraid to go back to SETI, because The Inquirer will blast again "an Intel guy is looking for aliens". Even though I feel comfortable with this, I can understand that investors could see a little craziness in it, so I don't want my company to get hit by a tabloid-effect phenomenon where people get misled about my company. My employer is not responsible for my hobbies, but it can be hurt if I am not careful.
So I am doing the proper license work; thanks to some co-workers for helping me with the license, even though it is not work related.


who?
You'll get the SSSE3 version around Jan 2007.
zombie67 [MM]
Avatar

Send message
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 25
Message 32779 - Posted: 17 Dec 2006, 1:20:07 UTC - in response to Message 32761.  

Thanks for all the info! Are you referring to Intel® Math Kernel Library 9.0? I think that is out now. It's all greek to me, so maybe you are talking about something else.

I understand your PR concerns. I am sure someone will be glad to use your application, once you release it, to show off what a dual quad can really do.
Reno, NV
Team: SETI.USA
Profile Who?

Send message
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 32781 - Posted: 17 Dec 2006, 1:45:03 UTC - in response to Message 32779.  


I was referring to my pattern-matching algo in SSSE3 (how SETI matches the sub-harmonics of the FFT).

who?
Michael G.R.

Send message
Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 32825 - Posted: 17 Dec 2006, 21:07:53 UTC

Rosetta needs you more than SETI ;)
Previous · 1 · 2 · 3 · Next

Message boards : Number crunching : QX6700 or XEONs What should you choose for crunching


