12 posts in this topic


Today, I’m here to explain the Megahertz Myth and why your processor might not be as fast as it should be.

 

I’m using two different dual-core processors:

Pentium D 915 @ 2.8GHz w/ 14.0 multiplier

and

Intel Core 2 Duo E4600 @ 2.4GHz w/ 12.0 multiplier

 

The full details are here:

 

Pentium D 915 Specs

E4600 Specs

 

 

First, you may want to watch the Megahertz Myth presentation by Steve Jobs.

 

 

After that you may be left with a few questions.

Namely, what is the “Processing Pipeline” in the video? That, my friends, is the multiplier.

Most of us are accustomed to overclocking our processors by changing the bus ratio in the BIOS. The multiplier on the processor determines the speed you get with each number.

Since both processors I’m using for this demonstration have an FSB clock of 800MHz, I will overclock from a bus/core ratio of 200 to 266. This should give an FSB clock of 1066.

The processor in my mother’s computer, the Q6600, has a multiplier of 9 and an FSB of 1066, meaning that its bus/core ratio is 266.

 

So, just out of curiosity, here is my little table.

Bus/Core Ratio = FSB

200 800

266 1066

355 1333

 

Back to the article.

 

Here are the base values:

Pentium D 915 @ 3.7GHz w/ 1066MHz FSB

Intel Core 2 Duo E4600 @ 3.2GHz w/ 1066MHz FSB
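For anyone who wants to check those base values: core clock is just multiplier times bus clock. Here is a rough sketch of that arithmetic using the nominal 200 and 266 bus values from this post (the function name is made up for illustration):

```python
def core_clock_mhz(multiplier, bus_mhz):
    """Core clock in MHz = CPU multiplier x bus (base) clock."""
    return multiplier * bus_mhz


# Multipliers as listed earlier in the post
chips = [("Pentium D 915", 14), ("Core 2 Duo E4600", 12)]

for name, mult in chips:
    stock = core_clock_mhz(mult, 200)  # stock bus setting
    oc = core_clock_mhz(mult, 266)     # overclocked bus setting
    print(f"{name}: {stock} MHz stock, {oc} MHz overclocked")
```

That works out to 3724 MHz and 3192 MHz, which round to the 3.7GHz and 3.2GHz figures above.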

Because of laziness, I’m going to use Xbench for the benchmarks.

 

These are the Xbench results for the Pentium D.

 

Results                    166.06
  System Info
    Xbench Version         1.3
    System Version         10.5.2 (9C7010)
    Physical RAM           8192 MB
    Model                  MacPro3,1
    Drive Type             ST3500820AS ST3500820AS
  CPU Test                 148.73
    GCD Loop               260.33    11.81 Mops/sec
    Floating Point Basic   122.89    2.18 Gflop/sec
    vecLib FFT             109.23    2.60 Gflop/sec
    Floating Point Library 137.55    36.27 Mops/sec
  Thread Test              196.08
    Computation            166.76    2.66 Mops/sec, 4 threads
    Lock Contention        200.24    8.27 Mlocks/sec, 4 threads

 

 

 

These are the Xbench results for the Core 2 Duo.

 

Results                    172.06
  System Info
    Xbench Version         1.3
    System Version         10.5.2 (9C7010)
    Physical RAM           8192 MB
    Model                  MacPro3,1
    Drive Type             ST3500820AS ST3500820AS
  CPU Test                 148.73
    GCD Loop               280.88    14.81 Mops/sec
    Floating Point Basic   133.89    3.18 Gflop/sec
    vecLib FFT             109.23    3.60 Gflop/sec
    Floating Point Library 149.01    25.95 Mops/sec
  Thread Test              204.08
    Computation            193.76    3.93 Mops/sec, 4 threads
    Lock Contention        215.57    9.27 Mlocks/sec, 4 threads

 

As you can see, a small difference in multiplier can slow a processor down. The most efficient processor is actually the Q6600. When you overclock its FSB to 1333 it clocks at 3.2GHz.
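One rough way to read the two overall scores is points per GHz. A minimal sketch, using only the overall Xbench results and the overclocked clocks quoted in this post (the helper name is hypothetical, and a single benchmark score is of course a crude yardstick):

```python
# Overall Xbench result and overclocked core clock (GHz), as quoted in the post
results = {
    "Pentium D 915": (166.06, 3.7),
    "Core 2 Duo E4600": (172.06, 3.2),
}


def score_per_ghz(score, ghz):
    """Benchmark points delivered per GHz of core clock."""
    return score / ghz


for name, (score, ghz) in results.items():
    print(f"{name}: {score_per_ghz(score, ghz):.1f} points per GHz")
```

The chip with the lower clock gets more done per GHz, which is the whole point of the Megahertz Myth.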


 

 

So, just out of curiosity, here is my little table.

Bus/Core Ratio = FSB

200 800

266 1066

355 1333

 

That should be:

 

333 1333

 

A bus of 355 would give an FSB of 1420.
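The table is easy to check, assuming the quad-pumped FSB these Intel chips use (four transfers per bus clock). A quick sketch, with a made-up function name:

```python
def fsb_rating(bus_mhz, transfers_per_clock=4):
    """Effective FSB rating for a quad-pumped bus: bus clock x 4."""
    return bus_mhz * transfers_per_clock


for bus in (200, 266, 333, 355):
    print(f"{bus} -> {fsb_rating(bus)}")
```

With the nominal integer bus values this gives 800, 1064, 1332 and 1420. The marketing figures 1066 and 1333 come from the exact bus clocks of 266.67 and 333.33 MHz.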


Here you are comparing apples to oranges. Comparing chips of different core architectures does not prove anything except that one core is better at the test than the other. To get a real measurement, you have to compare cores of the same chip family at different speeds to see if increasing the MHz results in faster/better performance.

 

As you can see, a small difference in multiplier can slow a processor down. The most efficient processor is actually the Q6600. When you overclock its FSB to 1333 it clocks at 3.2GHz.

 

It would be ~3GHz (9x333=2997MHz), and with a single-threaded application an E6320@3GHz will run a test just as fast as a Q6600@3GHz. If the test is multi-threaded, the Q6600 wins hands down because of the extra cores.

Here you are comparing apples to oranges. Comparing chips of different core architectures does not prove anything except that one core is better at the test than the other. To get a real measurement, you have to compare cores of the same chip family at different speeds to see if increasing the MHz results in faster/better performance.

 

Precisely, it just goes to prove the megahertz myth of the Pentium era.


 

Well, there is nothing new about computer manufacturers lying to get a marketing advantage; it's been going on forever. What you're talking about here is precisely why AMD switched to PR ratings in the model numbers of their chips instead of the MHz/GHz values: to show how much work the chip did in relation to a different core architecture.

For the everyday computer shopper, it's all about the numbers. Nobody is going to whip out a dictionary and figure out what all this number-crunching jargon means. They're going to buy the 4 GHz computer before they look at the 3 GHz one. And these are the people who aren't worried about whether it's "really" faster. They just see that number and they're happy. I mean, come on. 4 is always better than 3 unless you're talking about criminal charges. :-)

Here you are comparing apples to oranges. Comparing chips of different core architectures does not prove anything except that one core is better at the test than the other. To get a real measurement, you have to compare cores of the same chip family at different speeds to see if increasing the MHz results in faster/better performance.

 

I don't think you get the point of my article. My article is about how the number of gigahertz does not directly reflect the speed of the processor. My point was that both processors have two physical cores, and the one with fewer gigahertz ran faster than the one with more gigahertz. I am comparing different fruit.

description is absolute {censored}

 

What happens if all the code and data reside in the processor's L1 cache, which runs at chip speed?

 

Accurate prediction of a chip architecture's real-life performance is almost impossible (see how Intel itself erred on this with NetBurst). At the very least, you should define the set of tasks for which you'll try to optimize the chip design, and you should know the bottlenecks of the subsystems which will supply data to the processor.

The Megahertz Myth is still valid. The Core architecture uses 14 pipeline stages, IIRC... So Apple's switch to Intel still holds water, as Intel jumped off the NetBurst route of more and more pipeline stages to facilitate more and more GHz. What is ugly is Prescott, with 31 pipeline stages.

 

But I want to try to explain what pipeline stages, or processing pipelines, are.

Sorry if my English makes this difficult; just ask if something seems wrong or is not understandable.

Because a CPU is an electrical device, it is bound by the limits of electrophysics. As you all should know, it mostly does only one thing: it sets something high or low, i.e. either 1 or 0. This does not happen instantly, simply because it is a real-world chip. It also needs some logic to work out why and when it should switch, because just switching between 1 and 0 does not do anything useful. On an x86 CPU this is done by micro-ops, which are very simple RISC-style operations. But wait, you think, isn't x86 a CISC? Yes, it is. The CPU is fed its "high-level" CISC instructions, and the CPU thinks: great, what do I do with this?

 

Decoding.

The CPU needs to split the instruction up and work out how to execute it as fast as possible (this is of course predesigned at the factory, not worked out at runtime). So we now have smaller units, which can be processed faster. But we still have a lot of units! And we also spent time decoding! Yes, but we now have small units that each take very few clock cycles to run.

 

Pipeline.

To do all this, the CPU uses pipelines, which are divided into stages. Usually each stage takes one clock cycle to complete, so to execute a CISC instruction you have to break it down into units that fit into each stage, including decoding, writeback, etc. So why not have many pipeline stages? Obviously you can do more per cycle if you have more stages?

 

Well, here are some problems:

1. You want to access memory. Luckily it is in the data cache, so the access time is minimal, but suppose it is in main memory. What are the access times? Check your memory's specs. Either way it takes lots of cycles, which means a full stop in the pipeline. A higher frequency does not help, as the memory speed is still the same. A longer pipeline now means that more instructions are on hold.

2. You want to branch in your code. A branch is also an instruction which needs decoding and execution. To keep the pipeline full, the CPU loads what it thinks will be the next instructions in the stream, but oops, that was not where we wanted to branch. So what happened? The pipeline is filled with instructions that we don't want to run, so we need to clear them out and start over with the correctly branched instructions.

3. Sometimes the CPU misses something in the stages: a wrong decode or some other error. These are bubbles, and they halt the pipeline, but it doesn't need to be cleared.
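To put rough numbers on point 2, here is a toy model of the flush cost. Everything in it is an assumption for illustration: the branch fraction and misprediction rate are made up, the flush penalty is taken to be equal to the pipeline depth, and it ignores caches, memory stalls and how many instructions a core can issue per cycle.

```python
def time_per_instruction_ns(clock_ghz, pipeline_depth,
                            branch_fraction=0.2, mispredict_rate=0.1):
    """Average time per instruction under a very simple flush model.

    Ideal pipeline: one instruction completes per cycle. Each mispredicted
    branch costs about `pipeline_depth` extra cycles to flush and refill.
    The default rates are hypothetical, not measured values.
    """
    flush_cycles = branch_fraction * mispredict_rate * pipeline_depth
    cycles_per_instruction = 1.0 + flush_cycles
    return cycles_per_instruction / clock_ghz  # at 1 GHz one cycle takes 1 ns


# Short pipeline at a lower clock vs. long pipeline at a higher clock
short_pipe = time_per_instruction_ns(clock_ghz=3.2, pipeline_depth=14)
long_pipe = time_per_instruction_ns(clock_ghz=3.7, pipeline_depth=31)

print(f"14 stages @ 3.2 GHz: {short_pipe:.3f} ns per instruction")
print(f"31 stages @ 3.7 GHz: {long_pipe:.3f} ns per instruction")
```

With these made-up rates the 14-stage pipeline at 3.2 GHz comes out ahead of the 31-stage one at 3.7 GHz. Set mispredict_rate to 0 and the higher clock wins again, so the deeper pipeline only pays off when branches are predicted well.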
