mike wrote:
> Regarding the trade-off between random & sequential data access for SDR,
> DDR & DDR2 I would like to know if this information that I found at the
> link below is correct (from which I've summarized into this table):
>
> For base clock rate of 200MHz (5ns)
>
>          Initial Random    Subsequent Sequential
>          Data Access       Data Access
>          --------------    ---------------------
> SDR      5 ns              5 ns
> DDR      10 ns             2.5 ns
> DDR2     20 ns             1.25 ns
>
> The reason I'm asking is that I believe this shows that the best
> performance for an application that has a very high proportion of random
> memory accesses will be achieved with DDR-400 memory as opposed
> to DDR2-800. True? By this logic, SDR-200 would be even better although I
> don't think there ever was such a thing. For the purpose of this
> comparison assume processor speed & cache sizes are the same.
>
>
> Source:
>
> http://archives.postgresql.org/pgsql-performance/2006-04/msg00601.php
>
> "Note also what happens when transferring the first datum after a lull
> period. For purposes of example, let's pretend that we are talking about a
> base clock rate of 200MHz= 5ns.
>
> The SDR still transfers data every 5ns no matter what. The DDR transfers
> the 1st datum in 10ns and then assuming there are at least 2 sequential
> datums to be transferred will transfer the 2nd and subsequent sequential
> pieces of data every 2.5ns. The DDR2 transfers the 1st datum in 20ns and
> then assuming there are at least 4 sequential datums to be transferred
> will transfer the 2nd and subsequent sequential pieces of data every
> 1.25ns.
>
> Thus we can see that randomly accessing RAM degrades performance
> significantly for DDR and DDR2. We can also see that the conditions for
> optimal RAM performance become more restrictive as we go from SDR to DDR to
> DDR2. The reason DDR2 with a low base clock rate excelled at tasks like
> streaming multimedia and stank at things like small transaction OLTP DB
> applications is now apparent."
>
While shooting from the hip is fun (and Lord knows I've done it enough myself),
a latency analysis requires more than looking at the RAM interface. The
Northbridge, or memory interface, also plays a part. There can be latency
in the Northbridge itself, and differences between Northbridges. Northbridges
can run sync or async, and that can add an extra cycle or two to the path.
There are other effects which make measuring the latency harder. There
is prefetching activity on some of the latest Intel hardware, and the
tools used to measure latency have to disable the prefetching, as best
they can, to make a measurement. Prefetching could shoot you in the foot
if the app does nothing but random access.

For SDRAM, yes, someone did in fact make "SDR200". You aren't likely to
find sticks of memory with these chips on them (they might be used in
embedded applications, or perhaps the cache on a hard drive controller
uses them). But these run with a 200MHz clock.
http://www.micron.com/products/partd...T48LC2M32B2P-5
The CAS latency spec for those is CAS3. The first cycle latency is
(3 * 5ns) = 15ns.

At DDR400, the best memories were CAS2 (I have some of those).
CAS2 times 5ns gives 10ns, a little faster first cycle than the best SDRAM.
At DDR2-800, the clock is 400MHz (2.5ns), so a CAS4 stick (4 * 2.5ns = 10ns)
has the same latency as a CAS2 DDR400. If you can find some CAS3 DDR2-800
(3 * 2.5ns = 7.5ns), then again, you are ahead by a little bit.
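
To make the CAS arithmetic above easy to repeat for other parts, here is a minimal Python sketch. The function name and the list of parts are made up for illustration; the clock rates and CAS values are the ones quoted in this post, and the calculation covers CAS only, not the rest of the memory cycle or anything the chipset adds.

```python
def cas_latency_ns(clock_mhz, cas_cycles):
    """First-word CAS latency in ns: CAS cycles times the clock period."""
    period_ns = 1000.0 / clock_mhz  # e.g. 200 MHz -> 5 ns
    return cas_cycles * period_ns

# (label, memory clock in MHz, CAS cycles) as quoted in the post.
# DDR400 runs a 200 MHz clock; DDR2-800 runs a 400 MHz clock.
parts = [
    ("SDR-200  CAS3", 200, 3),
    ("DDR400   CAS2", 200, 2),
    ("DDR2-800 CAS4", 400, 4),
    ("DDR2-800 CAS3", 400, 3),
]

for label, clock_mhz, cas in parts:
    print(f"{label}: {cas_latency_ns(clock_mhz, cas):.1f} ns")
```

This only compares the CAS portion of the first access, so treat the numbers as a lower bound rather than a measured latency.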
On Newegg, I see three products at the DDR2-800 CAS3 level. This is the
cheapest of them. Timings 3-4-4-15 (first digit is CAS and is the most
important).
http://www.newegg.com/Product/Produc...82E16820227190

In the chipset itself, there are subtle differences between single
channel and dual channel operation. Some chipsets handle things differently
than others. For example, there were claims that the first cycle latency
of a P4PE single channel board was better than that of a P4P800 dual channel
board run in dual channel mode. There have been review articles
comparing such things, but it would take hours of searching to dig them up.
But really, you have to measure this, rather than looking at the memory by
itself.

The Athlon64, with its built-in DDR2 memory controller (AM2), takes a
different approach than Intel. Since the memory interface
is right on the processor, there is an opportunity to shave off some
latency, after the memory is taken into account. The AMD processor has
a limit as to how low the latency can be set, which is a slight
impediment.

So this is not really an easy question to answer at all.
The best way to answer it is to benchmark representative systems,
and pick the winner that way. When enough money is involved, a
vendor will provide loaner systems for a couple of days, so you can
test.

Also, this latency analysis (looking at CAS only) ignores the
rest of the memory cycle, and what mode the transaction runs in.
I expect most of the time, the controller is doing a burst. I'm
not even sure any more whether you can do a single cycle on a
memory subsystem. You may always be paying for a burst transfer
and throwing away the unused bits. Systems now tend to do things
in cache line sized chunks. So, in terms of "random access
transfers per second", you need to examine how many full memory
transactions fit per second. (This assumes the processor creates
the random requests faster than the memory subsystem can satisfy
them, and thus you are waiting for the memory to become ready again
for the next request.)

This document shows sample timing diagrams. Figure 41 on
PDF page 32 shows a burst write. Notice how the end of the complete
memory transaction is chewing up as much time as the data transfer.
The inverse of the time period for a complete transfer determines
how many of these random accesses you can do per second. Most people
are shocked by just how low this number can be.
http://www.hynix.com/datasheet/Timin...m(Rev.0.1).pdf
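
As a rough illustration of the "inverse of the complete transfer time" point, here is a small Python sketch. The 50ns transaction time and the 8 useful bytes per access are placeholder assumptions, not values read from the datasheet above; substitute the full cycle time from the timing diagram of the part you care about. The function names are invented for this example.

```python
def random_accesses_per_second(transaction_ns):
    """Back-to-back random accesses per second, assuming each one
    costs a complete memory transaction of transaction_ns."""
    return 1e9 / transaction_ns

def useful_bandwidth_mb_s(transaction_ns, useful_bytes):
    """MB/s of data the application actually wanted, when each access
    pays for a full burst but uses only useful_bytes of it."""
    return random_accesses_per_second(transaction_ns) * useful_bytes / 1e6

# Placeholder numbers (assumptions): a 50 ns complete transaction,
# with 8 useful bytes kept out of each cache line sized burst.
transaction_ns = 50.0
useful_bytes = 8

rate = random_accesses_per_second(transaction_ns)
bandwidth = useful_bandwidth_mb_s(transaction_ns, useful_bytes)
print(f"{rate / 1e6:.1f} million random accesses per second")
print(f"{bandwidth:.1f} MB/s of useful data")
```

With those placeholder inputs the useful data rate is a small fraction of the peak sequential bandwidth printed on the memory's label, which is why a purely random workload can look so slow.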

Another thing to note is that modern memory controllers do not use
all the features shown in that document. I presume the reason for
this is that chipset designers have done the analysis, and decided which
features are a win, and which ones aren't. So not all the crazy
DDR2 timing diagrams in that document would be applicable.
Paul