On Fri, 14 Sep 2007 13:37:29 -0000, for.fun@laposte.net wrote:
>Hi all,
>
>Last month, my PC began to do strange things:
>
>Programs that were authorized to get through my firewall were suddenly
>considered as new programs and needed to be re-authorized.
>All my DVD burns failed.
>Some of my applications' licenses expired and I was asked to type the
>license again.
>When I copied a file from one disk to another, the file was changed.
>Finally, when I ran an MD5 checking software, the soft gave me a
>different MD5 for the same file each time.
>
>I scanned my computer for viruses and trojans but found nothing.
>I checked my memory using MemTest86 but found nothing.
>I swapped my IDE cables and found the IDE1 to IDE0 failed 4 times out
>of 5.
>
>So I bought 2 IDE cables and a new 512 MB DIMM (I had 256 MB
>installed).
>Changing the IDE cables did not change anything, but replacing the 256
>MB DIMM with my new 512 MB DIMM solved all my problems (I plugged the
>new DIMM into the same socket as the old one).
>
>It proved that the 256 MB DIMM was bad.
Not necessarily. While it is possible for memory to fail,
presuming this was a system that had been stable (had you
ever checked it for memory errors before this point with
memtest86+, rather than a tester that runs on top of a large
OS consuming a lot of memory?), it is more likely that
something new happened to make the memory subsystem
unstable.

It is important to note whether the memory addresses with
errors are seemingly random, or always (and only) the same
ones over and over again. A program like memtest86+ will
show you this.

If it is always the same addresses and none appear and
disappear from an error state, it is most likely a physical
problem with the memory module. If the addresses vary at
all, it is more likely the motherboard has become unstable
to a small (perhaps progressively worsening) extent. When
swapping in a different module brings relief in that case,
it tends to be one of two things: either the memory-to-slot
contact was poor and has been improved, or the new memory
has a larger stability margin than the old one did. If it
is the latter and the memory subsystem is progressively
getting worse, you may eventually find the new module
similarly unstable.
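
To make the fixed-versus-varying distinction concrete, here is a
minimal Python sketch. It assumes you have noted the failing
addresses from each test pass by hand; the function name and the
input format are made up for illustration and are not something
memtest86+ produces.

```python
def classify_error_addresses(passes):
    """Classify failing addresses recorded over several test passes.

    `passes` is a list with one collection of failing addresses per
    pass (a hypothetical, hand-recorded format).
    """
    sets = [frozenset(p) for p in passes]
    if not any(sets):
        return "no errors"
    if len(sets) < 2:
        # One pass cannot show whether errors come and go.
        return "inconclusive"
    if len(set(sets)) == 1:
        # Identical failing addresses on every pass: points at the module.
        return "fixed addresses"
    # Addresses appear and disappear: points at marginal stability.
    return "varying addresses"


if __name__ == "__main__":
    print(classify_error_addresses([{0x0A3F10}, {0x0A3F10}, {0x0A3F10}]))
    print(classify_error_addresses([{0x0A3F10}, set(), {0x1B2200}]))
```

The classification is only as good as the number of passes
recorded; a handful of passes can easily miss an intermittent
error.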
>
>I do not understand why Windows XP OS did not alert me : I know that
>HD controllers and even RAM include CRC check, parity check and
>probably other security algorithms.
For the OS to do this, it would have to read back everything
written, or you'd need ECC memory.
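
As an illustration of what "read back everything written" means
at the application level, here is a rough Python sketch that
copies a file and then compares hashes of source and copy. The
function names are hypothetical. Note the assumptions: the
read-back may be served from the OS file cache rather than the
disk, and the hashing itself runs in the same possibly faulty
RAM, so this is a sanity check and not a substitute for ECC.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then re-read both and compare digests."""
    shutil.copyfile(src, dst)
    src_digest = sha256_of(src)
    dst_digest = sha256_of(dst)
    if src_digest != dst_digest:
        raise OSError(f"copy of {src} to {dst} does not match the source")
    return dst_digest
```

A mismatch tells you something between the two reads went wrong,
but not whether the disk, cable, controller, or memory is at
fault.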
>Instead of this, my system let me copy and consequently corrupt many
>of my files.
>My system was so unstable that I had to install it again from
>scratch. Moreover, I can no longer trust the files that are stored
>on my disk.
Agreed, BUT if these files were written a fair amount of
time ago, when the system had not yet exhibited any signs of
instability, the odds are fair that the files are mostly, if
not entirely intact. Only you can know the applications and
importance of minor errors... in some documents it would be
a minor problem, while in others or in applications it could
be more problematic. Until you are certain the system is
100% stable, including passing a 24 hour memtest86+ test, I
suggest that if you need access to the files, you pull the
hard drive out and copy them off onto other media on
another, known-stable computer.
Above all, do not defrag your hard drive again until you
have some confidence the system is remaining 100% stable.
And if the situation I briefly described above turns out to
be true (that the system's memory subsystem is slowly
becoming less and less stable and the memory module swap is
only a temporary improvement), then you will need to
periodically retest the memory. Frankly, on any critical
system without ECC memory, this should be a regularly
scheduled event.
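
Between full memory tests, a cheap indicator is to repeat the
check that exposed the problem in the first place: hash the same
file several times and confirm the result never changes. The
sketch below is a hypothetical helper in Python, not a
replacement for memtest86+. It assumes the file is not being
modified while it runs, and a clean result proves little, since
a small file exercises very little memory.

```python
import hashlib


def hash_is_repeatable(path, rounds=5, chunk_size=1 << 20):
    """Hash the same file `rounds` times; True if every digest matches."""
    digests = set()
    for _ in range(rounds):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digests.add(h.hexdigest())
    # More than one distinct digest for unchanged data means something
    # in the read or compute path is corrupting it.
    return len(digests) == 1
```

Using a large file (larger than free RAM, if possible) makes it
more likely that the data is actually re-read and passes through
different memory each round.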
>
>
> => Could you tell me how this could happen ?
>
> => Why CRC/parity did not alert me something was going wrong ?
>
> => Does Windows XP OS implements data controls ?
Windows is definitely not fault tolerant. Remember that
even if it were, it still has to run on hardware that must
be stable for an assurance of the integrity and proper
function of any potential "data controls". On unstable
hardware, data generated by any application you are running
can be corrupted long before it is even written to the drive.
>
> => Finally, is there a way to strengthen data control under Windows
>XP so I avoid this problem ?
Use ECC memory, periodically check the memory subsystem,
and periodically check the CPU with a stress test like
Prime95's Torture Test (again needing to run several hours
if the system is important; a less thorough but faster
check would be to run the torture test's "large in place
FFTs" setting). In other words, even a stable main memory
subsystem won't guarantee data integrity if the CPU
produces errors.
>
>
>Thanks in advance for your replies.
>
>
>My config is the following one:
>
>OS: Windows XP Pro SP2
>CPU: AMD Athlon, 1400 MHz (10.5 x 133)
>MB: MSI K7T266 Pro (MS-6380) / MS-6380LE (5 PCI, 1 AGP, 1 CNR, 3
>DIMM, Audio)
>RAM: 512 MB (PC2100 DDR SDRAM)
>GA: ATI Radeon 9550 (RV350)
>BIOS: American Megatrends Inc. v062710 (MS-6380)
>
>IDE HD1: IBM IC35L040AVER07-0 (40 GB, 7200 RPM, Ultra-ATA/100)
>IDE HD2: IBM IC35L060AVV207-0 (60 GB, 7200 RPM, Ultra-ATA/100)
See the following page, which mentions a significant
difference in memory stability resulting from different
timings, determined by a resistor on certain board
versions.
http://www.xbitlabs.com/articles/mai...7t266-pro.html
On a related note, relaxing the memory timings in the BIOS
(setting them to higher numbers) may restore stability,
and/or provide a larger stability margin in cases where
stability is declining over time.
Finally, I can't know about your particular specimen of this
model, but around this era I had an MSI board (also Skt.
462/A) that had a barely stable memory bus, due to MSI
omitting capacitors on the board where there were empty
capacitor positions. I had initially wondered why a _very_
slight overclock so quickly introduced instability with
memory that had shown it could run quite a bit faster at the
same timings on an equivalent motherboard of a different
make and model. Since I had a fair stock of capacitors from
other board failures/repairs during this era, I decided to
see if adding a couple helped. It did improve stability to
at least several MHz higher, but this was long enough ago
that I don't recall the exact numbers, except that I do
recall the default 133 MHz clocked (DDR266) memory rate was
originally unstable a mere 3 MHz higher. At the time I had
not seen the webpage linked above, and IIRC my board was a
revision 2 that was red in color, so I'm not even sure the
resistor issue applied to this different board.
In summary, since data integrity seems to be of high
importance to you, it may be time to think about replacing
the motherboard with one supporting ECC memory, and of
course buying some ECC memory. At this late date I would
not try to reuse the processor; it would be better to
upgrade the entire platform to something modern like Athlon
64 (budget build) or Core 2 Duo (or quad core).