Paul wrote:
> E wrote:
>> http://members.localnet.com/~eddie18...i031508-01.dmp
>>
>
> Now, I don't know if it is really telling the truth or not. I suppose
> if I install debugdiag again, I could find out.
>
> *******
> Filename . . . . . . .Mini031508-01.dmp
> Signature. . . . . . .PAGE
> ValidDump. . . . . . .DUMP
> MajorVersion . . . . .free system
> MinorVersion . . . . .2600
> DirectoryTableBase . .0x00039000
> PfnDataBase. . . . . .0x80566e48
> PsLoadedModuleList . .0x805624a0
> PsActiveProcessHead. .0x80568558
> MachineImageType . . .i386
> NumberProcessors . . .2
> BugCheckCode . . . . .0x1000007f
> BugCheckParameter1 . .0x00000008
> BugCheckParameter2 . .0xf7abfd70
> BugCheckParameter3 . .0x00000000
> BugCheckParameter4 . .0x00000000
>
> ExceptionCode. . . . .0x80000003
> ExceptionFlags . . . .0x00000000
> ExceptionAddress . . .0x00000000
> *******
>
> When I look here -
>
> http://aumha.org/a/stop.htm
>
> I get this -
>
> 0x1000007F: UNEXPECTED_KERNEL_MODE_TRAP_M
>
> and that is why I don't trust the result.
>
> OK, I tried Debugdiag, and this is what was returned.
>
> "DebugDiag failed to locate the PEB (Process Environment Block)
> in Mini031508-01.dmp, and as a result, debug analysis for this
> dump may be incomplete or inaccurate."
>
> So maybe it is an actual kernel problem.
>
> Years and years ago, I used to do a lot of problem debugging
> on proprietary computers we used to build from scratch. My
> experience is, if it doesn't "crash once a day", it isn't
> possible to make good progress on fixing it. So if the
> problem is so infrequent as to be non-reproducible in
> a reasonable interval, then changing the hardware config
> may be the best thing for it. You may not get enough
> crashes, to figure it out.
>
> You can run a copy of Prime95, and this may help you determine
> if the motherboard/CPU/RAM has a load dependent problem. When you
> start this, and it offers to "Join GIMPS?", say No and choose
> the Torture Test instead. When the custom dialog comes up, it
> will offer to test some amount of memory (for me, it wants to
> test 760MB or so). You can turn that number down a bit, if
> you want to do a few other things on the machine at the same
> time as the test is running.
>
> http://www.mersenne.org/gimps/p95v255a.zip
>
> Once the custom dialog is set up, start it running. On a P4
> with Hyperthreading, the program should start two test threads.
>
> For bad memory, or a processor with problems, the program can
> detect an error in 10 seconds. (That is for a system overclocked
> into unstable territory.) It will run for hours on a conservatively
> set up system. The program won't tell you what is broken, but it
> will be something in the CPU-Northbridge-Memory area of the
> motherboard.
>
> The longer it runs error free, the "better" your system is. I've
> run it for 16 hours on my old P3 system. But never waited that
> long on more modern systems. When you're done, there are "stop" and
> "exit" options in the left-most menu.
>
> You can also run something like memtest86+ from memtest.org, but
> considering the infrequent errors, memtest86+ is too meek to
> really kick the wheels off the computer. Prime95 does a better
> job of that.
>
> Sometimes, when memory has a low measurable error rate, a
> little extra Vdimm in the BIOS can improve the error performance
> of the memory. (I use 2.7V on my 2.5V DDR memory, and any memory
> should be able to take that much. Winbond BH-5, if memory
> serves, could take 3.3V applied to it, to give an extreme
> example of voltage boost. But they don't make that memory
> any more.)
>
> Alternately, you can adjust the timings in the BIOS, and
> loosen them a bit. Say you had 2.5-3-3-6 memory, you could
> try 3-4-4-8 and see what happens. If you made such an adjustment
> in the BIOS, your first test would be to use memtest86+, as
> a quick check that you didn't mis-adjust anything. You don't
> want to boot Windows, if memory is messed up badly.
>
> Otherwise, your instinct, of removing the add-in cards and
> simplifying the setup, may be a next step. But as long as
> the crash rate of the box is low, it'll be a bugger to find.
>
> You can also do a quick visual check for bad caps (bulging
> tops or leaking electrolyte from the aluminum cylinders),
> but that isn't likely to be the problem. But since a
> visual check is fast and cheap, it is worth a look.
>
> http://www.badcaps.net/images/caps/kt7/image004.png
>
Thanks for spending some time on this. I am now reading
http://aumha.org/a/stop.htm as well as copying and pasting some of the
hex numbers into Google. Of all the systems I've built from scratch
(about 4) and worked on, I've never had one that had these sparse yet
persistent blue screen stop errors. So I've never had to try any of this
type of debugging.
I'm hoping to find that one of the addresses listed in the debug info
you posted for me will be just one in a range of addresses for a piece
of hardware. BugCheckParameter2 (0xf7abfd70) might have some
significance. Also 0x1000007F: UNEXPECTED_KERNEL_MODE_TRAP_M is worth
looking into.
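
As a starting point for reading those numbers, here is a minimal Python
sketch of how the stop code and first parameter from the dump above could
be decoded. It assumes that 0x1000007F is the ordinary 0x7F stop code with
a high 0x10000000 bit marking the "_M" variant, and that parameter 1 is the
x86 trap number (0x8 being a double fault). The function name, the constant
name, and the short trap table are hypothetical and only there for
illustration; the table is deliberately incomplete.

```python
# Illustrative only: helper and constant names are made up for this sketch.
# The numeric inputs are the values shown in the dump listing above.

MINIDUMP_FLAG = 0x10000000  # assumption: high bit that marks the "_M" variant

# A few well-known x86 exception vectors (not a complete list).
X86_TRAPS = {
    0x00: "divide error",
    0x06: "invalid opcode",
    0x08: "double fault",
    0x0D: "general protection fault",
    0x0E: "page fault",
}


def decode_bugcheck(code, param1):
    """Split a stop code into (base code, has _M flag, trap name or None)."""
    base = code & ~MINIDUMP_FLAG
    is_m_variant = bool(code & MINIDUMP_FLAG)
    # Parameter 1 is only read as a trap number for the 0x7F stop code.
    trap = X86_TRAPS.get(param1, "unknown trap") if base == 0x7F else None
    return base, is_m_variant, trap


base, is_m_variant, trap = decode_bugcheck(0x1000007F, 0x00000008)
print(hex(base), is_m_variant, trap)
```

Run against the values in the dump, this reports stop code 0x7f, the _M
flag set, and a double fault. It says nothing about which driver or piece
of hardware caused the fault.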
I have also downloaded this free GUI application from MS...
http://www.microsoft.com/whdc/devtoo...allx86.mspx#E3
In addition to the three Dell-supplied PCI cards, she is still using the
old Dell CRT monitor, and maybe even the mouse and keyboard that came with
the Dell system. Maybe one of these is the culprit.
I may also check whether there is a BIOS update for the board, remove the
three unneeded PCI cards, update the device drivers for the remaining
hardware, and set the BIOS to disable memory caching.
The current Windows XP Home install is the original install. I've never
reinstalled the OS. It's only been updated through Windows Update. The
system has never been overclocked, although the board was designed with
this in mind. The BIOS is set up conservatively. The CPU came as a boxed set
with the Intel heat sink/fan, and runs at ~33C at idle. The hard drive is a
Seagate Barracuda SATA which should be pretty reliable. Visual
inspection was one of the first things I did when I opened it up this
time. The caps all look like they are intact to my eyes and I can't see
where anything is shorting to ground or to other hardware in the case. I
ran memtest before I changed out the system memory the first time, and
it checked out ok. Still I replaced it.
This may just be coincidence, but I have found a couple of others on
web-based message boards with similar problems who have this same Asus
motherboard. But Asus boards are highly regarded and it would be the last
thing that I would suspect.
Thanks again
Eddie