a month ago
@2Cavalier, thank you for that generous offer of time, effort, and hardware!
It would be an interesting data point if other identical CPUs had the same problem, especially at stock settings (just to rule out the overclocking factor in the experiment). If only one chip fails, that'd indicate it's probably something in that particular chip. If they all fail, that'd indicate it's likely related to the CPU microarchitecture, or that somehow your other hardware is involved (motherboard etc.). That information could be helpful, but it probably won't immediately point us to a solution.
Still, not knowing your plans for those CPUs, I would hate for you to lose any value on them by opening their boxes for this test.
Regarding something in your previous post, I doubt the GPU matters, but it is possible. There's always the outside chance of some electrical or thermal communication. Also, different GPU and frame capping configurations can alter CPU timing, which can affect the CPU behavior.
a month ago
As more general info, the crash is happening inside a loop to decide what LOD to use for a bunch of models, and whether they should fade out. Looking at @Benef1cient's apex_crash.txt, I see that iteration 883 crashed processing model 96651. Looking at @TEZZ0FIN0's crash, I see that iteration 450 crashed processing model 24032. Looking at @Vaestroo's crash, he made it to iteration 3768 and model 408479.
There's no way through the loop that doesn't hit the instruction that crashes, so we've had hundreds or even thousands of iterations that didn't crash before we hit one that did.
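To make the shape of that loop concrete, here is a minimal Python sketch of a per-model LOD and fade pass. It is an illustration only, not the game's code: the function name, the distance thresholds, and the linear fade are all assumptions. The point it shows is that every iteration runs the same statements, so a crash at iteration 883 means the same work already succeeded 883 times.

```python
import math
from bisect import bisect_right

def select_lods(models, camera, lod_switch_dist, fade_start, fade_end):
    """Hypothetical LOD pass: one result per model, same code path every time.

    models          -- list of (model_id, (x, y, z)) tuples
    lod_switch_dist -- ascending distances at which the next (coarser) LOD starts
    fade_start/end  -- distances between which a model fades from 1.0 to 0.0
    """
    results = []
    for iteration, (model_id, pos) in enumerate(models):
        d = math.dist(camera, pos)
        # Number of thresholds already passed == LOD index (0 is most detailed).
        lod = bisect_right(lod_switch_dist, d)
        # Linear fade-out; an assumed shape, chosen only for illustration.
        if d <= fade_start:
            alpha = 1.0
        elif d >= fade_end:
            alpha = 0.0
        else:
            alpha = (fade_end - d) / (fade_end - fade_start)
        results.append((iteration, model_id, lod, alpha))
    return results

# Model ids borrowed from the crash reports above; positions are made up.
models = [(96651, (10, 0, 0)), (24032, (150, 0, 0)), (408479, (400, 0, 0))]
lods = select_lods(models, (0, 0, 0), [50, 200], fade_start=300, fade_end=500)
```

There is no branch here that skips the per-model work, which mirrors the situation described above: if the code itself were wrong, you would expect it to fail on the first model, not the 883rd.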
a month ago
Third one today in ~15 games. 1660Ti
cpu: "Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz"
ram: 8 // GB
rax = 1
rbx = 0x0000006DD45BBB80
rcx = 0
rdx = 32
rsp = 0x0000006DD516F890
rbp = 298 // 0x0000012A
rsi = 0x000001DDD88336C0
rdi = 1914 // 0x0000077A
r8 = 0
r9 = 22965 // 0x000059B5
r10 = 447828 // 0x0006D554
r11 = 447828 // 0x0006D554
r12 = 448215 // 0x0006D6D7
r13 = 448215 // 0x0006D6D7
r14 = 1
r15 = 0x000001DB57632410
rip = 0x00007FF61F542DCA
xmm0 = [ [4.0952231e+14, 0, 0, 0], [0x57BA3AAC, 0x00000000, 0x00000000, 0x00000000] ]
xmm1 = [ [1.9501056e+15, 0, 0, 0], [0x58DDB38B, 0x00000000, 0x00000000, 0x00000000] ]
xmm2 = [ [5.7619019, 0, 0, 0], [0x40B86180, 0x00000000, 0x00000000, 0x00000000] ]
xmm3 = [ [2.3596274e+15, 0, 0, 0], [0x59062119, 0x00000000, 0x00000000, 0x00000000] ]
xmm4 = [ [4.2811824e+08, 0, 0, 0], [0x4DCC2487, 0x00000000, 0x00000000, 0x00000000] ]
xmm5 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm6 = [ [4.2811824e+08, 0, 0, 0], [0x4DCC2487, 0x00000000, 0x00000000, 0x00000000] ]
xmm7 = [ [1, 0, 0, 0], [0x3F800000, 0x00000000, 0x00000000, 0x00000000] ]
xmm8 = [ [4284900, 11902500, 23328900, 57608100], [0x4A82C3C8, 0x4B359E24, 0x4BB1FC42, 0x4C5BC1E9] ]
xmm9 = [ [4.2848998e+08, 3.4028235e+38, 3.4028235e+38, 3.4028235e+38], [0x4DCC51E8, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF] ]
xmm10 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm11 = [ [1, 0, 0, 0], [0x3F800000, 0x00000000, 0x00000000, 0x00000000] ]
xmm12 = [ [1, 0, 0, 0], [0x3F800000, 0x00000000, 0x00000000, 0x00000000] ]
xmm13 = [ [1, 0, 0, 0], [0x3F800000, 0x00000000, 0x00000000, 0x00000000] ]
xmm14 = [ [13.8, 0, 0, 0], [0x415CCCCD, 0x00000000, 0x00000000, 0x00000000] ]
xmm15 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
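For anyone reading these dumps: each xmm line shows the same four 32-bit lanes twice, first as floats and then as raw hex bits. A small Python sketch for converting between the two is below. The helper name is made up; it assumes the lanes are IEEE 754 single-precision values, which matches the pairs in the dump above.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

one = bits_to_float(0x3F800000)      # xmm7 lane 0 in the dump
lod_val = bits_to_float(0x415CCCCD)  # xmm14 lane 0 in the dump
flt_max = bits_to_float(0x7F7FFFFF)  # xmm9 lanes 1..3 in the dump
```

The last value is the largest finite float, which is why it prints as 3.4028235e+38 in the dump.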
a month ago
Out of curiosity, would anything from Windows Event Viewer help in finding the cause of the crashes? If so, I could start posting the error from the Event Viewer alongside the crash it happened with.
a month ago
Thanks for the additional information @JorPorCorTTV.
It was a good idea to look at the Windows Event Viewer logs. Unfortunately, those logs basically say "I can't tell you where it crashed". I suspect that's because our anti-cheat is interfering with it.
I looked at your latest apex_crash.txt. It's saying the instruction "test r15, r15" was trying to write memory location 5. That's not a memory location we're allowed to write to, so it crashed. But the "test" instruction is asking "what bits are set", not writing memory... and there's no register that has the value 5, so I don't see any reason for it to complain about that address. It also looks like it's just done this instruction 137 times without crashing, since the loop counter is 137 (and starts at 0).
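For readers who don't know x86: "test a, b" computes a AND b, throws the result away, and only updates status flags. With both operands being registers there is no memory access at all. The Python sketch below models that behavior; the function name and the sample value are illustrative assumptions, and only three of the flags are modeled.

```python
MASK64 = (1 << 64) - 1

def x86_test(a, b):
    """Model of 64-bit `test a, b`: AND the operands, keep only the flags.

    Returns the flags derived from the result. Nothing is stored, and with
    register operands nothing is read from or written to memory.
    """
    result = a & b & MASK64
    return {
        "ZF": result == 0,                             # zero flag
        "SF": bool(result >> 63),                      # sign flag (top bit)
        "PF": bin(result & 0xFF).count("1") % 2 == 0,  # parity of the low byte
    }

# `test r15, r15` is the usual "is this register zero?" check.
flags_null = x86_test(0, 0)
flags_ptr = x86_test(0x0000012345678000, 0x0000012345678000)  # made-up value
```

A typical use is "test r15, r15" followed by a conditional jump that skips some work when the register is zero, which is why a write fault reported on that instruction makes no sense.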
Yet again, it looks like the game is crashing because the CPU did something it wasn't asked to do. =/
I've got to believe there's some explanation for this, but right now it's a big mystery.
a month ago
The code I mentioned is actually LOD for the stationary models, not the characters that move around.
I think the bug in your screenshot is something else. We store lighting at various points in space. We use the closest sample point to light a character. Sometimes you're outside on a roof and the closest point is inside near a dark ceiling. We work hard to avoid this in how we automatically place the sample points, but it isn't perfect.
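A rough Python sketch of that failure mode is below. It is not the engine's lighting code: the sample layout, field names, and brightness numbers are invented for illustration. It only shows how a plain nearest-point lookup can pick an indoor sample for a character standing on the roof.

```python
import math

def nearest_sample(position, samples):
    """Return the lighting sample closest to `position` (straight-line distance)."""
    return min(samples, key=lambda s: math.dist(position, s["pos"]))

# Hypothetical sample points: one just under a ceiling, one out in the open.
samples = [
    {"name": "indoor_ceiling", "pos": (0.0, 0.0, 2.8), "brightness": 0.1},
    {"name": "outdoor_open", "pos": (5.0, 0.0, 6.0), "brightness": 1.0},
]

# A character standing on the roof, directly above the indoor sample.
on_roof = (0.0, 0.0, 3.5)
chosen = nearest_sample(on_roof, samples)
```

Here the indoor sample is 0.7 units away and the outdoor one is more than 5, so the character gets the dark indoor lighting even though they are outside. Distance alone knows nothing about the ceiling in between.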
a month ago
I want to thank you GREATLY for the time that you've spent here. It's VERY rare these days for programmers and developers to engage with gamers like you are doing, and this is very appreciated. Back during the glory days of Usenet (before the flight sim and space sim genres crashed), this was a lot more common. 3dfx devs posting in the 3dfx groups, flight sim pilots and programmers posting, and all of this before the Ultima IX Fiasco too...
In my case, all crashes are purely due to overclocking. And yes, I found out that if I ever get a random CTD (no error or log) or exception breakpoint (log), if I keep playing, eventually I will get an "Internal Parity Error" on any one of the 16 threads (8 physical cores), although it can sometimes take hours to happen. Once I increase vcore or lower clockspeed to the point where these "Internal Parity Errors" NEVER happen, the game never crashes, ever.
Is it possible that the anti-cheat is putting extra load on the CPU? Because I've never seen another game that will cause errors like this if there is any sort of instability anywhere, not even Battlefield 5. Apex Legends is indeed the gold standard for testing CPU stability.
That being said, the people who are crashing at *STOCK* speeds concern me greatly, as that should NOT be happening. Intel validates their chips to run up to 100C at the max single core turbo boost frequency and the max 4, 6 or 8 core turbo boost for that SKU. They should never crash, even with a power virus like Prime95 small FFT FMA3.
I want to ask EVERYONE here who is crashing at STOCK SPEEDS (NOT OVERCLOCKED):
1) Are you using Windows version 1809 or newer?
2) Is your CPU a 7700K or newer?
3) Do you have the Spectre and Meltdown mitigations enabled?
4) Is your BIOS updated and your CPU microcode current?
If the answer to all four is YES, can you please do the following?
Please go here, download this:
then disable all protections.
Then run the game at your stock CPU settings and see if you crash anymore.
Please report back. This is important.
There were reports of early mitigation protections causing internal parity errors on several Intel SKUs *even at idle*, and it may be possible you guys are running into a microcode bug which is making the game crash.
The easiest way to determine if Intel's Meltdown protections are making the game crash is to simply disable them.
This has NO effect if you are using a Windows version which is not protection aware (like 1709 or older).
tl;dr: microcode bugs can cause strange problems. Intel has *PULLED* early microcodes for Spectre protection in the past because in SOME CASES they were actually *DESTROYING* CPUs. Yes, DESTROYING them. (Prema, who is a BIOS modder over on notebookreview's forums, encountered this when testing early protection microcode. That's why some of you may remember those "rolled back" Intel microcode/BIOS patches and Windows updates.)