On 6/13/2018 6:35 AM, Konstantin Belousov wrote:
> Today I noted that AMD published the public errata document for Ryzens,
> https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>
> Some of the issues listed there looks quite relevant to the potential
> hangs that some people still experience with the machines. I wrote
> a script which should apply the recommended workarounds to the erratas
> that I find interesting.
>
> To run it, kldload cpuctl, then apply the latest firmware update to your
> CPU, then run the following shell script. Comments indicate the errata
> number for the workarounds.

Hi,

tl;dr: The microcode changes seem to fix a hard lockup I was able to
reliably reproduce back in Feb.

The BIOS on my AMD is pretty up to date. I think it has the same
microcode as what's in the ports. x86info -a shows

root@ryzenbsd11:/home/mdtancsa # x86info -a | grep -i microc
Microcode patch level: 0x8001137
root@ryzenbsd11:/home/mdtancsa #

and the same patch level after running the microcode update:

root@ryzenbsd11:/home/mdtancsa # /usr/local/etc/rc.d/microcode_update onestart
Updating CPU Microcode...
Done.
root@ryzenbsd11:/home/mdtancsa # x86info -a | grep -i microc
Microcode patch level: 0x8001137
root@ryzenbsd11:/home/mdtancsa #

However, the dmesg after the microcode update adds this line

AMD Extended Feature Extensions ID EBX=0x1007<CLZERO,IRPerf,XSaveErPtr>

CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.36-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
  Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics

I ran the script

root@ryzenbsd11:/home/mdtancsa # cat fix.sh
#!/bin/sh

# Enable workarounds for erratas listed in
# https://developer.amd.com/wp-content/resources/55449_1.12.pdf

# 1057, 1109
sysctl machdep.idle_mwait=0
sysctl machdep.idle=hlt

for x in /dev/cpuctl*; do
    # 1021
    cpucontrol -m '0xc0011029|=0x2000' $x
    # 1033
    cpucontrol -m '0xc0011020|=0x10' $x
    # 1049
    cpucontrol -m '0xc0011028|=0x10' $x
    # 1095
    cpucontrol -m '0xc0011020|=0x200000000000000' $x
    echo $x
done
root@ryzenbsd11:/home/mdtancsa # sh ./fix.sh
machdep.idle_mwait: 1 -> 0
machdep.idle: acpi -> hlt
/dev/cpuctl0
/dev/cpuctl1
/dev/cpuctl10
/dev/cpuctl11
/dev/cpuctl2
/dev/cpuctl3
/dev/cpuctl4
/dev/cpuctl5
/dev/cpuctl6
/dev/cpuctl7
/dev/cpuctl8
/dev/cpuctl9
root@ryzenbsd11:/home/mdtancsa #

Using a FreeBSD stable from back in Feb, I was able to crash Ryzen and
Epyc based systems
(https://lists.freebsd.org/pipermail/freebsd-stable/2018-February/088439.html)
by generating a lot of traffic between the hypervisor and guests. The
same tests on an Intel based box ran just fine. E.g. start 3 guests in
bhyve (amd64) and run combos of iperf3 between them. It would not take
long before the box would hard lock, i.e. blank screen, no crash dump
etc.

With the latest microcode update, I have been running the same sort of
tests and so far so good. I will let them run overnight to see if
things are now stable on STABLE.

    ---Mike

>
> Please report the results. If the script helps, I will code the kernel
> change to apply the workarounds.
>
> #!/bin/sh
>
> # Enable workarounds for erratas listed in
> # https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>
> # 1057, 1109
> sysctl machdep.idle_mwait=0
> sysctl machdep.idle=hlt
>
> for x in /dev/cpuctl*; do
>     # 1021
>     cpucontrol -m '0xc0011029|=0x2000' $x
>     # 1033
>     cpucontrol -m '0xc0011020|=0x10' $x
>     # 1049
>     cpucontrol -m '0xc0011028|=0x10' $x
>     # 1095
>     cpucontrol -m '0xc0011020|=0x200000000000000' $x
> done
>

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400 x203
Sentex Communications, mike_at_sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada

Received on Wed Jun 13 2018 - 18:41:06 UTC
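
For anyone cross-checking the script against the errata PDF: the
'msr|=mask' form of cpucontrol -m ORs the mask into the MSR, so each
line sets a single bit. The sketch below only does the arithmetic to
turn each mask from the script into a bit index; it does not touch any
MSR. The helper name bit_index is made up for this illustration and is
not part of cpucontrol.

```shell
#!/bin/sh

# bit_index MASK: print the index of the highest set bit in MASK.
# Hypothetical helper for illustration; pure shell arithmetic.
bit_index() {
    mask=$(( $1 ))
    idx=0
    while [ "$mask" -gt 1 ]; do
        mask=$(( mask >> 1 ))
        idx=$(( idx + 1 ))
    done
    echo "$idx"
}

# Masks taken verbatim from the script in this thread.
for m in 0x2000 0x10 0x200000000000000; do
    echo "$m -> bit $(bit_index "$m")"
done
# prints:
# 0x2000 -> bit 13
# 0x10 -> bit 4
# 0x200000000000000 -> bit 57
```

Running cpucontrol -m with only the MSR number (no '|=') against a
/dev/cpuctlN device reads the register back, which should let you
confirm the bits stuck after running the script.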