On Mon, Jan 04, 2016 at 03:34:09AM -0700, shahzaibcb wrote:
> Hi,
>
> We've switched to FreeBSD recently to accommodate large video storage as we
> are running video streaming website. So the job of the FreeBSD is to
> transcode the uploaded videos using ffmpeg and serve them to users via nginx
> webserver but so far our experience is not very good with it. It crashes
> every 2-3 days and we're unable to track down the problem. The server specs
> are pretty high :
>
>
> Supermicro X5690 (12 cores, 24 threads - 2u)
> 96GB RAM
> 12x3TB RAID-10 (HBA-LSI9211)
>
> Here is the screenshot of recent crash :
>
> http://prntscr.com/9er3pk
>
> One thing worth mentioning is, before going down there's no load on server,
> more or less free RAM usually is around 12GB. We've tried following
> solutions so far :
>
>
> - Updated FreeBSD OS
> - Replaced 800W PS with 900W
> - We've reduced CMOS from MAX(26x) to 18x as suggested in this post

Did you try replacing the CPU?

> http://unix.stackexchange.com/questions/60574/determining-cause-of-linux-kernel-panic
>
> The solution we've not performed so far is :
>
> - Disable mca using (hw.mca.enabled: 0) - As we're getting MCA panics.
>
> Here is the crash dump :
>
> [root_at_cw001 /var/crash]# mcelog --no-dmi --ascii --file core.txt.1
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 3 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 3 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 2 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 2 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 3 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 3 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 2 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 2 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
>
> -----------------------------------------------------------------------------------
>
> I showed those Hardware errors to Vendor from whom we purchased Supermicro
> servers . This is what he has to say :
>
> -----------------------------------
> Why do you not made one test environment with CentOS or one other Linux that
> you know to use, and see if you have same errors ??? if not than you know
> that the errors come from OS not from hardware. ( CentOS, RedHead….work
> diferend like FreeBSD – work direct on hardware if you don’t have the right
> kernel settings can the server crashed. CentOS , RedHead…. don’t work direct
> on hardware and distribute the resource load better and you have better
> control and you can better debug one situation)
> -----------------------------------
>
> Now we're on a black hole and unable to find that either issue with FreeBSD
> or Hardware. We're thinking to disable mca in loader.conf but ppl are not
> suggesting it. If you guys can help us, it'd be very kind.
>
>
>
> --
> View this message in context: http://freebsd.1045724.n5.nabble.com/FreeBsd-MCA-Panic-Crash-tp6064691.html
> Sent from the freebsd-current mailing list archive at Nabble.com.
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"

Received on Mon Jan 04 2016 - 19:07:17 UTC
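
The flags that mcelog prints in the quoted dump can be cross-checked by hand against the raw STATUS and MCGSTATUS words. Below is a minimal sketch in Python, assuming the architectural IA32_MCi_STATUS and IA32_MCG_STATUS bit layout described in the Intel SDM. The table and function names are hypothetical and are not part of mcelog or FreeBSD.

```python
# Sketch: decode the STATUS / MCGSTATUS values quoted in the crash dump.
# Assumption: architectural bit layout of IA32_MCi_STATUS and
# IA32_MCG_STATUS. All names below are illustrative only.

MCI_STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

MCG_STATUS_FLAGS = {
    0: "RIPV",    # restart instruction pointer valid
    1: "EIPV",    # error instruction pointer valid
    2: "MCIP",    # machine check in progress
}


def decode_flags(value, table):
    """Return the names of all set bits, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS word into flags and error codes."""
    return {
        "flags": decode_flags(status, MCI_STATUS_FLAGS),
        "mca_error_code": status & 0xFFFF,           # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# Values copied from the dump: "STATUS be00000000800400 MCGSTATUS 4".
status = int("be00000000800400", 16)
mcgstatus = 0x4

decoded = decode_mci_status(status)
mcg_flags = decode_flags(mcgstatus, MCG_STATUS_FLAGS)

print("MCi_STATUS flags:", ", ".join(decoded["flags"]))
# 0x0400 is the code that mcelog reports as "Internal Timer error".
print("MCA error code: 0x%04x" % decoded["mca_error_code"])
print("Model-specific code: 0x%04x" % decoded["model_specific_code"])
print("MCG_STATUS flags:", ", ".join(mcg_flags))
```

The decoded flags (UC, EN, MISCV, ADDRV, PCC, plus MCIP in MCGSTATUS) line up with the text mcelog printed, so the sketch only confirms what the tool already reported; it does not identify which component is at fault.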