On 11/6/18 7:10 PM, John Naggets wrote:
> Thanks to both of you for your detailed information. Since neither of
> you has the intel-microcode package installed, that can't be the
> issue. I do not use that package myself either. So what is left? Well,
> it looks like I am running on older hardware, at least 5 years old,
> and who knows whether this has some kind of influence. It might be
> interesting to get in touch with the hardware manufacturer (DELL?) and
> ask them whether they have other customers with this issue. The only
> problem is that as soon as you mention Debian they will stop listening
> to you :( If I remember correctly, they only take support cases for
> supported commercial Linux distributions, which basically boils down
> to RHEL and SLES... Maybe the DELL forums would be a better
> alternative. I would definitely recommend filing a bug report with
> Debian and maybe even Xen... If you have some kind of stack trace,
> that would also be interesting to see.
Hi all.
We also use Xen on Debian Stretch; here is the info.

Server 1: DELL T330, 4 CPUs, about 2.5 years old, with Intel(R) Xeon(R) CPU
E3-1220 v5 @ 3.00GHz.
Latest Xen package from Debian, intel-microcode 3.20180807a.1~deb9u1, with
kernel 4.9.110-3+deb9u6; DomUs are a mix of stretch and jessie with
kernels 3.16.59-1 and 4.9.110-3+deb9u6.
This one is stable.

Server 2: DELL R740, 6 months old, with Intel(R) Xeon(R) Gold 6132 CPU @
2.60GHz.
Latest Xen package from Debian, intel-microcode 3.20180807a.1~deb9u1,
with kernel 4.9.110-3+deb9u4; DomUs are a mix of stretch and jessie with
kernels 3.16.59-1 and 4.9.110-3+deb9u6.
This one is stable.

Server 3: LENOVO RD650, about 4 years old, with Intel(R) Xeon(R) CPU
E5-2650 v3 @ 2.30GHz, with kernel 4.9.110-3+deb9u4.
Latest Xen package from Debian, intel-microcode 3.20180703.2~deb9u1; DomUs
are a mix of stretch and jessie with kernels 3.16.59-1 and
4.9.110-3+deb9u6, and CentOS with kernel 4.10.
This one is stable.

On all Xen Dom0 servers we have set
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=2048M,max:2048M" and raised the
sched-credit weight of dom0 to 512.

xl sched-credit
Name        ID  Weight  Cap
Domain-0     0     512    0
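
For anyone who wants to try the same two settings, here is a rough sketch of how they could be applied and checked. It assumes a Debian Dom0 with GRUB 2 and the xl toolstack; the commands that change the system are left as comments on purpose, and the helper name dom0_weight is made up for illustration. As far as I know the weight set with xl does not survive a reboot, so it would have to be reapplied at boot.

```shell
#!/bin/sh
# Sketch only: assumes a Debian Dom0 with GRUB 2 and the xl toolstack.

# 1. Pin Dom0 memory. In /etc/default/grub set (value taken from the
#    message above):
#      GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=2048M,max:2048M"
#    then regenerate the GRUB configuration and reboot:
#      update-grub

# 2. Raise the Dom0 credit-scheduler weight to 512:
#      xl sched-credit -d Domain-0 -w 512

# Hypothetical helper: reads `xl sched-credit` output on stdin and prints
# the Weight column (third field) of the Domain-0 row.
dom0_weight() {
    awk '$1 == "Domain-0" { print $3 }'
}

# Usage on a live Dom0:
#   xl sched-credit | dom0_weight
```

The helper only matches on the first column, so extra header lines in the xl output (for example a cpupool summary line) should not confuse it.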

Best regards,
Johnny
>
> J.
>
> On Tue, Nov 6, 2018 at 9:37 AM Roalt Zijlstra | webpower
> <***@webpower.nl <mailto:***@webpower.nl>> wrote:
>
> Hi John,
>
> Yes, we are using PV only and we only run Debian Linux on the
> servers. We still have some DomU Jessie servers running with the
> stock kernel. We did update our Dells to the latest firmware, so
> that includes more recent Intel microcode. But on Debian we have
> not yet enabled the intel-microcode package: we had so much
> instability and so many parameters that could be the culprit that
> we did not want to add another.
> If your server is very busy, I think the chance of a crash is
> higher. We have seen crashes on our active MySQL databases, whereas
> the slave MySQL database server did not crash that quickly. However,
> after using the slave as the primary database for a while (because
> we were debugging the crashed master database), it could very well
> happen that the slave would crash too.
>
> We have done tests with downgrading the Dell firmware (which also
> means using older Intel microcode), but that did not help. So
> having the latest firmware is okay.
> We are now testing a few scenarios:
>
> * one server with an older kernel (4.9.0-4-amd64), with DomU
>   3.16 kernel, which has been running for 16 days now
> * one server with the updated kernel (4.9.0-8-amd64), with
>   DomU 3.16 kernel, which surprisingly has been running for 28 days now
> * one server with the updated kernel (4.9.0-8-amd64), and all
>   DomUs on the backported 4.9 kernel.
>
> It all doesn't really make much sense. We do have the expectation
> that the older kernel will keep on running and that the 4.9 DomUs
> will help to keep the servers alive.
> We have tested with 4.14 and 4.16 kernels (from backports) but
> that did not make a difference in stability.
>
> Best regards,
>
> Roalt Zijlstra
> Teamleader Infra & Deliverability
>
> On Mon, Nov 5, 2018 at 18:24, John Naggets
> <***@gmail.com <mailto:***@gmail.com>> wrote:
>
> It could be as you mention... Are your DomUs PV? I am
> using paravirtualization exclusively, and this specific
> server has the following CPU:
>
> Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
>
> Do you have the intel-microcode Debian package from the
> non-free repo installed on your servers? I currently don't...
>
> J.
>
>
> On Mon, Nov 5, 2018 at 3:04 PM Roalt Zijlstra | webpower
> <***@webpower.nl
> <mailto:***@webpower.nl>> wrote:
>
> Hi John,
>
> It could very well be that it is also restricted to some
> CPUs, but I am inclined to believe that the used DomU
> kernels can influence stability. We did have a pretty
> busy SSL offloader running on a 3.16 kernel, which might
> have caused the crashes.
>
> Just for reference, we have the following two CPUs causing
> us trouble, but I am not sure if it matters.
> Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
> Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
>
> Roalt
>
>
> On Mon, Nov 5, 2018 at 10:45, John Naggets
> <***@gmail.com <mailto:***@gmail.com>> wrote:
>
> Hi,
>
> Thanks for your feedback. I was wondering because I
> have just upgraded a Debian 9 server to the latest
> kernel with the latest Xen packages from the official
> Debian repo. The only difference is that I have an
> older IBM server, already ~7 years old and patched
> with the latest BIOS/UEFI, and so far so good: no crash.
> The uptime is 6 days for now. Here are the details
> about my kernel and xen packages.
>
> ii  xen-hypervisor-4.8-amd64   4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10  amd64  Xen Hypervisor on AMD64
> ii  linux-image-4.9.0-8-amd64  4.9.110-3+deb9u6                          amd64  Linux 4.9 for 64-bit PCs
>
> Regards,
> J.
>
>
> On Fri, Nov 2, 2018 at 7:57 PM Volker Janzen
> <***@janzen.onl> wrote:
>
> Hi John,
>
> the problem is that I cannot provide any metrics
> or logfiles showing an error. I can only tell that
> dom0 is rebooting for a reason that is not logged.
> I have no physical access to the server. I got one
> other report about this kind of issue.
>
> My assumption that the backported patches are the cause
> is based on the current 16-day uptime. Until 16 days ago
> the server rebooted every 3-5 days. It won’t be a
> useful bug report from my point of view.
>
> The other thing is that my two servers are now
> running upstream Xen and kernel, and I might not go
> back to the old versions of both in Debian stretch. The
> other server has always run upstream versions
> and never had a problem; that’s why I updated the
> other, too.
>
>
> Best regards
> Volker
>
>
> On 02.11.2018 at 17:23, John Naggets
> <***@gmail.com
> <mailto:***@gmail.com>> wrote:
>
>> I was wondering if any of you guys reported this
>> bug/issue/problem back to the Debian community?
>> For example on their bugs.debian.org web site?
>>
>> On Thu, Nov 1, 2018 at 1:47 PM Volker Janzen
>> <***@janzen.onl <mailto:***@janzen.onl>> wrote:
>>
>> Hi,
>>
>> I had these crash problems with the Xen
>> version in Debian stretch, too. After 3 to 7
>> days the Xen server rebooted without a log
>> entry or anything else to observe. The
>> problems started when the first patches were
>> applied by Debian. Some updates made it
>> better, the last one worse again. I checked the
>> hard drives and RAM and closely monitored
>> metrics for a possible cause.
>>
>> My solution after no longer suspecting a
>> hardware fault: build upstream Xen 4.11 for
>> Debian stretch. I am currently running this
>> setup with my own build of kernel 4.19. The
>> machines are now stable again.
>>
>>
>> Volker
>>
>>
>> On 29.10.2018 at 13:13, Roalt Zijlstra
>> | webpower <***@webpower.nl
>> <mailto:***@webpower.nl>> wrote:
>>
>>> Hi there,
>>>
>>> Ever since all the Meltdown and Spectre
>>> kernel updates, and possibly also the Xen 4.8
>>> updates, we have been experiencing crashes of
>>> the Dom0 just out of the blue. Sometimes after
>>> 1 day, sometimes after a few days or even 14
>>> days, completely at random.
>>>
>>> We have two Dell P730 servers and two Dell
>>> P720 servers with this behaviour. One thing
>>> is that we updated these machines to the
>>> latest available firmware, because that is
>>> the most secure way. Then we installed
>>> Debian Stretch with Xen 4.8 support.
>>>
>>> We have done several installs, and 4 servers
>>> seem to crash pretty fast while others don't.
>>> In the end we think that we can trace it back
>>> to the xen-4.8.4-pre version being stable
>>> and the xen-4.8.5-pre version being unstable.
>>> This was kinda independent of the kernel that
>>> we were using, 4.14 or 4.9.0-8-amd64. This is
>>> of course all Debian package numbering.
>>>
>>> As a last resort, on one server we updated all
>>> DomU kernels of our Jessie servers on this
>>> Dom0 to 4.9.0 from backports instead of the
>>> 3.16 kernel. For now that seems to work, but
>>> the crashes are random so it could happen
>>> again at any time. The idea is that the 3.16
>>> kernels are completely Spectre & Meltdown
>>> unaware and might cause trouble in Xen
>>> kernel support. I am not sure if this is
>>> true at all, but we are pretty lost as to what
>>> the actual cause is.
>>>
>>> We also tested with CentOS and we also had
>>> these crashes there with certain
>>> combinations of kernel/Xen. The most recent
>>> updates seem to be more stable, though. The
>>> most frustrating part is that there are
>>> absolutely no logs to be found. No kernel
>>> oops or anything... the server just resets and
>>> boots again.
>>>
>>> Are there others experiencing problems like
>>> this? Do you see more frequent server/kernel
>>> crashes on production servers?
>>>
>>> Best regards,
>>>
>>> Roalt Zijlstra
>>>
>>> _______________________________________________
>>> Xen-users mailing list
>>> Xen-***@lists.xenproject.org
>>> <mailto:Xen-***@lists.xenproject.org>
>>> https://lists.xenproject.org/mailman/listinfo/xen-users