marki
2018-09-14 11:04:34 UTC
Hi,
We're having trouble with a dd "benchmark". That probably doesn't mean much in itself, since multiple concurrent jobs using a benchmark like FIO work fine, but I'd like to understand where the bottleneck is and why the two setups behave so differently.
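For reference, the kind of concurrent FIO run that does perform fine is roughly the following; the parameters here are an approximation, not the exact job file we used:

# fio --name=seqwrite --directory=/u01 --rw=write --bs=32k --size=2g \
      --numjobs=8 --ioengine=psync --end_fsync=1 --group_reporting

The dd below is the same kind of buffered sequential write, just as a single stream.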
In an ESXi VM (kernel 4.4) it looks like this, and speed is high (iostat output below):
Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
dm-5         0.00     0.00     0.00   512.00     0.00   512.00  2048.00   142.66   272.65     0.00   272.65     1.95   100.00
sdb          0.00     0.00     0.00   512.00     0.00   512.00  2048.00   141.71   270.89     0.00   270.89     1.95   100.00
# dd if=/dev/zero of=/u01/dd-test-file bs=32k count=250000
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 9.70912 s, 844 MB/s
Now in a Xen DomU running kernel 4.4 it looks like this, and speed is low / not what we're used to:
Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
dm-0         0.00     0.00     0.00   100.00     0.00    99.00  2027.52     1.45    14.56     0.00    14.56    10.00   100.00
xvdb         0.00     0.00     0.00  2388.00     0.00    99.44    85.28    11.74     4.92     0.00     4.92     0.42    99.20
# dd if=/dev/zero of=/u01/dd-test-file bs=32k count=250000
1376059392 bytes (1.4 GB, 1.3 GiB) copied, 7.09965 s, 194 MB/s
Note the low queue depth on the LVM device and, additionally, the small request size on the virtual disk.
(As in the ESXi VM there's an LVM layer inside the DomU, but the result is the same whether it's there or not.)
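For scale, avgrq-sz is in 512-byte sectors, so the numbers above work out roughly to:

  ESXi:     dm-5   2048.00 sectors * 512 B ~ 1 MiB per request, ~142 requests queued
  Xen 4.4:  dm-0   2027.52 sectors * 512 B ~ 1 MiB per request, but only ~1.5 queued
            xvdb     85.28 sectors * 512 B ~ 43 KiB per request

What I'd compare between the two kernels inside the DomU are the standard block-layer limits in sysfs (device names as above):

# cat /sys/block/xvdb/queue/max_sectors_kb
# cat /sys/block/xvdb/queue/max_segments
# cat /sys/block/xvdb/queue/nr_requests
# cat /sys/block/dm-0/queue/nr_requests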
Inside Dom0 it looks like this:
This is the VHD:
Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
dm-13        0.00     0.00     0.00  2638.00     0.00   105.72    82.08    11.67     4.42     0.00     4.42     0.36    94.00
This is the SAN:
Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
dm-0         0.00  2423.00     0.00   216.00     0.00   105.71  1002.26     0.95     4.39     0.00     4.39     4.35    94.00
And these are the individual paths on the SAN (multipathing):
Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
sdg          0.00     0.00     0.00   108.00     0.00    53.09  1006.67     0.50     4.63     0.00     4.63     4.59    49.60
sdl          0.00     0.00     0.00   108.00     0.00    52.62   997.85     0.44     4.04     0.00     4.04     4.04    43.60
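If I read the Dom0 numbers right, the small writes coming out of the VHD get merged back into large ones before they reach the SAN:

  VHD:    82.08 sectors * 512 B ~  41 KiB per write, 2638 writes/s
  SAN:  1002.26 sectors * 512 B ~ 500 KiB per write,  216 writes/s, wrqm/s ~ 2423

2423 merged + 216 issued ~ 2639, which matches the 2638 writes/s seen on the VHD, so the requests arrive at Dom0 already split up and are only merged back together further down the stack.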
The above applies to HVM and PVHVM modes with kernel 4.4 in the DomU.
The following applies to an HVM or PV DomU running kernel 3.12:
Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
dm-1         0.00     0.00    41.00  7013.00     0.73   301.16    87.65   142.78    20.44     5.17    20.53     0.14   100.00
xvdb         0.00     0.00    41.00  7023.00     0.73   301.59    87.65   141.80    20.27     5.17    20.36     0.14   100.00
(Which is better but still not great.)
Any explanations on this one?
Thanks
marki