forked from mirror/qemu
You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
bbd8dd5e45
Used gvec to translate XVTSTDCSP and XVTSTDCDP. xvtstdcsp: rept loop imm master version prev version current version 25 4000 0 0,206200 0,040730 (-80.2%) 0,040740 (-80.2%) 25 4000 1 0,205120 0,053650 (-73.8%) 0,053510 (-73.9%) 25 4000 3 0,206160 0,058630 (-71.6%) 0,058570 (-71.6%) 25 4000 51 0,217110 0,191490 (-11.8%) 0,192320 (-11.4%) 25 4000 127 0,206160 0,191490 (-7.1%) 0,192640 (-6.6%) 8000 12 0 1,234719 0,418833 (-66.1%) 0,386365 (-68.7%) 8000 12 1 1,232417 1,435979 (+16.5%) 1,462792 (+18.7%) 8000 12 3 1,232760 1,766073 (+43.3%) 1,743990 (+41.5%) 8000 12 51 1,239281 1,319562 (+6.5%) 1,423479 (+14.9%) 8000 12 127 1,231708 1,315760 (+6.8%) 1,426667 (+15.8%) xvtstdcdp: rept loop imm master version prev version current version 25 4000 0 0,159930 0,040830 (-74.5%) 0,040610 (-74.6%) 25 4000 1 0,160640 0,053670 (-66.6%) 0,053480 (-66.7%) 25 4000 3 0,160020 0,063030 (-60.6%) 0,062960 (-60.7%) 25 4000 51 0,160410 0,128620 (-19.8%) 0,127470 (-20.5%) 25 4000 127 0,160330 0,127670 (-20.4%) 0,128690 (-19.7%) 8000 12 0 1,190365 0,422146 (-64.5%) 0,388417 (-67.4%) 8000 12 1 1,191292 1,445312 (+21.3%) 1,428698 (+19.9%) 8000 12 3 1,188687 1,980656 (+66.6%) 1,975354 (+66.2%) 8000 12 51 1,191250 1,264500 (+6.1%) 1,355083 (+13.8%) 8000 12 127 1,197313 1,266729 (+5.8%) 1,349156 (+12.7%) Overall, these instructions are the hardest ones to measure performance as the gvec implementation is affected by the immediate. Above there are 5 different scenarios when it comes to immediate and 2 when it comes to rept/loop combination. The immediates scenarios are: all bits are 0 therefore the target register should just be changed to 0, with 1 bit set, with 2 bits set in a combination the new implementation can deal with using gvec, 4 bits set and the new implementation can't deal with it using gvec and all bits set. The rept/loop scenarios are high loop and low rept (so it should spend more time executing it than translating it) and high rept low loop (so it should spend more time translating it than executing this code). These comparisons are between the upstream version, a previous similar implementation and a one with a cleaner code(this one). For a comparison with o previous different implementation: <20221010191356.83659-13-lucas.araujo@eldorado.org.br> Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-13-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> |
2 years ago | |
---|---|---|
.. | ||
branch-impl.c.inc | 3 years ago | |
dfp-impl.c.inc | 3 years ago | |
fixedpoint-impl.c.inc | 2 years ago | |
fp-impl.c.inc | 2 years ago | |
fp-ops.c.inc | 2 years ago | |
processor-ctrl-impl.c.inc | 2 years ago | |
spe-impl.c.inc | 4 years ago | |
spe-ops.c.inc | 4 years ago | |
storage-ctrl-impl.c.inc | 2 years ago | |
vmx-impl.c.inc | 2 years ago | |
vmx-ops.c.inc | 2 years ago | |
vsx-impl.c.inc | 2 years ago | |
vsx-ops.c.inc | 2 years ago |