QtWayland compositor very bad performance
-
Hi all,
I'm testing QtWayland performance on an embedded armhf device with an i.MX6Q processor, an SoC that Qt 5 has supported for a long time.
The Qt version is 5.8.0 and the OS is Debian Stretch.
When I run the QtWayland Compositor: examples/qwindow-compositor
$ glmark2-es2-wayland
=======================================================
glmark2 2014.03
=======================================================
OpenGL Information
GL_VENDOR: Vivante Corporation
GL_RENDERER: Vivante GC2000
GL_VERSION: OpenGL ES 3.0 V5.0.11.p8.41671
=======================================================
[build] use-vbo=false: FPS: 119 FrameTime: 8.403 ms
[build] use-vbo=true: FPS: 200 FrameTime: 5.000 ms
<snip-snip-snip>
....
=======================================================
glmark2 Score: 102
=======================================================
When I run the QtWayland Compositor: examples/pure-qml
$ glmark2-es2-wayland
tele@stretch-dev:/opt/qt5/examples/qt3d$ glmark2-es2-wayland
=======================================================
glmark2 2014.03
=======================================================
OpenGL Information
GL_VENDOR: Vivante Corporation
GL_RENDERER: Vivante GC2000
GL_VERSION: OpenGL ES 3.0 V5.0.11.p8.41671
=======================================================
[build] use-vbo=false: FPS: 72 FrameTime: 13.889 ms
[build] use-vbo=true: FPS: 111 FrameTime: 9.009 ms
<snip-snip-snip>
....
=======================================================
glmark2 Score: 79
=======================================================
When I run the good old Weston compositor:
$ glmark2-es2-wayland
tele@stretch-dev:/opt/qt5/examples/qt3d$ glmark2-es2-wayland
=======================================================
glmark2 2014.03
=======================================================
OpenGL Information
GL_VENDOR: Vivante Corporation
GL_RENDERER: Vivante GC2000
GL_VERSION: OpenGL ES 3.0 V5.0.11.p8.41671
=======================================================
[build] use-vbo=false: FPS: 419 FrameTime: 2.387 ms
[build] use-vbo=true: FPS: 682 FrameTime: 1.466 ms
<snip-snip-snip>
....
=======================================================
glmark2 Score: 253
=======================================================
As you can see, the scores with the same glmark2-es2-wayland client are:
C++ QtWayland Compositor 102
QML QtWayland Compositor 79
Pure C Weston compositor 253
Just for comparison, on X11 the glmark2-es2 score is 194 with this board.
Wayland is supposed to be faster than X11, and Weston proves that: it's about 30% faster.
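To put the four scores in proportion, here is a minimal Python sketch that expresses each one as a percentage change against the X11 score. The scores are the ones quoted in this post; the function name `relative_to` is only illustrative. Keep in mind that the glmark2 score is an aggregate over many scenes, so this is a rough comparison and not a profile of where the time goes.

```python
# glmark2 scores quoted in this post (higher is better).
scores = {
    "C++ QtWayland compositor": 102,
    "QML QtWayland compositor": 79,
    "Weston": 253,
    "X11": 194,
}


def relative_to(baseline, table):
    """Return each score as a percentage change against the baseline entry."""
    base = table[baseline]
    return {name: (value - base) / base * 100.0 for name, value in table.items()}


changes = relative_to("X11", scores)
for name, pct in changes.items():
    print(f"{name:26s} {scores[name]:4d}  {pct:+6.1f}% vs X11")
```

With these numbers Weston comes out about 30% above X11, while the C++ and QML QtWayland compositors land roughly 47% and 59% below it.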
But the QtWayland compositors are much worse; the pure QML compositor is actually unusable in this embedded environment.
My questions are:
Is this expected? Can anyone confirm this? Or should I expect better performance? Do I need to check settings or something?
Seemingly everything works well: no errors, no warnings, even the spinning cubes and animals look the same, and it's not sluggish (of course, 72 fps is fast enough for the eye).
In my opinion, Wayland is much more important on embedded than on desktop platforms.
On a desktop PC with a modern NVIDIA or AMD video card and good software, no graphics task on X11 is slow: those GPUs are bloody fast, and the desktop's stupid graphics effects are nothing to them. They only start to struggle with 4K, high-FPS 3D games, but those are not written for X11 or Wayland. So a faster Wayland makes no big difference on a PC.
But on low-end embedded GPUs, Wayland could make a big difference compared to X11.
In theory. But I experience the opposite: the Wayland compositor is much worse than X11, at least if it's QtWayland.
This is a huge disappointment. Will this be any better in Qt 5.9? I mean the QtWayland Compositor class.
Thanks,
Laci
-
Hi and welcome to devnet,
You should rather bring that question to the interest mailing list. You'll find there Qt's developers/maintainers. This forum is more user oriented.
-
I'm finding that my Qt applications have very poor performance on top of the Weston compositor, even with the libEGL connections testing properly and RHI in use. Setup (the time between launching the example "calculator" app, fullscreened, and it showing up on the screen) takes about 3 seconds, with another 2-3 seconds before input is accepted. Once touch input is accepted, there is a ~1/2 second delay between the touch and the response. I have much better performance with EGLFS, but we want to use Wayland (so that we can utilize things like waylandvnc). I will report back once I do some testing with sway and/or the Qt compositor.
-
@nbaldy Hi and welcome to devnet,
You do not say what your device/machine specifications are, nor the versions of Linux, Qt, and Wayland/Weston, nor the graphics setup of your device.
All of these are important pieces to factor in when testing performance.
-
Sorry, yes, good points.
# ./sgx_check.sh
WSEGL settings
[default]
WindowSystem=libpvrDRMWSEGL.so
DefaultPixelFormat=RGB888
#DefaultPixelFormat=RGB565
------
ARM CPU information
processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 597.60
Features : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2
Hardware : Generic OMAP36xx (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000
------
SGX driver information
Version SGX_DDK sgxddk 1.17@4948957 (release) dm37xx_linux
System Version String: SGX revision = 125
------
Framebuffer settings
mode "1280x720"
geometry 1280 720 1280 720 32
timings 0 0 0 0 0 0 0
accel true
rgba 8/16,8/8,8/0,0/0
endmode
Frame buffer device information:
Name : omapdrmdrmfb
Address : (nil)
Size : 3686400
Type : PACKED PIXELS
Visual : TRUECOLOR
XPanStep : 1
YPanStep : 1
YWrapStep : 0
LineLength : 5120
Accelerator : No
------
Rotation settings
0
------
PVR Module information
Module Size Used by
pvrsrvkm 393216 2
------
Boot settings
console=ttyO0,115200n8 rootwait=1 rw ubi.mtd=7,512 rootfstype=ubifs root=ubi0:compu-XXXX mtdoops.mtddev=omap2.nand earlyprintk=ttyO0,115200n8 nohlt omapfb.rotate=0 vram=40M omapfb.vram=20M,1:1M,2:1M omapfb.vrfb=y cma=64MB 5
------
Linux Kernel version
Linux compu-XXXX 5.10.168-1-ctx-g991c5ce91e #1 SMP PREEMPT Fri Apr 7 09:34:04 UTC 2023 armv7l GNU/Linux
------
Weston.ini
[core]
require-input=false
idle-timeout=0
gbm-format=xrgb8888
#gbm-format=rgb565
[output]
name=DPI-1
[libinput]
touchscreen_calibrator=true
calibration_helper=/bin/echo
[shell]
locking=false
animation=none
panel-position=none
close-animation=none
startup-animation=none
focus-animation=none
------
/etc/profile.d/qt_env.sh
#!/bin/sh
### QT Environment Variables ###
# export QT_QPA_EVDEV_TOUCHSCREEN_PARAMETERS="rotate=180"
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
export QT_QPA_EGLFS_KMS_CONFIG=/etc/qt6/eglfs_kms_cfg.json
#export QT_QPA_EGLFS_INTEGRATION=eglfs_kms
export QT_QPA_EGLFS_ALWAYS_SET_MODE=1
export QT_WAYLAND_SHELL_INTEGRATION=xdg-shell
# SECCOMP-BPF Sandbox does not work due to unexpected FUTEX_UNLOCK_PI call
# from the pthread implementation. Disable this feature temporarily until
# those issues are resolved.
export QTWEBENGINE_CHROMIUM_FLAGS="--disable-seccomp-filter-sandbox"
export QT_QPA_EGLFS_INTEGRATION=none
export QSG_RHI_PREFER_SOFTWARE_RENDERER=0
export QT_WIDGETS_RHI_BACKEND=opengl
export QT_WIDGETS_HIGHDPI_DOWNSCALE=1
export QT_WIDGETS_RHI=1
export QT_OPENGL_NO_SANITY_CHECK=1
export QT_QPA_PLATFORM="wayland-egl"
export QT_WAYLAND_CLIENT_BUFFER_INTEGRATION="linux-dmabuf-unstable-v1"
export QT_WAYLAND_HARDWARE_INTEGRATION="linux-dmabuf-unstable-v1"
export QT_WAYLAND_SERVER_BUFFER_INTEGRATION="linux-dmabuf-unstable-v1"
export QT_WAYLAND_SHELL_INTEGRATION="xdg-shell"
export QT_WAYLAND_TEXT_INPUT_PROTOCOL="zwp_text_input_v1"
---
Version info:
# weston --version
weston 10.0.2
nsions string:
EGL_EXT_client_extensions EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_KHR_debug EGL_EXT_platform_device EGL_EXT_platform_wayland EGL_KHR_platform_wayland EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless
GBM platform:
MESA: info: Loaded libpvr_dri_support.so
EGL API version: 1.4
EGL vendor string: Mesa Project
EGL version string: 1.4
EGL client APIs: OpenGL_ES
EGL extensions string:
EGL_EXT_buffer_age EGL_EXT_create_context_robustness EGL_EXT_image_dma_buf_import EGL_EXT_yuv_surface EGL_KHR_config_attribs EGL_KHR_create_context EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_EXT_pixel_format_float EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_WL_bind_wayland_display EGL_IMG_cl_image
Configurations:
     bf lv colorbuffer dp st  ms    vis   cav bi  renderable  supported
  id sz  l  r  g  b  a th cl ns b    id   eat nd gl es es2 vg surfaces
---------------------------------------------------------------------
0x01 32  0  8  8  8  8  0  0  0 0 0x34325241--      a     y  y     win,pb
0x02 32  0  8  8  8  8  0  0  4 1 0x34325241--      a     y  y     win,pb
0x03 32  0  8  8  8  8 24  8  0 0 0x34325241--      a     y  y     win,pb
0x04 32  0  8  8  8  8 24  8  4 1 0x34325241--      a     y  y     win,pb
0x05 24  0  8  8  8  0  0  0  0 0 0x34325258--      y     y  y     win,pb
0x06 24  0  8  8  8  0  0  0  4 1 0x34325258--      y     y  y     win,pb
0x07 24  0  8  8  8  0 24  8  0 0 0x34325258--      y     y  y     win,pb
0x08 24  0  8  8  8  0 24  8  4 1 0x34325258--      y     y  y     win,pb
0x09 16  0  5  6  5  0  0  0  0 0 0x36314752--      y     y  y     win,pb
0x0a 16  0  5  6  5  0  0  0  4 1 0x36314752--      y     y  y     win,pb
0x0b 16  0  5  6  5  0 24  8  0 0 0x36314752--      y     y  y     win,pb
0x0c 16  0  5  6  5  0 24  8  4 1 0x36314752--      y     y  y     win,pb
MESA: info: Unloaded libpvr_dri_support.so
Wayland platform:
MESA: info: Loaded libpvr_dri_support.so
EGL API version: 1.4
EGL vendor string: Mesa Project
EGL version string: 1.4
EGL client APIs: OpenGL_ES
EGL extensions string:
EGL_EXT_buffer_age EGL_EXT_create_context_robustness EGL_EXT_image_dma_buf_import EGL_EXT_present_opaque EGL_EXT_swap_buffers_with_damage EGL_EXT_yuv_surface EGL_KHR_config_attribs EGL_KHR_create_context EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image_base EGL_KHR_no_config_context EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_KHR_swap_buffers_with_damage EGL_EXT_pixel_format_float EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_WL_bind_wayland_display EGL_WL_create_wayland_buffer_from_image EGL_IMG_cl_image
Configurations:
     bf lv colorbuffer dp st  ms    vis   cav bi  renderable  supported
  id sz  l  r  g  b  a th cl ns b    id   eat nd gl es es2 vg surfaces
---------------------------------------------------------------------
0x01 32  0  8  8  8  8  0  0  0 0 0x00--      a     y  y     win,pb
0x02 32  0  8  8  8  8  0  0  4 1 0x00--      a     y  y     win,pb
0x03 32  0  8  8  8  8 24  8  0 0 0x00--      a     y  y     win,pb
0x04 32  0  8  8  8  8 24  8  4 1 0x00--      a     y  y     win,pb
0x05 24  0  8  8  8  0  0  0  0 0 0x00--      y     y  y     win,pb
0x06 24  0  8  8  8  0  0  0  4 1 0x00--      y     y  y     win,pb
0x07 24  0  8  8  8  0 24  8  0 0 0x00--      y     y  y     win,pb
0x08 24  0  8  8  8  0 24  8  4 1 0x00--      y     y  y     win,pb
MESA: info: Unloaded libpvr_dri_support.so
---
QT Settings:
export QT_QPA_PLATFORM="wayland-egl"
export QT_WAYLAND_SHELL_INTEGRATION="xdg-shell"
export QT_WIDGETS_RHI=1
export QT_WIDGETS_RHI_BACKEND=opengl
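With this many exports in one profile script, it can be worth checking mechanically which variables are assigned more than once, since in a shell script the last assignment wins. The following is a minimal Python sketch under stated assumptions: the `QT_ENV_SH` string is just a short excerpt copied from the qt_env.sh listing above, and the helper name `collect_exports` is hypothetical. It only reports repeated exports and says nothing about whether any particular value is valid for Qt.

```python
import re
from collections import defaultdict

# Excerpt copied from the qt_env.sh listing above (not the whole file).
QT_ENV_SH = '''\
#export QT_QPA_EGLFS_INTEGRATION=eglfs_kms
export QT_QPA_EGLFS_ALWAYS_SET_MODE=1
export QT_WAYLAND_SHELL_INTEGRATION=xdg-shell
export QT_QPA_EGLFS_INTEGRATION=none
export QT_QPA_PLATFORM="wayland-egl"
export QT_WAYLAND_SHELL_INTEGRATION="xdg-shell"
'''

# Matches active "export NAME=value" lines; commented-out lines do not match.
EXPORT_RE = re.compile(r'^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$')


def collect_exports(text):
    """Map each exported name to the list of values assigned, in file order."""
    assignments = defaultdict(list)
    for line in text.splitlines():
        match = EXPORT_RE.match(line)
        if match:
            name, value = match.groups()
            assignments[name].append(value.strip().strip('"'))
    return dict(assignments)


exports = collect_exports(QT_ENV_SH)
repeated = {name: values for name, values in exports.items() if len(values) > 1}
for name, values in repeated.items():
    print(f"{name} is exported {len(values)} times: {values} (last one wins)")
```

On this excerpt the only repeated variable is QT_WAYLAND_SHELL_INTEGRATION, and both assignments carry the same value, so the repetition is harmless here.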
Results (all taken with the PVRTune server running so that I could look at the results)
- Running (Qt 6.8.3):
rhiwindow on the Weston compositor: ~3.5 fps
rhiwindow on the Qt Fancy compositor: ~3 fps
rhiwindow on the sway compositor: 0.5 fps (~2.3 fps when PVRTune is not running)
rhiwindow without a compositor, using EGLFS: 35 fps
Note: it looks like GLES2 doesn't connect fully in sway:
- 00:00:01.840 [wlr] [render/gles2/renderer.c:704] Failed to create GLES2 renderer
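To make the size of the gap concrete, here is a small Python sketch that converts the rhiwindow frame rates above into per-frame times and into a slowdown factor relative to EGLFS. The figures are the approximate ones reported above; the dictionary keys and the helper `frame_time_ms` are illustrative names only.

```python
# Approximate rhiwindow frame rates reported above (Qt 6.8.3), in frames per second.
fps = {
    "Weston": 3.5,
    "Qt Fancy compositor": 3.0,
    "sway (PVRTune running)": 0.5,
    "sway (PVRTune stopped)": 2.3,
    "EGLFS (no compositor)": 35.0,
}


def frame_time_ms(rate):
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / rate


baseline = fps["EGLFS (no compositor)"]
# How many times slower each setup is than EGLFS.
slowdown = {name: baseline / rate for name, rate in fps.items()}

for name, rate in fps.items():
    print(f"{name:24s} {rate:5.1f} fps  {frame_time_ms(rate):7.1f} ms/frame  "
          f"{slowdown[name]:5.1f}x vs EGLFS")
```

By these numbers, every compositor path is at least ten times slower than EGLFS on the same hardware.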
calculator (maximized):
- Weston: ~1/4 second delay in reaction to touch. Quickly pressing a number 5 times takes 12 seconds to resolve all 5 presses (counting from the end of the last touch).
- Qt Fancy compositor: ~1/4 second delay in reaction; 5 presses take 5 seconds to resolve. Note that maximizing the calculator fails, so the window is not as large as in the Weston case. Error: Can't configure xdg_toplevel with an invalid size QSize(-1, -1)
- Sway: several seconds between press and response.
- Sway, run WITHOUT any RHI components (unset the QT_RHI... variables): ~1/4 second delay in response, ~1 second to resolve all presses. Not utilizing the GPU at all.
- When I turn off PVRTune, there is no noticeable delay in either case.
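As a rough way to compare the calculator results, this Python sketch divides the time needed to resolve the five presses by the number of presses. The totals are the approximate figures listed above, measured from the end of the last touch, so the per-press values are only indicative; the names used are illustrative.

```python
PRESSES = 5

# Approximate seconds needed to resolve all five presses, as listed above.
resolve_seconds = {
    "Weston": 12.0,
    "Qt Fancy compositor": 5.0,
    "sway without RHI": 1.0,
}

# Average time to resolve one press in each setup.
per_press = {name: total / PRESSES for name, total in resolve_seconds.items()}
fastest = min(per_press.values())

for name, seconds in per_press.items():
    print(f"{name:20s} {seconds:4.1f} s per press  ({seconds / fastest:4.1f}x the fastest)")
```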
I think this indicates that the problem is in my EGL setup with Qt rather than in the compositor, because the pure-CPU case (sway without RHI) is very fast. However, it should be noted that the weston-simple-egl application gets around 30 fps when fullscreened and 60 fps at about 1/2 size, and it does utilize the GPU. I will post this information in the PVR forum, as it could be a problem with my PowerVR EGL connection... but it's odd to me that the simple-egl test application in Weston works perfectly well, and that Qt with EGLFS (no compositor) is substantially faster: I get ~35 fps when running rhiwindow and see good GPU utilization.
So it's not Qt->EGL, and it's not Weston->EGL; it's Qt-><any compositor>->EGL that has the slowdown, and in simple applications like the calculator it is even slower than pure-CPU rendering.
EDIT:
I wrote up a post to debug the EGL/PowerVR side here: https://forums.imgtec.com/t/qt-slow-to-connect-to-pvr-using-weston/4167/2
What I noticed while writing it was that there is a big difference in performance depending on whether Qt has to change the displayed calculator number or not. (Pressing "clear" 5x was at least twice as fast as pressing a number 5x.)