Improving drawing/painting performance

SysTech

Hi All,

I'm writing an app that will end up being cross-platform between at minimum Mac and PC. Because of the desire for cross-platform support I'm trying to stay away from PC- or Mac-specific graphics enhancements and remain completely within the Qt framework.

The app receives data from and controls a "Software Defined Radio" which is connected over network. The radio is controlled by TCP commands and returns a large amount of data in the form of Vita49 UDP packets.

I'm running into some puzzling performance issues. I realize my questions might be general in nature at this point but I'm trying to understand where the issues come from.

I set up a thread that receives the UDP packets from the radio. This thread gets a packet, decides where to put it and simply adds the packet to a list. It then signals the appropriate object that a new packet of data is available.

In the "foreground" version of the app, these signals travel all the way up to the UI widget, which triggers a repaint. The repaint grabs the latest packet from the list, clears the paint rectangle and graphs the data. The lists are mutex protected.

In the "threaded" version, the UDP thread's signal that new packets arrived goes to worker threads that render new images to be painted to screen. So there is one thread that renders the FFT data, another that renders the bottom part called the waterfall. Once new images are rendered, signals are sent to the UI, which simply gets the image and uses drawImage to move it to screen.

The graph is two parts: the top part is a simple line graph that represents the FFT signal values from the radio plotted against frequency in X. The bottom part is what is called a "waterfall", where over time we add lines representing the signal strength to the top of the list and plot the lines to show a sort of history of the signals. This link will show you what the display looks like:

https://dl.dropboxusercontent.com/u/7578983/panafall.png

Now on to the performance questions:

Threading. The FFT data and waterfall data can arrive pretty fast. The FFT can be up to 30 frames per second that should be drawn. The waterfall can be up to 10 lines added per second. Logically I thought to handle these data rates I should put the rendering of the graphics into threads. So I'd have the UDP thread gathering data, signalling two other threads that render new images to be displayed, then those would send signals to the UI components to simply get the image and paint it. I have this working HOWEVER it is actually significantly slower than drawing in real time. I'm trying to understand why this would be. The UI simply gets an image and does painter.drawImage( 0,0, theImage ); Yet this is much slower than doing all the work in the foreground to draw the graph. Basically in this example the threads begin to stack up images as they are drawing in the background and the UI is simply not able to get them and paint them fast enough. Is painter.drawImage() slow? Is there a better/faster way to move an image from a QImage to the screen?
Signals. I realize that signals by default are in "auto" mode and supposedly signals from other threads are asynchronous. Is this true between different threads? In my threaded example I have the UDP thread signaling new data to worker threads that render a new image and signal the UI with the new image. As mentioned above this is slower than just letting the foreground draw the image. I'm wondering if there is a threading/signaling issue.
QImage shift. The waterfall data is a historical listing of the signals heard from the radio. Basically each line of pixels in x represents a signal strength. The waterfall is drawn by adding new data to the top and rendering the older data further down. Right now I'm redrawing all of this data which requires going through the data and computing heatmap values on each pixel. I am wondering if I could improve performance in this thread by being able to keep a QImage around that is my master image and when new data comes in I need to be able to shift the data in the kept QImage down one row making the top row of the image available to render the new data into. Is there an easy way to take a QImage and shift the image down one row and paint in new pixels into the top row? The painting part I can do. It is the shifting I'm not sure about.

Ok so I realize this is long, general, and no code is posted. I apologize. I know it is much easier to help people with specific examples. I'm really looking for a couple of pointers at this stage:

Why would using threads to render images and using painter.drawImage be slower than doing all the complex drawing in the foreground?

Could it be that signals between threads are actually causing some issues? (interrupting the thread?)

QImage content shift? What is the best way to do this? Is it simply accessing the data and doing some kind of a memcpy?

Anyway... Thanks! I'm going to keep digging.

kshegunov

@SysTech said:
Hi!

This thread gets a packet, decides where to put it and simply adds the packet to a list.

Why is the list needed?

It then signals the appropriate object that a new packet of data is available.

How is the thread signalling an object, where is the thread object located, where is the receiver located, how is the connection done?

The "foreground" version of the app, these signals travel all the way up to the UI widget, that triggers a repaint.

You should rather schedule regular update() calls (with a timer) instead of repaint()-ing the widget.

So there is one thread that renders the FFT data, another that renders the bottom part called the waterfall.

Is the FFT threaded itself? As far as I understand, to get the "waterfall" you need the FFT data, so if the "rendering" workers are also doing the FFT, either both should make their own calculation or one will have to wait for the other.

The UI simply gets an image and does painter.drawImage( 0,0, theImage ); Yet this is much slower than doing all the work in the foreground to draw the graph.

The problem might be completely unrelated to drawImage, although drawing images is not the most efficient way (pixmaps would be preferred, but they are not reentrant).

Is painter.drawImage() slow?

Yes, but only relative to painting pixmaps, for example.

Is there a better/faster way to move an image from a QImage to the screen?

Painting on an offscreen buffer and then manually doing the double-buffering would be one option. However, you should be sure that this is your last resort before starting such implementations.

I realize that signals by default are in "auto" mode and supposedly signals from other threads are asynchronous.

By themselves signals are oblivious to threads. The connection between a signal and another signal/slot can be in different modes. Auto means queued for different threads, or direct when sender and receiver are in the same thread.

I'm wondering if there is a threading/signaling issue.

This would be my suspicion, so probably yes.

I am wondering if I could improve performance in this thread by being able to keep a QImage around that is my master image and when new data comes in I need to be able to shift the data in the kept QImage down one row making the top row of the image available to render the new data into. Is there an easy way to take a QImage and shift the image down one row and paint in new pixels into the top row?

I'd rather keep a QPixmap in the GUI thread with the old state, scroll the old data, paint the new data onto the pixmap and finally paint the whole pixmap on the screen.
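The "shift down one row, then fill the top row" step asked about above can be sketched without any Qt types. This is only an illustration under stated assumptions: `Waterfall`, `pushRow` and `at` are hypothetical names, the image is modelled as a tightly packed buffer of 32-bit pixels, and a plain `std::vector` stands in for the image storage. With a real QImage the row stride would come from bytesPerLine() rather than width * 4, and with a QPixmap kept in the GUI thread, QPixmap::scroll() covers the shifting part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a 32-bit image: `height` rows of `width` pixels,
// stored contiguously row by row with no padding between rows.
struct Waterfall {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;

    Waterfall(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0u) {}

    // Shift every row down by one (the bottom row is discarded), then copy
    // `newRow` into the top row. `newRow` must hold exactly `width` pixels.
    void pushRow(const std::vector<std::uint32_t>& newRow) {
        assert(static_cast<int>(newRow.size()) == width);
        if (width <= 0 || height <= 0) return;

        const std::size_t rowBytes =
            static_cast<std::size_t>(width) * sizeof(std::uint32_t);
        if (height > 1) {
            // Source and destination overlap, so memmove is needed, not memcpy.
            std::memmove(pixels.data() + width, pixels.data(),
                         rowBytes * static_cast<std::size_t>(height - 1));
        }
        std::memcpy(pixels.data(), newRow.data(), rowBytes);
    }

    // Read one pixel; (0, 0) is the top-left corner.
    std::uint32_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};
```

With this layout only the new top row needs heatmap values computed; the older rows are moved as raw bytes and never recomputed.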

SysTech

Hello Sir,

Thanks for the reply:

Why is the list needed?

It might not be... My goal was to allow the UDP thread to stuff data in pretty much as quickly as possible and if something delayed the processing of that data at some other point the list would grow but then shrink back down as data was processed.

How is the thread signalling an object, where is the thread object located, where is the receiver located, how is the connection done?

A Qt signal. The objects are probably sitting in the foreground main thread. This is why my question about signals. I'm suspecting:

UDP thread -> signal main thread (object) -> signal main thread paint. But still investigating this.

You should rather schedule regular update() calls (with a timer) instead of repaint()-ing the widget.

Ok... I can certainly try this.

Is the FFT threaded itself? As far as I understand, to get the "waterfall" you need the FFT data, so if the "rendering" workers are also doing the FFT, either both should make their own calculation or one will have to wait for the other.

No that is not correct. Both the FFT data and waterfall data are pre-computed by the radio and come down in UDP packets.

For the FFT data it is actually already setup to draw. It has been converted to points relative to the size of the widget so it is really as simple as lineto( point1x, point1y, point2x, point2y ) in a loop.

The waterfall data on the other hand has to be interpreted. It comes over as a single line (pixel row) of data but you must interpolate to match frequency with the FFT display.

The idea on the waterfall is that you get a line of data, that gets drawn at the top, pushing the other lines down until you decide to kill the historical data.

The problem might be completely unrelated to drawImage, although drawing images is not the most efficient way (pixmaps would be preferred, but they are not reentrant).

Well I was thinking that I could render all of this stuff in a thread and then tell the foreground that a new image was available and simply "copy" it to screen. While this seems to work it is not as quick as I expected. In fact the threaded FFT seems to fall behind. Again all it has to do is draw points. So in a thread I just draw the points onto an image. I then signal the foreground an image is ready and it grabs and paints that image.

I thought that was going to really help. Turns out it is worse than drawing in the foreground.

Painting on an offscreen buffer and then manually doing the double-buffering would be one option. However, you should be sure that this is your last resort before starting such implementations.

It seemed to me that since Qt4 some buffering was done for you. I remember reading that somewhere. But in effect my threaded rendering is kind of like a double buffer. I mean the thread should be working on a new image while the foreground is painting the previous image (or rather copying an image to screen). But it doesn't seem to be that fast.

I wonder about your comment on scheduling updates. I should give that a try and see if maybe it helps to not be relying on signals so much.

I'd rather keep a QPixmap in the GUI thread with the old state, scroll the old data, paint the new data onto the pixmap and finally paint the whole pixmap on the screen.

Hum... interesting... I'll take a look at that.

Thanks so much for taking the time to reply!

kshegunov

@SysTech said:
Hi.

It might not be... My goal was to allow the UDP thread to stuff data in pretty much as quickly as possible and if something delayed the processing of that data at some other point the list would grow but then shrink back down as data was processed.

You should already have a queue in the thread! Use that instead. Just emit a signal with each new piece of data and connect that to a slot of an object that resides in a separate thread. I suppose you haven't derived from QThread, or am I wrong?
See here for a decent threading tutorial in case you have.

No that is not correct. Both the FFT data and waterfall data are pre-computed by the radio and come down in UDP packets.

For the FFT data it is actually already setup to draw. It has been converted to points relative to the size of the widget so it is really as simple as lineto( point1x, point1y, point2x, point2y ) in a loop.

The waterfall data on the other hand has to be interpreted. It comes over as a single line (pixel row) of data but you must interpolate to match frequency with the FFT display.

The idea on the waterfall is that you get a line of data, that gets drawn at the top, pushing the other lines down until you decide to kill the historical data.

Well, if the data is almost ready, then you don't need so many threads. You'll have to serialize the painting anyway, as widgets can be painted from the GUI thread only. I'd only thread the communication channel, and do the interpolation there. If the throughput of said communication is not enough (interpolation is heavy and the comm thread lags behind) only then I'd consider making a separate thread for the interpolation only. Once the data is prepared for painting, the painting itself is better done in the main thread.
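The interpolation step for a waterfall row could look roughly like the sketch below. It assumes the incoming row and the display cover exactly the same frequency span, so the row only needs stretching or shrinking to the display width. The real mapping from Vita49 data to screen columns is not described in this thread, and `resampleRow` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Resample `src` to `outWidth` samples using linear interpolation.
// The first and last output samples coincide with the first and last inputs.
std::vector<float> resampleRow(const std::vector<float>& src, std::size_t outWidth) {
    std::vector<float> out(outWidth, 0.0f);
    if (src.empty() || outWidth == 0) return out;

    // With a single input or output sample there is nothing to interpolate.
    if (src.size() == 1 || outWidth == 1) {
        out.assign(outWidth, src.front());
        return out;
    }

    const double scale = static_cast<double>(src.size() - 1) /
                         static_cast<double>(outWidth - 1);
    for (std::size_t i = 0; i < outWidth; ++i) {
        const double pos = static_cast<double>(i) * scale;  // position in src
        const std::size_t left = static_cast<std::size_t>(pos);
        const std::size_t right = std::min(left + 1, src.size() - 1);
        const double frac = pos - static_cast<double>(left);
        out[i] = static_cast<float>(src[left] * (1.0 - frac) + src[right] * frac);
    }
    return out;
}
```

Running this in the communication thread, as suggested above, means the GUI thread only receives rows that already match the widget width.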

It seemed to me that since Qt4 some buffering was done for you. I remember reading that somewhere. But in effect my threaded rendering is kind of like a double buffer.

Yes, it is, but you can double-buffer manually. However, as I said this should be a last resort, most of the time it's not really worth the trouble.

I wonder about your comment on scheduling updates.

Update events are compressed, so you're definitely better off using update() instead of forcing repaint()s.
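To illustrate why compressed updates help at high data rates, here is a toy model, not Qt's actual implementation. `CoalescingView` is a hypothetical class: requesting an update only sets a flag, and the simulated event loop paints at most once per pass no matter how many requests arrived in between. Qt's documentation for QWidget::update() describes the same observable behaviour: several calls normally result in just one paintEvent().

```cpp
// Toy model of update compression. A request only marks the view dirty;
// the paint itself happens at most once per pass of the event loop.
class CoalescingView {
public:
    // Analogous to calling update(): cheap, may be called many times.
    void requestUpdate() {
        dirty_ = true;
        ++requests_;
    }

    // Called once per pass of the simulated event loop.
    // Returns true if a paint was actually performed.
    bool processPendingPaint() {
        if (!dirty_) return false;
        dirty_ = false;
        ++paints_;  // the expensive drawing would happen here
        return true;
    }

    int requests() const { return requests_; }
    int paints() const { return paints_; }

private:
    bool dirty_ = false;
    int requests_ = 0;
    int paints_ = 0;
};
```

By contrast, a repaint()-style call would do the expensive drawing once per request, which is what overloads the UI when packets arrive faster than frames can be drawn.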

I should give that a try and see if maybe it helps to not be relying on signals so much.

On the contrary, you should only(!) rely on Qt signals and slots. You're working threaded, so it's the best way to both decouple your components and have a thread-safe way of transferring reentrant data.

It's hard to give decent advice without code, but I hope this is somewhat helpful.
Cheers!

SysTech

You should already have a queue in the thread! Use that instead. Just emit a signal with each new piece of data and connect that to a slot of an object that resides in a separate thread. I suppose you haven't derived from QThread, or am I wrong?

Thank you. I have read that. I have a queue in the thread but was trying to see if moving the data out of the thread caused issues. I did not derive. I'm using the worker concept.

Well, if the data is almost ready, then you don't need so many threads.

This was my original design. Since the FFT data is so close to ready to draw (it is only missing grids and labels), I thought I could use the UDP thread to gather the data and signals to get it to draw.

This indeed does work very well up to a point where it begins to get a little slow and jerky and the UI starts to get kind of overloaded.

On the contrary, you should only(!) rely on Qt signals and slots. You're working threaded, so it's the best way to both decouple your components and have a thread-safe way of transferring reentrant data.

It's hard to give decent advice without code, but I hope this is somewhat helpful.

It is very helpful and the project is not downsizeable right now to be able to post small bits of code. If I continue to have issues I will boil things down to some examples and seek help.

My point about signals is this: I was relying 100% on signals. I.e., in my original version of this thing:

UDP thread got data, sent a signal
GUI thread received signal, draw

That was the simplest form. It works but like I said when the data rate starts to become about 30 fps it starts to smother the UI so it becomes somewhat unresponsive.

I need to try and figure out what is taking the time and for that I need some benchmarking which I don't have installed at the moment.

I'm going to try something this morning: Move all of the data queues back to the UDP thread. So it does one single job for the most part: receive data and push on to queues.

As each piece of data is received I emit a signal. Attach those signals to the widgets that draw and try gathering data and calling "update".

The second thing I'm going to try is to not trigger updates on the signal but rather on a timer as you suggested before. So the widgets would have fairly high-speed timers polling for updates. I want to see if this works better.

Thanks again for the help and conceptual ideas. This is very helpful and at least gets me thinking about options.

kshegunov

Hello,

As each piece of data is received I emit a signal. Attach those signals to the widgets that draw and try gathering data and calling "update".

The second thing I'm going to try is to not trigger updates on the signal but rather on a timer as you suggested before. So the widgets would have fairly high-speed timers polling for updates. I want to see if this works better.

Another thing you could try (and which would probably scale and perform much better) is to "request" data for display instead of pushing the data from the worker threads. It seems your workers are very prolific, and even if you were able to display the data 60+ times per second the difference wouldn't really be perceptible to the user. So suppose you set a fixed frame rate for display (let's say 30 fps). You start a timer for that frame rate and on each tick you request data from the workers (you can simply signal the worker object). Only then, in the slot handling the "request data" signal, does the worker send the data back (again with a signal) to the GUI thread for drawing. This means you may skip a few frames, but I'm quite sure it'll be unnoticeable. Still, the worker should keep some history of the FFT data (for the "waterfall"), and you may draw a few lines at once in the GUI thread instead of a single one, but I think this would work much, much better.
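The frame-skipping effect of a fixed display rate can be sketched with a single shared slot that always holds the newest frame. Note that this is a variation on the idea, not the signal round trip described above: the worker deposits frames as they arrive and the display side takes whatever is newest on each timer tick. `Frame`, `LatestFrameSlot`, `publish` and `take` are hypothetical names, and the frame is modelled as a plain vector of floats.

```cpp
#include <mutex>
#include <utility>
#include <vector>

using Frame = std::vector<float>;  // stand-in for one prepared FFT frame

// Holds at most one frame: the newest one that has not been displayed yet.
class LatestFrameSlot {
public:
    // Worker side: overwrite whatever is stored. Returns true if an
    // undisplayed frame was replaced, i.e. a frame was skipped.
    bool publish(Frame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        const bool skipped = hasFrame_;
        latest_ = std::move(frame);
        hasFrame_ = true;
        return skipped;
    }

    // Display side, called on each timer tick: move the newest frame into
    // `out`. Returns false (leaving `out` untouched) if nothing new arrived.
    bool take(Frame& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasFrame_) return false;
        out = std::move(latest_);
        latest_.clear();
        hasFrame_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    Frame latest_;
    bool hasFrame_ = false;
};
```

Because the slot never grows, a slow display cannot cause a backlog of frames. Waterfall rows are different: each row has to be kept, so they would still need a queue or history as noted above.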

Thanks again for the help and conceptual ideas. This is very helpful and at least gets me thinking about options.

No problem.

SysTech

I have to report that your ideas and concepts really did help.

I will mark the thread as solved. What I did was this:

In my low-level UDP thread I setup mutex protected queues for the different kinds of data.

I then have two processing threads. One that takes the FFT data from the queue in the UDP thread and draws a QImage with it. It pushes the QImage onto a mutex protected image list.

The other thread takes the waterfall data and processes it into a line list. Each time new waterfall data arrives it paints the lines onto a QImage and pushes that into a mutex protected list.

Basically the two processing threads are currently polling the UDP thread for more data to work on. This seems to work great. I have the option of using signals here too but I decided to try the polling method and see what it did. So far it seems rock solid.
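A mutex-protected queue that processing threads can poll might look roughly like this. It is a sketch using only the standard library rather than the actual code from this project: `Packet`, `PacketQueue`, `push` and `tryPop` are hypothetical names, and a QMutex with a QQueue could play the same roles in Qt.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

using Packet = std::vector<unsigned char>;  // stand-in for one UDP payload

// FIFO queue shared between the receiving thread and polling threads.
class PacketQueue {
public:
    // Receiver side: append a packet to the back of the queue.
    void push(Packet packet) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(packet));
    }

    // Polling side: never blocks waiting for data. Moves the oldest packet
    // into `out` and returns true, or returns false if the queue is empty.
    bool tryPop(Packet& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    // Number of packets currently waiting; useful for spotting a backlog.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    mutable std::mutex mutex_;
    std::deque<Packet> queue_;
};
```

A polling thread would call tryPop() in a loop and sleep briefly when it returns false, so that idle polling does not spin a CPU core.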

I've even increased my polling thread count to 8 all banging on the UDP thread for data and have not seen any issues.

At the UI level I decided to use QTimer with a time of 0 since it states this will run as fast as possible but not block UI ops. In my first invocation the UI widgets with timer(0) calls are polling the thread image lists for new images to paint. If one is there then it grabs it, paints it to screen and removes it.

So far this is working wonders. I realize I'm not using signals at this point but I have an option to re-enable them. What I will say is this is working very very well as it stands.

The FFT frame rate is actually controllable by a command to the radio. So I don't really have to do that work in my GUI. I can just send a command to the radio that I want 10, 15 etc FPS and it just happens.

Likewise the waterfall data rate can be controlled as well. I have the option to limit what the user can select. In my testing on my MacBook Pro with a Core i7 I can easily support the maximum FPS and line rate on the waterfall for up to four displays.

This causes no delays, no backlogs or anything. For the FFT data I have a routine that keeps about 50 of the last FFT lines and I plot them in a 3D fashion. When I do this with a high frame rate after a while it can get a little behind when I have a bunch of displays working.

Anyway here is a short movie of the process in action with two displays:

https://dl.dropboxusercontent.com/u/7578983/3DPanaFall.mp4

As you can see a short time into the movie if I shift or change parameters I'm not currently killing the FFT history but that will be added.

The movie is showing about 15 fps for the FFT data and the waterfall data is about 50 ms I think between lines but I can't remember for sure.

Anyway this is all based on the help you gave me in thinking about things differently.

Thanks again!

mrjj

wow
That looks cool!

kshegunov

@SysTech
It certainly looks great. And what is even more impressive, it was done with Qt's raster paint engine (if I understood your setup correctly). Good job!

SysTech

@kshegunov said:

@SysTech
It certainly looks great. And what is even more impressive, it was done with Qt's raster paint engine (if I understood your setup correctly). Good job!

I believe that answer would be yes. I'm not doing anything special. In the paint event I get a painter and draw the QImage.

The vertical yellow lines are receivers that can be tuned to a specific frequency. Those are drawn AFTER the QImage is painted.

Not all issues are solved but it does work very well.

Thanks again!

SysTech

@mrjj said:

wow
That looks cool!

Thanks very much! Still working on how I want the UI to look. Those 3D displays are kind of handy for visualizing the radio output but they are right now pretty CPU intensive.

kshegunov

@SysTech

Thanks again!

You're very welcome.

Those 3D displays are kind of handy for visualizing the radio output but they are right now pretty CPU intensive.

If I may insert yet another suggestion here. While I don't believe you'd gain much by using OpenGL painting for the "waterfall" data, I think switching to it for the 3D displays would work better. A nice side effect would be that you can also render OpenGL from different threads, provided the appropriate locking mechanisms are in place. You could, as the simplest test, try using QOpenGLWidget for those FFT displays.

Kind regards.