QTreeView with lots of items is really slow. Can it be optimised or is something buggy?
-
@JonB
I tested on Windows 11 and Ubuntu 24.04 using Python 3.12.3 on both and got similar timing figures, i.e. tableview was basically instant no matter how many items, while listview and treeview had a dramatically noticeable slowdown which scaled with the number of items.

The code where I play around with the header values in the table case didn't change the timing for me either. That was an attempt at working around a separate issue (as mentioned near the top of my code) where tableview will fail to render at all if there are more than 71,582,788 items. I refer to this as the "72m item bug". According to the bug info link saved in my source code this is due to the 32-bit signed integer max value being divided by a row height of 30, which gives the magic value 71,582,788. A bug fix for this has apparently been submitted to the dev branch of Qt so will hopefully make it into the main branch sometime soon. They mentioned working around this bug by doing the "header value code" I included. For me the addition of this code stopped the app from freezing, but resulted in no item values actually being rendered in the table.
For now I have done as you suggested and use a tableview instead of a listview when I am displaying a flat 2D list of data. However I also need to actually display my modelled data as a tree, so I do need to use treeview too. My example code didn't use any hierarchical data in treeview, both for simplicity and as an attempt at optimising the treeview/model as much as possible, to narrow down where the timing cost stems from.
-
I will add some counters to the models in my python code to see how many times the data access (and other methods) are being called. I did try some python profiling on the code from my real app, which iirc mainly pointed to window.show() taking all the time. I will try profiling the sample code and share any worthwhile results here too.
With regard to the python wrappers potentially being a cause for the large timings, I do expect them to make it slower than native C++ code. However the fact that using tableview with python wrappers is able to handle the 2D model data basically instantly, suggests that such things should be possible for listview and treeview too.
Perhaps listview and treeview have more underlying calls that cross the barrier between C++ and python code, increasing timing in ways that are negligible when coding in C++ alone. I mean, if I was coding this app in C++ and getting differences in timing like you got, where table is 0.336 seconds and treeview is 2.216 seconds for 70m items, I'd most likely just accept it as "not being too long to wait" and use it without any further question.
-
I ran tests with my sample code changed to only use BigTableModel() for all 3 view types and with counters added to count the number of times the methods I provide are called. My model only has 4 methods in it:
headerData(), rowCount(), columnCount() and data(). Possibly the underlying QAbstractTableModel code provides the other methods itself? I am not sure whether that underlying code runs in C++ or in python unless I provide my own implementations of the methods and virtual methods. I guess I could try adding methods to my model for every possible method and virtual method in order to count their usage and to see whether adding them adds more slowdown.
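Roughly, the counting model looks like this (a simplified sketch, not my exact BigTableModel code; the column count and returned strings are just placeholders):

```python
from PySide6 import QtCore

class CountingTableModel(QtCore.QAbstractTableModel):
    """Flat table model that counts how often each reimplemented method is called."""

    def __init__(self, num_rows, num_cols=4, parent=None):
        super().__init__(parent)
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.counts = {"data": 0, "headerData": 0, "rowCount": 0, "columnCount": 0}

    def rowCount(self, parent=QtCore.QModelIndex()):
        self.counts["rowCount"] += 1
        return 0 if parent.isValid() else self.num_rows

    def columnCount(self, parent=QtCore.QModelIndex()):
        self.counts["columnCount"] += 1
        return 0 if parent.isValid() else self.num_cols

    def data(self, index, role=QtCore.Qt.DisplayRole):
        self.counts["data"] += 1
        if role == QtCore.Qt.DisplayRole and index.isValid():
            return f"item {index.row()},{index.column()}"
        return None

    def headerData(self, section, orientation, role=QtCore.Qt.DisplayRole):
        self.counts["headerData"] += 1
        return str(section) if role == QtCore.Qt.DisplayRole else None
```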
This is the result for call counts:
1,000,000 items:

```
time taken to display table with 1000000 items = 0.081 seconds
    data() cnt = 84     headerData() cnt = 1680    rowCount() cnt = 18         columnCount() cnt = 17
time taken to display list with 1000000 items = 5.859 seconds
    data() cnt = 85     headerData() cnt = 0       rowCount() cnt = 2000062    columnCount() cnt = 2000059
time taken to display tree with 1000000 items = 2.975 seconds
    data() cnt = 156    headerData() cnt = 54      rowCount() cnt = 1000039    columnCount() cnt = 1000042
```

10,000,000 items:

```
time taken to display table with 10000000 items = 0.138 seconds
    data() cnt = 84     headerData() cnt = 1680    rowCount() cnt = 18          columnCount() cnt = 17
time taken to display list with 10000000 items = 57.361 seconds
    data() cnt = 379    headerData() cnt = 0       rowCount() cnt = 20000474    columnCount() cnt = 20000471
time taken to display tree with 10000000 items = 28.985 seconds
    data() cnt = 493    headerData() cnt = 107     rowCount() cnt = 10000204    columnCount() cnt = 10000207
```
These results show that data() is only being called a sane number of times, so it shouldn't be responsible for the huge amount of time taken. However rowCount() and columnCount() are being called an insane number of times xD
Also their call counts scale linearly with the number of items, just like the time taken. AND the call counts for listview are double the call counts for treeview, just like how listview takes double the time that treeview takes.
This makes a lot of sense. From the developers' point of view they would assume that calls to data() could be costly, and so are careful about how often they call it, as seen in the sane count of 493 calls to data() for 10 million items in treeview. However developers would also probably assume that a call to rowCount() or columnCount() is computationally "cheap", and so not worry about how many times it is called. And it probably is fairly cheap when the code is all native C++, as seen in the tests done by @IgKh where even 70m items only took around 2 seconds. However when rowCount() and columnCount() have to cross the boundary between C++ and python millions of times, it is no longer a cheap operation.
Perhaps armed with this knowledge the underlying C++ code could be tweaked to either use "cached" values for row and column counts (at least during expensive operations such as setup) or to be mindful of the number of calls performed. I have often seen loops like:
```cpp
for (int i = 0; i < obj->columnCount(); i++) { do_stuff(); }
```

This could be refactored to only call columnCount() once instead of potentially millions of times:

```cpp
const int count = obj->columnCount();
for (int i = 0; i < count; i++) { do_stuff(); }
```

I don't know if this is the case here, but there is a good chance that something like it is occurring.
-
When profiling the PySide6 code, it doesn't show a lot of what is going on "under the hood" since the python code is for the most part just wrappers on top of the C++ code. But here is the info I do get from it:
I ran a test using listview with 10 million items.
The test ran for 175 seconds, but 105 of those seconds were the app sitting there once loaded and me not realising it had finished loading yet. So disregard the extra 105 seconds. My debug prints told me that the processing took 70 seconds, and the profile data backs that up. Note that the times are longer than normal processing of 10m items due to the profiling being done at the same time; this 70 seconds of processing would normally be done in 5 seconds if it weren't also profiling.

Cumulative Time

These are the functions called, ordered by "cumulative time", so pay attention to the "cumtime" column. The value in this column shows the time spent in that function and all subfunctions that it calls.

- The 105 seconds for "built-in method exec" is the time spent where the app sat there once loaded.
- You can then see 70 seconds spent in the MainForm.show() method. This is where the excess processing time is occurring, inside this function and whatever subfunctions it calls.
- Next up is rowCount(), which takes 16 seconds (of the 70 seconds). Internally it calls two of the other entries in this list: isRootIndex(), which in turn calls isValid().
- The only other notable call in the list is columnCount(), which takes about 5 seconds.
I truncated the list there as all entries that followed took less than a second and didn't appear important.
This means that python code for rowCount() and columnCount() combined take up about 20 seconds of the 70 seconds for show. I am not sure if the other 50 seconds is taken up in handling profiling or in the C++ code underneath. It does suggest that at least 20/70 = 28% of that time is taken handling calls to rowCount and columnCount on the python side.
```
   ncalls   tottime  percall   cumtime  percall  filename:lineno(function)
     42/1     0.000    0.000   175.459  175.459  {built-in method builtins.exec}
        1     0.004    0.004   175.459  175.459  ..\bak\BigItemModel2.py:1(<module>)
        1     0.020    0.020   175.327  175.327  ..\bak\BigItemModel2.py:286(main)
        1   105.265  105.265   105.266  105.266  {built-in method exec}
        1    48.765   48.765    70.037   70.037  {method 'show' of 'PySide6.QtWidgets.QWidget' objects}
 20000230    10.507    0.000    16.283    0.000  ..\bak\BigItemModel2.py:118(rowCount)
 20000230     3.338    0.000     5.776    0.000  ..\bak\BigItemModel2.py:102(isRootIndex)
 20000227     4.989    0.000     4.989    0.000  ..\bak\BigItemModel2.py:126(columnCount)
 20000230     2.438    0.000     2.438    0.000  {method 'isValid' of 'PySide6.QtCore.QModelIndex' objects}
```
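For anyone wanting to reproduce this: a profile like the above can be gathered with the standard cProfile module, roughly as follows (main() here just stands in for whatever function builds the window and runs the event loop):

```python
import cProfile
import pstats

# Run the whole app under the profiler, then list the most expensive calls
# by cumulative time (the "cumtime" column discussed above).
cProfile.run("main()", "bigitemmodel.prof")
pstats.Stats("bigitemmodel.prof").sort_stats("cumulative").print_stats(15)
```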
-
I guess the next step is to look through the C++ code.
@IgKh are you able to run a profiler on your C++ code?

I have started looking through the QListView source code online. I didn't find any simple cause yet for my problem, though I did identify an issue exactly as I suggested might occur earlier in this thread.
The source code:
https://code.qt.io/cgit/qt/qtbase.git/tree/src/widgets/itemviews/qlistview.cpp?h=6.8

There is the potential for an unneeded call to rowCount() once for every item in the list when using selectAll(). If you look at line 603 inside QListViewPrivate::selectAll() you can see it:

```cpp
for (; row < model->rowCount(root); ++row) {
```
-
@Cipherstream said in QTreeView with lots of items is really slow. Can it be optimised or is something buggy?:
However when rowCount() and columnCount() have to cross the boundary between C++ and python millions of times, it is no longer a cheap operation.
Well, let's examine this instead of guessing!
As before I am just testing the `list` case. With my starting code from where I am now:

time taken to display list with 10000000 items = 17.531 seconds

You show `rowCount()` & `columnCount()` each being called twice as many times as the 10 million items. So I go to `BigListModel.__init__()` and append the following at the end of it:

```python
for i in range(self.max_num_nodes * 2):
    if self.rowCount(parent) < 0 or self.columnCount(parent) < 0:
        print("Whoops!")
```

Now

time taken to display list with 10000000 items = 27.802 seconds

Well, that's a fair amount of the original time. 10 of the original 17 seconds are being spent just doing these `...count()` calls. (It's then not hard to imagine something else taking up a lot of the remaining original 7 seconds.) You hypothesise that "crossing the boundary between C++ and Python" is particularly expensive. So we'd better test without a Qt virtual method. We just copy the definitions of your `rowCount()` & `columnCount()`, rename the copies `rowCount2()`/`columnCount2()` and call those instead in the `for` loop.

time taken to display list with 10000000 items = 27.831 seconds

So no virtual or C++<->Python boundaries to cross, yet identical time. The huge overhead is just Python.
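(For concreteness, the copies are just ordinary Python methods rather than Qt virtual overrides, roughly like this, with the bodies guessed from a trivial flat model rather than copied from your code:)

```python
# Ordinary Python methods on the model class. Since their names don't match any
# QAbstractItemModel virtual, Qt never calls them and no C++ boundary is crossed.
def rowCount2(self, parent):
    return 0 if parent.isValid() else self.max_num_nodes

def columnCount2(self, parent):
    return 0 if parent.isValid() else 1
```

The timing loop then calls self.rowCount2(parent) / self.columnCount2(parent) instead of the overridden methods.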
Which leaves us with two (white? black?) elephants in the room, neither of which you are going to like:
- If you are going to have "millions" of items and care about performance you had better not use Python. It seems it is demonstrably just not up to the job here. Although your point about "wouldn't it be better if the original Qt C++ code cached the value of `rowCount()` instead of calling the method so often" may be valid, you are peeing in the wind if you expect it to be written this way to help out, say, Python, when apparently the original C++ calls are perhaps inline or in any case vastly faster. It's just not going to happen, and would require an enormous rewrite of existing Qt code.

- Don't put a model with 10 million items in a UI view. Maybe it happens to be fast enough with a `QTableView`, for whatever reason, but not with a `QListView` or `QTreeView`, and it's just "way too many" for something intended to be shown to a user. And for the record, once those 10 million records are stored in a database I think you're going to be spending a lot of time (and memory) reading them all in, on top of the display time. I really expect any application wanting to show this many records to have some sort of "paging" mechanism, and perhaps in the case of the treeview to display nodes without their children initially and create them on parent node expansion.

Conclusion: Once we discover that something as "insignificant" as the many calls to `rowCount()` in the Qt code is by itself incredibly expensive for Python, it becomes unsurprising that there is a surprisingly large difference between the timings of the 3 types of view due to what may be innocuous differences in their code.
-
@Cipherstream
P.S.
I just had a think. You want to display 70 million items. Let's assume each item costs 100 bytes. (Don't know how big your items are, there are all sorts of overheads, and on top of whatever it takes up in the model there must be a further overhead per item to put it in a view. Anyway, I'm taking 100 as my multiplier.)

70 million times 100 is 7 billion. 7 billion bytes is 7GB. That is a lot of memory to use! Just OOI how much space is your Python app using for these rows? If you want 70 million items now, perhaps you'll want 140 million tomorrow, or 700 million...?
-
@Cipherstream said in QTreeView with lots of items is really slow. Can it be optimised or is something buggy?:
for(; row < model->rowCount(root); ++row) {
Nice finding even though it's not that expensive here since root is most likely the invalid root index and therefore the calculation is cheap. But nonetheless it's useless. You might provide a patch - I can approve it.
-
@Christian-Ehrlicher
I am new to Qt so I am sure you know more than me. When I looked at the code for selectAll() it looked like it was iterating over all rows on the root. Wouldn't all rows in a listview be on the root? i.e. if you had a listview with 10 million items, wouldn't all 10 million items be rows "on the root"? Which would mean that rowCount() would get called 10 million times when calling selectAll()?
-
@JonB
I'll answer your shorter post first, since it is quicker :)

Yes, a display with 70 million items could indeed take up a lot of RAM. However due to the model/view paradigm you can potentially look up the data as needed and not need to keep any items in memory.
You might then say "that sounds like a great case for fetchMore()", however I did initially try using fetchMore() and as far as I could tell it is an iterative fetcher from index 0 upwards. So to get the last of 70m items it would still have to fetch all 70m items.

If instead I have a file format on disk with a header that says "this file has 70m items", I can just read in the header and know how many items my view needs to handle. The user then scrolls the scroll bar to the location they want to look at and only that data will then be accessed. So they could do Ctrl+End to go to the end of the list and the model will only access the items displayed on the last "page" of the view.
This does indeed work well like this. If you let this initial slow setup part finish and then attempt to move through the data, you can see that it only fetches the minimal amount of data needed.
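A rough sketch of that kind of on-demand model (the file layout, field names and record size here are purely illustrative, not my actual format):

```python
import struct
from PySide6 import QtCore

class OnDiskListModel(QtCore.QAbstractTableModel):
    """Reads the item count from a (hypothetical) file header and fetches rows lazily."""

    HEADER_FMT = "<Q"     # illustrative: one unsigned 64-bit item count at the start
    RECORD_SIZE = 64      # illustrative: fixed-size records after the header

    def __init__(self, path, parent=None):
        super().__init__(parent)
        self._file = open(path, "rb")
        (self._num_items,) = struct.unpack(
            self.HEADER_FMT, self._file.read(struct.calcsize(self.HEADER_FMT)))

    def rowCount(self, parent=QtCore.QModelIndex()):
        return 0 if parent.isValid() else self._num_items

    def columnCount(self, parent=QtCore.QModelIndex()):
        return 0 if parent.isValid() else 1

    def data(self, index, role=QtCore.Qt.DisplayRole):
        if role != QtCore.Qt.DisplayRole or not index.isValid():
            return None
        # Only the rows the view actually asks for are read from disk.
        offset = struct.calcsize(self.HEADER_FMT) + index.row() * self.RECORD_SIZE
        self._file.seek(offset)
        return self._file.read(self.RECORD_SIZE).decode("ascii", errors="replace")
```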
This is for an app similar to Wireshark where you may indeed need to have millions of items/packets in the one view. Funnily enough, while looking into this issue I came across the developers of Wireshark also talking about this problem (as they also use Qt it seems) and arguing that yes, they do indeed need to handle that much data in a view; however, they are not doing it in python hehe.
-
@JonB
Your tests are a good idea to look at getting real world timing for these calls. However the point is that I can't think of any reason why rowCount() or columnCount() needs to be called 20m times each if I have 10m items. Whatever code is doing this is imo the code that needs fixing.

Your timing test results suggest that calling them so many times is taking at least 10 of the 17 seconds. Therefore optimising whatever is calling them so many times to only call once and cache the result value would immediately improve the time taken by 10 seconds. A 59% speed increase.
For the elephants in the room :)
-
Yes, python is slower. However the simple fact that tableview is able to handle the processing instantly suggests that python speed in general is not the issue. I agree that expecting Qt code to be changed just for the sake of python wrapper speed may be a bit of an ask. However I also feel that an implementation making a number of these calls that scales linearly with the number of items is faulty. The C++ implementation by @IgKh above shows listview and treeview taking about 7 times longer than tableview, which also suggests something is not right. However since the "7 times longer" is only about 2 seconds it can be ignored.
-
It seems I answered this in my previous reply to you. I don't agree with the blanket statement that "10 million items in a UI view is too much". Yes it is a lot and most of the time it is not needed. But like a lot of things there are times where it is needed. The "paging" mechanism is exactly what the model/view paradigm is for! And indeed only "pages" of the data are actually accessed at a time when these UI views are in use. It is only during startup that some of these views are (incorrectly imo) doing some kind of processing for every single item.
Yes, many calls to rowCount() are expensive. I don't see a reason why that many are done at startup. I hope to track down the cause to see if there is a valid reason :)
-
-
Since it appeared that the cause of the issue was in the C++ Qt code I installed Qt C++ dev env and was able to build and debug the code that @IgKh provided.
I tracked down the cause of the many calls to rowCount and columnCount for the list view. They were coming from line 2600 onwards in:
https://code.qt.io/cgit/qt/qtbase.git/tree/src/widgets/itemviews/qlistview.cpp?h=6.8

The method `QListModeViewBase::doStaticLayout()` iterates over every row in the model to precalculate offsets for items in the list. This code does an optimisation where, if gridSize has been set, it will use the grid size for all items instead of calculating it for every single item. I confirmed that this optimisation had an effect by adding the line `self.view.setGridSize(QtCore.QSize(18, 18))` to my python code.

The list view already has another existing optimisation for item sizes, enabled by doing `self.view.setUniformItemSizes(True)`. This "fixed item size" flag should also be used in this setup code, in the same way the fixed size from a grid setting is used.

However if you remember, list view was calling rowCount and columnCount twice for every item in the model. The 2nd calls are done in this same method when checking if the row is hidden. The same loop that iterates over all items in the model also does an "is hidden" check on every item. The "is hidden" check looks to see if each item is present in the list of "hidden rows". An optimisation that can be done for this is to check if that list of hidden rows is empty, in which case no items are hidden and so it doesn't need to individually check whether every single item is hidden.

The first optimisation makes sense since if a "uniform item size" optimisation flag is set, then the code shouldn't check every single item in the list for its size!

The second optimisation might be up for debate as to whether it should be included or not. It does make a drastic change in processing time for python list views with lots of items however.

It turns out that Qt is very easy to build from source on Windows, so I was able to make a custom Qt build and test my optimisations with my original python code.
```
Without either optimisation:             time taken to display list with 70000000 items = 358.775 seconds
With "size" only optimisation:           time taken to display list with 70000000 items = 198.576 seconds
With "size" and "hidden" optimisations:  time taken to display list with 70000000 items = 0.992 seconds
```
So the python code went from taking 6 minutes preprocessing before showing the view to just under 1 second.
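For reference, the grid-size workaround on the Python side is just a couple of calls on the view. Here is a minimal standalone example (not my actual test script; the model and item count are just placeholders):

```python
import sys
from PySide6 import QtCore, QtWidgets

app = QtWidgets.QApplication(sys.argv)

# Any big model will do for demonstration purposes.
model = QtCore.QStringListModel([f"item {i}" for i in range(100_000)])

view = QtWidgets.QListView()
view.setModel(model)
# A valid grid size lets doStaticLayout() skip the per-item size lookups entirely.
view.setGridSize(QtCore.QSize(18, 18))
# uniformItemSizes is the flag the proposed "fixed size" optimisation would honour.
view.setUniformItemSizes(True)

view.show()
sys.exit(app.exec())
```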
-
Wrt the rowCount calls: https://codereview.qt-project.org/c/qt/qtbase/+/601341
-
I looked into optimising the TreeView and unfortunately I can't see any way to optimise it without a major rewrite. :(
I assumed there was an issue with ListView as it did not make sense that a 1D list of items would take longer than the 2D TableView. This assumption turned out to be correct, and I was hoping that upon finding the fix it may also apply to TreeView.
The TreeView slowdown also occurs during its layout function `QTreeViewPrivate::layout()`, where it loops over every row calling model->index() directly and model->rowCount() via QTreeViewPrivate::hasVisibleChildren(). It is again the huge number of calls to these model functions across the C++/python barrier (due to the model being in python code) that causes the slowdown.

Interestingly, while looking into the TreeView code I saw that it did implement an "is hidden" optimisation like the one I was suggesting for ListView. Based off that I think both ListView optimisations I suggested should be added to the source code. I'll add a post below with the changes.
-
These are the two optimisations that I feel should be made for ListView. Would you be able to submit them for me?
(Feel free to change them as needed)

1) uniform / fixed size optimisation

qtbase\src\widgets\itemviews\qlistview.cpp: line 2561

original:

```cpp
void QListModeViewBase::doStaticLayout(const QListViewLayoutInfo &info)
{
    const bool useItemSize = !info.grid.isValid();
    const QPoint topLeft = initStaticLayout(info);
    QStyleOptionViewItem option;
    initViewItemOption(&option);
    option.rect = info.bounds;
    option.rect.adjust(info.spacing, info.spacing, -info.spacing, -info.spacing);

    // The static layout data structures are as follows:
    // One vector contains the coordinate in the direction of layout flow.
    // Another vector contains the coordinates of the segments.
    // A third vector contains the index (model row) of the first item
    // of each segment.
    int segStartPosition;
    int segEndPosition;
    int deltaFlowPosition;
    int deltaSegPosition;
    int deltaSegHint;
    int flowPosition;
    int segPosition;
    if (info.flow == QListView::LeftToRight) {
        segStartPosition = info.bounds.left();
        segEndPosition = info.bounds.width();
        flowPosition = topLeft.x();
        segPosition = topLeft.y();
        deltaFlowPosition = info.grid.width(); // dx
        deltaSegPosition = useItemSize ? batchSavedDeltaSeg : info.grid.height(); // dy
        deltaSegHint = info.grid.height();
    } else { // flow == QListView::TopToBottom
        segStartPosition = info.bounds.top();
        segEndPosition = info.bounds.height();
        flowPosition = topLeft.y();
        segPosition = topLeft.x();
        deltaFlowPosition = info.grid.height(); // dy
        deltaSegPosition = useItemSize ? batchSavedDeltaSeg : info.grid.width(); // dx
        deltaSegHint = info.grid.width();
    }
    for (int row = info.first; row <= info.last; ++row) {
        if (isHidden(row)) { // ###
            flowPositions.append(flowPosition);
        } else {
            // if we are not using a grid, we need to find the deltas
```

optimised:

```cpp
void QListModeViewBase::doStaticLayout(const QListViewLayoutInfo &info)
{
    const bool useItemSize = !info.grid.isValid() && !uniformItemSizes();
    const QPoint topLeft = initStaticLayout(info);
    QStyleOptionViewItem option;
    initViewItemOption(&option);
    option.rect = info.bounds;
    option.rect.adjust(info.spacing, info.spacing, -info.spacing, -info.spacing);

    const QSize uniformSize = (uniformItemSizes() && cachedItemSize().isValid())
                                  ? cachedItemSize()
                                  : itemSize(option, modelIndex(info.first));
    const QSize fixedSize = info.grid.isValid() ? info.grid : uniformSize;

    // The static layout data structures are as follows:
    // One vector contains the coordinate in the direction of layout flow.
    // Another vector contains the coordinates of the segments.
    // A third vector contains the index (model row) of the first item
    // of each segment.
    int segStartPosition;
    int segEndPosition;
    int deltaFlowPosition;
    int deltaSegPosition;
    int deltaSegHint;
    int flowPosition;
    int segPosition;
    if (info.flow == QListView::LeftToRight) {
        segStartPosition = info.bounds.left();
        segEndPosition = info.bounds.width();
        flowPosition = topLeft.x();
        segPosition = topLeft.y();
        deltaFlowPosition = fixedSize.width(); // dx
        deltaSegPosition = useItemSize ? batchSavedDeltaSeg : fixedSize.height(); // dy
        deltaSegHint = fixedSize.height();
    } else { // flow == QListView::TopToBottom
        segStartPosition = info.bounds.top();
        segEndPosition = info.bounds.height();
        flowPosition = topLeft.y();
        segPosition = topLeft.x();
        deltaFlowPosition = fixedSize.height(); // dy
        deltaSegPosition = useItemSize ? batchSavedDeltaSeg : fixedSize.width(); // dx
        deltaSegHint = fixedSize.width();
    }
    for (int row = info.first; row <= info.last; ++row) {
        if (isHidden(row)) { // ###
            flowPositions.append(flowPosition);
        } else {
            // if we are not using a fixed size, we need to find the deltas
```
2) is hidden optimisation

qtbase\src\widgets\itemviews\qlistview_p.h: line 358

original:

```cpp
inline bool isHidden(int row) const
{
    QModelIndex idx = model->index(row, 0, root);
    return isPersistent(idx) && hiddenRows.contains(idx);
}
```

optimised:

```cpp
inline bool isHidden(int row) const
{
    if (hiddenRows.isEmpty())
        return false;
    QModelIndex idx = model->index(row, 0, root);
    return isPersistent(idx) && hiddenRows.contains(idx);
}
```
Explanation:
QListModeViewBase::doStaticLayout() is called when the view is initially setup or resized. It loops over every item in the model calculating item layout based off the size and hidden attributes for each item. Currently the view is accessing the model once per item to calculate the size and then a second time per item when checking if it is hidden.
When using python wrappers, calls between C++ code and python code are not as "cheap" as calls between C++ code alone. This potentially applies to other language wrappers for Qt too.
Now look at the case where you have a list view in C++ code accessing a model in python code, and the model has 10 million items. This will do 20 million calls between C++ and python code every time QListModeViewBase::doStaticLayout() is called. This creates a huge lag when initially showing or resizing a list view.
Optimisation Recommendations:
-
QListView already has an optimisation flag for 'uniformItemSizes'. If this is set then it can assume all items are the same size and should not have to get the size individually for each item. This flag and behaviour should be used in QListModeViewBase::doStaticLayout().
-
QTreeView already has an optimisation for QTreeViewPrivate::isRowHidden() which first checks if the list of hidden rows is empty, in which case there are no hidden rows and so no need to check each individual item to see if they are hidden. This behaviour should be added to QListViewPrivate::isHidden().
Optimisation Test Results:
A listview and model were created with 70 million items.
The time taken by doStaticLayout() to process this before showing it was:

Using python wrappers:
- 359 seconds before optimisation
- 199 seconds with "fixed size" optimisation
- 1 second with "fixed size" and "is hidden" optimisation
In C++ code:
- 2.2 seconds before optimisation
- 1.5 seconds after both optimisations
So even the C++ code gets a 30% speed boost.
This was on a pretty fast PC; on a slower device the 30% speed boost in C++ code might be more meaningful.
-
While looking through the ListView code I realised that it also had item limits due to INT_MAX size (ie the largest value a signed 32bit integer can hold).
This occurs due to the values calculated and stored in `QList<int> QListModeViewBase::flowPositions;`. These values are basically `item index * row height` (for a normal top-to-bottom list).

The default row height appears to be 18 and the value for INT_MAX is 2147483647. So 2147483647 / 18 = 119,304,647.
So the maximum number of items for ListView is just under 120 million items. I tested this and indeed it fails to handle 120m items.
A fix for this might be to use 64-bit signed integers, like `QList<int64_t> QListModeViewBase::flowPositions;`, and then check everywhere flowPositions values are handled to make sure they are treated correctly as 64-bit values.
-
@Cipherstream
Your work on tracking down and optimising has been amazing. I would not want in any way to detract from that, but may I make a couple of general observations.

- You keep pointing the finger at "calls between C++ and python code" as being a big issue. But earlier I found and suggested to you this is not the case? Scroll up to my https://forum.qt.io/post/813335 where I claim timings show there is "no" overhead across the language boundaries. If you do millions of direct, Python calls to some standalone `nonVirtualRowCount()` function, no C++ involved, I find exactly the same timing.

- You were only able to improve (some of) the speed by being prepared to go into C++ and change the source of Qt. That improved C++ by 30%, which is absolutely not to be sniffed at, but Python by a factor of 360x! That indicates to me that Python has a problem, and in general Qt is written for C++. Nobody notices the overhead of calling `rowCount()` because it's either inlined or so fast as to be insignificant. But if it is indeed the case that calling a simple, tiny Python function is that much slower than C++ (which I did not know and am horrified at the timings) then you may always encounter other places where something innocuous in C++ turns out to be horrific in Python.
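(If anyone wants to see the raw cost of plain Python calls for themselves, independent of Qt, something like this will do it; I'm deliberately not quoting numbers, run it on your own machine:)

```python
import timeit

def row_count():
    # Stand-in for a trivial model method: no Qt, no C++ involved.
    return 10_000_000

calls = 20_000_000  # roughly what the list view issued for a 10M-item model
seconds = timeit.timeit(row_count, number=calls)
print(f"{calls:,} plain Python calls took {seconds:.1f} s")
```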
As I say, not in any way to dismiss the case you have examined and the enormous improvement you have achieved by changing the C++ code. Just that there may always be issues from Python if speed is critical.
-
-
@JonB said in QTreeView with lots of items is really slow. Can it be optimised or is something buggy?:
which I did not know and am horrified at the timings
A kind of poorly kept secret is that CPython (which pretty much is Python) is considered to be a mere "reference implementation" and its maintainers put implementation simplicity above performance, or at least that was the case in the past. The implication of this is that CPython is horrendously slow, even for a dynamic scripting language. There is no JIT; every line of bytecode is interpreted as it is executed. There is no inlining; function and method calls are dynamically dispatched by looking for string keys in a series of dictionaries for every call. Absolutely every little thing is allocated on the heap with the full `PyObject` overhead and has reference counting. FFI calls into Python require acquiring the Global Interpreter Lock, which is not a huge overhead in your typical GUI app (as the GIL is normally not contended), but is not free.

This situation has led to most high-performance Python toolkits and libraries being implemented in C, Fortran, C++ or Rust and exposed via bindings. PyQt / PySide included. For a Wireshark-like application like @Cipherstream is describing, that's probably the approach that will be the path of least resistance too - write the data layer in C++ (the data model, and the custom delegate which you'll probably end up needing) and expose those to Python via Shiboken. Leave to Python the overall construction and management of the UI, where it has the relative advantage.
-
@JonB
Yes, it seems python really shows its speed limitations once you start calling things 10s of millions of times xD

You are right that your timings show python calls by themselves are indeed slow. In my last writeup I am trying to point out that when rowCount etc are called within C++ code, there is minimal cost. So the devs might easily overlook (or not care about) calling them one or more times for every item in the model. It is not until these calls are routed out of the fast C++ code and into the slow python code that it becomes really noticeable.
With that said, the fact that there is an optimisation flag for "uniform sizes" means that per-item size lookups must have come up as an issue at some point. So it makes sense for the layout code to use that same flag to skip the variable size processing too. The same goes for the "is hidden" processing, where treeview already implements the optimisation I put forward for listview. So that must also have come up as being something worth optimising in the past.
-
@IgKh
I have been tossing up how to proceed, since I do need to use a treeview for some parts of my app. I wanted to make the app in python so that it would easily work cross platform.

- Stay 100% python and just have slow load times for big files. Smaller files are still very quick.
- Have C++ module for bottleneck code (like you suggest), but then once there is some C++ code it makes it less easily cross platform compatible.
- Make it fully C++
I am not really happy with any of the options hehe, so progress has stalled for now. Maybe if I went with option 2 but made the C++ code be part of a pip installable module, it would still stay easily cross platform...