
QTextDocument and ts_tree_edit

7 Posts 2 Posters 145 Views
    eineskamelles wrote (#1):

    I am writing a text editor using Qt for the GUI, with Tree-sitter for syntax highlighting and other features. It is fast enough on small and medium-sized buffers, but on very large buffers it slows down because Tree-sitter reparses the entire buffer whenever it changes.

    Tree-sitter has a function, ts_tree_edit, which must be applied to a TSTree* before that tree is passed as the old_tree parameter of the next parse. This makes reparsing much faster, but ts_tree_edit requires the start position of the edit both as a byte offset and as a (row, column) point, and likewise the old and new end positions.
    QTextDocument has a contentsChange signal, which has the following signature:

    void QTextDocument::contentsChange(int position, int charsRemoved, int charsAdded)
    

    The problem is that when this signal fires, the contents have already changed, so I have no way of getting the old end point: a newline character could have been deleted and I would have no way of knowing. Similarly, I can't get the old end offset in bytes, because Tree-sitter expects a UTF-8-encoded string in which some characters take more bytes than others, yet the signal only gives me the number of characters removed.

    So does anyone know how I can use this TreeSitter functionality with QTextDocument? I haven't found anything about it online, which is why I am making this post. One thing I was considering is using the undo/redo functionality for this, but I am not sure if this will be efficient.

    Thank you in advance for your help!

      IgKh wrote (#2):

      @eineskamelles Welcome to the forum.

      I've never worked with Tree-sitter, so I can't really help you with anything specific to its API. I can give you some tips on the Qt side of things.

      The contentsChange signal is absolutely the correct way to go about syncing changes to a QTextDocument with an outside parser, and actually gives you everything you need. The key is to interpret its parameters as saying that the text in the UTF-16 offset range [position, position + charsRemoved) of the document as it was before the change should be replaced with the text in the UTF-16 offset range [position, position + charsAdded) of the document as it is now, after the change.
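
To make that interpretation concrete, here is a minimal sketch of how the three signal parameters could be turned into the byte offsets of an edit. It assumes the parser is fed UTF-16 text, so that one document position (one UTF-16 code unit) is two bytes. ByteEdit and byteEditFromContentsChange are hypothetical names for illustration, not part of Qt or Tree-sitter; the real TSInputEdit struct also carries (row, column) points, which are not covered here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the byte-offset fields of an edit description.
// The (row, column) points that Tree-sitter also asks for are left out.
struct ByteEdit {
    std::uint32_t start_byte;
    std::uint32_t old_end_byte;
    std::uint32_t new_end_byte;
};

// Assumption: the parser reads UTF-16, so each UTF-16 code unit is 2 bytes.
constexpr std::uint32_t kBytesPerUnit = 2;

// Maps contentsChange(position, charsRemoved, charsAdded) to byte offsets.
// The old range ends at position + charsRemoved (pre-edit document),
// the new range ends at position + charsAdded (post-edit document).
ByteEdit byteEditFromContentsChange(int position, int charsRemoved, int charsAdded) {
    const auto start = static_cast<std::uint32_t>(position);
    return ByteEdit{
        start * kBytesPerUnit,
        (start + static_cast<std::uint32_t>(charsRemoved)) * kBytesPerUnit,
        (start + static_cast<std::uint32_t>(charsAdded)) * kBytesPerUnit,
    };
}
```

For example, replacing 2 characters at position 3 with 5 new characters would give a start byte of 6, an old end byte of 10 and a new end byte of 16 under this assumption.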

      Regarding converting the UTF-16 offsets to UTF-8 offsets: if you are sure that Tree-sitter absolutely requires the start/end byte offsets of the edited range to be in UTF-8, Qt doesn't help you there. You need to maintain a global index that maps between UTF-16 and UTF-8 positions in your document, and update it manually after edits (possibly within the same slot) - sorry. You can improve performance by keeping, for each text block, its UTF-8 start and end offsets. Though a cursory Google search suggests that Tree-sitter natively supports UTF-16?
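
If UTF-8 offsets do turn out to be necessary, the per-block bookkeeping could look roughly like the sketch below. It is plain C++17 with no Qt dependency; utf8Length and utf8Offset are hypothetical helper names, and the cached UTF-8 start offset of each block is assumed to be maintained elsewhere. Counting an unpaired surrogate as 3 bytes (the size of U+FFFD) is also an assumption.

```cpp
#include <cstddef>
#include <string_view>

// Number of bytes the UTF-16 code units in `text` occupy once encoded as UTF-8.
// Assumption: an unpaired surrogate is counted as 3 bytes, the size of U+FFFD.
std::size_t utf8Length(std::u16string_view text) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t u = text[i];
        if (u < 0x80) {
            bytes += 1;                       // ASCII
        } else if (u < 0x800) {
            bytes += 2;                       // two-byte sequence
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < text.size() &&
                   text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            bytes += 4;                       // valid surrogate pair
            ++i;                              // skip the low surrogate
        } else {
            bytes += 3;                       // rest of the BMP, or a lone surrogate
        }
    }
    return bytes;
}

// UTF-8 offset of a position inside a block, given the block's cached UTF-8
// start offset and the block's text. `columnUtf16` must not exceed the block length.
std::size_t utf8Offset(std::size_t blockStartUtf8,
                       std::u16string_view blockText,
                       std::size_t columnUtf16) {
    return blockStartUtf8 + utf8Length(blockText.substr(0, columnUtf16));
}
```

With this, only the block containing the position has to be scanned, rather than the whole document.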

      @eineskamelles said in QTextDocument and ts_tree_edit:

      One thing I was considering is using the undo/redo functionality for this, but I am not sure if this will be efficient.

      Don't bother. QTextDocument doesn't provide access to its undo stack, and even if it did, the QUndoCommand implementation that actually contains the change data is not exported at all (not even as private API).

        eineskamelles wrote (#3):

        @IgKh Thank you for your response.

        I must admit that it hadn't occurred to me to use UTF-16 encoding in Tree-sitter. Sorry about that. But I also need to provide the old end point (row, col) and the new end point to this function, and I cannot determine the old end point from those three parameters alone, because the removed characters may or may not have included a newline.
        Do you have any suggestions for determining the point of position + charsRemoved in the old document? Thanks

          IgKh wrote (#4):

          @eineskamelles said in QTextDocument and ts_tree_edit:

          But I also need to provide the old end point (row, col) and new end point to this function, and I cannot determine the old end point based on those three parameters alone, because in those x characters removed there might be a newline or not.

          Indeed you can't. It is strange that it would need both, but the information needed to obtain the block number and the offset within the block of a character position in the pre-edit document is no longer available from the QTextDocument. The recourse here is to maintain a copy yourself... For each QTextBlock number (not the object itself, since it can be destroyed), save your own copy of its position and length, and rebuild it after edits.
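
A minimal sketch of such a shadow copy, in plain C++17 with no Qt dependency: it keeps only the start offset of every line and looks up the (row, column) of a position by binary search. Point, buildLineStarts and pointAt are hypothetical names; the text is assumed to use '\n' as the line separator, with positions counted in UTF-16 code units.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Row and column of a position; the column is in UTF-16 code units.
struct Point {
    std::size_t row;
    std::size_t column;
};

// Start offset of every line in `text`. Line 0 always starts at offset 0.
std::vector<std::size_t> buildLineStarts(std::u16string_view text) {
    std::vector<std::size_t> starts{0};
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == u'\n') {
            starts.push_back(i + 1);  // the next line begins after the newline
        }
    }
    return starts;
}

// (row, column) of `position`, looked up in a previously built line index.
Point pointAt(const std::vector<std::size_t>& lineStarts, std::size_t position) {
    // Find the first line that starts after `position`, then step back one line.
    const auto it = std::upper_bound(lineStarts.begin(), lineStarts.end(), position);
    const std::size_t row = static_cast<std::size_t>(it - lineStarts.begin()) - 1;
    return Point{row, position - lineStarts[row]};
}
```

The idea would be to call pointAt with the index built before the edit to get the point of position + charsRemoved, then rebuild the index from the current document text and call it again for position + charsAdded. Rebuilding the whole index on every edit is the simplest option; updating only the affected lines is possible but not shown. If the parser expects columns in bytes rather than code units, the column would still need to be converted.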

            eineskamelles wrote (#5):

            @IgKh I have measured the time the undo() and redo() methods take, and together it is about 30ms on a 250.000 file, so this might be a workable way to get the points. The comment on the function declaration says:

            /**
             * Edit the syntax tree to keep it in sync with source code that has been
             * edited.
             *
             * You must describe the edit both in terms of byte offsets and in terms of
             * (row, column) coordinates.
             */
            

            According to some GitHub issues, supplying only the byte offsets might work too, but that is not guaranteed to keep working in the future. Also, one last question: are you sure that charsAdded and charsRemoved have to be added to position? When I create a QTextCursor at position + charsAdded, it says that the index is out of range. Thanks.

              IgKh wrote (#6):

              @eineskamelles said in QTextDocument and ts_tree_edit:

              Also one last question, are you sure that charsAdded and charsRemoved have to be added to position? Because when I create a QTextCursor on the added position it says that the index is out of range

              Yes, position + charsAdded is a valid cursor position, pointing immediately after the last character inserted as part of the edit. position + charsRemoved is not necessarily a valid position in the modified document, of course. If that doesn't work for you, please post an example.

              @eineskamelles said in QTextDocument and ts_tree_edit:

              I have measured the time it takes for undo() and redo() methods and together

              I wouldn't do that. The interaction between the undo stack and the contentsChange signal is problematic, and will break in subtle ways, for example if you use edit blocks. You could do such things in response to the undoCommandAdded signal, but that is probably not helpful for your use case.

                eineskamelles wrote (#7):

                @IgKh I had an internal error on my side, which is why the position was off. I have fixed it now. Thank you for your help.
