How to parse xml with inline (embedded tags)?

  • S SGaist
    21 Aug 2017, 19:56

    Hi,

    Do you mean you want to get "some text link some other text"?

    billconan
    wrote on 21 Aug 2017, 21:15 last edited by
    #3

    @SGaist

    No, I want to parse it as if it were HTML.

    I want to be able to read the above XML and get the tree, like so:

    root
    |- <p>
    |- text block 1: some text
    |- <a>
    |-- inner text of <a>: "link"
    |- </a>
    |- text block 2: some other text
    |- </p>

    and then re-render it as

    <para>some text <urllink>link</urllink> some other text</para>
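
The tree and re-rendering described above can be sketched in plain, Qt-independent C++. This is only an illustration under stated assumptions: `Node`, `text`, `element`, and `render` are hypothetical names, the tag mapping is passed in as a lookup table, and escaping of special characters in text is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: either an element (tag + children) or a text run.
struct Node {
    bool isText = false;
    std::string value;           // tag name for elements, content for text
    std::vector<Node> children;  // stays empty for text nodes
};

// Helper: build a text node.
Node text(const std::string& s) {
    Node n;
    n.isText = true;
    n.value = s;
    return n;
}

// Helper: build an element node with the given children, in document order.
Node element(const std::string& tag, std::vector<Node> kids) {
    assert(!tag.empty());
    Node n;
    n.value = tag;
    n.children = std::move(kids);
    return n;
}

// Re-render the tree as markup. Tags listed in `rename` are replaced,
// all other tags are kept. Text is copied verbatim (no escaping here).
std::string render(const Node& n, const std::map<std::string, std::string>& rename) {
    if (n.isText) return n.value;
    auto it = rename.find(n.value);
    const std::string tag = (it != rename.end()) ? it->second : n.value;
    std::string out = "<" + tag + ">";
    for (const Node& child : n.children) out += render(child, rename);
    return out + "</" + tag + ">";
}
```

Because text runs and inline elements are ordinary siblings in `children`, the renderer needs no special case for inline tags; a translation that is not one-to-one would replace the lookup table with per-tag logic inside `render`.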
    
      SGaist
      Lifetime Qt Champion
      wrote on 21 Aug 2017, 21:28 last edited by
      #4

      So replace p and a with para and urllink ?

      Interested in AI ? www.idiap.ch
      Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

        billconan
        wrote on 21 Aug 2017, 22:41 last edited by billconan
        #5

        @SGaist that's very hacky and not a generic solution.

        I used <p> and <para> as an example; in reality, the tags could be anything (I can't know them in advance).

        My question is about how to properly parse inline tags like the ones above.

        The QXmlStreamReader API doesn't seem to handle this case, which is quite common in HTML.

        What my program actually does is parse the XML files generated by doxygen and re-render them as HTML pages.

          billconan
          wrote on 21 Aug 2017, 23:11 last edited by
          #6

          For example, the above case is relatively easy to handle with rapidxml:

              // Requires #include "rapidxml.hpp". parse<0> modifies the buffer in place.
              rapidxml::xml_document<> doc;
              char data[1024] = "<p>some text <a>link</a> some other text</p>";
              doc.parse<0>(data);

              // The root element <p>
              qDebug() << "first node" << doc.first_node()->name();
              // First child of <p>: a data node holding the leading text
              qDebug() << "first node children" << doc.first_node()->first_node()->type();
              qDebug() << "content" << doc.first_node()->first_node()->value();
              // Next sibling: the inline <a> element and its inner text
              qDebug() << "next" << doc.first_node()->first_node()->next_sibling()->name();
              qDebug() << "next next" << doc.first_node()->first_node()->next_sibling()->first_node()->value();
              // Last sibling: the trailing text after </a>
              qDebug() << "next next next" << doc.first_node()->first_node()->next_sibling()->next_sibling()->value();

          Since rapidxml parses the XML into a tree structure, you can easily query a tree node's siblings and children.

          Qt's QXmlStreamReader, on the other hand, only tokenizes the XML file into a flat token sequence, and it doesn't seem to expect inline tags.
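
As a side note on the comparison above: a flat token sequence still marks where each element opens and closes, so a tree can in principle be rebuilt from it with a stack. The sketch below is Qt-independent standard C++ (C++14) and rests on assumptions: `Token`, `Node`, and `buildTree` are hypothetical names, and the token kinds are merely named after QXmlStreamReader's `StartElement`, `Characters`, and `EndElement` token types, which a real reader loop would supply.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token model mirroring three QXmlStreamReader token types.
enum class Kind { StartElement, Characters, EndElement };
struct Token {
    Kind kind;
    std::string text;  // tag name, or character data for Kind::Characters
};

// Hypothetical tree node: elements have a tag, text nodes have text.
struct Node {
    std::string tag;   // empty for text nodes
    std::string text;  // empty for element nodes
    std::vector<std::unique_ptr<Node>> children;
};

// Rebuild the tree from the flat sequence; the stack holds the open elements.
std::unique_ptr<Node> buildTree(const std::vector<Token>& tokens) {
    auto root = std::make_unique<Node>();  // synthetic root above the document element
    std::vector<Node*> open{root.get()};
    for (const Token& t : tokens) {
        Node* parent = open.back();
        switch (t.kind) {
        case Kind::StartElement: {
            auto el = std::make_unique<Node>();
            el->tag = t.text;
            Node* raw = el.get();
            parent->children.push_back(std::move(el));
            open.push_back(raw);  // following tokens become children of this element
            break;
        }
        case Kind::Characters: {
            auto tx = std::make_unique<Node>();
            tx->text = t.text;
            parent->children.push_back(std::move(tx));  // text is a sibling of inline elements
            break;
        }
        case Kind::EndElement:
            assert(open.size() > 1 && "unbalanced end tag");
            open.pop_back();  // back to the enclosing element
            break;
        }
    }
    return root;
}
```

For the `<p>some text <a>link</a> some other text</p>` example, the `<p>` node ends up with three children: a text node, the `<a>` element, and a second text node.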

            hskoglund
            wrote on 22 Aug 2017, 01:18 last edited by
            #7

            Hi, Qt has the same XML tree-structure parsing in its DOM classes.
            Add xml to your .pro file, e.g.: QT += core gui xml
            #include <QDomDocument>
            Then you can use code like this:

            QDomDocument doc;
            doc.setContent(QString("<p>some text <a>link</a> some other text</p>"));

            // Walk the children of <p>: the text nodes and the inline <a> element are siblings.
            for (auto c1 = doc.documentElement().firstChild(); !c1.isNull(); c1 = c1.nextSibling())
            {
                // toText() returns a null node for elements, so this prints an empty string for <a>.
                qDebug() << "level1: " << c1.toText().data();

                // Children of the current node (for <a>, the text "link").
                for (auto c2 = c1.firstChild(); !c2.isNull(); c2 = c2.nextSibling())
                    qDebug() << "       level2: " << c2.toText().data();
            }
            
              VRonin
              wrote on 22 Aug 2017, 07:35 last edited by
              #8

              https://forum.qt.io/topic/80350/read-xml-file/7 and https://forum.qt.io/topic/81011/blank-out-put-when-i-am-trying-to-parse-the-xml-file-using-qxmlstreamreader-class/11 should help.

              "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
              ~Napoleon Bonaparte

              On a crusade to banish setIndexWidget() from the holy land of Qt

                SGaist
                Lifetime Qt Champion
                wrote on 22 Aug 2017, 07:38 last edited by
                #9

                @billconan it was meant as a question not a suggestion.

                How do you know what tag you will replace and by what ? Is it something your application users will provide ?

                  billconan
                  wrote on 22 Aug 2017, 23:12 last edited by
                  #10

                  @hskoglund thanks this works

                    billconan
                    wrote on 22 Aug 2017, 23:16 last edited by
                    #11

                    @SGaist yes, I know what will be replaced by what.

                    My task is converting doxygen XML files into HTML.

                    Your solution would work, but it's hacky, because the translation is not always one-to-one the way <para> to <p> is.

                    Being able to parse the initial XML into a tree structure gives me great flexibility.

                      Lifetime Qt Champion
                      wrote on 23 Aug 2017, 06:57 last edited by
                      #12

                      Again, it wasn't a suggestion, I was just asking whether you would simply do tag for tag replacement.

                      Out of curiosity, since you are using doxygen, why not make it generate the html directly ?
