
QJsonDocument::fromJson fails on "foreign" characters

    Per Gunnar Holm
    wrote on 17 Aug 2016, 11:37 last edited by
    #1

    I have boiled the problem down to this small program below.

    When I run it, it succeeds on the first text input but fails on the second (which contains the Norwegian characters Æ, Ø and Å).

    I am guessing that I am doing something wrong. It seems unlikely that no one would have complained loudly if this really were a bug in QJsonDocument::fromJson...

    As additional information, I am running this code on:

    • CentOS release 6.8
    • Linux 2.6.32-642.1.1.el6.x86_64
    • gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
    • Qt 5.6
    #include <QtCore/QCoreApplication>
    #include <QtCore/QJsonDocument>
    
    #include <iostream>
    
    void textMessageReceived(const QString &msg)
    {
        std::cerr << "msg = " << msg.toStdString() << std::endl;
        QJsonParseError error;
    
        // Create a Json document from text. Fails for foreign characters!
        QJsonDocument doc = QJsonDocument::fromJson(msg.toUtf8(), &error);
    
        if(doc.isNull())
            std::cerr << "(Broken json), error code = "
                      << error.errorString().toStdString() << std::endl;
        else
            std::cerr << "(Valid json) : " << doc.toJson().toStdString() << std::endl;
    }
    
    int main()
    {
    
        QString jsontext = "{\"string\": \"abc\"}";
        textMessageReceived(jsontext);
    
        jsontext = "{\"string\": \"æøå\"}";
        textMessageReceived(jsontext);
    
        return 0;
    }
    

    When run, the program will output

    msg = {"string": "abc"}
    (Valid json) : {
        "string": "abc"
    }
    
    msg = {"string": "æøå"}
    (Broken json), error code = invalid UTF8 string
    
      mrjj
      Lifetime Qt Champion
      wrote on 17 Aug 2016, 12:46 last edited by mrjj
      #2

      @Per-Gunnar-Holm said:

      jsontext = "{\"string\": \"æøå\"}";

      Hi,
      I'm wondering whether the text is encoded correctly as UTF-8
      when it is written inline in the source file.

      I loaded some JSON containing ÆØÅ (I'm a Dane) from
      a UTF-8 file and didn't notice
      any errors.
      But that file was already encoded as UTF-8.

      Maybe inline text becomes full Unicode or something?
      What happens if you don't call toUtf8()?

        Per Gunnar Holm
        wrote on 17 Aug 2016, 13:13 last edited by
        #3

        Thank you for your interest!

        The text is inline in my sample program, but this is just to keep the size of the problem to a minimum. In my real-life case I receive JSON text over a (Qt) web socket. The behavior seems to be identical, though.

        I tried modifying to not use QString::toUtf8, but there is no change.

        void textMessageReceived(const QString &msg)
        {
            :
            QByteArray ba (msg.toStdString().c_str());
            // Create a Json document from text string. Fails for foreign characters!
            QJsonDocument doc = QJsonDocument::fromJson(ba, &error);
            :
        }
        

        I have, by the way, verified that 'æ', 'ø' and 'å' get the correct UTF-8 encoding. What separates them from the other characters is that they require 2 bytes each. E.g.:
        's' has code 0x73
        't' has code 0x74
        but
        'æ' has code 0xC3A6
        'ø' has code 0xC3B8
        'å' has code 0xC3A5

        When stopping with the debugger in Qt Creator I can see that the Unicode values are correct...
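
For readers who want to check those byte values by hand, here is a minimal stdlib-only sketch (not from the thread; `encodeUtf8` is a hypothetical helper name) of how a Unicode code point is turned into UTF-8 bytes. It assumes the input is a valid code point up to U+10FFFF and does no error handling.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8.
// Assumption: cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, encodeUtf8(0xE6) gives the two bytes 0xC3 0xA6, encodeUtf8(0xF8) gives 0xC3 0xB8, and encodeUtf8(0xE5) gives 0xC3 0xA5, which matches the values listed above for 'æ', 'ø' and 'å'.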

          mrjj
          Lifetime Qt Champion
          wrote on 17 Aug 2016, 13:20 last edited by
          #4

          @Per-Gunnar-Holm
          Hi, maybe something corrupts the encoding along the way.
          I wonder whether
          msg.toStdString().c_str()
          handles Unicode?

          Maybe try
          msg.toUtf8().toStdString().c_str()

          I can't test anything at the moment, so I'm out of suggestions.
          I'm 99% sure it should handle UTF-8/Unicode.

            Per Gunnar Holm
            wrote on 17 Aug 2016, 13:52 last edited by
            #5

            @mrjj said:

            msg.toUtf8().toStdString().c_str()

            Thanks again.
            I tested the last suggestion too, but the contents of 'ba' are always the same.
            So this (ba) is the content that is being sent to QJsonDocument::fromJson.
            Also note that msg (the QString) is not in UTF-8; it holds 16-bit QChars.

            		ba	"{"string": "æøå"}"	QByteArray
            				'{' 	123    	0x7b	char
            				'"' 	34    	0x22	char
            				's' 	115    	0x73	char
            				't' 	116    	0x74	char
            				'r' 	114    	0x72	char
            				'i' 	105    	0x69	char
            				'n' 	110    	0x6e	char
            				'g' 	103    	0x67	char
            				'"' 	34    	0x22	char
            				':' 	58    	0x3a	char
            				' ' 	32    	0x20	char
            				'"' 	34    	0x22	char
            				'ᅢ' 	-61/195	0xc3	char
            				'ᆭ' 	-90/166	0xa6	char
            				'ᅢ' 	-61/195	0xc3	char
            				'ᄌ' 	-72/184	0xb8	char
            				'ᅢ' 	-61/195	0xc3	char
            				'ᆬ' 	-91/165	0xa5	char
            				'"' 	34    	0x22	char
            				'}' 	125    	0x7d	char
            		msg	"{"string": "æøå"}"	QString &
            			[0]	'{' 	123	0x007b	QChar
            			[1]	'"' 	34	0x0022	QChar
            			[2]	's' 	115	0x0073	QChar
            			[3]	't' 	116	0x0074	QChar
            			[4]	'r' 	114	0x0072	QChar
            			[5]	'i' 	105	0x0069	QChar
            			[6]	'n' 	110	0x006e	QChar
            			[7]	'g' 	103	0x0067	QChar
            			[8]	'"' 	34	0x0022	QChar
            			[9]	':' 	58	0x003a	QChar
            			[10]	' ' 	32	0x0020	QChar
            			[11]	'"' 	34	0x0022	QChar
            			[12]	'æ' 	230	0x00e6	QChar
            			[13]	'ø' 	248	0x00f8	QChar
            			[14]	'å' 	229	0x00e5	QChar
            			[15]	'"' 	34	0x0022	QChar
            			[16]	'}' 	125	0x007d	QChar
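
As a cross-check on the dump above, here is a minimal stdlib-only sketch (not from the thread; `isValidUtf8` is a hypothetical helper) of a validator that applies the standard UTF-8 well-formedness rules. By these rules the sequence C3 A6 C3 B8 C3 A5 is well-formed, which suggests the bytes handed to QJsonDocument::fromJson are fine and the rejection happens somewhere else.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true if `bytes` is well-formed UTF-8. Rejects stray or missing
// continuation bytes, overlong forms, surrogates and values above U+10FFFF.
bool isValidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const auto b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // continuation bytes expected after b0
        std::uint32_t cp = 0;    // code point being decoded
        std::uint32_t min = 0;   // smallest code point allowed for this length
        if (b0 < 0x80)                { extra = 0; cp = b0;        min = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (n - i - 1 < extra) return false;            // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false;       // not 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return false;                     // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // outside Unicode
        i += extra + 1;
    }
    return true;
}
```

One way to apply it to the QByteArray from the dump would be isValidUtf8(std::string(ba.constData(), ba.size())). The same text in Latin-1 (single bytes E6 F8 E5) is rejected by this check, so it can also help tell the two encodings apart.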
            
              VRonin
              wrote on 17 Aug 2016, 14:56 last edited by VRonin
              #6

              Can you replace jsontext = "{\"string\": \"æøå\"}"; with jsontext = QString::fromWCharArray(L"{\"string\": \"æøå\"}"); or try

              jsontext = QString::fromWCharArray(
                              L"{\"string\": \""
                              L"\u00E6"
                              L"\u00F8"
                              L"\u00E5"
                              L"\"}"
                              );
              

              What I mean is that the problem is not how you read the string but how you create it.
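
To illustrate the idea behind the second variant, here is a minimal stdlib-only sketch (not from the thread; `makeJsonText` is a hypothetical name). Because the three letters are written as universal character names, the .cpp file on disk contains only ASCII, so the encoding the editor or compiler assumes for the source file should not affect the result.

```cpp
#include <string>

// Build the test document from universal character names (\uXXXX).
// The source bytes are pure ASCII; the compiler produces the wide
// characters U+00E6, U+00F8 and U+00E5 itself.
std::wstring makeJsonText()
{
    return L"{\"string\": \""
           L"\u00E6"   // LATIN SMALL LETTER AE
           L"\u00F8"   // LATIN SMALL LETTER O WITH STROKE
           L"\u00E5"   // LATIN SMALL LETTER A WITH RING ABOVE
           L"\"}";
}
```

The resulting wide string has 17 characters, and the characters at indices 12 to 14 have the values 0xE6, 0xF8 and 0xE5. Passing makeJsonText().c_str() to QString::fromWCharArray would then correspond to the suggestion above.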

              "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
              ~Napoleon Bonaparte

              On a crusade to banish setIndexWidget() from the holy land of Qt

                Per Gunnar Holm
                wrote on 17 Aug 2016, 15:17 last edited by
                #7

                @VRonin
                Thank you for looking into it!
                Unfortunately the result is the same in both cases:

                jsontext = "{\"string\": \"æøå\"}"
                msg = {"string": "æøå"}
                (Broken json), error code = invalid UTF8 string
                

                This is a bit of a surprise to me; I thought the second option would force the characters down to a single-byte representation.

                Here's the actual code I ran (first suggestion commented out):

                int main()
                {
                    QString jsontext = "{\"string\": \"abc\"}";
                    textMessageReceived(jsontext);
                //    jsontext = QString::fromWCharArray(L"{\"string\": \"æøå\"}");
                    jsontext = QString::fromWCharArray(L"{\"string\": \"" L"\u00E6" L"\u00F8" L"\u00E5" L"\"}");
                    qDebug() << "jsontext =" << jsontext;
                    textMessageReceived(jsontext);
                
                    return 0;
                }
                
                  VRonin
                  wrote on 17 Aug 2016, 15:24 last edited by VRonin
                  #8

                  This works for me. Qt 5.5.1 on MSVC2013

                  #include <QtCore/QCoreApplication>
                  #include <QtCore/QJsonDocument>
                  
                  #include <QDebug>
                  
                  void textMessageReceived(const QString &msg)
                  {
                      qDebug() << "msg = " << msg << '\n';
                      QJsonParseError error;
                  
                      // Create a Json document from text. Fails for foreign characters!
                      QJsonDocument doc = QJsonDocument::fromJson(msg.toUtf8(), &error);
                  
                      if(doc.isNull())
                          qDebug() << "(Broken json), error code = "
                                    << error.errorString() << '\n';
                      else
                          qDebug() << "(Valid json) : " << QString(doc.toJson()) << '\n';
                  }
                  
                  int main()
                  {
                  
                      QString jsontext = "{\"string\": \"abc\"}";
                      textMessageReceived(jsontext);
                  
                      jsontext = QString::fromWCharArray(
                                  L"{\"string\": \""
                                  L"\u00E6"
                                  L"\u00F8"
                                  L"\u00E5"
                                  L"\"}"
                                  );
                      textMessageReceived(jsontext);
                  
                      return 0;
                  
                  }
                  

                  Output:

                  msg =  "{\"string\": \"abc\"}" 
                  
                  (Valid json) :  "{\n    \"string\": \"abc\"\n}\n" 
                  
                  msg =  "{\"string\": \"æøå\"}" 
                  
                  (Valid json) :  "{\n    \"string\": \"æøå\"\n}\n" 

                    Per Gunnar Holm
                    wrote on 17 Aug 2016, 15:38 last edited by
                    #9

                    Thanks again @VRonin !

                    That means this is platform-dependent, I suppose. As far as I can see there are no differences between your code and mine (except that you use qDebug).
                    It would also explain why there hasn't been a torrent of complaints to Qt if this is a bug and not me making a mistake; probably not too many on my platform.
                    As stated initially I have Qt 5.6.0 and I'm on Linux (CentOS 6.8).

                      VRonin
                      wrote on 17 Aug 2016, 15:45 last edited by
                      #10

                      It works for me even on Qt 5.7.
                      Two questions:

                      • Did you try my code?
                      • What encoding do you use for your .cpp file? (In Qt Creator you can see it under Edit -> Select Encoding.)

                        Per Gunnar Holm
                        wrote on 17 Aug 2016, 15:52 last edited by
                        #11

                        @VRonin

                        Now I did :-)
                        Same result (unfortunately):

                        msg =  "{\"string\": \"abc\"}" 
                        (Valid json) :  "{\n    \"string\": \"abc\"\n}\n" 
                        msg =  "{\"string\": \"æøå\"}" 
                        (Broken json), error code =  "invalid UTF8 string" 
                        

                        I checked the encoding as instructed, and the "Text Encoding" window comes up with "UTF-8" highlighted.
                        I am assuming this is OK?
                        -- Gunnar

                          mrjj
                          Lifetime Qt Champion
                          wrote on 18 Aug 2016, 06:15 last edited by mrjj
                          #12

                          Hi,
                          Windows 7 with Qt 5.7 also shows it OK.

                          Linux Mint, Qt 5.5:

                          Notice that it doesn't render as æøå, but it is still considered valid.

                            Per Gunnar Holm
                            wrote on 18 Aug 2016, 09:35 last edited by
                            #13

                            Thanks @mrjj
                            It looks increasingly as if this is a problem specific to my platform.
                            That's good news for the world, but bad news for me, of course, since it decreases the likelihood of getting it fixed :-(

                              Per Gunnar Holm
                              wrote on 18 Aug 2016, 12:48 last edited by
                              #14

                              Today I tested on virtual installations of Ubuntu 16.04 and CentOS 7, both using Qt 5.6.1.
                              The test program passes without problems there, so this is definitely a CentOS 6 problem.

                                mrjj
                                Lifetime Qt Champion
                                wrote on 18 Aug 2016, 12:52 last edited by
                                #15

                                @Per-Gunnar-Holm
                                Well, maybe Qt 5.6.1 has a bug on that distro.

                                I hope it is an option to use CentOS 7 instead :)

                                  kshegunov
                                  Moderators
                                  wrote on 18 Aug 2016, 14:02 last edited by kshegunov
                                  #16

                                  @Per-Gunnar-Holm

                                  QByteArray ba (msg.toStdString().c_str());
                                  

                                  Bug or no, the above line doesn't seem correct. You should enforce the required encoding (as @VRonin has done) instead of relying on the internal representation of std::string and/or QString.
                                  The above should be:

                                  QByteArray ba = msg.toUtf8();
                                  

                                  If you need to output QStrings to the standard streams, attach a QTextStream to them instead of converting the objects to std::string:

                                  QTextStream cout(stdout);
                                  QTextStream cerr(stderr);
                                  QTextStream cin(stdin);
                                  

                                  Kind regards.

                                  Read and abide by the Qt Code of Conduct

                                    Per Gunnar Holm
                                    wrote on 18 Aug 2016, 14:45 last edited by
                                    #17

                                    Thanks @kshegunov !

                                    The c_str() was just something we tested along the way!
                                    The original code was

                                        // Create a Json document from text. Fails for foreign characters!
                                        QJsonDocument doc = QJsonDocument::fromJson(msg.toUtf8(), &error);
                                    

                                    However, all the variations we/I have tried show the same problem (on CentOS 6.8, as we have discovered).
