Unexpected result from QTextCodec::canEncode(QString&)
-
Dear Qt-ists,
I have created a QTextCodec instance, named codec, for the encoding "US-ASCII", using the method QTextCodec::codecForName.
Now, I have a QString object str that contains symbols that cannot be encoded in this encoding (© and é, to be precise). Indeed, these symbols are replaced with the character ? in the QByteArray returned by codec->fromUnicode(str).
However, the call to codec->canEncode(str) yields true, which I find rather counterintuitive.
Is this expected behaviour? If so, then I suppose the documentation of canEncode should be expanded.
-
Hi and welcome to devnet,
What version of Qt are you using?
On what platform?
Can you post a minimal compilable code sample that reproduces this?
-
@DDEH said in Unexpected result from QTextCodec::canEncode(QString&):
I have created a QTextCodec instance, named codec, for the encoding "US-ASCII", using the method QTextCodec::codecForName.
What do you get when you do
qDebug() << codec->mibEnum() << codec->name();
?
Indeed, these symbols are replaced with character ? in the QByteArray resulting from a call to codec->fromUnicode(str).
- How did you create the unicode string?
- How did you display the encoded string?
- Call toHex() on your QByteArray. Are the ? characters 0x3F?
-
@SGaist said in Unexpected result from QTextCodec::canEncode(QString&):
Hi and welcome to devnet,
Hi,
Thanks for the welcome.
What version of Qt are you using ?
5.5.1
On what platform ?
linux-x86_64
Can you post a minimal compilable sample code that reproduces that ?
Yes, kind of.
main.cpp
#include "monediteur.h"
#include <QApplication>

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);
    MonEditeur w;
    w.show();
    return a.exec();
}
monediteur.h
#ifndef MONEDITEUR_H
#define MONEDITEUR_H

#include <QMainWindow>

namespace Ui {
class MonEditeur;
}

class MonEditeur : public QMainWindow
{
    Q_OBJECT

public:
    explicit MonEditeur(QWidget *parent = 0);
    ~MonEditeur();

private:
    Ui::MonEditeur *ui;

private slots:
    void process();
};

#endif // MONEDITEUR_H
monediteur.cpp
#include "monediteur.h"
#include "ui_monediteur.h"
#include <QTextCodec>
#include <QDebug>

MonEditeur::MonEditeur(QWidget *parent) :
    QMainWindow(parent),
    ui(new Ui::MonEditeur)
{
    ui->setupUi(this);
    connect(ui->pushButton, SIGNAL(clicked(bool)), this, SLOT(process()));
}

MonEditeur::~MonEditeur()
{
    delete ui;
}

void MonEditeur::process()
{
    QString codecId("US-ASCII");
    const QString contents = ui->textEditor->toPlainText();
    QTextCodec *codec = QTextCodec::codecForName(codecId.toLatin1());
    qDebug() << "Some attributes of this codec:";
    qDebug() << codec->mibEnum() << codec->name();
    if (codec->canEncode(contents)) {
        qDebug() << codecId << " can encode the contents";
        qDebug() << contents;
        QByteArray ba = codec->fromUnicode(contents);
        QString check = codec->toUnicode(ba);
        qDebug() << "check: ";
        qDebug() << check;
        qDebug() << "--";
        QByteArray hex = ba.toHex();
        qDebug() << "toHex: ";
        qDebug() << hex;
    }
}
monediteur.ui
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
 <class>MonEditeur</class>
 <widget class="QMainWindow" name="MonEditeur">
  <property name="geometry">
   <rect>
    <x>0</x>
    <y>0</y>
    <width>381</width>
    <height>324</height>
   </rect>
  </property>
  <property name="windowTitle">
   <string>MonEditeur</string>
  </property>
  <widget class="QWidget" name="centralWidget">
   <widget class="QPlainTextEdit" name="textEditor">
    <property name="geometry">
     <rect>
      <x>0</x>
      <y>0</y>
      <width>381</width>
      <height>241</height>
     </rect>
    </property>
   </widget>
   <widget class="QPushButton" name="pushButton">
    <property name="geometry">
     <rect>
      <x>130</x>
      <y>250</y>
      <width>80</width>
      <height>25</height>
     </rect>
    </property>
    <property name="text">
     <string>Process</string>
    </property>
   </widget>
  </widget>
  <widget class="QToolBar" name="mainToolBar">
   <attribute name="toolBarArea">
    <enum>TopToolBarArea</enum>
   </attribute>
   <attribute name="toolBarBreak">
    <bool>false</bool>
   </attribute>
  </widget>
  <widget class="QStatusBar" name="statusBar"/>
 </widget>
 <layoutdefault spacing="6" margin="11"/>
 <resources/>
 <connections/>
</ui>
-
Thanks for taking some of your time to look at this issue.
My answers are embedded in your post.
@JKSH said in Unexpected result from QTextCodec::canEncode(QString&):
@DDEH said in Unexpected result from QTextCodec::canEncode(QString&):
I have created a QTextCodec instance, named codec, for the encoding "US-ASCII", using the method QTextCodec::codecForName.
What do you get when you do
qDebug() << codec->mibEnum() << codec->name();
?
3 "US-ASCII"
Indeed, these symbols are replaced with character ? in the QByteArray resulting from a call to codec->fromUnicode(str).
- How did you create the unicode string?
The string is the result of toPlainText() from a QTextEdit instance.
- How did you display the encoded string?
With qDebug() for instance.
- Call toHex() on your QByteArray. Are the ? characters 0x3F?
Yes they are.
The output of the program I posted in my previous post is the following:
Some attributes of this codec:
3 "US-ASCII"
"US-ASCII" can encode the contents
"© André Cymone"
check:
"? Andr? Cymone"
--
toHex:
"3f20416e64723f2043796d6f6e65"
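The hex dump above can also be double-checked outside Qt. The following is only a minimal sketch in plain C++, assuming well-formed hex input (even length, valid digits); the helper names bytesFromHex and questionMarkOffsets are hypothetical and not part of Qt.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a Qt API): turns a hex dump such as the output of
// QByteArray::toHex() back into raw bytes. Assumes well-formed input.
std::string bytesFromHex(const std::string& hex)
{
    std::string bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        // Parse two hex digits as one byte value.
        int value = std::stoi(hex.substr(i, 2), nullptr, 16);
        bytes.push_back(static_cast<char>(value));
    }
    return bytes;
}

// Hypothetical helper: returns the offsets of every 0x3F ('?') byte.
std::vector<std::size_t> questionMarkOffsets(const std::string& bytes)
{
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == 0x3F) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

Applied to "3f20416e64723f2043796d6f6e65", bytesFromHex gives back "? Andr? Cymone", and the two 0x3F bytes sit at offsets 0 and 6, exactly where © and é were in the input.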
-
@DDEH said in Unexpected result from QTextCodec::canEncode(QString&):
The output of the program I posted in my previous post is the following:
Some attributes of this codec: 3 "US-ASCII" "US-ASCII" can encode the contents "© André Cymone" check: "? Andr? Cymone" -- toHex: "3f20416e64723f2043796d6f6e65"
Looks like you found some incorrect behaviour; I agree that canEncode() should return false in your example.
If it still behaves the same in the latest release (Qt 5.11.1), then you can submit a bug report at https://bugreports.qt.io/. However, I'm guessing that the report will be given low priority since US-ASCII is not a recommended encoding nowadays. (The devs are already putting all their time and energy into fixing much more serious bugs and adding new features.)
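In the meantime, for US-ASCII specifically, a possible workaround is to skip canEncode() and test the text directly, since US-ASCII covers only code points 0x00 to 0x7F. The following is only a sketch in plain C++ on a UTF-16 string; isPureAscii is a hypothetical helper name, not a Qt function. With a QString, the same loop could run over its QChar values and compare QChar::unicode() against 0x7F.

```cpp
#include <string>

// Hypothetical helper (not part of Qt): returns true only if every UTF-16
// code unit in `text` lies in the 7-bit US-ASCII range (0x00..0x7F).
// Any surrogate or other non-ASCII unit is above 0x7F, so it is rejected.
bool isPureAscii(const std::u16string& text)
{
    for (char16_t unit : text) {
        if (unit > 0x7F) {
            return false;
        }
    }
    return true;
}
```

For the string in this thread, "© André Cymone", the helper returns false because © (U+00A9) and é (U+00E9) are both above 0x7F.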