QByteArray to string?
-
@SGaist
I promise you all you'll see is a QByteArray being returned with the sub-process's output, and I'm trying to convert that to a QString to put into a QTextEdit. That's all the question is. And I get a UnicodeDecodeError, probably when robocopy echoes the name of a file which has that 0x9c character in it, via PyQt's decode():
can't decode byte 0x9c in position 32: invalid start byte
So presumably all you have to do is create a QByteArray, put a 0x9c in its first byte, and try qba.data().decode('utf8'). That's what this thread is about.
This whole issue where I'm discussing the code is in https://forum.qt.io/topic/85493/unicodedecodeerror-with-output-from-windows-os-command. If you'd be kind enough to look at that, I think that's a more appropriate place to discuss the code than here? If you still want more code there, let me know, and I'll supply.
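For anyone without PyQt at hand, the failure can probably be reproduced with plain Python bytes, since QByteArray.data() hands decode() a bytes object anyway. This is only a sketch: the 32 spaces of padding are made up, chosen so that the 0x9c byte lands at the position quoted in the error message.

```python
# Stand-in for QByteArray.data(): 32 bytes of hypothetical padding, then 0x9c.
raw = b" " * 32 + b"\x9c"

message = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x9c is a UTF-8 continuation byte, so it cannot start a character.
    message = str(exc)

print(message)
```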
-
I don't have a Windows machine at hand. Doing this on macOS yields correct results:
>>> from PyQt5.QtCore import QByteArray
>>> ba = QByteArray()
>>> ba.append(u"\u009C")
PyQt5.QtCore.QByteArray(b'\xc2\x9c')
>>> ba.data().decode('utf-8')
'\x9c'
>>> ba.data().decode('utf-16')
'鳂'
-
@SGaist
I'm afraid I don't believe that relates to the situation.
I now have information from the client:
The exception occurs (only) when a filename robocopy encounters --- robocopy is echoing filenames as it goes --- contains the £ (UK pound sterling) character (I am in the UK, you may not be). In that situation, ba.data().decode('utf-8') (where ba is the QByteArray from QProcess.readAllStandardOutput()) results in:
Unhandled Exception: 'utf-8' codec can't decode byte 0x9c in position 32: invalid start byte
<class 'UnicodeDecodeError'>
File "C:\HJinn\widgets\messageboxes.py", line 289, in processReadyReadStandardOutput
output = output.data().decode('utf-8')
Now, armed with that information:
- In a Command Prompt I type in: echo £ > file
- I dump the file and I see: 9C 20 0D 0A
- So the £ character is a single byte with value 0x9C
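The dump can be cross-checked from Python by encoding £ with a few codecs and seeing which ones produce the single byte 0x9C. This is a rough sketch; the list of codecs to compare is an assumption on my part, not something taken from the client's machine.

```python
# Encode the pound sign with several codecs and compare with the dumped byte 0x9C.
POUND = "\u00a3"  # the £ character

CODECS = ("cp850", "cp437", "cp1252", "latin-1", "utf-8")  # assumed candidates
encoded = {codec: POUND.encode(codec) for codec in CODECS}

for codec, raw in encoded.items():
    print(f"{codec:8} {raw.hex().upper()}")

# Codecs whose encoding of £ is exactly the byte seen in the file dump.
matches = [codec for codec, raw in encoded.items() if raw == b"\x9c"]
print(matches)
```

Of these candidates, only the two console (OEM) code pages map £ to 0x9C; Windows-1252 and Latin-1 use 0xA3, and UTF-8 uses the two bytes C2 A3.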
-
What do you get if you use unicode_escape in place of utf-8?
-
@SGaist
I don't know, because I don't have access to the code right now, but I will tomorrow.
Thank you, your suggestion is much more like what I have been looking for. We are now discussing the argument to decode():
- I believe utf-8 is definitely right for Linux, where I develop.
- I'm beginning to learn (whether I like it or not) that it is not for Windows.
- Under Windows utf-8 does work 99% of the time, but not always, and now I know not for the £ character.
- I believe that either latin-1 or windows_1252 may be able to handle this correctly.
- I will also try your unicode_escape if you think it's worthwhile.
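The candidates in this list can be tried without the Windows machine by feeding each codec a byte string containing 0x9c. This is a sketch only: the sample bytes are a made-up filename, not real robocopy output.

```python
RAW = b"abc\x9c.txt"  # hypothetical output: a filename containing the byte 0x9c

results = {}
for codec in ("utf-8", "latin-1", "cp1252", "unicode_escape"):
    try:
        results[codec] = RAW.decode(codec)
    except UnicodeDecodeError as exc:
        # Record why the codec rejected the bytes instead of crashing.
        results[codec] = f"error: {exc.reason}"

for codec, text in results.items():
    print(f"{codec:15} {text!r}")
```

On this input utf-8 fails, while latin-1, cp1252 and unicode_escape all decode without raising, but none of them yields a £: cp1252 gives œ and the other two give the control character U+009C. unicode_escape also interprets backslash sequences, so it is likely to mangle Windows paths.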
-
@SGaist
I believe what I am seeking from you is: Haven't I seen that Qt has some function to "get the current system encoding", but I can't spot it?
Then my code would be:
ba.data().decode(Qt.getCurrentSystemEncoding())
and everything would just work....
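On the Python side, the standard library does have a few calls that report an encoding. A small sketch that prints them follows; which of them (if any) matches what a console program actually writes is exactly the open question, so treat the values as candidates to test rather than as the answer.

```python
import locale
import sys

# Encodings Python itself reports; none is guaranteed to be the console code page.
reported = {
    "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
    "sys.getdefaultencoding()": sys.getdefaultencoding(),
}

for call, name in reported.items():
    print(f"{call:36} {name}")
```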
[EDIT: Ooohhhh, is http://doc.qt.io/qt-5/qtextcodec.html#codecForLocale what I'm looking for, perhaps?
QTextCodec *QTextCodec::codecForLocale()
Returns a pointer to the codec most suitable for this locale.
On Windows, the codec will be based on a system locale. On Unix systems, the codec will might fall back to using the iconv library if no builtin codec for the locale can be found.
Or, was I thinking of the Python sys.getfilesystemencoding() (https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding)?
But that seems filename-specific; my output could be anything, not especially file names.]
-
[This post cross-posted to https://forum.qt.io/topic/85493/unicodedecodeerror-with-output-from-windows-os-command/18 ]
For the record, I have done exhaustive investigation, and there is only one solution which "correctly" displays the £ character under Windows. I am exhausted so will keep this brief:
- To create a file name with a £ in it: go into, say, Notepad and use its Save to name a file like abc£.txt. This is in the UK, using a UK keyboard and a standard UK-configured Windows.
- Note that at this point if you view the filename in either Explorer or, say, via dir you do see a £, not some other character. That's what my user will want to see in the output of the command he will run.
- Run an OS command like robocopy or even dir, which will include the filename in its output.
- Read the output with QProcess.readAllStandardOutput(). I'm saying the £ character will arrive as a single byte of value 0x9c.
- For the required Python/PyQt decoding bytes->str (QByteArray->QString) line, the only thing which works (does not raise an exception) AND represents the character as a £ is: ba.data().decode("cp850").
That is "Code Page 850", used in UK/Western Europe (so I'm told). It is the output you get if you open a Command Prompt and execute just chcp.
Any other decoding either raises UnicodeDecodeError (e.g. if utf-8) or decodes but represents it with another character (e.g. if windows_1252 or cp1252).
I still haven't found a way of getting that cp850 encoding name programmatically from anywhere --- if you ask Python for, say, the "system encoding" or "user's preferred encoding" you get cp1252 --- so I've had to hard-code it. [EDIT: If you want it, it's ctypes.cdll.kernel32.GetConsoleOutputCP().]
So there you are. I don't have C++ as opposed to Python for Qt, but I have a suspicion that if anyone tries it using the straight C++ Qt way of text = QString(process.readAllStandardOutput()) they'll find they do not actually get to see the £ symbol....
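To avoid hard-coding the name, the GetConsoleOutputCP() call from the EDIT could be wrapped roughly as below. This is a sketch under assumptions, not tested on the client's machine: the helper names and the utf-8 fallback are my own, it uses ctypes.windll (the usual loader for Win32 API functions) rather than ctypes.cdll, and it assumes a return value of 0 means no code page could be obtained, for example when the process has no console.

```python
import ctypes
import sys


def console_output_encoding(default="utf-8"):
    """Return a Python codec name for the console output code page.

    Hypothetical helper: outside Windows, or if the API reports 0,
    it falls back to `default`.
    """
    if sys.platform != "win32":
        return default
    code_page = ctypes.windll.kernel32.GetConsoleOutputCP()
    return f"cp{code_page}" if code_page else default


def decode_console_bytes(raw, encoding=None):
    """Decode bytes (e.g. from QByteArray.data()) without raising on bad bytes."""
    return raw.decode(encoding or console_output_encoding(), errors="replace")


print(console_output_encoding())
print(decode_console_bytes(b"abc\x9c.txt", "cp850"))
```

In the slot that handles readyReadStandardOutput this would be called as decode_console_bytes(ba.data()), where ba is the QByteArray from QProcess.readAllStandardOutput().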
-
Python makes a clear distinction between bytes and strings. Bytes objects contain raw data (a sequence of octets), whereas strings are Unicode sequences. Conversion between these two types is explicit: you encode a string to get bytes, specifying an encoding (which defaults to UTF-8), and you decode bytes to get a string. Clients of these functions should be aware that such conversions may fail, and should consider how failures are handled.
We can convert bytes to a string using the decode() instance method of the bytes class, so you need to decode the bytes object to produce a string. In Python 3, the default encoding is "utf-8", so you can use it directly:
b"python byte to string".decode("utf-8")
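As for how failures are handled: decode() takes an errors argument, and a short sketch of the common choices follows. The sample bytes are made up for illustration; they contain a byte that is not valid UTF-8.

```python
RAW = b"abc\x9c.txt"  # hypothetical bytes that are not valid UTF-8

# Default behaviour (errors="strict"): the conversion raises.
strict_failed = False
try:
    RAW.decode("utf-8")
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for each undecodable byte.
replaced = RAW.decode("utf-8", errors="replace")

# errors="backslashreplace" keeps the byte value visible as an escape.
escaped = RAW.decode("utf-8", errors="backslashreplace")

print(strict_failed, replaced, escaped)
```

Neither handler recovers the intended character; they only keep the program running when the encoding guess is wrong.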
-
@germyrinn
Hi, this was an old post of mine.
As I wrote, the problem is that for the £ sign, e.g. read from a file created in the way I describe, decode("utf-8") gives me a UnicodeDecodeError. I found the only conversion which works is decode("cp850").