Normalizing PCM Audio

Solved | General and Desktop
Tags: qaudioinput, audio engine, audio waveform, qiodevice
13 Posts | 3 Posters | 3.1k Views

rtavakko | #1

    Hi guys,

    This is more of an audio question than a Qt question, but I'm not sure whether I'm making a mistake somewhere in my logic. I posted a similar question on Stack Overflow as well but couldn't find the answer (link here).
    Basically, I'm using a QIODevice and QAudioInput to read and process audio data (I started from a Qt example and modified it a little). The issue is that I'm trying to normalize the signal coming from the Windows Stereo Mix and Microphone inputs, and I'm not sure where the midpoint of the signal should be. With 16-bit WAV files the signal has the expected dynamic range, -32,768 to 32,767 with 0 as the midpoint (when no signal is coming in), but with Stereo Mix or Microphone the 8-bit signed signal varies from -128 to 127 with the midpoint at -128, which is not consistent. Here is the code that processes the audio data:

    AudioIODevice::AudioIODevice(QObject *parent, const QAudioFormat &deviceFormat) :
        QIODevice(parent),
        format(deviceFormat)
    {
        int sampleSize = format.sampleSize();

        switch (format.sampleType())
        {
        case QAudioFormat::UnSignedInt:
            minValue = 0.0f;
            maxValue = static_cast<float>(std::pow(2,sampleSize) - 1);
            break;
        case QAudioFormat::SignedInt:
            minValue = static_cast<float>((std::pow(2,sampleSize)/2) * (-1));
            maxValue = static_cast<float>((std::pow(2,sampleSize)/2) - 1);
            break;
        case QAudioFormat::Float:
            break;
        default:
            break;
        }
    }
    
    qint64 AudioIODevice::writeData(const char *data, qint64 len)
    {
        unsigned int sampleBytes = format.sampleSize() / 8;                 //Number of bytes for each interleaved channel sample
        unsigned int combSampleBytes = format.channelCount() * sampleBytes; //Number of bytes for all channel samples
        unsigned int numSamples = len / combSampleBytes;                    //Total number of samples
    
        if(format.sampleSize() % 8 != 0 || len % sampleBytes != 0)
            return -1;
    
        //Prepare our output buffer
        buffer.clear();
        buffer.resize(numSamples,0);
    
        const unsigned char* uData = reinterpret_cast<const unsigned char*>(data);
    
        for(unsigned int i = 0; i < numSamples; i++)
        {        
            float monoValue = minValue;
            float value = minValue;
    
            //Process data for all interleaved samples
            for(unsigned int j = 0; j < format.channelCount(); j++)
            {
                switch (format.sampleType())
                {
                case QAudioFormat::UnSignedInt:
                    switch (format.sampleSize())
                    {
                    case 8:
                        value = *reinterpret_cast<const quint8*>(uData);
                        break;
                    case 16:
                        value = (format.byteOrder() == QAudioFormat::LittleEndian) ?
                                    (qFromLittleEndian<quint16>(*reinterpret_cast<const quint16*>(uData))) :
                                    (qFromBigEndian<quint16>(*reinterpret_cast<const quint16*>(uData)));
                        break;
                    case 32:
                        value = (format.byteOrder() == QAudioFormat::LittleEndian) ?
                                    (qFromLittleEndian<quint32>(*reinterpret_cast<const quint32*>(uData))) :
                                    (qFromBigEndian<quint32>(*reinterpret_cast<const quint32*>(uData)));
                        break;
                    default:
                        break;
                    }
                    break;
                case QAudioFormat::SignedInt:
                    switch (format.sampleSize())
                    {
                    case 8:
                        value = *reinterpret_cast<const qint8*>(uData);
                        break;
                    case 16:
                        value = (format.byteOrder() == QAudioFormat::LittleEndian) ?
                                    (qFromLittleEndian<qint16>(*reinterpret_cast<const qint16*>(uData))) :
                                    (qFromBigEndian<qint16>(*reinterpret_cast<const qint16*>(uData)));
                        break;
                    case 32:
                        value = (format.byteOrder() == QAudioFormat::LittleEndian) ?
                                    (qFromLittleEndian<qint32>(*reinterpret_cast<const qint32*>(uData))) :
                                    (qFromBigEndian<qint32>(*reinterpret_cast<const qint32*>(uData)));
                        break;
                    default:
                        break;
                    }
                    break;
                case QAudioFormat::Float:
                    break;
                default:
                    break;
                }
                monoValue = std::max(value,monoValue);
                uData += sampleBytes; //Get data for the next sample
            }
            buffer[i] = (monoValue - minValue) / (maxValue - minValue);    //Normalize the value to [0-1]
    
        }
        emit bufferReady();
        return len;
    }
    

    Should I be expecting 0 as the midpoint of a signed 8-bit PCM signal, or is there no standard for this and I have to figure out another way?

    Cheers!
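
    For reference, here is a minimal, self-contained sketch of the conventional ranges and the [0, 1] mapping the constructor above computes, assuming ordinary two's-complement PCM; the helper names pcmRange and normalizeSample are made up for illustration:

    #include <cmath>
    #include <cstdio>

    // Range of an n-bit PCM sample:
    //   signed   -> [-2^(n-1), 2^(n-1) - 1]
    //   unsigned -> [0, 2^n - 1]
    static void pcmRange(int bits, bool isSigned, float &minValue, float &maxValue)
    {
        if (isSigned) {
            minValue = -static_cast<float>(std::pow(2.0, bits - 1));
            maxValue = static_cast<float>(std::pow(2.0, bits - 1)) - 1.0f;
        } else {
            minValue = 0.0f;
            maxValue = static_cast<float>(std::pow(2.0, bits)) - 1.0f;
        }
    }

    // Map a raw sample to [0, 1]; a "silent" signed sample (0) should land near 0.5.
    static float normalizeSample(float sample, float minValue, float maxValue)
    {
        return (sample - minValue) / (maxValue - minValue);
    }

    int main()
    {
        float lo = 0.0f, hi = 0.0f;
        pcmRange(8, true, lo, hi);                                              // signed 8-bit: [-128, 127]
        std::printf("silence (0)  -> %f\n", normalizeSample(0.0f, lo, hi));     // ~0.502, i.e. the midpoint
        std::printf("floor (-128) -> %f\n", normalizeSample(-128.0f, lo, hi));  // 0.0
        return 0;
    }

    With those conventions, a constant 0 input (silence for signed PCM) lands at roughly 0.5 after normalization, which is the behaviour described above for the 16-bit WAV case.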

Kent-Dorfman | #2 (post deleted)

Kent-Dorfman | #3

          OK, so it looks like you're asking about a complete sound sample and not "on the fly", which is good, because you really cannot normalize sound "on the fly". Keep in mind that sound energy is non-linear: it propagates in 3 dimensions, so the dB scale is logarithmic, not linear. The mathematical midpoint will be based on the PCM format you are using (signed vs. unsigned), but as I mentioned, 128 as the midpoint of an unsigned [0..255] range is not an auditory midpoint. You should probably map your midpoint based on a logarithmic scale in the available range, and be careful about signed conversions. I never use signed data to represent PCM samples because the electronics of the sound card always work at some positive voltage level.
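
          As an illustration of the logarithmic mapping described here, a small sketch that converts a normalized amplitude in [0, 1] to dBFS, assuming full scale (1.0) as the reference amplitude; the function name toDbfs and the -96 dB floor are arbitrary example choices:

          #include <cmath>
          #include <cstdio>

          // Convert a normalized amplitude in [0, 1] (1.0 = full scale) to dBFS.
          // 0 dBFS is full scale; quieter signals are negative; pure silence tends
          // to -infinity, so it is clamped to a floor for display purposes.
          static float toDbfs(float amplitude, float floorDb = -96.0f)
          {
              if (amplitude <= 0.0f)
                  return floorDb;
              float db = 20.0f * std::log10(amplitude);
              return (db < floorDb) ? floorDb : db;
          }

          int main()
          {
              std::printf("%.1f dBFS\n", toDbfs(1.0f));   //  0.0 dBFS (full scale)
              std::printf("%.1f dBFS\n", toDbfs(0.5f));   // -6.0 dBFS (half scale)
              std::printf("%.1f dBFS\n", toDbfs(0.0f));   // clamped to -96.0 dBFS
              return 0;
          }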

rtavakko | #4

            @Kent-Dorfman Thanks a lot for your response! Yes, the writeData method gives me a buffer of data whose format is already determined by the QAudioFormat of the device supplying it (including unsigned/signed).

            I understand the need to convert to a logarithmic scale, and I do this at a later stage when I take the FFT of the data, but I need a reference amplitude for that conversion and I'm not sure what to use.

            One thing I noticed is that you can set the data type (unsigned/signed) on the QAudioFormat, but whether it actually takes effect depends on whether the device supports that setting. I'll try experimenting with that to see if it does something useful.

rtavakko | #5

              Any more thoughts on this, guys? I'm still stuck on this.

SGaist (Lifetime Qt Champion) | #6

                Hi,

                You might want to check the DSPFilters library. It might offer what you need.


rtavakko | #7

                  @SGaist Thanks for that link! The library looks cool; I'm going through the source to see how they did things, but I'm trying to build my own little audio engine.
                  Do you know by any chance if there is a set standard for how an audio signal is supposed to be processed, as in the midpoint, min and max levels?

SGaist (Lifetime Qt Champion) | #8

                    @rtavakko said in Normalizing PCM Audio:

                    Do you know by any chance if there is a set standard for how an audio signal is supposed to be processed, as in the midpoint, min and max levels?

                    I am sorry, but I am not sure I understand exactly what you are looking for.


rtavakko | #9

                      @SGaist The issue I'm stuck on is that I'm not sure where a 'silent' audio signal represented as signed integer data should sit.

                      For example, 16-bit WAV files are in the range -32,768 to 32,767 with 0 as the midpoint, but the 8-bit signal that I get from a microphone or other live feeds is in the -128 to 127 range with -128 being the midpoint (silent signal).

                      So I'm stuck trying to find a universal way to normalize audio to the 0-1 range with 0.5 as the midpoint. Everything I've read so far suggests that 0 should always be the midpoint for signed audio, but I don't know for sure at this point.
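
                      As a quick worked example of that mapping (using the (sample - min) / (max - min) formula from the first post and assuming a -128..127 range): a signed 8-bit sample of 0 maps to (0 - (-128)) / (127 - (-128)) ≈ 0.502, essentially the 0.5 midpoint, while a constant -128 maps to 0.0, which is the off-center behaviour being described here.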

Kent-Dorfman | #10

                        @rtavakko

                        For PCM audio, a silent signal is represented by a continuous stream of values that are all the same. It is the changes in amplitude that form the sound waveform. You can have silence at any output amplitude if the sample values don't change, and obviously any changes to those values create a waveform. So you cannot look for silence in the way you are thinking.

                        If you use 8 kHz as your carrier and create a u16 stream of shorts such as

                        16384,0,16384,0,16384,0... then you will get a loud 8 kHz (harsh) tone.

                        1000,0,1000,0,1000,0... gives the same harsh 8 kHz tone, but at a greatly diminished volume.

                        Any stream of x,x,x,x,x,x,x... will create silence.

                        Download and play with Audacity, and programmatically create audio files to experiment with different effects: sine, square, and sawtooth waveforms of different amplitudes.

                        EDIT - Actually, I screwed up. If the sample rate is 8 kHz, then you can only reproduce frequencies up to 4 kHz, since it's the change that forms the wave, not the data points themselves.
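
                        Following the suggestion above to create test audio programmatically, here is a minimal sketch that fills a buffer with signed 16-bit PCM samples of a sine tone; the 8000 Hz sample rate, 440 Hz frequency, and half-scale amplitude are just example values, and writing the buffer to a file or device is left out:

                        #include <cmath>
                        #include <cstddef>
                        #include <cstdint>
                        #include <vector>

                        // Generate one second of a sine tone as signed 16-bit PCM samples.
                        // With an 8000 Hz sample rate, only tones below 4000 Hz (Nyquist) can be represented.
                        std::vector<int16_t> makeSineTone(int sampleRate = 8000,
                                                          double frequencyHz = 440.0,
                                                          double amplitude = 0.5 /* fraction of full scale */)
                        {
                            const double pi = 3.14159265358979323846;
                            std::vector<int16_t> samples(static_cast<std::size_t>(sampleRate));
                            for (std::size_t i = 0; i < samples.size(); ++i) {
                                double t = static_cast<double>(i) / sampleRate;
                                double value = amplitude * std::sin(2.0 * pi * frequencyHz * t);
                                samples[i] = static_cast<int16_t>(value * 32767.0); // scale to the 16-bit range
                            }
                            return samples;
                        }

                        int main()
                        {
                            // 440 Hz test tone at half amplitude.
                            std::vector<int16_t> tone = makeSineTone();
                            return tone.empty() ? 1 : 0;
                        }

                        Replacing std::sin with a constant produces the flat stream of identical samples described above, i.e. silence.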

rtavakko | #11

                          @Kent-Dorfman I understand the concept, but I'm still not sure how I would go about normalizing the signal. I'm still processing the signal as it comes in, as an instantaneous set of values. Do I need to compare each value in the array to the previous one and set it to the lower limit of a dBFS scale if they are equal?

rtavakko | #12

                            Still trying to figure this out. Any thoughts on converting to the right scale (a log scale seems appropriate) would help.

rtavakko | #13

                              After a few months of searching for a definitive answer to this, I've reached the conclusion that my original assumption was correct. This page describes how to determine the midpoint of a standard PCM audio signal:

                              https://gist.github.com/endolith/e8597a58bcd11a6462f33fa8eb75c43d

                              For example, an 8-bit signed PCM signal has these ranges:

                              Min: -128
                              Max: 127
                              Midpoint: 0

                              As to why the signal I'm getting from my sound card sits at -128 when there is no sound, I'm going to assume that this is a driver problem, or that this particular piece of hardware does not follow the PCM standard.
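
                              One way to sanity-check this, sketched below under the assumption of signed 8-bit capture (the helper name estimateDcOffset is made up), is to average a buffer recorded while the input is silent: a device that follows the usual convention should average near 0, while a value near -128 would confirm the offset described above.

                              #include <cstdio>
                              #include <cstdint>
                              #include <vector>

                              // Estimate the baseline (DC offset) of a captured buffer of signed 8-bit PCM
                              // samples by averaging them. Silence recorded from a conventional device
                              // should average out near 0.
                              double estimateDcOffset(const std::vector<int8_t> &samples)
                              {
                                  if (samples.empty())
                                      return 0.0;
                                  double sum = 0.0;
                                  for (int8_t s : samples)
                                      sum += s;
                                  return sum / static_cast<double>(samples.size());
                              }

                              int main()
                              {
                                  // Example: a "silent" capture that sits at -128 instead of 0.
                                  std::vector<int8_t> silent(1024, -128);
                                  std::printf("baseline = %.1f\n", estimateDcOffset(silent)); // -128.0
                                  return 0;
                              }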

                              Converting to a logarithmic scale is, in my understanding, not related to this issue: you should be able to normalize the signal in the time domain, even though you will most likely need to convert it to a log scale eventually if you are doing anything in the frequency domain (e.g. an FFT).

                              If anyone has any input, please feel free to add it.
