Normalizing PCM Audio

Tags: qaudioinput, audio engine, audio waveform, qiodevice
      Kent-Dorfman wrote on 12 Aug 2020, 01:13 (#3)

      Ok, so it looks like you're asking about a complete sound sample and not "on the fly", which is good, because you really cannot normalize sound "on the fly". Keep in mind that sound energy is non-linear: it propagates in 3 dimensions, so the dB scale is logarithmic, not linear. The mathematical midpoint depends on the PCM format you are using (signed vs. unsigned), but as I mentioned, 128 as the midpoint of unsigned [0..255] is not an auditory midpoint. You should probably map your midpoint based on a logarithmic scale in the available range, and be careful about signed conversions. I never use signed data to represent PCM data because the electronics of the sound card are always at some positive voltage level.
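
For a complete recording, one common approach is peak normalization: find the largest absolute sample in the whole buffer, then scale everything by a single gain. The sketch below assumes the samples have already been converted to floating point and centred on 0.0; `peakNormalize` and `targetPeak` are hypothetical names, not part of Qt.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: scales a complete buffer of float samples (assumed to
// be centred on 0.0) so that its largest absolute value becomes targetPeak.
std::vector<float> peakNormalize(const std::vector<float>& samples,
                                 float targetPeak = 1.0f)
{
    assert(targetPeak > 0.0f);

    // Pass 1: find the peak of the whole recording.
    float peak = 0.0f;
    for (float s : samples)
        peak = std::max(peak, std::fabs(s));

    // An all-zero buffer has no peak to scale from; return it unchanged.
    if (peak == 0.0f)
        return samples;

    // Pass 2: apply one gain factor to every sample.
    const float gain = targetPeak / peak;
    std::vector<float> out;
    out.reserve(samples.size());
    for (float s : samples)
        out.push_back(s * gain);
    return out;
}
```

Two passes are needed because the gain depends on the loudest sample anywhere in the recording, which is why this only works on a complete buffer.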

        rtavakko wrote on 12 Aug 2020, 11:20 (#4)

        @Kent-Dorfman Thanks a lot for your response! Yes, the writeData method gives me a buffer of data which has a format already determined by the QAudioFormat of the device supplying it (including unsigned / signed).

        I understand the need to convert to a logarithmic scale and do this at a later stage when I take the FFT of the data but I need a reference amplitude for that conversion and I'm not sure what I need to use for that.

        One thing I noticed is that you can set the data type (unsigned / signed) of the QAudioFormat, but whether it actually takes effect depends on whether the device supports that setting. I'll try messing around with that to see if it does anything useful.
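
On the reference amplitude question: a common convention in digital audio is dBFS, where the reference is the full-scale value of the sample format, so the largest representable magnitude maps to 0 dB and everything else is negative. A minimal sketch, assuming signed 16-bit samples centred on 0; `sampleToDbfs` and the -96 dB floor are illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: level of one signed 16-bit sample relative to full
// scale (32768), in dBFS. A floor is needed because log10(0) is undefined.
double sampleToDbfs(std::int16_t sample, double floorDb = -96.0)
{
    assert(floorDb < 0.0);

    const double fullScale = 32768.0;
    const double magnitude = std::fabs(static_cast<double>(sample)) / fullScale;

    if (magnitude <= 0.0)
        return floorDb;
    return std::max(floorDb, 20.0 * std::log10(magnitude));
}
```

With this convention a sample at half of full scale (16384) comes out at roughly -6 dBFS.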

          rtavakko wrote on 26 Aug 2020, 00:22 (#5)

          Any more thoughts on this, guys? I'm stuck on this.

            SGaist (Lifetime Qt Champion) wrote on 26 Aug 2020, 18:50 (#6)

            Hi,

            You might want to check out DSPfilters. It might offer what you need.

            Interested in AI ? www.idiap.ch
            Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

              rtavakko wrote on 6 Sept 2020, 21:02 (#7)

              @SGaist Thanks for that link! The library looks cool. I'm going through the source to see how they did things, but I'm trying to build my own little audio engine.
              Do you know by any chance if there is a set standard for how an audio signal is supposed to be processed, as in the midpoint, min and max levels?

                SGaist (Lifetime Qt Champion) wrote on 7 Sept 2020, 18:42 (#8)

                @rtavakko said in Normalizing PCM Audio:

                Do you know by any chance if there is a set standard for how an audio signal is supposed to be processed, as in the midpoint, min and max levels?

                I am sorry, but I am not sure I understand exactly what you are looking for.

                  rtavakko wrote on 8 Sept 2020, 00:49, last edited 8 Sept 2020, 00:51 (#9)

                  @SGaist The issue I'm stuck on is that I'm not sure where a 'silent' audio signal represented as signed int data should sit.

                  For example, 16-bit WAV files are in the range -32,768 to 32,767 with 0 as the midpoint, but the 8-bit signal that I get from a microphone or other live feeds is in the -128 to 127 range, with -128 appearing to be the midpoint (silent signal).

                  So I'm stuck trying to find a universal way to normalize audio to the 0 to 1 range with 0.5 being the midpoint. Anything I've read so far suggests that 0 should always be the midpoint for signed audio, but I don't know for sure at this point.
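
The 0 to 1 mapping with 0.5 as the midpoint can be sketched as below, assuming the usual conventions that signed PCM is centred on 0 and unsigned PCM on half of its range. `normalizeSigned` and `normalizeUnsigned` are hypothetical helpers, and the sample is passed as a plain `int` for simplicity.

```cpp
#include <cassert>

// Hypothetical helpers: map a raw PCM sample to [0, 1) with silence at 0.5.
// 'bits' is the sample size of the format (8, 16, 24).

// Signed PCM: range -2^(bits-1) .. 2^(bits-1)-1, midpoint 0.
double normalizeSigned(int sample, int bits)
{
    assert(bits >= 2 && bits <= 24);
    const double half = static_cast<double>(1L << (bits - 1));
    return (sample + half) / (2.0 * half);
}

// Unsigned PCM: range 0 .. 2^bits-1, midpoint 2^(bits-1).
double normalizeUnsigned(int sample, int bits)
{
    assert(bits >= 2 && bits <= 24);
    const double half = static_cast<double>(1L << (bits - 1));
    return sample / (2.0 * half);
}
```

Dividing by 2^bits rather than by the maximum value keeps the midpoint at exactly 0.5; the cost is that the largest sample lands just below 1.0 instead of on it.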

                    Kent-Dorfman wrote on 8 Sept 2020, 03:15, last edited 8 Sept 2020, 05:34 (#10)

                    @rtavakko

                    For PCM audio, a silent signal is represented by a continuous stream of identical values. It is the changes in amplitude that form the sound waveform. You can have silence at any output amplitude if the sample values don't change. Obviously, any changes to those values create a waveform. So you cannot look for silence in the way you are thinking.

                    If you use 8 kHz as your carrier and create a u16 stream of shorts such as

                    16384,0,16384,0,16384,0... then you will get a loud (harsh) 8 kHz tone.

                    1000,0,1000,0,1000,0... gives the same harsh 8 kHz tone, but at a greatly diminished volume.

                    Any stream of x,x,x,x,x,x,x... will create silence.

                    Download and play with Audacity, and programmatically create audio files to experiment with different effects: sine, square, and sawtooth waveforms of different amplitudes.

                    EDIT - actually I screwed up. If the sample rate is 8 kHz, then you can only reproduce frequencies up to 4 kHz, since it's the change that forms the wave, not the data points themselves.
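
The three example streams can be compared by their peak-to-peak swing, which is what determines loudness here, independent of the absolute level. A small sketch with hypothetical helpers `alternating` and `peakToPeak`; note that a stream alternating between two values repeats every two samples, so at an 8 kHz sample rate it is a 4 kHz tone, consistent with the EDIT above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the stream high,0,high,0,... with 'count' samples.
std::vector<std::uint16_t> alternating(std::uint16_t high, std::size_t count)
{
    std::vector<std::uint16_t> out(count);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = (i % 2 == 0) ? high : 0;
    return out;
}

// Hypothetical helper: peak-to-peak swing of a block of unsigned 16-bit
// samples. A constant stream has a swing of 0 (silence) whatever its level.
int peakToPeak(const std::vector<std::uint16_t>& block)
{
    assert(!block.empty());
    const auto range = std::minmax_element(block.begin(), block.end());
    return static_cast<int>(*range.second) - static_cast<int>(*range.first);
}
```

The first stream swings by 16384, the second by 1000, and any constant stream by 0, whatever value it is stuck at.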

                      rtavakko wrote on 20 Sept 2020, 18:38 (#11)

                      @Kent-Dorfman I understand the concept but I'm still not sure how I would go about normalizing the signal. I'm still processing the signal as it comes in as an instantaneous set of values. Do I need to compare each value in the array to the previous one and set it to the lower limit of a dBFS scale if they are equal?
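
Rather than comparing each value to the previous one, one option is to measure each incoming block as a whole: subtract the block's mean (its DC offset), take the RMS of what remains, and convert that to dBFS with a floor. This is only a sketch, assuming the samples have already been converted to floating point in [-1, 1]; `blockLevelDbfs` and the -96 dB floor are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: RMS level of one incoming block in dBFS.
// Assumes samples are floats in [-1, 1]; floorDb is an arbitrary choice.
double blockLevelDbfs(const std::vector<double>& block, double floorDb = -96.0)
{
    assert(!block.empty());
    const double n = static_cast<double>(block.size());

    // Remove the DC offset so a constant (silent) block measures as zero.
    double mean = 0.0;
    for (double s : block)
        mean += s;
    mean /= n;

    double sumSquares = 0.0;
    for (double s : block) {
        const double ac = s - mean;
        sumSquares += ac * ac;
    }
    const double rms = std::sqrt(sumSquares / n);

    // Clamp to the floor: log10(0) is undefined, and very small values
    // would otherwise give extremely negative results.
    if (rms <= 0.0)
        return floorDb;
    return std::max(floorDb, 20.0 * std::log10(rms));
}
```

A block of identical values then lands on the floor regardless of its absolute level, which matches the "any stream of x,x,x" description of silence.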

                        rtavakko wrote on 27 Sept 2020, 20:01 (#12)

                        Still trying to figure this out. Any thoughts on converting to the right scale (a log scale seems appropriate) would help.

                          rtavakko wrote on 28 Dec 2020, 20:09 (#13)

                          After a few months of searching for a definitive answer to this topic, I've reached the conclusion that my original assumption was correct. This page describes how to determine the midpoint of a standard PCM audio signal:

                          https://gist.github.com/endolith/e8597a58bcd11a6462f33fa8eb75c43d

                          For example an 8-bit signed PCM signal has these ranges:

                          Min: -128
                          Max: 127
                          Midpoint: 0

                          As to why the signal I'm getting from my soundcard sits at -128 when there is no sound, I'm going to assume that this is related to a driver problem or could be that this particular piece of hardware does not follow the PCM standard.
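
One other possibility, offered as an assumption and not as a diagnosis of this particular device: 8-bit PCM is commonly delivered as unsigned data with silence at 128 (this is the convention in WAV files), and the byte 0x80 reads as 128 when treated as unsigned but as -128 when the same byte is treated as signed. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads one raw byte as an unsigned 8-bit sample (0..255).
int asUnsigned(unsigned char raw)
{
    return static_cast<int>(raw);
}

// Reads the same raw byte as a signed 8-bit sample (-128..127).
int asSigned(unsigned char raw)
{
    std::int8_t s;
    std::memcpy(&s, &raw, sizeof s);  // reinterpret the bit pattern
    return static_cast<int>(s);
}

// Hypothetical helper: converts an unsigned 8-bit sample value (silence at
// 128) to the signed convention (silence at 0).
int recenterUnsigned8(int value)
{
    assert(value >= 0 && value <= 255);
    return value - 128;
}
```

If the buffer turns out to be unsigned, subtracting 128 from each byte before any further processing puts silence back at 0.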

                          Converting to the logarithmic scale in my understanding is not related to this issue because you should be able to normalize the signal in time-domain even though eventually you will most likely need to convert it to the log scale if you are doing anything in the frequency domain (e.g. FFT).

                          If anyone has any input, please feel free to add it.
