[Libav-user] Handling of 24 bit audio in libav* and libswresample

Discussion:

Hendrik Schreiber

2013-06-04 11:29:38 UTC

Hello everybody:

I'm working on a little (Java) library for decoding audio using FFmpeg/libav* and have some questions regarding the handling of 24 bit audio.

1. (SHIFTING) When decoding, 24bit audio is apparently shifted, i.e. 24bit samples become 32bit, as there is no 24bit AVSampleFormat. Am I right to assume that the data is shifted toward the most significant byte, i.e. the most significant 3 bytes are the same as the original 24bit?
Or is the most significant byte simply "sign-extended" and the three least significant bytes hold the original 24bit?

2. (SWRESAMPLE) I'm using libswresample to, well, resample data, get rid of planar formats etc. It's working great. libswresample also accepts AVSampleFormat parameters for input and output format. This implies that it does not support any conversion to true 24bit, represented by 3 bytes. Correct?

3. (CODEC) What is the recommended way to produce 24bit audio? After decoding (and potentially resampling), should I use the corresponding codec (e.g. AV_CODEC_ID_PCM_S24LE) to produce the data in the format I'm interested in? Or is there another, better way?

Thanks in advance.

-hendrik

Paul B Mahol

2013-06-04 11:34:43 UTC

Post by Hendrik Schreiber
I'm working on a little (java) library for decoding audio using
FFmpeg/libav* and have some questions regarding the handling of 24 bit
audio.
1. (SHIFTING) When decoding, 24bit audio is apparently shifted, i.e. 24bit
become 32bit, as there is no 24bit AVSampleFormat. Am I right to assume that
the data is shifted toward the most significant byte? I.e. the most
significant 3 bytes are the same as the original 24bit?
Or is the most significant byte simply "sign-extended" and the three least
significant bytes are the original 24bit?
2. (SWRESAMPLE) I'm using libswresample to, well, resample data, get rid of
planar formats etc. It's working great. libswresample also accepts
AVSampleFormat parameters for input and output format. This implies that it
does not support any conversion to true 24bit, represented by 3 bytes.
Correct?
3. (CODEC) What is the recommended way to produce 24bit audio? After decoding
(and potentially resampling), should I use the corresponding codec (e.g.
AV_CODEC_ID_PCM_S24LE) to produce the data in the format I'm interested in?
Or is there another, better way?

There should be dithering applied, see output_sample_bits option.

Post by Hendrik Schreiber
Thanks in advance.
-hendrik
_______________________________________________
Libav-user mailing list
http://ffmpeg.org/mailman/listinfo/libav-user

Hendrik Schreiber

2013-06-07 10:40:17 UTC

Post by Paul B Mahol

Post by Hendrik Schreiber
1. (SHIFTING) When decoding, 24bit audio is apparently shifted, i.e. 24bit
become 32bit, as there is no 24bit AVSampleFormat. Am I right to assume that
the data is shifted toward the most significant byte? I.e. the most
significant 3 bytes are the same as the original 24bit?
Or is the most significant byte simply "sign-extended" and the three least
significant bytes are the original 24bit?

The first statement is true.

I did some tests, and all libav does is shift the data toward the most significant byte, i.e. the least significant byte is 0. This means that one has to apply dithering *if* one wants to use this 4-byte representation for anything other than extracting the most significant three bytes.

If one just wants to dump 3 bytes for each 24bit sample, one simply has to cut off the extra byte that was added (e.g. by encoding with AV_CODEC_ID_PCM_S24LE, see below). No dithering is necessary.
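To make the layout concrete, here is a minimal standalone C sketch (plain C, not FFmpeg code) of the relationship described above, assuming two's complement integers: a 24bit sample sits left-justified in an AV_SAMPLE_FMT_S32 slot with a zero low byte, and packing it as 3-byte little endian just keeps the top 3 bytes. The function names are hypothetical, and this only approximates what the AV_CODEC_ID_PCM_S24LE encoder does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a signed 24bit value left-justified in an
   int32_t, i.e. in the most significant 3 bytes, with the low byte 0. */
static int32_t s24_to_s32(int32_t s24)
{
    assert(s24 >= -8388608 && s24 <= 8388607);
    return (int32_t)((uint32_t)s24 << 8);
}

/* Hypothetical helper: write n S32 samples as packed 3-byte little-endian
   samples by keeping only the most significant 3 bytes of each one. */
static void pack_s32_to_s24le(const int32_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t u = (uint32_t)src[i];
        dst[3 * i + 0] = (uint8_t)(u >> 8);   /* low byte of the 24bit value */
        dst[3 * i + 1] = (uint8_t)(u >> 16);
        dst[3 * i + 2] = (uint8_t)(u >> 24);  /* high byte, carries the sign */
    }
}

/* Hypothetical inverse: read packed 3-byte little-endian samples back into
   left-justified S32 samples (low byte 0), as described for the decoder. */
static void unpack_s24le_to_s32(const uint8_t *src, int32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t u = ((uint32_t)src[3 * i + 0] << 8)
                   | ((uint32_t)src[3 * i + 1] << 16)
                   | ((uint32_t)src[3 * i + 2] << 24);
        dst[i] = (int32_t)u;
    }
}
```

For example, the 24bit value 0x123456 becomes 0x12345600 as S32 and is packed as the bytes 56 34 12; the value -1 becomes -256 (0xFFFFFF00) and is packed as ff ff ff. Unpacking gives back exactly the same S32 values, which is why no dithering is needed for this round trip.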

Post by Paul B Mahol

Post by Hendrik Schreiber
2. (SWRESAMPLE) I'm using libswresample to, well, resample data, get rid of
planar formats etc. It's working great. libswresample also accepts
AVSampleFormat parameters for input and output format. This implies that it
does not support any conversion to true 24bit, represented by 3 bytes.
Correct?

Yes. I fiddled with it some more. swresample does not support any true 24bit (i.e. 3 bytes per sample) output. It works strictly on the intermediate data formats defined in AVSampleFormat.

Post by Paul B Mahol

Post by Hendrik Schreiber
3. (CODEC) What is the recommend way to produce 24bit audio? After decoding
(and potentially resampling), should I use the corresponding codec (e.g.
AV_CODEC_ID_PCM_S24LE) to produce the data in the format I'm interested in?
Or is there another, better way?

There should be dithering applied, see output_sample_bits option.

I guess there is no other way (except perhaps for filtering). It turns out that to produce 24bit audio in a true 24bit format, one has to use an appropriate encoder, e.g. AV_CODEC_ID_PCM_S24LE for signed 24bit little endian.

Dithering is only necessary when the data is modified somewhere in between (e.g. when changing the sample rate while it's in 32bit format), because the code in pcm.c (macro ENCODE) simply shifts the 32bit representation right by 8 bits, essentially dropping the last 8 bits.
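Whether that plain 8-bit shift is lossless can be checked directly: it is lossless exactly when the low byte of every S32 sample is zero. A hypothetical sketch in plain C (the function names are made up for illustration; this is not FFmpeg code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: true if every S32 sample has a zero least significant
   byte, so dropping the low 8 bits (as described for pcm.c's ENCODE macro)
   loses nothing and no dithering is needed. */
static bool low_byte_is_zero(const int32_t *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (((uint32_t)samples[i] & 0xFFu) != 0)
            return false;
    }
    return true;
}

/* Plain truncation to 24 bits: floor(x / 256), the same result as an
   arithmetic right shift by 8, computed without shifting a negative value. */
static int32_t truncate_to_s24(int32_t x)
{
    int64_t low = (int64_t)((uint32_t)x & 0xFFu);  /* the bits that get dropped */
    return (int32_t)(((int64_t)x - low) / 256);
}
```

Both truncate_to_s24(0x12345600) and truncate_to_s24(0x123456FF) give 0x123456. In the second case the 0xFF in the low byte is silently discarded, which is the situation (e.g. after resampling) where dithering matters.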

Since I got the answers to all my questions, I figured I might as well post them. Hope it's useful to someone else.

Cheers,

-hendrik

Paul B Mahol

2013-06-07 11:12:02 UTC

Post by Hendrik Schreiber

Post by Paul B Mahol

The first statement is true.
I did some tests, and all libav does is shift the data toward the most
significant byte, i.e. the least significant byte is 0. This means that one
has to apply dithering *if* one wants to use this 4-byte representation
for anything other than extracting the most significant three bytes.
If one just wants to dump 3 bytes for each 24bit sample, one simply has to
cut off the extra byte that was added (e.g. by encoding with
AV_CODEC_ID_PCM_S24LE, see below). No dithering is necessary.

Post by Paul B Mahol

Yes. I fiddled with it some more. swresample does not support any true 24bit
(i.e. 3 bytes per sample) output. It works strictly on the intermediate
data formats defined in AVSampleFormat.

Post by Paul B Mahol

There should be dithering applied, see output_sample_bits option.

Note that dithering should be applied in the 32bit to 24bit case
when the source audio actually uses more than 24 bits.
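When the S32 data does use more than 24 bits (e.g. after resampling), a minimal sketch of TPDF (triangular) dithering before dropping the low byte might look like this. It is a generic technique written in plain C, in the same spirit as the triangular dither method mentioned later in the thread, but not libswresample's actual code; the xorshift PRNG, the noise amplitude (two uniform terms of ±1/2 LSB each) and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical xorshift32 PRNG; the state must start nonzero and stays nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Floor division by 256 that does not rely on right-shifting negative values. */
static int64_t floor_div256(int64_t v)
{
    int64_t q = v / 256;
    if (v % 256 != 0 && v < 0)
        q -= 1;
    return q;
}

/* Requantize one S32 sample to a 24bit value with TPDF dither: add the sum of
   two uniform values in [-128, 127] (about +-1/2 of a 24bit LSB, which is 256
   in S32 units), round to nearest, then clamp to the signed 24bit range. */
static int32_t dither_s32_to_s24(int32_t x, uint32_t *state)
{
    assert(state != NULL && *state != 0);
    int64_t n1 = (int64_t)(xorshift32(state) & 0xFFu) - 128;
    int64_t n2 = (int64_t)(xorshift32(state) & 0xFFu) - 128;
    int64_t q = floor_div256((int64_t)x + n1 + n2 + 128);  /* +128: round to nearest */
    if (q > 8388607)
        q = 8388607;
    if (q < -8388608)
        q = -8388608;
    return (int32_t)q;
}
```

Each dithered result lies within one 24bit step of the plainly rounded value; the point of the triangular noise is to turn the truncation error into benign, signal-independent noise rather than to preserve every sample exactly. The returned 24bit value would still need to be shifted left by 8 before being handed to an S32-based encoder.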

Post by Hendrik Schreiber
Dithering is only necessary when the data is modified somewhere in between
(e.g. when changing the sample rate while it's in 32bit format), because the code in
pcm.c (macro ENCODE) simply shifts the 32bit representation right by 8 bits,
essentially dropping the last 8 bits.

Because the last 8 bits are always zero in that particular case.

Post by Hendrik Schreiber
Since I got the answers to all my questions, I figured I might as well
post them. Hope it's useful to someone else.
Cheers,
-hendrik

Hendrik Schreiber

2013-06-11 08:50:07 UTC

Post by Paul B Mahol
Note that dithering should be applied in the 32bit to 24bit case
when the source audio actually uses more than 24 bits.

Yes - definitely.

Post by Paul B Mahol

Because the last 8 bits are always zero in that particular case.

Exactly.

Thanks, Paul, for adding the extra clarifications for the more general case.

I saw that you added some more documentation for output_sample_bits (http://patches.libav.org/patch/38964/).
Unfortunately, I'm still not entirely sure how to use the parameter - a simple example would be great.

Let's say I want to convert 32bit audio to 24bit.
The input AVSampleFormat is AV_SAMPLE_FMT_S32, the output sample format as well.
But, when writing the result to a file, I want to use AV_CODEC_ID_PCM_S24LE, i.e. the least significant byte is cut off.

Therefore, for SWR, I'm calling:

av_opt_set_int(swr_context, "dither_method", SWR_DITHER_TRIANGULAR, 0);
av_opt_set_int(swr_context, "output_sample_bits", ???, 0);

What should I set ??? to?
24, since I'm only using 24bit?

Thanks,

-hendrik

Carl Eugen Hoyos

2013-06-11 09:31:30 UTC

Post by Hendrik Schreiber
But, when writing the result to a file, I want to use
AV_CODEC_ID_PCM_S24LE, i.e. the least significant
byte is cut off.

AV_CODEC_ID_PCM_S24LE encoder only accepts
AV_SAMPLE_FMT_S32 as input.

Carl Eugen

Hendrik Schreiber

2013-06-11 09:40:41 UTC

Post by Carl Eugen Hoyos

Post by Hendrik Schreiber
But, when writing the result to a file, I want to use
AV_CODEC_ID_PCM_S24LE, i.e. the least significant
byte is cut off.

AV_CODEC_ID_PCM_S24LE encoder only accepts
AV_SAMPLE_FMT_S32 as input.

The example I gave provides AV_SAMPLE_FMT_S32 as input, so that should not be a problem.

-hendrik

Paul B Mahol

2013-06-11 10:26:46 UTC

Post by Hendrik Schreiber

Post by Paul B Mahol
Note that dithering should be applied in the 32bit to 24bit case
when the source audio actually uses more than 24 bits.

Yes - definitely.

Post by Paul B Mahol

Because the last 8 bits are always zero in that particular case.

Exactly.
Thanks, Paul, for adding the extra clarifications for the more general case.
I saw that you added some more documentation for output_sample_bits
(http://patches.libav.org/patch/38964/).
Unfortunately, I'm still not entirely sure how to use the parameter - a
simple example would be great.
Let's say I want to convert 32bit audio to 24bit.
The input AVSampleFormat is AV_SAMPLE_FMT_S32, the output sample format as well.
But, when writing the result to a file, I want to use AV_CODEC_ID_PCM_S24LE,
i.e. the least significant byte is cut off.
av_opt_set_int(swr_context, "dither_method", SWR_DITHER_TRIANGULAR, 0);
av_opt_set_int(swr_context, "output_sample_bits", ???, 0);
What should I set ??? to?
24, since I'm only using 24bit?

Dunno, but if it does not work, try setting dither_scale too.

If it doesn't work, feel free to open a bug ticket and/or bump this thread.

Post by Hendrik Schreiber
Thanks,
-hendrik

Robert Krüger

2013-06-07 11:46:31 UTC

Post by Hendrik Schreiber

Post by Paul B Mahol

The first statement is true.
I did some tests, and all libav does is shift the data toward the most significant byte, i.e. the least significant byte is 0. This means that one has to apply dithering *if* one wants to use this 4-byte representation for anything other than extracting the most significant three bytes.
If one just wants to dump 3 bytes for each 24bit sample, one simply has to cut off the extra byte that was added (e.g. by encoding with AV_CODEC_ID_PCM_S24LE, see below). No dithering is necessary.

Post by Paul B Mahol

Yes. I fiddled with it some more. swresample does not support any true 24bit (i.e. 3 bytes per sample) output. It works strictly on the intermediate data formats defined in AVSampleFormat.

Post by Paul B Mahol

There should be dithering applied, see output_sample_bits option.

I guess there is no other way (except perhaps for filtering). It turns out that to produce 24bit audio in a true 24bit format, one has to use an appropriate encoder, e.g. AV_CODEC_ID_PCM_S24LE for signed 24bit little endian.
Dithering is only necessary when the data is modified somewhere in between (e.g. when changing the sample rate while it's in 32bit format), because the code in pcm.c (macro ENCODE) simply shifts the 32bit representation right by 8 bits, essentially dropping the last 8 bits.
Since I got the answers to all my questions, I figured I might as well post them. Hope it's useful to someone else.

Much appreciated. Thank you!