[Libav-user] How to get raw audio frames from an audio file

Discussion:

Matthieu Regnauld

2018-10-20 10:30:23 UTC

Hello,

I am trying to extract raw audio frames from an audio file, using the following
example: http://ffmpeg.org/doxygen/3.4/decode_audio_8c-example.html

As far as I understand, I get these frames in the decode() function, in the
frame->data array.

That said, I need to convert the frames into floats between -1 and 1. Here
is what I tried (in my test, dataSize == 4):

// here is where I want to get my frame:
float myFrame;

// first try:
memcpy(&myFrame, &frame->data[ch][i], dataSize * sizeof(uint8_t));

// second try:
myFrame = (frame->data[ch][i]<<0) | (frame->data[ch][i + 1]<<8) |
(frame->data[ch][i + 2]<<16) | (frame->data[ch][i + 3]<<24);

// third try:
myFrame = (frame->data[ch][i + 3]<<0) | (frame->data[ch][i + 2]<<8) |
(frame->data[ch][i + 1]<<16) | (frame->data[ch][i]<<24);

But the problem is that it doesn't work: all I get so far is some kind of white
noise.

So what is the proper way to extract audio frames and convert them to float
amplitudes?

Thanks for your help.

Finalspace

2018-10-20 14:11:51 UTC

Hi,

the example you are following just writes the frame out in raw format,
without doing any kind of sample rate or format conversion.

To convert a sample you need to check against the sample format
(dec_ctx->sample_fmt) and implement the right conversion.
For example if your audio stream uses a sample format of S16 (Signed
16-bit integer), converting that to float is really easy (Just divide by
max int16):

   int sampleStride = av_get_bytes_per_sample(dec_ctx->sample_fmt);
   for (int sampleIndex = 0; sampleIndex < frame->nb_samples; ++sampleIndex) {
      for (int channel = 0; channel < dec_ctx->channels; ++channel) {
         // For signed 16-bit integer to float conversion.
         // S16 is packed: all channels are interleaved in frame->data[0].
         if (dec_ctx->sample_fmt == AV_SAMPLE_FMT_S16) {
            int16_t *inputPtr = (int16_t *)(frame->data[0] +
               sampleStride * (sampleIndex * dec_ctx->channels + channel));
            int16_t inputSample = *inputPtr;
            float outputSample;
            if (inputSample < 0) {
               // -32768 maps to exactly -1
               outputSample = inputSample / (float)(INT16_MAX + 1);
            } else {
               outputSample = inputSample / (float)INT16_MAX;
            }
         }
      }
   }
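
To try the S16 case outside of FFmpeg, here is a minimal, self-contained sketch of the same idea. It assumes packed (interleaved) signed 16-bit input, and the name s16_to_float is made up for illustration; it is not a libav function. Unlike a two-branch version, it uses a single divisor of 32768 so that the whole int16_t range lands in [-1, 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg function): converts `count` signed
 * 16-bit samples to floats in [-1, 1). For packed audio, `count` is the
 * total number of samples, i.e. nb_samples * channels. */
static void s16_to_float(const int16_t *input, float *output, size_t count)
{
    assert(input != NULL && output != NULL);
    for (size_t i = 0; i < count; ++i) {
        /* 32768 = -INT16_MIN, so -32768 maps to exactly -1.0f. */
        output[i] = (float)input[i] / 32768.0f;
    }
}
```

With this mapping, 16384 becomes 0.5 and 32767 becomes slightly less than 1.0.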

Hope this helps.

Greetings,
Final

_______________________________________________
Libav-user mailing list
http://ffmpeg.org/mailman/listinfo/libav-user

Carl Eugen Hoyos

2018-10-20 18:33:26 UTC

Post by Finalspace
the example you are following just writes the frame out in raw format,
without doing any kind of sample rate or format conversion.
To convert a sample you need to check against the sample format
(dec_ctx->sample_fmt) and implement the right conversion.
For example if your audio stream uses a sample format of S16 (Signed
16-bit integer), converting that to float is really easy (Just divide by
max int16):

Note that libswresample should be able to do this conversion
significantly faster.
Also note that (just like the default decoder for a given file is not
part of the API) decoders in different versions of FFmpeg may
output different sample formats, the sample formats have
changed in the past (without much information for you) and may
change in the future: You always have to check the sample format
of the decoder in your code!
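
As a rough illustration of checking the format before converting, the sketch below switches on a sample format and converts a single sample to float. The enum SampleFormat and the function sample_to_float are stand-ins invented for this example so that it compiles without FFmpeg; in real code you would switch on dec_ctx->sample_fmt (enum AVSampleFormat), or let libswresample do the conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for enum AVSampleFormat; these names are hypothetical. */
enum SampleFormat { FMT_U8, FMT_S16, FMT_S32, FMT_FLT };

/* Reads one sample in native byte order from `bytes` and returns it as a
 * float in roughly [-1, 1]. Returns 0 and clears *ok for unknown formats. */
static float sample_to_float(enum SampleFormat fmt, const uint8_t *bytes, int *ok)
{
    assert(bytes != NULL && ok != NULL);
    *ok = 1;
    switch (fmt) {
    case FMT_U8:  /* unsigned 8-bit: silence is 128 */
        return ((int)bytes[0] - 128) / 128.0f;
    case FMT_S16: {
        int16_t v;
        memcpy(&v, bytes, sizeof v);
        return v / 32768.0f;
    }
    case FMT_S32: {
        int32_t v;
        memcpy(&v, bytes, sizeof v);
        return (float)(v / 2147483648.0);
    }
    case FMT_FLT: {  /* already float: the only case where copying 4 bytes is enough */
        float v;
        memcpy(&v, bytes, sizeof v);
        return v;
    }
    default:
        *ok = 0;
        return 0.0f;
    }
}
```

Copying four raw bytes into a float only gives a meaningful amplitude in the float case; for the integer formats the bytes have to be read as an integer first and then scaled.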

Carl Eugen

Matthieu Regnauld

2018-10-21 10:37:02 UTC

Thanks a lot for your help.

That said, I still haven't managed to solve my problem, and I'm new to
FFmpeg.

My final goal is to be able to extract raw audio frames from an OGG file on
the fly (and, if possible, from any other format), and convert them into an
array of float amplitudes (between -1 and 1).

Here is a copy of the code that I'm trying to make work:
https://gist.github.com/mregnauld/2538d98308ad57eb75cfcd36aab5099a

I initiate my player this way:
FFMpegPlayer* ffMpegPlayer = new FFMpegPlayer();
ffMpegPlayer->createFFmpeg("/path/to/my/file.ogg");

And later on, when I need audio samples, I do it this way, and I redirect the
buffer directly to the audio output (it's for an Android app):
float *buffer;
ffMpegPlayer->getPcmFloat(buffer);

I still have white noise so far, but I can faintly hear the music
(actually, I guess at it more than I hear it), which makes me think that
I'm close to the solution.

What should I change in my code to get the proper float amplitudes?

Thanks for your help.

Finalspace

2018-10-21 13:42:57 UTC

The only thing you need to change is to divide by INT16_MAX, not
INT32_MAX, because in your software conversion context you use S16 as
your target format. This should give you a -1 to 1 range from a -32767 to
+32767 amplitude. Therefore your variable "myFrame" should be int16_t
instead.

Also the memcpy part is wrong:
memcpy(&myFrame, &localBuffer[ch * i], dataSize * sizeof(uint8_t));

Correct is:
// sizeof(uint8_t) is always 1, so multiplying by it is unnecessary
memcpy(&myFrame, &localBuffer[i * ch * dataSize], dataSize);
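
For reference, here is a small sketch of how the byte offset of one sample in a packed (interleaved) buffer can be computed. It assumes the usual layout: sample 0 of every channel, then sample 1 of every channel, and so on. The helpers interleaved_offset and read_s16 are hypothetical names, and channel_count means the number of channels, not a channel index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of (sample_index, channel) in an interleaved buffer. */
static size_t interleaved_offset(size_t sample_index, size_t channel,
                                 size_t channel_count, size_t bytes_per_sample)
{
    assert(channel < channel_count);
    return (sample_index * channel_count + channel) * bytes_per_sample;
}

/* Reads one native-endian int16_t sample from an interleaved byte buffer. */
static int16_t read_s16(const uint8_t *buffer, size_t sample_index,
                        size_t channel, size_t channel_count)
{
    int16_t value;
    memcpy(&value, buffer + interleaved_offset(sample_index, channel,
                                               channel_count, sizeof value),
           sizeof value);
    return value;
}
```

For stereo 16-bit audio, the right channel of the third sample (index 2) starts at byte (2 * 2 + 1) * 2 = 10.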

Also here are a few tips:

1.) Do not use global/static variables outside of a class for no
reason. Just move the variables out of the global space into class
members. If they need to be static, use static members instead.

2.) Short variable names are bad - use long names; it does not hurt to
write "sampleIndex" instead of "i". This way others, and probably you too,
can understand the code much better.

3.) Fix bad naming of variables:

myFrame -> conversionSampleS16
dataSize -> sourceSampleSize
localBuffer -> conversionBuffer
buffer -> targetFloatBuffer

In addition it does not hurt to add the type to the variable names as
well - especially when writing audio code.

4.) You have no idea if the buffer passed to getPcmFloat() fits the
samples you wanna write. It would be better to pass a size_t bufferLength
as well.
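
Tip 4 could look roughly like the sketch below. The function name get_pcm_float and its parameters are invented for illustration (the real getPcmFloat() lives in the poster's gist and may differ); the point is only that the callee never writes more than bufferLength floats and reports how many it actually wrote.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant: copies at most `bufferLength` floats from
 * `source` into `targetFloatBuffer` and returns the number written. */
static size_t get_pcm_float(float *targetFloatBuffer, size_t bufferLength,
                            const float *source, size_t sourceLength)
{
    assert(targetFloatBuffer != NULL && source != NULL);
    size_t count = sourceLength < bufferLength ? sourceLength : bufferLength;
    memcpy(targetFloatBuffer, source, count * sizeof(float));
    return count;
}
```

The caller can then compare the return value with what it asked for and request the remaining samples on the next call.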

Matthieu Regnauld

2018-10-21 20:44:20 UTC

Thank you so much for your help, it works much better, I can clearly hear
the music now!

Also, about your tips, it's just for the example, to make the code easier
to read (but I agree with you).

That said, I made some changes in the code, and I still have a few more
questions:

1 - Even if the sound is much better, it's unfortunately still bad: on the
left channel, there is only crackling, while on the right channel, the
sound is much better, but still with some crackling too, even if the sound
doesn't saturate. Also, the sound is low-pitched (only if I use
AV_SAMPLE_FMT_FLT for swr_alloc_set_opts()). Where could that come from?

2 - My sound is encoded at 44100 Hz, while my device expects sound at 48000
Hz. I think that there is a command in FFmpeg that allows upscaling (not
sure about the term), i.e. providing 48000 frames per second from a file
encoded at 44100 Hz, for example. How can I achieve that?

3 - *"You have no idea if the buffer passed to getPcmFloat() fits the
samples you wanna write. It would be better to pass a size_t bufferLength as
well."*: I agree, and again, I provided this code to make it easy to
understand, but I found a workaround for that. That said, if it's possible
to ask FFmpeg to extract a specific number of frames, I'm interested. Is it
possible, and if yes, how?

Thanks again for your help.

Matthieu Regnauld

2018-10-23 15:59:35 UTC

Thank you for your help.

That said, I'm new to FFmpeg, and I still struggle.
Could you tell me what I have to fix in my code to have a clear sound and
also, if possible, how to resample it (from 44100 Hz to 48000 Hz, for
example)?

Here is my code:
https://gist.github.com/mregnauld/2538d98308ad57eb75cfcd36aab5099a

Thank you.

Gonzalo Garramuño

2018-10-23 17:32:32 UTC

Post by Matthieu Regnauld
Thank you for your help.
That said, I'm new to FFmpeg, and I still struggle.
Could you tell me what I have to fix in my code to have a clear sound
and also, if possible, how to resample it (from 44100 Hz to 48000 Hz,
for example)?
https://gist.github.com/mregnauld/2538d98308ad57eb75cfcd36aab5099a
Thank you.

First, you need to set swrContext to NULL at the beginning, like:

AVFrame *frame = NULL; SwrContext *swrContext = NULL;

Then, here you should set the output frequency:

int out_sample_rate = 48000;
// swr_alloc_set_opts() returns the context, so keep the return value
swrContext = swr_alloc_set_opts(swrContext,
            AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_FLT, out_sample_rate,
            codecContext->channel_layout, codecContext->sample_fmt,
            codecContext->sample_rate, 0, NULL);
// the context must be initialized before calling swr_convert()
swr_init(swrContext);

Then, you should not need to memcpy anything as that's what swr_convert
should do for you. However, you might want to set your buffer bigger
than two channels as avcodec_receive_frame might return multiple frames
of sound. By doing that, you won't have to worry about overrunning the
buffer. Also, you might want to use extended_data, for planar formats that
have more than 8 channels.
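
To picture what the conversion to a packed float buffer involves when the decoder outputs planar audio (one plane per channel, which is what extended_data points at), here is a minimal sketch that interleaves per-channel float planes. It only illustrates the layout change and is not how swr_convert is implemented; interleave_planar_float is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes samples as L0 R0 L1 R1 ... for stereo,
 * reading channel `ch` from planes[ch]. `output` must hold
 * channel_count * nb_samples floats. */
static void interleave_planar_float(const float *const *planes,
                                    size_t channel_count, size_t nb_samples,
                                    float *output)
{
    assert(planes != NULL && output != NULL);
    for (size_t s = 0; s < nb_samples; ++s)
        for (size_t ch = 0; ch < channel_count; ++ch)
            output[s * channel_count + ch] = planes[ch][s];
}
```

Treating planar data as if it were already interleaved (or the other way round) mixes samples from different channels, which tends to sound like crackling or a wrong pitch.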

// Somewhere else
localBuffer = av_malloc( sizeof(float) * out_sample_rate * nb_samples +
padding);

//

swr_convert(swrContext, (uint8_t**)&localBuffer, frame->nb_samples,
(const uint8_t **) frame->extended_data, frame->nb_samples);
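
One detail worth keeping in mind when the rates differ: converting 44100 Hz to 48000 Hz yields more output samples than input samples, so the output buffer and the output count handed to the resampler need room for that. The sketch below only does the arithmetic; resampled_count is a hypothetical helper, and it ignores any samples the resampler may still be holding internally (libswresample exposes swr_get_delay() for that).

```c
#include <assert.h>
#include <stdint.h>

/* Number of output samples per channel that correspond to `in_samples`
 * input samples, rounded up: ceil(in_samples * out_rate / in_rate). */
static int64_t resampled_count(int64_t in_samples, int in_rate, int out_rate)
{
    assert(in_samples >= 0 && in_rate > 0 && out_rate > 0);
    return (in_samples * out_rate + in_rate - 1) / in_rate;
}
```

For a 1024-sample frame at 44100 Hz this gives 1115 samples per channel at 48000 Hz, so a stereo float buffer would need at least 1115 * 2 * sizeof(float) bytes.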

--
Gonzalo Garramuño

Matthieu Regnauld

2018-10-23 21:35:35 UTC

Thank you again for your help!

OK, so I tried to follow the example given in the "Detailed Description" part
of the libswresample documentation:
https://www.ffmpeg.org/doxygen/2.3/group__lswr.html
And I updated my code:
https://gist.github.com/mregnauld/2538d98308ad57eb75cfcd36aab5099a

As a reminder, in my example, I try to resample and extract audio samples
from an audio file (44100 Hz, 2 channels) to have them in the following
format: 48000 Hz, 2 channels.

But unfortunately, I still can't get it to work: the sound is a bit
faster than its regular speed, and still with quite a lot of crackling.

What is wrong in my code?

Thanks for your help.

Gonzalo Garramuño

2018-10-23 21:54:40 UTC

Post by Matthieu Regnauld
What is wrong in my code?

How are you playing your sound? Does ffmpeg (the command-line utility)
show a similar behavior?

--
Gonzalo Garramuño

Matthieu Regnauld

2018-10-23 21:59:57 UTC

My sound is played in an Android app (using Android NDK).
I compiled FFmpeg for Android to make the code work.

I haven't tried the command-line utility yet.

Matthieu Regnauld

2018-10-23 22:21:17 UTC

Also, here is the actual class:
https://github.com/mregnauld/AudioPlayerDemo/blob/master/app/src/main/cpp/AudioPlayer.cpp

You'll also find the complete sample Android project I'm working on.

Gonzalo Garramuño

2018-10-24 00:53:15 UTC

Post by Matthieu Regnauld
https://github.com/mregnauld/AudioPlayerDemo/blob/master/app/src/main/cpp/AudioPlayer.cpp
You'll also find the complete sample Android project I'm working on.

Dear Matthieu,

I seem to have sent you astray with more complex code than you should
have to deal with. In the doc/examples/ directory of ffmpeg you will find a
file called transcode_aac.c, which is a full transcoder from any input
format to AAC. It is very well documented and has very clean code (kudos to
whoever wrote it). Try compiling it and see if you can make it export pcm by
frame extension (not likely) or by modifying the source.

--
Gonzalo Garramuño

Tuukka Pasanen

2018-10-24 11:06:34 UTC

Hello,

My tutorial is a bit rusty and rough, but it shows how to encode and
resample with avresample or swresample to anything.

https://github.com/illuusio/ffmpeg-example

I haven't tested it much with FFmpeg 4.0, but it should work.

Sincerely,

Tuukka
