durani01 wrote:"Consider an application with no jitter buffer... packets would be sent by user A to user B. User B would play the audio as soon as it arrives. However, if there is congestion in the network the audio will cut out for a moment while endpoint B waits for a new audio packet. Thus, you have a buffer underrun problem. When that packet does arrive, it would probably be too late to play it given the assumption one is running on a fixed clock."
Meaning that after the fixed clock duration has passed, late-arriving audio packets will be dropped?
When playing audio, there is a clock. The clock is always running and the media layer in a VoIP system must ensure that there is audio playing at all times. If the media layer plays audio packets immediately upon receipt and without introducing some delay to accommodate network jitter, then audio will drop out. Anytime you don't have audio to play, that's a problem.
durani01 wrote:"Suppose a new packet experiences 15ms more delay than the previous packet. It does not matter: the packet is put into the queue and will be played when it's time"
Meaning the last packet was received with, e.g., a 20ms delay and the very next packet with a 35ms delay. Why does it not matter?
What I meant was that it does not matter if a packet is late if you have a good jitter buffer, because it will be placed in the buffer and played at the right time. With AMR-WB, I think, all packets represent 20ms of audio. (If that's not true, feel free to correct me.)
So, let's assume you have this buffer:
[0][1][2]
Each position in the buffer represents 20ms of audio, so we have 60ms of audio buffered. We then start playing the audio in [0]. While that is playing, new packets [3] and [4] arrive, so the buffer contains:
[1][2][3][4]
We pulled [0] out since it's being played. Immediately when done, we start playing [1]. So, the buffer looks like this:
[2][3][4]
Now, suppose there is network congestion and things are backed up for 30 ms. We finish playing [1] and start playing [2]. So, the buffer has this:
[3][4]
We no longer have 60ms of audio buffered, only 40ms buffered.
If there is severe congestion, perhaps there might be even more delay and we start playing [3]. The buffer might have a single frame inside:
[4]
At some point, the late-arriving packets that should have arrived already will show up and be placed into the buffer:
[4][5][6]
So, what I meant was that it does not matter if packets are a little late. They will be inserted into the buffer once they arrive. Now, we're back to a 60ms buffer.
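The fill-and-drain behavior described above can be sketched in a few lines of Python. This is a toy model of the example, not a real implementation; the 20ms frame duration and the frame numbers are taken directly from the walkthrough:

```python
from collections import deque

FRAME_MS = 20  # each AMR-WB packet carries 20ms of audio (per the example)

buffer = deque([0, 1, 2])      # [0][1][2]: 60ms of audio buffered
playing = buffer.popleft()     # start playing [0]
buffer.extend([3, 4])          # [3] and [4] arrive while [0] plays
assert list(buffer) == [1, 2, 3, 4]

playing = buffer.popleft()     # done with [0], start playing [1]
# network congestion: nothing new arrives for ~30ms
playing = buffer.popleft()     # finish [1], start playing [2]
print(len(buffer) * FRAME_MS)  # 40 -- only [3][4] remain buffered
```

The key point the sketch shows: the playout side drains the buffer at a steady 20ms cadence regardless of when packets arrive, so arrival jitter only changes the buffer depth, not the playback timing.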
If a packet is lost entirely *or* if the packet arrives so late that it should have been played already, those packets must be discarded and the audio needs to be smoothed from one 20ms frame to another. This technique is called "packet loss concealment" (PLC). That's a whole other topic, though. Still, you might want to explore concealment techniques for AMR-WB.
durani01 wrote:"If packets arrive out of order, no big deal: they are ordered as they are inserted into the jitter buffer. (RTP defines a sequence number to aid with this.)"
Meaning that before entering the jitter buffer, the packets are re-ordered using the RTP sequence numbers?
Correct. As with the example above where [5] and [6] arrived late, they could have arrived in the order [6] [5]. These need to be re-ordered before inserting them into the jitter buffer.
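One wrinkle worth knowing: RTP sequence numbers are 16-bit and wrap around, so you cannot compare them with a plain `<`. A wraparound-safe comparison (the modular-arithmetic approach suggested by RFC 3550; the function name here is just illustrative) might look like:

```python
import functools

def seq_before(a: int, b: int) -> bool:
    """True if 16-bit RTP sequence number a comes before b,
    treating the numbers as modular, so 65535 comes before 0."""
    return ((b - a) & 0xFFFF) != 0 and ((b - a) & 0xFFFF) < 0x8000

# Packets [6] and [5] arrived out of order; sort before inserting
# them into the jitter buffer.
arrivals = [6, 5]
ordered = sorted(arrivals, key=functools.cmp_to_key(
    lambda a, b: -1 if seq_before(a, b) else 1))
print(ordered)  # [5, 6]
```

Without the wraparound handling, a stream crossing the 65535 → 0 boundary would be mis-ordered even on a perfect network.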
durani01 wrote:"Suppose you assume that a "safe" jitter buffer size is 40ms"
Meaning two 20ms packets can be accommodated in a 40ms buffer, or that a packet delayed by up to 40ms can be accommodated?
This is the number of 20ms frames you feel you need to buffer in order to ensure that you do not run out of packets to play.
durani01 wrote:"the objective is to not run out of audio packets to play."
Meaning packets that arrive too late cannot be played?
Right. If the packets come too late and you did not introduce local delay via the jitter buffer, then late-arriving packets cannot be played and will just be discarded. It's better to introduce a little delay than to throw away good audio.
durani01 wrote:Questions:
Q 1. If the late packets are not going to be played, how will we accommodate the empty space?
Try to maintain a buffer long enough to handle late packets. However, if a packet is significantly delayed such that it is too late to play the packet, then you have to discard it. If this is a statistical anomaly, you don't worry about it. If this happens frequently, it suggests that you need to increase your buffer size.
If you end up where you have no audio to play, you need to employ some kind of PLC algorithm. Or, you have silence for a period of less than 20ms.
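Just to make "some kind of PLC algorithm" concrete: one very naive concealment strategy is to replay the last good frame, attenuated so the repetition is less audible. This is purely an illustration, not AMR-WB's actual built-in error concealment, which is defined in the decoder itself and is far more sophisticated:

```python
def conceal(last_good_frame, attenuation=0.5):
    """Very naive PLC: repeat the previous frame, scaled down to
    soften the audible 'stutter' of a straight repeat. Real codecs
    such as AMR-WB do much better inside the decoder."""
    return [int(sample * attenuation) for sample in last_good_frame]

frame = [1000, -800, 600]  # hypothetical PCM samples
print(conceal(frame))      # [500, -400, 300]
```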
durani01 wrote:Q 2. How will B know after how much time the first packet (or any later packet) will be played?
Every packet will be played sequentially, with each being 20ms from the next. So, B knows when a packet should be played. B should create a buffer that can buffer enough audio.
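Put another way, the playout time of each packet falls straight out of its sequence number. A sketch, where the 40ms initial buffering delay is just the example figure from earlier and the function name is illustrative:

```python
FRAME_MS = 20
BUFFER_DELAY_MS = 40  # chosen jitter-buffer depth (example value)

def playout_time_ms(first_arrival_ms, seq, first_seq):
    """When packet `seq` should start playing: the arrival time of
    the first packet, plus the buffering delay, plus one 20ms frame
    per intervening sequence number."""
    return first_arrival_ms + BUFFER_DELAY_MS + (seq - first_seq) * FRAME_MS

print(playout_time_ms(0, 3, 0))  # 100: 40ms of buffering + 3 frames of 20ms
```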
durani01 wrote:Q 3. My assignment: to find the best approach for growing and shrinking this buffer such that you do not experience a buffer underrun. Does this mean using different buffer algorithms, e.g., Exponential Average, Fast Exponential Average, Minimum Delay, Spike Detection, Window, and Gap-Based, and comparing the results?
Should I limit myself to the above-mentioned algorithms?
The approach you take is your own. Using AMR-WB, you also have the opportunity to reduce the bit-rate from 23.85Kbps down to 6.6Kbps, I think. So, an element of your work should probably be to consider not only resizing the jitter buffer, but reducing the transmission rate. Would it be better to get a 6.6Kbps flow with 40ms of delay and no packet loss or a 23.85Kbps flow with 120ms of delay and 3% packet loss?
These are the kinds of things you have to consider. I would personally try to take a relatively simple approach that would look at the inter-packet delay. Perhaps measure the delay between each packet, maintaining a history of the last 10 packets or so. On a perfect, non-existent network (where there is 0 delay), the delay between each packet should be a constant 20ms. If you see 21ms or even 25ms, that's no big deal. If you see 60ms, that suggests you need a longer jitter buffer. If the delay varies considerably, like 30ms, 60ms, 90ms, 45ms, etc., that really suggests you need a sizable jitter buffer. However, if you happen to see just one single packet with a much larger delay (e.g., 400ms), I would consider that a statistical outlier and would not adjust the buffer for that.
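One way to put that into code: keep a short history of inter-arrival delays, discard one-off outliers, and feed the deviations into an exponential average (the same style of estimator RTP itself uses for its jitter statistic in RFC 3550). The class name and thresholds below are illustrative, not recommendations:

```python
from collections import deque

FRAME_MS = 20
OUTLIER_MS = 400  # a single huge delay is treated as a statistical outlier

class JitterEstimator:
    def __init__(self, history_len=10):
        self.history = deque(maxlen=history_len)  # last N inter-arrival delays
        self.jitter = 0.0                         # smoothed deviation estimate

    def observe(self, inter_arrival_ms):
        deviation = abs(inter_arrival_ms - FRAME_MS)
        if deviation > OUTLIER_MS:
            return self.jitter                    # ignore one-off spikes
        self.history.append(inter_arrival_ms)
        # RFC 3550-style exponential average: J += (|D| - J) / 16
        self.jitter += (deviation - self.jitter) / 16.0
        return self.jitter

est = JitterEstimator()
for d in [30, 60, 90, 45]:  # highly variable inter-arrival delays
    est.observe(d)
print(round(est.jitter, 1))  # a rising estimate suggests a larger buffer
```

The jitter buffer depth could then be sized as some multiple of this estimate, growing when the estimate rises and shrinking slowly when it falls.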
In any case, the mechanism is for you to come up with. I can only provide you some thoughts, but I cannot provide the best solution.
durani01 wrote:Please find the attached proposal. This is the one I wrote before our conversations. Kindly review it and suggest any necessary corrections, if possible.
Waiting for your swift and positive reply.
Regards
I really don't have time to read through that in detail. Perhaps others might provide you with feedback.
Paul