Digital Sound Processing - DSP (up-encoding analysis), page 3

TomMix 3daywarning on October 18th, 2007 / post 22056

Ziggy wrote:
i have exams all of this week, but as soon as i have time i want to read every word on here... (im studying this in school!!!! i love it)

so what?!

SpasV V.I.P. on January 13th, 2008 / post 23545

XM radio again?

Is there something new I could say?
Not very much and maybe not important for most of you.
But …
First of all let me say a couple of introductory words about a signal spectrum for those who are not familiar with it.
Sound is a perception caused by a sound wave - a wave of pressure that spreads through the air. (Sound waves spread through other media too, not only through the air.)
So, there are two points of view on sound:
1. One is the point of view of human perception – I am not going into this topic at all.
2. The second is the point of view of physics, mathematics, and engineering. This is the one I am interested in.
The sound wave can generate a signal which can be treated as a function. In digital sound we are dealing with discrete functions. Discrete functions are sequences of samples taken from continuous functions (analog signals) at equally spaced moments of time. A continuous function can be perfectly reconstructed from its samples under some conditions that are usually met (the signal must contain no frequencies at or above half the sampling rate).
Sound is very complex, and Spectrum Analysis is used to study it. The Spectrum Analysis of discrete functions is based on the FFT (Fast Fourier Transform), a method for fast and efficient calculation of the Fourier Transform of discrete functions. The Fourier Transform is named in honor of the French mathematician Fourier, who first proposed, maybe around 200 years ago, the idea that a complex function can be considered as a sum of simpler functions. The Fourier Transform is unique and reversible: a discrete function has exactly one Fourier Transform, and from the Fourier Transform of a function you can calculate the function back. There are many such transforms in use. One of them is the MDCT (Modified Discrete Cosine Transform), which is used to compress discrete functions, and sound in particular.
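The reversibility mentioned above can be checked numerically. What follows is only a minimal sketch: it uses a direct (slow) DFT instead of an FFT, and the names `dft` and `idft` as well as the sample values are made up for the illustration.

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse transform: recovers the samples from the spectrum."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

samples = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]  # arbitrary example values
spectrum = dft(samples)
recovered = [v.real for v in idft(spectrum)]

# The round trip reproduces the samples up to floating point rounding.
max_error = max(abs(a - b) for a, b in zip(samples, recovered))
print(max_error)
```

An FFT computes exactly the same numbers as `dft`, only much faster for long sequences.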

Here is a spectrum of a sound. The sound is track #11 from Tiesto’s album ISOS 5. I use this track because its source is reliable. In other words, if a sound has a similar spectrum it should be considered good.

The spectrum is shown in logarithmic (the first one) and linear (the second one) scale.
Of course if we are interested in more details we could have them like this:

Now about XM radio broadcast.
What is known about it?
XM uses AAC+ v2 as its encoding standard.
What does it mean? It means AAC (Advanced Audio Coding), adopted in MPEG-4 along with the m4a audio format, and SBR (Spectral Band Replication).
The m4a sound format, and accordingly AAC, is already known. What is SBR? SBR is a patented method for sound reconstruction using information about its spectrum.

Knowing these and something more about DSP (Digital Signal Processing) let us look at the sound spectrum of an XM broadcast.

Looking at the first spectrum we could say it is similar to the one above – of a CD track sound.
Looking at the second spectrum – it is not so similar. The essential differences are in the middle frequency range: 5,000 Hz – 15,000 Hz and in the high frequency range: 15,000 Hz – 22,050 Hz.

Let us look at them closer.

What is seen first is the slope in the middle frequency range. It is very important because if it were almost flat, as it is in the ISOS spectrum, the high frequency range would be better also.
Let me say it. What can be done to make the sound better is just that – equalizing the spectrum. It is not a trivial task though. So, I will leave this topic aside for a while.
Second, there are drops in the spectrum at 5.5 kHz, at 11.0 kHz, and at 15.0 kHz. They are not natural and my understanding of them is:
The drop at 15.0 kHz is maybe due to SBR. In other words, the frequency range 20 Hz – 15 kHz of the signal is encoded using the traditional approach (AAC encoding) while the high frequency range 15.0 kHz – 22.05 kHz is encoded using the SBR approach. That is, the signal is first split in two frequency ranges: 20 Hz - 15 kHz and 15 kHz – 22.05 kHz.

Then …
Then the range 20 Hz – 15 kHz is split in three ranges: 20 Hz - 5.5 kHz, 5.5 kHz – 11 kHz, and 11 kHz – 15 kHz. The reason for this understanding of mine is the two drops in the spectrum at 5.5 kHz and at 11 kHz. These drops are the result of imperfect signal reconstruction when using a so-called QMF (Quadrature Mirror Filter) bank, a method for transferring data through a communication channel. The method allows the bit rate needed to transfer the data to be reduced by exploiting the decrease of the spectrum amplitude with frequency. To get more advantage from the method the spectrum is shaped with a slope of about 20 dB/decade in the range 5.5 kHz – 15.0 kHz (this slope is missing in a CD track’s spectrum).
In other words, this part of the spectrum is shaped to have the mentioned slope and then split in three parts. Every part is then quantized using AAC encoding, and because information is lost in that step the signal spectrum has these drops when it is reconstructed at the receiving side.

There was a question (TomMix has asked): what is the encoding bit rate? It is impossible for me to determine it at this time.
Because there are three sub bands in the range 20 Hz – 15 kHz to encode, almost anything is possible and I do not have the means to evaluate that. I have the reconstructed signal only, and I cannot find reliable differences that distinguish a signal with a spectrum bandwidth of 15 kHz encoded and decoded @65 kbps from one encoded and decoded @400 kbps using an AAC encoder. So, I think it is possible for the 20 Hz - 5.5 kHz band to be encoded @32-64 kbps, for the 5.5 kHz – 11 kHz band to be encoded @96 kbps – 128 kbps, and for the 11 kHz – 15 kHz band to be encoded @128 kbps – 192 kbps or something more suitable.

But what I can do is to equalize the signal spectrum.
Without going into details, I do the equalizing in 6 – 9 steps using Sony Sound Forge.
Because the procedure is time consuming I am going to write a filter (a computer program) to do the equalizing automatically.
The result of equalizing is shown next.

The first and the third are the ISOS 5 track’s spectra and the second and the fourth are the equalized XM radio’s spectra.
As can be seen, the signal spectrum of the XM radio broadcast is now very close to the spectrum of a CD track. It is almost as good. Maybe I’ll try to shape it better but I am not thinking of correcting the two drops at all.
Because of this spectrum I have for the XM tunes I encode my uploads as VBR m4a @320 kbps. Actually I use quality factor of 0.8 which results in bit rate around 320 kbps. The maximum quality factor of 1.0 encodes a CD track @400 kbps.

The spectrum shown here is from ASOT Year Mix 2007 - a mix I have uploaded at Megaupload.com.

Skype:spas.velev

SpasV V.I.P. on January 19th, 2008 / post 23644

Here the three spectra are on the same plot, so it is easier to compare them.


TomMix 3daywarning on May 31st, 2008 / post 25684

unbelievable spasv, im just speechless

ur a brave one and i will do my best to get this done.

my goal was to prove that it is useless to provide a mp4 audio file with a larger size than
a mp3 file of the very same source and it is known that the source is a lossy encoded
audio stream sending only near cd quality.
and ur still trying to say that the shitty x_m stream is worth the blown up file size,
sticking on some nice spectras and a dead french guy.

that proof was produced within that topic! read above.

it is also self-explanatory that a new advanced codec is able to provide smaller files of the same source at the same quality.

so wtf is it with ur spectras?!

and a hint at last: the encoding bitrate is the filesize in bits divided by the duration in seconds ...
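As a sketch of that hint, the average bit rate can be computed like this. The function name and the example numbers are only for illustration and are not taken from any actual file; container overhead and tags are ignored.

```python
def average_bitrate_kbps(file_size_bytes, duration_seconds):
    """Average bit rate in kb/s: size in bits divided by duration in seconds."""
    return file_size_bytes * 8 / duration_seconds / 1000

# Hypothetical example: a 144,000,000 byte file lasting one hour.
example_kbps = average_bitrate_kbps(144_000_000, 3600)
print(example_kbps)  # 320.0
```

Note that this gives the bit rate of the file itself and says nothing about the bit rate of the stream the file was made from.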

SpasV V.I.P. on January 25th, 2009 / post 28522

I need to say something about how I process my XM radio rips.
I am going to use power spectra so, a couple of words about them.

Sound as a physical phenomenon is a wave which propagates through some medium - for example air.
With a device like a microphone the sound wave can generate an electrical signal, and the signal can be viewed as a mathematical function of a time variable.
We work with discrete signals. Their mathematical representation is a discrete function, which means a function whose values belong to a finite set of numbers and which is defined at discrete times. This way a sound of finite duration can be represented by a finite set of numbers and can be processed by a computer.

Sound is a complex function and Spectrum Analysis can be used to simplify its study. The common methods are based on the Fourier Transform (FT), whose result is a mathematically reversible transform of the studied function. There are different FTs to use when working with discrete functions: the Discrete Time Fourier Transform (DTFT), the Discrete Fourier Transform (DFT), and the Time Dependent Fourier Transform (TDFT), also called the Short Time Fourier Transform (STFT). Fast Fourier Transforms (FFT) are algorithms for efficient calculation of the DFT.

I will use the TDFT, which calculates an approximation of the DTFT, but I will call it FT.
The DTFT is a complex function (it has real and imaginary parts) and can be represented by its Modulus (magnitude) and Phase functions. I will not use the Phase function because the method I use to process the sound does not change it.

A couple of words about the Power Spectrum (the Modulus function of the FT).
I will consider a digital sound as defined by Sony/Philips specification of the CD format:
The sound is represented by its samples taken at a rate of 44,100 samples/sec and the samples are 16-bit quantities. One bit is used for the sign (to represent the positive and negative half-periods of a periodic function) and 15 bits for the amplitude. So, the maximum sample amplitude is 32767.

The FT allows a function to be considered as if composed of an infinite sum of simple (sinusoidal) functions with different frequencies (an infinite number of them). The frequency range of the component functions is 0 - 22.05 kHz, which follows from the sampling frequency of 44.1 kHz. The Power Spectrum is a function of the frequency which shows the distribution of sound power vs. frequency. In other words, it is a signal power density.

Here is an example of the Power Spectrum of a digital sound. How to understand it?

The horizontal axis is the frequency axis. As can be seen, the frequency range is 20 Hz – 22,050 Hz (22.05 kHz), which corresponds to the lowest tonal frequency a human perceives and to the maximum tonal frequency that can be stored on a CD. The vertical axis is the Power Spectrum axis. The measurement units are decibels - dB.
A variable X measured in dB is calculated as 20*log10(X/Base) where the Base is the maximum value that can be present on the axis. In this case Base = 32767 and the maximum value on the axis is 0 dB. Another interesting value is -104 dB which is the power density of the quantization error. In other words, for a signal component to be considered present in the Power Spectrum it should be greater than -104 dB.
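The dB conversion described above can be sketched in a few lines. This only shows the conversion of a single amplitude relative to Base = 32767; it does not derive the -104 dB noise density figure, and the function name `to_db` is made up for the example.

```python
import math

BASE = 32767  # maximum 16-bit sample amplitude, as stated above

def to_db(x, base=BASE):
    """Level of x relative to base in decibels: 20*log10(x/base)."""
    return 20 * math.log10(x / base)

full_scale_db = to_db(32767)  # exactly 0 dB
half_scale_db = to_db(16384)  # about -6 dB
one_step_db = to_db(1)        # about -90.3 dB, the smallest nonzero amplitude

print(full_scale_db, round(half_scale_db, 2), round(one_step_db, 2))
```

Every halving of the amplitude lowers the level by about 6 dB, which is why each bit of a sample is worth about 6 dB.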

This example shows a Power Spectrum calculated over 60 minutes of sound.
The first graph is a spectrum drawn in logarithmic scale which means the frequency segments of 10 times increase (100 Hz - 1,000 Hz, 1,000 Hz - 10,000 Hz) have the same length. The logarithmic scale is used when the spectrum over the full frequency range needs to be shown clearly.

The second graph is the same spectrum drawn using a linear scale. As it is seen the low frequency range (20 Hz – 800 Hz) is not clearly present.

What can we say about the example?
First of all we cannot say anything about the sound in some specific time interval although every sound sample can be calculated back using the FT.
Still we can make conclusions about the sound:
All possible frequency components up to 22.05 kHz are present in the spectrum although the graph crosses the -104 dB level at 21 kHz. It is not a considerable drawback because with a proper processing it can be changed.
The easiest characteristic to obtain is the sound spectrum bandwidth, and it is a very representative characteristic because, other conditions being equal, the wider the bandwidth the higher the sound quality.
The spectrum bandwidth has no strict definition. It depends on what signal power level is significant compared to the noise (or quantization noise) level. In this particular example the bandwidth can be assumed to be 21,050 Hz - 20 Hz ≈ 21 kHz, or 20 kHz, or even 19 kHz, depending on how you want to evaluate the high frequency components. If you want their influence on the sound to be stressed you need to use the lower estimate. Let us assume this spectrum has a bandwidth of 20 kHz, although if the signal is to be processed the full spectrum of 22 kHz should be considered as long as all frequencies are present. So, this sound contains all audible frequency components (20 Hz – 20 kHz).
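How the bandwidth estimate depends on the chosen significance level can be illustrated with a small sketch. The spectrum points below are made up for the example (they are not measured data), and `bandwidth_hz` is a hypothetical helper.

```python
def bandwidth_hz(spectrum, threshold_db):
    """Span between the lowest and highest frequencies whose level exceeds the threshold."""
    present = [freq for freq, level in spectrum if level > threshold_db]
    if not present:
        return 0
    return max(present) - min(present)

# Made-up (frequency in Hz, level in dB) points, not measured data.
spectrum = [(20, -45), (1000, -40), (10000, -60), (19000, -85),
            (20000, -95), (21000, -103), (22050, -110)]

wide = bandwidth_hz(spectrum, -104)   # everything above the quantization noise level
narrow = bandwidth_hz(spectrum, -90)  # a stricter significance level
print(wide, narrow)  # 20980 18980
```

The same spectrum gives about 21 kHz or about 19 kHz depending only on the threshold.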

Another interesting characteristic is the sound Dynamic Range, which is the difference between the maximum and minimum power density levels. In this example it is -40 - (-104) = 64 dB. Personally I use the bit rate achieved by the encoder in Variable Bit Rate mode of compression as a measure of dynamic range. The higher the dynamic range the better the sound quality. This particular example is the spectrum of Armin van Buuren’s “State of Trance year 2007 mix - CD1” and it is compressed by neroAACenc_sse in VBR mode @400 kb/s when working with the highest quality requirement. I would like to say this is the best CD I have seen although my experience is not enough to express a reliable opinion.

Without going into more detail I would say, based on my experience, this spectrum is a very good example to follow when producing dance music because:
1) It contains all audible frequency components.
2) It has high dynamic range which allows a loud sound to be generated. Because the loudness depends on the low frequency components it has as a side effect of making less audible the possible spectrum high frequency range changes caused by a sound encoder such as LAME mp3 encoder which I will show in the next post.


SpasV V.I.P. on January 25th, 2009 / post 28523

The “natural” way to have a digital sound is in a Pulse Code Modulated (PCM) format. According to the above (Sony/Philips) specification such a sound is generated by an Analog to Digital Converter (ADC) at a rate of 1,411.2 kb/s (two channels, each at 16 x 44.1 kb/s = 705.6 kb/s, so 2 x 705.6 kb/s). If such a sound is to be transmitted over a digital communication channel it would generate a huge amount of information. That is why methods of reducing it have been developed. These methods, as is known, fall in two categories: lossy and lossless methods.
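The PCM bit rate can be checked with a few lines of arithmetic; a direct computation gives 705.6 kb/s per channel and 1,411.2 kb/s for stereo. The two-hour set length is just an example value.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

per_channel_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000
total_kbps = per_channel_kbps * CHANNELS
print(per_channel_kbps, total_kbps)  # 705.6 1411.2

# Size in bytes of a two-hour (7,200 s) stereo set stored as plain PCM.
two_hour_bytes = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS * 7_200 // 8
print(two_hour_bytes / 1e6, "MB")  # 1270.08 MB
```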

I will mention a couple of words about how an mp3 encoder works.
Basically, the lossy encoders try to decrease the amplitude of the spectral components present in the spectrum and to reduce the dynamic range of the spectrum. Two steps are performed:
1) At the first step a psychoacoustic model is used to shape the spectrum (usually to decrease or to cut off the high frequency range) and to cut off spectrum components that are evaluated as imperceptible because they are weak or masked by louder components. For this purpose the sound is processed in small time frames (about 23 ms for mp3) by filtering through a filter bank consisting of 32 sub bands (the whole frequency range 20 Hz - 20 kHz is divided in 32 sub bands) and these 32 spectra are analyzed.
2) At the second step the result from the previous step is transformed (a Modified Discrete Cosine Transform - MDCT is used). The coefficients of the transform are sorted to find proper scale factors; the coefficients are recalculated according to the scale factors and rounded off (quantized) so that the fewest possible bits are used for their representation.
All these procedures are governed by the quality requirements and by the psychoacoustic model’s estimation of the resulting changes which should stay inaudible.
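The rounding-off in the second step is where information is lost. The following is only a toy sketch of scaling and rounding with one scale factor; it is not the algorithm of LAME or any real encoder, and the coefficient values are made up.

```python
def quantize(coefficients, scale):
    """Divide by a scale factor and round to integers (the lossy step)."""
    return [round(c / scale) for c in coefficients]

def dequantize(codes, scale):
    """Decoder side: multiply the integer codes back by the scale factor."""
    return [q * scale for q in codes]

def worst_error(coefficients, scale):
    restored = dequantize(quantize(coefficients, scale), scale)
    return max(abs(a - b) for a, b in zip(coefficients, restored))

coefficients = [812.4, -97.3, 15.8, -3.2, 0.9]  # made-up transform coefficients

for scale in (1.0, 8.0, 64.0):
    print(scale, quantize(coefficients, scale), worst_error(coefficients, scale))
```

A larger scale factor gives smaller integers (fewer bits) but a larger error, and weak coefficients are rounded to zero, which is the kind of change that shows up in the spectra as missing components.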

By using a spectrum analysis (Time Dependent Fourier Transform) it is possible to find out the changes but it is not a common practice and listening tests are conducted to evaluate them. This is a natural approach because the human sound perceiving system (ears and the brain) is the ultimate quality estimator.

But below you can see the spectra obtained after the example sound (ASOT Year 2007 Mix CD1) has been processed by the LAME 3.98.2 encoder with default parameters and the maximum quality requirement (-q 0) at three different constant bit rates: 320 kb/s, 192 kb/s, and 128 kb/s. The spectrum of a VBR processed sound (at the highest quality requirements, which resulted in a 267 kb/s bit rate) is shown also.
The quantization effects, which also reduce the sound quality, cannot be seen here, but the reduction of the sound bandwidth is clearly seen.

As to the other radio communication channel (the FM radio channel) which is used as a source of our music, “it is not known that any FM broadcasting system significantly exceed the 30 Hz - 15 kHz audio bandwidth levels”. But these channels use “dynamic range compression” which changes the sound spectrum also.


SpasV V.I.P. on January 26th, 2009 / post 28525

Finally, I would like to show the way I process my XM radio rips.
There are two points that can be considered:
1) What is my sound source?
2) How do I process my sound source?
I am going to discuss the second point first. The reason is simple. I know exactly what I do but I don’t know my source and I can only guess and make some estimation which cannot be strict.

First of all I need to note: given that I have ripped a CD sound that was broadcast by the XM radio channel, I am not capable of perfectly reconstructing the original CD sound. What I can do is get an approximation to the original. My approximation, I think, is close in the sense of a spectrum evaluation and is not based on any listening test.

My source is a PCM sound which is a result of satellite receiver decoding of an MPEG-4 AAC stream.
I see sound spectrum drawbacks and I correct them achieving a spectrum which is close to the example (ASOT Year 2007 Mix CD1) spectrum. With such corrections some files can be encoded by neroAACenc encoder at 400 kb/s in VBR mode - the same maximum bit rate it encodes the example sound.

The following spectra are shown on the plot below:
1) The example (ASOT Year 2007 Mix CD1) sound spectrum.
2) The source (ASOT 387 radio show) sound spectrum.
3) The corrected source (of ASOT 387 radio show) sound spectrum.
4) The (Modulus of the) Frequency Response of the filter I use to correct the spectrum. (It is only a qualitative illustration and is the result of filtering white noise.)

I have tuned my filter using two CDs that have been broadcast by the radio and whose broadcasts I have ripped. As a result I have a good spectrum match between the CD spectra and my corrected ripped broadcast spectra, which I have estimated by visual inspection.

I would mention that the results I show for this source are not the best results I have. As can be seen, the high frequency range of the corrected spectrum is above the example spectrum, which I think is not bad at all, but the MPEG-4 compression generates a max bit rate of 395 kb/s for the corrected source. This is because I have tuned the filter on the sound of Paul van Dyk’s show. If I tune the filter so that the corrected spectrum of Armin van Buuren’s show matches the example spectrum, the MPEG-4 compression would generate a bit rate of 400 kb/s.

My filter is a FIR filter with 1291 coefficients and it processes a PCM encoded sound, which means every sample it generates is the result of 1291 multiplications and additions. As there are 44,100 stereo samples per second and 7,200 seconds in a two-hour set, the filtering needs a lot of computing power.
I have implemented the filter in a computer program as a convolution in the frequency domain using FFT, which is the fastest way to perform those operations. The program runs on multiple processors simultaneously. I have a dual core 3.4 GHz processor which processes a two-hour set in about 20 minutes.
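The actual program is not shown in this thread, so the following is only a sketch of the general technique: filtering computed as a product in the frequency domain, checked against direct convolution. A small recursive FFT is written out to keep the example self-contained, and a 3-tap filter stands in for the 1291 coefficients; a real program would use an optimized FFT library and process the sound block by block.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_convolve(signal, taps):
    """Linear convolution via zero padding, FFT, product, inverse FFT."""
    length = len(signal) + len(taps) - 1
    size = 1
    while size < length:
        size *= 2
    a = fft(list(signal) + [0.0] * (size - len(signal)))
    b = fft(list(taps) + [0.0] * (size - len(taps)))
    result = fft([p * q for p, q in zip(a, b)], inverse=True)
    return [v.real / size for v in result[:length]]

def direct_convolve(signal, taps):
    """Reference: one multiplication and addition per tap per sample."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, t in enumerate(taps):
            out[i + j] += s * t
    return out

signal = [1.0, 2.0, -1.0, 0.5, 0.0, 3.0, -2.0]  # made-up samples
taps = [0.25, 0.5, 0.25]                         # stand-in filter coefficients
fast = fft_convolve(signal, taps)
slow = direct_convolve(signal, taps)
max_diff = max(abs(a - b) for a, b in zip(fast, slow))
print(max_diff)

# Multiplications needed for direct filtering of a two-hour stereo set.
direct_multiply_adds = 1291 * 44_100 * 2 * 7_200
print(direct_multiply_adds)  # 819836640000
```

Both methods give the same output; the frequency domain version simply needs far fewer operations when the filter is long.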

There is another interesting topic in such processing and it is related to the way the resulting data, which are floating point numbers, are converted back to 16-bit integers. I have solved the existing problems in the simplest possible way because my source is a decoded MPEG-4 radio stream, after all.
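The post does not say exactly how the conversion is done. One simple possibility, shown here purely as an assumption, is to round each value and clip it to the 16-bit range, with no dithering or noise shaping. The function name `to_int16` is made up.

```python
def to_int16(value):
    """Round a floating point sample and clip it to the 16-bit range."""
    rounded = int(round(value))
    return max(-32768, min(32767, rounded))

converted = [to_int16(v) for v in (0.4, -1234.6, 40000.0, -50000.0)]
print(converted)  # [0, -1235, 32767, -32768]
```

Clipping matters here because a filter that boosts part of the spectrum can push some samples beyond the original range.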

A couple more words about the first point - my estimation of my source (XM radio bit stream).
What follows is an idea. It is not strict enough to serve as a reliable estimation.

As I have already said in my previous posts devoted to the XM radio bit stream it seems to me the whole sound spectrum is split in two at 15 kHz.

The upper frequency range (greater than 15 kHz) is transferred by Spectral Band Replication (SBR), a method which uses less information than the classical methods to encode sound information.
The rest is resampled at 30 kHz and passed through a Quadrature Mirror Filter (QMF) bank with three sub bands. The signals at the output of the analysis filters are encoded as MPEG-4 AAC streams.

So, there are four information channels altogether.
The MPEG-4 AAC encoding is more effective than an mp3 encoding.

Three of the information channels use the more effective three sub band encoding rather than a simple one band encoding.
If I assume each of the three sub band channels has a 64 kb/s bit rate, the full bit rate would be 192 kb/s.
Because the three sub band channel transfers a stream sampled at 30 kHz this means 6.4 bits/sample are used.
If one channel transfers a stream sampled at 44.1 kHz (as an Internet stream) at a bit rate of 192 kb/s it uses 192/44.1 = 4.35 bits/sample. So, this three sub band channel is equivalent to one digital channel transferring a 44.1 kHz sampled bit stream at 6.4 x 44.1 = 282 kb/s. This estimation has to be increased because of the additional SBR information that needs to be transferred also.
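The bits per sample figures can be recomputed directly from the stated rates. The 64 kb/s per sub band channel is the assumption made above, not a measured value; a direct computation gives about 4.35 bits/sample for 192 kb/s at 44.1 kHz and about 282 kb/s for the equivalent 44.1 kHz stream.

```python
def bits_per_sample(bitrate_kbps, sample_rate_khz):
    """Average number of bits spent on one (stereo) sample."""
    return bitrate_kbps / sample_rate_khz

sub_band_total_kbps = 3 * 64  # assumed 64 kb/s per sub band channel

at_30_khz = bits_per_sample(sub_band_total_kbps, 30.0)
at_44_khz = bits_per_sample(sub_band_total_kbps, 44.1)

# Bit rate a 44.1 kHz stream would need to spend the same bits per sample.
equivalent_kbps = at_30_khz * 44.1

print(at_30_khz, round(at_44_khz, 2), round(equivalent_kbps, 1))  # 6.4 4.35 282.2
```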

With my processing the need of higher bit rate increases also.
I use neroAACenc (a version issued in 2007 which preserves the spectrum bandwidth) with a quality factor of 0.8, or 80 % of the maximum, and it generates an MPEG-4 stream at around 300 - 330 kb/s depending on the sound producer.


SpasV V.I.P. on February 1st, 2009 / post 28565

Does a higher bit rate mean better quality? Not necessarily.

I have mentioned a couple of words about the FM radio spectrum.
Now I have a good reason to talk about it again. The reason is that there were three torrents for downloading Carl Cox’s Global 307 show. Two of them were ripped from broadcasts of Kiss100, which is an FM radio station.

Here are four spectra:
One is the CD rip I have used as an example of a good spectrum in my previous posts - Armin van Buuren’s album ASOT Year Mix 2007.
The second one (dark blue) is the corrected rip of Sirius XM Area’s broadcast of Global 307.
The third one - (light blue) 160 kb/s mp3 Kiss100 broadcast rip.
The fourth one - (green) 192 kb/s mp3 Kiss100 (satellite) broadcast rip.

What do you think about the sound quality?

Here is what I think:

My sound spectrum high frequency range is a little bit above the CD spectrum which I think is better than the CD spectrum but it is a little bit below in the low frequency range and the result is, maybe, a worse dynamic range.

What about the other two:
The spectrum bandwidth of an FM broadcast sound is 15 kHz as I have already said.
In general, the green is worse because the spectrum is well below the blue one over almost the whole spectrum range.
The sound is weaker.

There are two more drawbacks with this spectrum.
The obvious one is the peak at 15.6 kHz. It is a pilot tone for the receiver to tune for a stereo broadcast. It should have been filtered out and not be present in the spectrum.

The second one is not obvious but it can be shown this way:

The light blue (spectrum) sound is a 160 kb/s mp3 sampled at 44.1 kHz, which means 160/44.1 = 3.628 b/sample are used on average to code a stereo sample.
What about the green (satellite) sound? It is a 192 kb/s mp3 sampled at 48.0 kHz, which means 192/48 = 4 b/sample are used on average to code a stereo sample.

First of all 4 is 10% more than 3.628 which means that without seeing the spectra one can make a mistake deciding that 192 kb/s mp3 is better than 160 kb/s. Actually they are close and the blue is better.

But there is one more point that is worth considering.
Is there a better mp3 encoding alternative than both present?
(Not going into details I will assume better means the best quality at the lowest bitrate.)
Yes, there is one and it is based on the spectrum bandwidth of the FM radio broadcast.
As we already know, its spectrum bandwidth is 15 kHz. A digital signal sampled at 30 kHz can preserve all the information about the analog signal broadcast by the radio.

So, the mp3 encoder can resample the input file at (the closest) frequency of 32 kHz.
Then, I have encoded my full bandwidth PCM file with the parameters -q 0 -s 32 -V0 --lowpass 15.
The result was a 179 kb/s mp3 encoded sound sampled at 32 kHz. Its average stereo sample uses 179/32 = 5.594 b/sample, which is about 40% more information than with the 192 kb/s encoding.

In this particular case, the 192 kb/s mp3 sound is better neither than the 179 kb/s sound nor even than the 160 kb/s sound.

Note: the --lowpass 15 parameter is needed, at least, to filter out the 15.6 kHz stereo pilot tone.
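The bits per sample comparison of the three mp3 encodings can be recomputed from the stated bit rates and sampling rates. This is only the arithmetic of the comparison; it says nothing about the encoders themselves.

```python
cases = {
    "160 kb/s @ 44.1 kHz": (160, 44.1),
    "192 kb/s @ 48 kHz": (192, 48.0),
    "179 kb/s @ 32 kHz": (179, 32.0),
}

# Average bits spent per stereo sample: bit rate divided by sampling rate.
density = {name: rate / fs for name, (rate, fs) in cases.items()}

reference = density["192 kb/s @ 48 kHz"]
for name, value in density.items():
    relative = 100 * (value / reference - 1)
    print(f"{name}: {value:.3f} b/sample ({relative:+.1f}% vs the 192 kb/s file)")
```

Computed this way, the 32 kHz encoding spends the most bits on each sample even though its bit rate is lower than 192 kb/s.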


mades house addict on March 3rd, 2009 / post 28945

Hey SpasV,

I'll answer you here, as you requested.

If it would be lame, LameTAG would detect it. And EncSpot wasn't the only one audio analyzer which marked it as FhG, for example AudioIdentifier said the same.

I checked some more 128k files and the spectrum analysis looks everytime the same. Lowpass cut at around 15.5 to 16Khz depending, if it was FhG, Xing, Gogo or Lame. None of the mp3 had frequencies over this level, like the Digg 320k file. The Joris Voorn file shows it even more clearly.

I also checked the 192k lame version from another source (with lowpass filter at 18.6) and it looks just like the 320k. Since it is a "scene internal release" where re-encoding is forbidden and people check for it, it can't be 128k original source, because they wouldn't release it at all.

SpasV V.I.P. on March 3rd, 2009 / post 28955

OK, my mistake. The encoder should leave its Tag in the file, so a program that reads it can identify the encoder, although EncSpot sometimes guesses.

Let us try to identify the Transitions on Proton - 27-Feb-2009 version.

I don't have the FhG encoder and I am using LAME instead. I am not going to prove that their results do not differ significantly.

First I'll show the 320k and 128k spectra calculated over an hour sound to point out where the significant differences are.
Then I'll try to find differences between 320k and 128k encoding in a Spectrogram (at some moment of time).

First:

Here are the spectra of my file, of an mp3 @320k obtained from it, and of the Transitions on Proton - 27-Feb-2009 version.

As it is easily seen the spectrum bandwidths are 20 kHz and 16 kHz which is typical for these bitrates.

The spectra above are from files obtained from different sources - Sirius XM broadcast and Proton Internet stream (? I guess.)

That's why here are 320k and 128k spectra obtained from the same source. I chose John Digweed's Transitions vol.4 2008 CD as the source to see something more typical of John Digweed's sound.

Again, the spectra bandwidths differ significantly: 17.3 kHz for the 320k and 16 kHz for the 128k.

Based on these spectra I would say again:
Transitions on Proton - 27-Feb-2009 version is 320k approximation to a 128k source.

Another guess could be John Digweed has provided Proton Radio with a 320k version record of his show obtained with a 16 kHz lowpass filter applied. But I don't believe that for at least two reasons.
But I'll try to prove this studying the sound spectrograms next.


mades house addict on March 4th, 2009 / post 28966

Hm, let us compare:

Bedrock 10 - A Musical Transition (mixed by John Digweed):

Encoder string: LAME
Version string: 3.97
Quality: 77 (V2 and q3)
Encoding method: vbr new V2 (~191K, min 32K)
Lowpass: 18 600Hz

John_Digweed_-_Transitions_(Guest_Joris_Voorn)-SBD-02-27-2009-TALiON_INT

Encoder string: LAME
Version string: 3.97
Quality: 57 (V4 and q3)
Encoding method: cbr 192K
Lowpass: 18 600Hz

John Digweed - Transitions 02-27-2009 320k (FhG, no settings available)

John Digweed - Transitions on Kiss (Littleangel rip - FM broadcast 15Khz cut)

Random 128k proton stream:

All i can see on these spectrum graphs is that the 320k can't be 128k original source, like you suggested based on your sound forge analyzer and dB drop above 16Khz (but not frequency CUT).

Now i did also check your Sirius rip, it had constant peak frequency up to 20KHz during the whole set, like the sound was modified and digitally enhanced by Sirius. Here is the spectrum analysis:

John Digweed - Transitions 02-27-2009 Sirius AAC VBR complete 2 hours (SpasV)

Especially interesting is the ~21:55 to 22:25 part, where there is a 20sec stoptime with just 2 samples playing.They sound very differently on AAC sirius rip compared to LAME/FhG proton, you can hear additional "sound" on Sirius rip. Now does the original track include this sound or is it only a result of sound enhancing?

I was curious so i managed to get the original 320k version of that track from BeatPort. Guess what? There was NO sound like Sirius rip produced during that stoptime part.

For your information, it is Luciano Pizzella - We Need It (Original Mix), here you go Lame settings for the track:

Encoder string: LAME
Version string: 3.97
Quality: 58 (V4 and q2)
Encoding method: cbr 320k
Lowpass: 20 500Hz

Let's take a look at those 30 seconds !

Original Beatport 320k version:

320k rip

Kiss Fm rip (with noise)

AAC Sirius rip (SpasV) - zoomed

Quite a difference, huh ? :)

SpasV V.I.P. on March 5th, 2009 / post 28968

OK,
I don't have time to look carefully at the mades' post right now.

Here is my second step. It is not perfect because I don't have enough information but still it can be useful.

First of all I would like to point out that I used a 512-point FFT for better time resolution. So, the frequency resolution is not high but this doesn't matter at all. What does matter is to compare the spectra at the same time in the music.
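The trade-off between time and frequency resolution can be put in numbers. The sketch below assumes 44.1 kHz sampling and plain, non-overlapping frames; the window and overlap actually used for the spectrograms are not stated in the post.

```python
SAMPLE_RATE_HZ = 44_100

def resolution(fft_points, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (spacing of frequency bins in Hz, frame length in ms)."""
    bin_hz = sample_rate_hz / fft_points
    frame_ms = 1000 * fft_points / sample_rate_hz
    return bin_hz, frame_ms

for points in (512, 4096, 65536):
    bin_hz, frame_ms = resolution(points)
    print(points, round(bin_hz, 2), "Hz per bin,", round(frame_ms, 1), "ms per frame")
```

With 512 points each spectrum covers about 11.6 ms of sound at about 86 Hz per bin, which is coarse in frequency but fine enough in time to line spectra up with the waveform.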

Here is an example of how the mp3 compression changes the spectra of one sec from Transitions vol. 4 2008 CD. The spectrograms are easy to distinguish. Starting from top to bottom they follow:
the PCM (wav) version, 320k version, 128k version. The differences are obvious but I have marked some of them on the 128k spectrogram in yellow.

Now, here are the spectrograms of my Transitions 01-Mar-2009 show (wav version), its 320k mp3 version, and the 320k mp3 Proton version. Because the sources are different there could be additional differences also, but the main difference caused by the mp3 encoder is the decisive one.
The time synchronization is done by the waveforms.

So, I think the 320k Proton version is 320k approximation to a 128k mp3 source not a 320k 16 kHz lowpassed approximation to an original CD source.


mades house addict on March 5th, 2009 / post 28976

You are forgetting one very important point: There is NO CD source for Transitions show. In fact, im 99% sure, he sends out a 320k mp3 to all the radio shows for broadcasting, because it is sufficient for all kinds of broadcasting techniques considering the listening quality. What Sirius does (and i have learned reading old posts on this forum) is, it adds a lot of sound enhancement so the spectra looks nice and full. But as i have shown on a simple track, in some cases it is more than unwanted and the quality drops rapidly.

There is nothing like "128k approximation". It is either 128k (or band limited) or not; you can see it on an analyzer and, most importantly, you can hear it. If you have a lowpass filter at 18.6 kHz, does it mean you are approximating 128k quality? No way.
I think I have proved my point even beyond the level necessary, and I won't keep trying to persuade you if you don't want to be persuaded.

P.s.It is no big deal to modify the sound to look like CD-source. But you can't cheat your ear.

SpasV V.I.P. on March 7th, 2009 / post 28992

mades wrote:
There is NO CD source for Transitions show.

Contrary to this, I think the producer supplies a lossless copy of his show and the radio broadcasts it over the contracted channel (FM, 128 kb/s stream ...).

mades wrote:
There is nothing like "128k approximation".

I said "320 kb/s approximation to a 128 kb/s source", meaning that the 320 kb/s mp3 is a discrete function approximating (that is, "coming close" to) another discrete function: the result of 128 kb/s mp3 encoding of a CD-source discrete function. By discrete function I mean the digital representation of a sound.
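
To make "coming close" a bit more concrete, one simple measure is the average difference in dB between two spectra taken at the same moment. This is only a sketch of the idea, not the method used for the spectrograms in this thread; the function name is hypothetical and the toy levels below are invented, not measured.

```python
def mean_abs_difference_db(spectrum_a_db, spectrum_b_db):
    """Average absolute difference, in dB, between two spectra with equal bin counts."""
    if len(spectrum_a_db) != len(spectrum_b_db):
        raise ValueError("spectra must have the same number of bins")
    total = sum(abs(a - b) for a, b in zip(spectrum_a_db, spectrum_b_db))
    return total / len(spectrum_a_db)


# Invented toy levels (dB) for four bins, lowest frequency first.
candidate = [-40.0, -50.0, -60.0, -150.0]    # file under test
from_128k = [-41.0, -50.0, -62.0, -150.0]    # hypothetical 128 kb/s encode
from_cd = [-40.0, -50.0, -60.0, -70.0]       # hypothetical CD source

print(mean_abs_difference_db(candidate, from_128k))
print(mean_abs_difference_db(candidate, from_cd))
```

The smaller of the two numbers points at the source the candidate is closer to, under the assumption that all three spectra are time-aligned.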

mades wrote:
P.s.It is no big deal to modify the sound to look like CD-source.

Contrary to this, it is not a trivial task to do a bandwidth extension.

Skype:spas.velev

SpasV V.I.P. on March 7th, 2009 / post 28993

Let me add a couple more words to this discussion.
First, to clarify the spectrograms:

They are 3-D graphs.
The first dimension (horizontal axis) is time, measured relative to the beginning of the file. It is important to compare different spectra at the same point in time, which is why I have synchronized the spectrograms by the waveforms. As a result, spectra at the same position in the show can carry different time tags, because the files do not all begin at the same point in the show.

The second dimension (vertical axis) is the frequency. It starts at 2,000 Hz. (Usually, there are no considerable spectral differences below 2,000 Hz.)

The third dimension is the power spectrum. It is shown by color coding over the range from 0 dB down to -150 dB. Dark blue and black correspond to no signal components in that time-frequency region.
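
For those who want to see where one column of such a spectrogram comes from, here is a minimal sketch: one frame is windowed, transformed, and converted to dB, with everything below -150 dB clipped to the bottom of the color scale. The Hann window, the normalization to a full-scale sine, and the function name are my assumptions for the sketch, not the settings of the analyzer I used. A plain DFT is written out instead of an FFT to keep it self-contained.

```python
import cmath
import math


def power_spectrum_db(frame, floor_db=-150.0):
    """Return the level of bins 0..N/2 in dB, where 0 dB is a full-scale sine."""
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    # Magnitude that a full-scale, bin-centered sine produces after windowing.
    full_scale = sum(window) / 2
    levels = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i in range(n):
            acc += window[i] * frame[i] * cmath.exp(-2j * math.pi * k * i / n)
        power = (abs(acc) / full_scale) ** 2
        level = 10 * math.log10(power) if power > 0 else floor_db
        levels.append(max(level, floor_db))  # clip to the bottom of the color scale
    return levels


# Toy input: a full-scale sine centered on bin 8 of a 64-sample frame.
n = 64
tone = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
levels = power_spectrum_db(tone)
peak_bin = max(range(len(levels)), key=levels.__getitem__)
print(peak_bin, round(levels[peak_bin], 3))
```

A spectrogram is this calculation repeated frame after frame, with each level mapped to a color.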

Now, here is my view of the same Transitions 27-Feb-2009 show versions that mades has already shown: Proton 320k, TALiON SBD - 192 k, and Kiss100 192k.
The spectra are calculated over the same 3.6 sec time interval of the show (at around 43 sec). I think that is enough to get some idea about the sound.

What can I say about them?

The black regions lack any signal power. Where they fall within the spectrum bandwidth, they are caused by the encoder cutting the components it considers to be below the audible limit.

1) Proton 320 k and TALiON SBD – 192 k look the same (at this resolution).
2) Kiss100 192k is a little bit different around the cutoff frequency, and there is considerable signal power at around 15.5 kHz.

Now, let us look at the spectra calculated over the first 27 min of the show.

First:
I have marked 2,000 Hz on the Frequency Axis. The region above it is shown on the spectrograms.
I have drawn a line at around -104 dB. The spectral components above this value are significant. You can treat the components below it as nonexistent, because -104 dB is taken to be the spectral density of the white noise added to the sound when it is digitized as 15-bit values, or equivalently the level of the round-off errors.
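
The same threshold can be applied in code to read off the usable bandwidth of a spectrum: take the highest bin whose level is still above the line. This is a sketch with a hypothetical function name and an invented toy spectrum; only the -104 dB default comes from the line I drew.

```python
def effective_bandwidth_hz(spectrum_db, bin_width_hz, threshold_db=-104.0):
    """Return the frequency of the highest bin above the threshold, or None."""
    for k in range(len(spectrum_db) - 1, -1, -1):
        if spectrum_db[k] > threshold_db:
            return k * bin_width_hz
    return None


# Invented toy spectrum: 100 Hz bins from 0 to 22 kHz,
# signal up to 16 kHz and only noise below the threshold above that.
toy_spectrum = [-60.0] * 161 + [-130.0] * 60
print(effective_bandwidth_hz(toy_spectrum, bin_width_hz=100.0))
```

On real data a single stray component, such as a narrow tone above the cutoff, would push this estimate up, so it is worth looking at the plot as well.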

Next:
1) Proton 320 k and TALiON SBD – 192 are indistinguishable (at this resolution).
2) I assume the original source is the Kiss100 FM radio broadcast. It is easily recognized by the pilot tone at 15.6 kHz. This tone is also visible on the spectrogram (signal power at around 15.5 kHz). It shouldn’t be present in the signal.
3) As far as I understand, the source of TALiON’s signal is an SBD sound system. My interpretation of SBD is the Blue Circle Audio “ShoeBoxDAC” (Digital to Analog Converter), and this system is capable of audio bandwidth extension, so it corrects the radio signal spectrum above 13.2 kHz, extending it to 16 kHz.
4) I need to correct myself. I think what is called Proton 320 kb/s is a trans-coding of TALiON’s 192 kb/s signal.

Skype:spas.velev