[Repost] Some Misconceptions About MP3 and Audio Compression Technology (短歌行) | 『动漫游戏音乐交流区』 - 『漫游』酷论坛

itsu@2003-06-02 20:29

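1. Is MP3 sound quality poor?

Wrong. As the current "king" of lossy audio compression, MP3 has an encoding technology that is close to perfect. Many people simply do not know how to produce a high-quality MP3. In December 2001, LAME, the best MP3 encoder in the world, released the revolutionary version 3.90.2, which addressed LAME's overly complicated encoding options by providing several presets. Now, simply encoding with LAME's standard preset gives near-perfect quality close to that of a CD.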

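2. Is a 128 kbps MP3 equal to CD quality?

Wrong. First of all, "CD quality" is a highly subjective term. It can roughly be taken to mean that, under average listening conditions, the result matches a CD played in a computer's optical drive. By that definition, countless listening tests show that no 128 kbps MP3 meets the standard, whatever the encoder and whatever the settings. On this topic see http://ff123.net/ , a very well-known foreign audio site with detailed theoretical discussion of 128 kbps MP3 tests.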

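3. Is MP3 at 192 kbps CBR (constant bitrate) stereo the best balance between sound quality and file size?

Wrong. This misconception has deep roots. Because 128 kbps MP3 is not acceptable to "demanding" music lovers, they look for better settings. The Xing and Fraunhofer encoders have, to this day, performed poorly with their VBR (variable bitrate) and joint stereo algorithms, so many people concluded that CBR and stereo were the best choice, and a 192 kbps MP3 is also acceptable in file size. LAME changed all of this! LAME's VBR and its smart joint stereo algorithm are excellent, and there is no longer any reason to use CBR and stereo, which only waste a limited supply of bits. The standard VBR preset (the --alt-preset standard option) also produces MP3 files that average about 192 kbps, but with better quality than CBR 192 kbps, and no other encoder matches it at the same bitrate (note: except 1. mpc, whose quality around this bitrate is better than MP3, and 2. the recent oggenc 1.0, not tested yet).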
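As a rough illustration of the preset mentioned above, the following sketch only builds the command line for a LAME run; it does not call the encoder. The helper name `lame_preset_command` and the file names are hypothetical, and the only option used is the `--alt-preset standard` form quoted in the post.

```python
import shlex

def lame_preset_command(wav_path, mp3_path, preset="standard"):
    """Build the argument list for a LAME run that uses an --alt-preset profile."""
    # LAME takes its options first, then the input file, then the output file.
    return ["lame", "--alt-preset", preset, wav_path, mp3_path]

cmd = lame_preset_command("track01.wav", "track01.mp3")
print(shlex.join(cmd))
```

If LAME is installed, the list could be passed to `subprocess.run(cmd, check=True)` to perform the actual encode.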

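4. Is MP3 at 320 kbps CBR stereo the limit of MP3 quality?

Wrong (or rather, not exactly true). Although 320 kbps is the limit of the MP3 standard, using a well-designed joint stereo at 320 kbps frees up bits that can be spent on the music itself, which improves quality. If the source has little stereo separation, full stereo is a waste.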

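5. Is VBR quality worse than CBR?

Wrong. A well-designed VBR algorithm does not waste bits on passages that are easy to encode; the bits it saves are spent on the complex passages. This misconception probably comes from bugs in the VBR algorithm of the older FhG encoder and in the Xing VBR algorithm. The VBR algorithm of the current LAME encoder is well tuned and has no quality problems.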

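6. Does joint stereo sound bad?

Wrong. The current mainstream encoders such as lame, mppenc, oggenc and aacenc all use so-called smart joint stereo, which does not damage the stereo image. See the following two links (in English, answered by the encoders' developers):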

　　http://www.hydrogenaudio.org/forums/showthread.php?s=&threadid=1081
　　http://www.hydrogenaudio.org/forums/showthread.php?s=&threadid=759

A more technical explanation can be found here:

　　http://www.xiph.org/ogg/vorbis/doc/stereo.html

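7. Is Blade the best MP3 encoder?

Wrong. (It hardly needs much explanation.) Blade is not recommended for MP3 encoding at any bitrate; because it lacks a good many features, its quality is far below that of lame or FhG. The following two links help explain Blade's shortcomings: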

　　http://forums.afterdawn.com/thread_view.cfm/1914
　　http://www.hydrogenaudio.org/forums/showthread.php?s=&threadid=463

Latest news: Blade is no longer being developed, and its author states on the home page that ogg is the better choice.

8. Does WMA reach CD quality at 64 kbps?

Wrong. I won't waste ink on this; if you don't believe it, follow the links below for details:

　　http://www.hydrogenaudio.org/forums/showthread.php?s=&threadid=1434
　　http://forums.winamp.com/showthread.php?s=&threadid=89378
In addition, Peter, who writes plugins for Winamp, also wrote an article:

　　Why not to use wma (http://205.188.228.81/showthread.php?threadid=81838)

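9. Do different genres of music need different encoders and different settings?

Wrong. Encoders work at the audio-signal level and do not distinguish between music genres. As long as the psychoacoustic model and the encoding algorithm are sound, the same settings suit every type of music. For details, see: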

　　 http://www.hydrogenaudio.org/forums/showthread.php?s=&threadid=1835

itsu@2003-06-02 20:30

Overview of the MP3 techniques
To achieve such a reduction in the amount of data, the MP3 format uses a few techniques and tricks. I will try to explain most of them. Some of these techniques are commonly grouped under the name of perceptual coding; the others are not.
The minimal audition threshold
The masking effect
The bytes reservoir
The Joint Stereo coding
The Huffman coding
The minimal audition threshold:
The minimal audition threshold of the ear is not flat across frequencies. According to the work of Fletcher and Munson, it follows a curve with a dip between 2 kHz and 5 kHz, where the ear is most sensitive. It is therefore not necessary to code sounds situated under this threshold, because they will not be perceived.
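To make the shape of this curve concrete, here is a small sketch that evaluates a closed-form approximation of the threshold in quiet. The formula is Terhardt's approximation, which is an assumption brought in for illustration and does not come from the text above; a real MP3 encoder's psychoacoustic model is considerably more elaborate.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt's formula, assumed here)."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# Print a few points of the curve: high at both ends, lowest in the 2-5 kHz region.
for hz in (100, 500, 1000, 2000, 3000, 4000, 5000, 10000, 16000):
    print(f"{hz:>6} Hz  {threshold_in_quiet_db(hz):7.1f} dB SPL")

# Frequency (on a 100 Hz grid) at which the approximated threshold is lowest.
most_sensitive_hz = min(range(100, 16001, 100), key=threshold_in_quiet_db)
print("most sensitive around", most_sensitive_hz, "Hz")
```

A component whose level falls below the curve at its frequency could, in this simplified picture, be dropped without being missed.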
The masking effect:
This system is based on masking properties of the human ear:
When you look at the sun and a bird passes in front of it, you do not see the bird because the light of the sun is too dominant. Audio behaves similarly: during loud sounds, you do not hear the weakest sounds. Take an organ piece as an example: when the organist is not playing, you hear the air moving in the pipes, and when he plays, you no longer hear it because it is masked.
It is therefore not necessary to code all the sounds. This is the first property used by the MP3 format to save space. For this, the MP3 encoder uses a psychoacoustic model that describes the behavior of the human ear.
The bytes reservoir:
Often, some passages of a musical piece cannot be coded at a given rate without altering the musical quality. MP3 then uses a short reservoir of bytes that acts as a buffer, borrowing capacity from passages that can be coded at a lower rate within the given stream.
The Joint Stereo coding:
In the case of a stereophonic signal, the MP3 format can use a few more tools, referred to as Joint Stereo (JS) coding, to further shrink the compressed file size.
Many mid-range hi-fi sets have a single subwoofer. However, you usually do not have the feeling that the sound comes from this subwoofer, but rather from the satellite speakers. Indeed, for very low and very high frequencies, the human ear is no longer able to locate the spatial origin of sounds with full accuracy. The MP3 format can therefore (optionally) exploit this by using what is called Intensity Stereo (IS). Some frequencies are then recorded as a monophonic signal, followed by a little additional information that restores a minimum of spatialisation.
The second joint stereo tool is called Mid/Side (M/S) stereo. When the left and right channels are quite similar, a middle (L+R) channel and a side (L-R) channel are encoded instead of left and right. This reduces the final file size because fewer bits are needed for the side channel. During playback, the MP3 decoder reconstructs the left and right channels.
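The Mid/Side idea can be sketched in a few lines. This is only an illustration of the transform and its inverse, assuming a scaling of 1/2 (the text gives just L+R and L-R, and a real encoder applies the transform to frequency-domain data with its own scaling); the function names are made up for the example.

```python
def ms_encode(left, right):
    """Turn left/right samples into mid/side samples (scaled by 1/2, an assumption)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Rebuild left/right from mid/side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Two nearly identical channels: the side channel is almost all zeros.
left = [0.50, 0.25, -0.75, 0.125]
right = [0.50, 0.25, -0.50, 0.125]
mid, side = ms_encode(left, right)
print("side:", side)
print("round trip ok:", ms_decode(mid, side) == (left, right))
```

When the two channels are close, the side signal carries very little energy, which is why it can be coded with fewer bits.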
The Huffman coding:
MP3 also uses the classic Huffman algorithm. It acts at the end of the compression chain to code the information, so it is not a perceptual technique in itself but rather a lossless coding method.
This coding creates variable-length codes made of a whole number of bits. Higher-probability symbols get shorter codes. Huffman codes have the prefix property: no code is the prefix of another, so they can be decoded correctly in spite of their variable length. The decoding step is very fast (via a lookup table). This kind of coding saves on average a bit less than 20% of space.
It is an ideal complement to perceptual coding. During dense polyphonic passages, perceptual coding is very efficient because many sounds are masked or attenuated, but little information is identical, so the Huffman algorithm is seldom efficient. During "pure" sounds there are few masking effects, but Huffman is then very efficient because the digitized sound contains many repeated values, which are replaced by shorter codes.
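The following sketch shows the general Huffman idea on a short list of made-up quantized values. Note the assumption: it builds a code table from the data itself, which is the textbook algorithm, whereas MP3 chooses among predefined tables; the sample data and function names are purely illustrative.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a prefix-free code table: frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)

def decode(bits, codes):
    """Decode by matching prefixes; works because no code is a prefix of another."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical quantized values dominated by zeros, as in a "pure" passage.
data = [0, 0, 0, 0, 0, 0, 1, 1, -1, 0, 0, 2, 0, 1, 0, 0]
codes = huffman_codes(data)
bits = encode(data, codes)
print(codes)
print(len(bits), "bits instead of", 2 * len(data), "with a fixed 2-bit code")
```

The more skewed the symbol distribution, the larger the saving, which matches the remark above about repetitive values.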

itsu@2003-06-02 20:33

MPEG Audio Layer I/II/III frame header

There is no main file header in an MPEG audio file. An MPEG audio file is built up from a succession of smaller parts called frames. A frame is a datablock with its own header and audio information.

In the case of Layer I or Layer II, frames are totally independent items, so you can cut out any part of an MPEG file and play it correctly. The player will then play the music starting from the first complete valid frame it finds. However, in the case of Layer III, frames are not always independent. Due to the possible use of the "byte reservoir", which is a kind of buffer, frames often depend on each other. In the worst case, 9 frames may be needed before being able to decode one frame.
When you want to read info about an MPEG audio file, it is usually enough to find the first frame, read its header and assume that the other frames are the same. But this is not always the case, as variable bitrate (VBR) files may be encountered. In a VBR file, the bitrate can change in each frame. This can be used, for example, to keep a constant sound quality throughout the file by spending more bits where the music needs more to be encoded.
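A tiny sketch of why the first frame alone can mislead for VBR files: with per-frame bitrates in hand, the average has to be taken over all frames. It assumes every frame covers the same playing time (same sampling rate and samples per frame throughout), and the bitrate list is invented for the example.

```python
def average_bitrate_kbps(frame_bitrates_kbps):
    """Mean bitrate over frames of equal duration (an assumption of this sketch)."""
    return sum(frame_bitrates_kbps) / len(frame_bitrates_kbps)

# Hypothetical VBR stream: quiet start, busier middle, dense ending.
frames = [128] * 2 + [192] * 5 + [256] * 3
print("first frame says:", frames[0], "kbps")
print("whole file averages:", average_bitrate_kbps(frames), "kbps")
```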

The frame header is 32 bits (4 bytes) long. The first twelve bits (or the first eleven bits in the case of the MPEG 2.5 extension) of a frame header are always set to 1 and are called the "frame sync".
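Locating a candidate frame can be sketched as a scan for the 11-bit sync pattern. This is a minimal illustration with a hypothetical function name; the same bit pattern can also occur by chance inside audio data or tags, so a real reader would validate the remaining header fields before trusting a match.

```python
def find_frame_sync(data, start=0):
    """Return the index of the first byte pair whose top 11 bits are all 1, or -1."""
    for i in range(start, len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            return i
    return -1

# Four bytes of unrelated data followed by a plausible header.
sample = b"ID3\x00\xff\xfb\x90\x64"
print(find_frame_sync(sample))
```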

Frames may have an optional CRC checksum. It is 16 bits long and, if it exists, follows the frame header. After the CRC comes the audio data. By recalculating the CRC and comparing its value to the stored one, you can check whether the frame has been altered during transmission of the bitstream.
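The check described above can be sketched with a generic bitwise CRC-16. The polynomial 0x8005 and the initial value 0xFFFF are assumptions about MPEG audio that are not stated in the text, and the sketch does not reproduce which header and side-information bits the real checksum covers; the payload bytes are arbitrary.

```python
def crc16(data, crc=0xFFFF, poly=0x8005):
    """Bitwise CRC-16, most significant bit first (parameters are assumptions)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

# Hypothetical protected bytes; the sender stores their CRC alongside them.
payload = bytes([0x90, 0x64, 0x01, 0x02, 0x03])
stored = crc16(payload)

# The receiver recomputes the CRC and compares it with the stored value.
intact = crc16(payload) == stored
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
print("intact frame passes:", intact)
print("corrupted frame passes:", crc16(corrupted) == stored)
```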

Here is a presentation of the frame header content. Characters A to M are used to indicate different fields. In the table below, you can see details about the content of each field.

AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM

Sign   Length (bits)   Position (bits)   Description
A 11 (31-21) Frame sync (all bits must be set)
B 2 (20,19) MPEG Audio version ID
00 - MPEG Version 2.5 (later extension of MPEG 2)
01 - reserved
10 - MPEG Version 2 (ISO/IEC 13818-3)
11 - MPEG Version 1 (ISO/IEC 11172-3)
Note: MPEG Version 2.5 is a later, unofficial extension of the MPEG 2 standard. It is used for very low bitrate files and allows lower sampling frequencies. If your decoder does not support this extension, it is recommended that you use 12 bits for synchronization instead of 11 bits.

C 2 (18,17) Layer description
00 - reserved
01 - Layer III
10 - Layer II
11 - Layer I
D 1 (16) Protection bit
0 - Protected by CRC (16bit CRC follows header)
1 - Not protected
E 4 (15-12) Bitrate index
bits   V1,L1   V1,L2   V1,L3   V2,L1   V2,L2 & L3
0000   free    free    free    free    free
0001   32      32      32      32      8
0010   64      48      40      48      16
0011   96      56      48      56      24
0100   128     64      56      64      32
0101   160     80      64      80      40
0110   192     96      80      96      48
0111   224     112     96      112     56
1000   256     128     112     128     64
1001   288     160     128     144     80
1010   320     192     160     160     96
1011   352     224     192     176     112
1100   384     256     224     192     128
1101   416     320     256     224     144
1110   448     384     320     256     160
1111   bad     bad     bad     bad     bad

NOTES: All values are in kbps
V1 - MPEG Version 1
V2 - MPEG Version 2 and Version 2.5
L1 - Layer I
L2 - Layer II
L3 - Layer III

"free" means free format. The free bitrate must remain constant, and must be lower than the maximum allowed bitrate. Decoders are not required to support decoding of free bitrate streams.
"bad" means that the value is not allowed.

MPEG files may feature variable bitrate (VBR). Each frame may then be created with a different bitrate. It may be used in all layers. Layer III decoders must support this method. Layer I & II decoders may support it.

For Layer II there are some combinations of bitrate and mode which are not allowed. Here is a list of allowed combinations.

bitrate   single channel   stereo   intensity stereo   dual channel
free      yes              yes      yes                yes
32        yes              no       no                 no
48        yes              no       no                 no
56        yes              no       no                 no
64        yes              yes      yes                yes
80        yes              no       no                 no
96        yes              yes      yes                yes
112       yes              yes      yes                yes
128       yes              yes      yes                yes
160       yes              yes      yes                yes
192       yes              yes      yes                yes
224       no               yes      yes                yes
256       no               yes      yes                yes
320       no               yes      yes                yes
384       no               yes      yes                yes

F 2 (11,10) Sampling rate frequency index
bits   MPEG1      MPEG2      MPEG2.5
00     44100 Hz   22050 Hz   11025 Hz
01     48000 Hz   24000 Hz   12000 Hz
10     32000 Hz   16000 Hz   8000 Hz
11     reserved   reserved   reserved

G 1 (9) Padding bit
0 - frame is not padded
1 - frame is padded with one extra slot

Padding is used to fit the bitrate exactly. For example, a 128 kbps, 44.1 kHz Layer II stream uses mostly 418-byte frames and some 417-byte frames to reach exactly 128 kbps. A slot is 32 bits long for Layer I and 8 bits long for Layer II and Layer III.
H 1 (8) Private bit. This one is only informative.
I 2 (7,6) Channel Mode
00 - Stereo
01 - Joint stereo (Stereo)
10 - Dual channel (2 mono channels)
11 - Single channel (Mono)

Note: Dual channel files are made of two independent mono channels. Each one uses exactly half the bitrate of the file. Most decoders output them as stereo, but this might not always be the case.
One example of use would be speech in two different languages carried in the same bitstream; an appropriate decoder would then decode only the chosen language.
J 2 (5,4) Mode extension (Only used in Joint stereo)
Mode extension is used to join information that is of no use for the stereo effect, thus reducing the number of bits needed. These bits are set dynamically by an encoder in joint stereo mode, and joint stereo can change from one frame to another, or even be switched on or off.

The complete frequency range of an MPEG file is divided into 32 subbands. For Layer I & II these two bits determine the frequency range (bands) where intensity stereo is applied. For Layer III these two bits determine which type of joint stereo is used (intensity stereo or M/S stereo). The frequency range is determined within the decompression algorithm.

value   Layer I & II (intensity stereo)   Layer III: intensity stereo   Layer III: MS stereo
00      bands 4 to 31                     off                           off
01      bands 8 to 31                     on                            off
10      bands 12 to 31                    off                           on
11      bands 16 to 31                    on                            on

K 1 (3) Copyright
0 - Audio is not copyrighted
1 - Audio is copyrighted

The copyright has the same meaning as the copyright bit on CDs and DAT tapes, i.e. telling that it is illegal to copy the contents if the bit is set.
L 1 (2) Original
0 - Copy of original media
1 - Original media

The original bit indicates, if it is set, that the frame is located on its original media.
M 2 (1,0) Emphasis
00 - none
01 - 50/15 µs
10 - reserved
11 - CCITT J.17

The emphasis indication tells the decoder that the file must be de-emphasized, i.e. the decoder must 're-equalize' the sound after a Dolby-like noise suppression. It is rarely used.
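The field layout described above can be turned into a small parser. This is a sketch under stated assumptions rather than a reference decoder: the function and key names are invented, free-format and reserved values are simply rejected, and the frame length formula (144 or 72 times bitrate over sampling rate for Layer II and III, 12 times for Layer I in 4-byte slots) is the commonly quoted one and does not appear in the text above.

```python
# Bitrates in kbps for bitrate index 1..14, copied from the bitrate table.
BITRATES_KBPS = {
    ("V1", 1): [32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    ("V1", 2): [32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    ("V1", 3): [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
    ("V2", 1): [32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256],
    ("V2", 2): [8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
}
BITRATES_KBPS[("V2", 3)] = BITRATES_KBPS[("V2", 2)]
SAMPLE_RATES_HZ = {
    "1": [44100, 48000, 32000],
    "2": [22050, 24000, 16000],
    "2.5": [11025, 12000, 8000],
}
VERSIONS = {0b00: "2.5", 0b10: "2", 0b11: "1"}
LAYERS = {0b01: 3, 0b10: 2, 0b11: 1}
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "single channel"]

def parse_header(raw):
    """Decode the 4-byte frame header into a dict (fields A to M)."""
    h = int.from_bytes(raw[:4], "big")
    if h >> 21 != 0x7FF:
        raise ValueError("frame sync not found")
    version_id, layer_id = (h >> 19) & 3, (h >> 17) & 3
    bitrate_idx, rate_idx = (h >> 12) & 0xF, (h >> 10) & 3
    if version_id == 1 or layer_id == 0 or bitrate_idx in (0, 15) or rate_idx == 3:
        raise ValueError("reserved, free-format or invalid value (not handled here)")
    version, layer = VERSIONS[version_id], LAYERS[layer_id]
    table = BITRATES_KBPS[("V1" if version == "1" else "V2", layer)]
    return {
        "version": version,
        "layer": layer,
        "crc_protected": ((h >> 16) & 1) == 0,  # 0 means a CRC follows the header
        "bitrate_kbps": table[bitrate_idx - 1],
        "sample_rate": SAMPLE_RATES_HZ[version][rate_idx],
        "padding": (h >> 9) & 1,
        "private": (h >> 8) & 1,
        "channel_mode": CHANNEL_MODES[(h >> 6) & 3],
        "mode_extension": (h >> 4) & 3,
        "copyright": (h >> 3) & 1,
        "original": (h >> 2) & 1,
        "emphasis": h & 3,
    }

def frame_length(info):
    """Frame size in bytes, using the commonly quoted formula (an assumption)."""
    bps = info["bitrate_kbps"] * 1000
    if info["layer"] == 1:
        return (12 * bps // info["sample_rate"] + info["padding"]) * 4
    coeff = 72 if info["layer"] == 3 and info["version"] != "1" else 144
    return coeff * bps // info["sample_rate"] + info["padding"]

mp3_info = parse_header(bytes.fromhex("fffb9064"))   # a typical Layer III header
mp2_info = parse_header(bytes.fromhex("fffd8200"))   # Layer II, 128 kbps, padded
print(mp3_info)
print(frame_length(mp3_info), frame_length(mp2_info))
```

The second header reproduces the padding example from field G: a padded 128 kbps, 44.1 kHz Layer II frame comes out one byte longer than an unpadded one.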

黄昏の旅者@2003-06-02 20:52

Some of the older game rips I collected early on were encoded with FhG, or else with Xing. *sweat*...
So it seems LAME really is the top dog these days. :)

wwmidia@2003-06-02 21:06

-__- My head hurts...

Anyway, I like 320 CBR, stereo... tf...

sth@2003-06-02 21:10

短歌行 has many more posts like this one, and some of them are well worth reading.

Excellent

nick4611@2003-06-02 22:44

Very detailed!
Still, an MP3 is compressed audio in the end,
and it can't match a CD burned directly as audio tracks ^^

alextian_123@2003-06-03 08:46

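Are your ears really sharp enough to tell a CD from a 128 kbps MP3 at first listen??? I don't believe it... If I burn a 128 kbps MP3 to a CD and play it on a stereo, could you tell that it isn't the original CD??? Sound quality depends on many things and definitely can't be judged by bitrate alone... Except for songs I listen to often, where I can tell how the highs and lows sound in different environments! Otherwise... it's hard!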

Symlith@2003-06-03 09:22

Even on my lousy sound card I can hear the difference between a 128 kbps MP3 and a CD. It's obvious.

zyb1020@2003-06-03 09:27

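Quote:
Originally posted by alextian_123
Are your ears really sharp enough to tell a CD from a 128 kbps MP3 at first listen??? I don't believe it... If I burn a 128 kbps MP3 to a CD and play it on a stereo, could you tell that it isn't the original CD??? Sound quality depends on many things and definitely can't be judged by bitrate alone... Except for songs I listen to often, where I can tell how the highs and lows sound in different environments! Otherwise... it's hard!

But these days I really can hear the difference between 128 and 192.

128 no longer satisfies audio enthusiasts. When no CD is available, 192 is a good choice.

320 and APE are of course even better.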

A502ALARM@2003-06-03 09:31

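Quote:
Originally posted by alextian_123
Are your ears really sharp enough to tell a CD from a 128 kbps MP3 at first listen??? I don't believe it... If I burn a 128 kbps MP3 to a CD and play it on a stereo, could you tell that it isn't the original CD??? Sound quality depends on many things and definitely can't be judged by bitrate alone... Except for songs I listen to often, where I can tell how the highs and lows sound in different environments! Otherwise... it's hard!

At the very least you should be able to hear the difference between 128 kbps and CD quality,
unless the CD itself was burned from MP3s.
Otherwise the difference is too obvious.
Comparing 192 with a CD, you might not tell them apart at first listen,
but anyone with a little experience can distinguish them without much trouble.
As for 320... I, at least, can't tell the difference.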

gundamboy@2003-06-04 13:33

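128k and 192k are easy to tell apart; they differ a lot in the treble and in range, provided of course that you have good headphones or speakers. 320k and 192k are harder to distinguish. My impression is that 320k has a wider range than 192k: I can sense it with earbuds (888), but not with speakers (Sound Works). When listening to a portable player on the go, it is very hard to tell 192k from 320k. I don't like using Joint Stereo. It may be psychological, but I feel that the soundstage positioning is inaccurate and differs from the source. I don't like CBR 320k either: LAME's strength is its VBR, and 320k wastes too much space for a few barely noticeable details. From a practical point of view, I prefer ABR 224k.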

nature@2003-06-04 19:59

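Quote:
Are your ears really sharp enough to tell a CD from a 128 kbps MP3 at first listen???

That is all too easy to tell; you hear it immediately, especially with a 128 kbps MP3 made by some knock-off encoder. It is downright torture for the ears.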

wwmidia@2003-06-04 21:34

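Quote:
Originally posted by alextian_123
Are your ears really sharp enough to tell a CD from a 128 kbps MP3 at first listen??? I don't believe it... If I burn a 128 kbps MP3 to a CD and play it on a stereo, could you tell that it isn't the original CD??? Sound quality depends on many things and definitely can't be judged by bitrate alone... Except for songs I listen to often, where I can tell how the highs and lows sound in different environments! Otherwise... it's hard!

Impressive. I bow to you.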
