MIDI File Format

Original: http://jedi.ks.uiuc.edu/~johns/links/music/
Translation: dreamana.com

Standard MIDI Files 1.0

Unfinished …

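A MIDI file contains one or more MIDI streams, with time information for each event. Song, sequence, and track structures, as well as tempo and time signature information, are all supported. Track names and other descriptive information may be stored with the MIDI data. The format supports multiple tracks and multiple sequences, so a user of a program that supports multiple tracks can move a file to another such program.

The 8-bit binary data stream used in MIDI files can be stored in a binary file, nibbleized, 7-bit-ized for efficient MIDI transmission, converted to hexadecimal ASCII, or translated symbolically to a printable text file.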

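1. Sequences, Tracks, Chunks: File Block Structure

Sequence files are made up of chunks. Each chunk has a 4-character type and a 32-bit length, which is the number of bytes in the chunk. On the Macintosh, data is passed either in the data fork of a file or on the Clipboard. (The file type on the Macintosh for a file in this format is “Midi”.) On any other computer, the data is simply the contents of the file. This structure allows future chunk types to be designed that older programs can easily ignore. Your programs should therefore expect alien chunks and skip them.

Two types of chunks are defined here: a header chunk and a track chunk. The header chunk provides a minimal amount of information pertaining to the entire MIDI file. A track chunk contains a sequential stream of MIDI data, which may contain information for up to 16 MIDI channels. The concepts of multiple tracks, multiple MIDI outputs, patterns, sequences, and songs may all be implemented using several track chunks.

A MIDI file always starts with a header chunk, followed by one or more track chunks.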

MThd  <length of header data>

<header data>

MTrk  <length of track data>

<track data>

MTrk  <length of track data>

<track data>
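The chunk layout above can be walked with a short loop. The following Python sketch is illustrative only: the names `iter_chunks` and `known_chunks`, and the sample bytes (including the made-up chunk type `XYZ1`), are assumptions and not part of the specification.

```python
import struct

def iter_chunks(data: bytes):
    """Yield (chunk_type, payload) pairs from the raw bytes of a MIDI file."""
    pos = 0
    while pos + 8 <= len(data):
        # Each chunk starts with a 4-character type and a 32-bit big-endian length.
        chunk_type, length = struct.unpack_from(">4sI", data, pos)
        pos += 8
        yield chunk_type, data[pos:pos + length]
        pos += length

def known_chunks(data: bytes):
    """Keep only MThd and MTrk chunks; alien chunk types are skipped."""
    return [(t, p) for t, p in iter_chunks(data) if t in (b"MThd", b"MTrk")]

# Hypothetical sample: a header, an unknown "XYZ1" chunk, and a track that
# holds only an End of Track meta-event.
sample = (
    b"MThd" + struct.pack(">IHHH", 6, 0, 1, 96)
    + b"XYZ1" + struct.pack(">I", 2) + b"\x00\x00"
    + b"MTrk" + struct.pack(">I", 4) + b"\x00\xFF\x2F\x00"
)
print([t for t, _ in known_chunks(sample)])
```

Because the length field gives the payload size, the reader can step over a chunk without understanding its contents.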

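Track Data Format (MTrk chunk type)

The MTrk chunk type is where actual song data is stored. It is simply a stream of MIDI events (and non-MIDI events), each preceded by a delta-time value.

Some numbers in MTrk chunks are represented in a form called a variable-length quantity.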

These numbers are represented 7 bits per byte, most significant bits first. All bytes except the last have bit 7 set, and the last byte has bit 7 clear. If the number is between 0 and 127, it is thus represented exactly as one byte.

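Here are some examples of numbers represented as variable-length quantities:

        Number (hex)    Representation (hex)
        00000000        00
        00000040        40
        0000007F        7F
        00000080        81 00
        00002000        C0 00
        00003FFF        FF 7F
        00004000        81 80 00
        00100000        C0 80 00
        001FFFFF        FF FF 7F
        00200000        81 80 80 00
        08000000        C0 80 80 00
        0FFFFFFF        FF FF FF 7F

The largest number allowed is 0FFFFFFF, so that the variable-length representation fits in 32 bits in a routine that writes variable-length numbers. Theoretically, larger numbers are possible, but 2 x 10^8 96ths of a beat at a fast tempo of 500 beats per minute is four days, long enough for any delta-time.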
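The encoding can be sketched in a few lines of Python. The function names `write_vlq` and `read_vlq` are assumptions chosen for this example; the specification does not prescribe an implementation.

```python
def write_vlq(value: int) -> bytes:
    """Encode an integer as a variable-length quantity (7 bits per byte)."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("value out of range for a variable-length quantity")
    out = [value & 0x7F]                   # last byte: bit 7 clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes: bit 7 set
        value >>= 7
    return bytes(reversed(out))            # most significant bits first

def read_vlq(data: bytes, pos: int = 0):
    """Decode a variable-length quantity; return (value, next_position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                # bit 7 clear marks the last byte
            return value, pos

for number in (0x40, 0x80, 0x2000, 0x3FFF, 0x0FFFFFFF):
    print(f"{number:08X}", " ".join(f"{b:02X}" for b in write_vlq(number)))
```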

Here is the syntax of an MTrk chunk:

<track data> = <MTrk event>+

<MTrk event> = <delta-time> <event>

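<delta-time> is stored as a variable-length quantity. It represents the amount of time before the following event. If the first event in a track occurs at the very beginning of the track, or if two events occur simultaneously, a delta-time of zero is used.

Delta-times are always present. (Not storing delta-times of zero would require at least two bytes for any other value, and most delta-times are not zero.) Delta-time is measured in some fraction of a beat (or of a second, for recording a track with SMPTE times), as specified in the header chunk.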
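Since each delta-time is relative to the previous event, a reader usually accumulates them to get absolute positions. A minimal sketch, where `absolute_times` is a hypothetical helper name and the values are ticks in whatever unit the header chunk defines:

```python
from itertools import accumulate

def absolute_times(deltas):
    """Turn per-event delta-times into absolute tick positions."""
    return list(accumulate(deltas))

# Two simultaneous events have a delta-time of zero between them.
print(absolute_times([0, 96, 0, 48]))
```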

<event> = <MIDI event> | <sysex event> | <meta-event>

<MIDI event> is any MIDI channel message. Running status is used: status bytes may be omitted after the first byte. The first event in a file must specify status. Delta-time is not considered an event itself: it is an integral part of the specification. Notice that running status occurs across delta-times.
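Running status can be handled by remembering the last status byte seen. The sketch below assumes the data contains only channel messages (no sysex or meta-events); `read_channel_events` and `DATA_BYTES` are names invented for this example, and the data-byte counts are those of the standard MIDI channel messages.

```python
# Data bytes that follow each channel-message status, keyed by the high
# nibble of the status byte.
DATA_BYTES = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}

def read_channel_events(data: bytes):
    """Decode <delta-time> <MIDI event> pairs into (delta, status, data) tuples."""
    events = []
    pos = 0
    status = None
    while pos < len(data):
        # The delta-time is a variable-length quantity.
        delta = 0
        while True:
            byte = data[pos]
            pos += 1
            delta = (delta << 7) | (byte & 0x7F)
            if byte < 0x80:
                break
        if data[pos] & 0x80:
            status = data[pos]       # explicit status byte
            pos += 1
        elif status is None:
            raise ValueError("the first event must specify a status byte")
        # Otherwise the previous status is reused (running status).
        count = DATA_BYTES[status >> 4]
        events.append((delta, status, data[pos:pos + count]))
        pos += count
    return events

# Note On, a second Note On under running status, then an explicit Note Off.
track = bytes.fromhex("00 90 3C 40 60 3E 40 60 80 3C 00")
for event in read_channel_events(track):
    print(event)
```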

<meta-event> specifies non-MIDI information useful to this format or to sequencers, with this syntax:

FF <type> <length> <bytes>

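All meta-events begin with FF, followed by an event type byte (always less than 128), then the length of the data stored as a variable-length quantity, and then the data itself. If there is no data, the length is 0. As with sysex events, running status is not allowed. As with chunks, future meta-events may be designed that existing programs do not know, so programs must properly ignore meta-events they do not recognize, and indeed should expect to see them. New for 0.06: programs must never ignore the length of a meta-event which they do recognize, and they shouldn’t be surprised if it’s bigger than they expected. If so, they must ignore everything past what they know about. However, they must not add anything of their own to the end of a meta-event.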
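The length field is what lets a reader step over meta-events it does not understand. In the sketch below, `read_meta_event`, `describe`, and the type byte 60 used as an "unknown" event are all assumptions made for illustration.

```python
def read_meta_event(data: bytes, pos: int = 0):
    """Parse FF <type> <length> <bytes>; return (type, payload, next_position)."""
    if data[pos] != 0xFF:
        raise ValueError("not a meta-event")
    meta_type = data[pos + 1]
    pos += 2
    length = 0
    while True:                      # <length> is a variable-length quantity
        byte = data[pos]
        pos += 1
        length = (length << 7) | (byte & 0x7F)
        if byte < 0x80:
            break
    return meta_type, data[pos:pos + length], pos + length

def describe(meta_type: int, payload: bytes):
    """Handle the types this sketch knows about; return None for the rest."""
    if meta_type == 0x03:
        return "name: " + payload.decode("ascii", "replace")
    if meta_type == 0x2F:
        return "end of track"
    return None                      # unrecognized: skipped, not an error

stream = b"\xFF\x03\x05Piano" + b"\xFF\x60\x02\x01\x02" + b"\xFF\x2F\x00"
pos = 0
while pos < len(stream):
    meta_type, payload, pos = read_meta_event(stream, pos)
    print(hex(meta_type), describe(meta_type, payload))
```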

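<sysex event> is used to specify a MIDI system exclusive message, or as an “escape” to specify arbitrary bytes to be transmitted. Unfortunately, some synthesizer manufacturers specify that their system exclusive messages are to be transmitted as small packets. Each packet is only part of an entire syntactic system exclusive message, but the times at which the packets are transmitted are important. Examples are the bytes sent in a CZ patch dump, or the FB-01’s “system exclusive mode”, in which microtonal data can be transmitted. To handle situations like these, two forms of <sysex event> are provided: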

F0 <length> <bytes to be transmitted after F0>

F7 <length> <all bytes to be transmitted>

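In both cases, <length> is stored as a variable-length quantity. It equals the number of bytes that follow it, not counting itself or the message type (F0 or F7), but counting every byte to be transmitted after it, including any F7 at the end. The first form, with the F0 code, is used for syntactically complete system exclusive messages, or for the first packet in a series, that is, messages in which the F0 should be transmitted. The second form is used for the remaining packets of a syntactic sysex message, which do not begin with F0. Of course, the F7 event code is not considered part of the system exclusive message. And, just as in MIDI, running status is not allowed, in this case because the length is stored as a variable-length quantity that may or may not start with bit 7 set.

(New to 0.06) A syntactic system exclusive message must always end with an F7, even if the real-life device did not send one, so that you know when you have reached the end of an entire sysex message without looking ahead to the next event in the MIDI file. This principle is repeated and illustrated below.

The vast majority of system exclusive messages use only the F0 form. For instance, the transmitted message F0 43 12 00 07 F7 would be stored in a MIDI file as F0 05 43 12 00 07 F7. As mentioned above, the F7 at the end is required so that the reader of the MIDI file knows it has read the entire message.

For special situations in which a single system exclusive message is split up and its parts are transmitted at different times, such as a Casio CZ patch transfer or the FB-01’s “system exclusive mode”, the F7 form of sysex event is used for every packet except the first. None of the packets ends with an F7 except the last one, which must end with an F7. There also must not be any transmittable MIDI events between the packets of a multi-packet system exclusive message. Here is an example: suppose the bytes F0 43 12 00 are sent, followed by a 200-tick delay, then the bytes 43 12 00 43 12 00, followed by a 100-tick delay, then the bytes 43 12 00 F7. In the MIDI file this would be: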

        F0 03 43 12 00
        81 48                                   200-tick delta-time
        F7 06 43 12 00 43 12 00
        64                                      100-tick delta-time
        F7 04 43 12 00 F7
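The example above can be reproduced with a small encoder. This is a sketch under stated assumptions: `vlq` and `sysex_event` are hypothetical helper names, and, as in the listing above, the delta-time before the first event is left out.

```python
def vlq(value: int) -> bytes:
    """Encode an integer as a variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

def sysex_event(packet: bytes, first: bool) -> bytes:
    """Encode one transmitted packet as an F0-form or F7-form sysex event."""
    if first:
        if packet[:1] != b"\xF0":
            raise ValueError("the first packet must begin with F0")
        body = packet[1:]            # the F0 is carried by the event code
        return b"\xF0" + vlq(len(body)) + body
    return b"\xF7" + vlq(len(packet)) + packet

# (delay before the packet in ticks, transmitted bytes, is first packet)
packets = [
    (None, bytes.fromhex("F0 43 12 00"), True),
    (200, bytes.fromhex("43 12 00 43 12 00"), False),
    (100, bytes.fromhex("43 12 00 F7"), False),
]
stream = b""
for delay, packet, first in packets:
    if delay is not None:
        stream += vlq(delay)
    stream += sysex_event(packet, first)
print(" ".join(f"{b:02X}" for b in stream))
```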

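The F7 event may also be used as an “escape” to transmit any bytes whatsoever, including real-time bytes, song pointer, or MIDI Time Code, which this specification does not normally permit. No effort should be made to interpret bytes used in this way. Since a system exclusive message is not being transmitted in this case, it is not necessary to end the F7 event with an F7.

2. Header Chunk

The header chunk at the beginning of the file specifies some basic information about the data in the file. The data section contains three 16-bit words, stored high byte first (of course). Here is the syntax of the complete chunk: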

<chunk type> <length> <format> <ntrks> <division>

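As described above, <chunk type> is the four ASCII characters ‘MThd’; <length> is a 32-bit representation of the number 6 (high byte first). The first word, <format>, specifies the overall organization of the file. Only three values of <format> are specified:

        0       the file contains a single multi-channel track
        1       the file contains one or more simultaneous tracks (or MIDI outputs) of a sequence
        2       the file contains one or more sequentially independent single-track patterns

The next word, <ntrks>, is the number of track chunks in the file. The third word, <division>, is the division of a quarter-note represented by the delta-times in the file. (If division is negative, it represents the division of a second represented by the delta-times in the file, so that the track can represent events occurring in actual time instead of metrical time. It is represented as follows: the first byte is one of the four values -24, -25, -29, or -30, corresponding to the four standard SMPTE and MIDI time code formats, and represents the number of frames per second. The second byte (stored positive) is the resolution within a frame: typical values may be 4 (MIDI time code resolution), 8, 10, 80 (bit resolution), or 100. This system allows exact specification of time-code-based tracks, and also allows millisecond-based tracks by specifying 25 frames per second and a resolution of 40 units per frame.)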
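A header chunk can be unpacked in one call, with the sign of <division> deciding how it is interpreted. The function name `parse_header` and the shape of its return value are assumptions made for this sketch.

```python
import struct

def parse_header(chunk: bytes):
    """Parse an MThd chunk; return (format, ntrks, timing)."""
    chunk_type, length, fmt, ntrks, division = struct.unpack(">4sIHHH", chunk[:14])
    if chunk_type != b"MThd" or length < 6:
        raise ValueError("not a valid header chunk")
    if division & 0x8000:
        # Negative division: the first byte is -24, -25, -29, or -30 in
        # two's complement, the second byte is the resolution within a frame.
        frame_code = 256 - (division >> 8)
        units_per_frame = division & 0xFF
        timing = ("smpte", frame_code, units_per_frame)
    else:
        timing = ("metrical", division)   # delta-time units per quarter-note
    return fmt, ntrks, timing

# 96 units per quarter-note, then the millisecond case (25 frames, 40 units).
print(parse_header(b"MThd" + struct.pack(">IHHH", 6, 0, 1, 96)))
print(parse_header(b"MThd" + struct.pack(">IHHH", 6, 1, 3, 0xE728)))
```

In the second call, E7 is -25 as a signed byte and 28 hex is 40, which gives 25 x 40 = 1000 units per second.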

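Format 0, that is, one multi-channel track, is the most interchangeable representation of data. One application of MIDI files is a simple single-track player in a program that needs to make synthesizers produce sound but is primarily concerned with something else, such as mixers or sound effect boxes. Being able to produce this format is very desirable, even if your program is track-based, in order to work with these simple programs. On the other hand, someone may write a converter from format 1 to format 0 that is easy enough to use in some settings to save you the trouble of building it into your program.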
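A format 1 to format 0 conversion amounts to merging the tracks by absolute time. The sketch below is a simplified illustration, not a complete converter: it assumes each track has already been decoded into (delta, event_bytes) pairs with running status expanded and End of Track events removed, and `merge_tracks` is a hypothetical name.

```python
def merge_tracks(tracks):
    """Merge several lists of (delta, event) pairs into one such list."""
    timed = []
    for track_index, track in enumerate(tracks):
        now = 0
        for order, (delta, event) in enumerate(track):
            now += delta             # delta-times become absolute ticks
            timed.append((now, track_index, order, event))
    # Sort by time; ties keep track order, then original order in the track.
    timed.sort()
    merged = []
    last = 0
    for now, _, _, event in timed:
        merged.append((now - last, event))   # back to delta-times
        last = now
    return merged

tracks = [
    [(0, b"A"), (96, b"B")],
    [(48, b"C"), (96, b"D")],
]
print(merge_tracks(tracks))
```

A real converter would also append a single End of Track event to the merged track.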

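Programs that support several simultaneous tracks should be able to save and read data in format 1, a vertically one-dimensional form, that is, a collection of tracks. Programs that support several independent patterns should be able to save and read data in format 2, a horizontally one-dimensional form. Providing these minimum capabilities ensures maximum interchangeability.

MIDI files can express tempo and time signature, and they are used to transfer tempo maps from one device to another. In a format 0 file, the tempo events are scattered through the track and the tempo map reader should ignore the intervening events; in a format 1 file, the tempo map must (starting in 0.04) be stored in the first track. It is polite to a tempo map reader to offer your user the ability to make a format 0 file containing just the tempo, unless you can use format 1.

All MIDI files should specify tempo and time signature. If they do not, the time signature is assumed to be 4/4 and the tempo 120 beats per minute. In format 0, these meta-events should occur at least at the beginning of the single multi-channel track. In format 2, each of the temporally independent patterns should contain at least initial tempo and time signature information.

We may decide to define other formats to support other structures. A program that reads an unfamiliar format should therefore return an error to the user rather than trying to read further.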

3. Meta-Events

A few meta-events are defined here. Not every program is required to support every meta-event. The meta-events initially defined include:

FF 00 02 ssss Sequence Number

This optional event, which must occur at the beginning of a track, before any nonzero delta-times, and before any transmittable MIDI events, specifies the number of a sequence. The number in this track corresponds to the sequence number in the new Cue message discussed at the summer 1987 MMA meeting. In a format 2 MIDI file, it is used to identify each “pattern” so that a “song” sequence using the Cue message can refer to the patterns. If the ID numbers are omitted, the sequences' locations in order in the file are used as defaults. In a format 0 or 1 MIDI file, which contains only one sequence, this number should be contained in the first (or only) track. If transfer of several multitrack sequences is required, this must be done as a group of format 1 files, each with a different sequence number.

FF 01 len text Text Event

Any amount of text describing anything. It is a good idea to put a text event right at the beginning of a track, with the name of the track, a description of its intended orchestration, and any other information which the user wants to put there. Text events may also occur at other times in a track, to be used as lyrics, or descriptions of cue points. The text in this event should be printable ASCII characters for maximum interchange. However, other character codes using the high-order bit may be used for interchange of files between different programs on the same computer which supports an extended character set. Programs on a computer which does not support non-ASCII characters should ignore those characters.

Meta-event types 01 through 0F are reserved for various types of text events, each of which meets the specification of text events (above) but is used for a different purpose:

FF 02 len text Copyright Notice

Contains a copyright notice as printable ASCII text. The notice should contain the characters (C), the year of the copyright, and the owner of the copyright. If several pieces of music are in the same MIDI file, all of the copyright notices should be placed together in this event so that it will be at the beginning of the file. This event should be the first event in the first track chunk, at time 0.

FF 03 len text Sequence/Track Name

If in a format 0 track, or the first track in a format 1 file, the name of the sequence. Otherwise, the name of the track.

FF 04 len text Instrument Name

A description of the type of instrumentation to be used in that track. May be used with the MIDI Prefix meta-event to specify which MIDI channel the description applies to, or the channel may be specified as text in the event itself.

FF 05 len text Lyric

A lyric to be sung. Generally, each syllable will be a separate lyric event which begins at the event’s time.

FF 06 len text Marker

Normally in a format 0 track, or the first track in a format 1 file. The name of that point in the sequence, such as a rehearsal letter or section name (“First Verse”, etc.).

FF 07 len text Cue Point

A description of something happening on a film or video screen or stage at that point in the musical score (“Car crashes into house”, “curtain opens”, “she slaps his face”, etc.)
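All of the text-style meta-events listed above share one layout, so a single helper can write them. The names `TEXT_TYPES`, `vlq`, and `text_event` are assumptions for this sketch, and it restricts text to ASCII, which the specification recommends for maximum interchange.

```python
# Type bytes of the text meta-events defined above.
TEXT_TYPES = {
    "text": 0x01,
    "copyright": 0x02,
    "track_name": 0x03,
    "instrument": 0x04,
    "lyric": 0x05,
    "marker": 0x06,
    "cue_point": 0x07,
}

def vlq(value: int) -> bytes:
    """Encode an integer as a variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

def text_event(kind: str, text: str) -> bytes:
    """Build FF <type> <len> <text> for one of the text meta-events."""
    data = text.encode("ascii")      # raises on non-ASCII characters
    return bytes([0xFF, TEXT_TYPES[kind]]) + vlq(len(data)) + data

print(text_event("track_name", "Piano"))
print(text_event("cue_point", "curtain opens"))
```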

FF 2F 00 End of Track

This event is not optional. It is included so that an exact ending point may be specified for the track, so that it has an exact length, which is necessary for tracks which are looped or concatenated.
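Because End of Track is mandatory, a track writer can append it unconditionally and then compute the chunk length. In this sketch `build_track` is a hypothetical helper, and it assumes the caller passes (delta, event_bytes) pairs that do not already include End of Track.

```python
import struct

def vlq(value: int) -> bytes:
    """Encode an integer as a variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

def build_track(events, end_delta: int = 0) -> bytes:
    """Serialize (delta, event_bytes) pairs into a complete MTrk chunk."""
    body = b"".join(vlq(delta) + event for delta, event in events)
    body += vlq(end_delta) + b"\xFF\x2F\x00"   # End of Track closes the chunk
    return b"MTrk" + struct.pack(">I", len(body)) + body

# Note On for middle C, then Note Off 96 ticks later.
chunk = build_track([(0, b"\x90\x3C\x40"), (96, b"\x80\x3C\x00")])
print(" ".join(f"{b:02X}" for b in chunk))
```

The `end_delta` parameter lets the track end later than its last event, which gives the track the exact length mentioned above.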

FF 51 03 tttttt Set Tempo, in microseconds per MIDI quarter-note

This event indicates a tempo change. Another way of putting “microseconds per quarter-note” is “24ths of a microsecond per MIDI clock”. Representing tempos as time per beat instead of beat per time allows absolutely exact long-term synchronization with a time-based sync protocol such as SMPTE time code or MIDI time code. The accuracy provided by this tempo resolution allows a four-minute piece at 120 beats per minute to be accurate within 500 usec at the end of the piece. Ideally, these events should only occur where MIDI clocks would be located. This convention is intended to guarantee, or at least increase the likelihood of, compatibility with other synchronization devices so that a time signature/tempo map stored in this format may easily be transferred to another device.
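Converting between beats per minute and the stored value is a matter of dividing 60,000,000 microseconds per minute. The helper names below are assumptions for this sketch, and `ticks_to_seconds` applies only to metrical (positive) division values.

```python
MICROSECONDS_PER_MINUTE = 60_000_000

def set_tempo_event(bpm: float) -> bytes:
    """Build FF 51 03 tttttt for a tempo given in beats per minute."""
    usec = round(MICROSECONDS_PER_MINUTE / bpm)
    if not 0 < usec <= 0xFFFFFF:
        raise ValueError("tempo does not fit in 24 bits")
    return b"\xFF\x51\x03" + usec.to_bytes(3, "big")

def tempo_to_bpm(payload: bytes) -> float:
    """Convert the 3-byte tempo payload back to beats per minute."""
    return MICROSECONDS_PER_MINUTE / int.from_bytes(payload[:3], "big")

def ticks_to_seconds(ticks: int, usec_per_quarter: int, division: int) -> float:
    """Duration of a tick count at a constant tempo."""
    return ticks * usec_per_quarter / (division * 1_000_000)

event = set_tempo_event(120)
print(" ".join(f"{b:02X}" for b in event))
print(tempo_to_bpm(event[3:]))
print(ticks_to_seconds(96, 500_000, 96))
```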

FF 54 05 hr mn se fr ff SMPTE Offset

This event, if present, designates the SMPTE time at which the track chunk is supposed to start. It should be present at the beginning of the track, that is, before any nonzero delta-times, and before any transmittable MIDI events. The hour must be encoded with the SMPTE format, just as it is in MIDI Time Code. In a format 1 file, the SMPTE Offset must be stored with the tempo map, and has no meaning in any of the other tracks. The ff field contains fractional frames, in 100ths of a frame, even in SMPTE-based tracks which specify a different frame subdivision for delta-times.
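The payload can be split into its five fields as follows. This sketch assumes the hour byte uses the MIDI Time Code layout 0rrhhhhh, where rr selects the frame rate and hhhhh is the hour; `FRAME_RATES` and `parse_smpte_offset` are names invented here, so check them against the MIDI Time Code specification before relying on them.

```python
# Frame-rate codes carried in bits 5-6 of the hour byte (MTC convention).
FRAME_RATES = {0: "24", 1: "25", 2: "30 drop", 3: "30"}

def parse_smpte_offset(payload: bytes):
    """Split hr mn se fr ff into (rate, hours, minutes, seconds, frames, hundredths)."""
    hr, mn, se, fr, ff = payload[:5]
    rate = FRAME_RATES[(hr >> 5) & 0x03]
    hours = hr & 0x1F
    return rate, hours, mn, se, fr, ff

# 25 frames per second, 01:00:10, frame 12, plus 50/100 of a frame.
print(parse_smpte_offset(bytes([0x21, 0x00, 0x0A, 0x0C, 0x32])))
```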

FF 58 04 nn dd cc bb Time Signature

The time signature is expressed as four numbers. nn and dd represent the numerator and denominator of the time signature as it would be notated. The denominator is a negative power of two: 2 represents a quarter-note, 3 represents an eighth-note, etc. The cc parameter expresses the number of MIDI clocks in a metronome click. The bb parameter expresses the number of notated 32nd-notes in a MIDI quarter-note (24 MIDI Clocks). This was added because there are already multiple programs which allow the user to specify that what MIDI thinks of as a quarter-note (24 clocks) is to be notated as, or related to in terms of, something else.

Therefore, the complete event for 6/8 time, where the metronome clicks every three eighth-notes, but there are 24 clocks per quarter-note, 72 to the bar, would be (in hex):

        FF 58 04 06 03 24 08

That is, 6/8 time (8 is 2 to the 3rd power, so this is 06 03), 36 MIDI clocks per dotted-quarter (24 hex!), and eight notated 32nd-notes per MIDI quarter note.
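The 6/8 example can be checked with a small encoder and decoder. The function and parameter names are assumptions for this sketch; the defaults of 24 clocks per click and 8 notated 32nd-notes per quarter-note are illustrative choices, not values required by the specification.

```python
def time_signature_event(numerator: int, denominator: int,
                         clocks_per_click: int = 24,
                         thirty_seconds_per_quarter: int = 8) -> bytes:
    """Build FF 58 04 nn dd cc bb; dd is stored as a power of two."""
    if denominator < 1 or denominator & (denominator - 1):
        raise ValueError("denominator must be a power of two")
    power = denominator.bit_length() - 1       # 4 -> 2, 8 -> 3
    return bytes([0xFF, 0x58, 0x04, numerator, power,
                  clocks_per_click, thirty_seconds_per_quarter])

def parse_time_signature(payload: bytes):
    """Return (numerator, denominator, clocks_per_click, 32nds_per_quarter)."""
    nn, dd, cc, bb = payload[:4]
    return nn, 1 << dd, cc, bb

# 6/8 with a click every dotted quarter: 24 clocks * 1.5 = 36 = 24 hex.
event = time_signature_event(6, 8, clocks_per_click=36)
print(" ".join(f"{b:02X}" for b in event))
print(parse_time_signature(event[3:]))
```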

FF 59 02 sf mi  Key Signature

        sf = -7:  7 flats
        sf = -1:  1 flat
        sf = 0:  key of C
        sf = 1:  1 sharp
        sf = 7: 7 sharps

        mi = 0:  major key
        mi = 1:  minor key
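Decoding the two bytes mainly requires reading sf as a signed value. The key-name tables in this sketch follow the conventional circle of fifths and are not part of the specification; `parse_key_signature` is a hypothetical name.

```python
# Key names indexed by sf + 7, from 7 flats to 7 sharps (assumed tables).
MAJOR_KEYS = ["Cb", "Gb", "Db", "Ab", "Eb", "Bb", "F", "C",
              "G", "D", "A", "E", "B", "F#", "C#"]
MINOR_KEYS = ["Ab", "Eb", "Bb", "F", "C", "G", "D", "A",
              "E", "B", "F#", "C#", "G#", "D#", "A#"]

def parse_key_signature(payload: bytes):
    """Decode sf mi into (sf, key name)."""
    sf = int.from_bytes(payload[0:1], "big", signed=True)  # flats are negative
    mi = payload[1]
    if not -7 <= sf <= 7 or mi not in (0, 1):
        raise ValueError("invalid key signature")
    names = MINOR_KEYS if mi else MAJOR_KEYS
    mode = "minor" if mi else "major"
    return sf, f"{names[sf + 7]} {mode}"

print(parse_key_signature(b"\x00\x00"))
print(parse_key_signature(b"\xFD\x01"))
```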

FF 7F len data  Sequencer-Specific Meta-Event

Special requirements for particular sequencers may use this event type: the first byte or bytes of data is a manufacturer ID. However, as this is an interchange format, growth of the spec proper is preferred to use of this event type. This type of event may be used by a sequencer which elects to use this as its only file format; sequencers with their established feature-specific formats should probably stick to the standard features when using this format.

To be continued…