Library and tools for working with MP4 files containing video, audio, subtitles, or metadata. The focus is on fragmented files. Includes mp4ff-info, mp4ff-encrypt, mp4ff-decrypt and other tools.

dev.to/video/mp4ff-beyond-mp4-boxes-2bee

Module mp4ff implements MP4 media file parsing and writing for AVC and HEVC video, AAC and AC-3 audio, stpp and wvtt subtitles, and timed metadata tracks. It is focused on fragmented files as used for streaming in MPEG-DASH, MSS and HLS fMP4, but can also decode and encode all boxes needed for progressive MP4 files.

Command Line Tools

Some useful command line tools are available in the cmd directory.

mp4ff-info prints a tree of the box hierarchy of an mp4 file with information about the boxes.
mp4ff-pslister extracts and displays SPS and PPS for AVC or HEVC in an mp4 or a bytestream (Annex B) file. Partial information is printed for HEVC.
mp4ff-nallister lists NALUs and picture types for video in a progressive or fragmented file
mp4ff-subslister lists details of wvtt or stpp (WebVTT or TTML in ISOBMFF) subtitle samples
mp4ff-crop crops a progressive mp4 file to a specified duration
mp4ff-encrypt encrypts a fragmented file using cenc or cbcs Common Encryption scheme
mp4ff-decrypt decrypts a fragmented file encrypted using cenc or cbcs Common Encryption scheme

You can install these tools by going to their respective directory and running go install . or directly from the repo with

go install github.com/Eyevinn/mp4ff/cmd/mp4ff-info@latest
go install github.com/Eyevinn/mp4ff/cmd/mp4ff-encrypt@latest
...

for each individual tool.

Open Source Cloud

You can also run the tools as a job in Eyevinn Open Source Cloud. Here is an example using the mp4ff-crop command and the Open Source Cloud CLI.

% export OSC_ACCESS_TOKEN=<your-personal-access-token>
% npx -y @osaas/cli@latest create eyevinn-mp4ff test \
  -o awsAccessKeyId=<s3-access-key-id> \
  -o awsSecretAccessKey=<s3-secret-key> \
  -o s3EndpointUrl=https://eyevinnlab-birme.minio-minio.auto.prod.osaas.io \
  -o cmdLineArgs="mp4ff-crop s3://input/VINN.mp4 s3://output/VINN-crop2.mp4"

The file VINN.mp4 on the bucket called "input" on the MinIO server at https://eyevinnlab-birme.minio-minio.auto.prod.osaas.io is processed and the output is uploaded to the bucket "output" on the same MinIO server.

Example code

Example code for some common use cases is available in the examples directory. The examples and their functions are:

initcreator creates typical init segments (ftyp + moov) for different video and audio codecs
resegmenter reads a segmented file (CMAF track) and resegments it with other segment durations using FullSample
segmenter takes a progressive mp4 file and creates init and media segments from it. This tool has been extended to support generation of segments with multiple tracks as well as reading and writing mdat in lazy mode
multitrack parses a fragmented file with multiple tracks
combine-segs combines single-track init and media segments into multi-track segments
add-sidx adds a top-level sidx box describing the segments of a fragmented file.

Packages

The top-level packages in the mp4ff module are

mp4 provides support for parsing (called Decode) and writing (Encode) a plethora of mp4 boxes. It also contains helper functions for extracting, encrypting, and decrypting samples, and a lot more.
avc deals with AVC (aka H.264) video in the mp4ff/avc package including parsing of SPS and PPS, and finding start-codes in Annex B byte streams.
hevc provides structures and functions for dealing with HEVC video and its packaging.
vvc provides structures and functions for dealing with VVC video and its packaging.
sei provides support for handling Supplemental Enhancement Information (SEI) such as timestamps for AVC and HEVC video.
av1 provides basic support for AV1 video packaging
aac provides support for AAC audio. This includes handling ADTS headers, which are common for AAC inside MPEG-2 TS streams.
bits provides bit-wise and byte-wise readers and writers used by the other packages.

Structure and usage

mp4.File and its composition

The top level structure for both non-fragmented and fragmented mp4 files is mp4.File.

In a progressive (non-fragmented) mp4.File, the top-level attributes Ftyp, Moov, and Mdat point to the corresponding boxes.

A fragmented mp4.File can be more or less complete, like a single init segment, one or more media segments, or a combination of both, like a CMAF track which renders into a playable one-track asset. It can also have multiple tracks. For fragmented files, the following high-level attributes are used:

Init contains a ftyp and a moov box and provides the general metadata for a fragmented file. It corresponds to a CMAF header. It can also contain one or more sidx boxes.
Segments is a slice of MediaSegment which start with an optional styp box, possibly one or more sidx boxes and then one or more Fragments.
Fragment is a mp4 fragment with exactly one moof box followed by a mdat box where the latter contains the media data. It can have one or more trun boxes containing the metadata for the samples. The fragment can start with one or more emsg boxes.

It should be noted that it is sometimes hard to decide what should belong to a Segment or Fragment.
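To make the layout above concrete, here is a minimal sketch that lists the top-level boxes of a byte slice using only the standard library. It relies on the ISOBMFF box header (32-bit size followed by a four-character type, with an optional 64-bit largesize). The names boxHeader, scanTopLevelBoxes and makeBox are made up for this illustration and are not part of mp4ff.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// boxHeader holds the four-character type and the total size of one box.
type boxHeader struct {
	Type string
	Size uint64
}

// scanTopLevelBoxes returns the header of every top-level box in data.
// It handles the 64-bit largesize form (size == 1), but treats
// size == 0 (box extends to end of file) as an error.
func scanTopLevelBoxes(data []byte) ([]boxHeader, error) {
	var boxes []boxHeader
	pos := 0
	for pos < len(data) {
		remaining := len(data) - pos
		if remaining < 8 {
			return nil, errors.New("truncated box header")
		}
		size := uint64(binary.BigEndian.Uint32(data[pos:]))
		boxType := string(data[pos+4 : pos+8])
		hdrLen := uint64(8)
		if size == 1 { // 64-bit largesize follows the type
			if remaining < 16 {
				return nil, errors.New("truncated largesize header")
			}
			size = binary.BigEndian.Uint64(data[pos+8:])
			hdrLen = 16
		}
		if size < hdrLen || size > uint64(remaining) {
			return nil, fmt.Errorf("bad size %d for box %q", size, boxType)
		}
		boxes = append(boxes, boxHeader{Type: boxType, Size: size})
		pos += int(size)
	}
	return boxes, nil
}

// makeBox builds a box with a 32-bit size field and the given payload.
func makeBox(boxType string, payload []byte) []byte {
	b := make([]byte, 8, 8+len(payload))
	binary.BigEndian.PutUint32(b[0:4], uint32(8+len(payload)))
	copy(b[4:8], boxType)
	return append(b, payload...)
}

func main() {
	// Synthetic media segment: styp, then one fragment (moof + mdat).
	var seg []byte
	seg = append(seg, makeBox("styp", make([]byte, 4))...)
	seg = append(seg, makeBox("moof", make([]byte, 16))...)
	seg = append(seg, makeBox("mdat", make([]byte, 100))...)

	boxes, err := scanTopLevelBoxes(seg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, b := range boxes {
		fmt.Printf("%s %d\n", b.Type, b.Size)
	}
}
```

A real parser would go on to group these boxes into segments and fragments, which is where the ambiguity mentioned above shows up.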

All child boxes of container boxes such as MoovBox are listed in the Children attribute, but the most prominent child boxes have direct links with names which makes it possible to write a path such as

fragment.Moof.Traf.Trun

to access the (only) trun box in a fragment with only one traf box, or

fragment.Moof.Trafs[1].Truns[1]

to get the second trun of the second traf box (provided that they exist). Care must be taken to assert that none of the intermediate pointers are nil to avoid panic.

Creating new fragmented files

A typical use case is to generate a fragmented file consisting of an init segmentfollowed by a series of media segments.

The first step is to create the init segment. This is done in three steps as can be seen in examples/initcreator:

init := mp4.CreateEmptyInit()
init.AddEmptyTrack(timescale, mediatype, language)
init.Moov.Trak.SetHEVCDescriptor("hvc1", vpsNALUs, spsNALUs, ppsNALUs)

Here the third step fills in codec-specific parameters into the sample descriptor of the single track. Multiple tracks are also available via the slice attribute Traks instead of Trak.

The second step is to start producing media segments. They should use the timescale that was set when creating the init segment. Generally, that timescale should be chosen so that the sample durations have exact values without rounding errors, e.g. 48000 for 48kHz audio.
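The effect of the timescale choice can be checked with a little integer arithmetic. The sketch below is an illustration only: sampleDur is a hypothetical helper, and the example values (an AAC frame of 1024 audio samples at 48 kHz, and video at 30000/1001 frames per second) are assumptions chosen for the demonstration.

```go
package main

import "fmt"

// sampleDur converts a sample duration of num/den seconds into ticks of the
// given timescale. exact is false when the division leaves a remainder,
// meaning the stored duration would carry a rounding error.
func sampleDur(timescale, num, den uint64) (dur uint64, exact bool) {
	ticks := timescale * num
	return ticks / den, ticks%den == 0
}

func main() {
	cases := []struct {
		name      string
		timescale uint64
		num, den  uint64 // sample duration in seconds as num/den
	}{
		{"AAC frame (1024 samples at 48 kHz)", 48000, 1024, 48000},
		{"AAC frame (1024 samples at 48 kHz)", 1000, 1024, 48000},
		{"video frame at 30000/1001 fps", 30000, 1001, 30000},
		{"video frame at 30000/1001 fps", 1000, 1001, 30000},
	}
	for _, c := range cases {
		dur, exact := sampleDur(c.timescale, c.num, c.den)
		fmt.Printf("%s, timescale %d: dur=%d exact=%t\n", c.name, c.timescale, dur, exact)
	}
}
```

With a timescale of 1000, both cases leave a remainder, so the error would accumulate over a long track. The timescales 48000 and 30000 give exact durations.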

A media segment contains one or more fragments, where each fragment has a moof and a mdat box. If all samples are available before the segment is created, one can use a single fragment in each segment. Example code for this can be found in examples/segmenter. For low-latency MPEG-DASH generation, short-duration fragments are added to the segment as the corresponding media samples become available.

A simple, but not optimal, way of creating a media segment is to first create a slice of FullSample with the data needed. The definition of mp4.FullSample is

mp4.FullSample{
    Sample: mp4.Sample{
        Flags uint32 // Flag sync sample etc
        Dur   uint32 // Sample duration in mdhd timescale
        Size  uint32 // Size of sample data
        Cto   int32  // Signed composition time offset
    },
    DecodeTime uint64 // Absolute decode time (offset + accumulated sample Dur)
    Data       []byte // Sample data
}

The mp4.Sample part is what will be written into the trun box. DecodeTime is the media timeline accumulated time. The DecodeTime value of the first sample of a fragment will be set as the BaseMediaDecodeTime in the tfdt box.
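The relation between Dur and DecodeTime can be sketched with local stand-in types. The types sample and fullSample and the helper addDecodeTimes below are hypothetical and only mimic the fields described above. They are not the mp4ff types.

```go
package main

import "fmt"

// sample and fullSample are simplified stand-ins for the structures above.
type sample struct {
	Dur  uint32 // duration in track timescale ticks
	Size uint32 // size of the sample data in bytes
}

type fullSample struct {
	sample
	DecodeTime uint64 // absolute decode time in timescale ticks
}

// addDecodeTimes assigns each sample the start time plus the accumulated
// duration of all samples before it.
func addDecodeTimes(startTime uint64, samples []sample) []fullSample {
	out := make([]fullSample, 0, len(samples))
	t := startTime
	for _, s := range samples {
		out = append(out, fullSample{sample: s, DecodeTime: t})
		t += uint64(s.Dur)
	}
	return out
}

func main() {
	// Three audio frames of 1024 ticks each, in a fragment starting at tick 96000.
	in := []sample{{Dur: 1024, Size: 300}, {Dur: 1024, Size: 310}, {Dur: 1024, Size: 290}}
	full := addDecodeTimes(96000, in)
	for i, fs := range full {
		fmt.Printf("sample %d: decodeTime=%d dur=%d\n", i+1, fs.DecodeTime, fs.Dur)
	}
	// The first sample's decode time is what would go into the tfdt box.
	fmt.Println("base media decode time:", full[0].DecodeTime)
}
```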

Once a number of such full samples are available, they can be added to a media segment like

seg := mp4.NewMediaSegment()
frag := mp4.CreateFragment(uint32(segNr), mp4.DefaultTrakID)
seg.AddFragment(frag)
for _, sample := range samples {
    frag.AddFullSample(sample)
}

This segment can finally be output to a w io.Writer as

err := seg.Encode(w)

or to a sw bits.SliceWriter as

err := seg.EncodeSW(sw)

For multi-track segments, the code is a bit more involved. Please have a look at examples/segmenter to see how it is done. A more efficient way of handling media samples is to process them lazily, or using intervals, as explained next.

Lazy decoding and writing of mdat data

For video and audio, the dominating part of a mp4 file is the media data which is stored in one or more mdat boxes. In some cases, for example when segmenting large progressive files, it is much more memory efficient to just read the movie or fragment metadata from the moov or moof box and defer the reading of the media data from the mdat box to later.

For decoding, this is supported by running mp4.DecodeFile() in lazy mode as

parsedMp4, err = mp4.DecodeFile(ifd, mp4.WithDecodeMode(mp4.DecModeLazyMdat))

In this case, the media data of the mdat box will not be read; only its size is saved. To read or copy the actual data corresponding to a sample, one must calculate the corresponding byte range and call either

func (m *MdatBox) ReadData(start, size int64, rs io.ReadSeeker) ([]byte, error)

or

func (m *MdatBox) CopyData(start, size int64, rs io.ReadSeeker, w io.Writer) (nrWritten int64, err error)

Example code for this, including lazy writing of mdat, can be found in examples/segmenter with the lazy mode set.

More efficient I/O using SliceReader and SliceWriter

The use of the interfaces io.Reader and io.Writer for reading and writing boxes gives a lot of flexibility, but is not optimal when it comes to memory allocation. In particular, the Read(p []byte) method needs a slice p of the proper size to read data, which leads to a lot of allocations and copying of data. In order to achieve better performance, it is advantageous to read the full top level boxes into one, or a few, slices and decode these.

To enable that mode, version 0.27 of the code introduced Decode<X>SR(sr bits.SliceReader) methods to every box <X> where mp4ff.bits.SliceReader is an interface. For example, the TrunBox gets the method DecodeTrunSR(sr bits.SliceReader) in addition to its old DecodeTrun(r io.Reader) method. The bits.SliceReader interface provides methods to read all kinds of data structures from an underlying slice of bytes. It has an implementation bits.FixedSliceReader which uses a fixed-size slice as underlying slice, but one could consider implementing a growing version which would get its data from some external source.

The memory allocation and speed improvements achieved by this may vary, but should be substantial, especially compared to versions before 0.27 which used an extra io.LimitReader layer.

For further reduction of memory allocation, use a buffered top-level reader, especially when reading the mdat box of a progressive file.

Benchmarks

To investigate the efficiency of the new SliceReader and SliceWriter methods, benchmarks have been done. The benchmarks are defined in the files mp4/benchmarks_test.go and mp4/benchmarks_srw_test.go. For DecodeFile, one can see a big improvement by going from version 0.26 to version 0.27, which both use the io.Reader interface, but another big increase by using the SliceReader source. The latter benchmarks are called BenchmarkDecodeFileSR but have here been given the same name, for easy comparison. Note that the allocations here refer to the heap allocations that are done inside the benchmark loop. Outside that loop, a slice is allocated to keep the input data.

For EncodeFile, one can see that v0.27 is actually worse than v0.26 when used with the io.Writer interface. That is because the code was restructured so that all writes go via the SliceWriter layer in order to reduce code duplication. However, if instead using the SliceWriter methods directly, there is a big relative gain in allocations as can be seen in the last column.

name \ time/op	v0.26	v0.27	v0.27-srw
DecodeFile/1.m4s-16	21.9µs	6.7µs	2.6µs
DecodeFile/prog_8s.mp4-16	143µs	48µs	16µs
EncodeFile/1.m4s-16	1.70µs	2.14µs	1.50µs
EncodeFile/prog_8s.mp4-16	15.7µs	18.4µs	12.9µs

name \ alloc/op	v0.26	v0.27	v0.27-srw
DecodeFile/1.m4s-16	120kB	28kB	2kB
DecodeFile/prog_8s.mp4-16	906kB	207kB	12kB
EncodeFile/1.m4s-16	1.16kB	1.39kB	0.08kB
EncodeFile/prog_8s.mp4-16	6.84kB	8.30kB	0.05kB

name \ allocs/op	v0.26	v0.27	v0.27-srw
DecodeFile/1.m4s-16	98.0	42.0	34.0
DecodeFile/prog_8s.mp4-16	454	180	169
EncodeFile/1.m4s-16	15.0	15.0	3.0
EncodeFile/prog_8s.mp4-16	101	86	1

More about mp4 boxes

The mp4ff.mp4 package contains a lot of box implementations.

Box structure and interface

Most boxes have their own file named after the box, but in some cases, there may be multiple boxes that have the same content, and the code file then has a generic name like mp4/visualsampleentry.go.

There is an interface for boxes, Box, specified in mp4/box.go.

The interface defines common Box methods including encode (writing), but not the decode (parsing) methods which have distinct names for each box type and are dispatched from the parsed box name.

That dispatch based on box name is defined by the tables mp4.decodersSR and mp4.decoders for the functions mp4.DecodeBoxSR() and mp4.DecodeBox(), respectively. The SR variant should normally be used for better performance. If a box name is unknown, it will result in an UnknownBox being created.
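The dispatch pattern can be sketched with a map from box name to decode function and a fallback for unknown names. Everything in the sketch below is a simplified stand-in: the box interface, the made-up cntr box, and the decodeFunc signature are assumptions for illustration and differ from the real mp4ff tables.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// box is a minimal stand-in for a box interface.
type box interface{ Type() string }

// counterBox is a made-up box type holding one 32-bit counter.
type counterBox struct{ Count uint32 }

func (b *counterBox) Type() string { return "cntr" }

// unknownBox keeps the raw payload of a box without a registered decoder.
type unknownBox struct {
	name    string
	payload []byte
}

func (b *unknownBox) Type() string { return b.name }

// decodeFunc decodes the payload that follows the box header.
type decodeFunc func(payload []byte) (box, error)

func decodeCounter(payload []byte) (box, error) {
	if len(payload) != 4 {
		return nil, errors.New("cntr: payload must be 4 bytes")
	}
	return &counterBox{Count: binary.BigEndian.Uint32(payload)}, nil
}

// decoders maps box names to their decode functions.
var decoders = map[string]decodeFunc{
	"cntr": decodeCounter,
}

// decodeBox looks up the decoder by name and falls back to unknownBox.
func decodeBox(name string, payload []byte) (box, error) {
	if dec, ok := decoders[name]; ok {
		return dec(payload)
	}
	return &unknownBox{name: name, payload: payload}, nil
}

func main() {
	b1, _ := decodeBox("cntr", []byte{0, 0, 0, 5})
	b2, _ := decodeBox("zzzz", []byte{1, 2, 3})
	fmt.Printf("%s -> %T\n", b1.Type(), b1)
	fmt.Printf("%s -> %T\n", b2.Type(), b2)
}
```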

How to implement a new box

To implement a new box fooo, the following is needed.

Create a file fooo.go and create a struct type FoooBox.

FoooBox must implement the Box interface methods:

Type()
Size()
Encode(w io.Writer)
EncodeSW(sw bits.SliceWriter)
Info()

It also needs its own decode methods DecodeFoooSR and DecodeFooo, which must be added in the decodersSR map and decoders map, respectively. For a simple example, look at the PrftBox in prft.go.

A test file fooo_test.go should also have a test using the method boxDiffAfterEncodeAndDecode to check that the box information is equal after encoding and decoding.

Direct changes of attributes

Many attributes are public and can therefore be changed freely. The advantage of this is that it is possible to write code that can manipulate boxes in many different ways, but one must be cautious to avoid breaking links to sub boxes or creating inconsistent states in the boxes.

As an example, container boxes such as TrafBox have a method AddChild which adds a box to Children, its slice of children boxes, but also sets a specific member reference such as Tfdt to point to that box. If Children is manipulated directly, that link may no longer be valid.
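The pitfall can be shown with a stripped-down container. The types below are local stand-ins that only mimic the described behaviour of AddChild. They are not the mp4ff implementation.

```go
package main

import "fmt"

// Box is a minimal stand-in for a box interface.
type Box interface{ Type() string }

type TfdtBox struct{ BaseMediaDecodeTime uint64 }

func (*TfdtBox) Type() string { return "tfdt" }

// TrafBox keeps both the generic child list and a direct link to tfdt.
type TrafBox struct {
	Tfdt     *TfdtBox
	Children []Box
}

// AddChild appends the child and updates the direct link when applicable.
func (t *TrafBox) AddChild(b Box) {
	if tfdt, ok := b.(*TfdtBox); ok {
		t.Tfdt = tfdt
	}
	t.Children = append(t.Children, b)
}

func main() {
	good := &TrafBox{}
	good.AddChild(&TfdtBox{BaseMediaDecodeTime: 9000})
	fmt.Println("via AddChild, Tfdt set:", good.Tfdt != nil)

	// Appending to Children directly bypasses the link update.
	bad := &TrafBox{}
	bad.Children = append(bad.Children, &TfdtBox{BaseMediaDecodeTime: 9000})
	fmt.Println("direct append, Tfdt set:", bad.Tfdt != nil)
}
```

In the second case, code that follows the Tfdt link would see nil even though a tfdt box is present among the children.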

Encoding modes and optimizations

For fragmented files, one can choose to either encode all boxes in a mp4.File, or only encode the ones which are included in the init and media segments. The attribute that controls that is called FragEncMode. Another attribute, EncOptimize, controls possible optimizations of the file encoding process. Currently, there is only one possible optimization called OptimizeTrun. It can reduce the size of the TrunBox by finding and writing default values in the TfhdBox and omitting the corresponding values from the TrunBox. Note that this may change the size of all ancestor boxes of trun.
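The principle behind such a trun optimization can be sketched for a single field. The code below only looks at sample durations and only estimates the bytes spent on them (a 32-bit value per sample in trun versus one 32-bit default in tfhd). It is an illustration under those assumptions, with hypothetical helper names, and says nothing about how OptimizeTrun is actually implemented.

```go
package main

import "fmt"

// commonDur returns the shared duration if all samples have the same one.
func commonDur(durs []uint32) (uint32, bool) {
	if len(durs) == 0 {
		return 0, false
	}
	for _, d := range durs[1:] {
		if d != durs[0] {
			return 0, false
		}
	}
	return durs[0], true
}

// durationBytes returns the bytes needed for durations when every sample
// carries its own 32-bit value (plain), and when a common value is stored
// once as a default instead (optimized).
func durationBytes(durs []uint32) (plain, optimized int) {
	plain = 4 * len(durs)
	if _, ok := commonDur(durs); ok {
		return plain, 4
	}
	return plain, plain
}

func main() {
	constant := make([]uint32, 50)
	for i := range constant {
		constant[i] = 1024
	}
	varying := []uint32{1001, 1001, 2002}

	p, o := durationBytes(constant)
	fmt.Printf("constant durations: %d -> %d bytes\n", p, o)
	p, o = durationBytes(varying)
	fmt.Printf("varying durations:  %d -> %d bytes\n", p, o)
}
```

Since the trun box shrinks, the sizes of the enclosing traf and moof boxes shrink with it, which is why the ancestor sizes change.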

Sample Number Offset

Following the ISOBMFF standard, sample numbers and other numbers start at 1 (one-based).This applies to arguments of functions and methods.The actual storage in slices is zero-based, so sample nr 1 has index 0 in the corresponding slice.
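A small sketch of that convention, with a hypothetical helper sampleByNr and a local sample type that are not part of mp4ff:

```go
package main

import "fmt"

type sample struct{ Size uint32 }

// sampleByNr returns the sample with the given one-based number.
// Sample number 0 and numbers beyond the slice length are rejected.
func sampleByNr(samples []sample, nr uint32) (sample, error) {
	if nr == 0 || uint64(nr) > uint64(len(samples)) {
		return sample{}, fmt.Errorf("sample number %d out of range 1..%d", nr, len(samples))
	}
	return samples[nr-1], nil // number 1 lives at index 0
}

func main() {
	samples := []sample{{Size: 100}, {Size: 200}, {Size: 300}}
	if s, err := sampleByNr(samples, 1); err == nil {
		fmt.Println("sample 1 size:", s.Size)
	}
	if _, err := sampleByNr(samples, 0); err != nil {
		fmt.Println("error:", err)
	}
}
```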

Contributing

When contributing to this project, please ensure that commit messages follow the Conventional Commits specification. This helps maintain a consistent and readable commit history.

Examples of conventional commit messages:

feat: add support for VVC video codec
fix: resolve memory leak in fragment processing
docs: update API documentation for mp4.File
chore: update dependencies to latest versions

Stability

The APIs should be fairly stable, but minor non-backwards-compatible changes may happen until version 1.

Specifications

The main specification for the MP4 file format is the ISO Base Media File Format (ISOBMFF) standard ISO/IEC 14496-12 7th edition 2021. Some boxes are specified in other standards, as should be commented in the code.

LICENSE

MIT, see LICENSE.

ChangeLog and Versions

See CHANGELOG.md.

Support

Join our community on Slack where you can post any questions regarding any of our open source projects. Eyevinn's consulting business can also offer you:

Further development of this component
Customization and integration of this component into your platform
Support and maintenance agreement

Contact sales@eyevinn.se if you are interested.

About Eyevinn Technology

Eyevinn Technology is an independent consultant firm specialized in video and streaming. Independent in the sense that we are not commercially tied to any platform or technology vendor. As our way to innovate and push the industry forward, we develop proof-of-concepts and tools. The things we learn and the code we write we share with the industry in blogs and by open sourcing the code we have written.

Want to know more about Eyevinn and how it is to work here? Contact us at work@eyevinn.se!
