Data, Video and Image Compression
ABSTRACT:
Data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use through use of specific encoding schemes. For example, this article could be encoded with fewer bits if one were to accept the convention that the word "compression" be encoded as "comp." One popular instance of compression with which many computer users are familiar is the ZIP file format, which, as well as providing compression, acts as an archiver, storing many source files in a single destination output file.
As with any communication, compressed data communication only works when both the sender and receiver of the information understand the encoding scheme. For example, this text makes sense only if the receiver understands that it is intended to be interpreted as characters representing the English language. Similarly, compressed data can only be understood if the decoding method is known by the receiver.
Compression is useful because it helps reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. On the downside, compressed data must be decompressed to be used, and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed (the option of decompressing the video in full before watching it may be inconvenient, and requires storage space for the decompressed video). The design of data compression schemes therefore involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and decompress the data.
Lossless vs. Lossy Compression
Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender's data more concisely without error. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.
Another kind of compression, called lossy data compression or perceptual coding, is possible if some loss of fidelity is acceptable. Generally, lossy data compression will be guided by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG image compression works in part by "rounding off" some of this less-important information. Lossy data compression provides a way to obtain the best fidelity for a given amount of compression. In some cases, transparent (unnoticeable) compression is desired; in other cases, fidelity is sacrificed to reduce the amount of data as much as possible.
Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.
However, lossless data compression algorithms will always fail to compress some files; indeed, any compression algorithm will necessarily fail to compress any data containing no discernible patterns. Attempts to compress data that has been compressed already will therefore usually result in an expansion, as will attempts to compress encrypted data.
In practice, lossy data compression will also come to a point where compressing again does not work, although an extremely lossy algorithm, which for example always removes the last byte of a file, will always compress a file up to the point where it is empty.
An example of lossless vs. lossy compression is the following string:
25.888888888
This string can be compressed as:
25.[9]8
Interpreted as, "twenty five point 9 eights", the original string is perfectly recreated, just written in a smaller form. In a lossy system, using
26
instead, the original data is lost, but with the benefit of a smaller file size.
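As a rough illustration of the lossless side of this example, a few lines of Python can implement the bracketed-count convention used above (the rle_encode/rle_decode names and the [count]digit notation are purely illustrative choices, not part of any standard):

```python
import re

def rle_encode(s: str) -> str:
    """Collapse runs of 3 or more identical characters into [count]char."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        run = j - i
        out.append(f"[{run}]{s[i]}" if run >= 3 else s[i] * run)
        i = j
    return "".join(out)

def rle_decode(s: str) -> str:
    """Expand [count]char tokens back into the original run of characters."""
    return re.sub(r"\[(\d+)\](.)", lambda m: m.group(2) * int(m.group(1)), s)

print(rle_encode("25.888888888"))   # -> 25.[9]8
print(rle_decode("25.[9]8"))        # -> 25.888888888 (exact round trip)
```

The round trip reproduces the original string exactly, which is what distinguishes the lossless encoding from the lossy replacement with 26.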
Video compression
Video compression refers to reducing the quantity of data used to represent video images and is a straightforward combination of image compression and motion compensation. This article deals with its applications: compressed video can effectively reduce the bandwidth required to transmit digital video via terrestrial broadcast, via cable, or via satellite services.
Most video compression is lossy, i.e. it operates on the premise that much of the data present before compression is not necessary for achieving good perceptual quality. For example, DVDs use a video coding standard called MPEG-2 that can compress ~2 hours of video data by 15 to 30 times while still producing a picture quality that is generally considered high quality for standard-definition video. Video compression, like data compression, is a tradeoff between disk space, video quality and the cost of hardware required to decompress the video in a reasonable time. However, if the video is overcompressed in a lossy manner, visible (and sometimes distracting) artifacts can appear.
Method
Video compression typically operates on square-shaped groups of neighboring pixels, often called macroblocks. These pixel groups or blocks of pixels are compared from one frame to the next and the video compression codec (encode-decode scheme) sends only the differences within those blocks. This works extremely well if the video has no motion. A still frame of text, for example, can be repeated with very little transmitted data. In areas of video with more motion, more pixels change from one frame to the next. When more pixels change, the video compression scheme must send more data to keep up with the larger number of pixels that are changing. If the video content includes an explosion, flames, a flock of thousands of birds, or any other image with a great deal of high frequency detail, the quality will decrease.
Video is basically a three-dimensional array of color pixels. Two dimensions serve as spatial (horizontal and vertical) directions of the moving pictures, and one dimension represents the time domain. A data frame is a set of all pixels that correspond to a single time moment. Basically, a frame is the same as a still picture.
Video data contains spatial and temporal redundancy. Similarities can thus be encoded by merely registering differences within a frame (spatial) and/or between frames (temporal). Spatial encoding is performed by taking advantage of the fact that the human eye is unable to distinguish small differences in color as easily as it can changes in brightness, so very similar areas of color can be "averaged out" in a similar way to JPEG images (see the JPEG image compression FAQ, part 1). With temporal compression, only the changes from one frame to the next are encoded, since a large number of pixels are often identical across a series of frames.
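A minimal sketch of that temporal idea, assuming frames are simply NumPy arrays of pixel values and using hypothetical encode_delta/decode_delta helpers; real codecs do considerably more, but the principle of sending only the changed pixels is the same:

```python
import numpy as np

def encode_delta(prev_frame: np.ndarray, cur_frame: np.ndarray, threshold: int = 0):
    """Return only the pixels that changed between two consecutive frames."""
    diff = cur_frame.astype(np.int16) - prev_frame.astype(np.int16)
    changed = np.abs(diff) > threshold        # mask of pixels worth transmitting
    coords = np.argwhere(changed)             # positions of the changed pixels
    values = cur_frame[changed]               # their new values
    return coords, values

def decode_delta(prev_frame: np.ndarray, coords: np.ndarray, values: np.ndarray):
    """Rebuild the current frame from the previous frame plus the changes."""
    cur = prev_frame.copy()
    cur[tuple(coords.T)] = values
    return cur

prev = np.zeros((4, 4), dtype=np.uint8)
cur = prev.copy()
cur[1, 2] = 200                               # a single changed pixel
coords, values = encode_delta(prev, cur)
assert np.array_equal(decode_delta(prev, coords, values), cur)
```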
Some forms of data compression are lossless. This means that when the data is decompressed, the result is a bit-for-bit perfect match with the original. While lossless compression of video is possible, it is rarely used. This is because any lossless compression system will sometimes produce a file (or portions of one) that is as large as, or has the same data rate as, the uncompressed original. As a result, all hardware in a lossless system would have to run fast enough to handle uncompressed video as well, which eliminates much of the benefit of compressing the data in the first place. For example, digital videotape cannot vary its data rate easily, so dealing with short bursts of maximum-data-rate video would be more complicated than simply running at the maximum rate all the time.
At its most basic level, compression is performed when an input video stream is analyzed and information that is indiscernible to the viewer is discarded. Each event is then assigned a code: commonly occurring events are assigned codes with few bits, and rare events are assigned codes with more bits. These steps are commonly called signal analysis, quantization and variable-length encoding, respectively. There are four common methods for compression: discrete cosine transform (DCT), vector quantization (VQ), fractal compression, and discrete wavelet transform (DWT).
Discrete cosine transform is a lossy compression algorithm that samples an image at regular intervals, analyzes the frequency components present in the sample, and discards those frequencies which do not affect the image as the human eye perceives it. DCT is the basis of standards such as JPEG, MPEG, H.261, and H.263. We covered the definition of both DCT and wavelets in our tutorial on Wavelets Theory.
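As a sketch of that frequency analysis and discarding step, one can apply SciPy's general-purpose 2-D DCT to an 8x8 block and zero out the small coefficients; the block contents and the threshold below are illustrative assumptions, not taken from any particular standard:

```python
import numpy as np
from scipy.fft import dctn, idctn

block = np.arange(64, dtype=float).reshape(8, 8)   # stand-in 8x8 pixel block

# Forward 2-D DCT: concentrates the block's energy into a few coefficients.
coeffs = dctn(block, norm="ortho")

# Crude "compression": discard the small high-frequency coefficients.
coeffs[np.abs(coeffs) < 1.0] = 0.0

# Inverse DCT reconstructs an approximation of the original block.
approx = idctn(coeffs, norm="ortho")
print(np.max(np.abs(approx - block)))              # small reconstruction error
```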
Vector quantization is a lossy compression that looks at an array of data, instead of individual values. It can then generalize what it sees, compressing redundant data, while at the same time retaining the desired object or data stream's original intent.
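A toy vector quantization sketch, using k-means from scikit-learn to build the codebook of representative vectors (the 4-sample block size and the 16-entry codebook are arbitrary assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vectors = rng.integers(0, 256, size=(1000, 4)).astype(float)  # 1000 blocks of 4 samples

kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(vectors)
codebook = kmeans.cluster_centers_       # 16 representative vectors
indices = kmeans.predict(vectors)        # each block stored as a small codebook index

reconstructed = codebook[indices]        # lossy: each block replaced by its nearest code vector
```

Each 4-sample block is then stored as a 4-bit index into the codebook rather than as 4 full sample values, at the cost of replacing it with the nearest representative vector.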
Intraframe vs interframe compression
One of the most powerful techniques for compressing video is interframe compression. Interframe compression uses one or more earlier or later frames in a sequence to compress the current frame, while intraframe compression uses only the current frame, which is effectively image compression.
The most commonly used method works by comparing each frame in the video with the previous one. If the frame contains areas where nothing has moved, the system simply issues a short command that copies that part of the previous frame, bit-for-bit, into the next one. If sections of the frame move in a simple manner, the compressor emits a (slightly longer) command that tells the decompressor to shift, rotate, lighten, or darken the copy -- a longer command, but still much shorter than intraframe compression. Interframe compression works well for programs that will simply be played back by the viewer, but can cause problems if the video sequence needs to be edited.
Since interframe compression copies data from one frame to another, if the original frame is simply cut out (or lost in transmission), the following frames cannot be reconstructed properly. Some video formats, such as DV, compress each frame independently using intraframe compression. Making 'cuts' in intraframe-compressed video is almost as easy as editing uncompressed video -- one finds the beginning and ending of each frame, and simply copies bit-for-bit each frame that one wants to keep, and discards the frames one doesn't want. Another difference between intraframe and interframe compression is that with intraframe systems, each frame uses a similar amount of data. In most interframe systems, certain frames (such as "I frames" in MPEG-2) aren't allowed to copy data from other frames, and so require much more data than other frames nearby.
It is possible to build a computer-based video editor that spots problems caused when I frames are edited out while other frames need them. This has allowed newer formats like HDV to be used for editing. However, this process demands a lot more computing power than editing intraframe compressed video with the same picture quality.
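A crude sketch of the block matching that underlies this kind of interframe prediction, assuming an exhaustive search over a small window (the block size, search range, and sum-of-absolute-differences cost are all illustrative choices):

```python
import numpy as np

def find_motion_vector(prev, cur, y, x, block=8, search=4):
    """Exhaustively search prev for the best match to the block of cur at (y, x)."""
    target = cur[y:y + block, x:x + block]
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue
            candidate = prev[yy:yy + block, xx:xx + block]
            err = int(np.sum(np.abs(candidate.astype(int) - target.astype(int))))
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err                     # motion vector and residual error

prev = np.zeros((32, 32), dtype=np.uint8)
cur = np.zeros((32, 32), dtype=np.uint8)
prev[8:16, 10:18] = 255                       # an 8x8 patch in the reference frame...
cur[10:18, 12:20] = 255                       # ...shifted by (2, 2) in the current frame
print(find_motion_vector(prev, cur, 10, 12))  # -> ((-2, -2), 0)
```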
Video codec
A video codec is a device or software that enables video compression and/or decompression for digital video. The compression usually employs lossy data compression. Historically, video was stored as an analog signal on magnetic tape. Around the time when the compact disc entered the market as a digital-format replacement for analog audio, it became feasible to also begin storing and using video in digital form, and a variety of such technologies began to emerge.
There is a complex balance between the video quality, the quantity of the data needed to represent it, also known as the bit rate, the complexity of the encoding and decoding algorithms, robustness to data losses and errors, ease of editing, random access, the state of the art of compression algorithm design, end-to-end delay, and a number of other factors.
Intra Frame Coding
The term intra frame coding refers to the fact that the various lossless and lossy compression techniques are performed relative to information that is contained only within the current frame, and not relative to any other frame in the video sequence. In other words, no temporal processing is performed outside of the current picture or frame. This mode will be described first because it is simpler, and because non-intra coding techniques are extensions to these basics. Figure 1 shows a block diagram of a basic video encoder for intra frames only. It turns out that this block diagram is very similar to that of a JPEG still image encoder, with only slight implementation detail differences.
The potential ramifications of this similarity will be discussed later. The basic processing blocks shown are the video filter, discrete cosine transform, DCT coefficient quantizer, and run-length amplitude/variable length coder. These blocks are described individually in the sections below or have already been described in JPEG Compression.
A basic intra frame coding scheme is as follows:
· Macroblocks are 16x16 pixel areas on the Y plane of the original image.
· A macroblock usually consists of 4 Y blocks, 1 Cr block, and 1 Cb block.
In the example HDTV data rate calculation shown previously, the pixels were represented as 8-bit values for each of the primary colors red, green, and blue. It turns out that while this may be good for high performance computer generated graphics, it is wasteful in most video compression applications. Research into the Human Visual System (HVS) has shown that the eye is most sensitive to changes in luminance, and less sensitive to variations in chrominance. Since absolute compression is the name of the game, it makes sense that MPEG should operate on a color space that can effectively take advantage of the eye's different sensitivity to luminance and chrominance information. As such, H.261 (and MPEG) uses the YCbCr color space to represent the data values instead of RGB, where Y is the luminance signal, Cb is the blue color difference signal, and Cr is the red color difference signal.
A macroblock can be represented in several different manners when referring to the YCbCr color space. Figure 7.13 below shows 3 formats known as 4:4:4, 4:2:2, and 4:2:0 video. 4:4:4 is full bandwidth YCbCr video, and each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks. Being full bandwidth, this format contains as much information as the data would if it were in the RGB color space. 4:2:2 contains half as much chrominance information as 4:4:4, and 4:2:0 contains one quarter of the chrominance information. Although MPEG-2 has provisions to handle the higher chrominance formats for professional applications, most consumer level products will use the normal 4:2:0 mode.
Macroblock Video Formats
Because of the efficient manner of luminance and chrominance representation, the 4:2:0 representation allows an immediate data reduction from 12 blocks/macroblock to 6 blocks/macroblock, or 2:1 compared to full bandwidth representations such as 4:4:4 or RGB. To generate this format without generating color aliases or artifacts requires that the chrominance signals be filtered.
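A back-of-the-envelope sketch of why 4:2:0 gives this 2:1 reduction, using the common BT.601-style RGB-to-YCbCr conversion (assumed here; the exact matrix varies between standards) and a simple 2x2 averaging of the chroma planes:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an HxWx3 RGB image to Y, Cb, Cr planes (BT.601-style approximation)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

rgb = np.random.randint(0, 256, size=(16, 16, 3)).astype(float)
y, cb, cr = rgb_to_ycbcr(rgb)

# 4:2:0: keep full-resolution luma, average chroma over 2x2 neighbourhoods.
cb420 = cb.reshape(8, 2, 8, 2).mean(axis=(1, 3))
cr420 = cr.reshape(8, 2, 8, 2).mean(axis=(1, 3))

samples_444 = y.size + cb.size + cr.size          # 768 samples (full bandwidth)
samples_420 = y.size + cb420.size + cr420.size    # 384 samples: a 2:1 reduction
```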
The macroblock is coded as follows (a structural sketch appears after this list):
o Many macroblocks will be exact matches (or close enough). So send address of each block in image -> Addr
o Sometimes no good match can be found, so send INTRA block -> Type
o Will want to vary the quantization to fine tune compression, so send quantization value -> Quant
o Motion vector -> vector
o Some blocks in macroblock will match well, others match poorly. So send bitmask indicating which blocks are present (Coded Block Pattern, or CBP).
o Send the blocks (4 Y, 1 Cr, 1 Cb) as in JPEG.
· Quantization is by constant value for all DCT coefficients (i.e., no quantization table as in JPEG).
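A structural sketch of the fields listed above, written as a hypothetical Python container purely to make the roles of Addr, Type, Quant, the motion vector, and the CBP concrete; the names and types are illustrative, not a real bitstream definition:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class CodedMacroblock:
    """Hypothetical container mirroring the fields sent for each coded macroblock."""
    addr: int                                  # address of the macroblock within the image
    mb_type: str                               # e.g. "INTER", or "INTRA" when no good match exists
    quant: Optional[int] = None                # quantization value, sent when fine-tuning compression
    motion_vector: Tuple[int, int] = (0, 0)    # displacement relative to the reference frame
    coded_block_pattern: int = 0               # 6-bit mask: which of the 4 Y, 1 Cr, 1 Cb blocks are sent
    blocks: List[bytes] = field(default_factory=list)  # the JPEG-style coded blocks themselves
```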
Video codec design
Video codecs seek to represent a fundamentally analog data set in a digital way. Because of the design of analog video signals, which represent luma and color information separately, a common first step in image compression in codec design is to represent and store the image in a YCbCr color space. The conversion to YCbCr provides two benefits: first, it improves compressibility by providing decorrelation of the color signals; and second, it separates the luma signal, which is perceptually much more important, from the chroma signal, which is less perceptually important and which can be represented at lower resolution to achieve more efficient data compression. It is common to represent the ratios of information stored in these different channels in the following way: Y:Cb:Cr. Professional video codecs designed to function at much higher bitrates and to record a greater amount of color information for post-production manipulation sample in 3:1:1 (uncommon), 4:2:2 and 4:4:4 ratios.
Commonly used standards and codecs
A variety of codecs can be implemented with relative ease on PCs and in consumer electronics equipment. It is therefore possible for multiple codecs to be available in the same product, avoiding the need to choose a single dominant codec for compatibility reasons. In the end it seems unlikely that one codec will replace them all. Some widely-used video codecs are listed below, starting with a chronological-order list of the ones specified in international standards.
H.261: Used primarily in older videoconferencing and videotelephony products. H.261, developed by the ITU-T, was the first practical digital video compression standard. Essentially all subsequent standard video codec designs are based on it. It included such well-established concepts as YCbCr color representation, the 4:2:0 sampling format, 8-bit sample precision, 16x16 macroblocks, block-wise motion compensation, 8x8 block-wise discrete cosine transformation, zig-zag coefficient scanning, and scalar quantization.
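Of those concepts, the zig-zag coefficient scan is easy to make concrete: it visits an 8x8 coefficient block along anti-diagonals so that low-frequency coefficients come first. A small sketch (the helper name is made up for illustration):

```python
def zigzag_indices(n: int = 8):
    """Generate (row, col) pairs in the zig-zag order used to scan DCT coefficients."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],                              # anti-diagonal: low frequencies first
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),   # alternate direction along each diagonal
    )

print(zigzag_indices()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```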
MPEG stands for the Moving Picture Experts Group. MPEG is an ISO/IEC working group, established in 1988 to develop standards for digital audio and video formats. There are five MPEG standards being used or in development. Each compression standard was designed with a specific application and bit rate in mind, although MPEG compression scales well with increased bit rates. They include:
MPEG-1
Designed for up to 1.5 Mbit/sec
Standard for the compression of moving pictures and audio. This was based on CD-ROM video applications, and is a popular standard for video on the Internet, transmitted as .mpg files. In addition, Layer 3 of the MPEG-1 audio standard is the most popular standard for digital compression of audio, known as MP3. MPEG-1 is the standard of compression for VideoCD, the most popular video distribution format throughout much of Asia.
MPEG-2
Designed for between 1.5 and 15 Mbit/sec
Standard on which Digital Television set top boxes and DVD compression is based. It is based on MPEG-1, but designed for the compression and transmission of digital broadcast television. The most significant enhancement from MPEG-1 is its ability to efficiently compress interlaced video. MPEG-2 scales well to HDTV resolution and bit rates, obviating the need for an MPEG-3.
MPEG-4
Standard for multimedia and Web compression. MPEG-4 is based on object-based compression, similar in nature to the Virtual Reality Modeling Language. Individual objects within a scene are tracked separately and compressed together to create an MPEG-4 file. This results in very efficient compression that is very scalable, from low bit rates to very high. It also allows developers to control objects independently in a scene, and therefore introduce interactivity.
MPEG-7 - this standard, currently under development, is also called the Multimedia Content Description Interface. When released, the group hopes the standard will provide a framework for multimedia content that will include information on content manipulation, filtering and personalization, as well as the integrity and security of the content. Contrary to the previous MPEG standards, which described actual content, MPEG-7 will represent information about the content.
MPEG-21 - work on this standard, also called the Multimedia Framework, has just begun. MPEG-21 will attempt to describe the elements needed to build an infrastructure for the delivery and consumption of multimedia content, and how they will relate to each other.
JPEG stands for Joint Photographic Experts Group. It is also an ISO/IEC working group, but works to build standards for continuous tone image coding. JPEG is a lossy compression technique used for full-color or gray-scale images, by exploiting the fact that the human eye will not notice small color changes.
JPEG 2000 is an initiative that will provide an image coding system using compression techniques based on the use of wavelet technology.
DV is a high-resolution digital video format used with video cameras and camcorders. The standard uses DCT to compress the pixel data and is a form of lossy compression. The resulting video stream is transferred from the recording device via FireWire (IEEE 1394), a high-speed serial bus capable of transferring data up to 50 MB/sec.
H.261 is an ITU standard designed for two-way communication over ISDN lines (video conferencing) and supports data rates which are multiples of 64Kbit/s. The algorithm is based on DCT and can be implemented in hardware or software and uses intraframe and interframe compression. H.261 supports CIF and QCIF resolutions.
H.263 is based on H.261 with enhancements that improve video quality over modems. It supports CIF, QCIF, SQCIF, 4CIF and 16CIF resolutions.
Image compression
Image compression is the application of data compression to digital images. In effect, the objective is to reduce redundancy in the image data in order to store or transmit the data in an efficient form.
Image compression can be lossy or lossless. Lossless compression is sometimes preferred for artificial images such as technical drawings, icons or comics. This is because lossy compression methods, especially when used at low bit rates, introduce compression artifacts. Lossless compression methods may also be preferred for high value content, such as medical imagery or image scans made for archival purposes. Lossy methods are especially suitable for natural images such as photos in applications where minor (sometimes imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit rate.
Methods for lossless image compression are:
· Run-length encoding – used as the default method in PCX and as one possible method in BMP, TGA and TIFF
· Entropy coding
Methods for lossy compression:
· Reducing the color space to the most common colors in the image (see the sketch after this list). The selected colors are specified in the color palette in the header of the compressed image. Each pixel just references the index of a color in the color palette. This method can be combined with dithering to avoid posterization.
· Chroma subsampling. This takes advantage of the fact that the eye perceives brightness more sharply than color, by dropping half or more of the chrominance information in the image.
· Transform coding. This is the most commonly used method. A Fourier-related transform such as the DCT or the wavelet transform is applied, followed by quantization and entropy encoding.
· Fractal compression.
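As a small illustration of the color-space reduction listed above, Pillow can build an adaptive palette directly; the file names are placeholders, and dithering options vary by library version:

```python
from PIL import Image

img = Image.open("photo.png").convert("RGB")   # hypothetical input file

# Reduce the color space to a 256-entry palette chosen from the image itself;
# each pixel is then stored as a one-byte index into that palette.
palettised = img.quantize(colors=256)
palettised.save("photo_indexed.png")
```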
The best image quality at a given bit-rate (or compression rate) is the main goal of image compression. However, there are other important properties of image compression schemes:
Scalability generally refers to a quality reduction achieved by manipulation of the bitstream or file (without decompression and re-compression). Other names for scalability are progressive coding or embedded bitstreams. Although it may seem counterintuitive, scalability can also be found in lossless codecs, usually in the form of coarse-to-fine pixel scans. Scalability is especially useful for previewing images while downloading them (e.g. in a web browser) or for providing variable-quality access to, for example, databases. There are several types of scalability:
· Quality progressive or layer progressive: The bitstream successively refines the reconstructed image.
· Resolution progressive: First encode a lower image resolution; then encode the difference to higher resolutions.
· Component progressive: First encode grey; then color.
Region of interest coding. Certain parts of the image are encoded with higher quality than others. This can be combined with scalability (encode these parts first, others later).
Meta information. Compressed data can contain information about the image which can be used to categorize, search or browse images. Such information can include color and texture statistics, small preview images and author/copyright information.
Processing power. Compression algorithms require different amounts of processing power to encode and decode. Some high compression algorithms require high processing power.
The quality of a compression method is often measured by the Peak signal-to-noise ratio. It measures the amount of noise introduced through a lossy compression of the image. However, the subjective judgement of the viewer is also regarded as an important, perhaps the most important, measure.
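The peak signal-to-noise ratio compares the original and compressed images through their mean squared error; for 8-bit images the peak value is 255. A minimal computation might look like this (a straightforward sketch, not tied to any particular library's definition):

```python
import numpy as np

def psnr(original: np.ndarray, compressed: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio, in decibels, between two images of the same shape."""
    mse = np.mean((original.astype(float) - compressed.astype(float)) ** 2)
    if mse == 0:
        return float("inf")          # identical images: no noise was introduced
    return 10.0 * np.log10(max_value ** 2 / mse)
```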
The usual steps involved in compressing an image are (a small quantization sketch follows this list):
1. Specify the rate (bits available) and distortion (tolerable error) parameters for the target image.
2. Divide the image data into various classes, based on their importance.
3. Divide the available bit budget among these classes, such that the distortion is minimized.
4. Quantize each class separately using the bit allocation information derived in step 3.
5. Encode each class separately using an entropy coder and write to the file.
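A tiny sketch of steps 3 and 4, assuming the "classes" are simply groups of transform coefficients and that the bit allocation is expressed as a quantizer step size per class (the coefficient values and step sizes below are made up for illustration):

```python
import numpy as np

def quantize(values: np.ndarray, step: float) -> np.ndarray:
    """Uniform quantizer: coarser steps spend fewer bits but introduce more distortion."""
    return np.round(values / step) * step

# Hypothetical split into an "important" low-frequency class and a
# "less important" high-frequency class.
low_freq = np.array([310.0, -42.5, 17.3, 8.1])
high_freq = np.array([2.4, -1.7, 0.9, 0.3])

coded_low = quantize(low_freq, step=2.0)      # fine quantization: most of the bit budget
coded_high = quantize(high_freq, step=16.0)   # coarse quantization: few bits, more distortion
```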
Compression artifact
[Figure: an original image with good color grade, compared with the loss of edge clarity and tone "fuzziness" introduced by JPEG compression.]
A compression artifact (or artefact) is the result of an aggressive data compression scheme applied to an image, audio, or video that discards some data which is determined by an algorithm to be of lesser importance to the overall content but which is nonetheless discernible and objectionable to the user. Artifacts in time-dependent data such as audio or video are often a result of the latent error in lossy data compression.