Print Streams

From Xplor Wiki
EDBOK Guide
Body of Knowledge
Document Production Workflow
Lifecycle Category
Print Streams
Content Contributor(s)
William McCalpin m-edp
Original Publication
August 2014
Copyright
© 2014 by Xplor International
Content License
CC BY-NC-ND 4.0

What are Print Streams?

A print stream is a file that contains a series of bytes intended to be printed, or displayed in a print-like setting. The print stream describes what text and/or images to place on paper or another physical medium, and how the text and/or images are to be placed on that medium. The output of the print stream is ultimately intended for a human reader.

By “print-like”, we mean oriented to the page. That is, while print streams like AFP and PostScript are clearly oriented towards the placement of text and images on paper, a print stream like PDF is generally displayed on a computer screen – yet PDF is also a print stream because the layout of the content presumes a page-like setting: there is a page size, there is an orientation, and there are other descriptors to facilitate print.

Presentation formats such as HTML and XML are not page-centric. Although they contain presentation commands, they are not “print streams”, because their content is not laid out on a “page”. An HTML page does not fill a page of fixed length; it is simply as long as its content. If you print a long HTML page, it spills over several physical sheets of paper, and you have no control over where the page breaks fall. So HTML is a presentation format, not a “print stream”.

In the simplest case, a print stream might be just one or more records in a file that contain only printable text, e.g., “ABC”, “DEF”, “HIJ”. In practice, however, print streams almost always consist of printable text and images interwoven with commands that direct the physical or virtual printer to format and present the text and images in a certain way.

Raster Image Processor (RIP)

All printers except one type are raster devices: whatever text and images are sent to the printer, the printer must convert them into strings of bits that are placed on the paper or other medium as dots that are black or a color. Whether the printer is a laser (electrophotographic), inkjet, electro-erosion, or some other type of printer, the content is laid on the medium as rows of bits, one row after another, with one exception.

The one exception is a pen plotter, a specialized print device that uses a pen to draw text and images on the medium. Normally used for scientific mapping or engineering plans, the pen plotter does not convert text and images to bits; rather it responds to vector commands that tell the printer to set the pen at location X,Y, lower the pen to the medium, then move the pen while drawing in a straight line or curve. Since pen plotters are never used as transaction printers, we will not discuss them further.

All transaction printers have a component, in hardware or software, called the RIP (Raster Image Processor). The RIP (pronounced “rip”, as in tearing something in two, rather than spelled out letter by letter as “R”-“I”-“P”) converts the print stream into the raster stream that the print or display device actually places on the medium. The RIP has a critical influence on printer performance, since the bit-level arithmetic involved is computationally intensive. Indeed, in the early days of electronic printing there were few high-volume PostScript printers because of the complexity of the PostScript language. High-speed PostScript printers had to wait for the development of high-speed CPUs, and for the ability to run multiple high-speed CPUs in parallel in the RIP so that several pages could be ripped simultaneously.

Even a print stream like AFP, which is optimized for high-volume transaction printers, can be built in such a way that the RIP runs much slower than normal. That is, while each page image corresponds to a single, unique bit map (a slight exaggeration, but close enough for our purposes), there are many ways to build a print stream that produces that bit map. Take the case of the simple string “ABC”. It could be represented in a print stream in a variety of ways. Here are just a few:

(x,y location)ABC 
(x,y location)A(x,y location)B(x,y location)C 
(x,y location)A(relative move)B(relative move)C 
(x,y location)C(x,y location)B(x,y location)A 

While each of these examples could result in the same final bit map, the amount of processing the printer's RIP needs to create that bit map can vary. The first example is the simplest and almost certainly the fastest to rip. The third example is likely to be faster than the second, because each relative move updates only one coordinate of the current position, whereas each absolute location in the second example updates both the x and y coordinates. Finally, as a peculiarity of Xerox Metacode printers, the fourth example might not print at all in certain circumstances.
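To make the comparison concrete, here is a toy model in Python. The command names (`Absolute`, `Relative`, `Text`), the fixed character width, and the rule that drawing text leaves the current position unchanged are all assumptions for illustration; no real print stream is encoded this way. The sketch confirms that all four variants produce the same glyph placements while counting how many coordinate values each one writes.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union

CHAR_WIDTH = 10  # assumed fixed character advance, arbitrary units

@dataclass(frozen=True)
class Absolute:      # "(x,y location)": sets both coordinates
    x: int
    y: int

@dataclass(frozen=True)
class Relative:      # "(relative move)": changes only the x coordinate
    dx: int

@dataclass(frozen=True)
class Text:          # characters drawn left to right from the current position
    chars: str

Command = Union[Absolute, Relative, Text]

def rip(stream: List[Command]) -> Tuple[Set[Tuple[str, int, int]], int]:
    """Return the glyph placements and how many coordinate values were written."""
    x = y = 0
    placements: Set[Tuple[str, int, int]] = set()
    coordinate_writes = 0
    for cmd in stream:
        if isinstance(cmd, Absolute):
            x, y = cmd.x, cmd.y
            coordinate_writes += 2           # both x and y are rewritten
        elif isinstance(cmd, Relative):
            x += cmd.dx
            coordinate_writes += 1           # only x changes
        else:
            # Toy simplification: drawing text does not move the current position.
            for i, ch in enumerate(cmd.chars):
                placements.add((ch, x + i * CHAR_WIDTH, y))
    return placements, coordinate_writes

streams = {
    "one string":     [Absolute(100, 200), Text("ABC")],
    "absolute each":  [Absolute(100, 200), Text("A"), Absolute(110, 200), Text("B"),
                       Absolute(120, 200), Text("C")],
    "relative moves": [Absolute(100, 200), Text("A"), Relative(10), Text("B"),
                       Relative(10), Text("C")],
    "reverse order":  [Absolute(120, 200), Text("C"), Absolute(110, 200), Text("B"),
                       Absolute(100, 200), Text("A")],
}

for name, stream in streams.items():
    placements, writes = rip(stream)
    print(f"{name:15} {sorted(placements)} coordinate writes={writes}")
```

Coordinate writes are only a rough proxy for RIP effort; a real RIP also spends time per command on parsing, font lookup, and glyph rendering.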

Therefore, the choice of print stream architecture as well as the quality of the software that generates the print stream is of great importance to high volume document production sites, because a “bad” print stream can seriously slow down a printer or the transform that converts one print stream to another.

Contents of a Print Stream

A simple print stream contains the text and images to be printed on a medium, interwoven with commands that tell the printer how to format them. However, for historical reasons, the environment surrounding the print stream complicates this picture, and a print stream may contain a variety of components:

  • text,
  • images,
  • formatting information for the text or images,
  • page or vertical control information for the printer,
  • printer feature instructions (change tray feed, switch from simplex to duplex, etc.),
  • references to external document objects (like fonts and images), and
  • containers that hold in-stream document objects (like fonts and images).

In fact, every text character is a reference to an external document object: a font. Because text and fonts are so pervasive, users tend to forget that a reference to a text character is, in a way, the same as a reference to an external image or other object. This fundamental requirement to deal with internal and external resources is a critical part of transaction printing, and it distinguishes transaction printing from other forms of data processing.

Text

The text consists of letters or symbols from a writing system. In the case of English, the letters are “A”, “B”, “C”, and so on, but they could just as easily be Greek or Arabic letters, or Chinese or Japanese characters. It is the ability of text to refer to external document objects called fonts that makes it possible for a single printer to display every language known to humankind.

Images

The term image in this section is a general reference to non-text items, such as a photo or line drawing or logo or signature. For transaction printers there are two types of images: raster and vector.

Raster images are fundamentally bit maps: two-dimensional arrays of bits that represent the image. Raster images are often, but not always, compressed. A number of compression schemes can sharply reduce the storage that an uncompressed bit map requires, such as CCITT Group 3 in its one-dimensional and two-dimensional forms (used in faxes), CCITT Group 4 (used in scanners and archives), and JPEG (used for photographs). The choice of compression scheme depends on whether the image is color or monochrome, whether it is largely text or a photograph, and whether the compression may be lossy or must be lossless (that is, whether or not detail judged unimportant may be discarded during compression).
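Run-length coding is the first step of CCITT one-dimensional compression: each scan line becomes alternating counts of white and black pixels, starting with white. The sketch below shows only that step; real Group 3 then replaces each run length with a Huffman codeword, and Group 4 codes each line relative to the line above, neither of which is shown. The function names are hypothetical.

```python
from typing import List

def run_lengths(row: List[int]) -> List[int]:
    """Alternate white/black run lengths for one scan line (0 = white, 1 = black).

    As in CCITT one-dimensional coding, the first run is always white, so a
    line that starts with black begins with a white run of length 0.
    """
    runs = []
    current, count = 0, 0        # colour of the run being counted; start with white
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)   # close the current run and switch colour
            current, count = pixel, 1
    runs.append(count)
    return runs

def expand(runs: List[int]) -> List[int]:
    """Rebuild the scan line from its run lengths (the coding is lossless)."""
    row, colour = [], 0
    for length in runs:
        row.extend([colour] * length)
        colour ^= 1              # runs alternate white, black, white, ...
    return row

# A 2,400-pixel line (8 inches at 300 dpi), white except for one black segment.
row = [0] * 1000 + [1] * 50 + [0] * 1350
runs = run_lengths(row)
print(runs)                      # [1000, 50, 1350]
assert expand(runs) == row
```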

Raster images are easy to process in the RIP, but the sheer size of the bit map may cause performance issues.

Vector images are images that are described by a series of vector commands. Instead of providing a bit map of a circle on the page, the print stream may contain a vector command that states “at location x,y, draw a circle with diameter Z and with a rule width of Q”. This command would typically be just a few bytes rather than the thousands of bytes required to represent the same image as a bit map.

Since all transaction printers are raster-based, the RIP must interpret the vector command to make the bit map of the circle. However, because much less data is involved, ripping and printing may take less time than with the equivalent raster image.
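The sketch below plays the RIP's role for the circle described above: it decodes a vector command and produces the bit map. The 8-byte command layout (`vector_circle`) is hypothetical and belongs to no real print stream, and the ring test (a pixel is black when its center lies within half the rule width of the radius) is just one simple rasterization rule. It assumes a 1-inch circle at 600 dpi.

```python
import math
import struct
from typing import List

def vector_circle(x: int, y: int, diameter: int, rule: int) -> bytes:
    """Hypothetical 8-byte 'draw a circle' command: four big-endian shorts."""
    return struct.pack(">4H", x, y, diameter, rule)

def rasterize_circle(command: bytes) -> List[List[int]]:
    """Interpret the vector command as a RIP would: produce a bit map.

    Returns rows of 0/1 pixels covering the circle's bounding box. Placing
    the bit map at (x, y) on the page is a separate step, not shown.
    """
    _x, _y, diameter, rule = struct.unpack(">4H", command)
    radius = diameter / 2
    size = diameter + rule           # box wide enough for the outer edge of the rule
    centre = size / 2
    rows = []
    for r in range(size):
        bits = []
        for c in range(size):
            d = math.hypot(c + 0.5 - centre, r + 0.5 - centre)   # pixel-centre distance
            bits.append(1 if abs(d - radius) <= rule / 2 else 0)
        rows.append(bits)
    return rows

cmd = vector_circle(x=300, y=300, diameter=600, rule=6)
bitmap = rasterize_circle(cmd)
bitmap_bytes = len(bitmap) * math.ceil(len(bitmap[0]) / 8)   # 1 bit per pixel
print(len(cmd), "bytes as a vector command")
print(bitmap_bytes, "bytes as an uncompressed bit map")
```

The vector form is a handful of bytes, while the uncompressed bit map for the same circle runs to tens of thousands of bytes; the RIP pays for the difference in computation instead of in data volume.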

The ability to use vector commands is one of the reasons that PostScript was so successful for the graphic arts market – the page designer could use tools to create hundreds of complex images for the document master, yet because these vector commands were each small in size, the PostScript files were not unduly large. It was the number of vector commands in the PostScript file that caused the slow performance of early PostScript printers, a problem that has long been addressed.

NOTE: We are using the term image as a generic term. However, different vendors in the transaction document space have their own terms, which may conflict.

For example, in AFP, an image is always a raster image, but a graphic is always a vector image. For Metacode printers, there was no vector support, so graphic and image referred to raster images that were handled in two different ways. Windows Metafiles can contain both raster images and vector images, which makes them closer to an AFP Page Segment (a container for both types of images).

Formatting Information

The page is a two-dimensional framework on which to lay out text and images. The print stream provides either an explicit or an implied location for all content on the page. In the original Xerox Metacode model, the X and Y addresses referred to physical attributes of the printer: the scan and dot directions. The scan address was a function of how the paper moved through the machine; a scan line was the line of dots that the laser drew across the page image on the drum before the drum rotated to the next line of dots.

However, all other architected print streams (that is, all except line data) have virtualized the page, so that the X and Y addresses are independent of the physical properties of the print device. In the case of AFP, there is the inline address and the baseline address, with inline referring to the direction in which text moves as one character follows another. This virtualization makes it easy to print all modern alphabets, since the inline direction is left to right for Latin-based alphabets (and many others), right to left for Hebrew and Arabic, and top down for some Chinese and Japanese documents. Once the virtual page has been described, the RIP takes care of the text placement.
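A minimal sketch of the virtual-page idea, assuming a page coordinate system with X increasing to the right and Y increasing down the page, and only the four orthogonal directions. The names `to_page` and `layout` are hypothetical; AFP expresses inline and baseline directions through its own text controls, which are not modelled here. The same characters placed with different inline and baseline directions land at different physical positions without the text itself changing.

```python
from typing import List, Tuple

# Unit vectors for the four orthogonal directions, measured clockwise from
# the page's +X axis with Y increasing down the page (assumed convention).
DIRECTIONS = {0: (1, 0), 90: (0, 1), 180: (-1, 0), 270: (0, -1)}

def to_page(origin: Tuple[int, int], inline_dir: int, baseline_dir: int,
            inline_offset: int, baseline_offset: int) -> Tuple[int, int]:
    """Map a virtual (inline, baseline) position to physical page X/Y."""
    ix, iy = DIRECTIONS[inline_dir]
    bx, by = DIRECTIONS[baseline_dir]
    x0, y0 = origin
    return (x0 + inline_offset * ix + baseline_offset * bx,
            y0 + inline_offset * iy + baseline_offset * by)

def layout(text: str, origin: Tuple[int, int], inline_dir: int,
           baseline_dir: int, advance: int = 10) -> List[Tuple[str, Tuple[int, int]]]:
    """Place successive characters one advance apart along the inline direction."""
    return [(ch, to_page(origin, inline_dir, baseline_dir, i * advance, 0))
            for i, ch in enumerate(text)]

print(layout("ABC", (100, 100), 0, 90))    # Latin: left to right, lines go down
print(layout("ABC", (500, 100), 180, 90))  # Hebrew/Arabic: right to left
print(layout("ABC", (500, 100), 90, 180))  # vertical CJK: top down, columns go left
```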

Other formatting instructions include defining the inline and baseline directions (to use the AFP terms), defining which font is to be used, defining which color is to be used, defining the location and size of an image, and defining the dot resolution of the virtual page.

Page or Vertical Control Information

Nearly all print streams contain commands to set page breaks and location on the page. However, the original print format – line data – does not have a command set. Therefore, elements external to the actual text data were used to control page breaks and text placement.

In the case of IBM mainframes, there are carriage controls. These are a single byte at the beginning of each record that tells the printer what to do in terms of pagination and vertical placement. A skip to channel 1 may tell the printer to go to the top of a new page while a print and skip 2 will tell the printer to lay down the current record as a line of text and then skip two lines (leaving a blank line) before laying down more text.
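The sketch below interprets ANSI (ASA) carriage control characters, the printable-character variant commonly found in mainframe line data: blank means single space, `0` double space, `-` triple space, `+` suppress spacing (overprint), and `1` skip to channel 1. Unlike the machine-code “print and skip” behavior described above, ANSI controls take effect before the line prints. The function name and the simplifications listed in its docstring are assumptions for illustration.

```python
from typing import List, Tuple

# ANSI (ASA) carriage controls: lines to advance BEFORE printing the record.
SPACE_BEFORE = {" ": 1, "0": 2, "-": 3, "+": 0}   # "+" overprints the previous line

def place_records(records: List[str]) -> List[Tuple[int, int, str]]:
    """Return (page, line, text) for each record, pages and lines numbered from 1.

    Simplifications (assumptions for this sketch): printing starts at the top
    of page 1, channel 1 marks the top of the page, and other channel skips
    and page-overflow handling are not modelled.
    """
    page, line = 1, 0             # line 0 = positioned just above the first line
    placements = []
    for record in records:
        cc, text = record[:1], record[1:]
        if cc == "1":             # skip to channel 1: top of the next page
            if line > 0:          # already at the top of the first page
                page += 1
            line = 1
        elif cc in SPACE_BEFORE:
            line += SPACE_BEFORE[cc]
        else:
            raise ValueError(f"unsupported carriage control {cc!r}")
        placements.append((page, line, text))
    return placements

report = [
    "1STATEMENT OF ACCOUNT",
    " Customer: J. Smith",
    "0Balance due: $42.00",
    "+             ______",   # underline printed over the balance line
    "1Page 2 heading",
]
for page, line, text in place_records(report):
    print(page, line, text)
```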

In the case of environments other than the IBM mainframe, the print streams sit in an environment in which formatting elements from the typewriter are used: form feed, line feed, carriage return, and others. These elements were required to print line data (i.e., text) on simple decentralized printers, but they are largely ignored by modern print streams and serve primarily as delimiters between one record and the next.
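As an illustration of these typewriter-style controls acting as delimiters, here is a small hypothetical helper that splits ASCII text into pages at form feeds (x'0C') and into lines at CR LF or LF. It is a sketch of the convention, not of any particular printer's behavior.

```python
import re
from typing import List

FORM_FEED = "\f"                  # x'0C': eject and start a new page
LINE_END = re.compile(r"\r?\n")   # CR LF or LF ends a line

def split_pages(text: str) -> List[List[str]]:
    """Split typewriter-style ASCII text into pages, each a list of lines.

    A bare CR (return to column 1 without advancing, historically used for
    overprinting) is left inside the line rather than treated as a line end.
    """
    pages = []
    for page in text.split(FORM_FEED):
        lines = LINE_END.split(page)
        if lines and lines[-1] == "":   # a trailing line ending does not open a new line
            lines.pop()
        pages.append(lines)
    return pages

print(split_pages("Line 1\r\nLine 2\r\n\fPage 2 line 1\r\n"))
```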

Printer Feature Instructions

Printers have more features than just placing text and images on the page. For example, cut sheet printers often have more than one paper tray, so there are commands, which may be external to the print stream, to indicate the source tray. In the case of Xerox Metacode, tray selection was done in the JDE (Job Descriptor Entry) or by a DJDE (Dynamic JDE), entirely outside the Metacode environment. In the case of AFP, tray selection is done outside of any page or document object, in the formdef, an object external to the print data. In HP PCL, the tray selection command (Esc&l#H, where # identifies the paper source) is one of a group of escape sequence commands related to paper handling.
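For PCL, the tray command can be generated directly. This sketch assumes the common PCL 5 job framing (an `Esc E` reset at the start and end of the job, and a form feed to eject the last page); the helper names are hypothetical, and which physical tray each number selects varies by printer model (on many HP printers, 1 is the main tray and 4 the lower tray).

```python
ESC = b"\x1b"

def pcl_paper_source(tray: int) -> bytes:
    """Build the PCL paper-source command Esc&l#H for the given source number."""
    return ESC + b"&l" + str(tray).encode("ascii") + b"H"

def pcl_job(pages, tray: int) -> bytes:
    """Minimal sketch of a PCL job: reset, select tray, print plain-text pages.

    Form feeds separate the pages and eject the last one; the final Esc E
    resets the printer for the next job.
    """
    body = b"\f".join(page.encode("ascii") for page in pages)
    return ESC + b"E" + pcl_paper_source(tray) + body + b"\f" + ESC + b"E"

print(pcl_job(["Hello"], tray=1))
```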

Printer features and how they are implemented are highly dependent on the printer. The first release of PostScript had no tray feed command, because PostScript printers at that time were used only to create master documents in low volumes and had only one tray. And PDF, whose primary purpose is to be displayed in Acrobat Reader rather than printed on a printer, has no tray pull command at all, although some workarounds have been implemented.

References to External Document Objects

The section on Document Objects describes the many kinds and varied uses of objects external to the print stream. Some objects, like fonts and forms, are printable, while other objects, like AFP formdefs and color tables, don’t print directly themselves but modify how other content is printed.

These document objects are often highly dependent on either the composition system (if they are source objects) or the printing system (if they are used in the printing process). For this reason, the way in which the print stream refers to these objects will also vary.

In PDF, the use of external document objects is discouraged, and in PDF/A, external document objects are forbidden by the standard. This is done because a PDF document is intended to be as portable as possible, and having external dependencies would greatly complicate the ability to accurately represent the document on any platform.

Containers that Hold In-stream Document Objects

Most print streams are also able to carry document objects in-stream as part of a print file. The printer does not simply print such an object; it stores it so that it can be reused. Given that host computers typically have much more storage and library capability than a printer, it is usually more efficient to store document objects on the host computer that creates or sends the print stream than on the printer.

If there is a driver in control of the printer, as is the case of AFP, then the driver on the host has total control over what objects are stored on the printer. It is easy for the driver to query the printer to see if an object is needed and if so, to download the object to the printer in-stream.

In the case of HP PCL printers attached to Windows, there is no single host in control of the printer, but potentially dozens or hundreds of computers creating PCL for that one printer. No software that creates the PCL can know which objects will already be on the printer, so each PCL print job clears all erasable memory on the printer at the start of the job and downloads every object it needs in-stream. Depending on the number of redundant objects (like a company logo that appears on many printed documents), this method is much less efficient, because every print stream that uses the logo must carry its own copy.

There are methods to download objects to a PCL printer, but without a single host/driver controlling access to the printer, they are difficult to manage, and as a result most users do not bother.

Encoding Schemes

A print stream is a series of bytes organized into records and then into files. A byte might be the binary string of “0100 0001” or “1100 0001”. But how do we know what the string of bits stands for? We know this by mutual agreement on an encoding scheme.

An encoding scheme is an agreement as to what letters, numbers, and symbols are assigned to which 8-bit strings. For example, the string “0100 0001” is defined to stand for the letter “A” (capital) in the ASCII encoding scheme.

There are two major encoding schemes found in transaction printing: ASCII and EBCDIC.

EBCDIC, the Extended Binary Coded Decimal Interchange Code, is used mainly on IBM mainframe and midrange computers. While these computers are a small minority in the business world by count, their central role in the core data processing of most businesses in the transaction print space gives EBCDIC a disproportionate influence on the industry. That is, almost every large insurance company, bank, credit card processor, and the like has IBM mainframes in the back office handling the bulk of the data.

ASCII, the American Standard Code for Information Interchange, is the encoding scheme used on almost every other computing platform on the planet. All Windows PCs and UNIX servers in a company use ASCII while the mainframes (if present) use EBCDIC, a situation that often causes confusion.

EBCDIC and ASCII both define all the upper and lower case letters (A-Z and a-z) as well as the digits 0 through 9. In addition, they both define a number of punctuation characters, but not always the same ones. For example, the common US EBCDIC code page (037) has a cent sign (¢) and a logical not sign (¬), while the original ASCII definition has neither.

However, in electronic printing, the actual character that prints is a function of the font, and most fonts have been extended to include a variety of useful characters. But, since these are extensions, these characters were not always placed into the same locations in the EBCDIC or ASCII encoding schemes, making translations from one scheme to another problematic when nonstandard characters are used.
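Python's standard library includes codecs for several EBCDIC code pages, which makes these differences easy to see. The sketch uses `cp037` (US/Canada) and `cp500` (International) as representative examples; the helper names are hypothetical. It shows the two bit patterns for “A” mentioned earlier, a byte whose meaning differs between two EBCDIC code pages, and a character that has no ASCII equivalent at all.

```python
def show(text: str, codec: str) -> str:
    """Return the bytes for `text` in the given encoding as 8-bit binary strings."""
    return " ".join(format(b, "08b") for b in text.encode(codec))

def ebcdic_to_ascii(data: bytes, codepage: str = "cp037") -> bytes:
    """Translate EBCDIC bytes to ASCII; characters ASCII lacks become '?'."""
    return data.decode(codepage).encode("ascii", errors="replace")

print(show("A", "ascii"))   # 01000001
print(show("A", "cp037"))   # 11000001

# The same byte means different things in different EBCDIC code pages ...
print(b"\x5f".decode("cp037"), b"\x5f".decode("cp500"))   # ¬ ^
# ... and a character present in EBCDIC may have no ASCII equivalent at all.
print(ebcdic_to_ascii("Total ¢5".encode("cp037")))         # b'Total ?5'
```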

EBCDIC Print Stream

EBCDIC Text: The original EBCDIC print stream was just text. This text might be accompanied by typewriter-style formatting controls such as line feed, carriage return, and form feed. This format is quite rare, however. Suffice it to say that if such data somehow were delivered to an EBCDIC-compatible printer, the print stream might accurately print.

1403 Line Data: So-called 1403 line data is a file that contains print data in EBCDIC, in which every record of print data is preceded by a carriage control. The spooler stripped away this carriage control and substituted a Channel Command Word (CCW) that directed the printer’s vertical motion after printing the line of text. Each line of text was limited to 132 bytes. While simple, the 1403 format was (and is) enduring because of the longevity of the 1403 printers themselves, as well as the variety of other impact printers that would take the same print stream.

3800 Mod I: The 3800 Model I was an IBM electronic printer intended to replace impact printers like the 1403. Because it was an electronic printer, its print engine had the ability to change fonts. The 3800 Mod I print stream had a carriage control in the first byte of the print record and a one-byte font identifier in the second byte, followed by the printable text. The ability to change fonts was limited: a print job could use at most four fonts, the number of characters in a font was limited, and so was the use of all upper case or all lower case characters. The maximum number of printable bytes in a print record was increased to 204.

3211 Data with Xerox DJDEs: The 3211 was a particular IBM impact printer. While the 1403 had a print train (the chain carrying all the available characters that could be hammered onto the paper) designed to print lines 132 characters wide, the 3211 was designed for print lines 150 characters wide. Thus, the maximum 3211 print record was 151 bytes: 150 printable characters plus the 1-byte carriage control.
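The record-length limits of these three line data formats can be captured in a small validation table. This is a sketch only: the `LineFormat` structure and `build_record` helper are hypothetical, text is encoded with EBCDIC code page 037, and the 3800 font selector is assumed to be written as an EBCDIC digit 0 through 3.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineFormat:
    name: str
    has_font_byte: bool    # 3800 Mod I: second byte selects one of four fonts
    max_printable: int     # printable bytes after the control byte(s)

FORMATS = {
    "1403": LineFormat("1403", False, 132),
    "3800": LineFormat("3800 Mod I", True, 204),
    "3211": LineFormat("3211", False, 150),
}

def build_record(fmt: LineFormat, cc: str, text: str, font: int = 0) -> bytes:
    """Build one EBCDIC line data record: carriage control [+ font] + text."""
    if len(text) > fmt.max_printable:
        raise ValueError(f"{fmt.name}: {len(text)} printable bytes exceeds {fmt.max_printable}")
    record = cc.encode("cp037")
    if fmt.has_font_byte:
        if not 0 <= font <= 3:
            raise ValueError("the 3800 Mod I allowed at most four fonts (0-3)")
        record += str(font).encode("cp037")   # assumed encoding of the font selector
    elif font:
        raise ValueError(f"{fmt.name} records carry no font selector")
    return record + text.encode("cp037")

print(len(build_record(FORMATS["1403"], " ", "X" * 132)))   # 133
```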

Xerox took this printer definition in the IBM spooling systems and created a series of “LCDS” (line-conditioned data stream) printers that were laser marking engines from copiers coupled with front-end computers that emulated the 3211 printer while adding a great deal of functionality. While the electronic 3800 Mod I printer was able to control only the vertical movement and the limited number of fonts through the print stream, the computer on the Xerox LPS (Laser Printing System) enabled the printer to store hundreds of fonts, forms, images, and logos, as well as enabling the print stream to dynamically change the print formatting page by page or even record by record. The Xerox printer – also called a centralized printer, or a Metacode printer – appeared to be an IBM 3211 printer to the spooling system, but its capabilities far exceeded those of any impact printer. With the right mix of forms, fonts, logos, and dynamic print commands, the Xerox Metacode printer was able to create a wide variety of sophisticated printed documents and drove the US insurance industry in particular for many years.

AFPDS and IPDS: As sophisticated as the Xerox LCDS environment became, its print stream was still fundamentally a line data print stream that laid data down a record at a time. A complete architectural revolution was needed to make a print stream that was All Points Addressable. And so AFP (Advanced Function Printing, later Advanced Function Presentation) was created. AFP was designed specifically to address a number of performance issues in the data processing environment, such as high-speed printing, efficient resource management, and error recovery.

Furthermore, two parallel definitions were created: AFPDS (Advanced Function Presentation Data Stream) and IPDS (Intelligent Printer Data Stream). AFPDS was a standard definition that could be printed on all AFP-compatible printers, even with varying features. IPDS was the print stream that could vary slightly from printer to printer, and enabled the print driver (originally PSF: Print Services Facility) to translate the device-independent AFPDS that users created to a device-dependent targeted IPDS print stream that sent print data and resources to each printer.

Note: AFP specifies the architecture, protocols, objects and command set. AFPDS is a print stream conforming to the AFP specification.

These print streams are both page-oriented and object-oriented, for superior processing, resource management, and error recovery.

ASCII Print Streams

ASCII Text and ASCII Text with Print Controls: ASCII text was the first and most basic print stream based on the ASCII encoding scheme. We have included “text with print controls” here because it is virtually impossible to create text without some sort of print control to identify line endings. This form of print stream is still found in simple applications (Think: Windows Notepad).

ASCII Text with Escape Sequences: Early in the development of impact printers, the informal convention was created that printer commands could be smuggled into the text print stream via an escape sequence. A very early example using the Shift In/Shift Out bytes (x’0F’/x’0E’, respectively) could be found on the Epson MX-80 dot matrix printer. The Shift In byte caused the dot matrix printer to print characters in condensed mode while the Shift Out byte caused the printer to print characters in expanded mode.

The formal escape character is x’1B’ in ASCII, and many printer RIPs were coded not to print anything for this character, but instead to treat the following one or more characters as a print command and its arguments. Indeed, this process became the basis of the most successful of the escape sequence print streams: HP PCL.
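The sketch below splits a byte stream into text and escape-sequence commands, following the general shape of PCL 5 syntax: two-character escapes such as `Esc E` (reset), and parameterized escapes made of a parameterized character, an optional group character, a value, and a terminating character, as in `Esc&l#H`. A lower-case terminator lets several commands share one prefix, so `Esc&l1h2S` is two commands. The byte ranges and the function are a simplified, hypothetical sketch, not a validated PCL parser; commands that carry binary data (such as raster transfers) are not handled.

```python
import re
from typing import List, Tuple

ESC = 0x1B
VALUE = re.compile(rb"[+-]?\d*(?:\.\d*)?")   # optional sign, digits, decimal part

Token = Tuple[str, ...]

def tokenize_pcl(data: bytes) -> List[Token]:
    """Split PCL-style bytes into ("text", bytes) and ("cmd", prefix, value, terminator)."""
    tokens: List[Token] = []
    text = bytearray()
    i = 0
    while i < len(data):
        if data[i] != ESC:
            text.append(data[i])
            i += 1
            continue
        if text:                                    # flush the text run before the command
            tokens.append(("text", bytes(text)))
            text.clear()
        if i + 1 >= len(data):
            raise ValueError("escape character at end of data")
        c = data[i + 1]
        if 0x30 <= c <= 0x7E:                       # two-character escape, e.g. Esc E
            tokens.append(("cmd", "", "", chr(c)))
            i += 2
            continue
        if not 0x21 <= c <= 0x2F:
            raise ValueError(f"unexpected byte {c:#04x} after escape")
        prefix, j = chr(c), i + 2
        if j < len(data) and 0x60 <= data[j] <= 0x7E:   # optional group character
            prefix += chr(data[j])
            j += 1
        while True:
            m = VALUE.match(data, j)
            value, j = m.group().decode("ascii"), m.end()
            if j >= len(data):
                raise ValueError("truncated escape sequence")
            term = data[j]
            j += 1
            if 0x60 <= term <= 0x7E:                # lower case: another value follows
                tokens.append(("cmd", prefix, value, chr(term).upper()))
            elif 0x40 <= term <= 0x5E:              # upper case: sequence ends
                tokens.append(("cmd", prefix, value, chr(term)))
                break
            else:
                raise ValueError(f"bad terminator {term:#04x}")
        i = j
    if text:
        tokens.append(("text", bytes(text)))
    return tokens

for token in tokenize_pcl(b"\x1bE\x1b&l1h2SHello\r\n"):
    print(token)
```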

Escape sequence print streams (partial list):

  • Epson MX-80
  • Xerox UDK (XES)
  • QMS QUIC
  • IBM PPDS
  • HP PCL
  • Xerox Metacode

Page-oriented Print Streams: Like the EBCDIC print streams, the numerous ASCII escape sequence print streams were limited in that they were founded on a line data-oriented structure. Even though several escape sequence print streams achieved all points addressability via commands in the language, they lacked the robustness of page-oriented print stream architectures, which combined objects, state tables, and other features of programming languages. The only similarity page-oriented print streams have with escape sequence print streams is that, except for Interpress, they are ASCII character-based.

These page-oriented print streams became very complex and were originally targeted to the graphic arts market. Performance was much less of an issue than the ability to draw complex graphics and documents with a minimum number of instructions.

Now, thanks to architectural updates as well as large advances in the underlying hardware in printer RIPs, these languages (except for Interpress, which is obsolete in the market) are widely available in the transaction document space.

Page-oriented print streams include:

  • Interpress
  • PostScript
  • PDF
  • The PDF family, including PDF/A, PDF/VT, and others

Sequential Print Streams versus Object-oriented Print Streams

The majority of print streams are sequential in nature. That is, the print stream is designed to be processed by the RIP by starting with byte #1 in the print file and proceeding sequentially to the end of the file. Attempting to jump directly into the middle of the file to start printing may result in incorrect output because a printer command executed on page 2 may affect all pages following, and if page 2 were not executed by the RIP, then pages 3 and following may print incorrectly.

For this reason, simultaneous parallel processing of pages for escape sequence print streams is usually not practical. Fortunately, the relative simplicity of the command sets for escape sequence print streams did not require parallel RIP processing to achieve high volume performance.
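A toy model shows why: in this hypothetical stream, a font change on page 2 persists, so ripping page 3 on its own gives a different result from ripping the file from the start. The page structure, command names, and default font are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stream: each page is a list of (command, argument) pairs.
# "font" changes persistent state; "text" prints using the current font.
Page = List[Tuple[str, str]]

def rip_pages(pages: List[Page], start: int = 0,
              initial_font: str = "Courier") -> Dict[int, List[str]]:
    """Rip pages[start:] sequentially; state carries over from page to page."""
    font = initial_font
    output: Dict[int, List[str]] = {}
    for number in range(start, len(pages)):
        rendered = []
        for command, argument in pages[number]:
            if command == "font":
                font = argument            # persists into all following pages
            elif command == "text":
                rendered.append(f"{argument} [{font}]")
        output[number + 1] = rendered
    return output

stream = [
    [("text", "Page one")],
    [("font", "Helvetica"), ("text", "Page two")],
    [("text", "Page three")],
]
print(rip_pages(stream)[3])            # ripped from the start of the file
print(rip_pages(stream, start=2)[3])   # jumped straight to page 3
```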

However, to go beyond the limits of escape sequence processing as well as to make better resource handling and error handling an integral part of the architecture, the concept of objects was added to several print stream architectures. AFP and PDF added the concept of a page, a document, tables, and other components that not only allowed the RIP to detect errors in the print stream, but also encouraged the reuse of content in the form of images, fonts, and text fragments.

AFP has a detailed object-oriented architecture in which any piece of content can be placed in a container (object) of some kind. One way in which AFP encouraged parallel RIP processing was to create an Active Environment Group for each page, even if it is identical to that of every other page. For the most part, each page can be extracted from the print file and ripped independently. However, even with AFP, because of its long history, there are commands that, if used, ruin the ability to do parallel processing.
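By contrast, when every page carries its own environment, in the spirit of a per-page Active Environment Group, pages no longer depend on one another and can be ripped in any order or concurrently. In this hypothetical sketch the environment is a small tuple on each page and a thread pool stands in for a RIP's multiple processors; real AFP environment groups are collections of structured fields, not Python objects.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class SelfContainedPage:
    environment: Tuple[Tuple[str, str], ...]   # repeated on every page, even if identical
    texts: Tuple[str, ...]

def rip_page(page: SelfContainedPage) -> List[str]:
    """Rip one page using only the environment that the page itself carries."""
    env = dict(page.environment)
    return [f"{t} [{env.get('font', 'Courier')}]" for t in page.texts]

pages = [SelfContainedPage((("font", "Helvetica"),), (f"Page {n}",)) for n in range(1, 5)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(rip_page, pages))   # order of execution does not matter

print(results[2] == rip_page(pages[2]))         # page 3 alone gives the same result
```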

PDF also has an object-oriented format. The benefit of this architecture could be seen in an early enhancement of PDF which enabled PDF files to display the first page while the file was still being downloaded. Prior versions of PDF had internal object tables at the end of the PDF file, which meant that the entire file had to be downloaded to begin processing. When the Windows PDF driver was updated to be multi-pass (“optimized” in Adobe’s terminology) so that these tables could be placed at the beginning of the PDF file rather than at the end, it became possible to process and display the first page while the rest of the file was still downloading.
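The “internal object tables” are the PDF cross-reference table, which a conventional PDF locates through a `startxref` entry at the very end of the file. The sketch below reads that offset from the tail of a file and applies a rough check for linearization (Acrobat's “Fast Web View”), which places a `/Linearized` dictionary in the first object. Both functions are hypothetical helpers using simple byte searches, not a full PDF parser; compressed cross-reference streams and other details are ignored.

```python
import re
from typing import Optional

def xref_offset(pdf: bytes) -> Optional[int]:
    """Return the byte offset named by the last 'startxref' near the end of the file.

    A reader of a conventional PDF starts here, at the end of the file, which
    is why the whole file had to arrive before processing could begin.
    """
    tail = pdf[-1024:]                  # the trailer sits in the last bytes of the file
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    return int(matches[-1]) if matches else None

def looks_linearized(pdf: bytes) -> bool:
    """Heuristic: a linearized file declares /Linearized in its first object."""
    return b"/Linearized" in pdf[:1024]

sample = (b"%PDF-1.4\n% objects omitted\n"
          b"xref\n0 1\n0000000000 65535 f \n"
          b"trailer\n<< /Size 1 >>\nstartxref\n27\n%%EOF\n")
print(xref_offset(sample), looks_linearized(sample))
```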

PostScript is an exception to everything written above. PostScript is actually a programming language with if and loop statements. Although PostScript was not intended to be a general purpose programming language, it is humorous to think of your company’s accounting being done by your PostScript printer’s RIP.

Common Print Streams Today

AFP

AFP and its related IPDS were created by IBM in the early 1980s as a robust, state-oriented and object-oriented print stream architecture that could efficiently drive many different models of high-volume electronic printers. The acronym AFP initially stood for Advanced Function Printing (in 1980), but was changed to Advanced Function Presentation by 1990 to acknowledge that AFP could also be rendered on a monitor.

AFP is publicly documented while IPDS was not at first because it was intended to be used only by IBM products (the PSF driver and the Common Control Unit on the printer). The open nature of AFP had two advantages in the marketplace: (1) It alleviated the concerns that companies had for a proprietary print stream like Xerox Metacode, and (2) other companies were able to build products that built or manipulated AFP print streams, which increased the size of the market. IPDS was later made open as other companies signed licenses to build parts of the AFP/IPDS print environment alongside IBM.

AFP is state-oriented as well as object-oriented. State-oriented means that certain AFP structures must or may be followed by other structures. For example, the IMM structured field (Invoke Medium Map) may appear in a document object between the EPG (End Page) of the previous page and the BPG (Begin Page) of the next page.

Finding the IMM structured field in any other location in the AFP document would trigger an automatic error.
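The rule can be checked mechanically. In AFP (MO:DCA) data, each structured field starts with X'5A' followed by a 2-byte length (covering the 8-byte introducer and the data, but not the X'5A'), a 3-byte identifier, a flag byte, and two reserved bytes. The identifiers below (for example X'D3A8AF' for BPG and X'D3ABCC' for IMM) are, to the best of our knowledge, the standard codes, but the helpers are a simplified sketch: they ignore introducer extensions and padding, and they only check that no IMM appears inside a page.

```python
import struct
from typing import Iterator, List

NAMES = {
    bytes.fromhex("D3A8A8"): "BDT", bytes.fromhex("D3A9A8"): "EDT",
    bytes.fromhex("D3A8AF"): "BPG", bytes.fromhex("D3A9AF"): "EPG",
    bytes.fromhex("D3ABCC"): "IMM", bytes.fromhex("D3EE9B"): "PTX",
}
IDS = {name: sfid for sfid, name in NAMES.items()}

def make_sf(name: str, data: bytes = b"") -> bytes:
    """Build a structured field: X'5A', length, 3-byte ID, flags, 2 reserved bytes."""
    length = 8 + len(data)                      # introducer + data; excludes X'5A'
    return b"\x5a" + struct.pack(">H", length) + IDS[name] + b"\x00\x00\x00" + data

def structured_fields(stream: bytes) -> Iterator[str]:
    """Yield the name (or hex ID) of each structured field in the stream."""
    i = 0
    while i < len(stream):
        if stream[i] != 0x5A:
            raise ValueError(f"expected X'5A' at offset {i}")
        (length,) = struct.unpack_from(">H", stream, i + 1)
        sfid = stream[i + 3:i + 6]
        yield NAMES.get(sfid, sfid.hex().upper())
        i += 1 + length                         # skip X'5A' plus the counted bytes

def check_imm_placement(names: List[str]) -> None:
    """Raise if an IMM appears inside a page (between BPG and EPG)."""
    in_page = False
    for position, name in enumerate(names):
        if name == "BPG":
            in_page = True
        elif name == "EPG":
            in_page = False
        elif name == "IMM" and in_page:
            raise ValueError(f"IMM at structured field {position} is inside a page")

doc = b"".join([make_sf("BDT"), make_sf("IMM", "MAP00001".encode("cp037")),
                make_sf("BPG"), make_sf("PTX", b"\x00"), make_sf("EPG"), make_sf("EDT")])
names = list(structured_fields(doc))
check_imm_placement(names)                      # passes: IMM precedes the page
print(names)
```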

Object-oriented means that the fundamental AFP element of the structured field can be organized into objects such as page segments, pages, active environment groups, image blocks, resource groups, font character sets, and code pages. Structured fields can be saved or manipulated in groups rather than always individually, resulting in a high level of reusability.

In 2004, IBM created the AFP Color Consortium to help expand AFP's color support. By 2006, the entire AFP architecture had been transferred to the Consortium, which was renamed and incorporated as an independent non-profit called the AFP Consortium. The AFP Consortium, consisting of 35 vendors who create, manipulate, or ingest AFPDS or IPDS, is now an independent standards body.

Xerox Metacode

In 1977, Xerox Corporation released the Xerox 9700, a high-speed, cut sheet, duplex laser printer that was suited for the production of documents for the office. Xerox combined their high quality copier marking engines with a computer front-end that enabled a great deal of control over the print environment. Indeed, its capabilities far outpaced those of the contemporary IBM 6670 and 3800 Mod I printers, which were slow (the 6670) or intended to be a replacement for system impact printers (the 3800 Mod I).

The first 9700s (and the entire family) handled line data and were either attached to IBM mainframes emulating an IBM 3211 impact printer or were standalone units that read the print stream from a magnetic tape drive. This latter feature enabled the 9700 and its kind to accept print streams from a large variety of computers, such as Digital, VAX, Honeywell, and so on. The line printer RIP was powerful in that the computer front end to the 9700 enabled the storage of hundreds of unique overlays, fonts, images, and logos. In addition, the 9700 had tables called JDLs (Job Descriptor Language) that defined the order of the fonts, the page size, the orientation, how the incoming print record would be parsed, the number of copies, unique changes per copy, and an almost bewildering array of functions. Furthermore, DJDE (Dynamic Job Descriptor Entry) capability was added so that the print stream could, by means of an in-stream DJDE command, alter most of the features of the print environment on-the-fly.

As powerful as the LCDS environment (as the line-mode environment was usually called) was, it could not achieve all-points addressability. Xerox therefore developed a mechanism to access the underlying metacodes, the commands by which the printer's computer sent data directly to the marking engine. These metacodes were simple (X and Y address, font choice, orientation, and relative movement), yet when generated by the right software they could quickly print high-quality business documents. Line-mode processing imposed enough overhead on the on-board computer to reduce throughput below the 9700's rated 120 impressions per minute, whereas the same print job in Metacode normally ran at rated speed.
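The overhead difference comes from who computes positions. The sketch below shows the work a line-mode RIP must do for every record: it turns ANSI carriage control characters (`1` new page, space single space, `0` double space, `-` triple space, `+` overprint) into explicit page positions. A Metacode-style stream arrives with those positions already computed by the host. The `Placement` tuple, the device units, the margins, and the fixed line pitch are hypothetical. The sketch does not reproduce the actual Metacode byte encoding, and it omits page-length overflow handling.

```python
from typing import NamedTuple

class Placement(NamedTuple):
    page: int
    x: int      # horizontal position, in hypothetical device units
    y: int      # vertical position, in hypothetical device units
    font: str
    text: str

# ANSI carriage control: how many lines to advance before printing the record.
ADVANCE = {" ": 1, "0": 2, "-": 3, "+": 0}

def place_line_data(records, line_pitch=50, left=100, top=100, font="F1"):
    """Convert line-mode records into explicit placements, as a line-mode RIP must."""
    placements = []
    page, line = 1, None              # line stays None until something prints
    for rec in records:
        cc, text = rec[:1], rec[1:]
        if cc == "1":                 # skip to channel 1: start at the top of a new page
            if line is not None:
                page += 1
            line = 0
        elif line is None:            # first record of the job
            line = 0
        else:
            line += ADVANCE.get(cc, 1)
        placements.append(Placement(page, left, top + line * line_pitch, font, text))
    return placements
```

In a host-composed stream, the list returned here is essentially what the host sends, so the printer's computer only has to execute the placements.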

The Xerox Metacode environment succeeded in business because a number of companies, both allied with and independent of Xerox, created software capable of producing high-quality Metacode print streams. These tools, together with the fact that the typical business document did not need the complexity of a graphic arts document, led to the widespread adoption of the Xerox 9700 line of printers as the de facto standard device in the US and Canadian insurance market, the largest market for these applications. Metacode has been around since 1977 and is functionally obsolete (nor is it really an “architecture” in the data processing sense). Even so, its applications are of such quality that hundreds of companies still use Metacode printers and Metacode-producing software. Until the advent of color, there was no business document requirement (as opposed to marketing documents) that the Metacode print stream could not handle as well as the more modern AFP or PostScript print streams.

HP PCL

In the 1970s and 1980s, a number of companies developed their own tabletop laser printing engines. Some of these engines were quite sophisticated and capable of full color, even in the early 1980s (e.g., QMS and its QUIC language). As there was no standard, each company developed its own print stream command set for its devices.

Although the PostScript language could be printed on tabletop devices, its licensing fees and the slow processing caused by the overhead of ripping the language kept it from gaining widespread acceptance for printing business documents, whose formatting needs were relatively simple.

PCL (Printer Command Language), introduced by Hewlett-Packard, was by contrast simpler to code. Furthermore, as an escape sequence language, it made limited demands on the printer's RIP, because most of the formatting had already been done on the host: the printer simply read the print stream and executed simple commands such as X/Y positioning, font selection, and laying down a bitmap. Over time the PCL command set grew considerably, making PCL not only all-points addressable but also able to describe any sort of business document in black and white or color.
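A minimal sketch of what an escape sequence print stream looks like, assuming PCL 5 conventions: `ESC E` resets the printer, `ESC &l0O` selects portrait orientation, `ESC (s0p12h10v3T` selects a fixed-pitch 12 cpi, 10-point Courier font, and `ESC *p#x#Y` moves the cursor in PCL units (1/300 inch by default). The function name `pcl_page` is hypothetical. The host computes every position, so the printer only executes simple commands.

```python
ESC = b"\x1b"

def pcl_page(lines) -> bytes:
    """Build a one-page PCL 5 job placing each (x, y, text) at explicit positions."""
    out = bytearray()
    out += ESC + b"E"                        # reset the printer to its defaults
    out += ESC + b"&l0O"                     # portrait orientation
    out += ESC + b"(s0p12h10v3T"             # fixed spacing, 12 cpi, 10 pt, Courier
    for x, y, text in lines:
        out += ESC + b"*p%dx%dY" % (x, y)    # cursor to (x, y) in 1/300-inch PCL units
        out += text.encode("ascii")          # the text itself is sent as plain bytes
    out += b"\x0c"                           # form feed ejects the page
    out += ESC + b"E"                        # reset again at the end of the job
    return bytes(out)
```

For example, `pcl_page([(300, 300, "Invoice"), (300, 360, "Total: 42.00")])` places two lines one inch from the top-left corner of the logical page. Combined sequences such as `*p300x300Y` use a lowercase letter for every parameter except the last, which is how PCL keeps its streams compact.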

In addition, the success of the Windows operating system, together with the free Windows PCL print driver, enabled every Windows application to print documents on a PCL printer.

PCL is the most widely used escape sequence print stream and is supported by hundreds of vendors. However, it has not enjoyed AFP's success in the high-volume transaction document world because it lacks the architectural robustness of AFP, which was designed for efficiency and good error recovery in high-volume environments.

PostScript®

PostScript[1] is a programming language rather than a print stream; however, because it is not a general-purpose programming language but one developed specifically for printing, it is included here with print streams.

Because PostScript is a programming language, it can describe extremely complex graphic images in a relatively small amount of code. This was a positive for the graphic arts market, which wanted a powerful tool for building complex images, but a negative for the business world, which needed speed more than complexity. Some early PostScript printers intended for the transaction document world took up to twenty minutes to parse and render a PostScript file before printing a single page.
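The trade-off can be seen in a small sketch that generates the same drawing two ways. One version uses a PostScript `for` loop, whose size barely changes as the drawing grows. The other writes every circle out explicitly. The page centre (306, 396 points, the middle of a US Letter page) and the function names are illustrative assumptions. The compact version is only compact because the printer must execute the loop, which is where the RIP time goes.

```python
def circles_loop(n: int) -> str:
    """PostScript that draws n concentric circles with a single 'for' loop."""
    return (
        "%!PS\n"
        "/cx 306 def /cy 396 def\n"            # centre of a US Letter page, in points
        f"1 1 {n} {{\n"                        # loop i = 1 .. n; i is pushed each pass
        "  10 mul /r exch def\n"                # radius r = 10 * i
        "  newpath cx cy r 0 360 arc stroke\n"  # full circle, stroked
        "} for\n"
        "showpage\n"
    )

def circles_unrolled(n: int) -> str:
    """The same drawing with every circle written out explicitly."""
    body = "".join(
        f"newpath 306 396 {10 * i} 0 360 arc stroke\n" for i in range(1, n + 1)
    )
    return "%!PS\n" + body + "showpage\n"
```

For 100 circles, the loop version stays at a few lines while the unrolled version grows to 100 drawing commands.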

PostScript was developed by John Warnock and Chuck Geschke, who had both worked on the Interpress language at Xerox PARC (Palo Alto Research Center) and who left when Xerox failed to turn Interpress into a successful commercial product. At first glance the command sets of Interpress and PostScript correspond closely, but PostScript was developed for the rapidly growing world of personal computing, so its initial implementation used only printable text characters and could therefore be transmitted over any type of data line.
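The practical meaning of "printable text characters only" can be shown with a simple check. Every byte must be printable 7-bit ASCII or ordinary tab/line-ending whitespace. A stream that passes can travel over links that mangle control characters or 8-bit data. The sample PostScript and PCL snippets and the function name are illustrative. This is a simplification, since later PostScript levels also allow binary data.

```python
def is_printable_text(data: bytes) -> bool:
    """True if every byte is printable 7-bit ASCII, tab, line feed, or carriage return."""
    allowed_controls = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
    return all(0x20 <= b <= 0x7E or b in allowed_controls for b in data)

# A small PostScript page: plain text from start to finish.
ps_sample = b"%!PS\n/Helvetica findfont 12 scalefont setfont\n72 720 moveto (Hello) show\nshowpage\n"

# A small PCL job: relies on ESC (0x1B) and form feed (0x0C) control bytes.
pcl_sample = b"\x1bE\x1b*p300x300YHello\x0c\x1bE"
```

The PostScript sample passes the check, while the escape sequence stream fails on its very first byte.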

Hardware and software eventually caught up with the CPU-intensive requirements of PostScript, making high-volume PostScript transaction document printers practical. When high-volume color inkjet printers arrived, PostScript was already in place to support them, although some performance constraints remained.

PostScript is still a proprietary print stream belonging to Adobe, and is the de facto standard for the graphic arts industry.

PDF

PDF (Portable Document Format) is a print stream architected to display anywhere its “reader” can run. PDF is a “print stream” because it is founded on the idea of a “page” of a certain size and orientation on which text and images are placed. Although PDF is normally viewed on an electronic display, it differs from HTML and XML because its emphasis on the page allows a PDF document to display and print as faithfully as possible, whatever platform is used. HTML, by contrast, cannot guarantee page parity or other print attributes.
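The page-centric design is visible in a minimal PDF file. The sketch below builds a one-page document by hand, assuming US Letter (612 x 792 points) unless another size is given. The page dictionary's `/MediaBox` fixes the page size, and the text is placed at an absolute position on it. The font is Helvetica, one of the Base 14 fonts, so it is referenced by name rather than embedded. The function name `minimal_pdf` is hypothetical, and the text is assumed to be Latin-1.

```python
def minimal_pdf(text: str, width: int = 612, height: int = 792) -> bytes:
    """Build a one-page PDF whose page size is fixed by /MediaBox (in points)."""
    # Escape the characters that are special inside a PDF literal string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 12 Tf 72 {height - 72} Td ({safe}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         "/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>").encode("ascii"),
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",  # Base 14, not embedded
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed by the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_pos)
    return bytes(out)
```

Calling `minimal_pdf("Hello", 595, 842)` produces the same page on A4 instead; every conforming reader lays the text out against the same declared page.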

PDF achieves this portability because the RIP is not in a printer but in software, Adobe Acrobat Reader, which Adobe has developed for most platforms.

PDF is based on PostScript; however, it is not a programming language, because it has been simplified to remove programming features such as conditional processing, loops, and procedure definitions.
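One consequence is that repetition must be expanded before it reaches the PDF. Where PostScript could draw nested squares with a loop such as `1 1 20 { 10 mul dup 72 72 4 2 roll rectstroke } for`, a PDF content stream must list each rectangle (`x y width height re`) and stroke it (`S`) explicitly. The sketch below performs that expansion in the producing program. The function name and default origin are illustrative assumptions.

```python
def pdf_squares(n: int, step: float = 10.0, origin=(72.0, 72.0)) -> str:
    """Emit PDF content stream operators for n nested squares.

    PDF has no loop or conditional operators, so the loop runs here, in the
    producing program, and the content stream receives the expanded commands.
    """
    x0, y0 = origin
    ops = []
    for i in range(1, n + 1):
        side = step * i
        ops.append(f"{x0:g} {y0:g} {side:g} {side:g} re S")  # rectangle, then stroke
    return "\n".join(ops)
```

For example, `pdf_squares(3)` returns three lines, `72 72 10 10 re S` through `72 72 30 30 re S`. A PDF reader therefore only interprets a flat list of drawing operators, which makes rendering predictable.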

PDF 1.7 was made into an ISO standard, ISO 32000-1:2008. PDF proved to be such a useful platform that a number of specialized versions of PDF have been developed. The versions of interest to the transactional document space are:

  • PDF/A - ISO 19005: Designed for highly reliable long-term archiving.
  • PDF/VT - ISO 16612-2: Designed for “variable data and transactional printing”, although it is not widely used in the transaction document space.
  • PDF/UA - ISO 14289-1: Designed for universal access (i.e., for people with disabilities who need assistive technologies to comprehend documents in a variety of formats).
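A PDF file typically declares which of these subsets it claims to follow in its XMP metadata. PDF/A uses the `pdfaid:part` property and PDF/UA uses `pdfuaid:part`. The naive sketch below scans raw bytes for those declarations in either attribute or element form. It only detects a claim and does not validate conformance. It also assumes the metadata stream is stored uncompressed, and it does not cover PDF/VT. The function name is hypothetical.

```python
import re

# XMP properties that declare conformance to a PDF subset.
# pdfaid namespace: http://www.aiim.org/pdfa/ns/id/
# pdfuaid namespace: http://www.aiim.org/pdfua/ns/id/
CLAIM_PATTERNS = {
    "PDF/A": re.compile(rb'pdfaid:part(?:="|>)\s*(\d+)'),
    "PDF/UA": re.compile(rb'pdfuaid:part(?:="|>)\s*(\d+)'),
}

def declared_subsets(pdf_bytes: bytes) -> dict:
    """Return {subset name: part number} for each subset the file claims."""
    found = {}
    for name, pattern in CLAIM_PATTERNS.items():
        match = pattern.search(pdf_bytes)
        if match:
            found[name] = int(match.group(1))
    return found
```

A file whose metadata contains `pdfaid:part="2"` is reported as claiming PDF/A part 2. A real validator would then have to check the file against every rule of that part.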

Note that although Adobe has handed PDF over to ISO (working through organizations such as AIIM), Adobe continues to enhance Acrobat Reader, which can make it appear that the print stream definition itself is being enhanced.

Note that most of these PDF subsets do not add new PDF commands (“tags”). Instead, they may require PDF commands to be used in a certain order, make mandatory certain things that are normally optional, and require additional tags that carry information for use in readers other than Acrobat Reader. For example, the PDF/A definition prohibits certain PDF features, such as JavaScript, and requires that all fonts and other external objects be embedded in the PDF file (base PDF does not require that the Base 14 fonts be embedded in the file).
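The font and JavaScript rules just described can be expressed as a check over an already-parsed document. The sketch works on a hypothetical in-memory model (`FontRef`, `DocModel`) rather than on real PDF bytes, and it covers only these two rules, not full PDF/A validation. Under base PDF it issues a portability warning only for non-embedded fonts outside the Base 14, since those depend on the reader's font substitution. Under PDF/A every non-embedded font and any JavaScript is reported.

```python
from dataclasses import dataclass, field
from typing import List

# The 14 standard fonts that base PDF allows a file to reference without embedding.
BASE14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}

@dataclass
class FontRef:
    base_font: str
    embedded: bool  # True if the font program itself is stored in the file

@dataclass
class DocModel:  # hypothetical, already-parsed view of a PDF document
    fonts: List[FontRef] = field(default_factory=list)
    has_javascript: bool = False

def problems(doc: DocModel, pdfa: bool) -> List[str]:
    """List font/JavaScript issues under base PDF (pdfa=False) or PDF/A (pdfa=True)."""
    issues = []
    for font in doc.fonts:
        if font.embedded:
            continue
        if pdfa:
            issues.append(f"font {font.base_font} is not embedded (required by PDF/A)")
        elif font.base_font not in BASE14:
            issues.append(f"font {font.base_font} is not embedded; reader may substitute")
    if pdfa and doc.has_javascript:
        issues.append("JavaScript is not permitted in PDF/A")
    return issues
```

A document that references Helvetica without embedding it and contains JavaScript passes as ordinary PDF but produces two issues when checked against the PDF/A rules.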

References

  1. PostScript® is a Registered Trademark of Adobe Systems Incorporated