Document Composition

From Xplor Wiki
EDBOK Guide
Body of Knowledge
Document Production Workflow
Lifecycle Category
Document Composition
Content Contributor(s)
Carol Fiore, Roberta McKee-Jackson edp
Original Publication
August 2014
Copyright
© 2014 by Xplor International
Content License
CC BY-NC-ND 4.0

What is Document Composition?

Composition involves processing all components involved in creating a document – raw data, document design criteria, business rules, static messaging, dynamic messaging, barcodes, postal sortation, rendering requirements – into formatted documents for delivery in print or via the wide variety of digital channels and media.

The composition step is at the core of the document production workflow and serves several purposes:

  • It interprets formatting commands interleaved with text and graphics to create composed pages and documents.
  • During the formatting process the composition engine may also be capable of interpreting business rules to build dynamic graphics, assign marketing messages, replace variable data fields with live data, insert customer support messages, and add compliance messages.
  • It emits composed pages into one or more presentation streams suitable for print, archive and/or electronic delivery.
  • It provides document integrity and workflow control data for managing downstream processing.
    • The downstream process management can be accomplished by various marks on the documents (barcodes etc.) and/or by control files that describe the presentation file generated in terms of documents, pages, inserts for each document, mail-packet weight, addressee, and other metadata.
  • It can produce MIS reporting data on document content for CRM and effectiveness measurement as well as to meet audit requirements.

A well-constructed composition application, in conjunction with a quality document development project and appropriate document objects, yields well-formatted documents that process efficiently during printing, insert reliably, and ensure high-integrity end-to-end document manufacturing. Whether the documents are output to print streams or to display devices, quality composition facilitates faster, more efficient processing.

Composition History and Trends

Computer-based document composition has evolved along with computer technology, programming, and processing capabilities.

The earliest composition applications were developed when most printers were impact printers that could handle only line-based presentation. These applications used a variety of computer programming languages to insert formatting commands. Print files were limited to the typeface of the print chain on the impact printer, and changing fonts involved manually changing the print train in the printer. The fonts had fixed horizontal and vertical positioning characteristics.

The combination of limited-function impact printers and general-purpose programming languages resulted in limited capabilities for composition. A wide variety of these legacy applications remain today simply because there is still a need for the application to run. These legacy applications are costly and difficult to manage because of the usual consequences of age: lost documentation, original programmers who are no longer available, and even lost source code.

The next generation of formatters evolved separately on mainframes and servers. A program called RUNOFF emerged from MIT and was later modified at Bell Labs; it is still commonly found as TROFF and newer variations on UNIX systems. More common in the enterprise world was the mainframe SCRIPT program introduced by IBM as a method of adding formatting Control Words, commonly known as dot commands. Files marked up with control words were processed by a compiler to produce a formatted document.

SCRIPT has many variations, including Waterloo SCRIPT and a variety of PC-based programs designed to allow mainframe document files to move to the desktop computing environment. The basic markup language included options for changing fonts, adding line spacing, identifying headings of multiple levels in a hierarchy, and adding index and Table of Contents entries.

Tag-based languages followed Control Word-based languages in the 1970s with the introduction of IBM’s Generalized Markup Language (GML). GML used intent-based markup so that the document markup could be maintained separately from the physical formatting rules. Instead of using Control Words to skip a line and indent, a macro was created with the desired formatting commands and then mapped to a tag, indicated with a leading colon or other tag indicator. As formatting requirements change, the formatting can be changed in the macro without changing the document markup. The IBM GML product was marketed as the Document Composition Facility (DCF) and became the basis for the international standard SGML (Standard Generalized Markup Language), HTML (HyperText Markup Language), and the host of XML (eXtensible Markup Language) variations in the market today.

Some early composition engines focused on reading and manipulating data and placing results onto predefined locations on a page – often into boxes or spaces on pre-printed business forms. Many pre-printed forms later became static electronic forms called by the printer through commands in the print stream. That evolved into data merging with electronic forms, ultimately with conditional processing to permit formatting variations.

In the early 1980s, as the adoption of xerographic technology-based all-points-addressable printers (e.g. Xerox 9700 and IBM 3800) progressed, visual design came into play. Instead of the typewriter approach, printers were able to control each dot on the page as printer controllers became more powerful and added memory.

The Xerox Approach

On the earliest Xerox systems electronic forms were created on the printer controller, which took valuable time away from the intended usage of high speed production print. Intran Systems developed proprietary workstations using WYSIWYG (What You See Is What You Get) technology to create electronic forms which were then compiled to run on the printers. Intran created the electronic forms (.FRM), fonts (.FNT), logos (.LGO), and images (.IMG) supported on Xerox printers. Later Elixir, Tyrego, and Lytrod developed similar PC-based software to create the same type of resources.

The Xerox Integrated Composition Systems (XICS) emerged from Xerox, built on the base CompuSet language, to serve the full page/full document composition requirements that emerged. Xerox subsidiary Document Sciences took the XICS/CompuSet products through a series of enhancements, creating Monogram and Autograph to support the development of documents. Document Sciences, as an independent company, created its follow-on product called Xpression, an XML-based tool, and provided migration paths to the new platform. Many XICS/CompuSet installations are still in production even though the product is no longer supported by Document Sciences, now owned by EMC.

The IBM Approach

The IBM mainframe approach led to the development of a set of products designed to create electronic forms (OGL: Overlay Generation Language) and to add formatting to line data files (PPFA: Page Printer Formatting Aid) within the mainframe environment. The development of the AFP architecture (originally Advanced Function Printing, later Advanced Function Presentation) to generate the output to drive IBM printers created an environment for creating and managing documents, electronic forms and graphic resources. Print Service Facility (PSF) consumed AFP files to create device-dependent print files targeted to specific IBM print devices.

Other Composition Vendors

A new generation of composition products followed that could exploit graphic capabilities. Many included interactive design tools that allowed for the inclusion of objects anywhere on the output page, and provided an accurate rendering of what the page would look like. Products like Custom Statement Formatter (CSF), Lynx (now Pitney Bowes Software DOC1), and EZ-Letter found niches in the enterprise transaction market. Many customers use them today.

Another set of products began to enter the market during the 1980s to provide support for merging variable data and pre-composed document templates. Some were specific to an industry, including DocuMerge (Image Sciences, now Oracle) in the insurance and policy issue industry. Products of this type required sophisticated workflows, command of the document composition process, and an understanding of the final output stream formats.

Later programs emerged that transformed final format document print files to different formats and provided document re-engineering capabilities. This crosses over to the document composition environment, but is covered in the article on transform products.

Color

Use of color grew as two technologies converged. The first was the availability of color printers capable of printing at sufficient speed and low enough cost to make them worth considering for enterprise printing. The second was the rise of the internet and web-based information presentation. The availability of color gave rise to concerns about overusing color in business documents. The addition of color required additional training on the part of document designers and the addition of tools to generate color graphics and manage device-specific color requirements.

Post-Processing Functions

As the development of mail packages became more sophisticated the composition software added capabilities to add barcodes and other marks to be used in post-print processes during finishing and mailing. Most post-processing functions are based on the interpretation of business rules by the composition engine. Rules-based processing today is used to manage the insertion of variable data, the selection of images for inclusion in the document, the selection of marketing messaging and the management of excess available white space in the document.

Document Types

Composition today can be as simple as interpreting basic formatting commands or as complex as interpreting formatting commands and business rules. The most sophisticated programs may be linked to Customer Communication Management systems. Basic business documents generally fall into these categories:

  • Text-based documents such as letters, notices, policies, proposals, documentation, and instructions.
  • Transaction documents that communicate business content including account status, changes, ordering and services details such as bills, account statements, invoices, explanations of benefits, and bills of lading.
  • Promotional documents that typically have more graphical content to focus on sales and increasing business such as offers, coupons, and brochures.

Another way to look at composition focuses on the application(s) used to create the document. In this view the completed document could fall into any category:

  • Batch, high volume documents (often transactional).
  • Interactive where a user selects a document template and responds to prompts to determine contents of a document, such as claims correspondence.
  • On-Demand where the document application runs on a server after a user specifies data that determines the content, such as a price quote for insurance.

Composition Processes

The composition step is one component of the workflow to create customer communications. Within composition, there is also a workflow, whether in a new application or a batch job that runs on a regularly-scheduled cycle.

Assembly of Data and Document Objects

For composition in support of formatting transaction documents there are several steps in the workflow. It begins with the assembly of data extracts and document objects. For batch jobs on regular schedules there is generally a trigger to automatically set the process in motion to extract, transform, and load (ETL) the required data and document objects. For documents such as correspondence, which may be personalized interactively (for example by a customer service rep), data and objects may be obtained in real time.

Data extract files, often held on the mainframe or other client server, are readied for use. These extract files may include customer account information, applicable transaction records, related taxes and fees files including current updates to any taxes being applied, marketing content triggers, and external denominator files, such as unit trust market prices or other data that will affect the customer document being produced.

All required document objects should also be available from the document object libraries. These include fonts, images, any dynamic messages to be incorporated in the communication, and any static text objects (e.g., terms and conditions).

Business Rules

During the development of the application, business rules are created to ensure the consistent configuration of the document whenever it is recreated. The rules are based on the type of document, its purpose, the method of delivery, and its intended usage. They are used to set up a template for the layout of the document. The template identifies where to place each text block on the page, including its absolute or relative positioning. It also identifies where graphics and images will be placed in the document and whether they will be repeated within the document. An example might be a corporate logo placed on each page in a multi-page document. A repeated graphic or image may be referred to as a hard object.

Other business rules are conditional; they provide guidance for what to do when certain conditions exist. Examples of conditional rules include language preferences, customer preference for electronic delivery or printed delivery, or both, regional conditions, special needs or accessibility requirements, and page threshold requirements in a particular batch process.
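The conditional rules described above can be pictured as a small set of checks evaluated for each customer record. The following is a minimal sketch only, not any vendor's rule engine; the `Customer` fields and the `select_outputs` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical attributes supplied in the data extract.
    language: str = "en"
    delivery_preference: str = "print"  # "print", "electronic", or "both"
    large_print: bool = False


def select_outputs(customer: Customer) -> dict:
    """Apply conditional rules and return the composition choices."""
    channels = {
        "print": ["print"],
        "electronic": ["electronic"],
        "both": ["print", "electronic"],
    }[customer.delivery_preference]
    return {
        # Language rule: pick the template variant for the preferred language.
        "template_language": customer.language,
        # Delivery rule: one or both output channels.
        "channels": channels,
        # Accessibility rule: switch layout when large print is requested.
        "layout": "large_print" if customer.large_print else "standard",
    }


print(select_outputs(Customer(language="es", delivery_preference="both")))
```

In a production system such rules would typically live in a repository outside the application code so that business users can maintain them.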

Business rules also provide instructions on how to break tables at the end of pages or how to carry table data to the following page if a break is required. There are also rules on how to force page breaks to avoid widows (a paragraph-ending line that appears at the beginning of the following page) and orphans (either the opening line of a paragraph or heading that appears at the bottom of the page by itself) in the page layout sequence. Business rules are also used to manage output options, including rules surrounding sortation for printing in mail delivery order, routing to digital mailboxes and other ePresentment options, as well as to archives.

Calculations

As the data and document objects are being parsed through the composition system there may be calculations to generate values to be used during composition and in the reporting process. The number of pages in the document is a common calculation and takes into account the rules for sortation and separation of the document based on the expected print device and format for printing. For example, printing two-up duplex may require insertion of blank pages at the end of Customer A’s statement to avoid printing any of Customer B’s data on Customer A’s statement.
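The blank-page calculation can be sketched as follows. This example assumes that every logical page on a sheet must belong to the same mail-piece, so each document is padded to a whole number of sheets; the function name `padding_pages` and its parameters are illustrative, not taken from any product.

```python
def padding_pages(page_count: int, n_up: int = 2, duplex: bool = True) -> int:
    """Return the blank logical pages needed to fill the last sheet.

    Assumes each sheet carries n_up logical pages per side and that the
    next customer's document must start on a fresh sheet.
    """
    pages_per_sheet = n_up * (2 if duplex else 1)
    # Python's modulo of a negative number gives the distance to the
    # next multiple of pages_per_sheet (0 when already aligned).
    return -page_count % pages_per_sheet


# A 5-page statement printed two-up duplex (4 logical pages per sheet)
# needs 3 blank pages so that Customer B starts on a new sheet.
print(padding_pages(5))
```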

The composition system also calculates page numbers within each mail-piece as well as the total number of impressions in a print run. Impressions are the number of images printed as opposed to the number of physical pieces of paper printed.
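The distinction between impressions and sheets can be illustrated with a short calculation. This is a sketch under one assumption that should be checked against local practice: only printed sides are counted as impressions, so a blank reverse side adds no impression. The function name is hypothetical.

```python
import math


def impressions_and_sheets(logical_pages: int, n_up: int = 1,
                           duplex: bool = True) -> tuple:
    """Return (impressions, sheets) for one document."""
    # Each impression is one printed side holding up to n_up logical pages.
    impressions = math.ceil(logical_pages / n_up)
    # Duplex printing places two impressions on each sheet of paper.
    sheets = math.ceil(impressions / 2) if duplex else impressions
    return impressions, sheets


print(impressions_and_sheets(5))           # one-up duplex
print(impressions_and_sheets(5, n_up=2))   # two-up duplex
```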

There are other calculations that many composition systems are able to perform that should be avoided. For example, do not calculate any legally relevant information in the customer document. Examples include column totals in a transaction document; the date of issue or due date; interest charges; or the yield on a stock portfolio. These types of data calculations should always be provided in the data stream extracted from the client data file.

An audit file is normally generated with each data file produced for the composition system. In that audit file you would expect to see the total number of customer records generated and perhaps the total number of transactions for the entire data set, among other audit criteria. Normal calculations include verification of the total number of customer records and that the total value of the transactions after composition matches the number in the original audit file. This type of calculation ensures the integrity of the composition process but is not calculating any legally relevant data for the customers.
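A reconciliation of this kind might look like the sketch below. The audit-file layout (`record_count`, `transaction_total`) and the `reconcile` function are assumptions made for illustration; real audit files vary by site. Note that the totals are summed only to verify integrity and are never printed on a customer document.

```python
from decimal import Decimal


def reconcile(audit: dict, composed_docs: list) -> list:
    """Return a list of discrepancies; an empty list means the batch reconciles."""
    problems = []
    if len(composed_docs) != audit["record_count"]:
        problems.append(
            f"record count {len(composed_docs)} != {audit['record_count']}"
        )
    # Decimal avoids binary floating-point rounding on monetary values.
    total = sum(
        (Decimal(d["transaction_total"]) for d in composed_docs), Decimal("0")
    )
    expected = Decimal(audit["transaction_total"])
    if total != expected:
        problems.append(f"transaction total {total} != {expected}")
    return problems


audit = {"record_count": 2, "transaction_total": "150.25"}
docs = [{"transaction_total": "100.00"}, {"transaction_total": "50.25"}]
print(reconcile(audit, docs))
```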

As a general rule, multiple passes are required to establish the optimum layout based upon the calculations, and to resolve such variables as “y” in “Page x of y” footers.

Dynamic generation of documents creates information that affects formatting, and sometimes alters the placement of text or an object elsewhere in the document. Widow and orphan control is an example. This is often referred to as “two-pass” processing.
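A simple illustration of two-pass processing is the "Page x of y" footer: the total page count is unknown until the content has been laid out once. The sketch below uses a fixed number of lines per page as a stand-in for real layout; `paginate` and `compose` are hypothetical helper names.

```python
def paginate(lines: list, lines_per_page: int) -> list:
    """Split the content into pages of at most lines_per_page lines."""
    return [
        lines[i:i + lines_per_page]
        for i in range(0, len(lines), lines_per_page)
    ]


def compose(lines: list, lines_per_page: int = 3) -> list:
    # Pass 1: lay out the content to learn the total page count ("y").
    pages = paginate(lines, lines_per_page)
    total = len(pages)
    # Pass 2: emit each page with its footer now that "y" is known.
    return [
        page + [f"Page {number} of {total}"]
        for number, page in enumerate(pages, start=1)
    ]


for page in compose(["a", "b", "c", "d"]):
    print(page)
```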

Layout

Document layout commands in the composition system give the instructions on all details of the document generation. The components of layout include:

  • Orientation: Documents are usually created in either portrait (tall) or landscape (wide) mode.
  • Beginning of page: First pixel in upper left hand corner of page which becomes the key reference point for all other objects on the page.
  • Logical page: A virtual page. The system can place two or more virtual pages into a print image, e.g. two logical pages on one physical page yields one impression.
  • Text blocks: Areas defined in the layout for placement of specific information, such as corporate contact details.
  • Absolute positioning: Placing an object or text at a specific point on the page. Absolute positioning is based on the beginning of page. This is often used in placing address blocks.
  • Relative positioning: Placing an object or text on the page based on the position of another object, such as the beginning or end of a table.
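The difference between absolute and relative positioning can be sketched as a small resolver that converts every block to coordinates measured from the beginning of page. This is an illustrative model only: the `Block` fields, the millimetre units, and the convention that a relative block is offset from the bottom-left corner of its anchor are all assumptions, and the sketch does not guard against circular references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    name: str
    x: float                 # horizontal offset in mm
    y: float                 # vertical offset in mm (down the page)
    height: float = 0.0
    relative_to: Optional[str] = None  # None means absolute positioning


def resolve(blocks: list) -> dict:
    """Return the absolute (x, y) of each block, keyed by block name."""
    by_name = {b.name: b for b in blocks}
    placed = {}

    def place(block: Block) -> tuple:
        if block.name not in placed:
            if block.relative_to is None:
                # Absolute: measured from the beginning of page.
                placed[block.name] = (block.x, block.y)
            else:
                # Relative: measured from the end of the anchor block.
                anchor = by_name[block.relative_to]
                ax, ay = place(anchor)
                placed[block.name] = (ax + block.x, ay + anchor.height + block.y)
        return placed[block.name]

    for block in blocks:
        place(block)
    return placed


layout = [
    Block("address", x=20.0, y=45.0, height=25.0),
    Block("table", x=15.0, y=100.0, height=60.0),
    Block("totals", x=0.0, y=5.0, relative_to="table"),
]
print(resolve(layout)["totals"])
```

If the table grows, only its height changes; the totals block follows automatically, which is the main reason relative positioning is used for variable-length content.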

Address Block

In a transaction document it is imperative that the address block fits properly in the envelope window (if window envelopes are in use) or that it can be read by scanners to match the address to the envelope. In window-envelope environments, if the address block cannot be read or is misaligned in the window, the mail-piece cannot be delivered.

Specific criteria are established by every postal authority regarding placement and format of addresses. The criteria define clear areas required within the window as well as font sizes, acceptable font styles, and maximum height and width when addresses are not in a window. These requirements enable mail-pieces to be read by the high-speed sorting equipment at the post office.

The address block may also contain additional postal information such as postal barcodes applied through the composition system.

It is important to test the position of the address block early in the development process. Many projects neglect to test the parameters of the address block early enough in the document design process, which can result in expensive delays or damages if the application goes live without the required testing.
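An early automated check of address block placement could be as simple as the sketch below. The rectangle model, the `fits_window` function, and the measurements are hypothetical; the actual window position and clear-zone values must come from the relevant postal authority and envelope specification, and a physical test with folded documents is still required because paper shifts inside the envelope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float    # mm from the left edge of the page
    top: float     # mm from the top edge of the page
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def fits_window(address: Rect, window: Rect, clearance: float) -> bool:
    """True if the address block sits inside the window with a clear zone."""
    return (
        address.left >= window.left + clearance
        and address.top >= window.top + clearance
        and address.right <= window.right - clearance
        and address.bottom <= window.bottom - clearance
    )


window = Rect(left=20, top=40, width=100, height=40)   # illustrative values
print(fits_window(Rect(25, 45, 80, 25), window, clearance=3))
print(fits_window(Rect(21, 45, 80, 25), window, clearance=3))
```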

Emitter

As part of the core composition process just described, interim, often proprietary, format files may be produced by the composition system. This is a step prior to creation of the actual print file. The interim format is tagged to support multiple output formats; the tags are generally XML-like to provide information on structure and layout. Sometimes the interim format produces specifications for that job which can then be processed by other applications to prepare the job for the emission step.

The data stream may be run through multiple emitters creating print files, electronic delivery files, special needs files, etc. and these processes may run concurrently.

At some stage in the overall process, the documents may need splitting and sorting, for example by delivery channel, special needs, or archive-only, and then those destined for print by postcode, mail-piece weight, and overall number of documents. This may occur upstream of composition, during composition, or downstream using file manipulation software. If it happens during the composition step, then it is often part of the emission processing.

The resultant interim files are then run through the appropriate emitters to produce the desired output files. Depending on the format being produced, separate layout rules may be required for each output type:

  • print files may emit as AFPDS, Adobe PostScript, Xerox Metacode, HP PCL, PDF or an XML variant,
  • special needs or special handling files may be produced as Braille Ready Format (BRF), large letter, or accessible PDF, and
  • electronic delivery may produce Adobe PDF, PDF/A, XML files, and in some cases image format files.
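The idea of running one interim file through several emitters can be sketched as a registry of output functions. This is purely illustrative: the interim format here is a list of dictionaries, and the two emitters produce plain text and a simple XML file as stand-ins for real AFP, PostScript, or PDF emitters, which are far more involved. All names are hypothetical.

```python
import xml.etree.ElementTree as ET


def emit_plain(pages: list) -> bytes:
    """Stand-in print emitter: pages separated by form feeds."""
    return "\f".join(page["text"] for page in pages).encode("utf-8")


def emit_xml(pages: list) -> bytes:
    """Stand-in electronic emitter: one <page> element per composed page."""
    root = ET.Element("document")
    for number, page in enumerate(pages, start=1):
        ET.SubElement(root, "page", number=str(number)).text = page["text"]
    return ET.tostring(root, encoding="utf-8")


# Registry mapping an output type to the emitter that produces it.
EMITTERS = {"plain": emit_plain, "xml": emit_xml}


def run_emitters(pages: list, targets: list) -> dict:
    """Run the same interim pages through each requested emitter."""
    return {target: EMITTERS[target](pages) for target in targets}


interim = [{"text": "Hello"}, {"text": "World"}]
outputs = run_emitters(interim, ["plain", "xml"])
print(outputs["plain"])
```

Because each emitter reads the same interim pages and shares no state, the emitters could also be run concurrently, as the text notes.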

Production and Audit

As part of the composition process, production and audit control details are created to verify the integrity of the batch process. These include:

  • Document Number: A unique identifier assigned to each mail-piece within the job.
  • Spill Code: A human-readable number for use in the event the (cut sheet) documents are spilled on the floor: an incremental number placed on each page in the job.
  • Page Number: Numbers each page plus total number of pages in the mail-piece.
  • Audit Sum: The sum of the closing balance in each document (mail-piece); checked against closing balance sum from the calculation stage to ensure all documents in the batch have been created.
  • Barcodes and files:
    • Postal: A four-state barcode such as the Intelligent Mail barcode, the POSTNET barcode that it is replacing, or PLANET (Postal Alpha Numeric Encoding Technique) code.
    • OMR: Earliest form of marking to indicate page count and end of document, and to control insert selection.
    • 2of5: Page sequence numbers.
    • 3of9: OMR fields plus document number.
    • 2D barcode: Contains considerably more information in a very condensed space.
    • Insert control file: Created automatically by composition system to provide list of documents with attributes which is sent to the server managing the insertion process.
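The document number, spill code, page number, and audit sum described above could be assigned in a single pass over the batch, as in this sketch. The input layout (a page count and a closing balance per document) and the `add_controls` function are assumptions for illustration.

```python
from decimal import Decimal


def add_controls(documents: list) -> tuple:
    """Return (per-page control records, audit sum) for a batch."""
    spill_code = 0
    audit_sum = Decimal("0")
    page_records = []
    for document_number, doc in enumerate(documents, start=1):
        # Audit sum: running total of the closing balances supplied in the data.
        audit_sum += Decimal(doc["closing_balance"])
        for page in range(1, doc["pages"] + 1):
            # Spill code: increments on every page across the whole job.
            spill_code += 1
            page_records.append({
                "document_number": document_number,
                "spill_code": spill_code,
                "page": page,
                "of": doc["pages"],
            })
    return page_records, audit_sum


batch = [
    {"pages": 2, "closing_balance": "10.50"},
    {"pages": 1, "closing_balance": "4.50"},
]
records, total = add_controls(batch)
print(records[-1], total)
```

The audit sum returned here would then be compared with the figure supplied alongside the input data to confirm that every document in the batch was created.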

Insert Control File

During the emitter phase of the workflow in composition, an insert control file (ICF) can be constructed by the composition software. This insert control file is of particular importance in the downstream processing of the print file. The ICF provides the insertion process with the following information:

  • Mail-piece identifier, e.g. CBC (Customer Bar Code) in the mail-piece field,
  • Number of pages for each mail-piece,
  • Weight of each mail-piece taking into account number and weight of any intended inserts, the weight of the document based on the number of pages, and the weight of the envelope,
  • Thickness of mail-piece based on the paper characteristics, how many pages of the document are to be included, and the number of inserts,
  • Inserts to be called based on the barcodes,
  • Postal tray / pallet ID, and
  • Address block information.

The insert control file is a valuable tool generated from the composition system and is the key to a robust insert control process, and to overall integrity of the production process.
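The weight entry in an insert control file can be derived from the page count, paper weight, envelope, and inserts, as sketched below. The gram values, the duplex assumption, and the `mail_piece_weight` function are illustrative only; real figures come from the paper and envelope specifications for the job.

```python
import math


def mail_piece_weight(pages: int, sheet_grams: float, envelope_grams: float,
                      insert_grams: list, duplex: bool = True) -> float:
    """Estimate the weight in grams of one finished mail-piece."""
    # Duplex printing puts two pages on each sheet of paper.
    sheets = math.ceil(pages / 2) if duplex else pages
    return sheets * sheet_grams + envelope_grams + sum(insert_grams)


# A 5-page duplex statement (3 sheets) with two inserts.
record = {
    "mail_piece_id": "000123",   # hypothetical identifier
    "pages": 5,
    "weight_grams": mail_piece_weight(5, 5.0, 6.0, [3.0, 2.5]),
}
print(record)
```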

Composition Software Products

There are many vendor solutions for composition with varying capabilities. Market analysis studies on Document Composition and Customer Communication Management (CCM) are available from sources including Forrester, Gartner, and Madison Advisors.

One trend in businesses is to consolidate the use of composition tools, although migration without a need to change or enhance processing is not commonly considered to be cost effective. If new composition software is being considered, key features to evaluate include:

  • end-to-end application capabilities,
  • design ease of use (WYSIWYG, collaboration),
  • enablement of business users, possibly through interactive access,
  • data handling flexibility,
  • re-use support for resources and application building blocks,
  • application flexibility (support of batch, interactive, and on demand requirements),
  • integration, and
  • repository capabilities.

Composition Best Practices

The following highlight key functional considerations for best practices in document composition.

Data

  • Good data is critical to the success of customer communications. Corrections or changes to data (e.g. customer address) should be made at source if possible, for example in the customer database, transactions database, or marketing database, to ensure errors are not repeated endlessly.
  • Automatic triggers to begin the extraction process should be built into the application design. Best practice dictates that normalization and sorting of data is completed in the ETL (Extract, Transform, Load) step before transmission to the composition system.

Design

Accuracy, usability, and flexibility are key requirements when designing document resources and applications. WYSIWYG design is a must. Reuse of elements across document applications is important for development productivity, reducing test requirements and maintenance costs.

The quality of the document design guide is critical to generation of successful, well-presented, and effective customer communications. Development of a good quality, up-to-date design guide is important to the composition phase; a desktop publishing tool file is not enough. Documentation should be clear and complete, with business requirements detailed up front. It should describe layout and business rules for all applications, typefaces, weights, styles and point sizes, page size, logos, and many other aspects of the document appearance. Other considerations are cut sheet, simplex or duplex, page orientation, N-UP, imposition capabilities, and booklet support.

When using outside consultants, designers, or agencies, be aware they may not be familiar with the variations which may occur either in the application, in production printing situations, or with multi-channel output requirements.

Testing design layouts requires diligent due process. See the Test topic for more information.

Trends and Considerations

Composition has provided an essential tool in the format and delivery of messages, not only to print but to the many optional channels for output. The composition utilities have expanded their function to deliver formats to personal computers and, through digital communication, to devices such as the ubiquitous cell phones, tablets, and now wrist watches.

In this process, the utilities have gone beyond entirely outbound technology, to include interactive processing. Their future is just beginning.