Form Data Handling

Contents:
  1. What's Form Data About?
  2. Moving Data in Acrobat
  3. Data Storage Locations
  4. Data Formats
  5. Download Tools for Data Import/Export
  6. Articles and Scripts for Data Import/Export

What's Form Data About?

Traditionally, filled paper forms are transferred and stored as a kind of all-inclusive data unit. Breaking the data out of a paper form means manually copying it to another location. So, it is very common to think of a form and the data entered into the form as a single entity, i.e., that the form is the data. But of course, this is not true. The form is a user interface for collecting the data, and in many cases for presenting the data. In the end, it is the data in which we are interested. The form is just a way to get the data. In practice, this idea of form/data separation is much more evident in electronic forms than it is in paper forms, mainly because in electronic forms it is extremely easy to move data into and out of a form.

But now we have to ask two questions: "How exactly is data moved into and out of forms?" and "What does the data look like when it is not in the form?" These are the two critical issues when it comes to creating real-world form/data solutions: the transfer mechanism and the data format, which are the topics covered here.

For PDF forms in Acrobat/Reader there are many different ways to move the data in a variety of data formats. It is important to have this wide range of options in order to satisfy a wide range of workflows. Any particular solution is heavily dependent on exactly how both the form and the data are used.

Form Data Primer

Form data, simply put, is a linear set of Name/Value pairs. This is true for form data in all its different incarnations, from the form itself, to the transfer mechanisms, to all the data formats.

Names

Names are the primary way that data is identified, so they need to be unique within their context. On a PDF form, the data name is the name of the form field. When data is extracted to a data format, it is the field name that is used as the name in the format. For example, if PDF form data is extracted to an Excel spreadsheet, then field names become the column names on the spreadsheet. Names pervade the entire data workflow process. It is important to start out with names that make sense so the data can be easily identified and handled later in the process.

There are a couple of basic best practices to follow when creating names.

  1. In many data formats and applications that process data, punctuation characters can have special meanings, so it is good to avoid using punctuation and spaces in names. For PDF forms there are two exceptions: the underscore ("_") and the dot ("."). Most well-written data handling applications ensure that punctuation and spaces are handled correctly, but this is not always the case.
  2. The field name can imply hierarchy, or structure, in the data, which is important for more complicated forms/data structures. For example, the order form screenshot shown below has several rows of order items. A "." separator is used in the names of the fields in this block to indicate a group relationship. If the data is exported in XML format (which handles hierarchy), the separator is automatically used to create a hierarchical data arrangement. This is not true for other formats, but regardless, the group naming is important for any application that will map this data into a more complicated arrangement, such as multiple spreadsheet pages or a relational database.
How a name is formed has a direct effect on how well the data can be handled and used throughout a data workflow process.
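To make the idea of dot-separated group names concrete, here is a minimal Python sketch that turns flat name/value pairs into nested groups. It is similar in spirit to how the XML export builds a hierarchy from field group names, but it is not Acrobat's export code. The field names (CustomerName, Item.0.Product, and so on) are hypothetical. As on a PDF form, the sketch assumes that no name is used both as a field and as a group.

```python
def nest_fields(pairs):
    """Group flat (name, value) pairs into nested dicts, using "." as the group separator."""
    root = {}
    for name, value in pairs:
        parts = name.split(".")
        node = root
        for part in parts[:-1]:
            # Create the group the first time it is seen, then descend into it
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return root


# Hypothetical order form data: one customer field plus two rows of order items
order = [
    ("CustomerName", "Acme Corp"),
    ("Item.0.Product", "Widget"),
    ("Item.0.Qty", "3"),
    ("Item.1.Product", "Gadget"),
    ("Item.1.Qty", "1"),
]
print(nest_fields(order))
# {'CustomerName': 'Acme Corp', 'Item': {'0': {'Product': 'Widget', 'Qty': '3'}, '1': {'Product': 'Gadget', 'Qty': '1'}}}
```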

There are, of course, exceptions to the strict field name/data name relationship discussed above. If a PDF script or custom program is being used to transfer data, then that script/program can provide automatic renaming. Some applications, like Excel, provide name mapping for importing data. An XSLT file can be used to remap XML data. And even in PDF, every field has a submitName property, which would imply that data submission uses this name instead of the field name. Unfortunately, the submitName property is only used when submitting data in HTML format, which is a shame. If this property were available for all submission and data export operations, then the data name could be decoupled from the field name. But since this is not the case, name your fields carefully.

Values

Within the context of a PDF form, data is typically a value representing a number, text, a date, or true/false. All these types of values are relatively small (in data size) and can be easily converted into a text string. Text representation is important because most data formats and data handling mechanisms are intended to handle text. However, it is possible with a PDF to submit both image and raw file data, which is neither text nor small. These are special cases that require specific types of data formats and data handling, and will be covered separately. Most of the discussion here, and in the other articles on this topic, covers only typical form data values for the standard form fields.
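Here is a minimal Python sketch of the value-to-text conversion described above. The conventions are assumptions for illustration, not Acrobat behavior:

- Dates are written as ISO 8601 (YYYY-MM-DD).
- Whole-number floats drop the ".0".
- True/false map to "Yes"/"Off". These match common check box values in Acrobat, but the export value can differ on any particular form.

```python
from datetime import date


def value_to_text(value, on="Yes", off="Off"):
    """Convert a typical form value (number, text, date, true/false) to a text string."""
    # bool must be tested before numbers, because bool is a subclass of int
    if isinstance(value, bool):
        return on if value else off
    if isinstance(value, date):
        return value.isoformat()  # assumed date convention: YYYY-MM-DD
    if isinstance(value, float) and value.is_integer():
        return str(int(value))  # 3.0 -> "3"
    return str(value)


print([value_to_text(v) for v in (True, 12.5, 3.0, date(2024, 3, 5), "Acme")])
# ['Yes', '12.5', '3', '2024-03-05', 'Acme']
```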

PDF form fields are defined in the ISO 32000 specification, so compliant PDF viewers should implement them consistently. But data handling is largely dependent on the PDF viewer and/or tool set used to manipulate the PDF form. PDF viewer/tool vendors are not required to implement JavaScript or data handling in the same ways as Acrobat. However, there are practices, formats, and standards that are common for all data handling, and all the good PDF viewers/tools also follow Adobe's lead in one way or another. So, while the techniques discussed here are specific to Adobe Acrobat/Reader, many may also work/integrate with other applications. Compatibility is not guaranteed or implied.

Moving Data in Acrobat

In most cases the functionality for importing and exporting is complementary, i.e., data moves in both directions with the same mechanisms. Where Acrobat provides a function/mechanism for exporting data to a particular format, it also provides a function/mechanism for importing from that format. But not all of these functions/mechanisms are equal. For example, anything that touches the user's local file system requires privilege. Also, some are not available in Acrobat Reader without special Reader Extensions, or not at all. When developing a data workflow solution it is important to understand these limitations. Below is a table that lists each mechanism and the associated restrictions. The first two are manual operations the user performs from the Acrobat user interface, which is why the privilege requirement is marked "NA" (not applicable): the user always has privilege for manual operations. Each mechanism is discussed in more detail in the associated articles.

Methods for Moving Data

Function/Mechanism | Formats | Reader | Requires Trust/Privilege | Note
Drag & Drop data file on PDF | FDF, XFDF | Yes | NA | This is a way to quickly populate a form.
Menu Import/Export | FDF, XFDF, XML, TXT | No | NA | In Acrobat DC Pro, the data menu items are available in Prepare Form mode on the More... menu.
Form Submit | FDF, XFDF, XML, XFD, XDP, HTML | Yes | No | Requires a server script. Submits are always two-way: both import and export at the same time.
JavaScript Import/Export | FDF, XFDF, XML, XDP, TXT | Requires Rights | With File Path | -
JavaScript Read File | Any parse-able text file | Yes | Yes | No export; functions read raw file data.
JavaScript Data Move | JSON, other simple formats | Yes/No (depends on method) | Yes/No (depends on method) | Move data in/out of a JS-accessible location, such as the global object or document metadata.
IAC (external VB app) | NA | No | NA | Direct programmatic access to fields.
Plug-in | Any | Requires Special Enabling | No | Plug-ins can do anything.

Exporting Data (moving data out of a PDF)

In the standard and most general form usage model, shown in the diagram at the top of this page, an arbitrary, remote user fills out the form and then submits/emails the form/data back to the form owner. The third option in the table above, Form Submit, is the only mechanism built into PDF that handles this case without any assistance from a script or application on the user's system. This mechanism is also implemented by a wide range of PDF viewers.

All of the other export data mechanisms listed in the table above are more suitable for form data automation, or for a closed environment (where the users are known and special applications/scripts can be installed on the user's computer).

  • JavaScript Export functions: The Acrobat JavaScript model provides several functions for exporting data to the file system in different formats. All of these functions use industry-standard data formats and treat the form data as a single, flat block of data (except in the case of the XML formats, which are hierarchical by nature).

    However, the usefulness of these functions is limited in two ways. First, Acrobat restricts access to the file system. Each function has a file path input parameter. If the path is specified, then the function can only be run from a privileged context. If the path is not specified, then Acrobat displays a File Save dialog. Second, in Acrobat Reader these functions only operate when the PDF has special "Reader Enabling Rights". Even so, these export functions are quite useful for form/data automation in the local environment.

  • JavaScript Data Move: Moving data using a script is helpful for pre-filling fields on a form and maintaining form state. This option is also critical when using a PDF form to create a mini-application. You'll find examples of these in the Downloads section, such as the "Swat the Fly" game and the "2D Matrix Calculator".
  • The IAC (Interapplication Communication) interface is a way for external applications to interact directly with documents in Acrobat. On Windows it is a COM interface, and on Mac it is an AppleScript interface. It provides the ability to create external applications for custom data handling. So, for example on Windows, an Excel VBA script can be written to extract individual data values from a PDF and write them directly into spreadsheet cells in a completely custom, non-linear manner.
  • A plug-in is a tightly integrated extension to Acrobat which has access to low level functionality. Unlike JavaScript add-on scripts, it does not have any security restrictions. A plug-in has the ability to do just about anything that can be done in Acrobat and on the computer. However, Adobe puts heavy restrictions on using a plug-in for Acrobat Reader, so they are primarily a tool for use with Acrobat Professional/Standard. Plug-ins are written in C++. Writing one requires a special toolset and considerable programming skills.
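To show what a flat, tab-delimited export looks like, here is a small Python sketch that formats a single data set the way the TXT format is usually laid out: one line of field names followed by one line of values. It only imitates the shape of the file; the exact output of Acrobat's own JavaScript export functions may differ, and both function names are hypothetical. Opening the file in "w" mode replaces any existing content, in line with the single-data-set behavior described in the Data Formats section.

```python
import csv
import io


def format_single_data_set(pairs):
    """Format one data set as tab-separated text: a header row of names, then one row of values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([name for name, _ in pairs])
    writer.writerow([value for _, value in pairs])
    return buffer.getvalue()


def write_single_data_set(path, pairs):
    """Write the data set to a file; mode "w" replaces any existing content rather than appending."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(format_single_data_set(pairs))


print(format_single_data_set([("CustomerName", "Acme Corp"), ("Total", "42.50")]), end="")
```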

Importing Data (moving data into a PDF)

Most of the Import functionality in Acrobat parallels the export functions, but there are some interesting and useful variations. For example, the first option in the table is "Drag and Drop". Both the FDF and XFDF data formats are PDF specific formats, so Acrobat immediately recognizes them as form data. This means that they can be dragged and dropped directly onto a file open in Acrobat. Both also contain links to the original PDF form. In most cases, simply opening one of these data files will open the original form and populate it.
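Because XFDF is plain XML, it is easy to generate outside of Acrobat. Below is a minimal Python sketch that uses the standard library's ElementTree to build an XFDF file for a few name/value pairs:

- The `<f href="..."/>` element is the link back to the original PDF.
- Hierarchical (dotted) names become nested `<field>` elements.

The file name order_form.pdf and the field names are hypothetical. The XFDF specification covers the full grammar (comments, rich text, and so on), which this sketch ignores.

```python
import xml.etree.ElementTree as ET

XFDF_NS = "http://ns.adobe.com/xfdf/"


def build_xfdf(pdf_href, pairs):
    """Return XFDF text for (field name, value) pairs, linked to the PDF form at pdf_href."""
    root = ET.Element("xfdf", {"xmlns": XFDF_NS, "xml:space": "preserve"})
    # <f href="..."/> identifies the original PDF form
    ET.SubElement(root, "f", {"href": pdf_href})
    fields = ET.SubElement(root, "fields")
    for name, value in pairs:
        parent = fields
        # Each part of a dotted name becomes a nested <field name="part"> element
        for part in name.split("."):
            child = next((f for f in parent.findall("field") if f.get("name") == part), None)
            if child is None:
                child = ET.SubElement(parent, "field", {"name": part})
            parent = child
        ET.SubElement(parent, "value").text = value
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")


print(build_xfdf("order_form.pdf", [("CustomerName", "Acme Corp"), ("Item.0.Qty", "3")]))
```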

Another example is the 5th option in the table, "JavaScript Read File". The Acrobat scripting model provides a couple of functions that read raw file data. A script can literally open any file on the user's system (or in a file attachment) and parse data out of it. For JavaScript, the ability to parse data is usually limited to plain text files.
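Parsing raw file data is ordinary string processing once the bytes have been decoded to text. The Python sketch below shows the idea for a hypothetical plain-text file of Name=Value lines. In Acrobat the same logic would run in JavaScript after one of the raw file functions has read the data. The file layout here is an assumption, not a standard format.

```python
def parse_name_value_text(raw, encoding="utf-8-sig"):
    """Decode raw file bytes and parse "Name=Value" lines into a dict.

    "utf-8-sig" also strips a leading byte order mark if one is present.
    """
    data = {}
    for line in raw.decode(encoding).splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, sep, value = line.partition("=")
        if sep:  # ignore lines without an "=" separator
            data[name.strip()] = value.strip()
    return data


raw_bytes = b"# customer record\nCustomerName=Acme Corp\nPhone = 555-0100\n"
print(parse_name_value_text(raw_bytes))  # {'CustomerName': 'Acme Corp', 'Phone': '555-0100'}
```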

There are three main uses for importing data into a form.

  1. Presenting data
  2. Variable Data operations
  3. Pre-filling common fields on a portion of the form

Presenting Data

It's often the case that data needs to be presented in a different way than how it was collected. To do this with PDF, the form data is exported using any of the standard methods and then imported into a different form that uses the same form field names. The form field names provide the data mapping, from one form into another. There are many variations on this idea, such as using data from several different forms, and custom scripts that perform special data handling when fields have complex configurations and/or don't have the same names as the original data. One popular variation on this is "Variable Data".
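Since field names do the mapping, copying one form's data into another is mostly a matter of matching names. The Python sketch below models each form's data as a dictionary. The optional rename table is a hypothetical stand-in for the custom scripting mentioned above, for fields that don't share the original names.

```python
def map_form_data(source, target_fields, rename=None):
    """Copy values from source form data to the target form's fields, matching by field name."""
    rename = rename or {}
    result = {}
    for name, value in source.items():
        target_name = rename.get(name, name)  # use the custom mapping when one is given
        if target_name in target_fields:  # values with no matching target field are dropped
            result[target_name] = value
    return result


collected = {"CustomerName": "Acme Corp", "Phone": "555-0100", "Notes": "Rush order"}
summary_fields = {"CustomerName", "Telephone"}
print(map_form_data(collected, summary_fields, rename={"Phone": "Telephone"}))
# {'CustomerName': 'Acme Corp', 'Telephone': '555-0100'}
```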

Variable Data (Mail Merge)

Variable data means consecutively loading different data sets into the same form, where a data set could be a row in a spreadsheet or database. After each data set is imported, the form is saved under a different name, printed, or emailed. This type of operation is used to create form letters, invoices, receipts, and many other types of documents. This technique is also commonly called a "Mail Merge", and there are many 3rd party tools for doing it inside and outside of Acrobat. It can also be done in Acrobat using a custom automation script. Any of the programmed techniques from the table above could be used to create a Variable Data solution.
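The core of a variable data run is a loop over data sets. The Python sketch below reads tab-separated text, with one data set per row and the field names in the header, and pairs each row with an output file name. Filling, saving, printing, or emailing the PDF is left out, since that part depends on the tool. The naming scheme and the CustomerName column are assumptions.

```python
import csv
import io


def variable_data_jobs(tsv_text, name_field):
    """Yield (output file name, field values) for each data set (row) in tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for number, row in enumerate(reader, start=1):
        # A running number keeps output names unique even if name_field values repeat
        yield f"{number:03d}_{row[name_field]}.pdf", row


data = "CustomerName\tAmount\nAcme\t100\nGlobex\t250\n"
for file_name, values in variable_data_jobs(data, "CustomerName"):
    print(file_name, values)
```

In practice the generated name would also need sanitizing for characters that aren't allowed in file names.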

Pre-Filling Form Fields

There are many reasons for pre-filling a subset of fields on a form. As an example, consider an order form, such as the one shown below. This screenshot shows the form open in Acrobat Professional, with the Attachments panel on the left and the Add-ons panel on the right. There are 3 sets of fields on this form that use some type of automatic "pre-fill". Each one is done in a different way, representing the 3 general locations from which data can be imported/acquired.

  1. External Data - The customer information at the top of the form is imported using a JavaScript automation tool that grabs data from an external data source. There is a wide variety of external data sources, from a data file on the local hard drive to a database on a server. External sources are useful when data changes frequently or when data is not specific to a particular form. For example, customer info on this form could also be used on many different types of forms. Since it is external, the data can be modified without changing the form itself. The import/export automation tool is shown in the Add-on tools panel on the right. It shows a popup menu that includes data management options for saving and deleting customer information.
  2. PDF Data - The dropdowns for selecting a product are filled from a data file that is attached to the PDF form itself. One of the JavaScript functions that reads raw file data is used to acquire data from the attached CSV. This data travels with the PDF so customers at other locations can also use the form. Product and pricing updates are relatively easy to make by simply replacing the attached CSV file.
  3. Hard Coded Data - The Shipping Costs are coded directly into scripts on each of the radio buttons that select the shipping type. This type of data storage is only useful when data is not likely to change. It is difficult to update because it requires modifying a script, so for a value such as "Shipping Cost" it is not a good choice.
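[Screenshot: the order form open in Acrobat Professional, with the Attachments panel on the left and the Add-ons panel on the right (pdfscripting.com)]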
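Reading the attached product list comes down to parsing CSV text. Here is a minimal Python sketch that turns CSV text into the entries for a product dropdown plus a price lookup. The Product and Price column names are hypothetical. In Acrobat the text would first be read from the file attachment with JavaScript.

```python
import csv
import io


def load_product_list(csv_text):
    """Parse product CSV text into dropdown entries (in file order) and a name -> price lookup."""
    entries = []
    prices = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        product = row["Product"].strip()
        entries.append(product)
        prices[product] = float(row["Price"])
    return entries, prices


attached_csv = 'Product,Price\nWidget,2.50\n"Gadget, large",10\n'
entries, prices = load_product_list(attached_csv)
print(entries)  # ['Widget', 'Gadget, large']
print(prices["Gadget, large"])  # 10.0
```

Replacing the CSV text updates both the dropdown entries and the prices without touching the parsing logic, which is the advantage of keeping this data in an attachment.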

Data Storage Locations

There is a large variety of locations where data can be exported to and imported from. The general categories of these locations are outlined in the "Pre-Filling Form Fields" section above: essentially, external to Acrobat/PDF and internal to Acrobat/PDF. The range and flexibility of how these locations can be used depends on the particular mechanism used.

As noted in the "Exporting Data" section above, the last two entries on the table, IAC and plug-in, are both completely custom solutions, so they have the greatest flexibility/capabilities of all the data transfer mechanisms. However, they also cost the most to develop, and solutions using these mechanisms are generally restricted to Acrobat Professional/Standard. Many solutions from 3rd party vendors will use one of these mechanisms.

The manual methodologies at the top of the table are restricted to accessing data files on the user's local file system. However, the local file system could include networked drives as well as remote (virtual) folders that are mapped to the local file system.

The JavaScript model has functionality for accessing the complete range of data storage options (as discussed in the pre-filling example), but any one function/method has limitations, and this is where the discussion is focused.

  • Local File system: There are several functions for both importing from and exporting to specific data file formats on the user's local file system, and there is one function for reading raw file data. All of these functions have an optional file path input. If the file path is not included, then the function can be used in a form script (non-privileged), but if the file path is used the function requires privilege. Typically, these functions are used in trusted automation scripts, and in most cases they can only be used in Reader with specific form rights. The Reader restriction has changed over time and the latest versions may allow some of these functions without the Form Rights.
  • Remote Data: Remote data means the data is on an internet server. There are two JavaScript functions for accessing the internet. One of them is the standard form submit function, which submits the data in a specific format, works with Reader, and doesn't have any privilege restrictions for pure data transfers. The other function is a general purpose HTTP request tool, and it does require privilege and will not work in Reader without Form Rights. Both of these functions require there to be a server tool on the other side to receive and send data, which usually means custom programming.
  • Global Object: The Acrobat JavaScript model defines a "Global Object" where persistent data can be stored. This means that data is maintained across Acrobat sessions. The Global Object is inside the Acrobat application, so this data is only available to Acrobat on the user's computer, and cannot be transferred to another user, unlike a data file. There are restrictions for accessing this data from a non-privileged script, so the global object is most useful for automation scripting. This object is best for storing simple data such as text and numbers. In the order form example above, the customer information is stored in the global object as a JSON string, which makes it easy to convert back into workable data.
  • PDF locations: There are several locations inside a PDF where data can be stored. One example is a file attachment, as shown in the Order form example above. Other locations are the PDF metadata, annotations, hidden form fields, and document scripts. A script can read from and write to all of these locations in Acrobat Pro and Standard, but in Reader a script can read, but cannot write to file attachments and metadata without special Reader Extensions. If a document script needs to store data locally so it is saved with the file and it needs to work in Reader, then a hidden form field is the best location.
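Storing structured data as a JSON string, as the order form does with customer information, is a simple round trip: serialize before storing, parse after reading. The Python sketch below uses a plain dictionary as a stand-in for whatever text storage is available, such as the global object or a hidden form field. The "customers" key and the record layout are hypothetical.

```python
import json


def save_customers(store, customers):
    """Serialize customer records to one JSON string and store it under a single key."""
    store["customers"] = json.dumps(customers)


def load_customers(store):
    """Read the JSON string back into workable data; an empty store yields no customers."""
    return json.loads(store.get("customers", "{}"))


storage = {}  # stand-in for a persistent text location
save_customers(storage, {"Acme Corp": {"Phone": "555-0100", "City": "Springfield"}})
print(type(storage["customers"]).__name__)  # str
print(load_customers(storage)["Acme Corp"]["Phone"])  # 555-0100
```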

Data Formats

Not all data formats are equal. Each one has different features that will determine its suitability for a particular solution. The table below provides a brief description of the formats most commonly used with Acrobat and PDF forms.

The second column, "Native Acrobat", is marked "Yes" when there is a JavaScript function for importing and exporting data in this format.

The third column "Data-Sets" indicates the number of individual data-sets that can be stored in the format, 1 or many. For this discussion, a single data set is all the data from a single form. When exporting data with the JavaScript functions, Acrobat creates or overwrites a file with a single data set. It will never add a data set to an existing data file, even when the file format allows for multiple datasets.

Format | Native Acrobat | Data-sets | Description
FDF | Yes | 1 | FDF (Forms Data Format) is Acrobat-specific and not used by other data handling applications, unless they were designed specifically to handle PDF data. Adobe developed it for tight integration with PDF workflows. It not only stores data, but also stores information about the original document and can transfer comment and review data, update content in PDFs, and install and run scripts, among other things. If there is a need to transfer raw file and/or image data, then this is the only data format that will handle these tasks automatically through JavaScript. It is, however, a difficult format to create and parse outside of Acrobat, so it is most useful for Acrobat-centric workflows. At one time Adobe provided a programming toolkit for handling this format, but unfortunately that is no longer the case.
XFDF | Yes | 1 | This is the XML version of FDF. It is much easier to create and parse, but provides far fewer features. Used mostly for transferring form data and comments.
TXT | Yes | Many | For data, a text file usually means "Tab Separated Values", a common text-based format similar to CSV. Each row in the file is a different data set. Recognized by most applications that handle data. This is the only format where the JavaScript function allows data to be imported from any data set in the file.
CSV | No | Many | A common and very old text-based format, where each row is a different data set. Recognized by most applications that handle data. Acrobat does not provide specific JavaScript functions for handling this format, so it requires custom scripting. However, one of the data export menu items writes to this format for merging data from several forms. Excel automatically opens a CSV file as a single worksheet, so it is an excellent format for importing form data into Excel.
XML | Yes | 1 | XML is a general purpose, text-based data format. Unlike the other data formats discussed above, it is capable of representing complex hierarchical data structures. This format is capable of holding more than one data set, but when "XML" is specifically selected as the format, Acrobat exports a single data set with an ".xml" file extension and uses a simple hierarchical grammar based on field group names. Other formats listed here are XML-based, but use a more complex grammar and are saved with a different file extension.
XFD | Yes | 1 | An XML forms format that can also contain data. Created for what became Adobe LiveCycle Forms, now AEM Forms, a proprietary Adobe XML-based form technology sold into the enterprise market. These forms look like PDF on the outside, but aren't PDF on the inside. Acrobat will import/export this format with regular PDF forms, but it's only really useful for AEM forms on the AEM server tools.
XDP | Yes | 1 | Another XML data/form format for AEM forms. Adobe created this one primarily as a data format that can contain the original XML form (not a PDF form). Acrobat will also import/export this format with regular PDFs, but like XFD it is not very useful in this context.
JSON | No | Many | JavaScript Object Notation. Quite literally a text string of the JavaScript code for creating a JavaScript object. Not very efficient with size, but very easy to transmit, store, create and evaluate in JavaScript. Popularized in web browser scripting, it's now used everywhere. In Acrobat, the official JSON toolset was added in the DC version. To use it in previous versions, use the object.toSource() and eval() functions. The toSource() function does not create strict JSON format, but it works if the output is only parsed with the eval() function.
Excel | No | Many | The Excel file format is proprietary to Microsoft, although the specification is public and anyone can create Excel files. Acrobat does not currently provide any way to interact directly with Excel or Excel files. But there are three indirect ways to get data into Excel: 1) export to CSV, TXT, or XML and import this file in the Excel app; 2) write a custom Excel add-in with VBA that uses the Acrobat IAC to acquire form data; 3) write an Acrobat plug-in that either writes Excel format directly, or interacts with the Excel app.
HTML | Yes | 1 | This is the data format in an HTTP POST when an HTML form submits to a web server. It's a very simple name/value pair text format. Acrobat provides the ability to use this format on a form submit so that a PDF form can be submitted to the same server script that would be used for a web form. Unfortunately, the return data needs to be something Acrobat understands. This is where the scheme usually fails, because Acrobat does not understand HTML, except to convert it to a PDF.
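The HTML submit format is the same name/value encoding that web forms use by default ("application/x-www-form-urlencoded"), so a server script in almost any language can read it with a stock library. The Python sketch below shows both directions using the standard urllib.parse functions. It covers only the request body; as noted above, the server still has to reply with something Acrobat understands. The field names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def encode_form_post(pairs):
    """Encode (name, value) pairs as an HTML form POST body."""
    return urlencode(pairs)


def decode_form_post(body):
    """Server side: decode a form POST body back into (name, value) pairs, keeping empty values."""
    return parse_qsl(body, keep_blank_values=True)


body = encode_form_post([("CustomerName", "Acme & Sons"), ("Item.0.Qty", "3"), ("Notes", "")])
print(body)  # CustomerName=Acme+%26+Sons&Item.0.Qty=3&Notes=
print(decode_form_post(body))
```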

Download Tools for Data Import/Export

*Select and Load Form Data*
Loads form data from a CSV or Tab Delimited (.txt) file into form fields on the current PDF. Complete, ready to use tool, no script editing required. User can search data and manually select the data line to import. Works in Reader.

*Copy Form Data Between PDFs*
Copies all fields in one PDF into matching fields in another PDF. Works in Reader.

*Import/Export Excel Data as Text*
Acrobat can import and export data to/from a Tab Delimited Text file, which is one of the formats recognized by Microsoft Excel. This package demonstrates the import/export process.

*Fill Form from external XML file*
This automation tool uses data from an external XML file to populate fields on a LiveCycle or AcroForm PDF.

*Using the Global Object Sample*
A set of scripts that demonstrate using the Global Object, which is used to share and persist data.

*HTTP Request Tester*
Tool for testing the HTTP JavaScript function.

Articles and Scripts for Data Import/Export

How to Create an Interactive PDF Form

How exactly do you create a PDF form with interactive fields? There are basically two steps. Create a Static PDF Form - Use any document creation tool to create the layout and design of your form, th...

Auto-Populating Form Fields from a Drop-Down List (ComboBox)

Scripts and techniques for setting the values of form fields automatically from a selection on a drop-down list.

Auto-Filling a Drop List with a Drop List

This article presents techniques and scripts for automatically setting the list entries in a drop-down (combobox) field from a selection in another drop-down field. Includes sample files.

Acquiring Raw File Data

External data, i.e., data outside of Acrobat or a PDF file, is often a very important part of a workflow process. For example, information on customers, products, employees, etc. is typically stored in Excel files, databases, or on a server. One of the most common issues with automating such a workflow process is getting the data from the external file or data source into the automation script. This article provides techniques and script examples for acquiring external data.

Setting Up an Excel File for Database Access

Excel is probably the most used desktop data tool. And even though it is not a real database, it can be treated as one if it is set up and handled properly. This article covers the specifics of this process as it relates to importing and exporting data from/to a PDF in Acrobat.

Importing and Exporting Excel Data

This article explains exactly how to transfer data, in both directions, between an Excel file and Acrobat. Scripts are provided for importing and exporting in a variety of scenarios, including a looping scenario for performing variable data operations and mail merge.

Using Excel™ with Acrobat™, PDF, and LiveCycle

There are several different ways Acrobat, PDF, and LiveCycle forms can exchange data with an Excel spreadsheet. This series of articles outlines the details of the different methodologies and provides several variations on the code for implementing each.

Free Sample PDF Files with scripts

These free sample PDF files contain scripts for common, complex, and interesting scripting tasks in Acrobat. Many more are available in the Members Only Download Library. Feel free to browse through...