www.cimtech.co.uk
PRODUCT REVIEWS | July-August 2009

ABBYY PDF Transformer 3.0

PDF Transformer 3.0 is the latest version of ABBYY’s business software designed primarily to convert PDF files for use in Microsoft Office applications, such as MS-Word and MS-Excel, and vice versa. In fact, as we indicated in the news item in our June 2009 issue, it does rather more than this. We have now had the opportunity to try out this product and, although our testing was not exhaustive, it was sufficient to give us a good feel for the product’s capabilities and to explore some of its new and improved features.

The need

The need for a product like PDF Transformer has arisen from the increasing number of business documents being distributed in PDF format. While virtually all business users have the facility to read documents in PDF format (rendered as they appeared in the application in which they were created), relatively few have the facility to convert the document content into an easily editable form. Adobe Acrobat Reader allows the text of a PDF document to be cut and pasted into a text editor, but this is a piecemeal operation and is therefore suited only to small amounts of text, and only where the text formatting need not be retained. Acrobat Reader can also save the text of a PDF document as plain text (.TXT format) but, again, the text formatting is lost and the text may require quite a bit of editing, e.g. removing unwanted line breaks, to get it back into a usable form.

There is thus a need for a product that facilitates easy and accurate conversion of PDF document text into a widely supported format and, of course, that can create or recreate PDF documents from common office applications. ABBYY PDF Transformer 3.0 is one such product and, as indicated above, it has some other tricks up its sleeve as well.

TESTING

1. Conversion of PDF to MS-Word

In testing PDF Transformer 3.0 we put the emphasis on the conversion of existing PDF documents into Microsoft Word and Excel formats as ABBYY claims to offer some distinct advantages with Transformer 3.0 in this role.

Simple press release

The first test was with a single-page press release in PDF format. This was a very simple, single-column document with all the text in 10-point Courier and no text formatting. The document was converted from PDF within Word 2007 using the option to preserve the ‘original layout’ and saved in .DOCX format. The process took less than a minute and the result was very good, requiring virtually no editing to replicate the appearance of the PDF file.

Formatted press release

The second test was with a single-page, formatted press release in PDF format. This contained a coloured logo and a graphic in the top left- and right-hand corners respectively, bold headlines (one underlined), and four paragraphs of text (Arial font) in a single column. Using the same conversion settings as in the first test, the result was extremely good and virtually all the formatting and layout was retained, including the indented and underlined subheading. The program had attempted, without success, to read the stylised text in the company logo, and had also replaced the printer’s (curly) quotation marks (“ ”) in the original with straight quotation marks (" "), but neither of these is likely to be of great consequence.

A second, similar press release was also converted very successfully, although some of the formatting and the position of the header were lost. The text was perfect apart from the change to the quotation marks noted above. The several URLs embedded in the text were retained and were immediately active in the Word document. In fairness, we did encounter one single-page, formatted press release where the converted text was reproduced in a very narrow column spread over seven pages.

Overall, we were very impressed with PDF Transformer 3.0’s performance with the press releases.

Illustrated product datasheet

For this test we used a two-page, illustrated datasheet from the Adobe website. This was not a particularly complex document but was typical of many product datasheets. The datasheet comprised three columns: the left column was reserved for bulleted lists while the centre and right columns were used for the main text. The datasheet also contained two coloured graphics and Adobe logos.

For the first conversion attempt we opted for output in Word format, retaining the original layout and keeping the pictures. The conversion took less than half a minute and, although some of the text and graphic elements shifted slightly from their original positions, the result was very good. No text was lost in the conversion process, although the Adobe logos did disappear. Changes in the position of page elements are unlikely to concern most users, as the Word document is unlikely to be used as the source for an updated datasheet; a separate page layout application would normally be used for this.

Repeating the conversion using the ‘text flow’ option in place of the ‘original layout’ and ‘keep pictures’ options resulted in a text-only, single-column Word document in which only the text formatting was retained. Again, the conversion process was fast and the result was very good, presenting the user with the text in a form that is very straightforward to edit or incorporate in other documents.

Rather than relying on the automatic conversion option, which can result in text blocks being converted in the order that they appear on the page rather than in the order they might be read, the user has the option of manually selecting the individual text blocks in the preferred sequence. This may reduce the amount of editing needed after conversion.

Multi-page report

Business reports can vary in length from just a few pages up to several hundred pages and can be more or less complex in their structure, e.g. the use of running headers and footers, tables, graphics, indexes, tables of contents, multiple sections and chapters, etc. They are typically generated using MS-Word and equivalent applications and often go through a number of revision stages. Such documents are now commonly exchanged and distributed in PDF format. When access can be gained to the original Word document, any required changes can be made easily. However, where access is limited to a PDF version, any significant editing will only be possible after conversion back to the original format. As indicated above, PDF Transformer 3.0 is claimed to be able to perform such conversions while recreating the logical structure, text flow and consistent formatting of documents across multiple pages.

To test PDF Transformer 3.0’s performance with such documents, we took one of our own consultancy reports (MS-Word, DOC format) comprising 35 A4-size pages of single-column text. The report also included headers and footers, a table of contents, tables, bulleted lists and company logos. First, we converted the report to PDF format using ABBYY PDF Transformer 3.0, an operation which went quickly and smoothly. We then used PDF Transformer 3.0 to convert the PDF back to MS-Word (DOCX format). We simply opened the document in PDF Transformer 3.0, selected the conversion options (i.e. to retain the document’s appearance and graphics) and clicked the conversion button. The complete conversion process to Word format took just over two minutes.

The conversion to Word was generally very satisfactory and, at first sight, the result bore a very close resemblance to the original Word document. The original pagination was retained (with one minor imperfection), text, paragraphs, indents and bullets were all reproduced well (although the body text size was reduced from 10 point to 9 point), and all URLs were detected. Not surprisingly, a number of deficiencies were noted on closer inspection. These included the insertion of a significant number of section breaks that were not present in the original, a small table that was not recognised as such, the loss of the company logo from the running header, the loss of text styles (e.g. heading 1, heading 2, etc.) so that all text showed the style 'Normal', and the loss of a line rule from the footer.

It should be noted that this test, as with the previous tests, involved no operator intervention, i.e. we simply let Transformer 3.0 do the best conversion job it could unaided. The software does, however, allow the operator to assist with the analysis of the document structure and thereby improve the quality of the conversion. This is particularly relevant to documents with a complex structure, such as our consultancy report. For example, where the software fails to recognise tables or graphics, the operator can manually designate these page items as tables or graphics using the tools available within Transformer 3.0, and then re-run the conversion process. Re-running the conversion takes significantly less time than the initial conversion.

Scanned documents

We conducted limited testing of the conversion of PDF documents produced from scanned pages. With simple, good-quality, text-only documents the results were very good: text was recognised very accurately and most of the formatting was retained. With documents of more complex structure it was difficult to retain the page layout, although all good-quality text was accurately recognised. We would recommend downloading the trial version of the software to see how well it handles your typical documents.

2. PDF creation

ABBYY PDF Transformer 3.0 can be invoked from within MS-Office applications such as Word, Excel and PowerPoint, and from within Windows Explorer by right-clicking on any supported file. Conversion of files to PDF is very straightforward and requires little operator intervention. It is usually just a case of selecting the name and folder for the PDF file and selecting from a few menu choices to achieve the desired output.

As noted above, MS-Word files can be converted to PDF very quickly and with good results. We didn’t extend our testing to other Office applications and so if you need to work with Excel and PowerPoint files, for example, we recommend you download a trial version of the software and test it with your own files.

ABBYY PDF Transformer 3.0 can also produce PDF files from any application that allows the user to select a printer. Installation of Transformer 3.0 adds the ‘PDF X Change 4.0 for ABBYY’ virtual printer to the list of available printers. Selecting this printer should enable a PDF file to be created from the application. We did not test this, although it is worth noting that the user can exercise extensive control over the printer settings, including the version of Adobe Acrobat to be used, compression settings and security features.

When creating PDF documents in Transformer 3.0 the user can select various PDF file options in addition to the file name. Under the ‘PDF saving mode’ option there are three choices: compressed PDF (smallest file sizes for viewing, emailing, etc.), non-compressed PDF (for high-quality printing) and PDF/A with no compression (for optimum compatibility). The user can also choose PDF security settings, i.e. ‘Restrict opening’ and ‘Restrict editing and printing’. Other options include Bates numbering of pages and a redaction tool that enables blocks of text to be blacked out.

Transformer 3.0 can also combine a number of files (PDF or any other supported file format) into a single PDF; non-PDF documents are converted to PDF in the process. Our limited testing of this feature showed it to be very straightforward to use and to produce good results.

Conclusion

Overall, this is an impressive piece of software that is easy to use, fast in operation and provides a wealth of features for all those who need to do more than simply view PDF files. It is a very proficient PDF creation tool, but we see its ability to produce fully editable MS-Office documents, particularly MS-Word documents, as its main strength. Although the quality of the conversion depends to some extent on the complexity of the documents, in our experience the results are generally very good and should not require excessive editing. No conversion software will give perfect results with all documents, but ABBYY PDF Transformer 3.0 consistently provides results that deliver significant benefits. It is also reasonably priced.

Roger Broadhurst
Cimtech Ltd

Information Management & Technology (IM@T.Online), ISSN 1757-823X