Turn-Key Systems (TKS)
 
X-ICE: Word to XML Conversion


A novel but highly effective Word to XML conversion system from Turn-Key Systems.
Download it here.

What Is X-ICE?

X-ICE is a tool for converting unstructured data from MS Word files into XML, using any appropriate DTD. It acts as a bridge between Microsoft Word™ and XMetaL from JustSystems. X-ICE automatically invokes MS Word on the source document and requests one paragraph at a time. Based on a set of rules (which can vary as required), each paragraph is converted to an XML fragment, which is then passed on to XMetaL.

This process can be temporarily halted if:

  • X-ICE has not been supplied with a rule to handle a particular paragraph type;
  • the resulting XML fragment cannot be inserted into the target document because it would violate the rules in the DTD;
  • the user requests a pause in the conversion.
At this point the user can do one of three things:
  • modify the Word document to correct any non-standard styling;
  • modify the XML document to resolve any DTD violations;
  • modify the X-ICE rules to handle the situation properly.
This flexible approach allows the rule set to evolve to handle common style variants, while allowing the operator to handle one-off situations at either the Word or XML end. Once the conversion is complete, the user is presented with a fully converted XML file, guaranteed valid against the chosen DTD.
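
The paragraph-by-paragraph process described above can be sketched roughly as follows. This is an illustration only, written in Python rather than X-ICE's own rule language: the names Paragraph, convert, heading_rule, body_rule and title_first are hypothetical, and the title_first check is a toy stand-in for the DTD validation that XMetaL performs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple
from xml.sax.saxutils import escape


@dataclass
class Paragraph:
    style: str  # Word style name, e.g. "Heading 1"
    text: str


# A rule returns an XML fragment, or None if it does not apply (hypothetical shape).
Rule = Callable[[Paragraph], Optional[str]]


def convert(paragraphs: List[Paragraph], rules: List[Rule],
            is_valid: Callable[[List[str], str], bool]
            ) -> Tuple[List[str], Optional[Tuple[int, str]]]:
    """Convert paragraphs in order and stop at the first one needing attention.

    Returns (fragments, pause). pause is None when every paragraph was
    converted, otherwise (paragraph index, reason for the pause).
    """
    fragments: List[str] = []
    for index, para in enumerate(paragraphs):
        # Take the fragment from the first rule that applies.
        fragment = next(
            (f for f in (rule(para) for rule in rules) if f is not None), None)
        if fragment is None:
            return fragments, (index, "no rule")
        if not is_valid(fragments, fragment):
            return fragments, (index, "DTD violation")
        fragments.append(fragment)
    return fragments, None


def heading_rule(p: Paragraph) -> Optional[str]:
    return f"<title>{escape(p.text)}</title>" if p.style == "Heading 1" else None


def body_rule(p: Paragraph) -> Optional[str]:
    return f"<para>{escape(p.text)}</para>" if p.style == "Normal" else None


def title_first(existing: List[str], fragment: str) -> bool:
    # Toy validity check: a <title> is only allowed as the first fragment.
    return not (fragment.startswith("<title>") and existing)


doc = [Paragraph("Heading 1", "Intro"),
       Paragraph("Normal", "Hello"),
       Paragraph("Quote", "Odd one out")]
done, pause = convert(doc, [heading_rule, body_rule], title_first)
print(done, pause)
```

In this sketch the third paragraph has no matching rule, so the loop stops there and reports why. The operator could then restyle the paragraph, edit the XML, or add a rule, and resume.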

What Are the Advantages of X-ICE over Similar Conversion Systems?

There are several features that set X-ICE apart from other converters:
  • You may select any DTD for your XML output. Once conversion is complete, the data is ready to use.
  • It is rule-based, so you don't need advanced programming skills to set it up.
  • It incorporates a powerful scripting language (Perl), so if you do have programming skills you can specify very sophisticated behaviour. For example, you can have tagging triggered by particular words or phrases in the text.
  • It connects to an XML editor (XMetaL) in order to create the XML data. This has several advantages: for example, the data is validated as it is entered, so there is no need for a separate parsing step at the end.
  • The conversion process is fully interactive. The user can watch the XML data as it is being created. XMetaL can apply a stylesheet to the data, so conversion errors are often immediately obvious. The user can choose to pause the conversion at any point in order to fix errors.
  • An error (such as not being able to find a rule to apply) does not abort the conversion; the conversion simply pauses and allows the user to intervene.
There are a number of ways you can exploit the interactive nature of X-ICE:
  • You can develop rules incrementally. If you start a conversion with no rules defined, it will stop on the first paragraph. Define a rule that matches this paragraph and restart the conversion. Repeat until the whole document is converted.
  • Inconsistent data can be corrected as part of the conversion. When X-ICE pauses, it highlights the paragraph that caused the problem in Word. If the data is incorrect it can be fixed without having to restart the conversion.
  • Sometimes DTD clashes are caused by earlier errors which have given rise to incorrect tagging. X-ICE allows you to go back in your XML target document to correct such errors before retrying the current paragraph.
  • You can defer the conversion of "problem" data. The user may choose to ignore a paragraph that causes a problem, in which case X-ICE writes its text as a comment at that point in the XML. These comments can be reviewed and corrected when the conversion is complete.
  • You can choose to have part of the conversion done manually simply by not writing any rules for it. There are many cases where the most cost-effective solution is to automate, say, 98% of the conversion and leave the operator to handle the rest.
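
As an illustration of deferring "problem" data, the following Python sketch wraps a paragraph's text in an XML comment. The function name as_xml_comment is hypothetical, and how X-ICE itself formats these comments is not documented here. The only fact relied on is that XML does not allow "--" inside a comment, so runs of hyphens are spaced out first.

```python
import re
import xml.etree.ElementTree as ET


def as_xml_comment(text: str) -> str:
    """Wrap text in an XML comment, spacing out any run of hyphens."""
    # Put a space after every hyphen that is directly followed by another one.
    safe = re.sub(r"-(?=-)", "- ", text)
    # The padding spaces also keep a trailing hyphen away from the closing "-->".
    return f"<!-- {safe} -->"


comment = as_xml_comment("Unstyled note -- check with author")
print(comment)

# The result can sit inside a document without breaking well-formedness.
ET.fromstring(f"<section>{comment}</section>")
```

A reviewer could later search the finished XML for such comments and tag the deferred text by hand.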

How Do I Write My Own Rules?

X-ICE contains an integrated rule editor. Many rules can be created by simply filling in choices such as style or font name. More powerful options, such as doing pattern-matching on the text or checking the current context of the XML document, are also possible.

The standard release comes with examples which illustrate many of the available features. Copying and modifying existing rules is a good way to get started.
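
To make the idea of a rule more concrete, here is a rough Python sketch of a rule that matches on style, font, a text pattern and the current XML context. It does not show the X-ICE rule editor's actual fields or file format. The names WordParagraph, Rule and first_match, and all the field names, are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordParagraph:
    style: str
    font: str
    text: str


@dataclass
class Rule:
    element: str                   # XML element to emit when the rule matches
    style: Optional[str] = None    # None means "any style"
    font: Optional[str] = None     # None means "any font"
    pattern: Optional[str] = None  # regular expression searched for in the text
    parent: Optional[str] = None   # required current parent element in the XML

    def matches(self, para: WordParagraph, current_parent: str) -> bool:
        if self.style is not None and para.style != self.style:
            return False
        if self.font is not None and para.font != self.font:
            return False
        if self.pattern is not None and not re.search(self.pattern, para.text):
            return False
        if self.parent is not None and current_parent != self.parent:
            return False
        return True


def first_match(rules: List[Rule], para: WordParagraph,
                current_parent: str) -> Optional[Rule]:
    """Return the first rule that matches, or None (where a conversion would pause)."""
    for rule in rules:
        if rule.matches(para, current_parent):
            return rule
    return None


# More specific rules are listed first so that they win over general ones.
rules = [
    Rule(element="warning", style="Normal", pattern=r"^WARNING:"),
    Rule(element="item", style="List Bullet", parent="list"),
    Rule(element="para", style="Normal"),
]

hit = first_match(rules, WordParagraph("Normal", "Arial", "WARNING: hot"), "section")
print(hit.element if hit else "no rule")
```

Listing the pattern-based rule ahead of the plain style rule is one simple way to have particular words or phrases trigger different tagging. Whether X-ICE orders its rules this way is not stated here.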

How Much Does It Cost, and What Support Is Offered?

X-ICE is available for anyone to use free of charge. It is an "Open Source" product released under the GPL for the benefit of the XML community.

As a free product it is of course provided "as is" and is not subject to official e-mail or phone support. However, we welcome feedback on any problems you encounter, and will endeavour to adjust the software or post solutions as soon as possible.

If you wish to discuss an individual support contract, or to have Turn-Key prepare or modify rule sets for your own purposes, please contact us at mailbox@turnkey.com.au.

You may also copy, modify and re-distribute the software under the terms of the GPL. We would ask that a copy of any modifications you make be sent to us, so we have the opportunity to incorporate them for the benefit of all users.

Other Resources

The download contains documentation files; you can also read them online here. (The online version is provided for convenience only and is not always up to date. If in doubt, refer to the documentation in the download.)

Some screen shots of X-ICE in action.

 Copyright © Turn-Key Systems
 All Rights Reserved
 Webmaster: mailbox@turnkey.com.au