Post-Editing Automation (PEX)

PEX constructs are regular expressions used to define search-and-replace patterns within text. They apply post-edits to translated text produced by a KantanMT engine. They are easy to use and can dramatically reduce the manual post-editing effort on KantanMT-generated output, especially when there are a lot of repetitive post-edits.

For example, suppose your KantanMT engine generates the following translations:-

  • Source Text:  Modèle SP17 pour ordinateur 17"
  • Target Text:  Modelo sp17 para computador 17”

In the above scenario, the KantanMT engine has incorrectly cased the product name SP17, and this error is repeated many times throughout the KantanMT-generated translations. Traditionally, a post-editor would be tasked with reviewing these segments and modifying the target text, a process that can be both time-consuming and expensive.

However, a single PEX construct can be used to automate this correction:-


	sp17
	SP17

However, suppose there are many product names which all begin with SP, followed by two digits. In this scenario the above PEX construct would need to be modified and a more advanced construct used:-


	sp(\d\d)
	SP$1

In this PEX construct, the letters ‘sp’ followed by any two digits ‘\d\d’ are searched for, and when found they are replaced by ‘SP’ followed by the same matching digits. By enclosing \d\d in brackets ( ) we can use this part of the match in our replacement text by referring to it as $1.
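To experiment with this construct outside KantanMT, the same search and replace can be approximated in Python. This is only a sketch: Python's re module writes the back-reference as \g<1> rather than $1, and the function name apply_construct is hypothetical, not part of PEX.

```python
import re

# Search pattern copied from the PEX construct above.
SEARCH = r"sp(\d\d)"
# Python spelling of the PEX replacement "SP$1"
# (assumption: $1 corresponds to \g<1> in Python's re module).
REPLACE = r"SP\g<1>"

def apply_construct(text: str) -> str:
    """Replace every 'sp' + two digits with 'SP' + the same two digits."""
    return re.sub(SEARCH, REPLACE, text)

print(apply_construct("Modelo sp17 para computador 17”"))
# prints: Modelo SP17 para computador 17”
```

Because the digits are captured rather than hard-coded, the same construct also turns sp42 into SP42 without a second rule.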

Special support for $1..$9

PEX supports a maximum of nine bracketed matches in a single construct, referred to in the replacement text as $1, $2, and so on up to $9.

This makes PEX extremely flexible and powerful.
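As an illustration of using more than one numbered match, the sketch below swaps two bracketed parts in the replacement. The construct shown is hypothetical and not taken from KantanMT, and the helper to_python_replacement is an assumption that exists only so the $1..$9 notation can be tried with Python's re module, which spells these references \g<1>..\g<9>.

```python
import re

def to_python_replacement(pex_replacement: str) -> str:
    """Translate $1..$9 into the \\g<1>..\\g<9> form used by Python's re."""
    # Double any literal backslashes first so re.sub does not read them as escapes.
    escaped = pex_replacement.replace("\\", "\\\\")
    return re.sub(r"\$([1-9])", r"\\g<\1>", escaped)

# Hypothetical construct with two bracketed groups, swapped in the replacement.
search = r"(\d+)x(\d+)"
replacement = "$2x$1"

print(re.sub(search, to_python_replacement(replacement), "1024x768"))
# prints: 768x1024
```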

However, suppose the product numbers are made up of SP followed by two to four digits. Again, one PEX construct can be used to represent this:-


	sp(\d{2,4})
	SP$1

In this scenario, the letters ‘sp’ followed by a minimum of two and a maximum of four digits are matched and replaced with ‘SP’ followed by the same matching digits.
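The boundaries of the {2,4} quantifier are easiest to see on a few sample strings. The sketch below uses Python's re module as a stand-in, with \g<1> in place of $1. The function name normalise is hypothetical, and KantanMT's own engine may differ in detail.

```python
import re

def normalise(text: str) -> str:
    # \g<1> is Python's spelling of $1 (assumed equivalent here).
    return re.sub(r"sp(\d{2,4})", r"SP\g<1>", text)

for sample in ["sp7", "sp17", "sp1234", "sp12345"]:
    print(sample, "->", normalise(sample))
# sp7 -> sp7          (one digit is below the minimum, so nothing matches)
# sp17 -> SP17
# sp1234 -> SP1234
# sp12345 -> SP12345  (the first four digits match; the fifth is left in place)
```

Note the last case: an upper bound of four does not stop the construct from matching inside a longer run of digits. It only limits how many digits are captured.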

Now you can see the power of PEX constructs!

What is a PEX file?

A .PEX file contains PEX constructs that will be automatically applied to translations generated by your KantanMT engine.

<?xml version="1.0" encoding="UTF-8"?>

		
		
			sp(\d{2,4})
			SP$1
		
		
      
		
			console
			Admin-Console
		

To apply the PEX constructs against your KantanMT engine you simply upload your PEX file to your Client Data tab. The rest will happen automatically.
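Before uploading a file, it can help to preview what its constructs would do to a sample sentence. The sketch below is a rough local approximation in Python, not KantanMT's implementation: it assumes the constructs are applied one after another in file order, represents them as plain (search, replacement) pairs rather than parsing a PEX file, and uses hypothetical helper names.

```python
import re

# The two constructs from the sample file, as (search, replacement) pairs.
RULES = [
    (r"sp(\d{2,4})", "SP$1"),
    (r"console", "Admin-Console"),
]

def to_python_replacement(pex_replacement: str) -> str:
    """Translate $1..$9 into Python's \\g<1>..\\g<9> form (an assumption)."""
    escaped = pex_replacement.replace("\\", "\\\\")
    return re.sub(r"\$([1-9])", r"\\g<\1>", escaped)

def post_edit(text: str, rules=RULES) -> str:
    """Apply each construct in turn to the output of the previous one."""
    for search, replacement in rules:
        text = re.sub(search, to_python_replacement(replacement), text)
    return text

print(post_edit("Abra o console do sp17"))
# prints: Abra o Admin-Console do SP17
```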

PEX files need to be UTF-8 encoded

It’s important that this file is encoded in UTF-8, as this allows you to use accented characters within the search and replacement elements.
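A quick way to confirm the encoding before uploading is to check that the file's bytes decode cleanly as UTF-8. The following is a minimal sketch. The function name is_valid_utf8 is hypothetical, and in practice the bytes would come from reading the PEX file in binary mode.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

sample = "Modèle SP17"
print(is_valid_utf8(sample.encode("utf-8")))    # True
print(is_valid_utf8(sample.encode("latin-1")))  # False: è is stored as a lone 0xE8 byte
```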

What are KantanMT Regex?

KantanMT Regex (regular expressions) are key components for building PEX and Gentry rule files. They are similar to standard regular expressions, with a few modifications to make them more powerful and flexible.



