SRX Extensions

From Okapi Framework

The Okapi framework implements a few extensions to SRX.

Use these extensions carefully, as they may make your SRX document incompatible with other SRX implementations. They may also make your SRX document compatible with some implementations: for example, some systems trim white spaces from segments automatically even though SRX 2.0 has no such option.

The namespace to declare them is http://okapi.sf.net/srx-extensions

Most of the extensions are placed at the end of the <header> element of the SRX file. For example:

<srx xmlns="http://www.lisa.org/srx20" version="2.0"
 xmlns:okpsrx="http://okapi.sf.net/srx-extensions">
 <header segmentsubflows="yes" cascade="no">
  <formathandle type="start" include="no"></formathandle>
  <formathandle type="end" include="yes"></formathandle>
  <formathandle type="isolated" include="no"></formathandle>
  <okpsrx:options oneSegmentIncludesAll="yes"
   trimLeadingWhitespaces="yes"
   trimTrailingWhitespaces="yes"
  />
  <okpsrx:sample language="en" useMappedRules="yes">Mr. Holmes is from the U.K.</okpsrx:sample>
  <okpsrx:rangeRule>pattern</okpsrx:rangeRule>
 </header>
 <body>
 ...
 </body>
</srx>
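
As a rough illustration of where these extensions sit in the file, the following Python sketch reads them from the header with the standard library. It is not Okapi code: the `read_extensions` function and the `DOC` string are hypothetical names used only for this example, and the defaults applied are the ones listed on this page.

```python
import xml.etree.ElementTree as ET

SRX_NS = "http://www.lisa.org/srx20"
OKP_NS = "http://okapi.sf.net/srx-extensions"

# Hypothetical, trimmed-down SRX document used only for this sketch.
DOC = """<srx xmlns="http://www.lisa.org/srx20" version="2.0"
 xmlns:okpsrx="http://okapi.sf.net/srx-extensions">
 <header segmentsubflows="yes" cascade="no">
  <okpsrx:options oneSegmentIncludesAll="yes"
   trimLeadingWhitespaces="yes"
   trimTrailingWhitespaces="yes"
  />
  <okpsrx:sample language="en" useMappedRules="yes">Mr. Holmes is from the U.K.</okpsrx:sample>
  <okpsrx:rangeRule>pattern</okpsrx:rangeRule>
 </header>
 <body/>
</srx>"""


def read_extensions(srx_text):
    """Return the extension settings found in the SRX header as a dict."""
    header = ET.fromstring(srx_text).find(f"{{{SRX_NS}}}header")
    options = header.find(f"{{{OKP_NS}}}options")
    sample = header.find(f"{{{OKP_NS}}}sample")
    range_rule = header.find(f"{{{OKP_NS}}}rangeRule")
    # The option attributes carry no prefix, so they are looked up by plain name.
    attrs = options.attrib if options is not None else {}
    return {
        "trimLeadingWhitespaces": attrs.get("trimLeadingWhitespaces", "no") == "yes",
        "trimTrailingWhitespaces": attrs.get("trimTrailingWhitespaces", "no") == "yes",
        "oneSegmentIncludesAll": attrs.get("oneSegmentIncludesAll", "no") == "yes",
        "sample": (sample.text or "") if sample is not None else "",
        "rangeRule": (range_rule.text or "") if range_rule is not None else "",
    }


extensions = read_extensions(DOC)
```

A tool that does not know the extension namespace would simply never look up these elements, which is why a strict SRX implementation may segment the same text differently.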

trimLeadingWhitespaces

This extension indicates that leading white spaces in the segments should be moved out of the segment.

Possible values: yes or no. Default: no.

A value of yes makes the SRX document incompatible with strict SRX implementations.

Example:

<okpsrx:options trimLeadingWhitespaces="yes" />

trimTrailingWhitespaces

This extension indicates that trailing white spaces in the segments should be moved out of the segment.

Possible values: yes or no. Default: no.

A value of yes makes the SRX document incompatible with strict SRX implementations.

Example:

<okpsrx:options trimTrailingWhitespaces="yes" />
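
The effect of the two trimming options can be pictured with a small Python sketch. It assumes a segment is represented as a (start, end) range over the text of the entry, and `trim_segment` is a hypothetical helper, not an Okapi function. "Moving white spaces out of the segment" then amounts to narrowing the range; the text itself is unchanged.

```python
def trim_segment(text, start, end, trim_leading=False, trim_trailing=False):
    """Narrow the range text[start:end] so white space at its edges falls outside it."""
    if trim_leading:
        # Advance the start past any leading white space.
        while start < end and text[start].isspace():
            start += 1
    if trim_trailing:
        # Pull the end back before any trailing white space.
        while end > start and text[end - 1].isspace():
            end -= 1
    return start, end


text = "First sentence.  Second sentence. "
# Assume the rules placed a break right after "First sentence.",
# so the second segment initially starts with two spaces.
raw_start = len("First sentence.")
raw_end = len(text)
start, end = trim_segment(text, raw_start, raw_end,
                          trim_leading=True, trim_trailing=True)
second_segment = text[start:end]
```

With both options set to no, the range is returned unchanged, which corresponds to the behavior of a strict SRX implementation.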

oneSegmentIncludesAll

This extension indicates that when an entry (text unit) is composed of a single segment after the segmentation has been applied, that segment must include all the content of the entry. For example, this overrides the trimming extensions.

Possible values: yes or no. Default: no.

A value of yes makes the SRX document incompatible with strict SRX implementations.

Example:

<okpsrx:options oneSegmentIncludesAll="yes" />
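
The rule can be sketched in Python as follows, again assuming segments are (start, end) ranges over the entry text. The function name `apply_one_segment_includes_all` is hypothetical and the code only illustrates the behavior described above; it is not taken from Okapi.

```python
def apply_one_segment_includes_all(text, segments, one_segment_includes_all=False):
    """Widen a lone segment to cover the whole entry when the option is enabled."""
    if one_segment_includes_all and len(segments) == 1:
        return [(0, len(text))]
    # Zero or several segments, or option disabled: leave the ranges untouched.
    return segments


entry = "  Only one sentence. "
# Assume trimming already excluded the two leading spaces and the trailing space.
trimmed = [(2, len(entry) - 1)]
widened = apply_one_segment_includes_all(entry, trimmed, one_segment_includes_all=True)
```

Here the trimmed range covers only "Only one sentence.", while the widened range covers the full entry including the surrounding white space.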

sample

This extension allows you to store a sample text in the SRX document. That text can be used to see what segmentation results from applying the rules of the document.

The language attribute indicates the language code to use when applying the rules to the sample.

The useMappedRules="yes|no" attribute indicates whether all language rules for the given language should be applied to the sample. If the value is no, only the language rules currently displayed should be used.

Defaults: Empty sample text, language: en, useMappedRules: yes

This extension does not affect the compatibility of your rules.

Example:

<okpsrx:sample
 language="en"
 useMappedRules="no"
>Mr. Holmes is from the U.K.</okpsrx:sample>

rangeRule

This extension indicates that the specified regular expression pattern should be searched for in the text unit being segmented and, if it is found, the matching content should be made a segment, overriding the normal segmentation rules.

This is useful, for example, in content where a great number of inline codes are mixed with non-translatable and translatable text, and where using normal rules is too complicated.

A non-empty content makes the SRX document incompatible with strict SRX implementations (if the specified pattern occurs in the text being segmented).

Example:

<okpsrx:rangeRule>thePattern</okpsrx:rangeRule>
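
A minimal Python sketch of the idea is shown below. It rests on several assumptions: `range_segments` is a hypothetical helper, each non-empty match is taken to become one segment, and Python's `re` module stands in for the regular expression engine, whose syntax may differ from the one Okapi uses. How the text outside the matches is treated is not described here, so the sketch leaves it alone.

```python
import re


def range_segments(text, pattern):
    """Return a (start, end) range for each non-empty match of pattern in text."""
    if not pattern:
        # An empty rangeRule means the normal segmentation rules apply.
        return []
    return [m.span() for m in re.finditer(pattern, text) if m.end() > m.start()]


# Hypothetical entry: inline codes and IDs around one translatable sentence.
entry = "<b>[id1]</b> Click Save. <i>[id2]</i>"
ranges = range_segments(entry, r"Click \w+\.")
matched = [entry[s:e] for s, e in ranges]
```

In this example the only range returned covers "Click Save.", so the surrounding codes and identifiers stay outside the segment without any break rules having to describe them.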