lautr.com

Hannes Blog for Development and Stuff besides

Convert all applicable characters to Numeric entities for use in XML


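If you want to make sure your text gets parsed correctly, you will usually reach for htmlentities. However, this function has two downsides:

1. It does not convert into numeric entities, so you'll have problems when parsing the result as XML. XML only predefines five named entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot; and &amp;apos;), so something like &amp;auml; is undefined there.

2. It does NOT cover all characters that are likely to show up.

So, to address these issues, starting with point 1: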

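function _convertAlphaEntitysToNumericEntitys($entity){
    // Decode the named entity to its character (UTF-8 assumed; mb_ord needs PHP 7.2+ with mbstring).
    $char = html_entity_decode($entity[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // Unknown names come back unchanged, so leave them alone; otherwise emit the code point.
    return $char === $entity[0] ? $entity[0] : '&#'.mb_ord($char, 'UTF-8').';';
}
$content = preg_replace_callback('/&(\w+);/','_convertAlphaEntitysToNumericEntitys',$content);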
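To see what this step does on a concrete string, here is a small self-contained sketch. The helper name namedToNumeric and the sample text are made up for illustration, and it assumes UTF-8 text plus PHP 7.2+ with the mbstring extension (for mb_ord).

```php
<?php
// Hypothetical helper (not from the original post): named entity -> numeric entity.
function namedToNumeric(string $text): string {
    return preg_replace_callback('/&(\w+);/', function (array $m): string {
        // Decode the named entity to the character it stands for.
        $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
        // Unknown names come back unchanged, so leave them as they are.
        if ($char === $m[0]) {
            return $m[0];
        }
        // Emit the Unicode code point as a decimal numeric entity.
        return '&#' . mb_ord($char, 'UTF-8') . ';';
    }, $text);
}

echo namedToNumeric(htmlentities('Käse & Brot', ENT_QUOTES, 'UTF-8')), "\n";
// prints: K&#228;se &#38; Brot
```

Numeric entities that are already in the text (such as &amp;#039;) are not matched by the pattern, because # is not a word character, so they pass through untouched.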

Here all the “normal” named entities (which you already have from using htmlentities) are replaced by their numeric counterparts so the text can be parsed as XML. That leaves us with the second problem: only a small range of characters is covered in the first place:

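function _convertAsciOver127toNumericEntitys($entity){
    // With the /u modifier $entity[0] is one whole UTF-8 character, not a single byte.
    if(($asciCode = mb_ord($entity[0], 'UTF-8')) > 127){
        return '&#'.$asciCode.';';
    }else{
        return $entity[0];
    }
}
// Match every character outside the ASCII range (UTF-8 input assumed).
$content = preg_replace_callback('/[^\x00-\x7F]/u','_convertAsciOver127toNumericEntitys',$content);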

And there you go: the resulting text should have no entity problems in XML.
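Put together, the whole chain might look like the following sketch. The function name toXmlSafe and the sample string are hypothetical, and it assumes UTF-8 input and PHP 7.2+ with mbstring; treat it as one possible arrangement of the two steps, not the only one.

```php
<?php
// Hypothetical helper combining htmlentities with both conversion steps.
function toXmlSafe(string $text): string {
    // Escape markup characters and everything that has a named HTML entity.
    $text = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Step 1: named entities -> numeric entities.
    $text = preg_replace_callback('/&(\w+);/', function (array $m): string {
        $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
        return $char === $m[0] ? $m[0] : '&#' . mb_ord($char, 'UTF-8') . ';';
    }, $text);

    // Step 2: any remaining non-ASCII character -> numeric entity.
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        return '&#' . mb_ord($m[0], 'UTF-8') . ';';
    }, $text);
}

echo toXmlSafe('Größe <5 ☃'), "\n";
// prints: Gr&#246;&#223;e &#60;5 &#9731;
```

The snowman has no named HTML 4.01 entity, so htmlentities leaves it alone and only the second step catches it. The result is plain ASCII, which sidesteps encoding mismatches between the text and the XML document around it.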

Author: Hannes

Hi! I’m Johannes Lauter, a 26-year-old Web Application Developer based in Berlin ...
