Fix htmlspecialchars() in PHP 5.4+ for Latin1 (ISO-8859-1)
Problem
In many legacy PHP products, the function htmlspecialchars($string) is used to convert characters such as <, > and quotes to HTML entities. This prevents HTML tags from being interpreted and avoids unbalanced quotes. Since PHP 5.4, htmlspecialchars($string) expects $string to be UTF-8 unless a charset is passed explicitly as the third parameter. Legacy products are mostly encoded in Latin1 (alias ISO-8859-1), which causes htmlspecialchars() and htmlentities() to return an empty string if $string contains a special character, e.g. a German umlaut. html_entity_decode() is affected by the same change of default and produces UTF-8 instead of Latin1 output.
PHP < 5.4
echo htmlspecialchars('<b>Woermann</b>'); // Output: &lt;b&gt;Woermann&lt;/b&gt;
echo htmlspecialchars('<b>Wörmann</b>'); // Output: &lt;b&gt;Wörmann&lt;/b&gt;
PHP >= 5.4
echo htmlspecialchars('<b>Woermann</b>'); // Output: &lt;b&gt;Woermann&lt;/b&gt;
echo htmlspecialchars('<b>Wörmann</b>'); // Output: (empty string)
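The behaviour can be reproduced independently of the PHP version and of the encoding of the source file by passing the charset explicitly. The following is a minimal sketch; the variable names are illustrative, and the umlaut is written as the byte escape "\xF6" so that the string is Latin1 in any case.

```php
<?php
// "\xF6" is the single Latin1 byte for "ö"; on its own it is not valid UTF-8.
$latin1 = "<b>W\xF6rmann</b>";

// Interpreted as UTF-8 (what PHP 5.4+ assumes by default):
// the invalid byte sequence makes the function return an empty string.
$asUtf8 = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

// Interpreted as ISO-8859-1: every byte is valid, only < and > are escaped.
$asLatin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');

var_dump($asUtf8 === '');                                 // bool(true)
var_dump($asLatin1 === "&lt;b&gt;W\xF6rmann&lt;/b&gt;");  // bool(true)
```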
Three alternative solutions
a) Do not run legacy products on PHP 5.4 or later
b) Change all occurrences in your code from
htmlspecialchars($string) and *** to
htmlspecialchars($string, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1')
c) Replace all calls to htmlspecialchars() and *** with self-made wrapper functions
*** The same applies to htmlentities(), html_entity_decode(), htmlspecialchars_decode() and get_html_translation_table().
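As an illustration of solution b) for the other functions, the following sketch passes the same flags and charset to htmlentities() and html_entity_decode() and checks that a Latin1 string survives the round trip. The variable names are assumptions made for this example.

```php
<?php
$flags   = ENT_COMPAT | ENT_HTML401;
$charset = 'ISO-8859-1';

// "\xF6" is the Latin1 byte for "ö".
$latin1 = "W\xF6rmann";

// With the explicit charset the umlaut becomes a named entity ...
$encoded = htmlentities($latin1, $flags, $charset);

// ... and decoding with the same charset restores the original Latin1 bytes.
$decoded = html_entity_decode($encoded, $flags, $charset);

var_dump($encoded === 'W&ouml;rmann'); // bool(true)
var_dump($decoded === $latin1);        // bool(true)
```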
Solution c
1 Run a search and replace across the affected legacy project:
Search for: htmlspecialchars
Replace with: htmlXspecialchars
Search for: htmlentities
Replace with: htmlXentities
Search for: html_entity_decode
Replace with: htmlX_entity_decode
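A plain text search for htmlspecialchars also matches htmlspecialchars_decode and would turn it into htmlXspecialchars_decode, which does not exist unless you define it. If your editor supports regular expressions, the search can be limited to real calls. The following sketch shows the idea in PHP; renameHtmlCalls() is a hypothetical helper, not part of the original solution, and it works on source code passed in as a string.

```php
<?php
// Hypothetical helper: renames only calls to the three functions in $source.
function renameHtmlCalls($source)
{
    $map = array(
        'htmlspecialchars'   => 'htmlXspecialchars',
        'htmlentities'       => 'htmlXentities',
        'html_entity_decode' => 'htmlX_entity_decode',
    );
    foreach ($map as $old => $new) {
        // \b requires a word boundary before the name; the lookahead requires
        // an opening parenthesis after it, so htmlspecialchars_decode( is skipped.
        $source = preg_replace('/\b' . $old . '(?=\s*\()/', $new, $source);
    }
    return $source;
}

echo renameHtmlCalls('echo htmlspecialchars($name);');
```

Do not run such a replacement over the file that contains the wrapper functions themselves, otherwise the wrappers would end up calling themselves. Method calls with the same name, such as $obj->htmlentities(), are matched as well and need a manual check.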
2a Copy and paste the following three functions into an existing PHP file that is already included everywhere in your legacy project. (This PHP file must be included only once per request; otherwise PHP stops with a "Cannot redeclare" fatal error.)
// Wrappers that default to Latin1 instead of UTF-8
function htmlXspecialchars($string, $ent=ENT_COMPAT, $charset='ISO-8859-1') {
    return htmlspecialchars($string, $ent, $charset);
}
function htmlXentities($string, $ent=ENT_COMPAT, $charset='ISO-8859-1') {
    return htmlentities($string, $ent, $charset);
}
function htmlX_entity_decode($string, $ent=ENT_COMPAT, $charset='ISO-8859-1') {
    return html_entity_decode($string, $ent, $charset);
}
or 2b Create a new PHP file containing the three functions mentioned above, e.g. htmlXfunctions.inc.php, and include it on the first line of every PHP file in your legacy product like this: require_once('htmlXfunctions.inc.php');
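If you cannot guarantee that the file is loaded only once per request, one possible safeguard is to wrap the definitions in a function_exists() check. This is a sketch of an optional variation, not part of the original solution; the function names are the ones used above.

```php
<?php
// Define the wrappers only if they are not known yet, so that loading
// this code twice in one request does not trigger a redeclaration error.
if (!function_exists('htmlXspecialchars')) {
    function htmlXspecialchars($string, $ent = ENT_COMPAT, $charset = 'ISO-8859-1') {
        return htmlspecialchars($string, $ent, $charset);
    }
    function htmlXentities($string, $ent = ENT_COMPAT, $charset = 'ISO-8859-1') {
        return htmlentities($string, $ent, $charset);
    }
    function htmlX_entity_decode($string, $ent = ENT_COMPAT, $charset = 'ISO-8859-1') {
        return html_entity_decode($string, $ent, $charset);
    }
}

// Latin1 input ("\xF6" is "ö") is escaped instead of being discarded.
$escaped = htmlXspecialchars("<b>W\xF6rmann</b>");
var_dump($escaped === "&lt;b&gt;W\xF6rmann&lt;/b&gt;"); // bool(true)
```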
Addendum
If your legacy product also uses the functions htmlspecialchars_decode() and get_html_translation_table(), you have to include them in the solution in the same way as html_entity_decode(). Note that htmlspecialchars_decode() accepts no charset parameter, so its wrapper can only pass the flags through.
Also see the more developed version: PDF “htmlXfunctions”
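The two additional wrappers could look like the following sketch. The names htmlXspecialchars_decode() and get_htmlX_translation_table() are assumptions that follow the naming pattern above; the more developed PDF version may name or implement them differently.

```php
<?php
// Hypothetical wrapper: htmlspecialchars_decode() takes only the string
// and the flags, so there is no charset to set here.
function htmlXspecialchars_decode($string, $ent = ENT_COMPAT) {
    return htmlspecialchars_decode($string, $ent);
}

// Hypothetical wrapper: get_html_translation_table() takes the charset
// as its third parameter, which defaults to Latin1 here.
function get_htmlX_translation_table($table = HTML_SPECIALCHARS, $ent = ENT_COMPAT, $charset = 'ISO-8859-1') {
    return get_html_translation_table($table, $ent, $charset);
}

$plain = htmlXspecialchars_decode('&lt;b&gt;W&amp;M&lt;/b&gt;');
$table = get_htmlX_translation_table(HTML_ENTITIES);

var_dump($plain === '<b>W&M</b>'); // bool(true)
var_dump($table['<'] === '&lt;');  // bool(true)
```

Defining htmlXspecialchars_decode() also keeps the code working if a plain text search for htmlspecialchars has already renamed calls to htmlspecialchars_decode().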
Another compatibility issue: Emulate PHP’s original mysql extension with mysqli mapping functions