html_entity_decode

(PHP 4 >= 4.3.0, PHP 5)

html_entity_decode — Convert all HTML entities to their applicable characters

Description

string html_entity_decode ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = 'UTF-8' ]] )

html_entity_decode() is the opposite of htmlentities() in that it converts all HTML entities in the string to their applicable characters.

Parameters

string

The input string.

flags

A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_COMPAT | ENT_HTML401.

**Available *`flags`* constants**
Constant Name	Description
`ENT_COMPAT`	Will convert double-quotes and leave single-quotes alone.
`ENT_QUOTES`	Will convert both double and single quotes.
`ENT_NOQUOTES`	Will leave both double and single quotes unconverted.
`ENT_HTML401`	Handle code as HTML 4.01.
`ENT_XML1`	Handle code as XML 1.
`ENT_XHTML`	Handle code as XHTML.
`ENT_HTML5`	Handle code as HTML 5.
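
To make the quote-handling flags concrete, here is a minimal sketch. The sample string and variable names are illustrative only, and each flag is passed explicitly rather than relying on the default.

```php
<?php
// Illustrative input: one double-quote entity pair and one single-quote entity pair.
$encoded = '&quot;double&quot; and &#039;single&#039;';

// ENT_COMPAT: only double-quote entities are decoded.
$compat = html_entity_decode($encoded, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// ENT_QUOTES: both kinds of quote entities are decoded.
$quotes = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// ENT_NOQUOTES: neither kind is decoded.
$noquotes = html_entity_decode($encoded, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

echo $compat, "\n";   // "double" and &#039;single&#039;
echo $quotes, "\n";   // "double" and 'single'
echo $noquotes, "\n"; // &quot;double&quot; and &#039;single&#039;
```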

encoding

Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.

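The following character sets are supported as of PHP 4.3.0.

**Supported charsets**
Charset	Aliases	Description
ISO-8859-1	ISO8859-1	Western European, Latin-1.
ISO-8859-15	ISO8859-15	Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing from Latin-1 (ISO-8859-1).
UTF-8		ASCII-compatible multi-byte 8-bit Unicode.
cp866	ibm866, 866	DOS-specific Cyrillic charset. Supported as of 4.3.2.
cp1251	Windows-1251, win-1251, 1251	Windows-specific Cyrillic charset. Supported as of 4.3.2.
cp1252	Windows-1252, 1252	Windows-specific charset for Western European.
KOI8-R	koi8-ru, koi8r	Russian. Supported as of 4.3.2.
BIG5	950	Traditional Chinese, mainly used in Taiwan.
GB2312	936	Simplified Chinese, national standard character set.
BIG5-HKSCS		Traditional Chinese, Big5 with Hong Kong extensions.
Shift_JIS	SJIS, 932	Japanese
EUC-JP	EUCJP	Japanese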

Note: Any other character sets are not recognized. ISO-8859-1 will be used instead.
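
The choice of encoding also determines which entities can be decoded at all. The following sketch (variable names are illustrative) decodes the same entity into a charset that lacks the character and into one that has it:

```php
<?php
$euro = '&euro;';

// ISO-8859-1 has no Euro sign, so the entity is left as-is.
$latin1 = html_entity_decode($euro, ENT_QUOTES, 'ISO-8859-1');

// UTF-8 can represent it: U+20AC is encoded as the bytes E2 82 AC.
$utf8 = html_entity_decode($euro, ENT_QUOTES, 'UTF-8');

echo $latin1, "\n";        // &euro;
echo bin2hex($utf8), "\n"; // e282ac
```

If the decoded text must end up in a single-byte charset, one option is to decode to UTF-8 first and convert afterwards, for example with iconv() or mb_convert_encoding().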

Return Values

Returns the decoded string.

Changelog

Version	Description
5.4.0	Default encoding changed from ISO-8859-1 to UTF-8.
5.4.0	The constants `ENT_HTML401`, `ENT_XML1`, `ENT_XHTML` and `ENT_HTML5` were added.
5.0.0	Support for multi-byte encodings was added.

Examples

Example #1 Decoding HTML entities


<?php
$orig = "I'll \"walk\" the <b>dog</b> now";

$a = htmlentities($orig);

$b = html_entity_decode($a);

echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now

echo $b; // I'll "walk" the <b>dog</b> now
?>

Notes

Note:
You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string. That's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but character code 160 (0xa0) in ISO 8859-1, the default encoding prior to PHP 5.4.0.
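
The same effect occurs with UTF-8, where the no-break space is a two-byte sequence. A minimal sketch, with illustrative variable names, of the behavior and one possible way to work around it:

```php
<?php
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

// In UTF-8 the no-break space (U+00A0) is the two bytes C2 A0.
echo bin2hex($decoded), "\n"; // c2a0

// trim() does not treat it as whitespace, so nothing is stripped.
var_dump(trim($decoded) === ''); // bool(false)

// One option: turn no-break spaces into plain spaces before trimming.
$normalized = str_replace("\xC2\xA0", ' ', $decoded);
var_dump(trim($normalized) === ''); // bool(true)
```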

See Also

htmlentities() - Convert all applicable characters to HTML entities
htmlspecialchars() - Convert special characters to HTML entities
get_html_translation_table() - Returns the translation table used by htmlspecialchars() and htmlentities()
urldecode() - Decodes URL-encoded string


User Contributed Notes:

Victor (14-Dec-2011 02:22)

We were seeing very peculiar behavior with foreign characters such as e-acute. It only showed up when those characters were pulled out of our MySQL database and displayed through one of our proxy servers that handles DNS issues. As other users have noted, the default character set wasn't what they expected when they left theirs blank. Once we changed our default_charset to "UTF-8", the problems went away and we no longer needed functions like these to handle foreign characters such as e-acute. Good enough for us!

Martin (26-Jun-2011 12:37)

If you need something that converts &#[0-9]+; entities to UTF-8, this is simple and works:

<?php
/* Entity crap. */
$input = "Fovi&#269;";

$output = preg_replace_callback("/(&#[0-9]+;)/", function($m) {
    return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES");
}, $input);

/* Plain UTF-8. */
echo $output;
?>

John_Schlick at hotmail dot com (16-Feb-2011 10:39)

BE AWARE: The documentation around the default charset might be wrong. The changelog says: "5.3.3 Default charset changed from ISO-8859-1 to UTF-8." Despite the fact that we are running 5.3.3-7, when we do html_entity_decode("&nbsp;", ENT_QUOTES); we get "\xa0", the ISO-8859-1 version of a non-breaking space. When we change this to html_entity_decode("&nbsp;", ENT_QUOTES, 'UTF-8'); we properly get "\xc2\xa0", implying that 'UTF-8' is NOT the default for our installation of PHP.

neurotic dot neu at gmail dot com (10-Aug-2010 08:25)

This is a safe rawurldecode with UTF-8 detection:

<?php
function utf8_rawurldecode($raw_url_encoded)
{
    $enc = rawurldecode($raw_url_encoded);
    if (utf8_encode(utf8_decode($enc)) == $enc) {
        return rawurldecode($raw_url_encoded);
    } else {
        return utf8_encode(rawurldecode($raw_url_encoded));
    }
}
?>

Free at Key dot no (01-Jul-2010 01:51)

Handy function to convert remaining HTML entities into human-readable characters (for entities which do not exist in the target charset):

<?php
function cleanString($in, $offset = 0)
{
    $out = trim($in);
    if (!empty($out)) {
        $entity_start = strpos($out, '&', $offset);
        if ($entity_start === false) {
            // ideal
            return $out;
        } else {
            $entity_end = strpos($out, ';', $entity_start);
            if ($entity_end === false) {
                return $out;
            }
            // too long to be an entity
            else if ($entity_end > $entity_start + 7) {
                // and on we go
                $out = cleanString($out, $entity_start + 1);
            }
            // gotcha!
            else {
                $clean = substr($out, 0, $entity_start);
                $subst = substr($out, $entity_start + 1, 1);
                // &scaron; => "s" / &#353; => "_"
                $clean .= ($subst != "#") ? $subst : "_";
                $clean .= substr($out, $entity_end + 1);
                // and on we go
                $out = cleanString($clean, $entity_start + 1);
            }
        }
    }
    return $out;
}
?>

Matt Robinson (06-Sep-2009 10:11)

I wrote in a previous comment that html_entity_decode() only handled about 100 characters. That's not quite true; it only handles entities that exist in the output character set (the third argument). If you want to get ALL HTML entities, make sure you use ENT_QUOTES and set the third argument to 'UTF-8'. If you don't want a UTF-8 string, you'll need to convert it afterward with something like utf8_decode(), iconv(), or mb_convert_encoding().

If you're producing XML, which doesn't recognise most HTML entities: when producing a UTF-8 document (the default), use htmlspecialchars(html_entity_decode($string, ENT_QUOTES, 'UTF-8'), ENT_NOQUOTES, 'UTF-8') (because you only need to escape < and > and & unless you're printing inside the XML tags themselves). Otherwise, either convert all the named entities to numeric ones, or declare the named entities in the document's DTD. The full list of 252 entities can be found in the HTML 4.01 Spec, or you can cut and paste the function from my site (http://inanimatt.com/php-convert-entities.php).
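
The decode-then-re-escape approach for XML described above can be sketched as follows. The sample fragment and variable names are illustrative, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Illustrative HTML fragment using named entities that XML does not define.
$html = 'Caf&eacute; &amp; cr&egrave;me &lt;br&gt;';

// Step 1: decode every entity into real UTF-8 characters.
$text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// Step 2: re-escape only the characters that XML text content requires.
$xmlSafe = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');

echo $xmlSafe, "\n"; // Café &amp; crème &lt;br&gt;
```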

marion at figmentthinking dot com (10-Mar-2009 01:11)

I just ran into Bug #27626, the "cannot yet handle MBCS in html_entity_decode()!" warning. The simple solution if you're still running PHP 4 is to wrap the html_entity_decode() call in utf8_encode():

<?php
$string = ' ';
$utf8_encode = utf8_encode(html_entity_decode($string));
?>

By default html_entity_decode() returns ISO-8859-1, which utf8_encode() converts to UTF-8. utf8_decode() does the reverse; according to http://us.php.net/manual/en/function.utf8-decode.php, it "Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1".

jl dot garcia at gmail dot com (04-Mar-2009 11:33)

I created this function to filter all the text that goes in or comes out of the database.

<?php
function filter_string($string, $nohtml = '', $save = '')
{
    if (!empty($nohtml)) {
        $string = trim($string);
        if (!empty($save)) $string = htmlentities(trim($string), ENT_QUOTES, 'ISO-8859-15');
        else $string = html_entity_decode($string, ENT_QUOTES, 'ISO-8859-15');
    }
    if (!empty($save)) $string = mysql_real_escape_string($string);
    else $string = stripslashes($string);
    return($string);
}
?>

Anonymous (31-Jul-2008 06:01)

You may want to specify the character set if you see unexpected behavior. Here is an example.

# cat test.php
<?php
$str = '!';
$quotes = html_entity_decode($str, ENT_QUOTES);
$noquotes = html_entity_decode($str, ENT_NOQUOTES);
$noquotesutf8 = html_entity_decode($str, ENT_NOQUOTES, 'UTF-8');
echo "quotes='$quotes', noquotes='$noquotes', noquotesutf8='$noquotesutf8'\n";
?>

# php test.php
quotes='!', noquotes='!', noquotesutf8='!'

kae at verens dot com (09-May-2008 02:11)

The references to chr() in the example unhtmlentities() function should be changed to unichr(), using the example unichr() function described in the chr() reference (http://php.net/chr). The reason is that characters such as &#8364; (that's the Euro, by the way) do not break down into an ASCII number.

me at richardsnazell dot com (21-Jan-2008 12:19)

I had a problem getting the 'TM' trademark symbol to display correctly in an email subject line. Using html_entity_decode() with different charsets didn't work, but directly replacing the entity with its single-byte (Windows-1252) equivalent did: $subject = str_replace('&trade;', chr(153), $subject);

jojo (04-Nov-2006 04:27)

This function decodes strings encoded by JavaScript's escape() function. It is useful when the page uses multibyte characters.

JavaScript: escape('aaああaa') ... 'aa%u3042%u3042aa'
PHP: jsEscape_decode('aa%u3042%u3042aa') ... 'aaああaa'

<?php
function jsEscape_decode($jsEscaped, $outCharCode = 'SJIS')
{
    $arrMojis = explode("%u", $jsEscaped);
    for ($i = 1; $i < count($arrMojis); $i++) {
        $c = substr($arrMojis[$i], 0, 4);
        $cc = mb_convert_encoding(pack('H*', $c), $outCharCode, 'UTF-16');
        $arrMojis[$i] = substr_replace($arrMojis[$i], $cc, 0, 4);
    }
    return implode('', $arrMojis);
}
?>

romekt at CUTTHISgmail dot com (01-Sep-2006 10:15)

Here's a simple workaround for the UTF-8 support problem:

<?php
$var = iconv("UTF-8", "ISO-8859-1", $var);
$var = html_entity_decode($var, ENT_QUOTES, 'ISO-8859-1');
$var = iconv("ISO-8859-1", "UTF-8", $var);
?>

grvg (at) free (dot) fr (29-Jul-2006 05:44)

Here are the ultimate functions to convert HTML entities to UTF-8. The main function is htmlentities2utf8(); the others are helper functions.

<?php
function chr_utf8($code)
{
    if ($code < 0) return false;
    elseif ($code < 128) return chr($code);
    elseif ($code < 160) // Remap illegal Windows characters
    {
        // 160 is used for code points that have no mapping ("not affected").
        $map = array(
            128 => 8364, 129 => 160,  130 => 8218, 131 => 402,
            132 => 8222, 133 => 8230, 134 => 8224, 135 => 8225,
            136 => 710,  137 => 8240, 138 => 352,  139 => 8249,
            140 => 338,  141 => 160,  142 => 381,  143 => 160,
            144 => 160,  145 => 8216, 146 => 8217, 147 => 8220,
            148 => 8221, 149 => 8226, 150 => 8211, 151 => 8212,
            152 => 732,  153 => 8482, 154 => 353,  155 => 8250,
            156 => 339,  157 => 160,  158 => 382,  159 => 376,
        );
        $code = $map[$code];
    }
    if ($code < 2048) return chr(192 | ($code >> 6)) . chr(128 | ($code & 63));
    elseif ($code < 65536) return chr(224 | ($code >> 12)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63));
    else return chr(240 | ($code >> 18)) . chr(128 | (($code >> 12) & 63)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63));
}

// Callback for preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $str);
function html_entity_replace($matches)
{
    if ($matches[2]) {
        return chr_utf8(hexdec($matches[3]));
    } elseif ($matches[1]) {
        return chr_utf8($matches[3]);
    }
    switch ($matches[3]) {
        case "nbsp": return chr_utf8(160);
        case "iexcl": return chr_utf8(161);
        case "cent": return chr_utf8(162);
        case "pound": return chr_utf8(163);
        case "curren": return chr_utf8(164);
        case "yen": return chr_utf8(165);
        //... etc with all named HTML entities
    }
    // Unknown entity: leave the original text in place instead of deleting it.
    return $matches[0];
}

function htmlentities2utf8($string) // because of the html_entity_decode() bug with UTF-8
{
    $string = preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $string);
    return $string;
}
?>

loufoque (08-Oct-2005 09:15)

If you want to decode NCRs to UTF-8, use this function instead of chr().

<?php
function utf8_chr($code)
{
    // Shifts and masks are parenthesized explicitly: "+" binds tighter than ">>" and "&".
    if ($code < 128) return chr($code);
    else if ($code < 2048) return chr(($code >> 6) + 192) . chr(($code & 63) + 128);
    else if ($code < 65536) return chr(($code >> 12) + 224) . chr((($code >> 6) & 63) + 128) . chr(($code & 63) + 128);
    else if ($code < 2097152) return chr(($code >> 18) + 240) . chr((($code >> 12) & 63) + 128) . chr((($code >> 6) & 63) + 128) . chr(($code & 63) + 128);
}
?>

florianborn (at) yahoo (dot) de (20-Jul-2005 11:43)

Note that <?php echo urlencode(html_entity_decode("&nbsp;")); ?> will output "%A0" instead of "+".

gaui at gaui dot is (05-Jul-2005 01:15)

[If you are missing the html_entity_decode() function in your version of PHP, you may wish to try this code snippet.]

<?php
if (!function_exists('html_entity_decode')) {
    function html_entity_decode($given_html, $quote_style = ENT_QUOTES)
    {
        $trans_table = array_flip(get_html_translation_table(HTML_SPECIALCHARS, $quote_style));
        $trans_table['&#39;'] = "'";
        return (strtr($given_html, $trans_table));
    }
}
?>

marius (at) hot (dot) ee (08-Apr-2005 02:40)

To convert HTML entities into Unicode characters, use the following:

<?php
$trans_tbl = get_html_translation_table(HTML_ENTITIES);
foreach ($trans_tbl as $k => $v) {
    $ttr[$v] = utf8_encode($k);
}
$text = strtr($text, $ttr);
?>

php dot net at c dash ovidiu dot tk (18-Mar-2005 08:37)

Quick & dirty code that translates numeric entities to UTF-8.

<?php
function replace_num_entity($ord)
{
    $ord = $ord[1];
    if (preg_match('/^x([0-9a-f]+)$/i', $ord, $match)) {
        $ord = hexdec($match[1]);
    } else {
        $ord = intval($ord);
    }

    $no_bytes = 0;
    $byte = array();

    if ($ord < 128) {
        return chr($ord);
    } elseif ($ord < 2048) {
        $no_bytes = 2;
    } elseif ($ord < 65536) {
        $no_bytes = 3;
    } elseif ($ord < 1114112) {
        $no_bytes = 4;
    } else {
        return;
    }

    switch ($no_bytes) {
        case 2: { $prefix = array(31, 192); break; }
        case 3: { $prefix = array(15, 224); break; }
        case 4: { $prefix = array(7, 240); }
    }

    for ($i = 0; $i < $no_bytes; $i++) {
        $byte[$no_bytes - $i - 1] = (($ord & (63 * pow(2, 6 * $i))) / pow(2, 6 * $i)) & 63 | 128;
    }

    $byte[0] = ($byte[0] & $prefix[0]) | $prefix[1];

    $ret = '';
    for ($i = 0; $i < $no_bytes; $i++) {
        $ret .= chr($byte[$i]);
    }
    return $ret;
}

$test = 'This is a &#269;&#x5d0; test&#39;';
echo $test . "<br />\n";
echo preg_replace_callback('/&#([0-9a-fx]+);/mi', 'replace_num_entity', $test);
?>

Silvan (29-Jan-2005 03:33)

Passing NULL or FALSE as a string will generate a '500 Internal Server Error' (or break the script when inside a function). So always test your string first before passing it to html_entity_decode().

daniel at brightbyte dot de (14-Nov-2004 02:12)

This function seems to have two limitations (at least in PHP 4.3.8):
a) it does not work with multibyte character encodings, such as UTF-8;
b) it does not decode numeric entity references.

a) can be solved by using iconv to convert to ISO-8859-1, then decoding the entities, then converting to UTF-8 again. But that's quite ugly and destroys all characters not present in Latin-1.

b) can be solved rather nicely using the following code:

<?php
function decode_entities($text)
{
    $text = html_entity_decode($text, ENT_QUOTES, "ISO-8859-1"); # NOTE: UTF-8 does not work!
    $text = preg_replace('/&#(\d+);/me', "chr(\\1)", $text); # decimal notation
    $text = preg_replace('/&#x([a-f0-9]+);/mei', "chr(0x\\1)", $text); # hex notation
    return $text;
}
?>

HTH
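
The /e modifier used in this workaround was removed in PHP 7.0, so the snippet no longer runs on current versions. As a rough sketch only, the same idea can be expressed with preg_replace_callback(). The helper name decode_numeric_entities() is hypothetical, and the code assumes the mbstring extension is available for mb_chr() (PHP 7.2 or later).

```php
<?php
// Hypothetical helper, not part of PHP: decodes decimal and hex numeric
// character references into UTF-8 using mb_chr() from the mbstring extension.
function decode_numeric_entities(string $text): string
{
    $result = preg_replace_callback(
        '/&#(?:x([0-9a-f]{1,6})|([0-9]{1,7}));/i',
        function (array $m): string {
            // Group 1 holds hex digits; group 2 holds decimal digits.
            $code = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
            $char = mb_chr($code, 'UTF-8');
            // Leave the reference untouched if the code point is invalid.
            return $char === false ? $m[0] : $char;
        },
        $text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo bin2hex(decode_numeric_entities('&#269;&#x5d0;')), "\n"; // c48dd790
```

Named entities are not touched by this helper; html_entity_decode() with an explicit 'UTF-8' encoding can be applied before or after it.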

aidan at php dot net (14-Sep-2004 08:57)

This functionality is now implemented in the PEAR package PHP_Compat. More information about using this function without upgrading your version of PHP can be found on the below link: http://pear.php.net/package/PHP_Compat