mb_decode_numericentity

(PHP 4 >= 4.0.6, PHP 5)

mb_decode_numericentity — Decode HTML numeric string reference to character

Description

string mb_decode_numericentity ( string $str , array $convmap [, string $encoding ] )

Converts the numeric string references in the string str that fall within the specified code block into characters.

Parameters

str: The string being decoded.
convmap: An array that specifies the code area to convert.
encoding: The character encoding. If it is omitted, the internal character encoding is used.

Return Values

The converted string.

Examples

Example #1 convmap example


<?php
// A convmap is a flat array made of groups of four integers:
//    start_code1, end_code1, offset1, mask1,
//    start_code2, end_code2, offset2, mask2,
//    ........
//    start_codeN, end_codeN, offsetN, maskN
// Specify Unicode values for start_codeN and end_codeN.
// Add offsetN to the value and take the bit-wise 'AND' with maskN,
// then convert the value to a numeric string reference.
$convmap = array(0x80, 0xffff, 0, 0xffff); // a single group; values are illustrative only
?>
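
A minimal sketch of a decoding call, assuming the mbstring extension is loaded and the text is UTF-8. The convmap values and the sample string are illustrative choices, not taken from the manual:

```php
<?php
// One group: decode references for code points 0x80..0xffff, no offset.
$convmap = array(0x80, 0xffff, 0, 0xffff);

// &#8364; is the decimal reference for U+20AC (euro sign).
$str = '&#8364;10';

$decoded = mb_decode_numericentity($str, $convmap, 'UTF-8');
echo $decoded, "\n";
```

The encoding is passed explicitly here so the result does not depend on the configured internal encoding.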

See Also

mb_encode_numericentity() - Encode character to HTML numeric string reference


User Contributed Notes:

tom (13-Apr-2011 10:46)

When I used this function, I found a bug.

Example)
------------------------
INPUT STRING1 : abc&
INPUT STRING2 : abc&#
------------------------
OUTPUT STRING : abc
------------------------

<?php
$input = 'abc&';
$convmap = array(0x0, 0xffff, 0, 0xffff);
$output = mb_decode_numericentity($input, $convmap, 'UTF-8');
echo $output;
?>

result : abc

If the input string ends with characters that look like the beginning of an NCR (numeric character reference), this function removes those characters. So I use a trick:

<?php
function decode_numericentity($string)
{
    // $convmap has to be defined inside the function scope
    $convmap = array(0x0, 0xffff, 0, 0xffff);
    // append a space so the string never ends with a partial NCR
    $string = $string . chr(32);
    $string = mb_decode_numericentity($string, $convmap, 'UTF-8');
    // strip the space appended above
    $pos = strlen($string) - 1;
    //if(ord($string[$pos]) == 32){
    $string = substr($string, 0, $pos);
    //}
    return $string;
}
?>

Navi (01-Apr-2009 09:00)

Manual entity => UTF-8 conversion:

<?php
// parse entities
$raw = preg_replace_callback("/&#(\\d+);/u", "_pcreEntityToUtf", $raw);

function _pcreEntityToUtf($matches)
{
    $char = intval(is_array($matches) ? $matches[1] : $matches);

    if ($char < 0x80) {
        // to prevent insertion of control characters
        if ($char >= 0x20) return htmlspecialchars(chr($char));
        else return "&#$char;";
    } else if ($char < 0x800) {
        // two-byte sequence (U+0080 to U+07FF)
        return chr(0xc0 | (0x1f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    } else {
        // three-byte sequence (U+0800 to U+FFFF); larger code points are not handled
        return chr(0xe0 | (0x0f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
}
?>

donovan at conduit it (19-Apr-2006 05:05)

Note that at this time mb_decode_numericentity() seems to work only with decimal entities, not hexadecimal entities. Knowing this would have saved me a good hour of debugging. If you need to convert hex entities, first convert them all to decimal entities with a combination of the preg_replace() and hexdec() functions.
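
The conversion described above might look like the following sketch. It uses preg_replace_callback() rather than preg_replace() so that hexdec() can be applied to each match; the helper names hex_entity_to_decimal_cb() and hex_entities_to_decimal(), the convmap values, and the sample string are all hypothetical:

```php
<?php
// Hypothetical callback: $m[1] holds the hex digits of one &#x...; reference.
function hex_entity_to_decimal_cb($m)
{
    return '&#' . hexdec($m[1]) . ';';
}

// Hypothetical helper: rewrite every hexadecimal reference as a decimal one.
function hex_entities_to_decimal($text)
{
    return preg_replace_callback('/&#x([0-9a-fA-F]+);/', 'hex_entity_to_decimal_cb', $text);
}

$decimal = hex_entities_to_decimal('&#x41;&#x3b1;');
echo $decimal, "\n"; // &#65;&#945;

// The decimal form can then be passed to mb_decode_numericentity().
$convmap = array(0x0, 0x10ffff, 0, 0x1fffff); // illustrative: all code points, no offset
echo mb_decode_numericentity($decimal, $convmap, 'UTF-8'), "\n";
```

Named functions are used as callbacks so the sketch does not depend on closure support.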

dirk at camindo de (30-Jan-2005 05:51)

If you use the function utf8_decode, you will run into problems with all extended characters outside the ISO-8859-1 charset. You can solve this by calling mb_encode_numericentity first:

<?php
// convert $text from UTF-8 to ISO-8859-1
$convmap = array(0xFF, 0x2FFFF, 0, 0xFFFF);
$text = mb_encode_numericentity($text, $convmap, "UTF-8");
$text = utf8_decode($text);
?>

The mb_encode_numericentity() call encodes all extended characters from 0xFF upward as numeric entities; the utf8_decode() call converts the rest: 0x80 - 0xFF.

Andrew Simpson (11-Dec-2004 01:29)

Many web browsers tend to upload high-order characters as numeric HTML entities. Here is some simple code to convert those entities within a block of text into proper UTF-8 characters:

<?php
// decode decimal HTML entities added by web browser
$body = preg_replace_callback('/&#\d{2,5};/u', 'utf8_entity_decode', $body);
// decode hex HTML entities added by web browser
$body = preg_replace_callback('/&#x([a-fA-F0-9]{2,8});/u', 'utf8_hex_entity_decode', $body);

// callback for decimal entities: $matches[0] is the whole entity
function utf8_entity_decode($matches)
{
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    return mb_decode_numericentity($matches[0], $convmap, 'UTF-8');
}

// callback for hex entities: convert to a decimal entity, then decode it
function utf8_hex_entity_decode($matches)
{
    return utf8_entity_decode(array('&#' . hexdec($matches[1]) . ';'));
}
?>

php at cNhOiSpPpAlMe dot org (31-Mar-2004 08:55)

Here are functions to convert hankaku to zenkaku characters (and vice-versa) in Japanese text.

<?php
// Supported characters:
// (space)
// !#$%&()*+,./0123456789:;<=>?@
// ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
// abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)

function f_han2zen ($string,$encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x20,0x20,0x3000-0x20,0xffff, // Space
        0x21,0x7e,0xff01-0x21,0xffff);
    $temp = mb_encode_numericentity($string,$convmap,$encoding);
    $convmap = array(0,0xffff,0,0xffff);
    return mb_decode_numericentity($temp,$convmap,$encoding);
}

function f_zen2han ($string,$encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x3000,0x3000,-(0x3000-0x20),0xffff, // Space
        0xff01,0xff5e,-(0xff01-0x21),0xffff);
    $temp = mb_encode_numericentity($string,$convmap,$encoding);
    $convmap = array(0,0xffff,0,0xffff);
    return mb_decode_numericentity($temp,$convmap,$encoding);
}

// Sample usage:
f_han2zen("test","shift_jis");
f_han2zen("test","utf-8");
?>
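
To make the offset arithmetic in f_han2zen concrete, here is a small sketch for a single character. It assumes UTF-8 input and the mbstring extension; the variable names are illustrative. The offset 0xff01-0x21 is 0xfee0, so 'A' (0x41) maps to 0x41 + 0xfee0 = 0xff21, which is 65313 in decimal:

```php
<?php
// Same second group as in f_han2zen: shift 0x21..0x7e up by 0xff01 - 0x21.
$convmap = array(0x21, 0x7e, 0xff01 - 0x21, 0xffff);

// Step 1: the offset is applied while encoding to a numeric reference.
$encoded = mb_encode_numericentity('A', $convmap, 'UTF-8');
echo $encoded, "\n"; // &#65313;

// Step 2: decoding with a zero offset turns the reference into the shifted character.
$plain = array(0, 0xffff, 0, 0xffff);
echo mb_decode_numericentity($encoded, $plain, 'UTF-8'), "\n";
```

f_zen2han works the same way with a negated offset, which shifts the fullwidth range back down.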

dev at glossword info (19-Nov-2003 03:43)

Just two great functions for daily use:

<?php
/* Converts any HTML-entities into characters */
function my_numeric2character($t)
{
    $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
    return mb_decode_numericentity($t, $convmap, 'UTF-8');
}

/* Converts any characters into HTML-entities */
function my_character2numeric($t)
{
    $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
    return mb_encode_numericentity($t, $convmap, 'UTF-8');
}

print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' ');
?>