String Functions
PHP Manual

htmlspecialchars

(PHP 4, PHP 5)

htmlspecialchars - Convert special characters to HTML entities

Description

string htmlspecialchars ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = 'UTF-8' [, bool $double_encode = true ]]] )

Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.

This function is useful in preventing user-supplied text from containing HTML markup, such as in a message board or guest book application.

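The translations performed are:

'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
"'" (single quote) becomes '&#039;' (for ENT_HTML401) or '&apos;' (for ENT_XML1, ENT_XHTML or ENT_HTML5), but only when ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'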

Parameters

string

The string being converted.

flags

A bitmask of one or more of the following flags, which specify how to handle quotes, invalid code unit sequences and the used document type. The default is ENT_COMPAT | ENT_HTML401.

Available flags constants
Constant Name Description
ENT_COMPAT Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES Will convert both double and single quotes.
ENT_NOQUOTES Will leave both double and single quotes unconverted.
ENT_IGNORE Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it may have security implications.
ENT_SUBSTITUTE Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
ENT_DISALLOWED Replace invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of leaving them as is. This may be useful, for instance, to ensure the well-formedness of XML documents with embedded external content.
ENT_HTML401 Handle code as HTML 4.01.
ENT_XML1 Handle code as XML 1.
ENT_XHTML Handle code as XHTML.
ENT_HTML5 Handle code as HTML 5.
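A small comparison of the quote-handling flags on one sample string may make the table easier to read. This is a sketch: the input string is arbitrary, and the ENT_HTML5 line assumes PHP 5.4.0 or later, where that constant exists.

```php
<?php
// Arbitrary sample containing both kinds of quotes, angle brackets and an ampersand.
$input = "<a title='x' href=\"y\">&</a>";

// ENT_COMPAT: double quotes converted, single quotes left alone.
$compat = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');
// ENT_QUOTES: both kinds of quotes converted.
$quotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
// ENT_NOQUOTES: neither kind of quote converted.
$noquotes = htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8');
// ENT_QUOTES | ENT_HTML5: single quotes become &apos; instead of &#039;.
$html5 = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $compat, "\n";   // &lt;a title='x' href=&quot;y&quot;&gt;&amp;&lt;/a&gt;
echo $quotes, "\n";   // &lt;a title=&#039;x&#039; href=&quot;y&quot;&gt;&amp;&lt;/a&gt;
echo $noquotes, "\n"; // &lt;a title='x' href="y"&gt;&amp;&lt;/a&gt;
echo $html5, "\n";    // &lt;a title=&apos;x&apos; href=&quot;y&quot;&gt;&amp;&lt;/a&gt;
```

Passing the encoding explicitly keeps the result independent of the PHP version's default.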

encoding

Defines encoding used in conversion. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.

For the purposes of this function, the encodings ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, provided the string itself is valid for the encoding, as the characters affected by htmlspecialchars() occupy the same positions in all of these encodings.

The following character sets are supported as of PHP 4.3.0.

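Supported charsets
Charset Aliases Description
ISO-8859-1 ISO8859-1 Western European, Latin-1.
ISO-8859-15 ISO8859-15 Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8 (none) ASCII-compatible multi-byte 8-bit Unicode.
cp866 ibm866, 866 DOS-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1251 Windows-1251, win-1251, 1251 Windows-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1252 Windows-1252, 1252 Windows-specific charset for Western European.
KOI8-R koi8-ru, koi8r Russian. Supported as of PHP 4.3.2.
BIG5 950 Traditional Chinese, mainly used in Taiwan.
GB2312 936 Simplified Chinese, national standard character set.
BIG5-HKSCS (none) Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS SJIS, 932 Japanese.
EUC-JP EUCJP Japanese.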

Note: Any other character sets are not recognized, and ISO-8859-1 will be used instead.

double_encode

When double_encode is turned off, PHP will not encode existing HTML entities. The default is to convert everything.
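A minimal sketch of the difference, using an arbitrary sample string that already contains one entity and one bare ampersand (the fourth argument requires PHP 5.2.3 or later):

```php
<?php
// Arbitrary sample: one existing entity (&amp;) and one bare ampersand.
$input = 'Fish &amp; Chips & Peas';

// Default, double_encode = true: the existing entity is encoded a second time.
$double = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
// double_encode = false: existing entities are kept, only the bare "&" is converted.
$single = htmlspecialchars($input, ENT_QUOTES, 'UTF-8', false);

echo $double, "\n"; // Fish &amp;amp; Chips &amp; Peas
echo $single, "\n"; // Fish &amp; Chips &amp; Peas
```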

Return Values

The converted string.

If the input string contains an invalid code unit sequence within the given encoding, an empty string will be returned, unless either the ENT_IGNORE or the ENT_SUBSTITUTE flag is set.
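The sketch below illustrates this with a deliberately broken UTF-8 sequence (the lead byte \xC3 followed by an ASCII character instead of a continuation byte). The sample string is made up, and the flags assume PHP 5.4.0 or later (ENT_IGNORE itself exists since 5.3.0).

```php
<?php
// "\xC3" starts a two-byte UTF-8 sequence, but "(" is not a continuation byte.
$bad = "caf\xC3(";

// No error-handling flag: the whole result is an empty string.
$plain = htmlspecialchars($bad, ENT_QUOTES, 'UTF-8');
// ENT_SUBSTITUTE: the invalid byte is replaced by U+FFFD (bytes EF BF BD).
$subst = htmlspecialchars($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
// ENT_IGNORE: the invalid byte is silently dropped (discouraged, see above).
$ignore = htmlspecialchars($bad, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

var_dump($plain);                          // string(0) ""
var_dump($subst === "caf\xEF\xBF\xBD(");   // bool(true)
var_dump($ignore);                         // string(4) "caf("
```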

Changelog

Version Description
5.4.0 The default value for the encoding parameter was changed to UTF-8.
5.4.0 The constants ENT_SUBSTITUTE, ENT_DISALLOWED, ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added.
5.3.0 The constant ENT_IGNORE was added.
5.2.3 The double_encode parameter was added.
4.1.0 The encoding parameter was added.

Examples

Example #1 htmlspecialchars() example

<?php
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
?>

Notes

Note:

Note that this function does not translate anything beyond what is listed above. For full entity translation, see htmlentities().

See Also



User Contributed Notes:

steve at mcdragonsoftware dot com (18-Oct-2011 05:13)

I am working with XML and zip functions to create an xlsx document from a template. Just as I thought I had it finished, it stopped working. After a bit of hunting I discovered that my zip file began with PHP notices about an undefined constant. I have no idea why my installation can't remember what ENT_XML1 is; it used to know it (or so I thought).

To save anyone else this headache, I recommend adding code at the top of your scripts to verify that these constants are registered. Something like:

defined("ENT_XML1") or define("ENT_XML1", 16);

for each constant you use. Again, I don't know why this problem suddenly came up, but better safe than "Excel cannot open the file....... format... extension... corrupted... blah blah blah".

cheers :)

Here's a list of the constant values (since they are not on this page), as taken from html.h for PHP 5:

#define ENT_HTML_QUOTE_NONE                   0
#define ENT_HTML_QUOTE_SINGLE                 1
#define ENT_HTML_QUOTE_DOUBLE                 2
#define ENT_HTML_IGNORE_ERRORS                4
#define ENT_HTML_SUBSTITUTE_ERRORS            8
#define ENT_HTML_DOC_TYPE_MASK                (16|32)
#define ENT_HTML_DOC_HTML401                  0
#define ENT_HTML_DOC_XML1                     16
#define ENT_HTML_DOC_XHTML                    32
#define ENT_HTML_DOC_HTML5                    (16|32)
/* reserve bit 6 */
#define ENT_HTML_SUBSTITUTE_DISALLOWED_CHARS  128

#define ENT_COMPAT      ENT_HTML_QUOTE_DOUBLE
#define ENT_QUOTES      (ENT_HTML_QUOTE_DOUBLE | ENT_HTML_QUOTE_SINGLE)
#define ENT_NOQUOTES    ENT_HTML_QUOTE_NONE
#define ENT_IGNORE      ENT_HTML_IGNORE_ERRORS
#define ENT_SUBSTITUTE  ENT_HTML_SUBSTITUTE_ERRORS
#define ENT_HTML401     0
#define ENT_XML1        16
#define ENT_XHTML       32
#define ENT_HTML5       (16|32)
#define ENT_DISALLOWED  128
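If you follow the suggestion above on a PHP version older than 5.4.0 (where, per the changelog, most of these constants first appeared), one way to register them in a single place is a loop over the values from the list. This is a sketch under that assumption: defining the names only silences the undefined-constant notices, and an older htmlspecialchars() presumably still won't implement the behaviour those bits select. On 5.4.0 and later the defined() guard makes the loop a no-op.

```php
<?php
// Values taken from the html.h list above. The defined() guard means nothing
// happens on PHP versions that already provide these constants.
$entFlags = array(
    'ENT_IGNORE'     => 4,
    'ENT_SUBSTITUTE' => 8,
    'ENT_HTML401'    => 0,
    'ENT_XML1'       => 16,
    'ENT_XHTML'      => 32,
    'ENT_HTML5'      => 48,  // 16 | 32
    'ENT_DISALLOWED' => 128,
);
foreach ($entFlags as $name => $value) {
    if (!defined($name)) {
        define($name, $value);
    }
}

// The flags are bits, so they combine with "|": ENT_QUOTES (3) | ENT_XML1 (16).
echo ENT_QUOTES | ENT_XML1, "\n"; // 19
```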

ivan at lutrov dot com (15-Jun-2011 09:30)

Be careful: the "charset" argument IS case-sensitive. This is counter-intuitive and serves no practical purpose, because the HTML spec says the opposite (charset names are case-insensitive there).

info at 8th dot at (12-May-2011 07:28)

I've found THE final solution!
It finds and replaces all unknown letters
(umlauts and much, much more)
and turns them into an HTML- and XML-compatible format.

Parameter: $text: a string with unsupported letters in it
Returns: a string in which all letters unsupported by XML and HTML are changed into their numeric value (for example &#196;)

FUNCTION:

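<?php
// Works byte by byte, so it assumes a single-byte encoding such as ISO-8859-1.
function umlaute($text) {
    $returnvalue = "";
    for ($i = 0; $i < strlen($text); $i++) {
        // rawurlencode() turns most bytes into "%XX"; hexdec() skips the "%" and
        // returns the byte value. Letters, digits and - _ . ~ are not encoded, and
        // hexdec() maps them to values below 32, so they are kept as they are.
        $teil = hexdec(rawurlencode(substr($text, $i, 1)));
        if ($teil < 32 || $teil > 1114111) {
            $returnvalue .= substr($text, $i, 1);
        } else {
            $returnvalue .= "&#" . $teil . ";";
        }
    }
    return $returnvalue;
}
?>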

pinkgothic at gmail dot com (11-Mar-2011 01:00)

Please note that this function results in an E_WARNING when display_errors is off and an invalid multibyte string is passed to it (e.g. with 'utf-8' as the encoding parameter and broken UTF-8 characters somewhere in the string).

This is ESPECIALLY IMPORTANT if you have an EXCEPTION-THROWING ERROR HANDLER: even though you can't reproduce it in a development environment where display_errors is on, you MUST wrap your function call in a try-catch, or your application will crash.

[The reason PHP makes this distinction is that this is a core function, and many production servers are misconfigured to have display_errors on; the warning is suppressed there to prevent things such as path disclosure from error messages from accidentally cropping up. See: http://bugs.php.net/bug.php?id=47494]

Thomasvdbulk at gmail dot com (28-Dec-2010 06:05)

I searched for a while for a script that could tell the difference between an HTML tag and a plain < or > in the text. The reason is that I receive text from a database, which is inserted by an HTML form and contains both text and HTML tags; the text can contain < and >, and so do the tags. With htmlspecialchars() you can make your text valid XHTML, but you'll also change the tags, e.g. <b> to &lt;b&gt;, so I needed a script that could tell the two apart. I couldn't find one, so I made my own. I haven't fully tested it, but the parts I tested worked perfectly! This is just for people who were searching for something like this. It may look big and could be done more simply, but it works for me, so I'm happy.

<?php
function fixtags($text){
$text = htmlspecialchars($text);
$text = preg_replace("/=/", "=\"\"", $text);
$text = preg_replace("/&quot;/", "&quot;\"", $text);
$tags = "/&lt;(\/|)(\w*)(\ |)(\w*)([\\\=]*)(?|(\")\"&quot;\"|)(?|(.*)?&quot;(\")|)([\ ]?)(\/|)&gt;/i";
$replacement = "<$1$2$3$4$5$6$7$8$9$10>";
$text = preg_replace($tags, $replacement, $text);
$text = preg_replace("/=\"\"/", "=", $text);
return $text;
}
?>

an example:

<?php
$string = "
this is smaller < than this<br />
this is greater > than this<br />
this is the same = as this<br />
<a href=\"http://www.example.com/example.php?test=test\">This is a link</a><br />
<b>Bold</b> <i>italic</i> etc...";
echo fixtags($string);
?>

will echo:
this is smaller &lt; than this<br />
this is greater &gt; than this<br />
this is the same = as this<br />
<a href="http://www.example.com/example.php?test=test">This is a link</a><br />
<b>Bold</b> <i>italic</i> etc...

I hope it's helpful!

Anonymous (01-Aug-2010 09:48)

This may seem obvious, but if you want to output arbitrary (i.e. user-input) data as an attribute inside an HTML tag (such as the INPUT tags on a FORM), be aware of whether you are using ENT_QUOTES or ENT_COMPAT.  If you're using ENT_COMPAT, the attribute must be wrapped in double-quotes, as single-quotes will not be encoded and the user will be able to inject arbitrary HTML attributes (including javascript behavior) inside the tag, even though they will not be able to inject arbitrary HTML tags.
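To make the ENT_COMPAT pitfall concrete, here is a sketch with a hypothetical attacker-supplied value placed inside a single-quoted attribute:

```php
<?php
// Hypothetical attacker-controlled value aimed at a single-quoted attribute.
$value = "x' onmouseover='alert(1)";

// ENT_COMPAT leaves the single quote alone, so the value closes the attribute early.
$unsafe = "<input value='" . htmlspecialchars($value, ENT_COMPAT, 'UTF-8') . "'>";
// ENT_QUOTES encodes the single quote too, so the value stays inside the attribute.
$safe = "<input value='" . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . "'>";

echo $unsafe, "\n"; // <input value='x' onmouseover='alert(1)'>
echo $safe, "\n";   // <input value='x&#039; onmouseover=&#039;alert(1)'>
```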

Also, if you want to allow users to input HTML attributes without them being double-encoded on display, there are two ways to accomplish this:

1 - Run their input through html_entity_decode() followed by htmlspecialchars().

2 - Call htmlspecialchars() with $double_encode=false.

There is one functional difference between these two methods:  If you want to perform any search-replace on a user's input (such as word censoring in a message-board application), the second method will allow users to circumvent it by HTML-encoding their input, whereas the first will not.
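A sketch of that difference, with a hypothetical censor() filter and a made-up input that hides the blocked word behind the numeric entity &#97; (the letter "a"):

```php
<?php
// Hypothetical word filter: replaces one blocked word with asterisks.
function censor($html) {
    return str_replace('darn', '****', $html);
}

// Made-up input that hides the blocked word behind &#97; (the letter "a").
$input = 'd&#97;rn it';

// Method 1: decode first, then encode. The entity becomes a plain "a" before censoring.
$method1 = censor(htmlspecialchars(html_entity_decode($input, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8'));
// Method 2: encode without double-encoding. The entity survives and slips past the filter.
$method2 = censor(htmlspecialchars($input, ENT_QUOTES, 'UTF-8', false));

echo $method1, "\n"; // **** it
echo $method2, "\n"; // d&#97;rn it
```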

nachitox2000 [at] hotmail [dot] com (01-Jul-2010 01:57)

I had problems with Spanish special characters, so I thought of using htmlspecialchars(), but my strings also contain HTML.
So I used this :) Hope it helps.

<?php
function htmlspanishchars($str)
{
    return str_replace(array("&lt;", "&gt;"), array("<", ">"), htmlspecialchars($str, ENT_NOQUOTES, "UTF-8"));
}
?>

nessthehero at gmail dot com (19-May-2010 07:53)

Here's a simple function I wrote for parsing form data.

It checks if it's an array and it is recursive (it calls itself).

It also decodes things that have already been encoded so it doesn't change &amp; to &amp;amp;

[In this version,] I found it easier to use a regular expression to check and see if any previously encoded data exists, then decode it repeatedly until there is none left, then re-encode it.

<?php
function formspecialchars($var)
{
    $pattern = '/&(#)?[a-zA-Z0-9]{0,};/';

    if (is_array($var)) {    // If variable is an array
        $out = array();      // Set output as an array
        foreach ($var as $key => $v) {
            // Run formspecialchars on every element of the array and return the result. Also maintains the keys.
            $out[$key] = formspecialchars($v);
        }
    } else {
        $out = $var;
        // Decode repeatedly while something entity-like remains. Stop once decoding
        // changes nothing: htmlspecialchars_decode() leaves entities such as &copy;
        // alone, and without this check the loop would never end.
        while (preg_match($pattern, $out) > 0) {
            $decoded = htmlspecialchars_decode($out, ENT_QUOTES);
            if ($decoded === $out) {
                break;
            }
            $out = $decoded;
        }
        // Trim the variable, strip all slashes, and encode it
        $out = htmlspecialchars(stripslashes(trim($out)), ENT_QUOTES, 'UTF-8', true);
    }

    return $out;
}
?>

alif (11-Feb-2010 05:17)

I had XML files containing '&amp;', '&copyright;' and bare '&' characters, so I wrote this preg_replace(), which replaces every '&' that is not the start of an entity with '&amp;'.

So '&amp;' does not get converted to '&amp;amp;'; only a bare '&' gets converted to '&amp;'. Entities such as '&copy;' and '&#160;' also remain unaffected (names longer than seven characters, such as '&copyright;', are still escaped because of the {1,7} limit). It's basic; feel free to modify it.

preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;',$theString);
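Wrapped in a function (the name escape_bare_ampersands() is only for illustration), the expression behaves as follows on a made-up sample:

```php
<?php
// Escape every "&" that is not followed by 1 to 7 letters, digits or "#" and then ";".
function escape_bare_ampersands($s) {
    return preg_replace('/&(?![A-Za-z0-9#]{1,7};)/', '&amp;', $s);
}

echo escape_bare_ampersands('Tom & Jerry &amp; &#160; &copy;'), "\n";
// Tom &amp; Jerry &amp; &#160; &copy;

// "copyright" has nine characters, more than {1,7} allows, so this "&" is escaped.
echo escape_bare_ampersands('&copyright;'), "\n";
// &amp;copyright;
```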

timgvdh at gmail dot com (16-Dec-2009 04:54)

Here's something that replicates the appearance of <pre> but still allows for word wrap:

<?php
function special_formatting($input) {
    $output = htmlspecialchars($input, ENT_QUOTES);
    $output = str_replace(array('  ', "\n"), array('&nbsp;&nbsp;', '<br>'), $output);
    return str_replace('&nbsp; ', '&nbsp;&nbsp;', $output);
}
?>

Anonymous (18-Sep-2009 06:16)

This may seem obvious, but it caused me some frustration. If you use htmlspecialchars() with the $charset argument set, and the string you run it on is not actually in the charset you specify, you get an empty string returned without any notice/warning/error.

<?php
$ok_utf8 = "A valid UTF-8 string";
$bad_utf8 = "An invalid UTF-8 string"; // stands in for a string containing bytes that are not valid UTF-8

var_dump(htmlspecialchars($bad_utf8, ENT_NOQUOTES, 'UTF-8')); // string(0) ""

var_dump(htmlspecialchars($ok_utf8, ENT_NOQUOTES, 'UTF-8'));  // string(20) "A valid UTF-8 string"
?>

So make sure your charsets are consistent:

<?php
$bad_utf8 = "An invalid UTF-8 string"; // again a stand-in for non-UTF-8 input

// make sure it's really UTF-8
$bad_utf8 = mb_convert_encoding($bad_utf8, 'UTF-8', mb_detect_encoding($bad_utf8));

var_dump(htmlspecialchars($bad_utf8, ENT_NOQUOTES, 'UTF-8')); // string(23) "An invalid UTF-8 string"
?>

I had this problem because a Mac user was submitting posts copy/pasted from a program and it contained weird chars in it.

Anonymous (17-Sep-2009 05:43)

Just a few notes on how one can use htmlspecialchars() and htmlentities() to filter user input on forms for later display and/or database storage...

1. Use htmlspecialchars() to filter text input values for HTML input tags, e.g.:

echo '<input name=userdata type=text value="'.htmlspecialchars($data).'" />';

2. Use htmlentities() to filter the same data values for most other kinds of HTML tags, e.g.:

echo '<p>'.htmlentities($data).'</p>';

3. Use your database's escape-string function to filter the data for database updates and insertions; for instance, using PostgreSQL:

pg_query($connection,"UPDATE datatable SET datavalue='".pg_escape_string($data)."'");

This strategy seems to work well and consistently, without restricting anything the user might like to type and display, while still providing a good deal of protection against a wide variety of html and database escape sequence injections, which might otherwise be introduced through deliberate and/or accidental input of such character sequences by users submitting their input data via html forms.

chuck at N0SPAM1command dot com (12-Aug-2009 03:06)

NOTE:
I made an error in my last post.

The last 3 lines should have read
<?php

...

$text = get_page($url); // <-- get_page() is the corrected part
$new = htmlspecialchars($text, ENT_QUOTES); // here is the magic :)

echo '<pre>' . $new . '</pre>';

?>

OOPS!

chuck at N0SPAM1command dot com (12-Aug-2009 01:57)

Need to dump the source of a page retrieved via cURL (http://us3.php.net/curl)?
I found it's easily done with htmlspecialchars()

eg;

<?php

function get_page($url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'some bot');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header); // note: $header is not defined in this snippet
    curl_setopt($curl, CURLOPT_REFERER, '-');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    ...
    curl_setopt($curl, CURLOPT_HEADER, 1);
    curl_setopt($curl, CURLOPT_NOBODY, 0);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($curl);
    curl_close($curl);

    return $html;
}

$text = get_page($url);
$new = htmlspecialchars($text, ENT_QUOTES); // here is the magic :)

echo '<pre>' . $new . '</pre>';

?>

HTH

hm2k at php.net (22-Jun-2009 06:02)

<?php
/**
 * A recursive version of htmlspecialchars() for arrays and strings.
 */
function htmlspecialchars_deep($mixed, $quote_style = ENT_QUOTES, $charset = 'UTF-8')
{
    if (is_array($mixed)) {
        foreach ($mixed as $key => $value) {
            $mixed[$key] = htmlspecialchars_deep($value, $quote_style, $charset);
        }
    } elseif (is_string($mixed)) {
        $mixed = htmlspecialchars(htmlspecialchars_decode($mixed, $quote_style), $quote_style, $charset);
    }
    return $mixed;
}
?>

hello at haroonahmad dot co dot uk (23-Mar-2009 09:56)

A common confusion among beginners is what the difference between htmlentities() and htmlspecialchars() really is, because the manual examples convert angle brackets with both.

Well, htmlentities() will ALSO convert characters from other languages in the string, e.g. German, French or Italian letters. So if you think your attacker can use some foreign-language characters for an XSS attack in a URL etc., then use htmlentities() instead of htmlspecialchars().

I hope it helps,

Haroon Ahmad

Kenneth Kin Lum (09-Oct-2008 02:45)

If your goal is just to protect your page from Cross Site Scripting (XSS) attacks, or just to show HTML tags on a web page (showing <body> on the page, for example), then using htmlspecialchars() is good enough and better than using htmlentities(). A minor point is that htmlspecialchars() is faster than htmlentities(). A more important point is that when we use htmlspecialchars($s) in our code, it is automatically compatible with UTF-8 strings. Otherwise, if we use htmlentities($s) and there happen to be foreign characters in the string $s in UTF-8 encoding, htmlentities() is going to mess it up, as it converts the bytes 0x80 to 0xFF in the string to entities like &eacute; (unless you specifically provide a second and a third argument to htmlentities(), with the third argument being "UTF-8").

The reason htmlspecialchars($s) already works with UTF-8 strings is that it changes bytes in the range 0x00 to 0x7F to &lt; etc., while leaving bytes in the range 0x80 to 0xFF unchanged. We may wonder whether htmlspecialchars() could accidentally change a byte of a 2- to 4-byte UTF-8 character to &lt; etc. The answer is that it won't. When a UTF-8 character is 2 to 4 bytes long, all the bytes in that character are in the 0x80 to 0xFF range; none can be in the 0x00 to 0x7F range. When a UTF-8 character is 1 byte long, it is the same as ASCII, which is 7-bit, from 0x00 to 0x7F. As a result, when a UTF-8 character is 1 byte long, htmlspecialchars($s) will do its job, and when the UTF-8 character is 2 to 4 bytes long, htmlspecialchars($s) will just pass those bytes through unchanged. So htmlspecialchars($s) does the same job whether $s is in ASCII, ISO-8859-1 (Latin-1), or UTF-8.
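The byte-level argument can be checked directly. This sketch uses an arbitrary sample string with a two-byte UTF-8 "é" (C3 A9) and compares htmlspecialchars() under the UTF-8 and ISO-8859-1 charsets, plus htmlentities() when told the input is ISO-8859-1:

```php
<?php
// Arbitrary sample: "<café>" with "é" written as its UTF-8 bytes C3 A9.
$s = "<caf\xC3\xA9>";

// Only the ASCII bytes "<" and ">" change; C3 and A9 pass through untouched.
$asUtf8 = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
$asLatin1 = htmlspecialchars($s, ENT_QUOTES, 'ISO-8859-1');
// htmlentities() treats each high byte as a separate Latin-1 character: C3 is "Ã", A9 is "©".
$entities = htmlentities($s, ENT_QUOTES, 'ISO-8859-1');

var_dump($asUtf8 === $asLatin1); // bool(true)
echo $entities, "\n";             // &lt;caf&Atilde;&copy;&gt;
```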

brendel at krumedia dot de (15-May-2008 06:28)

I know some people have posted similar functions, but maybe you are looking for this version:

function jschars($str)
{
    $str = mb_ereg_replace("\\\\", "\\\\", $str);
    $str = mb_ereg_replace("\"", "\\\"", $str);
    $str = mb_ereg_replace("'", "\\'", $str);
    $str = mb_ereg_replace("\r\n", "\\n", $str);
    $str = mb_ereg_replace("\r", "\\n", $str);
    $str = mb_ereg_replace("\n", "\\n", $str);
    $str = mb_ereg_replace("\t", "\\t", $str);
    $str = mb_ereg_replace("<", "\\x3C", $str); // for inclusion in HTML
    $str = mb_ereg_replace(">", "\\x3E", $str);
    return $str;
}

If you use Smarty, your code may look like:

<a onclick="alert('{$text|jschars|htmlchars}');return false;">Test</a>

(Yes, we have the shortcut htmlchars instead of htmlspecialchars, so that we are able to pass the encoding, e.g. UTF-8 or ISO-8859-1, to htmlspecialchars.)

php dot net at orakio dot net (10-Apr-2008 07:26)

I was recently exploring some code when I saw this being used to make data safe for "SQL".

This function should not be used to make data SQL safe (although to prevent phishing it is perfectly good).

Here is an example of how NOT to use this function:

<?php
$username = htmlspecialchars(trim("$_POST[username]"));

$uniqueuser = $realm_db->query("SELECT `login` FROM `accounts` WHERE `login` = '$username'");
?>

(Only other check on $_POST['username'] is to make sure it isn't empty which it is after trim on a white space only name)

The problem here is that the quote style is left at its default, which does not convert single quotes, and single quotes are exactly what the SQL query uses. Turning on magic quotes might fix it, but you should not rely on magic quotes; in fact you should never use them, and should fix the code instead. There are also problems with \ not being escaped. Even if magic quotes were used, there would still be the problem of allowing usernames longer than the limit, and of storing some really weird (HTML-encoded) usernames even though they are to be used outside of HTML; this just provides a front end for registering to another system using MySQL. Of course, using it on the output wouldn't cause that problem.

Another way to make something of a fix would be to use ENT_QUOTES, or to do:

<?php
$uniqueuser = $realm_db->query('SELECT `login` FROM `accounts` WHERE `login` = "'.$username.'";');
?>

Either way, none of these solutions is good practice, and none is entirely flawless. This function should simply never be used in such a fashion.

I hope this will prevent newbies using this function incorrectly (as they apparently do).

treyh (09-Apr-2008 07:41)

Here is a function that wraps htmlspecialchars and makes it work for xml.

function xmlspecialchars($text) {
   return str_replace('&#039;', '&apos;', htmlspecialchars($text, ENT_QUOTES));
}
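On PHP 5.4.0 and later the same result should be obtainable without str_replace(), because single quotes are emitted as &apos; when the XML1 document type is selected. A sketch (the UTF-8 encoding argument is an assumption; adjust it to your data):

```php
<?php
// ENT_XML1 (PHP 5.4.0+) selects the XML document type, where ' is written as &apos;.
function xmlspecialchars($text) {
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo xmlspecialchars("Tom's <b> & \"Jerry\""), "\n";
// Tom&apos;s &lt;b&gt; &amp; &quot;Jerry&quot;
```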

crestfresh at gmail dot com (19-Feb-2008 03:02)

Re ish1301's jsspecialchars() function: use json_encode() instead.
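A minimal sketch of that advice: json_encode() returns a complete, quoted JavaScript string literal, and the JSON_HEX_* flags (PHP 5.3.0+) turn <, >, &, ' and " inside the string content into \u00XX escapes, so the literal cannot close a surrounding <script> element. The sample text is made up.

```php
<?php
// Made-up text that would break a <script> block or a quoted JS string if inserted raw.
$text = "It's \"quoted\"\n</script><script>alert(1)</script>";

// json_encode() yields a complete JavaScript string literal, outer quotes included.
// The JSON_HEX_* flags replace < > & ' " in the content with \u00XX escapes.
$js = json_encode($text, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo "<script>var msg = $js;</script>\n";
```

Unlike the hand-written versions below, this also escapes backslashes and keeps apostrophes instead of replacing them.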

ish1301 at gmail doooot com (20-Nov-2007 10:56)

I used this function to make a variable JavaScript-compatible:

<?php
function jsspecialchars($string = '') {
    $string = preg_replace("/\r*\n/", "\\n", $string);
    $string = preg_replace("/\//", "\\\/", $string);
    $string = preg_replace("/\"/", "\\\"", $string);
    $string = preg_replace("/'/", " ", $string);
    return $string;
}
?>
Hope this helps those embedding PHP in JavaScript.

drew at august-harper dot com (23-Aug-2007 05:21)

:// Escapes strings to be included in javascript
:function jsspecialchars($s) {
:    return preg_replace('/([^ :!#$%@()*+,-.\x30-\x5b\x5d-\x7e])/e',
:        "'\\x'.(ord('\\1')<16? '0': '').dechex(ord('\\1'))",$s);
:}

This function DOES NOT produce correct output in PHP 5. Any string containing a " will be improperly escaped to \x5c, when it should be \x22.

I am not very good with regular expressions, so this is my solution to the problem.
// this is a workaround for jsspecialchars!
// (the /e modifier used below was removed in PHP 7.0.0)
function ord2($s) {
    if (strlen($s) == 2) {
        return ord(substr($s, 1, 1));
    } else {
        return ord($s);
    }
}
function JS_SpecialChars($s) {
    return preg_replace('/([^ !#$%@()*+,.\x30\x5b\x5d-\x7e])/e',
        "'\\x'.(ord2('\\1')<16? '0': '').dechex(ord2('\\1'))", $s);
}

I am sure that there is a better solution, but I can't figure one out. This approach will probably also fix any other characters that end up being improperly escaped.

solar-energy (16-Jun-2007 11:21)

Also see urlencode(), which is useful for passing text containing ampersands and other special characters through a URL

(i.e. the text is encoded as if sent from a form using the GET method)

e.g.

<?php
echo "<a href='foo.php?text=".urlencode("foo?&bar!")."'>link</a>";
?>

produces

<a href='foo.php?text=foo%3F%26bar%21'>link</a>

and if the link is followed, the $_GET["text"] in foo.php will contain "foo?&bar!"

galvao at galvao dot eti dot br (19-May-2007 02:19)

There's a tiny error in alex-0 at hotmail dot co dot uk's example:

The line:

$new = htmlspecialchars($_POST[message], ENT_QUOTES);

Should be written as:

$new = htmlspecialchars($_POST['message'], ENT_QUOTES);

Regards,

terminatorul at gmail dot com (27-Apr-2007 06:04)

To HTML-encode Unicode characters that may not be part of your document character set (given in the META tag of your page), and so cannot be output directly into your document source, you need to use mb_encode_numericentity(). Pay attention to its conversion map argument.
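A sketch of such a conversion map, assuming the mbstring extension is available. The map is a flat list of (start, end, offset, mask) groups, and the single group below covers every code point from U+0080 upward, so ASCII is left alone. htmlspecialchars() runs first so that <, > and & are escaped as well; the sample text is made up.

```php
<?php
// Requires the mbstring extension.
// Conversion map: flat list of (start, end, offset, mask) groups. This one group
// covers U+0080 to U+10FFFF, so plain ASCII is left unchanged.
$map = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

$text = "Gr\xC3\xBC\xC3\x9F <dich>"; // "Grüß <dich>" as UTF-8 bytes

// Escape < > & (and quotes) first, then turn non-ASCII characters into &#N; references.
$ascii = mb_encode_numericentity(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'), $map, 'UTF-8');

echo $ascii, "\n"; // Gr&#252;&#223; &lt;dich&gt;
```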

frank at codedor dot be (16-Jan-2007 09:25)

If you seem to have a problem with rendering dynamic RSS files from a database - try using htmlspecialchars() or htmlentities() on the text you are rendering.

Since XML and RSS are very strict about what is allowed inside nodes, you need to make sure everything is "A-OK" according to XML standards...

Especially if the database you're pulling data from uses, for instance, Latin-Swedish encoding, which seems to be the standard setting for MySQL databases.

alex-0 at hotmail dot co dot uk (23-Dec-2006 09:09)

You can also use variables.
This is handy when working with forms, to neutralize any malicious HTML:

<?php
$new = htmlspecialchars($_POST[message], ENT_QUOTES);
echo $new;
?>

richard at mf2fm dot com (03-Mar-2006 09:06)

I had a script which detected swearing and wanted to make sure that words such as 'f &uuml; c k' didn't slip through the system.

After using htmlentities(), the following line converts most extended alphabet characters back to the standard alphabet so you can spot such problems:

$text=eregi_replace("&([a-z])[a-z0-9]{3,};", "\\\\1", $text);

This changes, for example, '&uuml;' into 'u' and '&szlig;' into 's'. Sadly it also converts '&pound;' and '&para;' into 'p', so it's not perfect, but it does solve a lot of the problems.

mikiwoz at yahoo dot co dot uk (06-Oct-2005 10:40)

I am not sure, maybe I'm missing something, but I have found something interesting:
I've been working on a project where I had to use htmlspecialchars (for obvious reasons). I also needed to decode the encoded string. What I did was almost a copy and paste from php.net:
$trans=get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES);
$trans=array_flip($trans);
$string=strtr($encoded, $trans);
(it looked a bit different in my code, but the idea is clear)
I couldn't get the apostrophe decoded, and I needed it for the <A> tags. After an hour or so of debugging, I decided to print_r($trans). What I got was:
...
[&#39;] => '
...
BUT the apostrophe was encoded to &#039; (note the zero).
I don't suppose it's a bug, but it definitely IS a potential pitfall; watch out for this one.
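For this particular round trip, htmlspecialchars_decode() (PHP 5.1.0+) may be the simpler route, since with ENT_QUOTES it recognises the &#039; form that htmlspecialchars() emits. A short sketch with a made-up input:

```php
<?php
$encoded = htmlspecialchars("<a href='x'>", ENT_QUOTES, 'UTF-8');
echo $encoded, "\n"; // &lt;a href=&#039;x&#039;&gt;

// With ENT_QUOTES, htmlspecialchars_decode() also turns &#039; back into '.
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);
echo $decoded, "\n"; // <a href='x'>
```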

Luiz Miguel Axcar (lmaxcar at yahoo dot com dot br) (01-Sep-2005 02:16)

Hello,

If you are having trouble writing/reading HTML data to/from your database (DBMS), try this:

<?php

// from the html_entity_decode() manual page
function unhtmlentities($string) {
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}

// read from db
$content = stripslashes(htmlspecialchars($field['content']));

// write to db
$content = unhtmlentities(addslashes(trim($_POST['content'])));

// make sure the result of get_magic_quotes_gpc() == 0, or you can get strange slashes in your content from adding slashes twice

// better to do this using addslashes
$content = (!get_magic_quotes_gpc()) ? addslashes($content) : $content;

?>

jspalletta at gmail dot com (12-Jul-2005 01:37)

I have found that this regular expression is sufficient for making sure that existing character entities show after htmlspecialchars() replaces _all_ occurrences of & with the &amp; entity.

<?php
// Note: hsc is an abbreviation of htmlspecialchars
function hscFixed($str)
{
    return preg_replace("/&amp;(#[0-9]+|[a-z]+);/i", "&$1;", htmlspecialchars($str));
}
?>

The only flaw I can think of is text in the vein of "&[word];" that is not meant to be a character entity but uses the ampersand and semicolon in their traditional grammatical sense. However, I think this is highly unlikely to occur (among other reasons, anyone with enough grammatical inclination to use them that way probably won't leave out the space between the ampersand and the word).

(25-Jun-2005 04:44)

You can't use htmlspecialchars to create RSS feeds, since it expands ampersands. You need to use something like this:
$content = preg_replace(array('/</', '/>/', '/"/'), array('&lt;', '&gt;', '&quot;'), $content);

palrich at gmail dot com (16-May-2005 09:29)

To Alexander Nofftz and urbanheroes:
It's not an IE problem. There is no &apos; in HTML. So it's only a problem if someone else does render this as an apostrophe on an HTML page.

paul dot l at aon dot at (09-May-2005 05:50)

function reverse_htmlentities($mixed)
{
    $htmltable = get_html_translation_table(HTML_ENTITIES);
    foreach($htmltable as $key => $value)
    {
        $mixed = ereg_replace(addslashes($value),$key,$mixed);
    }
    return $mixed;
}

this is my version of a reversed htmlentities function

thisiswherejunkgoes at gmail dot com (06-May-2005 06:06)

If there are any n00bs out there looking for a way to ensure that no HTML/special chars are getting sent to their databases/put through forms/etc., this has been doing the trick for me (though being at least slightly n00bish myself, if this won't always work, perhaps someone will amend it :-)

function checkforchars ($foo) {

  if ($foo === htmlspecialchars($foo)) {
        return "Valid entry.";
  } else {
        return "Invalid entry.";
  }

}

urbanheroes {at} gmail {dot} com (30-Apr-2005 07:32)

In response to the note made by Alexander Nofftz on October 2004, &#39; is used instead of &apos; because IE unfortunately seems to have trouble with the latter.

gt at realvertex.com (28-Apr-2005 05:55)

Here is the recursive version that works for both arrays and strings. Doesn't look as elegant as the other recursive versions, because of the input checks.

function HTML_ESC($_input = null, $_esc_keys = false)
{
    if ((null != $_input) && (is_array($_input)))
    {
        foreach($_input as $key => $value)
        {
            if($_esc_keys)
            {
                $_return[htmlspecialchars($key)] = HTML_ESC($value,$_esc_keys);
            }
            else
            {
                $_return[$key] = HTML_ESC($value);
            }
        }
        return $_return;
    }
    elseif(null != $_input)
    {
        return htmlspecialchars($_input);
    }
    else
    {
        return null;
    }
}

took (23-Apr-2005 05:14)

The algorithm from donwilson at gmail dot com to reverse the action of htmlspecialchars(), edited for German:

function unhtmlspecialchars( $string )
{
  $string = str_replace ( '&#039;', '\'', $string );
  $string = str_replace ( '&quot;', '"', $string );
  $string = str_replace ( '&lt;', '<', $string );
  $string = str_replace ( '&gt;', '>', $string );
  $string = str_replace ( '&uuml;', 'ü', $string );
  $string = str_replace ( '&Uuml;', 'Ü', $string );
  $string = str_replace ( '&auml;', 'ä', $string );
  $string = str_replace ( '&Auml;', 'Ä', $string );
  $string = str_replace ( '&ouml;', 'ö', $string );
  $string = str_replace ( '&Ouml;', 'Ö', $string );
  // &amp; must come last, otherwise '&amp;lt;' would end up as '<' instead of '&lt;'
  $string = str_replace ( '&amp;', '&', $string );
  return $string;
}

(11-Mar-2005 12:22)

function htmlspecialchars_array($arr = array()) {
   $rs = array();
   // foreach instead of while (list() = each()): each() was removed in PHP 8.0
   foreach ($arr as $key => $val) {
       if (is_array($val)) {
           $rs[$key] = htmlspecialchars_array($val);
       }
       else {
           $rs[$key] = htmlspecialchars($val, ENT_QUOTES);
       }
   }
   return $rs;
}

beer UNDRSCR nomaed AT hotmail DOT com (01-Feb-2005 10:46)

After looking into the non-native encoding problem, I noticed that if, for example, the page encoding is Cyrillic and I type Latin characters that are not part of that encoding (for example the ae ligature), the browser sends the named entity, such as &aelig; in this case.
Therefore, the only way I see to display multilingual text that arrives encoded as entities is:
<?php
echo str_replace('&amp;', '&', htmlspecialchars($txt));
?>

A regex that restores only numeric entities would skip these named Latin-1 entities.

zolinak at zoli dot szathmari dot hu (14-Dec-2004 12:46)

A sample function, if anybody wants to turn HTML entities (and special characters) back into plain characters (e.g. "&egrave;", "&lt;"):

function html2specialchars($str){
    $trans_table = array_flip(get_html_translation_table(HTML_ENTITIES));
    return strtr($str, $trans_table);
}
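For comparison, a small sketch (the variable names are made up for illustration, and UTF-8 is assumed as the page encoding): the flipped translation table only knows named entities, so a numeric reference such as &#1073; survives untouched, while html_entity_decode() (PHP 4.3.0+) decodes both kinds.

```php
<?php
$encoded = '&egrave; &lt; &#1073;';

// Flipped translation table: only named entities (plus &#039;) are known,
// so the numeric reference &#1073; is left as-is.
$via_table = strtr($encoded, array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8')));

// html_entity_decode() also handles numeric character references.
$via_builtin = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $via_table, "\n";   // è < &#1073;
echo $via_builtin, "\n"; // è < б
```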

beer UNDRSCR nomaed AT hotmail DOT com (21-Oct-2004 09:03)

Quite often, on HTML pages that are not encoded as UTF-8, when people type characters outside the page's encoding, some browsers (Internet Explorer for sure) send those characters as numeric HTML entities, such as &#1073; for the small Russian letter 'б'.
htmlspecialchars() breaks such an entity, since it converts every & to &amp;.
What I usually do is either turn &amp; back into & so the correct characters appear in the output, or use a regex to turn only the escaped numeric entities back into real entities:
<?php
// treat this as pseudo-code, it hasn't been tested...
// (meant to match decimal entities like &amp;#1073; and hex entities like &amp;#x431;)
$result = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $source);
?>
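The regex can be wrapped into a complete escape step. This is a sketch only; the function name escape_keep_numeric_refs is hypothetical. It escapes everything first and then restores just the numeric references, so markup such as <b> stays escaped.

```php
<?php
// Hypothetical helper: escape the text, then undo the escaping of numeric
// character references only (decimal &#1073; or hexadecimal &#x431;).
function escape_keep_numeric_refs($source)
{
    $escaped = htmlspecialchars($source, ENT_QUOTES);
    return preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $escaped);
}

echo escape_keep_numeric_refs('&#1073; <b>'), "\n"; // &#1073; &lt;b&gt;
```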

Alexander Nofftz (20-Oct-2004 12:41)

Why &#39;? The XHTML and XML DTDs define &apos; for this.
See http://www.w3.org/TR/html/dtds.html#a_dtd_Special_characters

So better use this:

$text = htmlspecialchars($text, ENT_QUOTES);
$text = preg_replace('/&#0*39;/', '&apos;', $text);
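On PHP 5.4.0 and later the preg_replace() step should not be needed: the document type flags decide which entity the single quote gets. With ENT_QUOTES, ENT_HTML401 produces &#039;, while ENT_XHTML (or ENT_XML1) produces &apos;. The sample string is just for illustration.

```php
<?php
// Same input, different document type flag.
$html401 = htmlspecialchars("It's", ENT_QUOTES | ENT_HTML401);
$xhtml   = htmlspecialchars("It's", ENT_QUOTES | ENT_XHTML);

echo $html401, "\n"; // It&#039;s
echo $xhtml, "\n";   // It&apos;s
```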

mlvanbie at gmail dot com (07-Oct-2004 12:45)

The code in the previous note has a bug. If the original text was '&gt;', htmlspecialchars() turns it into '&amp;gt;', and the suggested code then turns that into '>' instead of back into '&gt;'. The &amp; translation must be done last.

donwilson at gmail dot com (25-Sep-2004 05:58)

To reverse the action of htmlspecialchars(), use this code:

<?php
function unhtmlspecialchars( $string )
{
    // Caution: as a follow-up note points out, &amp; should be decoded last,
    // otherwise '&amp;gt;' ends up as '>' instead of '&gt;'.
    $string = str_replace ( '&amp;', '&', $string );
    $string = str_replace ( '&#039;', '\'', $string );
    $string = str_replace ( '&quot;', '"', $string );
    $string = str_replace ( '&lt;', '<', $string );
    $string = str_replace ( '&gt;', '>', $string );
    return $string;
}
?>
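One way to sidestep the ordering problem entirely is a single strtr() call with an array: strtr() never re-scans text it has already replaced, so the order of the pairs does not matter. This is a sketch; the name unhtmlspecialchars_strtr is hypothetical, and it only covers the five entities htmlspecialchars() produces with ENT_QUOTES under HTML 4.01 rules.

```php
<?php
// Hypothetical helper: decodes the htmlspecialchars() entities in one pass.
function unhtmlspecialchars_strtr($string)
{
    return strtr($string, array(
        '&amp;'  => '&',
        '&#039;' => "'",
        '&quot;' => '"',
        '&lt;'   => '<',
        '&gt;'   => '>',
    ));
}

echo unhtmlspecialchars_strtr('&amp;gt; &lt;b&gt;'), "\n"; // &gt; <b>
```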

thelatesundayshow.com @ nathan (flip it) (02-Sep-2004 07:51)

Here's a version of the recursive escape function that takes the array by reference rather than by value, which saves some memory for big arrays:

function recurse_array_HTML_safe(&$arr) {
    foreach ($arr as $key => $val)
        if (is_array($val))
            recurse_array_HTML_safe($arr[$key]);
        else
            $arr[$key] = htmlspecialchars($val, ENT_QUOTES);
}
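Since PHP 5.3 the same in-place escaping can also be written with the built-in array_walk_recursive() and a closure that takes its value by reference. A sketch with made-up sample data:

```php
<?php
$data = array('name' => '<b>Bob</b>', 'tags' => array('a & b', "it's"));

// array_walk_recursive() visits every leaf value; the by-reference
// parameter lets the callback overwrite each value in place.
array_walk_recursive($data, function (&$value) {
    $value = htmlspecialchars($value, ENT_QUOTES);
});

print_r($data);
```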

moc.xnoitadnuof@310symerej (21-Apr-2004 12:04)

Here are some useful functions.
They apply (or undo) htmlspecialchars() or htmlentities() recursively to arrays, or to regular scalar variables. They also protect against "double encoding".

<?PHP
// Note: array_map() with more than one array returns sequential integer keys,
// so string keys of an input array are not preserved.
function htmlspecialchars_or($mixed, $quote_style = ENT_QUOTES) {
    return is_array($mixed)
        ? array_map('htmlspecialchars_or', $mixed, array_fill(0, count($mixed), $quote_style))
        : htmlspecialchars(htmlspecialchars_decode($mixed, $quote_style), $quote_style);
}

// htmlspecialchars_decode() is built in since PHP 5.1.0, and redeclaring it there
// is a fatal error, so this fallback is only defined on older versions.
if (!function_exists('htmlspecialchars_decode')) {
    function htmlspecialchars_decode($mixed, $quote_style = ENT_QUOTES) {
        if (is_array($mixed)) {
            return array_map('htmlspecialchars_decode', $mixed, array_fill(0, count($mixed), $quote_style));
        }
        $trans_table = get_html_translation_table(HTML_SPECIALCHARS, $quote_style);
        if ($trans_table["'"] != '&#039;') { # some versions of PHP match single quotes to &#39;
            $trans_table["'"] = '&#039;';
        }
        return strtr($mixed, array_flip($trans_table));
    }
}

function htmlentities_or($mixed, $quote_style = ENT_QUOTES) {
    return is_array($mixed)
        ? array_map('htmlentities_or', $mixed, array_fill(0, count($mixed), $quote_style))
        : htmlentities(htmlentities_decode($mixed, $quote_style), $quote_style);
}

function htmlentities_decode($mixed, $quote_style = ENT_QUOTES) {
    if (is_array($mixed)) {
        return array_map('htmlentities_decode', $mixed, array_fill(0, count($mixed), $quote_style));
    }
    $trans_table = get_html_translation_table(HTML_ENTITIES, $quote_style);
    if ($trans_table["'"] != '&#039;') { # some versions of PHP match single quotes to &#39;
        $trans_table["'"] = '&#039;';
    }
    return strtr($mixed, array_flip($trans_table));
}
?>

These functions are an addition to an earlier post. I would like to give the person some credit but I do not know who it was.

Dave Duchene (20-Feb-2004 01:58)

Here is a handy function that will escape the contents of a variable, recursing into arrays.

<?php
function escaporize($thing) {
  if (is_array($thing)) {
    $escaped = array();

    foreach ($thing as $key => $value) {
      $escaped[$key] = escaporize($value);
    }

    return $escaped;
  }

  // else
  return htmlspecialchars($thing);
}
?>

mike-php at emerge2 dot com (20-Nov-2003 10:13)

Here's a handy function that guards against 'double' encoding:

# Given a string, this function first decodes all HTML special characters, then
# encodes the string, safely returning an encoded string without double-encoding.
function get_htmlspecialchars( $given, $quote_style = ENT_QUOTES ){
   return htmlspecialchars( html_entity_decode( $given, $quote_style ), $quote_style );
}

# Needed for PHP versions older than 4.3.0, which lack html_entity_decode().
# On newer versions the built-in is used; redeclaring it would be a fatal error.
if ( !function_exists( 'html_entity_decode' ) ) {
   function html_entity_decode( $given_html, $quote_style = ENT_QUOTES ) {
      $trans_table = get_html_translation_table( HTML_SPECIALCHARS, $quote_style );
      if( $trans_table["'"] != '&#039;' ) { # some versions of PHP match single quotes to &#39;
         $trans_table["'"] = '&#039;';
      }
      return ( strtr( $given_html, array_flip( $trans_table ) ) );
   }
}

Note: I set the default to ENT_QUOTES, as this makes more sense to me than the PHP function's default of ENT_COMPAT.
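Since PHP 5.2.3, htmlspecialchars() itself can avoid double encoding through its fourth parameter, double_encode. When it is false, entities already present in the input are left alone and only bare special characters are converted. A sketch with a made-up input string:

```php
<?php
$input = 'Tom &amp; Jerry <3';

// Default (double_encode = true): every & is escaped, including the one in &amp;.
$default = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// double_encode = false: the existing &amp; entity is kept as-is.
$no_double = htmlspecialchars($input, ENT_QUOTES, 'UTF-8', false);

echo $default, "\n";   // Tom &amp;amp; Jerry &lt;3
echo $no_double, "\n"; // Tom &amp; Jerry &lt;3
```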

nospam at somewhere dot com (15-Jun-2003 06:28)

The simplest function for decoding HTML-encoded strings:

function htmldecode($encoded) {
    return strtr($encoded,array_flip(get_html_translation_table(HTML_ENTITIES)));
}

dystopia589 at yahoo dot com (13-Mar-2003 03:58)

Sorry, part of that code was unnecessary. Here's a more readable version:

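function SpecialChars($Security)
{
    if (is_array($Security))
    {
        // foreach instead of each(), which was removed in PHP 8
        foreach ($Security as $key => $val)
        {
            $Security[$key] = SpecialChars($val);
        }
    }
    else
    {
        // stripslashes() undoes magic_quotes_gpc escaping; magic quotes no longer exist in PHP 5.4+
        $Security = htmlspecialchars(stripslashes($Security), ENT_QUOTES);
    }
    return $Security;
}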

webmaster at NOSPAM dot onlinegs dot com (29-Jan-2003 06:51)

If you are using PHP 4.3.0 or later, you can use html_entity_decode() to decode a string encoded with htmlspecialchars(); this should be faster and easier than using str_replace() or ereg.
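A quick round-trip check, as a sketch: htmlspecialchars_decode() (PHP 5.1.0+) reverses exactly the htmlspecialchars() conversions, while html_entity_decode() also decodes every other named and numeric entity. Pass the same quote flag on both sides, otherwise &#039; is left encoded. The sample string and the UTF-8 charset are assumptions.

```php
<?php
$original = 'Fish & "Chips" <hot> it\'s';

$encoded = htmlspecialchars($original, ENT_QUOTES);

// Exact inverse of htmlspecialchars():
$decoded1 = htmlspecialchars_decode($encoded, ENT_QUOTES);
// Decodes all entities, not just the special characters:
$decoded2 = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

var_dump($decoded1 === $original, $decoded2 === $original);
```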

_____ at luukku dot com (14-Sep-2002 10:21)

People, don't use ereg_replace() for the simplest string replacement operations (replacing one constant string with another).
Use str_replace().

akira dot yoshi at shrine dot de (16-May-2002 05:15)

If you need to apply htmlspecialchars() to a JIS string, here's a function that does it:

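function htmlspecialchars_jis($text) {
    if ($text == "") return "";
    $esc = chr(27);
    // ESC $ B switches to two-byte JIS X 0208, ESC ( B switches back to ASCII.
    // Single quotes are required: inside double quotes, "$B" is parsed as a variable.
    $to_jis = $esc . '$B';
    $to_ascii = $esc . '(B';
    $text = $to_jis . $to_jis . $text;
    $text = str_replace($to_ascii, $to_jis, $text);
    $trans = explode($to_jis, $text);
    $ret = "";
    $enc = 0;
    foreach ($trans as $val) {
        if ($enc == 0) {
            // ASCII segment: safe to escape
            if ($val != "") $ret .= htmlspecialchars($val);
            $enc = 1;
        } else {
            // Two-byte segment: pass through unchanged, re-wrapped in the escapes
            if ($val != "") $ret .= $to_jis . $val . $to_ascii;
            $enc = 0;
        }
    }
    return $ret;
}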

BTW: I'm very(!) sure that JIS is iso-2022-jp, not iso-2002-jp

juadielon_NOSPAM at hotmail dot com (01-May-2002 05:09)

I was trying to retrieve information from a database and display it in the browser, but it did not work as I expected. For instance, double quotes (") and single quotes (') conflicted with the HTML of an INPUT element.

The first approach to solve this was to use htmlspecialchars to convert special characters to HTML entities to display the input box with its value.

$encode=htmlspecialchars($str, ENT_QUOTES);

However, the result contained HTML entities preceded by a \ (backslash, an escape character). For instance, an ampersand (&) became \&amp;, displaying \&, and double quotes became \&quot;, displaying \".

So the final solution was to remove any \ (backslash) first and then let htmlspecialchars() do the conversion.

[Editor's Note: This is the wrong way to do this. The proper way is to use

$encoded = htmlspecialchars(stripslashes($str), ENT_QUOTES);
]

$encoded=htmlspecialchars(str_replace('\\', '', $str), ENT_QUOTES);

Try this example to see it yourself.

<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES); ?>">
<input type="text" name="str" size="20" value="">
<input type="submit" value="Submit">
<br>
<?php
// $_GET replaces the old register_globals variable $str
$str = isset($_GET['str']) ? $_GET['str'] : '';
if (!empty($str)) {
    $encoded = htmlspecialchars(str_replace('\\', '', $str), ENT_QUOTES);
    echo "<br><p>Result: <b>" . $encoded . "</b>. It should be the same as what you just typed.</p>";
    echo "<p>But the source code is transformed to:<b><xmp>" . $encoded . "</xmp></b></p>";
    // I know, I know, <xmp> is deprecated in HTML 4, but it was easy to use here to display the result.
}
?>
</form>

Hope this helps someone.
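To see why the editor prefers stripslashes(), here is a small sketch that simulates magic quotes with addslashes() on a made-up input containing a backslash the user really typed. stripslashes() keeps that backslash; str_replace('\\', '', ...) silently deletes it.

```php
<?php
// Simulated magic_quotes output for the user input:  O'Reilly \ Co
$quoted = addslashes('O\'Reilly \\ Co');

$via_stripslashes = stripslashes($quoted);          // backslash kept
$via_str_replace  = str_replace('\\', '', $quoted); // backslash lost

echo htmlspecialchars($via_stripslashes, ENT_QUOTES), "\n"; // O&#039;Reilly \ Co
echo htmlspecialchars($via_str_replace, ENT_QUOTES), "\n";  // O&#039;Reilly  Co
```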

akira at kurogane dot net (01-Apr-2002 05:42)

Beware of passing JIS (aka 'iso-2002-jp') text through this function: it does not appear to be aware of multibyte characters and may corrupt some of them. E.g. the Japanese comma (the two ASCII characters !" as seen by an ASCII client) is converted into !&quot;, which turns the comma into a 'maru' mark and the following characters into garbage.
Conceivably this could affect other multibyte charsets.
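A sketch of another approach, assuming the mbstring extension is available: convert the ISO-2022-JP text to UTF-8, escape it there, and convert it back. ISO-2022-JP is not among the charsets htmlspecialchars() supports (Shift_JIS and EUC-JP are), so the detour through UTF-8 keeps the escape sequences out of its way. The function name is hypothetical.

```php
<?php
// Hypothetical helper; requires the mbstring extension.
function htmlspecialchars_iso2022jp($text)
{
    // Decode JIS into UTF-8 so that two-byte characters are no longer plain ASCII bytes.
    $utf8 = mb_convert_encoding($text, 'UTF-8', 'ISO-2022-JP');
    $escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
    // Re-encode; the ASCII entities pass through unchanged.
    return mb_convert_encoding($escaped, 'ISO-2022-JP', 'UTF-8');
}

// The Japanese comma followed by a real double quote.
$jis = mb_convert_encoding('、"', 'ISO-2022-JP', 'UTF-8');
echo mb_convert_encoding(htmlspecialchars_iso2022jp($jis), 'UTF-8', 'ISO-2022-JP'), "\n";
```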

joseph at nextique dot com (20-Feb-2002 09:21)

Here is a handy function to HTML-escape an array (or scalar) before you hand it off to XML.

function htmlspecialchars_array($arr = array()) {
    // Scalars are escaped directly; arrays are escaped recursively.
    if (!is_array($arr)) {
        return htmlspecialchars($arr, ENT_QUOTES);
    }
    $rs = array();
    // foreach instead of each(), which was removed in PHP 8
    foreach ($arr as $key => $val) {
        $rs[$key] = htmlspecialchars_array($val);
    }
    return $rs;
}

(15-Jul-2001 07:18)

If you're sending data from one form to another, the data in the textareas and text inputs may need htmlspecialchars("form data", ENT_QUOTES) applied, assuming it will ever contain quotes, less-than signs or any of the other special characters. Using htmlspecialchars() makes the text show up properly in the second form. The browser decodes the entities again when the form is submitted, so the server receives the original text. It does seem a little strange, but it works, and my headache is now starting to go away.

AZ
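A minimal sketch of the form-to-form case: a hypothetical helper, text_input(), that prefills a text input with data from the previous form. Without ENT_QUOTES-style escaping, the double quote inside the value would end the value attribute early.

```php
<?php
// Hypothetical helper: build a text input whose value is safely escaped.
function text_input($name, $value)
{
    return sprintf('<input type="text" name="%s" value="%s">',
        htmlspecialchars($name, ENT_QUOTES),
        htmlspecialchars($value, ENT_QUOTES));
}

echo text_input('str', 'say "hi" & <bye>'), "\n";
// <input type="text" name="str" value="say &quot;hi&quot; &amp; &lt;bye&gt;">
```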

ryan at ryano dot net (29-Jun-2001 11:06)

Actually, if you're using PHP 4.0.5 or later (where str_replace() accepts arrays), this should theoretically be quicker (less overhead, anyway):

$text = str_replace(array("&gt;", "&lt;", "&quot;", "&amp;"), array(">", "<", "\"", "&"), $text);

thorax at inforocket dot com (09-Dec-1999 01:26)

To convert a document back from this,
do the string replacements in this order:

&gt;   =>  >
&lt;   =>  <
&quot; =>  "
&amp;  =>  &

Doing the last phase first will
produce erroneous results. For example:

'&lt;' => htmlspecialchars() => '&amp;lt;'
'&amp;lt;' => convert ampersands => '&lt;' => convert everything else => '<' (instead of the original '&lt;')