PHP: Converting HTML Entities (&#xxxxx;) to UTF-8

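Codes such as &#xxxxx; are called HTML entities, and they have to be taken into account whenever text containing UTF-8 is converted. PHP offers at least two simple ways to turn them back into normal characters: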

// Works well for large amounts of text mixed with HTML entities; only the HTML entities are converted to UTF-8
// http://php.net/manual/en/function.html-entity-decode.php

$content=html_entity_decode($content,ENT_COMPAT,'UTF-8');
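
To make the effect of the call above concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration, and it uses ENT_QUOTES | ENT_HTML5 (available since PHP 5.4) instead of ENT_COMPAT so that single-quote entities are decoded too; adjust the flags to your needs.

```php
<?php
// Hypothetical sample input: numeric entities mixed with named entities.
$sample = '&#20013;&#25991; &amp; &lt;b&gt;';

// Decode every entity into its UTF-8 character; plain text is left untouched.
$decoded = html_entity_decode($sample, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL; // prints: 中文 & <b>
```

Note that named entities such as &amp; and &lt; are decoded as well, so the result should be escaped again before it is echoed into an HTML page.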

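// Converts the whole string from HTML-ENTITIES to UTF-8 piece by piece; ordinary characters may come out garbled,
// so it is best combined with a regular expression that targets only the HTML entities.
// Note: the HTML-ENTITIES encoding in mbstring is deprecated as of PHP 8.2.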
// http://php.net/manual/en/function.mb-convert-encoding.php

$content=mb_convert_encoding($content,'UTF-8','HTML-ENTITIES');

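// With a regular expression, so that only the entities are converted.
// create_function() was removed in PHP 8.0, so an anonymous function (PHP 5.3+) is used here instead:
$content=preg_replace_callback('/&[^;]+;/',function($matches){return mb_convert_encoding($matches[0],'UTF-8','HTML-ENTITIES');},$content);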
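
The same regular-expression idea can also be written without mbstring by letting html_entity_decode handle each match. The following is only a sketch under a few assumptions: the function name decode_numeric_entities is hypothetical, the pattern is deliberately narrowed to numeric entities (decimal and hexadecimal), and named entities such as &amp; are left as they are.

```php
<?php
// Hypothetical helper: decode only numeric entities (&#20013; or &#x6587;)
// and leave named entities and all other text untouched.
function decode_numeric_entities(string $text): string
{
    $result = preg_replace_callback(
        '/&#(?:x[0-9a-fA-F]+|[0-9]+);/',
        static function (array $matches): string {
            // $matches[0] is a single numeric entity, e.g. "&#233;"
            return html_entity_decode($matches[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $text
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}

echo decode_numeric_entities('&#20013;&#x6587; &amp; caf&#233;'), PHP_EOL;
// prints: 中文 &amp; café
```

Restricting the pattern this way keeps markup-significant entities (&amp;, &lt;, &gt;) escaped, which is often what you want when the text is going back into an HTML page.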

Posted by on 2012-01-06.
Categories & Tags: PHP