Multibyte String Functions

mb_split

(PHP 4 >= 4.2.0, PHP 5)

mb_split - Split multibyte string using regular expression

Description

array mb_split ( string $pattern , string $string [, int $limit = -1 ] )

Splits a multibyte string using the regular expression pattern and returns the result as an array.

Parameters

pattern

The regular expression pattern.

string

The string being split.

limit

If the optional parameter limit is specified, the string is split into at most limit elements.

Return Values

The result as an array.
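
As a minimal sketch of the signature above, assuming the mbstring extension is loaded, the following splits a comma-separated string with and without limit. The sample data and variable names are illustrative only.

```php
<?php
// Sample input is illustrative; it does not come from the manual.
$csv = 'a,b,c,d';

// No limit: split at every match of the pattern.
$all = mb_split(',', $csv);

// limit = 2: at most two elements; the last one holds the unsplit remainder.
$two = mb_split(',', $csv, 2);

print_r($all);
print_r($two);
```

With limit set, the remainder of the string is kept intact in the final element rather than discarded.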

Notes

Note:

The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
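
A short sketch of how the regex encoding interacts with the pattern, assuming the script file itself is saved as UTF-8. The sample string and the two delimiters are made up for illustration.

```php
<?php
// Assumption: this file and its string literals are saved as UTF-8.
mb_regex_encoding('UTF-8');

// Called without arguments, mb_regex_encoding() returns the current setting.
$encoding = mb_regex_encoding();

// A bracket expression listing two multibyte delimiters.
$parts = mb_split('[、。]', '日、本。語');

echo $encoding, PHP_EOL;
print_r($parts);
```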


User comments:

gunkan at terra dot es (06-Apr-2012 02:17)

To split a string like "日、に、本、ほん、語、ご" on the "、" delimiter, I used:

     $v = mb_split('、',"日、に、本、ほん、語、ご");

but it didn't work.

The solution was to set the encodings first:

     mb_regex_encoding('UTF-8');
     mb_internal_encoding("UTF-8");
     $v = mb_split('、',"日、に、本、ほん、語、ご");

and now it's working:

Array
(
    [0] => 日
    [1] => に
    [2] => 本
    [3] => ほん
    [4] => 語
    [5] => ご
)

boukeversteegh at gmail dot com (14-Apr-2011 09:23)

The $pattern argument doesn't use /pattern/ delimiters, unlike other regex functions such as preg_match.

<?php
# Works. No slashes around the pattern.
print_r( mb_split("\s", "hello world") );
# Array (
#    [0] => hello
#    [1] => world
# )

# Doesn't work: the slashes are treated as literal characters.
print_r( mb_split("/\s/", "hello world") );
# Array (
#    [0] => hello world
# )
?>
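
To make the contrast concrete, here is a hedged side-by-side sketch of the same whitespace split written for both functions. The sample text and variable names are illustrative.

```php
<?php
$text = 'hello world again';

// mb_split: the pattern is passed bare, with no delimiters or modifiers.
$viaMb = mb_split('\s+', $text);

// preg_split: the same pattern needs delimiters; /u enables UTF-8 mode.
$viaPreg = preg_split('/\s+/u', $text);

print_r($viaMb);
print_r($viaPreg);
```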

boukeversteegh at gmail dot com (10-Sep-2010 05:43)

In addition to Sezer Yalcin's tip.

This function splits a multibyte string into an array of characters. Comparable to str_split().

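<?php
// PHP 7.4+ ships a built-in mb_str_split(); only define this fallback when it is missing.
if (!function_exists('mb_str_split')) {
    function mb_str_split( $string ) {
        # Split at all positions not after the start: ^
        # and not before the end: $
        return preg_split('/(?<!^)(?!$)/u', $string );
    }
}

$string   = '火车票';
$charlist = mb_str_split( $string );

print_r( $charlist );
?>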

# Prints:
Array
(
    [0] => 火
    [1] => 车
    [2] => 票
)

qdb at kukmara dot ru (25-Mar-2010 10:46)

Another way to str_split a multibyte string:
<?php
$s = '???????';

//$temp_s=iconv('UTF-8','UTF-16',$s);
$temp_s = mb_convert_encoding($s, 'UTF-16', 'UTF-8');
// Note: in UTF-16 a BMP character takes 2 bytes, so each 4-byte chunk holds two such characters.
$temp_a = str_split($temp_s, 4);
$temp_a_len = count($temp_a);
for ($i = 0; $i < $temp_a_len; $i++) {
    //$temp_a[$i]=iconv('UTF-16','UTF-8',$temp_a[$i]);
    $temp_a[$i] = mb_convert_encoding($temp_a[$i], 'UTF-8', 'UTF-16');
}

echo('<pre>');
print_r($temp_a);
echo('</pre>');

//also possible to directly use UTF-16:
define('SLS', mb_convert_encoding('/', 'UTF-16'));
$temp_s = mb_convert_encoding($s, 'UTF-16', 'UTF-8');
$temp_a = str_split($temp_s, 4);
$temp_s = implode(SLS, $temp_a);
$temp_s = mb_convert_encoding($temp_s, 'UTF-8', 'UTF-16');
echo($temp_s);
?>
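
The same fixed-width idea can be sketched with UTF-32, where every code point occupies exactly 4 bytes, so 4-byte chunks line up with single characters. This is a swapped-in variant rather than the commenter's code, and splitViaUtf32 is a hypothetical helper name, not part of mbstring. It assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper, not part of mbstring. Assumes $string is valid UTF-8.
function splitViaUtf32($string)
{
    if ($string === '') {
        return array(); // str_split('') behaves differently across PHP versions
    }
    // UTF-32BE stores every code point in exactly 4 bytes.
    $fixed = mb_convert_encoding($string, 'UTF-32BE', 'UTF-8');
    $chars = str_split($fixed, 4);
    foreach ($chars as $i => $chunk) {
        // Convert each 4-byte unit back to UTF-8.
        $chars[$i] = mb_convert_encoding($chunk, 'UTF-8', 'UTF-32BE');
    }
    return $chars;
}

print_r(splitViaUtf32('grün'));
```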

gert dot matern at web dot de (03-Aug-2009 11:34)

We are talking about multibyte (e.g. UTF-8) strings here, so preg_split will fail for the following string:

'Weiße Rosen sind nicht grün!'

And because I didn't find a regex to simulate a str_split, I optimized the first solution from adjwilli a bit:

<?php
$string = 'Weiße Rosen sind nicht grün!';
$stop   = mb_strlen( $string);
$result = array();

for( $idx = 0; $idx < $stop; $idx++)
{
  $result[] = mb_substr( $string, $idx, 1);
}
?>

Here is an example with adjwilli's function:

<?php
mb_internal_encoding( 'UTF-8');
mb_regex_encoding( 'UTF-8');

function mbStringToArray( $string)
{
  $stop   = mb_strlen( $string);
  $result = array();

  for( $idx = 0; $idx < $stop; $idx++)
  {
    $result[] = mb_substr( $string, $idx, 1);
  }

  return $result;
}

// Pass true to print_r (not to mbStringToArray) so it returns the dump instead of printing it.
echo '<pre>', PHP_EOL,
print_r( mbStringToArray( 'Weiße Rosen sind nicht grün!'), true), PHP_EOL,
'</pre>';
?>

Let me know [by personal email] if someone finds a regex to simulate a str_split with mb_split.

Sezer Yalcin (19-Feb-2009 01:13)

To split by mb letters, use preg_split with the /u modifier instead of calling mb functions a thousand times.
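
A minimal sketch of that tip, assuming the input is valid UTF-8. The sample string is borrowed from another comment on this page, and the variable names are illustrative.

```php
<?php
// Assumption: $string is valid UTF-8; with /u, preg_split() returns false on invalid input.
$string = '火车票';

// An empty pattern matches at every character boundary; /u makes "character" mean
// a code point, and PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends.
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($chars);
```

This does the whole split in one regex call instead of one mb_substr call per character.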

adjwilli at yahoo dot com (26-Dec-2007 05:37)

I figure most people will want a simple way to break up a multibyte string into its individual characters. Here's a function I'm using to do that. Change UTF-8 to your chosen encoding.

<?php
function mbStringToArray ($string) {
    $array = array(); // initialize so an empty string returns an empty array
    $strlen = mb_strlen($string, "UTF-8");
    while ($strlen) {
        $array[] = mb_substr($string,0,1,"UTF-8");
        $string = mb_substr($string,1,$strlen,"UTF-8");
        $strlen = mb_strlen($string, "UTF-8");
    }
    return $array;
}
?>