
preg_match notes

뭔일이여 2009. 3. 2. 13:57

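Meaning of each modifier in the pattern argument of preg_match (modifiers are the letters written after the closing delimiter, e.g. the i in /php/i)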

i : case-insensitive matching
u : treat the pattern and subject as UTF-8 (still checking the details)

Splitting a UTF-8 string into its individual characters

Example)
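<?php
$str = '한글 english どをウィ中國＃＆＊§※☆★';
// With the u modifier, "." matches one whole UTF-8 character, not one byte
preg_match_all('/./u', $str, $match);
// The pattern has no capture group, so the matches are in $match[0]
echo implode(',', $match[0]);
?>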

Result)
한,글, ,e,n,g,l,i,s,h, ,ど,を,ウ,ィ,中,國,＃,＆,＊,§,※,☆,★
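A quick way to see what u changes: the sketch below counts matches of /./ on the same two-syllable Korean string with and without the modifier. It assumes the PHP file itself is saved as UTF-8, where each Hangul syllable takes 3 bytes, so without u the "." splits every syllable into single bytes.

```php
<?php
// Assumes this file is saved as UTF-8: '한글' is 2 characters but 6 bytes.
$str = '한글';

// Without u, "." matches one byte at a time.
$bytes = preg_match_all('/./', $str, $m1);

// With u, "." matches one complete UTF-8 character.
$chars = preg_match_all('/./u', $str, $m2);

echo "bytes: $bytes, characters: $chars\n"; // bytes: 6, characters: 2
?>
```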

Pattern Modifiers

The current possible PCRE modifiers are listed below. The names in parentheses refer to internal PCRE names for these modifiers. Spaces and newlines are ignored in modifiers; any other character causes an error.

i (PCRE_CASELESS)

If this modifier is set, letters in the pattern match both upper and lower case letters.

m (PCRE_MULTILINE)

By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless the D modifier is set). This is the same as Perl. When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no "\n" characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.
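A minimal sketch of the difference, using a made-up three-line subject and a pattern that anchors on ^:

```php
<?php
$subject = "apple\nbanana\ncherry";

// Without m, ^ only matches at the very start of the subject.
$without = preg_match_all('/^\w+/', $subject, $m1);

// With m, ^ also matches right after every newline.
$with = preg_match_all('/^\w+/m', $subject, $m2);

echo "$without / $with\n";       // 1 / 3
echo implode(',', $m2[0]), "\n"; // apple,banana,cherry
?>
```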

s (PCRE_DOTALL)

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
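A small illustration, assuming a sample subject in which the text to capture spans a line break:

```php
<?php
$subject = "<b>line one\nline two</b>";

// Without s, "." stops at the newline, so the match cannot reach "</b>".
$without = preg_match('/<b>(.*)<\/b>/', $subject, $m1);

// With s, "." also matches "\n".
$with = preg_match('/<b>(.*)<\/b>/s', $subject, $m2);

echo "$without / $with\n"; // 0 / 1
echo $m2[1], "\n";         // the captured text, including the newline
?>
```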

x (PCRE_EXTENDED)

If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class, and characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored. This is equivalent to Perl's /x modifier, and makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.
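As an illustration (the date pattern is only a made-up example), x lets one pattern be spread over several lines with a comment on each part:

```php
<?php
// With x, whitespace in the pattern is ignored and "#" starts a comment
// that runs to the end of the line.
$pattern = '/
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
/x';

if (preg_match($pattern, 'Posted on 2009-03-02', $m)) {
    echo "$m[1] / $m[2] / $m[3]\n"; // 2009 / 03 / 02
}
?>
```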

e (PREG_REPLACE_EVAL)

If this modifier is set, preg_replace() does normal substitution of backreferences in the replacement string, evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes, backslashes and NULL chars will be escaped by backslashes in substituted backreferences.

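Only preg_replace() uses this modifier; it is ignored by other PCRE functions. This modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0; preg_replace_callback() should be used instead.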
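On PHP versions where e is no longer available, the same kind of computed replacement can be written with preg_replace_callback(), which passes the match array to a function and uses its return value as the replacement. A sketch that doubles every number in a sample string (the anonymous function syntax needs PHP 5.3 or later):

```php
<?php
// Roughly what preg_replace('/(\d+)/e', '$1*2', $text) used to do.
$text = 'apples: 3, pears: 12';

$result = preg_replace_callback('/\d+/', function ($m) {
    // $m[0] holds the full match; the return value replaces it.
    return (string) ($m[0] * 2);
}, $text);

echo $result, "\n"; // apples: 6, pears: 24
?>
```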

A (PCRE_ANCHORED)

If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string which is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
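A sketch comparing an unanchored pattern with the same pattern under A, on a sample subject:

```php
<?php
$subject = 'say hello';

// Unanchored: "hello" may be found anywhere in the subject.
$free = preg_match('/hello/', $subject);

// With A, the match must start at the beginning of the subject,
// as if the pattern had been written '/^hello/'.
$anchored = preg_match('/hello/A', $subject);

echo "$free / $anchored\n"; // 1 / 0
?>
```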

D (PCRE_DOLLAR_ENDONLY)

If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if the m modifier is set. There is no equivalent to this modifier in Perl.
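A sketch of the one case where D makes a difference, a subject that ends in a newline:

```php
<?php
$subject = "abc\n";

// By default, $ also matches just before a final newline.
$default = preg_match('/abc$/', $subject);

// With D, $ matches only at the very end of the subject.
$dollarEndOnly = preg_match('/abc$/D', $subject);

echo "$default / $dollarEndOnly\n"; // 1 / 0
?>
```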

S

When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.

U (PCRE_UNGREEDY)

This modifier inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?". It is not compatible with Perl. It can also be set by a (?U) modifier setting within the pattern or by a question mark behind a quantifier (e.g. .*?).
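A sketch of U on a sample HTML fragment; under U, a plain .* behaves the way .*? does without the modifier:

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Greedy by default: .* runs on to the last "</b>".
preg_match('/<b>(.*)<\/b>/', $html, $greedy);

// With U, quantifiers are lazy, so .* stops at the first "</b>".
preg_match('/<b>(.*)<\/b>/U', $html, $lazy);

echo $greedy[1], "\n"; // one</b> and <b>two
echo $lazy[1], "\n";   // one
?>
```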

X (PCRE_EXTRA)

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Any backslash in a pattern that is followed by a letter that has no special meaning causes an error, thus reserving these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. There are at present no other features controlled by this modifier.

J (PCRE_INFO_JCHANGED)

The (?J) internal option setting changes the local PCRE_DUPNAMES option, which allows duplicate names for subpatterns.
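A sketch of (?J) with a hypothetical pattern in which two alternatives capture into the same name, id. The J modifier letter itself is only accepted from PHP 7.2.0; the inline (?J) form used here does not depend on it.

```php
<?php
// Both alternatives capture into a group named "id".
$pattern = '/(?J)(?<id>\d+)-a|(?<id>\d+)-b/';

$ok = preg_match($pattern, 'item 42-b', $m);
echo $ok, "\n"; // 1

// Without (?J) the duplicate name is a compile error:
// preg_match() emits a warning (silenced here with @) and returns false.
$broken = @preg_match('/(?<id>\d+)-a|(?<id>\d+)-b/', 'item 42-b');
var_dump($broken); // bool(false)
?>
```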

u (PCRE_UTF8)

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

preg_match

(PHP 4, PHP 5)

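preg_match -- Perform a regular expression match

Description

int preg_match ( string $pattern, string $subject [, array &$matches [, int $flags [, int $offset]]] )

Searches subject for a match to the regular expression given in pattern.

If matches is provided, it is filled with the results of the search. $matches[0] will contain the text that matched the full pattern, $matches[1] will contain the text that matched the first captured parenthesized subpattern, and so on.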

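flags can be the following flag:

PREG_OFFSET_CAPTURE: If this flag is passed, the string offset of every match is returned along with it. Note that this changes the value of matches into an array in which every element is itself an array, holding the matched string at index 0 and its byte offset into subject at index 1. This flag is available since PHP 4.3.0.

The flags parameter is available since PHP 4.3.0.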
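A sketch showing the shape of $matches when PREG_OFFSET_CAPTURE is passed, on a sample subject:

```php
<?php
preg_match('/(foo)(bar)(baz)/', 'foobarbaz', $matches, PREG_OFFSET_CAPTURE);

// Each element is now array(matched text, byte offset into the subject).
foreach ($matches as $i => $pair) {
    echo "$i: '{$pair[0]}' at offset {$pair[1]}\n";
}
// 0: 'foobarbaz' at offset 0
// 1: 'foo' at offset 0
// 2: 'bar' at offset 3
// 3: 'baz' at offset 6
?>
```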

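Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify an alternate place, in bytes, from which to start the search. Note that using offset is not equivalent to passing substr($subject, $offset) to preg_match() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x) whose meaning depends on the whole subject. The offset parameter is available since PHP 4.3.3.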
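The sketch below shows why offset and substr() are not interchangeable when the pattern uses ^:

```php
<?php
$subject = 'abcdef';

// The search starts at "d", but ^ still means "start of the whole subject",
// so this cannot match.
$withOffset = preg_match('/^def/', $subject, $m1, PREG_OFFSET_CAPTURE, 3);

// Cutting the string first makes "def" the start of a new subject, so ^ matches.
$withSubstr = preg_match('/^def/', substr($subject, 3), $m2, PREG_OFFSET_CAPTURE);

echo "$withOffset / $withSubstr\n"; // 0 / 1
?>
```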

preg_match() returns the number of times pattern matches. That will be either 0 (no match) or 1, because preg_match() stops searching after the first match. preg_match_all(), by contrast, continues until it reaches the end of subject. preg_match() returns FALSE if an error occurred.
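A sketch of the three possible outcomes on a sample subject. Because both 0 and FALSE are falsy, compare with === false when an error has to be told apart from "no match":

```php
<?php
$subject = 'cat bat rat';

// preg_match() stops at the first match, so it returns 0 or 1.
$first = preg_match('/\wat/', $subject, $m);

// preg_match_all() keeps going to the end of the subject.
$all = preg_match_all('/\wat/', $subject, $mAll);

// An invalid pattern (unclosed group) is an error: a warning
// (silenced here with @) and a return value of false.
$error = @preg_match('/(unclosed/', $subject);

echo "$first / $all\n";     // 1 / 3
var_dump($error === false); // bool(true)
?>
```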

Tip

Do not use preg_match() if you only want to check whether one string is contained in another string. Use strpos() or strstr() instead, as they will be faster.
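A sketch of the strpos() alternative, using the same sentence as the examples below. strpos() returns 0 for a match at the very start of the string, so its result must be compared with !== false rather than used directly as a condition:

```php
<?php
$haystack = 'PHP is the web scripting language of choice.';

// strpos() returns the byte position of the first occurrence, or false.
if (strpos($haystack, 'web') !== false) {
    echo "found\n";
}

// stripos() is the case-insensitive variant, comparable to the i modifier.
$pos = stripos($haystack, 'WEB');
echo $pos, "\n"; // 11
?>
```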

Example 1620. Finding the string of text "php"

<?php
// The "i" after the pattern delimiter makes the search case-insensitive
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

Example 1621. Finding the word "web"

<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
 * word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.\n";
} else {
    echo "A match was not found.\n";
}

if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
    echo "A match was found.\n";
} else {
    echo "A match was not found.\n";
}
?>

Example 1622. Getting the domain name out of a URL

<?php
// get the host name from the URL
preg_match("/^(http:\/\/)?([^\/]+)/i",
    "http://www.php.net/index.html", $matches);
$host = $matches[2];

// get the last two segments of the host name
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>

The above example will output:

domain name is: php.net