I think the best and most common method is to use regular expressions, because they let you match a wider variety of patterns. If you only use str_replace and the user makes a mistake in the BBCode, or the BBCode contains something you weren't expecting, you could get a lot of errors.
For example, here is some BBCode handling for URLs using regular expressions:
CODE
<?php
$input = "[url]http://www.handyphp.com[/url]<br />\n[url=http://www.handyphp.com]Handy PHP[/url]";
$pattern = array(
'@(\[url=)([^\]]*?)(\])(.*?)(\[/url\])@si', // This matches [url=http://www.domain.com]Domain[/url]
'@(\[url)([^\]]*?)(\])(.*?)(\[/url\])@si' // This matches [url]http://www.domain.com[/url]
);
$replace = array(
'<a href="${2}">${4}</a>',
'<a href="${4}">${4}</a>'
);
$output = preg_replace($pattern, $replace, $input);
echo $input . "\n<hr />\n" . $output;
?>
See, instead of replacing a part of the BBCode, the entire tag is replaced and selected parts of the tag are reinserted into the new string. The first pattern is actually composed of 5 sub-patterns:
(\[url=) - First, the string must start with "[ur..."
([^\]]*?) - Second, match everything after that up to but not including "]" - This is the "http..."
(\]) - Third, find the end bracket for the opening tag.
(.*?) - Fourth, match everything here until the next sub-pattern. - This is the "Handy PHP"
(\[/url\]) - Fifth, find the closing tag for the BBCode.
Now, the replacements include back references to parts of the original string. For example, ${2} means use the second sub-pattern match from the original string, which is http://www.handyphp.com.
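If you want to see exactly what each sub-pattern grabs, a quick sketch like this one can help. It runs the first pattern through preg_match and prints every captured piece. The variable names are just ones I picked for the illustration.

```php
<?php
// Inspect what each sub-pattern captured for the named-link form.
$input   = "[url=http://www.handyphp.com]Handy PHP[/url]";
$pattern = '@(\[url=)([^\]]*?)(\])(.*?)(\[/url\])@si';

$matches = array();
if (preg_match($pattern, $input, $matches)) {
    // $matches[0] is the whole match; 1 through 5 are the sub-patterns in order.
    foreach ($matches as $index => $text) {
        echo $index . ": " . $text . "\n";
    }
}
```

Index 2 holds the address and index 4 holds the link text, which is why the replacement string uses ${2} and ${4}.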
Since you are matching a full string instead of pieces and parts of a string, you can better control how the output will be formatted. While this pattern doesn't tackle the issue of single, double, or no quotes being used by the user, it could be easily modified to do so.
See this new version that looks for single and double quotes:
CODE
<?php
$input = "[url]http://www.handyphp.com[/url]<br />\n
[url=\"http://www.handyphp.com\"]Handy PHP[/url]<br />\n
[url='http://www.handyphp.com']Handy PHP[/url]<br />\n
[url=http://www.handyphp.com]Handy PHP[/url]";
$pattern = array(
'@(\[url=)(\'|")*([^\]]*?)(\'|")*(\])(.*?)(\[/url\])@si',
'@(\[url)([^\]]*?)(\])(.*?)(\[/url\])@si'
);
$replace = array(
'<a href="${3}">${6}</a>',
'<a href="${4}">${4}</a>'
);
$output = preg_replace($pattern, $replace, $input);
echo $input . "\n<hr />\n" . $output;
?>
You may have noticed that I use arrays for both pattern and replace.
preg_replace will cycle through each array item in $pattern and replace its matches with the corresponding item from $replace. You should also see that I have 2 different patterns and 2 different replacements. This is because there are 2 different ways the URL BBCode is usually written. The more specific pattern should always come first, followed by the more general one. Since the link with a name is a more complex string, the pattern for it has to be more specific.
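To show why the order matters, here is a small sketch that runs the same two patterns both ways on a named link. The variable names ($specific, $general, $right, $wrong) are just for this illustration.

```php
<?php
$input = "[url=http://www.handyphp.com]Handy PHP[/url]";

$specific = '@(\[url=)([^\]]*?)(\])(.*?)(\[/url\])@si';
$general  = '@(\[url)([^\]]*?)(\])(.*?)(\[/url\])@si';

// Specific pattern first: the named link is converted before the general pattern runs.
$right = preg_replace(
    array($specific, $general),
    array('<a href="${2}">${4}</a>', '<a href="${4}">${4}</a>'),
    $input
);

// General pattern first: it also matches "[url=...]" and uses the link text as the address.
$wrong = preg_replace(
    array($general, $specific),
    array('<a href="${4}">${4}</a>', '<a href="${2}">${4}</a>'),
    $input
);

echo $right . "\n"; // <a href="http://www.handyphp.com">Handy PHP</a>
echo $wrong . "\n"; // <a href="Handy PHP">Handy PHP</a>
```

With the general pattern first, the "=http://..." part is swallowed by its second sub-pattern and thrown away, so the link ends up pointing at "Handy PHP".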
Due to the complexity of regular expressions, many newer programmers have a lot of trouble figuring out how to use them. In fact, I still learn new things every time I try to use regular expressions. I depend a lot on trial and error. The above examples are not quite as optimized as they could be but these are the easiest to understand examples I could come up with. For example, since you only need 2 back references in this example, it isn't really necessary to have everything broken down into sub-patterns. Only that which you want to back reference needs to be sub-patterned.
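As a rough sketch of that last point, the first example could be trimmed down so that only the parts you back reference are sub-patterned. I have not tuned this beyond removing the extra parentheses, so treat it as a starting point.

```php
<?php
$input = "[url]http://www.handyphp.com[/url]\n[url=http://www.handyphp.com]Handy PHP[/url]";

$pattern = array(
    '@\[url=([^\]]*?)\](.*?)\[/url\]@si',  // ${1} = address, ${2} = link text
    '@\[url[^\]]*?\](.*?)\[/url\]@si'      // ${1} = address, also used as the link text
);
$replace = array(
    '<a href="${1}">${2}</a>',
    '<a href="${1}">${1}</a>'
);

$output = preg_replace($pattern, $replace, $input);
echo $output;
```

The back reference numbers change because there are fewer sub-patterns now, so the replacement strings have to be renumbered to match.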
I recommend that you do more research into regular expressions. Here is a good place to learn:
http://www.ilovejackdaniels.com/cheat-shee...ns-cheat-sheet/
In general, that is a fantastic website for web developers. I printed a number of the full-color cheat sheets onto glossy double-sided photo paper.
As for checking for injections, you can reject URLs that end with ".exe" if you want. It just requires you to adjust the regular expression to "NOT" match the BBCode if it contains a link with .exe at the end. Or you could replace offensive links with some other string, which is easier and will cover all links in the input and not just the ones in BBCode.
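One possible way to do the ".exe" check, instead of building it into the pattern itself, is to use preg_replace_callback and test the address inside the callback. This is only a sketch: the example.com address and the "[link removed]" text are made up for the illustration, and it only handles the unquoted [url=...] form.

```php
<?php
$input = "[url=http://www.handyphp.com]Handy PHP[/url]\n[url=http://example.com/setup.exe]Free stuff[/url]";

$output = preg_replace_callback(
    '@\[url=([^\]]*?)\](.*?)\[/url\]@si',
    function ($m) {
        // $m[1] is the address, $m[2] is the link text.
        if (preg_match('@\.exe$@i', $m[1])) {
            // Hypothetical placeholder: keep the text but drop the link.
            return $m[2] . ' [link removed]';
        }
        return '<a href="' . $m[1] . '">' . $m[2] . '</a>';
    },
    $input
);

echo $output;
```

The first link is converted as usual, and the second one comes out as plain text followed by the placeholder.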
I hope this helps

vujsa