Well, it may not be the most efficient method, but I wrote the following using the Perl-compatible (PCRE) functions:
CODE
<?php
$string = "http://www.domain.com/file1.php?var1=value1&var2=value2";
$search = '|(\.php)(.*)|i';
$replace = '${1}';
echo preg_replace( $search, $replace, $string );
?>
There may be a better way to do this but this is how I usually do it.
Here is the same code for POSIX Extended:
CODE
<?php
$string = "http://www.domain.com/file1.php?var1=value1&var2=value2";
$search = '(\.php)(.*)';
$replace = '\\1';
echo ereg_replace( $search, $replace, $string );
?>
The "p" in preg_replace is for Perl while the "e" in ereg_replace is for POSIX Extended.
I'll attempt to explain each individually.
preg_replace()
This function requires three parameters: a search pattern, a replacement string, and a base string.
That is why I named my variables $search, $replace, and $string.
Of course, you do not have to use variables; you can insert the strings directly into the function call.
I find it easier to explain and use the function with variables, so I'll explain them now.
The $search variable contains the regular expression string. This will vary greatly depending on what you want to find, how you want to use it later, and how simple the base string is.
First, use single quotes. Double quotes have their uses, but not here: inside double quotes PHP itself interprets backslash escapes and dollar signs before the regex engine ever sees them.
The pipe character '|' is used as a delimiter. It tells the function where the regular expression starts and stops, since the string also carries additional information (the modifiers after the closing delimiter).
The next thing I do is use an opening parenthesis '(', which marks the beginning of a group. It isn't essential in this example, but it is useful later.
Now, the period has its own meaning in regular expressions, so to match an actual period you have to escape it with a backslash '\', which is widely used in programming to escape a character. So '\.php' means match '.php', and it is followed by a closing parenthesis ')' to end the group.
The next thing is to match pretty much everything else! (.*) means just that: take everything and put it in a group. The period, as mentioned, has its own meaning, and here it says "any character" (except a newline, by default). The asterisk '*' means zero or more of that character. The match is greedy: it takes as many characters as it can and only gives some back if a later part of the pattern needs them in order to match. In this case, nothing follows it in the pattern, so it runs to the end of the base string.
Then we finish the regex portion of the string and follow it with the ending delimiter (pipe). After the ending delimiter, there is an 'i'. This means that the regex is case insensitive! To make it case sensitive, leave the 'i' out.
So,
'|(\.php)(.*)|i' means: match the group of characters '.php', followed by a group of any characters up to the end of the string, without regard to case.
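If you want to see what each group actually grabs, here is a small sketch using preg_match() with the same pattern and URL as above. The $matches variable name is just my choice for illustration.

```php
<?php
// Same sample URL and pattern as in the post.
$string = "http://www.domain.com/file1.php?var1=value1&var2=value2";
$search = '|(\.php)(.*)|i';

// preg_match() fills $matches: index 0 is the whole match,
// index 1 is the first group, index 2 is the second group.
if (preg_match($search, $string, $matches)) {
    echo "whole match: " . $matches[0] . "\n"; // .php?var1=value1&var2=value2
    echo "group 1:     " . $matches[1] . "\n"; // .php
    echo "group 2:     " . $matches[2] . "\n"; // ?var1=value1&var2=value2
}
```

Group 1 is the piece that the replace string keeps, and group 2 is the piece that gets thrown away.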
The replace string is simple enough to explain: ${1} simply means use whatever the first group matched as the replacement. In this case, that is '.php'.
The same can be accomplished with '\\1'. This is called a back reference. It is most handy when you want to simply extract something from a string instead of replacing it. It is also how you turn query string URLs into search-engine-friendly URLs. Another use is for referral links: if a person visits your website from a search engine, you can extract the exact search term they used to find you!
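As a rough sketch of the search term idea, the snippet below pulls the value of a "q" parameter out of a referrer URL. The referrer string and the "q" parameter name are made up for illustration; real search engines use different parameter names, and some don't pass the term along at all. On a live page the URL would typically come from $_SERVER['HTTP_REFERER'].

```php
<?php
// Hypothetical referrer, hard-coded here so the example runs on its own.
$referrer = "http://www.example-search.com/search?hl=en&q=php+regex+tutorial&start=10";

// Find "q=" right after a "?" or "&", then capture everything
// up to the next "&" (or the end of the string) in group 1.
$search = '#[?&]q=([^&]*)#i';

$term = '';
if (preg_match($search, $referrer, $matches)) {
    // urldecode() turns "+" and %xx escapes back into plain characters.
    $term = urldecode($matches[1]);
}
echo $term; // php regex tutorial
```

$matches[1] here is the same first group that ${1} or '\\1' refers to in a replace string.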
ereg_replace()
This function works the same way, except that it doesn't take delimiters, since POSIX patterns have no modifiers that need separating from the regex. (For a case-insensitive match, use eregi_replace() instead.) It also will not allow the replace parameter to use the '${1}' style of back reference. Only the '\\1' method is allowed!
The PHP manual recommends using the Perl-compatible functions since they are more efficient and offer more flexibility. The POSIX ereg functions were deprecated in PHP 5.3 and removed in PHP 7, so the ereg_replace() examples here only run on older versions.
I'm not sure that I have explained everything well enough, but I'll try to clear some things up here:
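The delimiter in preg_replace can be nearly any character that is not a letter, a digit, a backslash, or whitespace. If the delimiter character also appears inside the search pattern, it has to be escaped with a backslash, so it is easiest to pick one the pattern doesn't use. (|, @, #, %, {}, [], <>, /)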
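To show that the choice of delimiter doesn't change the result, here is a quick sketch that wraps the same regex in several different delimiters. The $patterns and $results names are only for this example.

```php
<?php
$string = "http://www.domain.com/file1.php?var1=value1&var2=value2";

// The same regex wrapped in different delimiters.
$patterns = array(
    '|(\.php)(.*)|i',
    '#(\.php)(.*)#i',
    '@(\.php)(.*)@i',
    '{(\.php)(.*)}i', // bracket-style delimiters close with the matching bracket
    '/(\.php)(.*)/i',
);

$results = array();
foreach ($patterns as $pattern) {
    $results[] = preg_replace($pattern, '${1}', $string);
}

// Every entry is the URL with the query string removed,
// so array_unique() leaves a single value.
print_r(array_unique($results));
```

All five patterns give back http://www.domain.com/file1.php.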
The reason you might use the back reference is because you might match multiple patterns like so:
'#(\.php|\.asp)(.*)#i' Will strip out the query data from PHP and ASP links. Notice that I changed the delimiters to pound '#' instead of pipe since the pipe is used in the regex to denote 'or'. Now if the link is to an ASP file, with the back reference, the file extension will be correct in the output!
So:
www.domain.com/file1.php?var1=value1&var2=value2 will become
www.domain.com/file1.php
www.domain.com/file2.asp?var1=value1&var2=value2 will become
www.domain.com/file2.asp
Now, just to leave you with one last thought, here is the most simplified version of the script. It means replace everything from the question mark '?' to the end with nothing:
CODE
<?php
$string = "http://www.domain.com/file1.asp?var1=value1&var2=value2";
$search = '#\?.*#';
$replace = '';
echo preg_replace( $search, $replace, $string );
?>
Since the pattern contains no letters, case doesn't matter and no modifier is needed after the delimiter. Also, since the match begins after the file extension, the extension is left alone.
Here is the same for POSIX Extended:
CODE
<?php
$string = "http://www.domain.com/file1.asp?var1=value1&var2=value2";
$search = '\?.*';
$replace = '';
echo ereg_replace( $search, $replace, $string );
?>
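To check that the simplified '#\?.*#' pattern doesn't care about the file extension, here is a short sketch that runs it over a few URLs. The file3.html URL is one I added to show what happens when there is no query string at all.

```php
<?php
$search = '#\?.*#';

$urls = array(
    "http://www.domain.com/file1.php?var1=value1&var2=value2",
    "http://www.domain.com/file2.asp?var1=value1",
    "http://www.domain.com/file3.html", // no query string: nothing to strip
);

$cleaned = array();
foreach ($urls as $url) {
    // Remove everything from the first "?" to the end of the string.
    $cleaned[] = preg_replace($search, '', $url);
}

print_r($cleaned);
```

The first two come back without their query strings, and the third comes back unchanged because the pattern never matches.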
I thought I should provide you with a more complex method first to show you what was going on! The more options a regex has (the more complex it is), the more powerful it will be, but it will also require more resources to execute. It all depends on your needs.
There are a few regex topics on the forums that will help.
Hope this helps.

vujsa