
Re: PHP Regular expression help: msg#00016

php.tcphp

Subject: Re: PHP Regular expression help

I've seen and used weirder and more obfuscated regular expressions, but
nothing more complex than this one. I worked it out by hand using the RFCs.
It is probably not complete, but it is as close as I could stand to get it.

// returns 0 if the URL contains spaces or illegal characters, 1 otherwise.
// the http:// prefix is optional (calling code should strip or require it)
// second parameter is whatever you want to return if $val is empty
function djp_is_valid_url($val,$allow_empty=0)
{
if ($val)
{
$http='http(s)?://'; // the "s" is optional; the whole scheme is made optional in $validurl.

// usernames and passwords can, theoretically, be transmitted
// through the url, as username:password@host, etc.
$userpass='[a-z0-9]{1,}:[a-z0-9]{1,}@';

// domain is alphanumeric with hyphens,
// any number of iterations, separated by periods.
// followed by at least two alphanumeric with hyphens,
// followed by a period, then 2-6 alphas (.com, .edu, .tv,
// .net, .museum, et al)
$domain='((([0-9a-z-]*\.)*)?[0-9a-z-]{2,})+\.[a-z]{2,6}';

// an ip address is 4 numerics delimited by periods.
$ip='([0-9]{1,3}\.){3}([0-9]{1,3}){1}'; // the period is escaped so it matches only a literal "."

// and very optional port #,
$port=':[0-9]{1,5}';

// with an optional following slash.
$slash='/?';

// path must start with a slash, followed by optional tilde,
// then alphanumeric with underscores, slashes and periods.
$path='/(~?[0-9a-z/\._-])*';

// "GET" strings. There's an easier way BEGGING to come out of this....
$getparams='('.
// the first get string starts with ?,
// followed by Alphanumeric,'=',optional Alphanumeric or
// %(2 hex characters)
'(\?[_0-9a-z]{1,}=(([0-9a-z+/_*.-]*)|(%[a-f0-9]{2}))*)*'.
// the second and subsequent get parameters start with &,
// followed by Alphanumeric,'=',optional Alphanumeric or
// %(2 hex characters)
'(\&[_0-9a-z]{1,}=(([0-9a-z+/_*.-]*)|(%[a-f0-9]{2}))*)*'.
// allow zero or one of the preceding.
')?';

$fragment='(#([0-9a-z_.,]*))?';
// assemble the parts & check.

$validurl="^(($http)?(($userpass)?($domain)|($ip))($port)?){1}$slash($path$getparams$fragment)?$";
// for extra credit, uncomment the next line and read the output
// print("The regular expression that approximates a valid URL is:\n$validurl");
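// eregi() was removed in PHP 7; preg_match() with the "i" flag replaces it.
// "!" is the delimiter because the pattern itself contains "/" and "#".
if (preg_match('!'.$validurl.'!i',$val))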
{
return 1;
}
else
{
return 0;
}
}
else return $allow_empty;
}
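
To show the "assemble the parts" idea on something small enough to read at a glance, here is a cut-down sketch. It is not the function above: sketch_is_valid_host is a hypothetical name, the domain part is deliberately simplified, and it assumes preg_match rather than eregi.

```php
<?php
// Hypothetical helper (not from the original post): a cut-down version of the
// same "build the pattern from named parts" technique.
function sketch_is_valid_host($val)
{
    $http   = 'https?://';                   // scheme, with an optional "s"
    $domain = '([0-9a-z-]+\.)+[a-z]{2,6}';   // dotted labels, then a 2-6 letter TLD
    $ip     = '([0-9]{1,3}\.){3}[0-9]{1,3}'; // dotted quad, digits only
    $port   = ':[0-9]{1,5}';                 // port number

    // "!" is the delimiter because the parts contain "/".
    // Scheme, port and trailing slash are all optional here.
    $pattern = "!^($http)?(($domain)|($ip))($port)?/?\$!i";

    return preg_match($pattern, $val) === 1;
}

var_dump(sketch_is_valid_host('http://www.example.com:8080/')); // bool(true)
var_dump(sketch_is_valid_host('192.168.0.1'));                  // bool(true)
var_dump(sketch_is_valid_host('www example.com'));              // bool(false)
```

Keeping each part in its own variable means a part can be tested or tightened on its own before it goes into the full expression.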

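On the "easier way" for the GET strings: one possible simplification is to define a single name=value part and reuse it for the "?" pair and the "&" pairs. This is only a sketch; $value and $pair are names I made up, and it is slightly stricter than the original because it allows just one "?".

```php
<?php
// Assumption: one reusable name=value part replaces the two near-identical
// sub-patterns. $value and $pair are hypothetical names.
$value     = '([0-9a-z+/_*.-]|%[a-f0-9]{2})*'; // plain characters or %XX escapes
$pair      = "[_0-9a-z]+=$value";               // name, "=", optional value
$getparams = "(\\?$pair(&$pair)*)?";            // "?" pair, then any number of "&" pairs

// Anchored on its own here so the part can be tried in isolation.
$pattern = "!^$getparams\$!i";

var_dump(preg_match($pattern, '?a=1&b=hello%20world')); // int(1)
var_dump(preg_match($pattern, '?a=1&&b=2'));            // int(0)
var_dump(preg_match($pattern, ''));                     // int(1)
```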

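One caveat on the ip part: "4 numerics delimited by periods" also accepts things like 999.999.999.999, since the pattern only counts digits. If the range of each number matters, a separate check along these lines might help. sketch_is_ipv4 is a hypothetical helper, and it relies on PHP's built-in filter_var rather than on the regular expression.

```php
<?php
// Hypothetical helper, not from the original post: checks each octet's range
// (0-255) as well as the dotted-quad shape, using the filter extension.
function sketch_is_ipv4($val)
{
    // filter_var returns the filtered string on success and false on failure.
    return filter_var($val, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

var_dump(sketch_is_ipv4('192.168.0.1'));     // bool(true)
var_dump(sketch_is_ipv4('999.999.999.999')); // bool(false)
```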