Thread: URL Parsing [PHP]

  1. #1
    Ah, Mrs. Peel! mike_w's Avatar
    Join Date
    Oct 2003
    Location
    Hertfordshire, England
    Posts
    3,326
    Thanks
    3
    Thanked
    9 times in 7 posts

    URL Parsing [PHP]

    I'm trying to write a little something in PHP, and seem to have done all right thus far, but have gotten a little stuck. All it has to do is detect when there is a hyperlink in a string and replace the link with the relevant markup, e.g. replace

    Code:
    Just visit http://www.example.com to find out more
    with

    Code:
    Just visit <a href="http://www.example.com">http://www.example.com</a> to find out more
    The best I can come up with is something like this:

    Code:
    $text = preg_replace("/http:\/\/([$string]+)/",'<a href="http://$1">http://$1</a>',$text);
    Where $string is a string of all of the valid characters for a URL. So, the questions are:

    1) What are the valid characters for a URL i.e. what should $string be?
    2) Is there a better way of doing this?
    3) Are there any examples of this on the net?

    I did try searching for this, but all I could find was an irrelevant PHP function.

    Any help is appreciated.

    Mike.
    "Well, there was your Uncle Tiberius who died wrapped in cabbage leaves but we assumed that was a freak accident."
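
On question 1, here is a minimal sketch rather than a definitive answer. It assumes the character set that RFC 3986 permits in a URI (the unreserved characters, the reserved delimiters, and % for percent-encoding) and builds the class with preg_quote so nothing needs hand-escaping. The variable names are illustrative, and # is used as the pattern delimiter so that slashes need no backslashes.

```php
<?php
// Unreserved characters from RFC 3986: letters, digits, "-", ".", "_", "~".
$unreserved = 'A-Za-z0-9\-._~';

// Reserved characters (gen-delims and sub-delims), escaped for use in a pattern.
$reserved = preg_quote(':/?#[]@!$&\'()*+,;=', '#');

// "%" is added so percent-encoded sequences such as %29 are accepted.
$urlChars = $unreserved . $reserved . '%';

$pattern = '#http://([' . $urlChars . ']+)#';

$text   = 'Just visit http://www.example.com to find out more';
$linked = preg_replace($pattern, '<a href="http://$1">http://$1</a>', $text);

echo $linked, "\n";
```

A class this broad also swallows a full stop or closing bracket that directly follows a link in running text, which is one reason the replies in this thread use a narrower class for the final character.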

  2. #2
    HEXUS.net Webmaster
    Join Date
    Jul 2003
    Location
    UK
    Posts
    3,108
    Thanks
    1
    Thanked
    0 times in 0 posts
    $string = '/(ftp|http):\/\/([_a-z\d\-]+(\.[_a-z\d\-]+)+)(([_a-z\d\-\\\.\/]+[_a-z\d\-\\\/])+)*/ ';

    $text = preg_replace($string,'<a href="$1">$1</a>',$text);

  3. #3
    Ah, Mrs. Peel! mike_w's Avatar
    Sorry, that just gives me an error message:

    Warning: preg_replace(): Unknown modifier ']' in test.php on line 117
    [Line of $text = preg_replace...]

    Removing the \\ towards the end of $string stops the error message, and rejigging the replacement seems to produce the right output, using this:

    $string = '/(ftp|http):\/\/([_a-z\d\-]+(\.[_a-z\d\-]+)+)(([_a-z\d\-\\\.\/]+[_a-z\d\-\/])+)*/ ';
    $text = preg_replace($string,'<a href="$1://$2$4">$1://$2$4</a>',$text);

    So, is that what I want? Or will that fail to translate some URLs correctly?

    Thanks for the help so far.
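
A likely explanation for the "Unknown modifier ']'" warning, offered as an inference from how PHP strings and PCRE delimiters treat backslashes rather than anything confirmed in the thread: in a quoted PHP string \\ collapses to a single backslash, so the class ending in \\\/ reaches PCRE as \\/ (an escaped backslash followed by a bare slash). The bare slash is then read as the closing delimiter, and the ] after it is taken for a modifier. The sketch below reproduces this with a cut-down pattern and shows one way round it, using # as the delimiter; the variable names are illustrative.

```php
<?php
// In a single-quoted PHP string '\\' collapses to one backslash, so this
// reaches PCRE as /[a-z\\/]+/ : the class holds an escaped backslash, and the
// bare "/" that follows is read as the closing delimiter.
$broken = '/[a-z\\\/]+/';
$result = @preg_match($broken, 'a/b'); // warning suppressed; compilation fails

// With "#" as the delimiter a slash is just a slash. Four backslashes in the
// PHP source become two for PCRE, i.e. one literal backslash in the class.
$fixed = '#[a-z\\\\/]+#';
preg_match($fixed, 'a/b\\c', $m);

var_dump($result); // false, because the pattern did not compile
echo $m[0], "\n";
```

This would also explain why removing one backslash or both makes the warning go away: either way the slash ends up escaped, although the class then no longer matches a literal backslash.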

  4. #4
    Comfortably Numb directhex's Avatar
    Join Date
    Jul 2003
    Location
    /dev/urandom
    Posts
    17,074
    Thanks
    228
    Thanked
    1,027 times in 678 posts
    my brain is telling me there are security dangers to what you're doing

    i'm not sure of the specifics, but it's often got some basis in reality
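
The post does not name a specific risk. One common concern, stated here as an assumption about what might be meant, is that $text comes from a user and is echoed into a page unescaped, so any markup around the link is output as-is. A minimal sketch of escaping first and linkifying second follows; the function name linkify_escaped and the simplified pattern (one optional path group, # delimiters) are illustrative choices, not the thread's code.

```php
<?php
// Escape the raw text before adding our own markup, so that only the
// <a> tags inserted below are ever emitted as HTML.
function linkify_escaped(string $raw): string
{
    $safe = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // Scheme, host labels, then an optional path whose last character
    // may not be a dot (so a sentence-ending full stop is left outside).
    $pattern = '#(ftp|https?)://[_A-Za-z\d\-]+(\.[_A-Za-z\d\-]+)+'
             . '([_A-Za-z\d\-./%?=]*[_A-Za-z\d\-/%?=])?#';

    // $0 is the whole match, so the URL does not need reassembling.
    return preg_replace($pattern, '<a href="$0">$0</a>', $safe);
}

echo linkify_escaped('<script>alert(1)</script> see http://www.example.com/a?b=1.'), "\n";
```

Because the character class excludes quotes, angle brackets, and ampersands, the matched URL cannot break out of the href attribute; the escaping step covers everything outside the match.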

  5. #5
    HEXUS.net Webmaster
    Sorry, I was in a rush when I posted. You do need $1, $2, etc., as the regexp is broken down into components. I'm not sure why you got that error with my regexp string, as it's the one I use. Try wrapping it in double quotes instead of single quotes and see if that makes a difference.

    Not sure why hexeh thinks there are security dangers; the danger would be in not attempting to parse it.

  6. #6
    Ah, Mrs. Peel! mike_w's Avatar
    Hmm, double quotes don't make a difference. I also discovered that removing just one of the backslashes at the end (rather than two) also stops the error message. That's on PHP 4.4.2.

    Just to check I understand how this is working:

    (ftp|http):\/\/ - pretty obvious
    ([_a-z\d\-]+(\.[_a-z\d\-]+)+) - gives the subdomain(s) and domain - only underscores, letters, hyphens - what's the \d? Any digit?
    (([_a-z\d\-\\\.\/]+[_a-z\d\-\\\/])+) - gives everything after the domain - allowing underscores, letters, digits (?), hyphens, backslashes, dots, and forward slashes. I presume the last bit is added minus the dot so that full stops aren't included when links are posted at the end of a sentence.

    Thanks again for the help!
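
The breakdown above can be checked by looking at what each capture group holds. In PCRE \d does mean any digit. The sketch below runs the pattern from post #2 against a made-up sentence; it swaps the delimiter to # and writes the backslash in each class as \\\\ (four in the PHP source, one literal backslash for PCRE), both of which are adjustments for illustration.

```php
<?php
$pattern = '#(ftp|http)://([_a-z\d\-]+(\.[_a-z\d\-]+)+)'
         . '(([_a-z\d\-\\\\./]+[_a-z\d\-\\\\/])+)*#';

// "example2" contains a digit, and the sentence ends in a full stop.
preg_match($pattern, 'see http://www.example2.com/docs/page.html.', $m);

echo $m[0], "\n"; // whole match, without the trailing full stop
echo $m[1], "\n"; // scheme
echo $m[2], "\n"; // host: subdomain(s) and domain
echo $m[3], "\n"; // last ".label" the repeated inner group matched
echo $m[4], "\n"; // everything after the host
```

Group 3 only keeps the final repetition, which is why the replacement in post #3 uses $1, $2 and $4 and skips $3. Using $0 in the replacement would avoid the reassembly altogether.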

  7. #7
    Comfortably Numb directhex's Avatar
    you'll be wanting percent signs too, to allow urls like http://en.wikipedia.org/wiki/Maratho...ambiguation%29
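
To illustrate the point about percent signs, the sketch below compares two cut-down patterns, one without % in the path classes and one with it. The URL is a hypothetical one built with rawurlencode, not the truncated Wikipedia link above, and the single optional path group is a simplification of the thread's pattern.

```php
<?php
// rawurlencode turns "(" and ")" into %28 and %29.
$url = 'http://www.example.com/wiki/' . rawurlencode('Foo_(bar)');

$without = '#(ftp|http)://[_a-z\d\-]+(\.[_a-z\d\-]+)+([_a-z\d\-./]*[_a-z\d\-/])?#i';
$with    = '#(ftp|http)://[_a-z\d\-]+(\.[_a-z\d\-]+)+([_a-z\d\-./%]*[_a-z\d\-/%])?#i';

preg_match($without, $url, $a);
preg_match($with, $url, $b);

echo $a[0], "\n"; // stops just before the first "%"
echo $b[0], "\n"; // the full URL
```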

  8. #8
    Ah, Mrs. Peel! mike_w's Avatar
    Right you are - that reminds me that there are question marks and equal signs as well. So far, I have this:

    $string = "/(ftp|http|https):\/\/([_A-Za-z\d\-]+(\.[_A-Za-z\d\-]+)+)(([_A-Za-z\d\-\\\.\/\%\?\=]*[\\_A-Za-z\d\-\/\%\?\=])+)*/ ";

    That's as before with a few extra characters added, and with a + changed to a * (so that URLs such as http://www.example.com/ are parsed fully).
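
The effect of changing + to * can be seen by running both variants against a URL that ends in a bare slash. This is a sketch with cut-down patterns (# delimiters, one optional path group, case-insensitive flag) rather than the exact string above, and the variable names are illustrative.

```php
<?php
// "+" demands at least two path characters, so a lone "/" is left out.
$plus = '#(ftp|https?)://[_a-z\d\-]+(\.[_a-z\d\-]+)+([_a-z\d\-./%?=]+[_a-z\d\-/%?=])?#i';

// "*" lets the path be a single character from the second class.
$star = '#(ftp|https?)://[_a-z\d\-]+(\.[_a-z\d\-]+)+([_a-z\d\-./%?=]*[_a-z\d\-/%?=])?#i';

$text = 'Visit http://www.example.com/ today';

preg_match($plus, $text, $p);
preg_match($star, $text, $s);

echo $p[0], "\n"; // trailing slash missing
echo $s[0], "\n"; // trailing slash included
```

Writing the path as one optional group instead of a repeated group inside a starred group matches the same strings here and gives the regex engine less to backtrack over on long inputs.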
