How do you write ASP code using regular expressions to isolate the links from a...

  • Thread starter: Bucco Bruce
...webpage? I know how to use the FileSystemObject to get the HTML from another page on my site. I want to create an ASP page that automatically lists all the links on that page and nothing more: just the URLs of the links. I know this is possible with regular expressions, but I'm not experienced enough to do it. Is anyone willing to help out and earn ten points? Best answer goes to the first person who gives me code that works. Thanks in advance.
I found a way to do it using REPLACE and SPLIT, though I'm sure a regular expression would be more efficient. I just didn't feel like sitting on my thumbs waiting for a reply. I will still give best answer to the first reply with code that works.
17 hours later, and still no answer. Is there no one on Answers who knows how to do this? I'm surprised, really.
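The REPLACE and SPLIT workaround mentioned above isn't shown in the post. As a rough sketch of how such an approach might look, here is one possibility that uses only Split and InStr. The function name LinksBySplit and the assumption that every link is written with a double-quoted href="..." are illustrative guesses, not the poster's actual code.

```vbscript
' Hypothetical helper (not from the thread): collect href values without RegExp.
' Assumes every link uses double quotes: href="..."
Function LinksBySplit(html)
    Dim parts, i, list, endPos
    list = ""
    ' Split on href=" ; vbTextCompare makes the delimiter case-insensitive
    parts = Split(html, "href=""", -1, vbTextCompare)
    ' parts(0) is the text before the first href, so start at 1
    For i = 1 To UBound(parts)
        ' The URL runs up to the next double quote
        endPos = InStr(parts(i), """")
        If endPos > 0 Then
            list = list & Left(parts(i), endPos - 1) & vbCrLf
        End If
    Next
    LinksBySplit = list
End Function
```

Note that this picks up href attributes on any tag (for example a stylesheet link element), not only on anchors, which is one reason a pattern tied to the a tag may be preferable.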
 
<%
' List every <a href="..."> match in sourcestring, along with its captured URL.
Dim regEx, Matches, sourcestring, results, z, zz
Set regEx = New RegExp
regEx.Global = True       ' find all matches, not just the first
regEx.IgnoreCase = True   ' also match <A HREF="...">
sourcestring = "your source string"
' SubMatches(0) captures the URL between the double quotes after href=
regEx.Pattern = "(?:<a[^>]*href="")([^""]*)(?:""[^>]*>.*?<\/a>)"
Set Matches = regEx.Execute(sourcestring)
For z = 0 To Matches.Count - 1
    results = results & "Matches(" & z & ") = " & Chr(34) & Server.HTMLEncode(Matches(z).Value) & Chr(34) & vbCrLf
    For zz = 0 To Matches(z).SubMatches.Count - 1
        results = results & "Matches(" & z & ").SubMatches(" & zz & ") = " & Chr(34) & Server.HTMLEncode(Matches(z).SubMatches(zz)) & Chr(34) & vbCrLf
    Next
Next
Response.Write "<pre>" & results & "</pre>"
%>

Please note that the pattern above may be cut short by the way Yahoo Answers displays code. If it looks truncated, try this version, which I've split into parts and concatenated:
regEx.Pattern = "(?:<a[^>]*href="")" & "([^""]*)(?:""[^>]" & "*>.*?<\/a>)"
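Since the question asks for only the URLs, the loop above could be trimmed to read just the first submatch of each match. The following is a minimal sketch under stated assumptions: href values are double-quoted, the helper name ExtractLinks and the sample string are hypothetical, and the pattern is shortened to stop at the closing quote because the rest of the tag isn't needed for this purpose.

```vbscript
<%
' Hypothetical helper (not from the thread): returns one URL per line.
' Assumes href values are wrapped in double quotes.
Function ExtractLinks(html)
    Dim re, m, list
    Set re = New RegExp
    re.Global = True        ' find every link, not just the first
    re.IgnoreCase = True    ' match <A HREF="..."> as well as <a href="...">
    ' Capture group 1 is the text between the quotes after href=
    re.Pattern = "<a\b[^>]*\bhref\s*=\s*""([^""]*)"""
    list = ""
    For Each m In re.Execute(html)
        list = list & m.SubMatches(0) & vbCrLf
    Next
    ExtractLinks = list
End Function

' Made-up sample input for illustration only
Dim sample
sample = "<p><a href=""/a.asp"">A</a> <A class=""x"" HREF=""http://example.com/"">B</A></p>"
Response.Write "<pre>" & Server.HTMLEncode(ExtractLinks(sample)) & "</pre>"
%>
```

Stopping at the closing quote also sidesteps a limit of the longer pattern: in VBScript's RegExp the dot does not match a newline, so the .*?<\/a> portion misses links whose text spans more than one line.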
 