Writing a Perl script to extract HTML?

Jemi Sulivan · Jan 28, 2009

Hello,

I would like some help extracting content from HTML source with Perl. Could anybody help me extract the text of an HTML page without the tags? For example, I would like to take the contents of an HTML table and put them into a neatly formatted table. I have had enough of very brief answers, so I would be very thankful for a detailed one. Please note that:
- I have a good knowledge of Perl scripting and of programming in general.
- I am also familiar with the structure of HTML files.

I hope only experts will answer this.

Thanks a lot

Bob J · Jan 28, 2009

This is a piece of one of my scripts. It fetches the page at $pageURL and prints it out line by line. In the foreach loop, you would replace the print with whatever you want to do with the page contents.

# Setup of script
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $pageURL = "http://url.com";    # URL of the page to fetch
my $ua = LWP::UserAgent->new;
$ua->proxy(['http'], 'http://my.proxy.com/');    # set proxy (remove this line if you don't use one)

my $req = HTTP::Request->new(GET => $pageURL);
my $res = $ua->request($req);
my @contents;

if ($res->is_success)
{
    # decoded_content undoes transfer encodings and decodes the charset
    @contents = split(/\n/, $res->decoded_content);
}
else
{
    die "Could not get content: " . $res->status_line . "\n";
}

# Process the page one line at a time
foreach my $line (@contents) {
    print "$line\n";
}
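The script above only fetches the page; it does not remove the tags. One way to get the table contents as plain text is to parse the HTML into a tree instead of matching it with regexes. Below is a minimal sketch, assuming the HTML::TreeBuilder module (from the HTML-Tree distribution on CPAN, not core Perl) is installed and that the tables are not nested inside one another. The helper names extract_tables and print_table are hypothetical, chosen for illustration only. In the fetching script you would pass $res->decoded_content instead of the sample string.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN module (HTML-Tree distribution), assumed installed

# Hypothetical helper: returns a list of tables; each table is an arrayref of
# rows, each row an arrayref of cell texts with all tags stripped.
# Assumes tables are not nested (nested rows would be picked up twice).
sub extract_tables {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @tables;
    for my $table ($tree->look_down(_tag => 'table')) {
        my @rows;
        for my $tr ($table->look_down(_tag => 'tr')) {
            # as_trimmed_text joins the text of the cell and collapses whitespace
            my @cells = map { $_->as_trimmed_text } $tr->look_down(_tag => qr/^t[dh]$/);
            push @rows, \@cells if @cells;
        }
        push @tables, \@rows;
    }
    $tree->delete;    # free the parse tree explicitly
    return @tables;
}

# Hypothetical helper: prints rows as left-aligned, pipe-separated columns
sub print_table {
    my ($rows) = @_;
    my @width;
    for my $row (@$rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }
    for my $row (@$rows) {
        print join(' | ', map { sprintf '%-*s', $width[$_], $row->[$_] } 0 .. $#$row), "\n";
    }
}

# Sample input standing in for the fetched page
my $html = <<'HTML';
<html><body>
<table>
  <tr><th>Name</th><th>Language</th></tr>
  <tr><td>Bob <b>J</b></td><td>Perl</td></tr>
  <tr><td>Jemi</td><td>HTML</td></tr>
</table>
</body></html>
HTML

for my $table (extract_tables($html)) {
    print_table($table);
    print "\n";
}
```

Using a real parser means tags inside a cell (such as the <b> above) are dropped while their text is kept. If the pages have nested tables or need column selection by header, the CPAN module HTML::TableExtract is another option worth looking at.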
