How to read in an HTML file in Python?

Shawn

New member
I am writing a program that needs to read in an HTML file and place all of the tags in the file into a list, while ignoring all of the text between the tags. Any ideas on how to go about doing this in Python?
I am reading the file from the local filesystem.
 
I'd advise you to learn regular expressions.

Opening the file and loading its contents into an object is simple. From then on, a regex will do most of the work for you if you care to learn it.

Don't ignore this: anyone coding in Python (or in many other languages, for that matter) should know regular expressions.
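
To make the regex suggestion concrete, here is a minimal sketch. The sample markup and the names TAG_RE and extract_tags are made up for illustration, and the pattern is deliberately rough: it assumes no '>' characters inside attribute values, and it will also pick up comments and doctype declarations.

```python
import re

# Hypothetical sample standing in for the contents of a local file.
# For a real file, something like this should work:
#     with open("page.html", encoding="utf-8") as fh:
#         html = fh.read()
html = "<html><body><p class='x'>Hello <b>world</b></p><br/></body></html>"

# Match everything from a '<' up to the next '>'.
# Rough on purpose: a '>' inside an attribute value would cut the match short.
TAG_RE = re.compile(r"<[^>]+>")


def extract_tags(text):
    """Return every tag found in text, in order, dropping the text in between."""
    return TAG_RE.findall(text)


tags = extract_tags(html)
print(tags)
```

The result is a plain list of tag strings, attributes included, which is what the original question asks for.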
 
import urllib.request
# Get a file-like object for the Python website's home page.
# (In Python 3, urlopen lives in urllib.request rather than urllib.)
f = urllib.request.urlopen("http://www.python.org")
# Read from the object, storing the page's contents (as bytes) in 's'.
s = f.read()
f.close()
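
Since the file in the question is on the local filesystem, the built-in open() can stand in for urlopen. As an alternative to a regex, the sketch below uses the standard library's html.parser to collect tag names. TagCollector and tags_from_file are hypothetical names, and the sample markup is made up; treat it as one possible approach, not the definitive one.

```python
import os
import tempfile
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects tag names in document order and ignores the text between them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, e.g. <p class="x"> gives tag == "p".
        self.tags.append(tag)

    def handle_endtag(self, tag):
        # Called for every closing tag; prefix with "/" to tell them apart.
        self.tags.append("/" + tag)


def tags_from_file(path):
    """Read a local HTML file and return the list of tag names it contains."""
    with open(path, encoding="utf-8") as fh:
        parser = TagCollector()
        parser.feed(fh.read())
        parser.close()
    return parser.tags


# Demo on a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("<html><body><p>Hello <b>world</b></p></body></html>")
demo_tags = tags_from_file(tmp.name)
os.remove(tmp.name)
print(demo_tags)
```

This version stores only tag names. If you need the attributes as well, handle_starttag receives them in attrs as a list of (name, value) pairs.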
 
 

