Reading content from a web page with urllib
The first step of the project is to find a way to connect to a web page and read its content.
Python has numerous modules for all kinds of tasks.
To connect to a web page on the internet and read its content, there are several modules that we could use:
. Beautiful Soup
We are going to use the urllib module, which will also let us look at some interesting Python concepts.
To read the content of a web page we could use the following program.
We are going to use my blog as an example; you could try any other website.
# -*- coding: iso-8859-15 -*-
This line declares the character encoding of the source file. If it is omitted, Python 2 will not accept accented characters anywhere in the file, not even in the comments.
With urllib.urlopen(url) we connect to the indicated address, and with .read() we read the entire content of the web page.
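As a rough sketch of the two calls just described (not necessarily the exact program from this post), the snippet below wraps them in a small function. The name read_page is made up for illustration. The try/except import is an assumption added so that the same code also runs on Python 3, where urlopen lives in urllib.request rather than directly in urllib.

```python
try:
    # Python 3: urlopen lives in the urllib.request submodule
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen is available directly in urllib, as used in this post
    from urllib import urlopen


def read_page(url):
    """Connect to url and return the raw content of the response.

    read_page is a hypothetical helper name, not part of urllib.
    """
    response = urlopen(url)       # open the connection to the address
    try:
        return response.read()    # read the whole body (bytes on Python 3)
    finally:
        response.close()          # always release the connection
```

For example, html = read_page("https://www.example.com") followed by print(html) should show that page's HTML. The address here is only a placeholder, and any reachable URL can be substituted.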
We will copy the previous code into our webscrap project and run it.
The HTML code of the page will then be displayed on the screen.
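Printing a whole page can flood the terminal. As an optional sketch, the hypothetical helper preview below decodes the byte string returned by .read() and keeps only its first characters. The utf-8 default is an assumption: a page may use a different encoding, in which case that name can be passed instead.

```python
def preview(raw, limit=200, encoding="utf-8"):
    """Return the first `limit` characters of downloaded page content.

    preview is a hypothetical helper; `raw` is the byte string
    returned by .read().
    """
    # errors="replace" substitutes undecodable bytes instead of raising
    text = raw.decode(encoding, errors="replace")
    return text[:limit]
```

With html holding the downloaded content, print(preview(html)) would show just the beginning of the page.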