Web Scraping Course for SEO with Python – Class 2

Reading content from a web page with urllib

The first step of the project is to find a way to connect to a web page and read its content.

In Python there are numerous modules for all kinds of tasks.


To connect to a web page on the internet and read its content, there are several modules we could use:

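. socket

. urllib

. Beautiful Soup (strictly speaking, it parses HTML that has already been downloaded rather than connecting to the page)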
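To get an idea of the work that urllib saves us, here is a minimal sketch using the standard socket module. It assumes a plain HTTP server on port 80 (no HTTPS, so it would not work as-is with the blog address used below) and sends an HTTP/1.0 request, so the server should close the connection when it is done and should not use chunked encoding. The helper names build_get_request, split_response and fetch_with_socket are hypothetical, chosen only for this example.

```python
import socket
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",  # lets servers that host several sites pick the right one
        "",
        "",  # the empty line that ends the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw HTTP response into its header block and its body (the HTML)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


def fetch_with_socket(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over a plain TCP socket and read until the server closes it."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, fetch_with_socket("example.com") would return the status line, the headers and the HTML in a single bytes object, which split_response then separates. urllib does all of this (and HTTPS, redirects and more) for us.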


We are going to use the urllib module, which will let us introduce some interesting Python concepts.

To read the content of a web page, we could use the following program.

We are going to use my blog as an example, but you could try any other website.


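# -*- coding: iso-8859-15 -*-
# Web Scraping Project for SEO with Python
# Class 2: Read content from a web page with urllib
# webscrap2
# Python 3 version (in Python 2 this was urllib.urlopen(url).read() and print html)
import urllib.request

url = "https://evginformatica.blogspot.com/"

# Connect, read the raw bytes and decode them as text (assuming the page is UTF-8)
html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
print(html)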

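The line

# -*- coding: iso-8859-15 -*-
declares the character encoding of the source file. In Python 2, without it we cannot use accented characters, not even in the comments. In Python 3 the default source encoding is UTF-8, so the line is optional, but if it is present it must match the encoding the file is actually saved in.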
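To see why the declared encoding has to match the way the file is saved, this short sketch compares the bytes that the same accented word produces in ISO-8859-15 and in UTF-8, and shows what happens when UTF-8 bytes are read with the wrong encoding. The word "café" is just an example.

```python
word = "café"

latin9 = word.encode("iso-8859-15")  # é is a single byte in ISO-8859-15
utf8 = word.encode("utf-8")          # é takes two bytes in UTF-8

print(latin9)  # b'caf\xe9'
print(utf8)    # b'caf\xc3\xa9'

# Decoding UTF-8 bytes as if they were ISO-8859-15 garbles the accent
garbled = utf8.decode("iso-8859-15")
print(garbled)  # cafÃ©

# ISO-8859-15 (Latin-9) also includes the euro sign, at byte 0xA4
print("€".encode("iso-8859-15"))  # b'\xa4'
```

The same mismatch appears with web pages: the bytes returned by .read() only become correct text when they are decoded with the encoding the page was actually written in.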


With urllib.request.urlopen(url) (urllib.urlopen(url) in Python 2) we connect to the indicated address, and with .read() we read the entire content of the web page.
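In a real scraper it is usually worth taking a bit more care than the one-line version. The following sketch uses urlopen as a context manager so the connection is closed, adds a timeout, and decodes the bytes with the charset announced by the server, falling back to UTF-8 when none is given. The function name fetch_html and the User-Agent value are illustrative assumptions, not part of urllib.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as text (a minimal sketch)."""
    # Some servers may reject urllib's default User-Agent; this value is only an example.
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (webscrap course example)"}
    )
    # The with block closes the connection even if reading fails
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()  # the page as bytes
        # Charset from the Content-Type header, or UTF-8 if the server gives none (assumption)
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

For example, html = fetch_html("https://evginformatica.blogspot.com/") returns the page as a str that can be printed or searched.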

We will copy the previous code into our webscrap project and run it.


You will see the HTML code of the page displayed on the screen.
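If you try other websites, the request will not always succeed: the address may be wrong, or the server may answer with an error status such as 403 or 404. In that case urlopen raises an exception instead of returning the HTML. Here is a minimal sketch of catching those errors; the name try_fetch and its (html, error) return shape are hypothetical choices for this example.

```python
import urllib.error
import urllib.request


def try_fetch(url: str, timeout: float = 10.0):
    """Return (html, None) on success or (None, error message) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace"), None
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (HTTPError must be caught before URLError)
        return None, f"HTTP error {err.code}: {err.reason}"
    except urllib.error.URLError as err:
        # No usable answer: unknown host, refused connection, unsupported URL scheme...
        return None, f"Could not open URL: {err.reason}"
    except OSError as err:
        # Other network failures, e.g. a timeout while reading the body
        return None, f"Network error: {err}"
```

The caller can then print the error and move on to the next URL instead of stopping the whole script.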

