Get internal links from a web page
Once we have read the content of the website, we will extract the internal links.
We will search for HTML anchor tags of the form <a href='…..'> that point to our own website.
To achieve this, we will use a regular expression that searches the text for strings that begin with the address of our website, followed by one or more characters that are not quotation marks, and that end with html.
The findall function returns every match of the search pattern in the HTML text.
With this, we print on the screen all the internal links used on the website.
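The steps above can be sketched as follows. This is a minimal illustration, assuming Python's re module is the one providing findall; the site address, the sample HTML and the variable names are hypothetical and stand in for the page content read earlier.

```python
import re

# Hypothetical site address and page content, used only for illustration.
site = "https://www.example.com/"
html = """
<a href="https://www.example.com/about.html">About</a>
<a href='https://www.example.com/blog/post%201.html'>Post</a>
<a href="https://other.example.org/page.html">External</a>
"""

# The site address (escaped so that "." is matched literally), followed by
# one or more characters that are not quotation marks, ending with "html".
pattern = re.escape(site) + r"[^\"']+html"

# findall returns a list with every non-overlapping match in the text.
internal_links = re.findall(pattern, html)

for link in internal_links:
    print(link)
```

The link to other.example.org is skipped because it does not begin with the site address. A regular expression is enough for a quick listing like this one; pages with relative links or unusual markup would likely need an HTML parser instead.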
In some cases, links may contain sequences such as %20. These are percent-encoded characters: %20 stands for a space, and accented letters are encoded in a similar way.
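If the readable form of such a link is needed, one possible approach is to decode it with unquote from Python's standard urllib.parse module. The link below is a made-up example, not one taken from the website.

```python
from urllib.parse import unquote

# Hypothetical link containing a percent-encoded space (%20)
# and a percent-encoded accented letter (%C3%A9 is "é" in UTF-8).
encoded_link = "https://www.example.com/my%20caf%C3%A9%20page.html"

# unquote replaces each %xx sequence with the character it encodes.
decoded_link = unquote(encoded_link)
print(decoded_link)
```

The decoded text is convenient for display, but the encoded form is usually the one to keep when requesting the page again.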