Scrape web content using Python BeautifulSoup

Hello to all,

Today I am sharing the Python BeautifulSoup package, which helps you scrape HTML content from the web.
You can install BeautifulSoup with pip using the following command.

$ sudo pip install beautifulsoup4

The following simple program scrapes all the links from an index.html file.

index.html

<html>
<head>
<title>Simple BeautifulSoup</title>
</head>
<body>
<a href="http://google.com">Click</a>
</body>
</html>

test.py

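from bs4 import BeautifulSoup

# Parse the local file with Python's built-in HTML parser
with open("./index.html") as html_file:
    soup = BeautifulSoup(html_file, "html.parser")

# Print the href of every anchor tag ('/' if the attribute is missing)
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))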

Save both files in the same folder, then run the command "python test.py". The program opens index.html, and BeautifulSoup finds all the a (anchor) tags in it. For each anchor tag it reads the href attribute and prints its value.
The output will look like this.

$ python test.py
http://google.com
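
To show what the second argument to get does, here is a small sketch that parses an in-memory string instead of a file. The sample markup and the names sample_html and results are made up for illustration; "html.parser" is the parser that ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one anchor with an href, one without
sample_html = """
<html>
<body>
<a href="http://google.com">Click</a>
<a name="top">No link here</a>
</body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Collect (link text, href) pairs; get() falls back to '/' when href is missing
results = [(a.get_text(strip=True), a.get('href', '/')) for a in soup.find_all('a')]

for text, href in results:
    print(text, href)
```

The anchor without an href gets the fallback value '/' instead of None.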

You can also scrape content from the web. The following is my code, which gets all the links from a website.

flipkart_href.py

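from bs4 import BeautifulSoup
from urllib.parse import urljoin
from urllib.request import urlopen  # Python 3 replacement for urllib2

url = "http://flipkart.com"
page = urlopen(url)
soup = BeautifulSoup(page, "html.parser")
links = soup.find_all("a")

for link in links:
    href = link.get('href')
    if href is None:  # skip anchors that have no href attribute
        continue
    # Resolve relative links against the site URL
    address = urljoin(url, href)
    print(address + '\n')

Explanation:

url = "http://flipkart.com" -> This line stores the URL.

page = urlopen(url) -> urlopen (from urllib.request, the Python 3 replacement for urllib2) opens the URL and stores the response in the page variable.

soup = BeautifulSoup(page, "html.parser") -> BeautifulSoup parses the content of the web page with Python's built-in HTML parser and stores the result in the soup variable.

links = soup.find_all("a") -> soup.find_all collects all "a" (anchor) tags into the links variable.

for link in links: -> This for loop assigns each item in links to the link variable in turn.

href = link.get('href') -> link.get('href') reads the href attribute of the anchor tag. It returns None when the tag has no href, so those tags are skipped.

address = urljoin(url, href) -> urljoin combines the site URL with the href, so relative links become full addresses, and stores the result in the address variable.

print(address + '\n') -> Finally, this prints the address followed by a blank line.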
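
The link-collecting steps can also be wrapped in a function that works on HTML already in memory, which makes them easy to try without network access. This is only a sketch in Python 3: extract_links, sample_html, and the example.com URL are hypothetical names chosen for illustration, and urljoin from the standard library is used to build full addresses rather than plain string concatenation.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every anchor in html that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for link in soup.find_all("a"):
        href = link.get("href")
        if href is None:
            continue  # anchor without an href attribute
        # Relative hrefs are resolved against base_url; absolute ones pass through
        found.append(urljoin(base_url, href))
    return found


# Hypothetical markup standing in for a downloaded page
sample_html = (
    '<a href="/offers">Offers</a>'
    '<a href="http://example.com/x">X</a>'
    '<a>none</a>'
)
links = extract_links(sample_html, "http://flipkart.com")
print(links)
```

Keeping the parsing separate from the download means the same function can be fed a saved file, a test string, or a live response.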

Similarly, we can scrape other content from the web using Python BeautifulSoup.
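
As a rough illustration of scraping something other than links, the sketch below pulls the page title and some paragraph text out of a made-up page. The markup, the "price" class, and the variable names are all assumptions for the example, not taken from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical page content used only for this example
sample_html = """
<html>
<head><title>Simple BeautifulSoup</title></head>
<body>
<h1>Offers</h1>
<p class="price">Rs. 499</p>
<p class="price">Rs. 999</p>
<p>Free delivery</p>
</body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Text of the <title> tag
page_title = soup.title.string

# Text of only the paragraphs that carry the "price" class
prices = [p.get_text() for p in soup.find_all("p", class_="price")]

# Text of every paragraph, regardless of class
paragraphs = [p.get_text() for p in soup.find_all("p")]

print(page_title)
print(prices)
print(paragraphs)
```

The same find_all call used for anchors works for any tag name, and the class_ argument narrows the match to tags with a given CSS class.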

If you have any queries about BeautifulSoup, please leave a comment.

With Regards,
S. Praveen
