Introduction

Let’s see how to extract MercadoLibre product data with BeautifulSoup and Python in a simple, clean way.

First, make sure you have Python 3 installed; if you don’t, install it before going any further. Then install BeautifulSoup with

pip3 install beautifulsoup4

We also need the requests, lxml, and soupsieve libraries to fetch the page, parse the markup, and use CSS selectors. Install them with

pip3 install requests soupsieve lxml

After the installation, open your editor and type

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

Then, visit the MercadoLibre search page and look at the data available there.

It will look like this:

[Screenshot: MercadoLibre search results page]

Now, let’s go back to the code we wrote and fetch the page while identifying ourselves as a browser, like this.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url='https://listado.mercadolibre.com.mx/phone#D[A:phone]'
response=requests.get(url,headers=headers)
print(response.content)

Then, save it as scrapeMercado.py.

Run it with

python3 scrapeMercado.py

and you will see the full HTML of the page.
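A whole page printed to the terminal is hard to read, so it can help to save the response to a file and open it in a browser or editor instead. Here is a minimal sketch, assuming a helper called save_html and a file name of our own choosing (neither is part of the original script):

```python
from pathlib import Path


def save_html(content: bytes, path: str = "mercado_search.html") -> Path:
    """Write the raw response body to disk and return the file path.

    `save_html` and the default file name are illustrative choices,
    not part of requests or BeautifulSoup.
    """
    target = Path(path)
    target.write_bytes(content)
    return target
```

In scrapeMercado.py, this could be called as save_html(response.content) right after the requests.get call.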

Now, let’s use CSS selectors to find the data we need. To do that, open the page in Chrome and open the Inspect tool. There we can see that the data for each product sits together inside an element with the class ‘.results-item’.

[Screenshot: Chrome Inspect tool showing a results-item element]

If you look closely, the product title is contained in an h2 element inside each results-item block, so we can get it like this.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# 'Accept-Encoding': 'identity' asks the server for an uncompressed response
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.11 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9', 'Accept-Encoding': 'identity'}
url = 'https://listado.mercadolibre.com.mx/phone#D[A:phone]'
response = requests.get(url, headers=headers)
#print(response.content)
soup = BeautifulSoup(response.content, 'lxml')

# Each product on the results page sits in its own .results-item block
for item in soup.select('.results-item'):
    try:
        print('---------------------------')
        print(item.select('h2')[0].get_text())
    except Exception as e:
        #raise e
        print('')

This selects every results-item block and loops over them, finding the h2 element in each one and printing its text.

So, when you run it, you will get

[Screenshot: program output listing the product titles]

Bingo!!! We have the product titles.

Now, using the same process, we can find the class names for the other information, such as product images, links, and prices.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# 'Accept-Encoding': 'identity' asks the server for an uncompressed response
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.11 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9', 'Accept-Encoding': 'identity'}
url = 'https://listado.mercadolibre.com.mx/phone#D[A:phone]'
response = requests.get(url, headers=headers)
#print(response.content)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.results-item'):
    try:
        print('---------------------------')
        # Title, link, price, and image URL of the current product
        print(item.select('h2')[0].get_text())
        print(item.select('h2 a')[0]['href'])
        print(item.select('.price__container .item__price')[0].get_text())
        print(item.select('.image-content a img')[0]['data-src'])
    except Exception as e:
        #raise e
        print('')

So, when we run it, it should print everything we need for each product, like this.

[Screenshot: program output listing title, link, price, and image URL for each product]
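Because all four lookups share one try block, a product that is missing any single field (say, the image) loses the rest of its output, and the error is silently swallowed. One possible alternative, sketched here on the assumption that the selectors above still match the page, is to look up each field separately and store None when it is absent. The helper names first_text, first_attr, item_to_record, and write_records are hypothetical; the only BeautifulSoup calls used are select_one, get_text, and get.

```python
import csv


def first_text(item, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = item.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def first_attr(item, selector, attr):
    """Return an attribute of the first match, or None if it is missing."""
    node = item.select_one(selector)
    return node.get(attr) if node is not None else None


def item_to_record(item):
    """Build one dict per product block, reusing the selectors from the script."""
    return {
        "title": first_text(item, "h2"),
        "link": first_attr(item, "h2 a", "href"),
        "price": first_text(item, ".price__container .item__price"),
        "image": first_attr(item, ".image-content a img", "data-src"),
    }


def write_records(records, path):
    """Save the records to a CSV file with one column per field."""
    fields = ["title", "link", "price", "image"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
```

With the soup object from the script, records = [item_to_record(item) for item in soup.select('.results-item')] followed by write_records(records, 'products.csv') would give a file that is easier to work with than printed output.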

If you want to use this in production or scale it up to thousands of links, you will quickly find your IP getting blocked by MercadoLibre. In that scenario, rotating IPs through a rotating proxy service becomes a necessity. You can use services such as Proxies API to route your calls through a pool of millions of local proxies.
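To make the idea of rotation concrete, here is a minimal sketch that cycles through a pool of proxies in round-robin order. The proxy addresses are placeholders and rotating_proxies is a hypothetical helper; the dictionary it yields has the shape accepted by the proxies argument of requests.get. A hosted proxy service would normally handle this for you.

```python
from itertools import cycle


def rotating_proxies(proxy_urls):
    """Yield requests-style proxy settings in round-robin order, without end."""
    for proxy in cycle(proxy_urls):
        yield {"http": proxy, "https": proxy}


# Placeholder addresses for illustration only; replace them with real proxies.
PROXY_POOL = [
    "http://proxy-1.example.com:8080",
    "http://proxy-2.example.com:8080",
]
```

Create the generator once with proxies = rotating_proxies(PROXY_POOL), then pass the next entry on every call, for example requests.get(url, headers=headers, proxies=next(proxies)).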

If you want to increase crawling speed or don’t want to set up your own infrastructure, you can use cloud-based crawlers to extract MercadoLibre product data at high speed from a pool of crawlers.

Still not sure whether this will meet your requirements? Then contact RetailGators and we will help you solve your problems.