In this tutorial blog, you will learn how to scrape coupon details from a Walmart store.

We’ll scrape the following data from every coupon listed in the store:

  • Discounted Pricing
  • Category
  • Brand
  • Activation Date
  • Expiry Date
  • Product Description
  • URL

The screenshot below shows how the data gets extracted.

[Screenshot: discount amount extracted from a coupon]

You could go further and extract coupons filtered by brand or other criteria, but for now we’ll keep it simple.

Finding the Data

Open a store URL in any browser of your choice:

https://www.walmart.com/store/5941/washington-dc

Click the Coupons option on the left-hand side, and you will see a list of all the coupons offered for Walmart store 5941.

[Screenshot: the coupons list for store 5941]

Right-click anywhere on the page and select – Inspect Element. The browser will open its developer toolbar and display the HTML content of the website in an organized form. Click the Network panel and clear all the existing requests from the request table.

Reload the page, then click on this request – ?pid=19251&nid=10&zid=vz89&storezip=20001

You will see this Request URL – https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001

Next, you need to figure out where the parameter values – pid, nid, zid, and storezip – come from. Search for these variables in the page source of https://www.walmart.com/store/5941/washington-dc

Here, you can see those values assigned to the JavaScript variable _wml.config. You can take the variables from the page source and build the coupons endpoint URL – https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001
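If you want to automate this step, here is a minimal sketch. It assumes the Requests package installed later in this tutorial and reuses the "couponsData" regex from the full scraper below:

import re
import json
import requests

# fetch the store page and extract the couponsData JSON assigned in _wml.config
store_page = requests.get("https://www.walmart.com/store/5941/washington-dc",
                          headers={"user-agent": "Mozilla/5.0"}).text
match = re.search(r'"couponsData":({.*?})', store_page)
if match:
    meta = json.loads(match.group(1))
    # build the coupons endpoint from the pid, nid and zid values
    endpoint = "https://www.coupons.com/coupons/?pid={0}&nid={1}&zid={2}".format(
        meta.get('pid'), meta.get('nid'), meta.get('zid'))
    print(endpoint)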

Retrieve the HTML from the coupons URL and you will see that the data can be extracted from the JavaScript variable APP_COUPONSINC. You can copy that data into a JSON parser to view it in a structured format.

You can see the data fields for every coupon, keyed by its coupon ID.
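As a quick way to confirm this, the sketch below fetches the coupons endpoint and prints the field names of one coupon pod; the APP_COUPONSINC regex and the contextData > gallery > podCache path are the same ones used in the full scraper further down:

import re
import json
import requests

coupons_url = "https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001"
page = requests.get(coupons_url, headers={"User-Agent": "Mozilla/5.0"}).text
match = re.search(r"APP_COUPONSINC\s?=\s?({.*});", page)
if match:
    coupon_data = json.loads(match.group(1))
    pods = coupon_data['contextData']['gallery']['podCache']
    first_id = next(iter(pods))  # pods are keyed by coupon ID
    print(first_id, sorted(pods[first_id].keys()))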

Building the Scraper

We’ll use Python 3 in this tutorial – the code will not work if you use Python 2.7. You need a computer with Python 3 and PIP installed.

Most UNIX-like operating systems, such as Mac OS and Linux, come with Python pre-installed. However, not every Linux distribution ships with Python 3 by default.

Let’s check the Python version. Open the terminal (on Mac OS and Linux) or Command Prompt (on Windows) and type

python --version

and press Enter. If the output looks like Python 3.x.x, you already have Python 3. If it says Python 2.x.x, you are running Python 2. If you get an error, Python is most likely not installed at all. If Python 3 is not installed, install it first.
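If you prefer checking from inside the interpreter instead, this snippet prints the same information:

import sys

# for this tutorial, this should report a 3.x version
print(sys.version)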

Installing Python 3 and Pip

Linux users can follow this guide to install Python 3 – http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/

Installing Packages
  • Python Requests, for making requests and downloading the HTML content of pages (http://docs.python-requests.org/en/master/user/install/).
  • Python LXML, for parsing the HTML tree structure using XPaths (installation instructions here – http://lxml.de/installation.html).
  • UnicodeCSV, for handling Unicode characters in the output file; install it with pip install unicodecsv. All three can be installed together, as shown below.
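Assuming pip is set up for your Python 3 installation (on some systems the command is pip3), a single command installs everything:

pip install requests lxml unicodecsv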
The Code
from lxml import html  # listed in the requirements above; this script works mainly with embedded JSON
import csv
import requests
import re
import json
import argparse
import traceback


def parse(store_id):
    """
    Function to retrieve coupons in a particular walmart store
    :param store_id: walmart store id, you can get this id from the output of walmart store location script
    """
    # sending request to get coupon related meta details
    url = "https://www.walmart.com/store/%s/coupons" % store_id
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7",
        "referer": "https://www.walmart.com/store/finder",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"
    }
    # adding retry
    for retry in range(5):
        try:
            response = requests.get(url, headers=headers)
            raw_coupon_url_details = re.findall(r'"couponsData":({.*?})', response.text)
            if raw_coupon_url_details:
                coupons_details_url_info_dict = json.loads(raw_coupon_url_details[0])
                # these variables are used to create the coupon page url
                pid = coupons_details_url_info_dict.get('pid')
                nid = coupons_details_url_info_dict.get('nid')
                zid = coupons_details_url_info_dict.get('zid')
                # coupon details are rendered from the following url, e.g.
                # https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001
                coupons_details_url = "https://www.coupons.com/coupons/?pid={0}&nid={1}&zid={2}".format(pid, nid, zid)
                print("retrieving coupon page")
                coupon_headers = {
                    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
                    "Accept-Encoding": "gzip, deflate, br",
                    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7",
                    "Host": "www.coupons.com",
                    "Upgrade-Insecure-Requests": "1",
                    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"
                }
                response = requests.get(coupons_details_url, headers=coupon_headers)
                # the coupon data is embedded in the page as a javascript variable
                coupon_raw_json = re.findall(r"APP_COUPONSINC\s?=\s?({.*});", response.text)
                print("processing coupons data")
                if coupon_raw_json:
                    data = []
                    coupon_json_data = json.loads(coupon_raw_json[0])
                    coupons = coupon_json_data.get('contextData').get('gallery').get('podCache')
                    for coupon in coupons:
                        price = coupons[coupon].get('summary')
                        product_brand = coupons[coupon].get('brand')
                        details = coupons[coupon].get('details')
                        # the original code referenced 'activated' without defining it;
                        # assuming the pod exposes the activation date under this key
                        activated = coupons[coupon].get('activated', '')
                        expiration = coupons[coupon].get('expiration')
                        category_1 = coupons[coupon].get('catdesc1', '')
                        category_2 = coupons[coupon].get('catdesc2', '')
                        category_3 = coupons[coupon].get('catdesc3', '')
                        category = ' > '.join([category_1, category_2, category_3])
                        walmart_data = {
                            "offer": price,
                            "brand": product_brand,
                            "description": details,
                            "category": category,
                            "activated_date": activated,
                            "expiration_date": expiration,
                            "url": coupons_details_url
                        }
                        data.append(walmart_data)
                    return data
        except Exception:
            print(traceback.format_exc())
    return []


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('store_id', help='walmart store id')
    args = argparser.parse_args()
    store_id = args.store_id
    scraped_data = parse(store_id)
    if scraped_data:
        print("Writing scraped data to %s_coupons.csv" % store_id)
        with open('%s_coupons.csv' % store_id, 'w') as csvfile:
            fieldnames = ["offer", "brand", "description", "category", "activated_date", "expiration_date", "url"]
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
            writer.writeheader()
            for data in scraped_data:
                writer.writerow(data)
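To run the scraper, save the code above to a file (the name is up to you – walmart_coupons.py is used here as a placeholder) and pass a store id as the command-line argument:

python walmart_coupons.py 5941

If coupons are found, the script writes them to 5941_coupons.csv in the same folder.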