In this tutorial blog, you will learn how to scrape coupon details from a Walmart store.

We’ll scrape the following data from every coupon listed in the store:

  • Discounted Pricing
  • Category
  • Brand
  • Activation Date
  • Expiry Date
  • Product Description
  • URL

The screenshot below shows how the data gets extracted.

[Screenshot: discount amount extracted from a coupon]

You could go further and extract coupons filtered by brand or other criteria, but for now we’ll keep it simple.

Finding the Data

Open a store URL in any browser of your choice:

https://www.walmart.com/store/5941/washington-dc

Click the Coupons option on the left-hand side, and you will see a list of all the coupons offered for Walmart store 5941.

[Screenshot: the coupons list for store 5941]

Right-click anywhere on the page and select – Inspect Element. The browser will open its developer toolbar and display the HTML content of the website in an organized form. Click the Network panel and clear all the existing requests from the request table.

Reload the page, then click on this request – ?pid=19251&nid=10&zid=vz89&storezip=20001

You will see this Request URL – https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001

Next, you need to figure out where the parameter values – pid, nid, zid, and storezip – come from. Search for these variables in the page source of https://www.walmart.com/store/5941/washington-dc

Here, you can see those values assigned to the JavaScript variable _wml.config. You can take the variables from the page source and build the coupons endpoint URL – https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001
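If you want to automate this step, here is a minimal sketch. It assumes the Requests package installed later in this tutorial and reuses the "couponsData" regex from the full scraper below:

import re
import json
import requests

# fetch the store page and extract the couponsData JSON assigned in _wml.config
store_page = requests.get("https://www.walmart.com/store/5941/washington-dc",
                          headers={"user-agent": "Mozilla/5.0"}).text
match = re.search(r'"couponsData":({.*?})', store_page)
if match:
    meta = json.loads(match.group(1))
    # build the coupons endpoint from the pid, nid and zid values
    endpoint = "https://www.coupons.com/coupons/?pid={0}&nid={1}&zid={2}".format(
        meta.get('pid'), meta.get('nid'), meta.get('zid'))
    print(endpoint)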

Retrieve the HTML from the coupons URL and you will see that the data can be extracted from the JavaScript variable APP_COUPONSINC. You can copy that data into a JSON parser to view it in a structured format.

You can see the data fields for every coupon, keyed by its coupon ID.
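As a quick way to confirm this, the sketch below fetches the coupons endpoint and prints the field names of one coupon pod; the APP_COUPONSINC regex and the contextData > gallery > podCache path are the same ones used in the full scraper further down:

import re
import json
import requests

coupons_url = "https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001"
page = requests.get(coupons_url, headers={"User-Agent": "Mozilla/5.0"}).text
match = re.search(r"APP_COUPONSINC\s?=\s?({.*});", page)
if match:
    coupon_data = json.loads(match.group(1))
    pods = coupon_data['contextData']['gallery']['podCache']
    first_id = next(iter(pods))  # pods are keyed by coupon ID
    print(first_id, sorted(pods[first_id].keys()))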

Building the Scraper

We’ll use Python 3 in this tutorial – the code will not work if you use Python 2.7. You need a computer with Python 3 and PIP installed.

Most UNIX-like operating systems, such as Mac OS and Linux, come with Python pre-installed. However, not every Linux distribution ships with Python 3 by default.

Let’s check the Python version. Open the terminal (on Mac OS and Linux) or Command Prompt (on Windows) and type

python --version

and press Enter. If the output looks like Python 3.x.x, you already have Python 3. If it says Python 2.x.x, you are running Python 2. If you get an error, Python is most likely not installed at all. If Python 3 is not installed, install it first.
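If you prefer checking from inside the interpreter instead, this snippet prints the same information:

import sys

# for this tutorial, this should report a 3.x version
print(sys.version)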

Installing Python 3 and Pip

Linux users can follow this guide to install Python 3 – http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/

Installing Packages
  • Python Requests, for making requests and downloading the HTML content of pages (http://docs.python-requests.org/en/master/user/install/).
  • Python LXML, for parsing the HTML tree structure using XPaths (installation instructions here – http://lxml.de/installation.html).
  • UnicodeCSV, for handling Unicode characters in the output file; install it with pip install unicodecsv. All three can be installed together, as shown below.
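Assuming pip is set up for your Python 3 installation (on some systems the command is pip3), a single command installs everything:

pip install requests lxml unicodecsv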
The Code
from lxml import html  # listed in the requirements above; this script works mainly with embedded JSON
import csv
import requests
import re
import json
import argparse
import traceback


def parse(store_id):
    """
    Function to retrieve coupons in a particular walmart store
    :param store_id: walmart store id, you can get this id from the output of walmart store location script
    """
    # sending request to get coupon related meta details
    url = "https://www.walmart.com/store/%s/coupons" % store_id
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7",
        "referer": "https://www.walmart.com/store/finder",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"
    }
    # adding retry
    for retry in range(5):
        try:
            response = requests.get(url, headers=headers)
            raw_coupon_url_details = re.findall(r'"couponsData":({.*?})', response.text)
            if raw_coupon_url_details:
                coupons_details_url_info_dict = json.loads(raw_coupon_url_details[0])
                # these variables are used to create the coupon page url
                pid = coupons_details_url_info_dict.get('pid')
                nid = coupons_details_url_info_dict.get('nid')
                zid = coupons_details_url_info_dict.get('zid')
                # coupon details are rendered from the following url, e.g.
                # https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001
                coupons_details_url = "https://www.coupons.com/coupons/?pid={0}&nid={1}&zid={2}".format(pid, nid, zid)
                print("retrieving coupon page")
                coupon_headers = {
                    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
                    "Accept-Encoding": "gzip, deflate, br",
                    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7",
                    "Host": "www.coupons.com",
                    "Upgrade-Insecure-Requests": "1",
                    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"
                }
                response = requests.get(coupons_details_url, headers=coupon_headers)
                # the coupon data is embedded in the page as a javascript variable
                coupon_raw_json = re.findall(r"APP_COUPONSINC\s?=\s?({.*});", response.text)
                print("processing coupons data")
                if coupon_raw_json:
                    data = []
                    coupon_json_data = json.loads(coupon_raw_json[0])
                    coupons = coupon_json_data.get('contextData').get('gallery').get('podCache')
                    for coupon in coupons:
                        price = coupons[coupon].get('summary')
                        product_brand = coupons[coupon].get('brand')
                        details = coupons[coupon].get('details')
                        # the original code referenced 'activated' without defining it;
                        # assuming the pod exposes the activation date under this key
                        activated = coupons[coupon].get('activated', '')
                        expiration = coupons[coupon].get('expiration')
                        category_1 = coupons[coupon].get('catdesc1', '')
                        category_2 = coupons[coupon].get('catdesc2', '')
                        category_3 = coupons[coupon].get('catdesc3', '')
                        category = ' > '.join([category_1, category_2, category_3])
                        walmart_data = {
                            "offer": price,
                            "brand": product_brand,
                            "description": details,
                            "category": category,
                            "activated_date": activated,
                            "expiration_date": expiration,
                            "url": coupons_details_url
                        }
                        data.append(walmart_data)
                    return data
        except Exception:
            print(traceback.format_exc())
    return []


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('store_id', help='walmart store id')
    args = argparser.parse_args()
    store_id = args.store_id
    scraped_data = parse(store_id)
    if scraped_data:
        print("Writing scraped data to %s_coupons.csv" % store_id)
        with open('%s_coupons.csv' % store_id, 'w') as csvfile:
            fieldnames = ["offer", "brand", "description", "category", "activated_date", "expiration_date", "url"]
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
            writer.writeheader()
            for data in scraped_data:
                writer.writerow(data)
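To run the scraper, save the code above to a file (the name is up to you – walmart_coupons.py is used here as a placeholder) and pass a store id as the command-line argument:

python walmart_coupons.py 5941

If coupons are found, the script writes them to 5941_coupons.csv in the same folder.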