Introduction

In this blog, we will discuss how to build a web scraper that gets the latest delivery status and prices for liquor from Total Wine and other local liquor stores.

At RetailGators, we can scrape the following data fields from Total Wine and other wine stores:

  • Wine name
  • Wine price
  • Size or quantity
  • Stock availability
  • Delivery availability
  • Product URL

We can save the data in CSV or Excel format.

Install the Packages Needed to Run the Total Wine and Other Web Store Scraper

We will use Python 3 and a few of its libraries. You can run the scraper in the cloud, on a VPS, or on a Raspberry Pi.

We will use these libraries:

  • Python Requests, for making requests and downloading the HTML content of the pages (http://docs.python-requests.org/en/master/user/install/).
  • Selectorlib, for extracting data from the downloaded pages using the YAML file we have created.

Install both of them with pip3:

pip3 install requests selectorlib
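
Before running the scraper, it can help to confirm that both libraries are importable. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and is not part of Requests or Selectorlib.

```python
import importlib.util

# Packages the scraper in this blog depends on.
REQUIRED = ["requests", "selectorlib"]

def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    # Suggest the pip3 command for only the packages that are absent.
    print("Missing packages, install with: pip3 install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

If the script reports a missing package, run the suggested pip3 command and try again.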

Python Code

Contact us for the full code used in this blog.

https://www.retailgators.com/

Create a file named products.py and paste the following Python code into it.

from selectorlib import Extractor
import requests
import csv

# Load the extraction rules from the YAML file
e = Extractor.from_yaml_file('selectors.yml')

def scrape(url):
    # Browser-like headers so the request resembles a normal page visit
    headers = {
        'authority': 'www.totalwine.com',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'referer': 'https://www.totalwine.com/beer/united-states/c/001304',
        'accept-language': 'en-US,en;q=0.9',
    }
    r = requests.get(url, headers=headers)
    # Extract the fields defined in selectors.yml from the downloaded HTML
    return e.extract(r.text, base_url=url)

# newline='' prevents blank rows between records on Windows
with open("urls.txt", 'r') as urllist, open('data.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=["Name", "Price", "Size", "InStock", "DeliveryAvailable", "URL"], quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for url in urllist.read().splitlines():
        data = scrape(url)
        # Skip pages where no product cards were matched
        if data and data['Products']:
            for r in data['Products']:
                writer.writerow(r)

The code does the following:

  • Reads the list of URLs from the file urls.txt (this file contains the URLs of Total Wine & More (TWM) product pages for Scotch, beer, wine, and so on).
  • Uses the selectorlib YAML file selectors.yml to identify the data on Total Wine pages (how to create this file is explained later in this blog).
  • Extracts the data.
  • Saves the data as a CSV spreadsheet named data.csv.
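
The last step in the list above, writing rows to data.csv, can be tried in isolation. The sketch below assumes the extractor returns a list of dictionaries keyed by the selector names; the function rows_to_csv and the sample rows are hypothetical and used only for illustration. It writes to an in-memory buffer instead of a file so that nothing is downloaded or saved.

```python
import csv
import io

# Same column order as the scraper's data.csv.
FIELDNAMES = ["Name", "Price", "Size", "InStock", "DeliveryAvailable", "URL"]

def rows_to_csv(products):
    """Serialize a list of product dicts to CSV text using the columns above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDNAMES,
        quoting=csv.QUOTE_ALL,
        restval="",             # fill columns that a row does not contain
        extrasaction="ignore",  # drop keys that are not CSV columns
    )
    writer.writeheader()
    for product in products or []:  # tolerate None when nothing matched
        writer.writerow(product)
    return buffer.getvalue()

# Hypothetical rows shaped like the extractor's "Products" list.
sample = [
    {"Name": "Example Red Blend", "Price": "$12.99", "Size": "750ml",
     "InStock": "In stock", "DeliveryAvailable": "Delivery Available",
     "URL": "https://www.example.com/wine/example-red-blend"},
    {"Name": "Example Lager", "Price": None, "Size": "6pk-12oz"},  # incomplete row
]

csv_text = rows_to_csv(sample)
print(csv_text)
```

Setting restval and extrasaction is a design choice for this sketch: incomplete rows still produce a full line in the spreadsheet, and unexpected keys do not raise an error.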

Create a YAML File Named selectors.yml

Products:
    css: article.productCard__2nWxIKmi
    multiple: true
    type: Text
    children:
        Price:
            css: span.price__1JvDDp_x
            type: Text
        Name:
            css: 'h2.title__2RoYeYuO a'
            type: Text
        Size:
            css: 'h2.title__2RoYeYuO span'
            type: Text
        InStock:
            css: 'p:nth-of-type(1) span.message__IRMIwVd1'
            type: Text
        URL:
            css: 'h2.title__2RoYeYuO a'
            type: Link
        DeliveryAvailable:
            css: 'p:nth-of-type(2) span.message__IRMIwVd1'
            type: Text
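
The selectors above return the price as text, so it arrives as a string rather than a number. If you want to compare or sort prices, a small cleaning step can help. The following is a minimal sketch under the assumption that price text looks like "$12.99"; parse_price and clean_product are hypothetical helper names and are not part of selectorlib.

```python
import re
from decimal import Decimal

def parse_price(text):
    """Return the first number found in `text` as a Decimal, or None if absent."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Remove thousands separators before converting.
    return Decimal(match.group(0).replace(",", ""))

def clean_product(product):
    """Collapse stray whitespace in text fields and convert Price to a Decimal."""
    cleaned = {
        key: " ".join(value.split()) if isinstance(value, str) else value
        for key, value in product.items()
    }
    cleaned["Price"] = parse_price(cleaned.get("Price"))
    return cleaned

# Hypothetical extracted row, used only for illustration.
row = {"Name": "  Example \n Wine ", "Price": "$9.49"}
print(clean_product(row))
```

Decimal is used instead of float here so that currency values are not affected by binary rounding.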