How to Extract Alibaba Product Data with Scrapy

Extracting information from e-commerce sites such as Alibaba, Amazon, and eBay offers enormous opportunities for competitor analysis, market research, and price comparison firms. As one of the foremost e-commerce companies, Alibaba has a huge product catalog that is valuable to anyone looking to extract data. Extracting Alibaba product data can be difficult if you do not have the right resources and team for the job. Outsourcing Alibaba extraction to a dedicated scraping service can help you meet all your requirements.

Installing Python 3 with Pip

We use Python 3 in this blog. To begin, you need a computer with Python 3 and pip installed.

Mac: - http://docs.python-guide.org/en/latest/starting/install3/osx/
Linux: - http://docs.python-guide.org/en/latest/starting/install3/linux/
Windows: - https://www.retailgators.com/how-to-install-python3-in-windows-10/

Install Packages

pip3 install scrapy selectorlib

Find out more about installing Scrapy here -

https://doc.scrapy.org/en/latest/intro/

Creating Scrapy Projects

Let us create a Scrapy project using the command given below.

scrapy startproject scrapy_alibaba

This creates a Scrapy project with the project name (scrapy_alibaba) as the folder name. The folder contains all the required files, laid out in the proper structure and pre-filled with basic content.

scrapy_alibaba/               # Project root directory
    scrapy.cfg                # Contains the configuration information to deploy the spider
    scrapy_alibaba/           # Project's python module
        __init__.py
        items.py              # Describes the definition of each item that we’re scraping
        middlewares.py        # Project middlewares
        pipelines.py          # Project pipelines file
        settings.py           # Project settings file
        spiders/              # All the spider code goes into this directory
            __init__.py

Creating a Spider

Scrapy has a built-in command named genspider that produces a basic spider template.

scrapy genspider (spidername) (website)

Let’s generate our spider

scrapy genspider alibaba_crawler alibaba.com

This creates the file spiders/alibaba_crawler.py with a starter template for crawling Alibaba.com

The generated code is shown here:

import scrapy


class AlibabaCrawlerSpider(scrapy.Spider):
    name = 'alibaba_crawler'
    allowed_domains = ['alibaba.com']
    start_urls = ['http://alibaba.com/']

    def parse(self, response):
        pass

Searching Keywords from the file

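Let us create a CSV file named keywords.csv. The spider code below expects it at ../resources/keywords.csv relative to the spider file.

For example, to search separately for headphones and earplugs, the file looks like this (the header must be keyword, because the code reads that column):

    keyword
    headphones
    earplugs

Now we use Python’s standard csv module to read the keyword file.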

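# Add these imports at the top of the spider file, next to "import scrapy".
import csv
import os

# Replace the generated parse() method inside AlibabaCrawlerSpider with this one.
# parse_listing is the callback defined in the complete spider code linked below.
def parse(self, response):
    """Function to read keywords from keywords file"""
    keywords_path = os.path.join(os.path.dirname(__file__), "../resources/keywords.csv")
    with open(keywords_path) as keywords_file:
        for keyword in csv.DictReader(keywords_file):
            search_text = keyword["keyword"]
            url = "https://www.alibaba.com/trade/search?fsb=y&IndexArea=product_en&CatId=&SearchText={0}&viewtype=G".format(search_text)
            yield scrapy.Request(url, callback=self.parse_listing, meta={"search_text": search_text})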
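
The keyword-to-URL step above can also be tried outside Scrapy. The following is a minimal sketch, not the original spider code: the helper name build_search_urls is hypothetical, and the use of urllib.parse.quote_plus to URL-encode keywords is an added assumption (the original formats the raw keyword, which may matter if a keyword contains spaces). It reads the same one-column CSV and fills in the same search URL template.

```python
import csv
import io
from urllib.parse import quote_plus

# Search URL template taken from the spider code in this post.
SEARCH_URL = (
    "https://www.alibaba.com/trade/search"
    "?fsb=y&IndexArea=product_en&CatId=&SearchText={0}&viewtype=G"
)


def build_search_urls(csv_file):
    """Hypothetical helper: return one search URL per keyword row."""
    urls = []
    for row in csv.DictReader(csv_file):
        search_text = row["keyword"].strip()
        if not search_text:
            continue  # skip rows with an empty keyword
        # quote_plus is an assumption added here, not part of the original code.
        urls.append(SEARCH_URL.format(quote_plus(search_text)))
    return urls


# In-memory stand-in for keywords.csv so the sketch runs on its own.
sample_csv = io.StringIO("keyword\nheadphones\nearplugs\nwireless headphones\n")
search_urls = build_search_urls(sample_csv)
for url in search_urls:
    print(url)
```

In a spider, each of these URLs would be passed to scrapy.Request in the same way as in the parse method above.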

A Complete Scrapy Spider’s Code

You can see the whole code at https://contactus/retailgators/alibaba-scraper

The spider, called alibaba_crawler, can be found at

https://contactus/retailgators/alibaba-scraper/blob/master/scrapy_alibaba/spiders/alibaba_crawler.py

Let’s run this scraper with

scrapy crawl alibaba_crawler

While it runs, you may see a message like this in the log:

DEBUG: Forbidden by robots.txt:

This happens because Alibaba’s robots.txt disallows crawling URLs that match /trade. You can verify this by visiting the robots.txt file, located at https://www.alibaba.com/robots.txt
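
Projects generated by startproject obey robots.txt by default (the ROBOTSTXT_OBEY setting in settings.py), which is why such requests are skipped. The check can also be reproduced with Python's standard urllib.robotparser module. This is a minimal sketch: the sample rules are a hypothetical stand-in based on the /trade rule described above, and the real file may differ or change over time. The helper name is_allowed is also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample rules for illustration only; the real file at
# https://www.alibaba.com/robots.txt may contain different rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /trade
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_url = "https://www.alibaba.com/trade/search?SearchText=headphones"
home_url = "https://www.alibaba.com/"

print("search page allowed:", is_allowed(SAMPLE_ROBOTS_TXT, search_url))
print("home page allowed:", is_allowed(SAMPLE_ROBOTS_TXT, home_url))
```

With these sample rules, the /trade search URL is reported as disallowed and the home page as allowed.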

Export Product Data in CSV & JSON using Scrapy

Scrapy offers built-in JSON and CSV output formats.

scrapy crawl (spidername) -o output_filename.csv -t csv

scrapy crawl (spidername) -o output_filename.json -t json

CSV output:

scrapy crawl alibaba_crawler -o alibaba.csv -t csv

JSON Output:

scrapy crawl alibaba_crawler -o alibaba.json -t json
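
Once an export finishes, the output can be read back with Python's standard library. The following is a minimal sketch: the column names (name, price) and the sample row are made up for illustration, since the actual columns depend on the fields the spider yields. It relies only on the general shape of the two formats: a CSV file with a header row, and a JSON file holding an array of objects.

```python
import csv
import io
import json

# Made-up sample content standing in for alibaba.csv and alibaba.json.
# The field names are hypothetical; real output uses the spider's item fields.
sample_csv = "name,price\nSample headphones,US $1.00-$2.00\n"
sample_json = '[{"name": "Sample headphones", "price": "US $1.00-$2.00"}]'


def load_csv_items(csv_text):
    """Parse CSV export text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def load_json_items(json_text):
    """Parse JSON export text (an array of objects) into a list of dicts."""
    return json.loads(json_text)


csv_items = load_csv_items(sample_csv)
json_items = load_json_items(sample_json)
print(csv_items)
print(json_items)
```

To read real export files, the same functions can be given the text of alibaba.csv or alibaba.json after opening them with open().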

List of Data Fields

At RetailGators, we extract the following data fields as part of our Alibaba Web Data Scraping Services:

Name of Product
Product Price Range
Images of Product
Product Links
Minimum Product Order
Name of Seller
Seller Reply Rate
Number of sellers on Alibaba
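
The fields listed above could be modelled as a single item type. The following is a minimal sketch using a Python dataclass, which recent Scrapy versions accept as an item type alongside dicts and scrapy.Item. The class name and attribute names are hypothetical and are not taken from the original spider.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class AlibabaProduct:
    """Hypothetical item mirroring the data fields listed in this post."""

    name: str                                             # Name of Product
    price_range: Optional[str] = None                     # Product Price Range
    image_urls: List[str] = field(default_factory=list)   # Images of Product
    link: Optional[str] = None                            # Product Links
    minimum_order: Optional[str] = None                   # Minimum Product Order
    seller_name: Optional[str] = None                     # Name of Seller
    seller_reply_rate: Optional[str] = None               # Seller Reply Rate
    sellers_on_alibaba: Optional[str] = None              # Number of sellers on Alibaba


# Placeholder values only, to show how an item is built and turned into a dict.
product = AlibabaProduct(name="Sample product", link="https://www.alibaba.com/")
product_dict = asdict(product)
print(product_dict)
```

Fields that a listing page does not provide simply stay at their default values, which keeps the exported rows consistent.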

Key Features of Alibaba Web Scraping Solutions

RetailGators provides fully customized eCommerce data scraping that can handle the data requirements of large companies. Quality and stability are among the most important factors where data crawling is concerned. Many DIY tools are available for scraping with in-house resources.

Some of the key advantages are:

Fully Customized
Multiple Data Delivery Options
Fully Managed Solutions
High-Quality & Well-Structured Data

What Can We Scrape from Alibaba?

Website data can help a company fill intelligence gaps within the organization. Here are a few things you can do with data scraped from Alibaba.

Price Comparison Data
Cataloging Data
Analyses

Why RetailGators?

If you are looking for the best Alibaba Web Data Scraping Services, then you can contact RetailGators for all your queries.
