PySpider Crawler Framework Tutorial: A Powerful Web Crawler System with a Powerful WebUI, Written by a Chinese Developer

Pyspider is a powerful and easy-to-use web crawler: an HTML-based crawler framework that can be customized online.

A Powerful Spider (Web Crawler) System in Python.

Write scripts in Python
Powerful WebUI with script editor, task monitor, project manager and result viewer
MySQL, MongoDB, Redis, SQLite, Elasticsearch; PostgreSQL with SQLAlchemy as database backends
RabbitMQ, Beanstalk, Redis and Kombu as message queues
Task priority, retry, periodic tasks, recrawl by age, etc.
Distributed architecture, crawling of JavaScript pages, Python 2 & 3, etc.
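The task features listed above (priority, retry, recrawl by age) can be made concrete with a small standard-library sketch. It is not pyspider's scheduler: Task, MiniScheduler, the default of three retries and the injectable clock are hypothetical choices for illustration, and the sketch assumes a larger priority value should be scheduled first.

```python
import heapq
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    url: str
    priority: int = 0       # larger value = scheduled earlier (assumption)
    age: float = 0.0        # seconds a successful crawl stays fresh
    retries_left: int = 3   # hypothetical retry budget after the first attempt


class MiniScheduler:
    """Hypothetical toy scheduler: priority queue, retry, and recrawl by age."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._heap: List[Tuple[int, int, Task]] = []
        self._counter = itertools.count()  # tie-breaker: FIFO within one priority
        self._last_crawled: Dict[str, float] = {}
        self._clock = clock

    def _push(self, task: Task) -> None:
        # heapq is a min-heap, so negate the priority to pop the largest first
        heapq.heappush(self._heap, (-task.priority, next(self._counter), task))

    def submit(self, task: Task) -> bool:
        """Queue a task unless its last successful crawl is younger than task.age."""
        last = self._last_crawled.get(task.url)
        if last is not None and self._clock() - last < task.age:
            return False  # still fresh, so skip the recrawl
        self._push(task)
        return True

    def run(self, fetch: Callable[[str], bool]) -> List[str]:
        """Drain the queue; fetch(url) returns True on success. Returns attempt order."""
        attempts: List[str] = []
        while self._heap:
            _, _, task = heapq.heappop(self._heap)
            attempts.append(task.url)
            if fetch(task.url):
                self._last_crawled[task.url] = self._clock()
            elif task.retries_left > 0:
                task.retries_left -= 1
                self._push(task)  # re-queue the failed task at the same priority
        return attempts


if __name__ == "__main__":
    sched = MiniScheduler()
    sched.submit(Task("http://a.example/", priority=1))
    sched.submit(Task("http://b.example/", priority=5))
    print(sched.run(lambda url: True))  # ['http://b.example/', 'http://a.example/']
    # Recrawling b with age=600 is refused: it was crawled moments ago
    print(sched.submit(Task("http://b.example/", age=600)))  # False
```

Keying the heap on the negated priority plus an insertion counter pops the highest-priority task first and keeps submission order among equal priorities; injecting the clock lets the age check be tested without sleeping.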

Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases

Sample Code

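from pyspider.libs.base_handler import *  # provides BaseHandler, every, config


class Handler(BaseHandler):
    crawl_config = {
        # project-wide default options for self.crawl() go here
    }

    # Run on_start once every 24 hours (24 * 60 minutes)
    @every(minutes=24 * 60)
    def on_start(self):
        # Seed the crawl with the start page; index_page parses the response
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Age is in seconds: a crawled index page counts as fresh for 10 days
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # response.doc is a PyQuery object; follow every link whose href starts with "http"
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict becomes this page's result
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }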
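In the sample, index_page follows every link matched by the CSS selector a[href^="http"], that is, anchors whose href attribute begins with "http". The sketch below reproduces only that selection with the standard library's html.parser, so the filter can be tried without pyspider or PyQuery installed. The names extract_http_links and _HttpLinkParser are hypothetical, and this is not how pyspider itself parses pages.

```python
from html.parser import HTMLParser
from typing import List


class _HttpLinkParser(HTMLParser):
    """Collects href values of <a> tags whose href starts with "http"."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Mirrors the CSS attribute selector a[href^="http"]
            if name == "href" and value is not None and value.startswith("http"):
                self.links.append(value)


def extract_http_links(html: str) -> List[str]:
    """Hypothetical helper: return the http(s) links in document order."""
    parser = _HttpLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="https://scrapy.org/doc">Docs</a> <a href="/about">About</a>'
    print(extract_http_links(page))  # ['https://scrapy.org/doc']
```

HTMLParser lowercases tag and attribute names, so <A HREF="..."> is matched as well. The sketch does not resolve relative URLs against the page address, so a link such as /about is simply skipped.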

Installation

pip install pyspider
Run the command pyspider, then visit http://localhost:5000/

Quickstart: http://docs.pyspider.org/en/latest/Quickstart/

Contribute

Use it
Open an issue, send a PR
User group
Chinese-language Q&A

TODO

v0.4.0

Local mode: load scripts from files
Works as a framework (all components running in one process, no threads)
Redis
Shell mode like scrapy shell
A visual scraping interface like Portia

more

Edit scripts with Vim via WebDAV

License

Licensed under the Apache License, Version 2.0
