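Scrapy framework in detail: Portia, visual scraping for Scrapy
When it comes to scraping web data and page content, Scrapy is the framework that inevitably comes up. Pyspider, however, is much simpler and easier to use, precisely because it has a web UI. Portia gives Scrapy a visual UI of its own, which makes it considerably more powerful.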
Portia is a tool that lets you scrape websites visually, without any programming knowledge. With Portia you annotate a web page to identify the data you wish to extract, and, based on these annotations, Portia works out how to scrape data from similar pages.
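To make the annotation idea concrete, here is a minimal sketch, assuming an "annotation" is simply a field name mapped to the tag and CSS class of the element that holds its value on one sample page. The same mapping is then reused on other pages with a similar layout. This is only an illustration of the concept using Python's standard `html.parser`; it is not how Portia extracts data internally, which is more flexible. The field names, tags, classes and sample pages are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical annotations, standing in for what a user marks on one sample page:
# each field maps to the (tag, CSS class) of the element that holds its value.
ANNOTATIONS = {
    "title": ("h1", "product-title"),
    "price": ("span", "price"),
}


class AnnotationExtractor(HTMLParser):
    """Collects the text of the first element matching each annotation."""

    def __init__(self, annotations):
        super().__init__()
        self.annotations = annotations
        self.item = {}
        self._field = None  # field currently being captured, if any
        self._tag = None    # tag name of the element being captured
        self._depth = 0     # nesting depth of that tag name, so nested copies close correctly
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.annotations.items():
            if field not in self.item and tag == want_tag and want_class in classes:
                self._field, self._tag, self._depth, self._text = field, tag, 1, []
                break

    def handle_data(self, data):
        if self._field is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.item[self._field] = "".join(self._text).strip()
                self._field = None


def extract(html, annotations=ANNOTATIONS):
    """Apply the annotations to one page and return the extracted fields."""
    parser = AnnotationExtractor(annotations)
    parser.feed(html)
    parser.close()
    return parser.item


if __name__ == "__main__":
    # Two hypothetical pages that share a layout but differ in content and markup details.
    pages = [
        '<h1 class="product-title">Blue Mug</h1><p>Nice</p><span class="price">$9.99</span>',
        '<div><h1 class="product-title big">Red <b>Teapot</b></h1>'
        '<span class="price sale">$24.50</span></div>',
    ]
    for page in pages:
        print(extract(page))
    # {'title': 'Blue Mug', 'price': '$9.99'}
    # {'title': 'Red Teapot', 'price': '$24.50'}
```

A fixed tag-and-class rule like this breaks as soon as a similar page uses different markup, which is exactly the kind of variation a tool like Portia is meant to handle for you.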
Try it out
To try Portia for free without installing anything, sign up for an account at Scrapinghub and use our hosted version.
Running Portia
The easiest way to run Portia is using Vagrant.
Clone the repository:
git clone https://github.com/scrapinghub/portia
Then inside the Portia directory, run:
vagrant up
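If you want to script the clone and `vagrant up` steps above, the following is a hypothetical Python helper, assuming `git` and `vagrant` are already installed and on your PATH. The function names are illustrative and not part of Portia. By default it only prints the commands (a dry run); pass `--run` to actually execute them.

```python
"""Hypothetical helper scripting the README's setup steps:
clone the Portia repository, then run `vagrant up` inside it."""
import shutil
import subprocess
import sys

REPO_URL = "https://github.com/scrapinghub/portia"


def missing_tools(tools=("git", "vagrant")):
    """Return the required command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def setup_steps(dest="portia"):
    """Return (command, working_directory) pairs for the two setup steps."""
    return [
        (["git", "clone", REPO_URL, dest], None),  # clone into ./portia
        (["vagrant", "up"], dest),                 # then run vagrant inside it
    ]


def run_setup(dest="portia", dry_run=True):
    """Print each step; execute it too unless dry_run is True."""
    for cmd, cwd in setup_steps(dest):
        prefix = "[dry run] " if dry_run else ""
        print(prefix + " ".join(cmd) + (f"  (in {cwd}/)" if cwd else ""))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)  # stop on the first failure


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        run_setup(dry_run="--run" not in sys.argv)
```

Checking for the tools up front gives a clearer message than a failed `subprocess` call halfway through the setup.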
For more detailed instructions, and alternatives to using Vagrant, see the Installation docs.
Documentation
Documentation can be found here. The documentation's source files are in the docs directory.