Index
_ | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | V | W | X
_
__bool__() (scrapy.Selector method)
__init__()
(scrapy.core.scheduler.Scheduler method)
__len__() (scrapy.core.scheduler.Scheduler method)
A
accepts() (scrapy.extensions.feedexport.ItemFilter method)
adapt_response() (scrapy.spiders.XMLFeedSpider method)
add_css() (scrapy.loader.ItemLoader method)
add_jmes() (scrapy.loader.ItemLoader method)
add_to_list() (scrapy.settings.BaseSettings method)
add_value() (scrapy.loader.ItemLoader method)
add_xpath() (scrapy.loader.ItemLoader method)
ADDONS
setting
adjust_request_args() (scrapy.contracts.Contract method)
allow_offsite
reqmeta
allowed() (scrapy.robotstxt.RobotParser method)
allowed_domains (scrapy.Spider attribute)
ASYNCIO_EVENT_LOOP
setting
attrib (scrapy.Selector attribute)
(scrapy.selector.SelectorList attribute)
attributes (scrapy.http.JsonRequest attribute)
(scrapy.http.Response attribute)
(scrapy.http.TextResponse attribute)
(scrapy.Request attribute)
AUTOTHROTTLE_DEBUG
setting
autothrottle_dont_adjust_delay
reqmeta
AUTOTHROTTLE_ENABLED
setting
AUTOTHROTTLE_MAX_DELAY
setting
AUTOTHROTTLE_START_DELAY
setting
AUTOTHROTTLE_TARGET_CONCURRENCY
setting
AWS_ACCESS_KEY_ID
setting
AWS_ENDPOINT_URL
setting
AWS_REGION_NAME
setting
AWS_SECRET_ACCESS_KEY
setting
AWS_SESSION_TOKEN
setting
AWS_USE_SSL
setting
AWS_VERIFY
setting
B
BaseDupeFilter (class in scrapy.dupefilters)
BaseItemExporter (class in scrapy.exporters)
BaseScheduler (class in scrapy.core.scheduler)
BaseSettings (class in scrapy.settings)
BaseSpiderMiddleware (class in scrapy.spidermiddlewares.base)
bench
command
bindaddress
reqmeta
body (scrapy.http.Response attribute)
(scrapy.Request attribute)
BOT_NAME
setting
build_from_crawler() (in module scrapy.utils.misc)
bytes_received
signal
bytes_received() (in module scrapy.signals)
Bz2Plugin (class in scrapy.extensions.postprocessing)
C
CacheStorage (class in scrapy.extensions.httpcache)
callback (scrapy.Request attribute)
CallbackKeywordArgumentsContract (class in scrapy.contracts.default)
cb_kwargs (scrapy.http.Response attribute)
(scrapy.Request attribute)
certificate (scrapy.http.Response attribute)
check
command
clear_stats() (scrapy.statscollectors.StatsCollector method)
close()
(scrapy.core.scheduler.BaseScheduler method)
(scrapy.core.scheduler.Scheduler method)
close_spider()
(scrapy.extensions.httpcache.CacheStorage method)
(scrapy.statscollectors.StatsCollector method)
closed() (scrapy.Spider method)
CloseSpider
(class in scrapy.extensions.closespider)
CLOSESPIDER_ERRORCOUNT
setting
CLOSESPIDER_ITEMCOUNT
setting
CLOSESPIDER_PAGECOUNT
setting
CLOSESPIDER_PAGECOUNT_NO_ITEM
setting
CLOSESPIDER_TIMEOUT
setting
CLOSESPIDER_TIMEOUT_NO_ITEM
setting
command
bench
check
crawl
edit
fetch
genspider
list
parse
runspider
settings
shell
startproject
version
view
COMMANDS_MODULE
setting
COMPRESSION_ENABLED
setting
CONCURRENT_ITEMS
setting
CONCURRENT_REQUESTS
setting
CONCURRENT_REQUESTS_PER_DOMAIN
setting
CONCURRENT_REQUESTS_PER_IP
setting
configure_logging() (in module scrapy.utils.log)
connect() (scrapy.signalmanager.SignalManager method)
context (scrapy.loader.ItemLoader attribute)
Contract (class in scrapy.contracts)
ContractFail (class in scrapy.exceptions)
cookiejar
reqmeta
COOKIES_DEBUG
setting
COOKIES_ENABLED
setting
CookiesMiddleware (class in scrapy.downloadermiddlewares.cookies)
copy() (scrapy.http.Response method)
(scrapy.Item method)
(scrapy.Request method)
(scrapy.settings.BaseSettings method)
copy_to_dict() (scrapy.settings.BaseSettings method)
CoreStats (class in scrapy.extensions.corestats)
crawl
command
crawl() (scrapy.crawler.Crawler method)
(scrapy.crawler.CrawlerProcess method)
(scrapy.crawler.CrawlerRunner method)
crawled() (scrapy.logformatter.LogFormatter method)
Crawler (class in scrapy.crawler)
crawler (scrapy.Spider attribute)
CrawlerProcess (class in scrapy.crawler)
CrawlerRunner (class in scrapy.crawler)
crawlers (scrapy.crawler.CrawlerProcess attribute)
(scrapy.crawler.CrawlerRunner attribute)
CrawlSpider (class in scrapy.spiders)
create_crawler() (scrapy.crawler.CrawlerProcess method)
(scrapy.crawler.CrawlerRunner method)
css() (scrapy.http.TextResponse method)
(scrapy.Selector method)
(scrapy.selector.SelectorList method)
CSVFeedSpider (class in scrapy.spiders)
CsvItemExporter (class in scrapy.exporters)
csviter() (in module scrapy.utils.iterators)
curl_to_request_kwargs() (in module scrapy.utils.curl)
custom_settings (scrapy.Spider attribute)
D
DbmCacheStorage (class in scrapy.extensions.httpcache)
Debugger (class in scrapy.extensions.periodic_log)
deepcopy() (scrapy.Item method)
DEFAULT_DROPITEM_LOG_LEVEL
setting
default_input_processor (scrapy.loader.ItemLoader attribute)
DEFAULT_ITEM_CLASS
setting
default_item_class (scrapy.loader.ItemLoader attribute)
default_output_processor (scrapy.loader.ItemLoader attribute)
DEFAULT_REQUEST_HEADERS
setting
default_selector_class (scrapy.loader.ItemLoader attribute)
DefaultHeadersMiddleware (class in scrapy.downloadermiddlewares.defaultheaders)
DefaultReferrerPolicy (class in scrapy.spidermiddlewares.referer)
deferred_f_from_coro_f() (in module scrapy.utils.defer)
deferred_from_coro() (in module scrapy.utils.defer)
deferred_to_future() (in module scrapy.utils.defer)
delimiter (scrapy.spiders.CSVFeedSpider attribute)
DEPTH_LIMIT
setting
DEPTH_PRIORITY
setting
DEPTH_STATS_VERBOSE
setting
DepthMiddleware (class in scrapy.spidermiddlewares.depth)
disconnect() (scrapy.signalmanager.SignalManager method)
disconnect_all() (scrapy.signalmanager.SignalManager method)
DNS_RESOLVER
setting
DNS_TIMEOUT
setting
DNSCACHE_ENABLED
setting
DNSCACHE_SIZE
setting
dont_cache
reqmeta
dont_filter (scrapy.Request attribute)
dont_merge_cookies
reqmeta
dont_obey_robotstxt
reqmeta
dont_redirect
reqmeta
dont_retry
reqmeta
DontCloseSpider
DOWNLOAD_DELAY
setting
download_error() (scrapy.logformatter.LogFormatter method)
DOWNLOAD_FAIL_ON_DATALOSS
setting
download_fail_on_dataloss
reqmeta
DOWNLOAD_HANDLERS
setting
DOWNLOAD_HANDLERS_BASE
setting
download_latency
reqmeta
DOWNLOAD_MAXSIZE
setting
download_maxsize
reqmeta
DOWNLOAD_SLOTS
setting
DOWNLOAD_TIMEOUT
setting
download_timeout
reqmeta
DOWNLOAD_WARNSIZE
setting
download_warnsize
reqmeta
DOWNLOADER
setting
DOWNLOADER_CLIENT_TLS_CIPHERS
setting
DOWNLOADER_CLIENT_TLS_METHOD
setting
DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING
setting
DOWNLOADER_CLIENTCONTEXTFACTORY
setting
DOWNLOADER_HTTPCLIENTFACTORY
setting
DOWNLOADER_MIDDLEWARES
setting
DOWNLOADER_MIDDLEWARES_BASE
setting
DOWNLOADER_STATS
setting
DownloaderMiddleware (class in scrapy.downloadermiddlewares)
DownloaderStats (class in scrapy.downloadermiddlewares.stats)
DownloadTimeoutMiddleware (class in scrapy.downloadermiddlewares.downloadtimeout)
DropItem
dropped() (scrapy.logformatter.LogFormatter method)
DummyPolicy (class in scrapy.extensions.httpcache)
DummyStatsCollector (class in scrapy.statscollectors)
DUPEFILTER_CLASS
setting
DUPEFILTER_DEBUG
setting
E
edit
command
EDITOR
setting
encoding (scrapy.exporters.BaseItemExporter attribute)
(scrapy.http.TextResponse attribute)
engine (scrapy.crawler.Crawler attribute)
engine_started
signal
engine_started() (in module scrapy.signals)
engine_stopped
signal
engine_stopped() (in module scrapy.signals)
enqueue_request() (scrapy.core.scheduler.BaseScheduler method)
(scrapy.core.scheduler.Scheduler method)
errback (scrapy.Request attribute)
ExecutionEngine (class in scrapy.core.engine)
export_empty_fields (scrapy.exporters.BaseItemExporter attribute)
export_item() (scrapy.exporters.BaseItemExporter method)
EXTENSIONS
setting
extensions (scrapy.crawler.Crawler attribute)
EXTENSIONS_BASE
setting
extract_links() (scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor method)
F
FEED_EXPORT_BATCH_ITEM_COUNT
setting
FEED_EXPORT_ENCODING
setting
FEED_EXPORT_FIELDS
setting
FEED_EXPORT_INDENT
setting
feed_exporter_closed
signal
feed_exporter_closed() (in module scrapy.signals)
FEED_EXPORTERS
setting
FEED_EXPORTERS_BASE
setting
feed_slot_closed
signal
feed_slot_closed() (in module scrapy.signals)
FEED_STORAGE_FTP_ACTIVE
setting
FEED_STORAGE_GCS_ACL
setting
FEED_STORAGE_S3_ACL
setting
FEED_STORAGES
setting
FEED_STORAGES_BASE
setting
FEED_STORE_EMPTY
setting
FEED_TEMPDIR
setting
FEED_URI_PARAMS
setting
FEEDS
setting
fetch
command
Field (class in scrapy)
fields (scrapy.Item attribute)
fields_to_export (scrapy.exporters.BaseItemExporter attribute)
file_path() (scrapy.pipelines.files.FilesPipeline method)
(scrapy.pipelines.images.ImagesPipeline method)
FILES_EXPIRES
setting
FILES_RESULT_FIELD
setting
FILES_STORE
setting
FILES_STORE_GCS_ACL
setting
FILES_STORE_S3_ACL
setting
FILES_URLS_FIELD
setting
FilesPipeline (class in scrapy.pipelines.files)
FilesystemCacheStorage (class in scrapy.extensions.httpcache)
find_by_request() (scrapy.spiderloader.SpiderLoader method)
fingerprint()
(in module scrapy.utils.request)
finish_exporting() (scrapy.exporters.BaseItemExporter method)
flags (scrapy.http.Response attribute)
follow() (scrapy.http.Response method)
(scrapy.http.TextResponse method)
follow_all() (scrapy.http.Response method)
(scrapy.http.TextResponse method)
freeze() (scrapy.settings.BaseSettings method)
from_crawler()
(scrapy.core.scheduler.BaseScheduler class method)
(scrapy.core.scheduler.Scheduler class method)
(scrapy.robotstxt.RobotParser class method)
(scrapy.Spider method)
from_curl() (scrapy.Request class method)
from_response() (scrapy.FormRequest class method)
from_settings() (scrapy.spiderloader.SpiderLoader method)
frozencopy() (scrapy.settings.BaseSettings method)
FTP_PASSIVE_MODE
setting
FTP_PASSWORD
setting
ftp_password
reqmeta
FTP_USER
setting
ftp_user
reqmeta
G
GCS_PROJECT_ID
setting
genspider
command
get() (scrapy.Selector method)
(scrapy.selector.SelectorList method)
(scrapy.settings.BaseSettings method)
get_addon() (scrapy.crawler.Crawler method)
get_collected_values() (scrapy.loader.ItemLoader method)
get_css() (scrapy.loader.ItemLoader method)
get_downloader_middleware() (scrapy.crawler.Crawler method)
get_extension() (scrapy.crawler.Crawler method)
get_item_pipeline() (scrapy.crawler.Crawler method)
get_jmes() (scrapy.loader.ItemLoader method)
get_media_requests() (scrapy.pipelines.files.FilesPipeline method)
(scrapy.pipelines.images.ImagesPipeline method)
get_oldest() (in module scrapy.utils.trackref)
get_output_value() (scrapy.loader.ItemLoader method)
get_processed_item() (scrapy.spidermiddlewares.base.BaseSpiderMiddleware method)
get_processed_request() (scrapy.spidermiddlewares.base.BaseSpiderMiddleware method)
get_retry_request() (in module scrapy.downloadermiddlewares.retry)
get_settings_priority() (in module scrapy.settings)
get_spider_middleware() (scrapy.crawler.Crawler method)
get_stats() (scrapy.statscollectors.StatsCollector method)
get_value() (scrapy.loader.ItemLoader method)
(scrapy.statscollectors.StatsCollector method)
get_xpath() (scrapy.loader.ItemLoader method)
getall() (scrapy.Selector method)
(scrapy.selector.SelectorList method)
getbool() (scrapy.settings.BaseSettings method)
getdict() (scrapy.settings.BaseSettings method)
getdictorlist() (scrapy.settings.BaseSettings method)
getfloat() (scrapy.settings.BaseSettings method)
getint() (scrapy.settings.BaseSettings method)
getlist() (scrapy.settings.BaseSettings method)
getpriority() (scrapy.settings.BaseSettings method)
getwithbase() (scrapy.settings.BaseSettings method)
global_object_name() (in module scrapy.utils.python)
GzipPlugin (class in scrapy.extensions.postprocessing)
H
handle_httpstatus_all
reqmeta
handle_httpstatus_list
reqmeta
has_pending_requests() (scrapy.core.scheduler.BaseScheduler method)
(scrapy.core.scheduler.Scheduler method)
headers (scrapy.http.Response attribute)
(scrapy.Request attribute)
(scrapy.spiders.CSVFeedSpider attribute)
headers_received
signal
headers_received() (in module scrapy.signals)
HtmlResponse (class in scrapy.http)
HttpAuthMiddleware (class in scrapy.downloadermiddlewares.httpauth)
HTTPCACHE_ALWAYS_STORE
setting
HTTPCACHE_DBM_MODULE
setting
HTTPCACHE_DIR
setting
HTTPCACHE_ENABLED
setting
HTTPCACHE_EXPIRATION_SECS
setting
HTTPCACHE_GZIP
setting
HTTPCACHE_IGNORE_HTTP_CODES
setting
HTTPCACHE_IGNORE_MISSING
setting
HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS
setting
HTTPCACHE_IGNORE_SCHEMES
setting
HTTPCACHE_POLICY
setting
HTTPCACHE_STORAGE
setting
HttpCacheMiddleware (class in scrapy.downloadermiddlewares.httpcache)
HttpCompressionMiddleware (class in scrapy.downloadermiddlewares.httpcompression)
HTTPERROR_ALLOW_ALL
setting
HTTPERROR_ALLOWED_CODES
setting
HttpErrorMiddleware (class in scrapy.spidermiddlewares.httperror)
HTTPPROXY_AUTH_ENCODING
setting
HTTPPROXY_ENABLED
setting
HttpProxyMiddleware (class in scrapy.downloadermiddlewares.httpproxy)
I
IgnoreRequest
IMAGES_EXPIRES
setting
IMAGES_MIN_HEIGHT
setting
IMAGES_MIN_WIDTH
setting
IMAGES_RESULT_FIELD
setting
IMAGES_STORE
setting
IMAGES_STORE_GCS_ACL
setting
IMAGES_STORE_S3_ACL
setting
IMAGES_THUMBS
setting
IMAGES_URLS_FIELD
setting
ImagesPipeline (class in scrapy.pipelines.images)
inc_value() (scrapy.statscollectors.StatsCollector method)
indent (scrapy.exporters.BaseItemExporter attribute)
install_reactor() (in module scrapy.utils.reactor)
ip_address (scrapy.http.Response attribute)
is_asyncio_reactor_installed() (in module scrapy.utils.reactor)
is_start_request
reqmeta
Item (class in scrapy)
item (scrapy.loader.ItemLoader attribute)
item_completed() (scrapy.pipelines.files.FilesPipeline method)
(scrapy.pipelines.images.ImagesPipeline method)
item_dropped
signal
item_dropped() (in module scrapy.signals)
item_error
signal
item_error() (in module scrapy.signals)
(scrapy.logformatter.LogFormatter method)
ITEM_PIPELINES
setting
ITEM_PIPELINES_BASE
setting
item_scraped
signal
item_scraped() (in module scrapy.signals)
ItemFilter (class in scrapy.extensions.feedexport)
ItemLoader (class in scrapy.loader)
ItemMeta (class in scrapy.item)
iter_all() (in module scrapy.utils.trackref)
iterator (scrapy.spiders.XMLFeedSpider attribute)
itertag (scrapy.spiders.XMLFeedSpider attribute)
J
jmespath() (scrapy.http.TextResponse method)
(scrapy.Selector method)
(scrapy.selector.SelectorList method)
JOBDIR
setting
join() (scrapy.crawler.CrawlerProcess method)
(scrapy.crawler.CrawlerRunner method)
json() (scrapy.http.TextResponse method)
JsonItemExporter (class in scrapy.exporters)
JsonLinesItemExporter (class in scrapy.exporters)
JsonRequest (class in scrapy.http)
JsonResponse (class in scrapy.http)
L
Link (class in scrapy.link)
list
command
list() (scrapy.spiderloader.SpiderLoader method)
load() (scrapy.spiderloader.SpiderLoader method)
load_item() (scrapy.loader.ItemLoader method)
log() (scrapy.Spider method)
LOG_DATEFORMAT
setting
LOG_ENABLED
setting
LOG_ENCODING
setting
LOG_FILE
setting
LOG_FILE_APPEND
setting
LOG_FORMAT
setting
LOG_FORMATTER
setting
LOG_LEVEL
setting
LOG_SHORT_NAMES
setting
LOG_STDOUT
setting
LOG_VERSIONS
setting
LogFormatter (class in scrapy.logformatter)
logger (scrapy.Spider attribute)
LogStats (class in scrapy.extensions.logstats)
LOGSTATS_INTERVAL
setting
LxmlLinkExtractor (class in scrapy.linkextractors.lxmlhtml)
LZMAPlugin (class in scrapy.extensions.postprocessing)
M
MAIL_FROM
setting
MAIL_HOST
setting
MAIL_PASS
setting
MAIL_PORT
setting
MAIL_SSL
setting
MAIL_TLS
setting
MAIL_USER
setting
MailSender (class in scrapy.mail)
MarshalItemExporter (class in scrapy.exporters)
max_retry_times
reqmeta
max_value() (scrapy.statscollectors.StatsCollector method)
maxpriority() (scrapy.settings.BaseSettings method)
maybe_deferred_to_future() (in module scrapy.utils.defer)
MEDIA_ALLOW_REDIRECTS
setting
MEMDEBUG_ENABLED
setting
MEMDEBUG_NOTIFY
setting
MemoryDebugger (class in scrapy.extensions.memdebug)
MemoryStatsCollector (class in scrapy.statscollectors)
MemoryUsage (class in scrapy.extensions.memusage)
MEMUSAGE_CHECK_INTERVAL_SECONDS
setting
MEMUSAGE_ENABLED
setting
MEMUSAGE_LIMIT_MB
setting
MEMUSAGE_NOTIFY_MAIL
setting
MEMUSAGE_WARNING_MB
setting
meta (scrapy.http.Response attribute)
(scrapy.Request attribute)
MetadataContract (class in scrapy.contracts.default)
METAREFRESH_ENABLED
setting
METAREFRESH_IGNORE_TAGS
setting
METAREFRESH_MAXDELAY
setting
MetaRefreshMiddleware (class in scrapy.downloadermiddlewares.redirect)
method (scrapy.Request attribute)
min_value() (scrapy.statscollectors.StatsCollector method)
module
scrapy.contracts
scrapy.contracts.default
scrapy.core.scheduler
scrapy.crawler
scrapy.downloadermiddlewares
scrapy.downloadermiddlewares.cookies
scrapy.downloadermiddlewares.defaultheaders
scrapy.downloadermiddlewares.downloadtimeout
scrapy.downloadermiddlewares.httpauth
scrapy.downloadermiddlewares.httpcache
scrapy.downloadermiddlewares.httpcompression
scrapy.downloadermiddlewares.httpproxy
scrapy.downloadermiddlewares.offsite
scrapy.downloadermiddlewares.redirect
scrapy.downloadermiddlewares.retry
scrapy.downloadermiddlewares.robotstxt
scrapy.downloadermiddlewares.stats
scrapy.downloadermiddlewares.useragent
scrapy.exceptions
scrapy.exporters
scrapy.extensions.closespider
scrapy.extensions.corestats
scrapy.extensions.debug
scrapy.extensions.httpcache
scrapy.extensions.logstats
scrapy.extensions.memdebug
scrapy.extensions.memusage
scrapy.extensions.periodic_log
scrapy.extensions.spiderstate
scrapy.extensions.statsmailer
scrapy.extensions.telnet
scrapy.http
scrapy.item
scrapy.link
scrapy.linkextractors
scrapy.linkextractors.lxmlhtml
scrapy.loader
scrapy.mail
scrapy.pipelines.files
scrapy.pipelines.images
scrapy.robotstxt
scrapy.selector
scrapy.settings
scrapy.signalmanager
scrapy.signals
scrapy.spiderloader
scrapy.spidermiddlewares
scrapy.spidermiddlewares.base
scrapy.spidermiddlewares.depth
scrapy.spidermiddlewares.httperror
scrapy.spidermiddlewares.referer
scrapy.spidermiddlewares.start
scrapy.spidermiddlewares.urllength
scrapy.statscollectors
scrapy.utils.log
scrapy.utils.trackref
N
name (scrapy.Spider attribute)
namespaces (scrapy.spiders.XMLFeedSpider attribute)
needs_backout() (scrapy.core.engine.ExecutionEngine method)
nested_css() (scrapy.loader.ItemLoader method)
nested_xpath() (scrapy.loader.ItemLoader method)
NEWSPIDER_MODULE
setting
next_request() (scrapy.core.scheduler.BaseScheduler method)
(scrapy.core.scheduler.Scheduler method)
NO_CALLBACK() (in module scrapy.http.request)
NoReferrerPolicy (class in scrapy.spidermiddlewares.referer)
NoReferrerWhenDowngradePolicy (class in scrapy.spidermiddlewares.referer)
NotConfigured
NotSupported
O
object_ref (class in scrapy.utils.trackref)
OffsiteMiddleware (class in scrapy.downloadermiddlewares.offsite)
open() (scrapy.core.scheduler.BaseScheduler method)
(scrapy.core.scheduler.Scheduler method)
open_in_browser() (in module scrapy.utils.response)
open_spider()
(scrapy.extensions.httpcache.CacheStorage method)
(scrapy.statscollectors.StatsCollector method)
OriginPolicy (class in scrapy.spidermiddlewares.referer)
OriginWhenCrossOriginPolicy (class in scrapy.spidermiddlewares.referer)
P
parse
command
parse() (scrapy.Spider method)
parse_node() (scrapy.spiders.XMLFeedSpider method)
parse_row() (scrapy.spiders.CSVFeedSpider method)
parse_start_url() (scrapy.spiders.CrawlSpider method)
PERIODIC_LOG_DELTA
setting
PERIODIC_LOG_STATS
setting
PERIODIC_LOG_TIMING_ENABLED
setting
PeriodicLog (class in scrapy.extensions.periodic_log)
PickleItemExporter (class in scrapy.exporters)
pop() (scrapy.settings.BaseSettings method)
post_process() (scrapy.contracts.Contract method)
PprintItemExporter (class in scrapy.exporters)
pre_process() (scrapy.contracts.Contract method)
print_live_refs() (in module scrapy.utils.trackref)
priority (scrapy.Request attribute)
process_exception() (scrapy.downloadermiddlewares.DownloaderMiddleware method)
process_item()
process_request() (scrapy.downloadermiddlewares.DownloaderMiddleware method)
process_response() (scrapy.downloadermiddlewares.DownloaderMiddleware method)
process_results() (scrapy.spiders.XMLFeedSpider method)
process_spider_exception() (scrapy.spidermiddlewares.SpiderMiddleware method)
process_spider_input() (scrapy.spidermiddlewares.SpiderMiddleware method)
process_spider_output() (scrapy.spidermiddlewares.SpiderMiddleware method)
process_spider_output_async() (scrapy.spidermiddlewares.SpiderMiddleware method)
process_start() (scrapy.spidermiddlewares.SpiderMiddleware method)
protocol (scrapy.http.Response attribute)
proxy
reqmeta
Python Enhancement Proposals
PEP 8
PythonItemExporter (class in scrapy.exporters)
Q
quotechar (scrapy.spiders.CSVFeedSpider attribute)
R
RANDOMIZE_DOWNLOAD_DELAY
setting
re() (scrapy.Selector method)
(scrapy.selector.SelectorList method)
re_first() (scrapy.Selector method)
(scrapy.selector.SelectorList method)
REACTOR_THREADPOOL_MAXSIZE
setting
REDIRECT_ENABLED
setting
REDIRECT_MAX_TIMES
setting
REDIRECT_PRIORITY_ADJUST
setting
redirect_reasons
reqmeta
redirect_urls
reqmeta
RedirectMiddleware (class in scrapy.downloadermiddlewares.redirect)
REFERER_ENABLED
setting
RefererMiddleware (class in scrapy.spidermiddlewares.referer)
REFERRER_POLICY
setting
referrer_policy
reqmeta
register_namespace() (scrapy.Selector method)
remove_from_list() (scrapy.settings.BaseSettings method)
remove_namespaces() (scrapy.Selector method)
replace() (scrapy.http.Response method)
(scrapy.Request method)
replace_css() (scrapy.loader.ItemLoader method)
replace_in_component_priority_dict() (scrapy.settings.BaseSettings method)
replace_jmes() (scrapy.loader.ItemLoader method)
replace_value() (scrapy.loader.ItemLoader method)
replace_xpath() (scrapy.loader.ItemLoader method)
reqmeta
allow_offsite
autothrottle_dont_adjust_delay
bindaddress
cookiejar
dont_cache
dont_merge_cookies
dont_obey_robotstxt
dont_redirect
dont_retry
download_fail_on_dataloss
download_latency
download_maxsize
download_timeout
download_warnsize
ftp_password
ftp_user
handle_httpstatus_all
handle_httpstatus_list
is_start_request
max_retry_times
proxy
redirect_reasons
redirect_urls
referrer_policy
Request (class in scrapy)
request (scrapy.http.Response attribute)
request_dropped
signal
request_dropped() (in module scrapy.signals)
request_fingerprinter (scrapy.crawler.Crawler attribute)
REQUEST_FINGERPRINTER_CLASS
setting
request_from_dict() (in module scrapy.utils.request)
request_left_downloader
signal
request_left_downloader() (in module scrapy.signals)
request_reached_downloader
signal
request_reached_downloader() (in module scrapy.signals)
request_scheduled
signal
request_scheduled() (in module scrapy.signals)
RequestFingerprinter (class in scrapy.utils.request)
Response (class in scrapy.http)
response_downloaded
signal
response_downloaded() (in module scrapy.signals)
response_received
signal
response_received() (in module scrapy.signals)
retrieve_response() (scrapy.extensions.httpcache.CacheStorage method)
RETRY_ENABLED
setting
RETRY_EXCEPTIONS
setting
RETRY_HTTP_CODES
setting
RETRY_PRIORITY_ADJUST
setting
RETRY_TIMES
setting
RetryMiddleware (class in scrapy.downloadermiddlewares.retry)
ReturnsContract (class in scrapy.contracts.default)
RFC2616Policy (class in scrapy.extensions.httpcache)
RFPDupeFilter (class in scrapy.dupefilters)
RobotParser (class in scrapy.robotstxt)
ROBOTSTXT_OBEY
setting
ROBOTSTXT_PARSER
setting
ROBOTSTXT_USER_AGENT
setting
RobotsTxtMiddleware (class in scrapy.downloadermiddlewares.robotstxt)
Rule (class in scrapy.spiders)
rules (scrapy.spiders.CrawlSpider attribute)
runspider
command
S
SameOriginPolicy (class in scrapy.spidermiddlewares.referer)
SCHEDULER
setting
Scheduler (class in scrapy.core.scheduler)
SCHEDULER_DEBUG
setting
SCHEDULER_DISK_QUEUE
setting
scheduler_empty
signal
scheduler_empty() (in module scrapy.signals)
SCHEDULER_MEMORY_QUEUE
setting
SCHEDULER_PRIORITY_QUEUE
setting
SCHEDULER_START_DISK_QUEUE
setting
SCHEDULER_START_MEMORY_QUEUE
setting
scraped() (scrapy.logformatter.LogFormatter method)
SCRAPER_SLOT_MAX_ACTIVE_SIZE
setting
ScrapesContract (class in scrapy.contracts.default)
scrapy.contracts
module
scrapy.contracts.default
module
scrapy.core.scheduler
module
scrapy.crawler
module
scrapy.downloadermiddlewares
module
scrapy.downloadermiddlewares.cookies
module
scrapy.downloadermiddlewares.defaultheaders
module
scrapy.downloadermiddlewares.downloadtimeout
module
scrapy.downloadermiddlewares.httpauth
module
scrapy.downloadermiddlewares.httpcache
module
scrapy.downloadermiddlewares.httpcompression
module
scrapy.downloadermiddlewares.httpproxy
module
scrapy.downloadermiddlewares.offsite
module
scrapy.downloadermiddlewares.redirect
module
scrapy.downloadermiddlewares.retry
module
scrapy.downloadermiddlewares.robotstxt
module
scrapy.downloadermiddlewares.stats
module
scrapy.downloadermiddlewares.useragent
module
scrapy.exceptions
module
scrapy.exporters
module
scrapy.extensions.closespider
module
scrapy.extensions.corestats
module
scrapy.extensions.debug
module
scrapy.extensions.httpcache
module
scrapy.extensions.logstats
module
scrapy.extensions.memdebug
module
scrapy.extensions.memusage
module
scrapy.extensions.periodic_log
module
scrapy.extensions.spiderstate
module
scrapy.extensions.statsmailer
module
scrapy.extensions.telnet
module
scrapy.FormRequest (built-in class)
scrapy.http
module
scrapy.item
module
scrapy.link
module
scrapy.linkextractors
module
scrapy.linkextractors.lxmlhtml
module
scrapy.loader
module
scrapy.mail
module
scrapy.pipelines.files
module
scrapy.pipelines.images
module
scrapy.robotstxt
module
scrapy.selector
module
scrapy.settings
module
scrapy.signalmanager
module
scrapy.signals
module
scrapy.spiderloader
module
scrapy.spidermiddlewares
module
scrapy.spidermiddlewares.base
module
scrapy.spidermiddlewares.depth
module
scrapy.spidermiddlewares.httperror
module
scrapy.spidermiddlewares.referer
module
scrapy.spidermiddlewares.start
module
scrapy.spidermiddlewares.urllength
module
scrapy.spiders.Spider (built-in class)
scrapy.statscollectors
module
scrapy.utils.log
module
scrapy.utils.trackref
module
Selector (class in scrapy)
selector (scrapy.http.TextResponse attribute)
(scrapy.loader.ItemLoader attribute)
SelectorList (class in scrapy.selector)
send() (scrapy.mail.MailSender method)
send_catch_log() (scrapy.signalmanager.SignalManager method)
send_catch_log_deferred() (scrapy.signalmanager.SignalManager method)
serialize_field() (scrapy.exporters.BaseItemExporter method)
set() (scrapy.settings.BaseSettings method)
set_in_component_priority_dict() (scrapy.settings.BaseSettings method)
set_stats() (scrapy.statscollectors.StatsCollector method)
set_value() (scrapy.statscollectors.StatsCollector method)
setdefault() (scrapy.settings.BaseSettings method)
setdefault_in_component_priority_dict() (scrapy.settings.BaseSettings method)
setmodule() (scrapy.settings.BaseSettings method)
setting
ADDONS
ASYNCIO_EVENT_LOOP
AUTOTHROTTLE_DEBUG
AUTOTHROTTLE_ENABLED
AUTOTHROTTLE_MAX_DELAY
AUTOTHROTTLE_START_DELAY
AUTOTHROTTLE_TARGET_CONCURRENCY
AWS_ACCESS_KEY_ID
AWS_ENDPOINT_URL
AWS_REGION_NAME
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
AWS_USE_SSL
AWS_VERIFY
BOT_NAME
CLOSESPIDER_ERRORCOUNT
CLOSESPIDER_ITEMCOUNT
CLOSESPIDER_PAGECOUNT
CLOSESPIDER_PAGECOUNT_NO_ITEM
CLOSESPIDER_TIMEOUT
CLOSESPIDER_TIMEOUT_NO_ITEM
COMMANDS_MODULE
COMPRESSION_ENABLED
CONCURRENT_ITEMS
CONCURRENT_REQUESTS
CONCURRENT_REQUESTS_PER_DOMAIN
CONCURRENT_REQUESTS_PER_IP
COOKIES_DEBUG
COOKIES_ENABLED
DEFAULT_DROPITEM_LOG_LEVEL
DEFAULT_ITEM_CLASS
DEFAULT_REQUEST_HEADERS
DEPTH_LIMIT
DEPTH_PRIORITY
DEPTH_STATS_VERBOSE
DNS_RESOLVER
DNS_TIMEOUT
DNSCACHE_ENABLED
DNSCACHE_SIZE
DOWNLOAD_DELAY
DOWNLOAD_FAIL_ON_DATALOSS
DOWNLOAD_HANDLERS
DOWNLOAD_HANDLERS_BASE
DOWNLOAD_MAXSIZE
DOWNLOAD_SLOTS
DOWNLOAD_TIMEOUT
DOWNLOAD_WARNSIZE
DOWNLOADER
DOWNLOADER_CLIENT_TLS_CIPHERS
DOWNLOADER_CLIENT_TLS_METHOD
DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING
DOWNLOADER_CLIENTCONTEXTFACTORY
DOWNLOADER_HTTPCLIENTFACTORY
DOWNLOADER_MIDDLEWARES
DOWNLOADER_MIDDLEWARES_BASE
DOWNLOADER_STATS
DUPEFILTER_CLASS
DUPEFILTER_DEBUG
EDITOR
EXTENSIONS
EXTENSIONS_BASE
FEED_EXPORT_BATCH_ITEM_COUNT
FEED_EXPORT_ENCODING
FEED_EXPORT_FIELDS
FEED_EXPORT_INDENT
FEED_EXPORTERS
FEED_EXPORTERS_BASE
FEED_STORAGE_FTP_ACTIVE
FEED_STORAGE_GCS_ACL
FEED_STORAGE_S3_ACL
FEED_STORAGES
FEED_STORAGES_BASE
FEED_STORE_EMPTY
FEED_TEMPDIR
FEED_URI_PARAMS
FEEDS
FILES_EXPIRES
FILES_RESULT_FIELD
FILES_STORE
FILES_STORE_GCS_ACL
FILES_STORE_S3_ACL
FILES_URLS_FIELD
FTP_PASSIVE_MODE
FTP_PASSWORD
FTP_USER
GCS_PROJECT_ID
HTTPCACHE_ALWAYS_STORE
HTTPCACHE_DBM_MODULE
HTTPCACHE_DIR
HTTPCACHE_ENABLED
HTTPCACHE_EXPIRATION_SECS
HTTPCACHE_GZIP
HTTPCACHE_IGNORE_HTTP_CODES
HTTPCACHE_IGNORE_MISSING
HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS
HTTPCACHE_IGNORE_SCHEMES
HTTPCACHE_POLICY
HTTPCACHE_STORAGE
HTTPERROR_ALLOW_ALL
HTTPERROR_ALLOWED_CODES
HTTPPROXY_AUTH_ENCODING
HTTPPROXY_ENABLED
IMAGES_EXPIRES
IMAGES_MIN_HEIGHT
IMAGES_MIN_WIDTH
IMAGES_RESULT_FIELD
IMAGES_STORE
IMAGES_STORE_GCS_ACL
IMAGES_STORE_S3_ACL
IMAGES_THUMBS
IMAGES_URLS_FIELD
ITEM_PIPELINES
ITEM_PIPELINES_BASE
JOBDIR
LOG_DATEFORMAT
LOG_ENABLED
LOG_ENCODING
LOG_FILE
LOG_FILE_APPEND
LOG_FORMAT
LOG_FORMATTER
LOG_LEVEL
LOG_SHORT_NAMES
LOG_STDOUT
LOG_VERSIONS
LOGSTATS_INTERVAL
MAIL_FROM
MAIL_HOST
MAIL_PASS
MAIL_PORT
MAIL_SSL
MAIL_TLS
MAIL_USER
MEDIA_ALLOW_REDIRECTS
MEMDEBUG_ENABLED
MEMDEBUG_NOTIFY
MEMUSAGE_CHECK_INTERVAL_SECONDS
MEMUSAGE_ENABLED
MEMUSAGE_LIMIT_MB
MEMUSAGE_NOTIFY_MAIL
MEMUSAGE_WARNING_MB
METAREFRESH_ENABLED
METAREFRESH_IGNORE_TAGS
METAREFRESH_MAXDELAY
NEWSPIDER_MODULE
PERIODIC_LOG_DELTA
PERIODIC_LOG_STATS
PERIODIC_LOG_TIMING_ENABLED
RANDOMIZE_DOWNLOAD_DELAY
REACTOR_THREADPOOL_MAXSIZE
REDIRECT_ENABLED
REDIRECT_MAX_TIMES
REDIRECT_PRIORITY_ADJUST
REFERER_ENABLED
REFERRER_POLICY
REQUEST_FINGERPRINTER_CLASS
RETRY_ENABLED
RETRY_EXCEPTIONS
RETRY_HTTP_CODES
RETRY_PRIORITY_ADJUST
RETRY_TIMES
ROBOTSTXT_OBEY
ROBOTSTXT_PARSER
ROBOTSTXT_USER_AGENT
SCHEDULER
SCHEDULER_DEBUG
SCHEDULER_DISK_QUEUE
SCHEDULER_MEMORY_QUEUE
SCHEDULER_PRIORITY_QUEUE
SCHEDULER_START_DISK_QUEUE
SCHEDULER_START_MEMORY_QUEUE
SCRAPER_SLOT_MAX_ACTIVE_SIZE
SPIDER_CONTRACTS
SPIDER_CONTRACTS_BASE
SPIDER_LOADER_CLASS
SPIDER_LOADER_WARN_ONLY
SPIDER_MIDDLEWARES
SPIDER_MIDDLEWARES_BASE
SPIDER_MODULES
STATS_CLASS
STATS_DUMP
STATSMAILER_RCPTS
TELNETCONSOLE_ENABLED
TELNETCONSOLE_HOST
TELNETCONSOLE_PASSWORD
TELNETCONSOLE_PORT
TELNETCONSOLE_USERNAME
TEMPLATES_DIR
TWISTED_REACTOR
URLLENGTH_LIMIT
USER_AGENT
WARN_ON_GENERATOR_RETURN_VALUE
settings
command
Settings (class in scrapy.settings)
settings (scrapy.crawler.Crawler attribute)
(scrapy.Spider attribute)
SETTINGS_PRIORITIES (in module scrapy.settings)
shell
command
signal
bytes_received
engine_started
engine_stopped
feed_exporter_closed
feed_slot_closed
headers_received
item_dropped
item_error
item_scraped
request_dropped
request_left_downloader
request_reached_downloader
request_scheduled
response_downloaded
response_received
scheduler_empty
spider_closed
spider_error
spider_idle
spider_opened
update_telnet_vars
SignalManager (class in scrapy.signalmanager)
signals (scrapy.crawler.Crawler attribute)
sitemap_alternate_links (scrapy.spiders.SitemapSpider attribute)
sitemap_filter() (scrapy.spiders.SitemapSpider method)
sitemap_follow (scrapy.spiders.SitemapSpider attribute)
sitemap_rules (scrapy.spiders.SitemapSpider attribute)
sitemap_urls (scrapy.spiders.SitemapSpider attribute)
SitemapSpider (class in scrapy.spiders)
Spider (class in scrapy)
spider (scrapy.crawler.Crawler attribute)
spider_closed
signal
spider_closed() (in module scrapy.signals)
SPIDER_CONTRACTS
setting
SPIDER_CONTRACTS_BASE
setting
spider_error
signal
spider_error() (in module scrapy.signals)
(scrapy.logformatter.LogFormatter method)
spider_idle
signal
spider_idle() (in module scrapy.signals)
SPIDER_LOADER_CLASS
setting
SPIDER_LOADER_WARN_ONLY
setting
SPIDER_MIDDLEWARES
setting
SPIDER_MIDDLEWARES_BASE
setting
SPIDER_MODULES
setting
spider_opened
signal
spider_opened() (in module scrapy.signals)
spider_stats (scrapy.statscollectors.MemoryStatsCollector attribute)
SpiderLoader (class in scrapy.spiderloader)
SpiderMiddleware (class in scrapy.spidermiddlewares)
SpiderState (class in scrapy.extensions.spiderstate)
StackTraceDump (class in scrapy.extensions.periodic_log)
start() (scrapy.crawler.CrawlerProcess method)
(scrapy.Spider method)
start_exporting() (scrapy.exporters.BaseItemExporter method)
start_urls (scrapy.Spider attribute)
startproject
command
StartSpiderMiddleware (class in scrapy.spidermiddlewares.start)
state (scrapy.Spider attribute)
stats (scrapy.crawler.Crawler attribute)
STATS_CLASS
setting
STATS_DUMP
setting
StatsCollector (class in scrapy.statscollectors.StatsCollector)
StatsMailer (class in scrapy.extensions.statsmailer)
STATSMAILER_RCPTS
setting
status (scrapy.http.Response attribute)
stop() (scrapy.crawler.Crawler method)
(scrapy.crawler.CrawlerProcess method)
(scrapy.crawler.CrawlerRunner method)
StopDownload
store_response() (scrapy.extensions.httpcache.CacheStorage method)
StrictOriginPolicy (class in scrapy.spidermiddlewares.referer)
StrictOriginWhenCrossOriginPolicy (class in scrapy.spidermiddlewares.referer)
T
TelnetConsole (class in scrapy.extensions.telnet)
TELNETCONSOLE_ENABLED
setting
TELNETCONSOLE_HOST
setting
TELNETCONSOLE_PASSWORD
setting
TELNETCONSOLE_PORT
setting
TELNETCONSOLE_USERNAME
setting
TEMPLATES_DIR
setting
text (scrapy.http.TextResponse attribute)
TextResponse (class in scrapy.http)
thumb_path() (scrapy.pipelines.images.ImagesPipeline method)
to_dict() (scrapy.Request method)
TWISTED_REACTOR
setting
U
UnsafeUrlPolicy (class in scrapy.spidermiddlewares.referer)
update() (scrapy.settings.BaseSettings method)
update_pre_crawler_settings()
update_settings()
(scrapy.Spider class method)
update_telnet_vars
signal
update_telnet_vars() (in module scrapy.extensions.telnet)
uri_params() (in module scrapy.extensions.feedexport)
url (scrapy.http.Response attribute)
(scrapy.Request attribute)
UrlContract (class in scrapy.contracts.default)
urljoin() (scrapy.http.Response method)
(scrapy.http.TextResponse method)
URLLENGTH_LIMIT
setting
UrlLengthMiddleware (class in scrapy.spidermiddlewares.urllength)
USER_AGENT
setting
UserAgentMiddleware (class in scrapy.downloadermiddlewares.useragent)
V
version
command
view
command
W
wait_for() (scrapy.signalmanager.SignalManager method)
WARN_ON_GENERATOR_RETURN_VALUE
setting
write()
X
XMLFeedSpider (class in scrapy.spiders)
XmlItemExporter (class in scrapy.exporters)
xmliter_lxml() (in module scrapy.utils.iterators)
XmlResponse (class in scrapy.http)
xpath() (scrapy.http.TextResponse method)
(scrapy.Selector method)
(scrapy.selector.SelectorList method)