I'm using the CrawlOnce middleware together with Scrapy's persistent job directory (JOBDIR), and when I stop and restart a job, I sometimes get this error:
Traceback (most recent call last):
  File "/app/tasks/crawl.py", line 250, in crawl
    crawler_proc.start()
  File "/usr/local/lib/python3.9/site-packages/scrapy/crawler.py", line 346, in start
    reactor.run(installSignalHandlers=False) # blocking call
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/base.py", line 1318, in run
    self.mainLoop()
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/base.py", line 1328, in mainLoop
    reactorBaseSelf.runUntilCurrent()
--- <exception caught here> ---
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/base.py", line 994, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/lib/python3.9/site-packages/scrapy/utils/reactor.py", line 51, in __call__
    return self._func(*self._a, **self._kw)
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/engine.py", line 157, in _next_request
    self.crawl(request)
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/engine.py", line 247, in crawl
    self._schedule_request(request, self.spider)
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/engine.py", line 252, in _schedule_request
    if not self.slot.scheduler.enqueue_request(request): # type: ignore[union-attr]
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/scheduler.py", line 241, in enqueue_request
    dqok = self._dqpush(request)
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/scheduler.py", line 280, in _dqpush
    self.dqs.push(request)
  File "/usr/local/lib/python3.9/site-packages/scrapy/pqueues.py", line 89, in push
    self.queues[priority] = self.qfactory(priority)
  File "/usr/local/lib/python3.9/site-packages/scrapy/pqueues.py", line 76, in qfactory
    return create_instance(
  File "/usr/local/lib/python3.9/site-packages/scrapy/utils/misc.py", line 166, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scrapy/squeues.py", line 68, in from_crawler
    return cls(crawler, key)
  File "/usr/local/lib/python3.9/site-packages/scrapy/squeues.py", line 64, in __init__
    super().__init__(key)
  File "/usr/local/lib/python3.9/site-packages/scrapy/squeues.py", line 23, in __init__
    super().__init__(path, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/queuelib/queue.py", line 208, in __init__
    (self.size,) = struct.unpack(self.SIZE_FORMAT, qsize)
struct.error: unpack requires a buffer of 4 bytes
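As far as I can tell from the bottom frames, the failure happens in queuelib's LifoDiskQueue (which backs the disk request queue Scrapy keeps under JOBDIR): on reopen it reads a 4-byte size header from the start of the queue file, and an interrupted shutdown can leave that file empty or truncated. Here is a rough standalone reproduction of just the low-level error; the direct queuelib usage and the throwaway path are my assumptions, not the exact files Scrapy writes:

# Rough reproduction of the struct.error alone, assuming queuelib's
# LifoDiskQueue; "/tmp/repro.queue" is a throwaway path, not my real JOBDIR.
import struct

from queuelib import LifoDiskQueue

path = "/tmp/repro.queue"

q = LifoDiskQueue(path)   # a new queue file starts with a 4-byte size header
q.push(b"serialized request")
q.close()

# Simulate the partial write left behind by an unclean shutdown:
# cut the file down to less than the 4-byte header.
with open(path, "r+b") as f:
    f.truncate(2)

try:
    LifoDiskQueue(path)   # reopening reads the size header first
except struct.error as exc:
    print("reproduced:", exc)   # unpack requires a buffer of 4 bytes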
Deleting the on-disk request queue fixes the error, but that creates another problem: if the spider was stopped partway through a paginated request flow, deleting the queue also throws away the pending pagination requests. Since my crawler starts only from the base URLs listed in an input file, it has no way to recover the follow-up pagination requests that were generated dynamically during the crawl. A sketch of the spider shape I mean is below.
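For context, a minimal sketch of that spider shape (the file name, spider name, and selectors are placeholders, not my real code); the only requests known up front are the base URLs, and every "next page" request exists solely in the scheduler's queue:

import scrapy


class PaginatedSpider(scrapy.Spider):
    name = "paginated"

    def start_requests(self):
        # Only the base URLs from the input file are known up front.
        with open("input_urls.txt") as f:
            for line in f:
                url = line.strip()
                if url:
                    yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Item links on the current page.
        for href in response.css("a.item::attr(href)").getall():
            yield response.follow(href, callback=self.parse_item)

        # The "next page" request is generated dynamically here; if the job
        # stops on page N and the disk queue is deleted, page N+1 is gone.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

    def parse_item(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}

Stopping on page 3 of a listing, for example, means the request for page 4 lives only in the on-disk queue; once that queue is deleted, a restart from input_urls.txt has no way to know page 4 was ever pending.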