Today I wrote a crawler and ran into a pitfall. Scrapy reported:
[scrapy.core.scraper] ERROR: Spider must return request, item, or None, got 'Tag' in <GET https://www.
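The cause was unexpected. My loop variable was named item, but item was also the name of the Scrapy item I used to pass the data on. The loop rebinds item to each BeautifulSoup Tag, so item['wz']=uu quietly writes an attribute onto the Tag, and yield item hands Scrapy a Tag instead of an item: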
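for item in soup.select(".job-list-item"):  # BUG: rebinds the Scrapy item to a Tag
    uu = item.select_one("a").get('href').split("?")[0]
    if uu is not None:
        item['wz'] = uu  # sets an HTML attribute on the Tag, no error raised
        yield item       # yields a Tag, which Scrapy rejects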
If you change yield to return, the error goes away, but the data still isn't passed on (return also leaves the loop at the first match). The real fix is to rename the loop variable so it no longer shadows item:
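# item is the Scrapy item created earlier in the callback
for itema in soup.select(".job-list-item"):  # distinct name, item is not shadowed
    link = itema.select_one("a")
    href = link.get("href") if link is not None else None
    if href is not None:  # check before split(), which fails on None
        item['wz'] = href.split("?")[0]
        yield item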
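The root cause is ordinary Python name shadowing, so it can be reproduced without Scrapy at all. Below is a minimal, stdlib-only sketch under stated assumptions: FakeTag is a hypothetical stand-in for a BeautifulSoup Tag (a real Tag also accepts item assignment, which is why item['wz']=uu raised no error), the hrefs are made-up sample values, and parse_shadowed / parse_fixed are illustrative names, not Scrapy API.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag (not the real bs4 class)."""

    def __init__(self, href):
        self.attrs = {"href": href}

    def get(self, key):
        return self.attrs.get(key)

    def __setitem__(self, key, value):
        # Like a real Tag, item assignment just sets an attribute, so no error.
        self.attrs[key] = value


def parse_shadowed(elements):
    item = {}  # container meant to carry the data out
    for item in elements:  # BUG: rebinds `item` to each element
        href = item.get("href")
        if href is not None:
            item["wz"] = href.split("?")[0]  # lands on the element, not the container
            yield item  # yields a FakeTag, not the container


def parse_fixed(elements):
    for element in elements:  # distinct name, nothing is shadowed
        href = element.get("href")
        if href is not None:
            # A fresh dict per result, so later iterations cannot overwrite
            # data that was already yielded.
            yield {"wz": href.split("?")[0]}


if __name__ == "__main__":
    elements = [FakeTag("/job/1?from=list"), FakeTag(None), FakeTag("/job/2?src=x")]
    print([type(r).__name__ for r in parse_shadowed(elements)])  # ['FakeTag', 'FakeTag']
    print(list(parse_fixed(elements)))  # [{'wz': '/job/1'}, {'wz': '/job/2'}]
```

The fixed version also builds a new dict per job instead of reusing one item object. Scrapy accepts plain dicts as items; a custom Item class created inside the loop would work the same way.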