scrapy-splash cannot get data?

When I try to scrape "https://book.douban.com/annual/2016" with Scrapy and scrapy-splash, I cannot get the data. The code is at the bottom.

The response body only shows "Hello IT, have you tried turning it off and on again?".

But when I run the same script in Splash at "http://localhost:8050", it shows me what I want.

lua_script = """
function main(splash)
  assert(splash:autoload("https://img3.doubanio.com/f/ithil/31683c94fc5c3d40cb6e3d541825be4956a1220d/js/lib/es5-shim.min.js"))
  assert(splash:autoload("https://img3.doubanio.com/f/ithil/a7de8db438da176dd0eeb59efe46306b39f1261f/js/lib/es6-shim.min.js"))
  assert(splash:autoload("https://img3.doubanio.com/dae/cdnlib/libs/jweixin/1.0.0/jweixin.js"))
  assert(splash:autoload("https://img3.doubanio.com/f/ithil/dd4fe4440669275cafde939df8cfdd32ca1252e5/gen/ithil.bundle.js"))
  assert(splash:autoload("https://hm.baidu.com/hm.js?16a14f3002af32bf3a75dfe352478639"))
  assert(splash:go(splash.args.url))
  assert(splash:wait(0.5))
  return splash:html()
end
"""

import scrapy
from scrapy_splash import SplashRequest


class DoubanbookSpider(scrapy.Spider):
    name = 'doubanBook-2016'
    allowed_domains = ['book.douban.com']
    start_urls = ['http://book.douban.com/']

    def start_requests(self):
        base_url = 'https://book.douban.com/annual/2016'
        # yield SplashRequest(base_url)
        # Run the Lua script above through Splash's "execute" endpoint.
        yield SplashRequest(base_url, endpoint='execute', args={'lua_source': lua_script},
                            cache_args=['lua_source'])

    def parse(self, response):
        # The body contains "Hello IT, have you tried turning it off and on again?".
        print(response.body)
        listname = response.css('h1 div::text').extract_first()
        print(listname)  # result is None
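
To tell from inside the spider whether Splash returned the rendered page or only the placeholder, the body can be checked for that message. The sketch below is a minimal, hypothetical helper: `is_fallback_page` and `FALLBACK_MARKER` are names made up here, not part of Scrapy or scrapy-splash. It assumes the message appears only when the page's JavaScript has not rendered yet.

```python
# Hypothetical helper; not part of Scrapy or scrapy-splash.
# Assumption: this text only shows up when the page's JavaScript
# has not rendered the real content.
FALLBACK_MARKER = b"have you tried turning it off and on again?"


def is_fallback_page(body):
    """Return True if the response body still contains the placeholder text."""
    # response.body is bytes, response.text is str; accept both.
    if isinstance(body, str):
        body = body.encode("utf-8")
    return FALLBACK_MARKER in body
```

In `parse`, calling `is_fallback_page(response.body)` before running the CSS selector would separate the two cases. If it returns True, a longer `splash:wait` than 0.5 may be worth trying, though that is only a guess.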

