
019: One-Click Job Applications on Boss直聘 with Selenium

李玺

Published 2021-11-22 10:57:40


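This is not an ad. There are plenty of good recruiting platforms, and Boss直聘 is one of them. Being able to talk to HR directly on Boss直聘 is genuinely useful, but applying is tedious: you have to click through the positions one by one, and most of those chats never get a reply. So today I wrote an auto-chat script with Selenium + Python. While writing it I found that Boss直聘's anti-Selenium measures are quite thorough.

Below are the concrete steps of the implementation.

First, simulate the login:

Boss直聘 homepage: https://www.zhipin.com/ I went straight to the login page URL: https://login.zhipin.com/?ka=header-login

The page opens on account/password login by default, which looks like it saves some effort.

When I tried to grab the input by class_name, I found there are three phone-number inputs on the page, so the element has to be located with XPath instead.

So right-click the input in the browser's element inspector, choose Copy, then Copy XPath.

Same for the password box: just Copy XPath.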

# Phone-number field: click to focus, then type
driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[3]/span[2]/input').click()
driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[3]/span[2]/input').send_keys('your phone number')
time.sleep(1)
# Password field
driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[4]/span/input').click()
driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[4]/span/input').send_keys('your password')
time.sleep(1)
# Login button
driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[6]/button').click()
time.sleep(2)
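
The login snippet looks each field up twice, once to click and once to type. As a minimal sketch, the two steps can share a single lookup. The helper name `fill_field` and the constant `XPATH` are made up for illustration; `"xpath"` is the string value behind Selenium's `By.XPATH`, so the sketch should work with any driver that offers `find_element(by, value)`.

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH


def fill_field(driver, xpath, text):
    """Locate an input once, click it to focus, then type text into it."""
    field = driver.find_element(XPATH, xpath)
    field.click()
    field.send_keys(text)
    return field
```

Usage would look like `fill_field(driver, phone_xpath, 'your phone number')`, where `phone_xpath` holds the XPath copied from the page.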

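Next comes the annoying slider, even though it looks very simple: you only need to drag it all the way to the right.

At first I used ActionChains to drag the mouse directly. I tried for a long time, and every drag failed with an error right at the end. This part is a real trap.

The error shown in PyCharm was: stale element reference: element is not attached to the page document

That means the element we fetched keeps changing. We can read a fixed index from the page (the data-nc-idx attribute) and plug it into our XPath to match the slider.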

# id = driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[5]/div[1]').get_attribute('data-nc-idx')
# print(id)
# time.sleep(0.5)
# huakuai = driver.find_element_by_xpath('//*[@id="nc_{}_n1z"]'.format(id))
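
A common way to cope with a stale reference is to look the element up again right before each use and retry when the lookup goes stale. The sketch below is generic and is not what the final script does; the helper name `retry_on` is hypothetical. With Selenium you would pass `selenium.common.exceptions.StaleElementReferenceException` as `exc_type`.

```python
import time


def retry_on(exc_type, action, attempts=3, delay=0.5):
    """Call action(); if it raises exc_type, wait and try again.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Hypothetical usage with Selenium (slider_xpath is a placeholder):
# retry_on(StaleElementReferenceException,
#          lambda: driver.find_element("xpath", slider_xpath).click())
```

The important detail is that the lookup happens inside `action`, so each retry works on a freshly located element.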

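After a long struggle, though, I found this is not just a stale-element problem: dragging the slider by hand fails as well.

So our WebDriver must have been detected by Boss直聘.

That means we need to hide our webdriver property. There are several ways to do it, such as putting a mitmproxy proxy in front of the browser. Here I take the simple route of switching Selenium to "developer mode", which is enough to get around blocking based on the webdriver flag.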

from selenium import webdriver
from selenium.webdriver import ChromeOptions

option = ChromeOptions()
# Drop Chrome's enable-automation switch ("developer mode")
option.add_experimental_option('excludeSwitches', ['enable-automation'])
url = 'https://login.zhipin.com/?ka=header-login'
driver = webdriver.Chrome(executable_path=r'C:\Users\lenovo\Desktop\chromedriver_win32\chromedriver.exe',options=option)
driver.get(url)

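Update 2020/05/03: readers found that the developer-mode trick no longer works, because the new Chrome versions released this year stopped setting webdriver = false in developer mode. Adding the following JS fixes it: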

    driver = webdriver.Chrome(executable_path=r'C:\Users\lenovo\Desktop\chromedriver_win32\chromedriver.exe',options=option)

    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", 
                       { "source": """ Object.defineProperty(navigator, 'webdriver', { get: () => undefined }) """ 
                         })
    driver.get(url)
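
To see whether the injected script took effect, one option is to ask the page what it sees. This is a small sketch: the helper name `webdriver_flag_hidden` is made up, and it only checks this one property, so a site may still rely on other signals.

```python
def webdriver_flag_hidden(driver):
    """Return True if page scripts no longer see a truthy navigator.webdriver."""
    # A JavaScript undefined comes back from execute_script as None.
    value = driver.execute_script("return navigator.webdriver")
    return not value
```

Calling `webdriver_flag_hidden(driver)` after `driver.get(url)` should return True when the property reads as undefined or false.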

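But operating the slider still fails easily. So what now? Just skip the slider: our goal is to automate sending applications, not to log in over and over.

Hence the "one-click" in the title: we drag the captcha by hand at login, and nothing else needs manual work. (Good enough for now.)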
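
Since the slider is dragged by hand, the script has to wait for a person rather than for a fixed sleep. A rough sketch of that wait, assuming the browser leaves the `https://login.zhipin.com` address once login succeeds (the exact redirect target is an assumption, and `wait_until_url_leaves` is a hypothetical helper):

```python
import time


def wait_until_url_leaves(driver, prefix, timeout=120, poll=1.0):
    """Poll driver.current_url until it no longer starts with prefix.

    Returns True once the URL has changed, False if the timeout runs out.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not driver.current_url.startswith(prefix):
            return True
        time.sleep(poll)
    return False
```

For example, `wait_until_url_leaves(driver, 'https://login.zhipin.com')` would block for up to two minutes while the slider is completed manually.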

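Auto-chat:

After login, the site automatically redirects to the default search page for the default region. Nothing hard here: type the keyword and click search.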

driver.find_element_by_xpath('//*[@id="main"]/div/div[2]/div[1]/div[1]/div/form/div[1]/p/input').click()
driver.find_element_by_xpath('//*[@id="main"]/div/div[2]/div[1]/div[1]/div/form/div[1]/p/input').send_keys('爬虫')
driver.find_element_by_xpath('//*[@id="main"]/div/div[2]/div[1]/div[1]/div/form/button').click()
time.sleep(1)

Then record the URL of the current page: liebiao_url = driver.current_url

Here I first move the mouse over the item we want to chat with:

ActionChains(driver).move_to_element(driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/ul/li[{}]/div/div[3]'.format(i))).perform()  # i comes from the outer loop; the full code is at the end
time.sleep(0.5)

Then get the URL of the detail page:

xiangqing_url =  driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/ul/li[{}]/div/div[1]/h3/a'.format(i)).get_attribute('href')
print(xiangqing_url)
driver.get(xiangqing_url)
time.sleep(1)
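
An alternative to indexing `li[1]`, `li[2]`, ... one at a time is to collect every detail link on the list page first and visit them afterwards. This is a sketch of that idea, not the approach used in this post. `LINKS_XPATH` is the XPath above with the `[{}]` index removed; the full script switches to `div[2]` on later pages, so the XPath is a parameter. `collect_detail_urls` is a hypothetical helper name.

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH
LINKS_XPATH = '//*[@id="main"]/div/div[3]/ul/li/div/div[1]/h3/a'


def collect_detail_urls(driver, links_xpath=LINKS_XPATH):
    """Return the href of every job link on the list page, without duplicates."""
    urls = []
    for link in driver.find_elements(XPATH, links_xpath):
        href = link.get_attribute("href")
        if href and href not in urls:
            urls.append(href)
    return urls
```

Because the URLs are plain strings, they stay valid after navigating away, and the loop no longer depends on a page holding exactly as many items as the hard-coded range.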

Click the chat button ("与我沟通") on the detail page:

driver.find_element_by_xpath('//*[@id="main"]/div[1]/div/div/div[3]/div[1]/a').click()
time.sleep(1)

Go back to the list page:

driver.back()
time.sleep(1)
driver.back()
time.sleep(1)
if driver.current_url != liebiao_url:
   driver.get(liebiao_url)

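Sometimes the click leaves the browser on the job detail page instead, and going back twice then overshoots. So we check for that case: if it happens, go back only once.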

if 'https://www.zhipin.com/job_detail' in str(driver.current_url):
    driver.back()
    time.sleep(1.5)
else:
    time.sleep(1.5)
    driver.back()
    time.sleep(2)
    driver.back()
    time.sleep(2)
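
The two "go back" snippets can be folded into one helper: decide how many steps back are needed from the current URL, then fall back to loading the list page directly if the history did not lead there. This is a sketch; `backs_needed` and `return_to_list` are names invented here, and the pause length is arbitrary.

```python
import time

DETAIL_PREFIX = "https://www.zhipin.com/job_detail"


def backs_needed(current_url):
    """One step back if we are still on a detail page, otherwise two."""
    return 1 if DETAIL_PREFIX in current_url else 2


def return_to_list(driver, list_url, pause=1.5):
    """Go back toward the list page; load it directly if back() missed it."""
    for _ in range(backs_needed(driver.current_url)):
        driver.back()
        time.sleep(pause)
    if driver.current_url != list_url:
        driver.get(list_url)
```

The final `driver.get(list_url)` is the same safety net as the `if driver.current_url != liebiao_url` check shown earlier.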

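Test result:

All done. Now you can do something else while waiting for replies! If you want to know how many positions you applied to and which ones you greeted, you can grab the relevant elements on the detail page to collect that information. I won't go into it here.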
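
One simple way to keep such a record is to append a row per application to a CSV file. This is only a sketch under assumptions: the helper names and the file name are hypothetical, and it stores the detail-page URL plus whatever label you pass in (for instance `driver.title`), since this post does not say which page elements to read.

```python
import csv


def append_application(path, url, title):
    """Append one (url, title) row to the CSV log at path."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([url, title])


def count_applications(path):
    """Return how many rows the CSV log currently holds."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.reader(f))
```

In the main loop this could be called as `append_application('applied.csv', xiangqing_url, driver.title)` right after clicking the chat button.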

Full code:

# Written for Selenium 3.x: find_element_by_xpath and the executable_path
# argument were removed in later Selenium 4 releases.
import time

from selenium import webdriver
from selenium.webdriver import ActionChains, ChromeOptions

option = ChromeOptions()
url = 'https://login.zhipin.com/?ka=header-login'
driver = webdriver.Chrome(executable_path=r'C:\Users\lenovo\Desktop\chromedriver_win32\chromedriver.exe',options=option)

driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", 
                   { "source": """ Object.defineProperty(navigator, 'webdriver', { get: () => undefined }) """ 
                     })
driver.get(url)

def huakuai():
    # Automated slider drag, left disabled: the slider is dragged by hand instead.
    # id = driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[5]/div[1]').get_attribute('data-nc-idx')
    # print(id)
    # time.sleep(0.5)
    # huakuai = driver.find_element_by_xpath('//*[@id="nc_{}_n1z"]'.format(id))
    # action = ActionChains(driver)
    # action.click_and_hold(huakuai).perform()
    # time.sleep(1)
    # for i in range(12):
    #     action.move_by_offset(i, 0).perform()
    # action.release().perform()
    # action.release(on_element=huakuai).perform()
    pass


time.sleep(5)

driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[3]/span[2]/input').click()
driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[3]/span[2]/input').send_keys('your phone number')
time.sleep(1)
driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[4]/span/input').click()
driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[4]/span/input').send_keys('your password')
time.sleep(1)
driver.find_element_by_xpath('//*[@id="wrap"]/div[2]/div[1]/div[2]/div/form/div[6]/button').click()
time.sleep(2)

driver.find_element_by_xpath('//*[@id="main"]/div/div[2]/div[1]/div[1]/div/form/div[1]/p/input').click()
driver.find_element_by_xpath('//*[@id="main"]/div/div[2]/div[1]/div[1]/div/form/div[1]/p/input').send_keys('爬虫')
driver.find_element_by_xpath('//*[@id="main"]/div/div[2]/div[1]/div[1]/div/form/button').click()
time.sleep(1)
liebiao_url = driver.current_url+'&page={}'
time.sleep(1)
for j in range(1,10):
    # Format into a new variable so the '{}' template survives for the next page
    page_url = liebiao_url.format(j)
    print(page_url)
    driver.get(page_url)
    time.sleep(1)
    print(driver.current_url)
    print("==========================")
    # print("list URL", page_url)
    for i in range(1,30):
        if j==1:
            ActionChains(driver).move_to_element(driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/ul/li[{}]/div/div[3]'.format(i))).perform()
            time.sleep(0.5)
            xiangqing_url = driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/ul/li[{}]/div/div[1]/h3/a'.format(i)).get_attribute('href')
            # print(xiangqing_url)
        else:
            ActionChains(driver).move_to_element(
                driver.find_element_by_xpath('//*[@id="main"]/div/div[2]/ul/li[{}]/div/div[3]'.format(i))).perform()
            time.sleep(0.5)
            xiangqing_url = driver.find_element_by_xpath('//*[@id="main"]/div/div[2]/ul/li[{}]/div/div[1]/h3/a'.format(i)).get_attribute('href')

        driver.get(xiangqing_url)
        time.sleep(1)
        driver.find_element_by_xpath('//*[@id="main"]/div[1]/div/div/div[3]/div[1]/a').click()
        time.sleep(0.5)
        print(driver.current_url)
        if 'https://www.zhipin.com/job_detail' in str(driver.current_url):
            driver.back()
            time.sleep(1.5)
        else:
            time.sleep(1.5)
            driver.back()
            time.sleep(2)
            driver.back()
            time.sleep(2)
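
The pagination step builds each page URL by appending `&page={}` to the search URL. A slightly more defensive sketch sets the `page` query parameter with `urllib.parse`, so it also works when the URL has no query string yet or already carries a `page` value. `with_page` is a hypothetical helper, and it assumes the site reads the page number from a `page` parameter, as the script above does.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Return url with its 'page' query parameter set to the given page."""
    parts = urlsplit(url)
    # Keep every existing parameter except an old 'page' value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

In the loop, `driver.get(with_page(base_url, j))` would replace the string formatting, where `base_url` is the search URL recorded right after the first search.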