下午的朋友们
从上一个问题开始,
我已经创建了一个小的python应用程序,将为我从网站上抓取招聘信息。
我将它们保存在一个小的sql数据库中。
列为Job_id / job_title / job company / job_salary / job_location / job_post_date
我计划每天运行一次脚本,希望忽略重复的条目。
哪种类型的查询可以检查2列,即标题和公司,以确保它不会再次插入,发布的日期将始终每天递增1
Python代码
import mysql.connector
import requests
from bs4 import BeautifulSoup
my_db = mysql.connector.connect(
    host="192.168.1.1",
    user='job_user',
    password='job1',
    database='job_db'
)
my_cursor = my_db.cursor()
radius = "10"
val1 = input("Enter Job: ")
val2 = input("Enter Location: ")
val3 = input("Enter Radius default:(10): ")
url = "https://www.indeed.co.uk/jobs?q={}&l={}&radius={}".format(val1, val2, val3)
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
result1 = soup.find_all(class_="jobsearch-SerpJobCard")
for results in result1:
    job = results.find('a', attrs={'data-tn-element': 'jobTitle'})
    company = results.find('span', attrs={'class': 'company'})
    location = results.find('span', attrs={'class': 'location accessible-contrast-color-location'})
    salary = results.find('span', attrs={'class': 'salaryText'})
    date_pos = results.find('span', attrs={'class': 'date'})
    i1 = job.text.strip()
    i2 = company.text.strip()
    if location is not None:
        i3 = location.text.strip()
    else:
        i3 = "N/A"
    if salary is not None:
        i4 = salary.text.strip()
    else:
        i4 = "N/A"
    i5 = date_pos.text.strip()[:1]
    print(i1)
    print(i2)
    print(i3)
    print(i4)
    print("\n")
    sql = "INSERT INTO job_tbl (job_title, job_company, job_salary, job_location, job_posted) \
    VALUES (%s, %s, %s, %s, %s)"
    val = (i1, i2, i3, i4, i5)
    my_cursor.execute(sql, val)
my_db.commit()SQL查询
+--------+-------------------------------------------------------------------+-------------------------------+----------------+----------------------------+------------+
| job_id | job_title                                                         | job_company                   | job_salary     | job_location               | job_posted |
+--------+-------------------------------------------------------------------+-------------------------------+----------------+----------------------------+------------+
|      1 | IT Technician                                                     | Strathallan School            | N/A            | £19,000 - £23,000 a year   | 3          |
|      2 | English into Romanian IT/Technical Translator (relocation to...   | Alpha CRC Ltd.                | N/A            | £22,000 - £25,000 a year   | 7          |
|      3 | IT/Trainee IT Support Analyst                                     | CJ Lang & Son Limited         | Dundee DD4 8JU | N/A                        | 3          |
|      4 | IT Technical Support Apprentice                                   | GP Strategies Training Ltd    | Dundee         | £10,000 - £12,000 a year   | 1          |
|      5 | IT Operations Manager - IRC84524                                  | Scottish Government           | Dundee DD1     | £48,930 - £61,006 a year   | 3          |
|      6 | Temporary IT Buyer                                                | brightsolid                   | Dundee         | N/A                        | 7          |
|      7 | IT Site Support Analyst                                           | Thermo Fisher Scientific      | Perth          | N/A                        | 6          |
|      8 | Network and System Administrator                                  | Solutions Driven              | Forfar         | £30,000 - £35,000 a year   | 3          |
|      9 | IT Service Desk Team Leader                                       | Cross Resourcing              | Dundee         | N/A                        | 3          |
|     10 | Senior Network Engineer                                           | Raytheon Intelligence & Space | Glenrothes     | N/A                        | 3          |
|     11 | Solutions Architect                                               | NCR                           | Dundee         | N/A                        | 3          |
|     12 | Technical Support Specialist                                      | The Army                      | N/A            | £15,985 - £20,400 a year   | 3          |
|     13 | Pre-Sales Solutions Architect – 12 Month Graduate Internship...   | DELL                          | N/A            | N/A                        | 3          |
+--------+-------------------------------------------------------------------+-------------------------------+----------------+----------------------------+------------+
13 rows in set (0.002 sec)如果我再次运行相同的应用程序,它将添加相同的结果,我想要的是匹配标题和公司,并检查它是否已经添加
发布于 2020-11-13 23:52:18
通常,您会对该列的元组设置一个唯一约束,并在插入时使用update ... on duplicate key,这样就不会插入重复的列,而是更新当前行的日期。
所以就像这样:
create table mytable (
    id int primary key auto_increment,
    title       varchar(50),   -- ajust the size as needed
    company     varchar(50),
    salary      int,
    location    varchar(50),
    post_date   datetime 
        default current_timestamp,  -- not mandatory, but maybe helpful?
    unique (title, company)
);然后:
insert into mytable (title, company, salary, location)
values (?, ?, ?, ?)
on duplicate key update post_date = current_timestamp;https://stackoverflow.com/questions/64823407
复制相似问题