需求場景:
老大讓我利用爬蟲爬取的數據寫到或更新到mysql數據庫中,百度了兩種方法
1 是使用pymysql連接mysql,通過操作原生的sql語句進行增刪改查數據;
2 是使用sqlalchemy連接mysql,通過ORM模型建表并操作數據庫,不需要寫原生的sql語句,相對簡單些;
以下就是本次使用sqlalchemy的經驗之談。
實現流程:連接數據庫》通過模型類創建表》建立會話》執行創建表語句》通過會話進行增刪改查
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
|
from sqlalchemy import exists, Column, Integer, String, ForeignKey, exists from sqlalchemy.ext.declarative import declarative_base from sqlalchemy import create_engine from sqlalchemy.orm import sessionmaker # 創建的數據庫引擎 engine = create_engine( "mysql+pymysql://user:pwd@ip/數據庫名?charset=utf8" ) #創建session類型 DBSession = sessionmaker(bind = engine) # 實例化官宣模型 - Base 就是 ORM 模型 Base = declarative_base() # 創建服務單表 class ServiceOrder(Base): __tablename__ = 'serviceOrderTable' id = Column(Integer, primary_key = True , autoincrement = True ) serviceOrderId = Column(String( 32 ), nullable = False , index = True , comment = '服務單ID' ) serviceDesc = Column(String( 268 ), comment = '服務說明' ) oneLevelName = Column(String( 32 ), comment = 'C類別' ) twoLevelName = Column(String( 32 ), comment = 'T子類' ) threeLevelName = Column(String( 32 ), comment = 'I項目' ) fourLevelName = Column(String( 32 ), comment = 'S子項' ) transferTimes = Column(String( 32 ), comment = '轉派次數' ) overDueStatus = Column(String( 32 ), comment = '過期狀態' ) serviceTimeLimit = Column(String( 32 ), comment = '服務時限' ) serTimeLimitTypeName = Column(String( 16 ), comment = '時限類型' ) # 一對多: # serviceWorkOrder = relationship("ServiceWorkOrder", backref="serviceorder") # 多對一:多個服務工單可以屬于服務單 class ServiceWorkOrder(Base): __tablename__ = 'serviceWorkOrderTable' id = Column(Integer, primary_key = True , autoincrement = True ) serviceWorkOrderId = Column(String( 32 ), nullable = False , index = True , comment = '服務工單ID' ) workOrderName = Column(String( 268 ), comment = '工單名稱' ) fromId = Column(String( 32 ), comment = '服務單ID' ) createUserSectionName = Column(String( 32 ), comment = '創建人室' ) createUserName = Column(String( 32 ), comment = '創建人' ) handlerName = Column(String( 32 ), comment = '處理人' ) statusName = Column(String( 32 ), comment = '工單狀態' ) createTime = Column(String( 32 ), comment = '創建時間' ) # “多”的一方的book表是通過外鍵關聯到user表的: # serviceOrder_id = Column(Integer, ForeignKey('serviceOrderTable.id')) # 創建數據庫 如果數據庫已存在 則不會創建 會根據庫名直接連接已有的庫 def init_db(): Base.metadata.create_all(engine) def drop_db(): Base.metadata.drop_all(engine) def insert_update(): # all_needed_data_lists 是需要插入數據庫的數據 格式[{key: value, ... }, { }, { }...] for item in all_needed_data_lists: ServiceOrderRow = ServiceOrder(serviceOrderId = item[ 'serviceOrderId' ], serviceDesc = item[ 'serviceDesc' ], oneLevelName = item[ 'oneLevelName' ], twoLevelName = item[ 'twoLevelName' ], threeLevelName = item[ 'threeLevelName' ], fourLevelName = item[ 'fourLevelName' ], transferTimes = item[ 'transferTimes' ], overDueStatus = item[ 'overDueStatus' ], serviceTimeLimit = item[ 'serviceTimeLimit' ], serTimeLimitTypeName = item[ 'serTimeLimitTypeName' ], ) try : # 利用exists判斷目標對象是否存在,返回True或Faults it_exists = session.query( exists().where(ServiceOrder.serviceOrderId = = item[ 'serviceOrderId' ] ) ).scalar() except Exception as e: self .log.error(e) break try : # 如果不存在,進行新增;存在的話就更新現存的數據 if not it_exists: session.add(ServiceOrderRow) else : session.query(ServiceOrder). filter (ServiceOrder.serviceOrderId = = item[ 'serviceOrderId' ])\ .update(item) except Exception as e: self .log.error(e) break try : session.commit() self .log.info( '數據更新成功!' ) except : session.rollback() self .log.info( '數據更新失??!' ) if __name__ = = "__main__" : # 創建數據庫 如果數據庫已存在 則不會創建 會根據庫名直接連接已有的庫 init_db() # 創建session對象,進行增刪改查: session = DBSession() # 利用session 增 改數據 記得提交 insert_update() |
以上就是本文的全部內容,希望對大家的學習有所幫助,也希望大家多多支持服務器之家。
原文鏈接:https://www.cnblogs.com/We612/p/12105135.html