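Many apps these days prefer to build inside WeChat or Alipay mini programs. It is convenient, secure, comes with built-in traffic, and needs no separate download. Plenty of users will simply walk away if you ask them to install an app, since you are rarely the only one offering that kind of product.
I have done a lot of WeChat mini program crawling jobs, and I am writing this down today so that I do not forget it after a long time without use. WeChat mini programs fall into two broad categories: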
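1. No login required (not analyzed here, since there is basically no anti-crawling)
2. Login required
2.1 After one login the token stays valid permanently
2.2 After one login the token expires within a few minutes to a few hours
2.2.1 Some time after login, once the token has expired, WeChat's internal method has to be called again to generate a code that is exchanged for a new token (the main subject of this post)
2.2.2 Like 2.2.1, plus one more check such as an image CAPTCHA, similar to the Moutai reservation on a WeChat official account (not analyzed here)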
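Login in a WeChat mini program is not quite like other web logins. A typical web or app login is basically username + password + a verification code (image or SMS). With WeChat, logging in requires the user's authorization, after which an internal WeChat method generates a code. The code can be used only once and is invalid afterwards; according to WeChat, it is valid for about 5 minutes.
The detailed flow is described here: https://developers.weixin.qq.com/community/develop/doc/000c2424654c40bd9c960e71e5b009?highLine=code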
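One mini program I crawled had the following anti-crawling scheme: a token is valid for one hour, and a single token can be used roughly 100 times. Once the usage count of a single token, or the usage count within one hour, goes over 100, the account is banned outright, and there is also rate control within each 24-hour window. So I had to fetch a new token every hour. Of course, being a programmer, I am not going to fetch that token by hand every hour; after all, that is not our style.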
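The limits described above can also be tracked on the crawler side, so that a token is retired before the site has a reason to ban the account. Below is a minimal sketch, not part of the original project: `TokenBudget` is a hypothetical helper, and the default margins (90 uses, 55 minutes) are my own assumption of a safety buffer under the observed limits of about 100 uses and one hour.

```python
import time


class TokenBudget:
    """Tracks the age and use count of one token (hypothetical helper)."""

    def __init__(self, token, max_uses=90, max_age=55 * 60, now=None):
        self.token = token
        self.max_uses = max_uses      # assumed margin below the ~100-use limit
        self.max_age = max_age        # assumed margin below the 1-hour validity
        self.created = time.time() if now is None else now
        self.uses = 0

    def usable(self, now=None):
        """True while the token is both young enough and not used too often."""
        now = time.time() if now is None else now
        return self.uses < self.max_uses and (now - self.created) < self.max_age

    def take(self, now=None):
        """Return the token and count one use, or None when it should be retired."""
        if not self.usable(now):
            return None
        self.uses += 1
        return self.token
```

The crawler would call `take()` before every request and stop (or wait for a fresh token) as soon as it returns `None`.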
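What this needs is python + fiddler + appium + an emulator. The rough idea: appium drives the emulator to tap the WeChat mini program at regular intervals; fiddler pulls the token out of the request headers and writes it to a local file; a python program then periodically checks whether that local file has been updated and, if it has, uses a regular expression to extract token_list and takes the last entry. The last one is taken because the currently saved token may already have expired: the mini program first retries a request with that token and, if it has expired, calls WeChat's internal method to generate a code and exchange it for a new token. My main crawler code runs on a server, so I also added Redis to store the token.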
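The "check whether the file was updated, then take the last match" step can be sketched as follows. This is an illustration under assumptions rather than the project's actual code: `read_latest_token` is a hypothetical function, and it uses the file's modification time to detect updates, whereas the project code further down deletes the file after reading it instead.

```python
import os
import re


def read_latest_token(path, pattern, last_mtime):
    """Return (token, mtime).

    token is None when the file is missing, has not changed since
    last_mtime, or contains no match for pattern.
    """
    if not os.path.exists(path):
        return None, last_mtime
    mtime = os.path.getmtime(path)
    if mtime <= last_mtime:
        return None, last_mtime
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(pattern, f.read())
    # The last match is the newest token; earlier ones may already be expired.
    return (tokens[-1] if tokens else None), mtime
```

A caller keeps the returned `mtime` and passes it back in on the next poll, so an unchanged file is never parsed twice.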
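I. Simulating taps in WeChat
WeChat is driven to tap, swipe, exit, and so on at whatever time interval the job requires. In the code below, send_msg from ding_talk is an added DingTalk notification; it is not included here. If you need it, see the DingTalk bot documentation, or adapt the alerting to your own needs.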
weixin.py
import time
import logging

from appium import webdriver
from ding_talk import send_msg
from handle_file import EnToken
from conf.dbr import RedisClient
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from config import *

LOG_FORMAT = "%(asctime)s - %(levelname)s - line:%(lineno)s - msg:%(message)s"
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
# logging.FileHandler(filename='app.log', encoding='utf-8')


# Fetch the en token through WeChat
class WeChat(object):
    def __init__(self):
        """
        Initialization
        """
        # Driver configuration
        self.desired_caps = {
            'platformName': PLATFORM,
            'deviceName': DEVICE_NAME,
            'appPackage': APP_PACKAGE,
            'appActivity': APP_ACTIVITY,
            'noReset': True
        }
        self.driver = webdriver.Remote(DRIVER_SERVER, self.desired_caps)
        self.wait = WebDriverWait(self.driver, TIMEOUT)
        self.hours_en = 60 * 60 * 1.1  # en: simulate a tap once every 1.1 hours
        self.date_start_en = time.time()  # en start time
        self.date_end_en = 0  # en runs again once this time has passed
        # self.date_end_en = self.date_start_en + self.hours_en
        self.week = 60 * 60 * 24 * 7  # refresh the xd token on a weekly schedule
        self.week_start_xd = time.time()  # xd start time
        self.week_end_xd = 0  # next allowed xd run, driven by the weekly schedule
        self.week_start_xiu = time.time()  # xiu start time
        self.week_end_xiu = 0  # next allowed xiu run, driven by the weekly schedule

    def login(self):
        """
        Log in to WeChat
        :return:
        """
        # Login button
        a = time.time()
        try:
            login = self.wait.until(EC.presence_of_element_located((By.ID, 'com.tencent.mm:id/f34')))
            login.click()
        except Exception as e:
            logging.info(f'failed login {e}')
        b = time.time() - a
        logging.info(f'click login,use time {b}')
        # Phone number input
        try:
            phone = self.wait.until(EC.presence_of_element_located((By.ID, 'com.tencent.mm:id/bem')))
            phone.set_text(USERNAME)
        except Exception as e:
            logging.info(f'something wrong{e}')
        c = time.time() - a - b
        logging.info(f'send keys phone nums use time {c}')
        # Next step
        try:
            next_button = self.wait.until(EC.element_to_be_clickable((By.ID, 'com.tencent.mm:id/dw1')))
            next_button.click()
        except Exception as e:
            logging.info(f'something wrong{e}')
        d = time.time() - a - b - c
        logging.info(f'click next bottom use time {d}')
        # Password (the text must match what the WeChat UI shows on the device)
        password = self.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@text="请填写微信密码"]')))
        password.set_text(PASSWORD)
        e = time.time() - a - b - c - d
        logging.info(f'send keys password use time {e}')
        # Submit
        # submit = self.wait.until(EC.element_to_be_clickable((By.ID, 'com.tencent.mm:id/dw1')))
        submit = self.wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@text="登录"]')))
        submit.click()
        f = time.time() - a - b - c - d - e
        logging.info(f'commit password use time {f}')

    def run(self):
        """
        Entry point
        :return:
        """
        # After swiping, wait for the en mini program to appear
        self.slide_down()
        time.sleep(10)
        # Tap to enter the en mini program
        self.touch_en()
        # Compare against the current time so the weekly tasks become due again;
        # comparing end against start would only ever be true on the first run.
        if self.week_end_xd < time.time():
            self.week_start_xd = time.time()
            self.week_end_xd = self.week_start_xd + self.week
            print('tap xd')
            self.touch_xd()
        elif self.week_end_xiu < time.time():
            self.week_end_xiu = time.time() + self.week
            print('xiu')
            self.touch_xiu()
        time.sleep(10)
        # Exit the mini program
        self.driver_closed()
        print('driver closed')
        emt = EnToken()
        token_res = emt.token_2_redis()
        if not token_res:
            print('a failure message needs to be sent')
            return False
        return True

    def slide_down(self):
        """
        Swipe down on the WeChat screen so the mini program can be tapped
        :return:
        """
        window_size_phone = self.driver.get_window_size()
        phone_width = window_size_phone.get('width')
        phone_height = window_size_phone.get('height')
        time.sleep(15)
        x1 = phone_width * 0.5
        y1 = phone_height * 0.7
        y2 = phone_height * 0.26
        logging.info('prepare slide down')
        a = time.time()
        self.driver.swipe(x1, y2, x1, y1, 2050)
        logging.info(f'slide down success use time {time.time() - a}')

    def touch_en(self):
        """
        On every call, check whether the time is up; the tap is only performed once it is
        :param: en stands for en; xd stands for xd; xiu stands for xiu.
        :return: None
        """
        print(self.date_end_en, time.time())
        if self.date_end_en < time.time():  # timed out, so tap again
            print('simulating tap on en')
            # Redefine the end time
            print(self.date_end_en, time.time())
            self.date_end_en = time.time() + self.hours_en  # move the end time n hours ahead
            print(self.date_end_en, time.time())
            try:
                en_app = self.wait.until(
                    EC.presence_of_element_located((By.XPATH, "//android.widget.TextView[@text='textname…']")))
                # en_master = self.wait.until(EC.presence_of_element_located((By.ID, 'com.tencent.mm:id/hu')))
                # en_master = self.wait.until(
                #     EC.presence_of_element_located((By.XPATH, "//android.widget.TextView[@text='textname']")))
                en_app.click()
                logging.info('located by app_name en')
            except Exception as error:
                logging.info(f'failed located by id:{error}')
            time.sleep(20)
            # Tap the button that closes the mini program
            print('close the en app')
            close_button = self.wait.until(EC.presence_of_element_located(
                (By.XPATH, "//android.widget.FrameLayout[2]/android.widget.ImageButton")))
            close_button.click()
            print('tapped close mini program')

    def touch_xd(self):
        """
        Has to handle both the already-logged-in case and the case where login is needed again
        :return:
        """
        # Tap to enter the mini program
        logging.info('click app xd')
        xd_app = self.wait.until(EC.presence_of_element_located(
            (By.XPATH, "//android.widget.TextView[@text='textname']")))
        xd_app.click()
        time.sleep(20)
        # When the page asks for your location, tap allow
        print('tap confirm to share the current location')
        self.driver.tap([(510, 679)], 500)
        # Tap to enter the personal center
        time.sleep(10)
        logging.info('click personal xd')
        self.driver.tap([(540, 1154)], 500)
        # Tap quick login
        time.sleep(10)
        logging.info('click login xd')
        self.driver.tap([(270, 1030)], 500)
        # Tap agree to share avatar information
        time.sleep(10)
        logging.info('agree to share avatar and related info')
        self.driver.tap([(510, 775)], 500)
        time.sleep(20)
        # Tap the button that closes the mini program
        print('close the guaishou app')
        close_button = self.wait.until(
            EC.presence_of_element_located((By.XPATH, "//android.widget.FrameLayout[2]/android.widget.ImageButton")))
        close_button.click()
        print('done')
        time.sleep(30)

    def touch_xiu(self):
        """
        Simulated taps for xiu; has to consider whether login is required
        :return:
        """
        # Tap to enter the mini program
        logging.info('click app xiu')
        xiu_app = self.wait.until(EC.presence_of_element_located(
            (By.XPATH, "//android.widget.TextView[@text='xiu']")))
        xiu_app.click()
        # If the page asks to confirm access to the current location, tap confirm
        logging.info('click confirm xiu')
        time.sleep(15)
        confirm_loc = self.wait.until(
            EC.presence_of_element_located((By.XPATH, "//android.widget.Button[@text='确定']")))
        confirm_loc.click()
        # Tap the personal center
        logging.info('click personal xiu')
        time.sleep(5)
        try:
            personal = self.wait.until(
                EC.presence_of_element_located((By.XPATH, "//android.view.View[@content-desc='个人中心']")))
            personal.click()
        except Exception as e:
            print(e)
        # Tap quick login
        logging.info('click login xiu')
        time.sleep(5)
        try:
            login = self.wait.until(EC.presence_of_element_located(
                (By.XPATH, "//android.view.View[@content-desc='立即登录']")))
            login.click()
        except Exception as e:
            print('xiu is already logged in; no need to confirm login again')
        time.sleep(30)

    def driver_closed(self):
        self.driver.quit()


if __name__ == '__main__':
    conn_r = RedisClient(db=10)
    count_1 = 0
    # start_time = time.time()
    # end_time = time.time() + 60 * 60 * 1
    we_chat = WeChat()
    try:
        while 1:
            if conn_r.r_size() < 3:  # watch Redis; run once when Redis has run out of data
                res = we_chat.run()  # drive WeChat to tap the en mini program and generate a token
                if not res:
                    count_1 += 1
                    if count_1 > 10:
                        break  # leave the loop after ten failures
                # Throttle: after a token is generated, wait an hour before producing a new one,
                # so that one token is not used too many times and the account banned
                time.sleep(60 * 60)
                # run() quits the driver, so create a new session for the next round
                we_chat.driver = webdriver.Remote(DRIVER_SERVER, we_chat.desired_caps)
                we_chat.wait = WebDriverWait(we_chat.driver, TIMEOUT)
            else:
                time.sleep(60 * 60)  # when data exists, wait an hour before checking again
    except Exception as e:
        msg = f'Business alert:' \
              f'\n en token fetch failed' \
              f'\n{e}'
        send_msg(msg)
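The timing logic in weixin.py (a "next allowed run" timestamp such as date_end_en that starts at 0 and is pushed forward after each tap) can be isolated into a small helper. The sketch below is only an illustration of that pattern; `Interval` is a hypothetical name and is not used by the code above.

```python
import time


class Interval:
    """Remembers when a task may run next (hypothetical helper)."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.next_run = 0.0  # 0 means "due immediately", like date_end_en = 0

    def due(self, now=None):
        """True once the current time has reached the next allowed run."""
        now = time.time() if now is None else now
        return now >= self.next_run

    def mark_done(self, now=None):
        """Push the next allowed run one interval into the future."""
        now = time.time() if now is None else now
        self.next_run = now + self.seconds


# One schedule per mini program, using the intervals from weixin.py.
en_schedule = Interval(60 * 60 * 1.1)
weekly_schedule = Interval(60 * 60 * 24 * 7)
```

The key point is that `due()` always compares against the current time, so a task becomes due again once its interval has passed.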
config.py
import os

# Platform
PLATFORM = 'Android'
# Device name, obtained via adb devices -l
DEVICE_NAME = 'MI_9'
# APP path
APP = os.path.abspath('.') + '/weixin.apk'
# APP package name
APP_PACKAGE = 'com.tencent.mm'
# Entry activity name
APP_ACTIVITY = '.ui.LauncherUI'
# Appium address
DRIVER_SERVER = 'http://localhost:4723/wd/hub'
# Time to wait for elements to load
TIMEOUT = 10
# WeChat phone number and password
USERNAME = 'wechatname'
PASSWORD = 'wechatpwd'
# Swipe points
FLICK_START_X = 300
FLICK_START_Y = 300
FLICK_DISTANCE = 700
Below is the file-handling module. It takes the captured token and puts it into Redis; adjust it however you see fit.
handle_file.py
import re
import os
import logging

from conf.dbr import RedisClient

LOG_FORMAT = "%(asctime)s - %(levelname)s - line:%(lineno)s - msg:%(message)s"
logging.basicConfig(level=logging.DEBUG, format=LOG_FORMAT)


# Move the en token into Redis
class EnToken(object):
    def __init__(self):
        # self.token_path = 'F:\\en.txt'
        # self.token_path = 'F:\\xiu.txt'
        # self.token_path = 'F:\\xd.txt'
        self.conn = RedisClient(db=10)  # parsing of daily prices
        self.conn_en = RedisClient(db=9)  # parsing of shop locations within the current lat/lng range

    # Process the token files: read the tokens from each file, keep only the last one,
    # then delete the local file
    @staticmethod
    def handle_en_txt():
        token_dict = {}
        path_token_list = [
            ('en', '>(e.*?)-->'),
            ('xd', 'headers-->(.*?)-->'),
            ('xiu', r'>(\d+)-->'),
        ]
        for i in path_token_list:
            token_path = f'F:\\{i[0]}.txt'
            token_re = i[-1]
            if os.path.exists(token_path):
                with open(token_path, mode='r', encoding='utf-8') as f:
                    token_str = f.read()
                # token_list = re.findall('>(e.*?)-->', token_str)
                # token_list = re.findall('>(Q.*?)-->', token_str)
                # token_list = re.findall('>(\d+)-->', token_str)
                token_list = re.findall(token_re, token_str)
                print(token_list)
                if token_list:
                    token = token_list[-1]
                    print(token)
                    token_dict[i[0]] = token
                # Delete only after the file has been closed
                os.remove(token_path)
            else:
                logging.info('file_en_dont_exit')
        return token_dict

    # Put the tokens into Redis
    def token_2_redis(self):
        """
        If tokens exist, store each one in Redis under its app name
        :return: True if at least one token was stored, otherwise False
        """
        token_dict = self.handle_en_txt()
        print(token_dict)
        saved = False
        if token_dict:
            for token_key, token_val in token_dict.items():
                self.conn.set(token_key, token_val, over_time=None)
                # self.conn.set(token_key, token, over_time=60*65)  # expire after 65 minutes
                # self.conn_en.set(token_key, token, over_time=60*65)  # expire after 65 minutes
                logging.info(f'token success {(token_key, token_val)}')
            saved = True
        else:
            logging.info("token doesn't exist")
        # Close the connections on both paths before returning
        self.conn.close()
        self.conn_en.close()
        return saved


if __name__ == '__main__':
    en = EnToken()
    en.token_2_redis()
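The RedisClient wrapper from conf.dbr is not shown in this post, so its over_time parameter cannot be checked here. As a rough equivalent, the sketch below stores tokens with an expiry through a client that follows redis-py's `set(name, value, ex=seconds)` signature. `store_tokens` and `FakeRedis` are hypothetical names; the 65-minute default mirrors the commented-out lines in handle_file.py, and `FakeRedis` exists only so the example runs without a Redis server.

```python
def store_tokens(client, tokens, ttl_seconds=65 * 60):
    """Write each token under its app name with an expiry.

    `client` is assumed to expose redis-py's set(name, value, ex=...).
    Returns the number of tokens written.
    """
    for name, token in tokens.items():
        client.set(name, token, ex=ttl_seconds)
    return len(tokens)


class FakeRedis:
    """In-memory stand-in that records what would be sent to Redis."""

    def __init__(self):
        self.data = {}

    def set(self, name, value, ex=None):
        self.data[name] = (value, ex)
        return True


client = FakeRedis()
store_tokens(client, {"en": "eABC"})
```

With an expiry slightly longer than the token's one-hour validity, a stale token disappears from Redis on its own and the crawler cannot pick it up by accident.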
II. Configuring fiddler to write request-header information to a local file
Modify the FiddlerScript and add the following in the part of the script that handles outgoing data requests
fiddler
if (oSession.oRequest["Host"] == "put the request host here") {
    // Backslashes in the path must be escaped
    var filename = "F:\\en.txt";
    var curDate = new Date();
    var logContent = 'en' + "[" + curDate.toLocaleString() + "]";
    var sw : System.IO.StreamWriter;
    if (System.IO.File.Exists(filename)) {
        // File already exists: append the token followed by the full headers
        sw = System.IO.File.AppendText(filename);
        sw.Write(logContent + 'oSession.oRequest.headers-->' + oSession.oRequest.headers['x-wx-token'] + '-->' + oSession.oRequest.headers + '\n');
        // sw.Write("Request header:" + "\n" + oSession.oRequest.headers);
        // sw.Write(wap_s + '\n\n')
    } else {
        // First write: create the file and write the token only
        sw = System.IO.File.CreateText(filename);
        sw.Write(logContent + 'oSession.oRequest.headers-->' + oSession.oRequest.headers['x-wx-token'] + '-->' + '\n');
        // sw.Write("Request header:" + "\n" + oSession.oRequest.headers);
        // sw.Write(wap_s + '\n\n')
    }
    sw.Close();
    sw.Dispose();
}
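To see how the lines written by the script line up with the regular expressions in handle_file.py, the sketch below rebuilds one line in the same shape and extracts the token from it. The patterns are copied from handle_file.py; `fiddler_line` and `last_token` are hypothetical helpers, and the timestamp and token values are made-up placeholders, since the real output of toLocaleString() depends on the machine's locale.

```python
import re

# Patterns copied from handle_file.py.
PATTERNS = {
    "en": r">(e.*?)-->",
    "xd": r"headers-->(.*?)-->",
    "xiu": r">(\d+)-->",
}


def fiddler_line(app, stamp, token):
    """Build one line in the shape the script writes when it creates the file."""
    return f"{app}[{stamp}]oSession.oRequest.headers-->{token}-->\n"


def last_token(app, text):
    """Return the newest token for app found in text, or None."""
    found = re.findall(PATTERNS[app], text)
    return found[-1] if found else None


# Placeholder values, for illustration only.
log = fiddler_line("en", "placeholder-time-1", "eTOKEN1")
log += fiddler_line("en", "placeholder-time-2", "eTOKEN2")
newest = last_token("en", log)
```

Note that the en pattern only matches tokens beginning with the letter e and the xiu pattern only matches purely numeric tokens, so the patterns have to be adjusted if a site changes its token format.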
III. Main crawler business code
Adjust your own business code here according to your requirements and logic.
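As one possible starting point, the crawler can read the stored token and attach it to each request in the x-wx-token header that the fiddler script captured. The sketch below is an assumption-laden illustration: `build_request` is a hypothetical function, `https://example.com/api` is a placeholder URL, and `token_store` is anything with a `get(key)` method, such as a plain dict or a redis-py client. Only the request object is built; nothing is sent.

```python
from urllib.request import Request


def build_request(url, token_store, app="en"):
    """Build a request carrying the stored token for app in x-wx-token."""
    token = token_store.get(app)
    if token is None:
        raise LookupError(f"no token stored for {app!r}")
    if isinstance(token, bytes):
        # redis-py returns bytes unless decode_responses=True is set
        token = token.decode("utf-8")
    return Request(url, headers={"x-wx-token": token})


# A dict stands in for Redis here; the URL is a placeholder.
req = build_request("https://example.com/api", {"en": b"eABC"})
```

Raising an error when no token is stored keeps the crawler from sending unauthenticated requests while the token is being refreshed.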
Original article: https://www.cnblogs.com/blackball9/p/15337868.html