
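1. Introduction

The Spring Festival release "Hi, Mom" (你好，李焕英) had passed 4.2 billion yuan at the box office according to the latest figures of February 23, overtaking the other holiday films and becoming the dark horse of 2021.

Python web scraping: box-office data analysis for "Hi, Mom"

From sketch-comedy actress to director: why has Jia Ling's directorial debut "Hi, Mom" been such a hit? In what follows, the author (Rongzai) uses Python and several film websites to look, from a few different angles, at why this film earned such a high box office.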

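2. Scraping reviews and word-cloud analysis

Film criticism in China has clearly been changing along with shifts in the wider social and cultural context and the turnover of venues and media. After print criticism dominated Chinese film criticism for a hundred years, television criticism, online criticism, new-media criticism and other hybrid forms of critical discourse appeared in turn. The production and circulation of film criticism has entered a democratic, pluralistic era.

The purpose of film criticism is to analyze, assess and evaluate the aesthetic value, cognitive value, social significance and visual language of what is on screen, and to explain the theme the film expresses. On the one hand, analyzing a film's successes and failures helps directors broaden their horizons and improve their craft, which promotes the growth of film as an art. On the other hand, analysis and evaluation shape how audiences understand and appreciate a film and raise their level of appreciation, which promotes film art indirectly.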

2.1 Choosing the site

Python scraping in practice: scraping Douban film-review data

2.2 Scraping approach

Steps for scraping the Douban review data:
1. Request the web page
2. Parse the fetched page
3. Extract the review data
4. Save to a file
5. Run the word-cloud analysis

2.2.1 Requesting the page

This example uses the selenium library.

Import the library

# Import the library
from selenium import webdriver

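Browser driver

# Path to the browser driver (Selenium 4 passes it through a Service object)
from selenium.webdriver.chrome.service import Service
chromedriver = "E:/software/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(service=Service(chromedriver))

Open the page

driver.get("put the URL here")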

2.2.2 Parsing the fetched page

Press F12 to open the developer tools, locate the element that holds the data, and copy its XPath.

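2.2.3 Extracting the review data

The review text is extracted with XPath.

# Works in Selenium 3 and 4; needs: from selenium.webdriver.common.by import By
elem = driver.find_element(By.XPATH, '//*[@id="comments"]/div[{}]/div[2]/p/span'.format(i + 1))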
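To make it easier to see what that XPath selects, here is a minimal sketch that runs the same path against a tiny stand-in document using only the standard library. The markup in SAMPLE is a hypothetical simplification, not Douban's real page, and the helper names comment_xpath and extract_comments are made up for illustration. ElementTree supports only a subset of XPath and needs a relative path, hence the leading ".//" instead of "//".

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the comment markup.
SAMPLE = """
<html><body>
  <div id="comments">
    <div><div>avatar</div><div><p><span>Laughed, then cried.</span></p></div></div>
    <div><div>avatar</div><div><p><span>Jia Ling surprised me.</span></p></div></div>
  </div>
</body></html>
"""


def comment_xpath(index):
    """Return the path to the text span of the index-th (1-based) comment."""
    return ".//*[@id='comments']/div[{}]/div[2]/p/span".format(index)


def extract_comments(markup):
    """Collect comment texts until an index no longer matches any element."""
    root = ET.fromstring(markup)
    texts = []
    index = 1
    while True:
        span = root.find(comment_xpath(index))
        if span is None:
            break
        texts.append(span.text)
        index += 1
    return texts


print(extract_comments(SAMPLE))
```

The loop stops at the first index with no match, which plays the same role as the exception that Selenium raises when a comment block does not exist.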

2.2.4 Saving to a file

import os
import codecs

# Create the folder and the file
basePathDirectory = "Hudong_Coding"
if not os.path.exists(basePathDirectory):
    os.makedirs(basePathDirectory)
baiduFile = os.path.join(basePathDirectory, "hudongSpider.txt")
# Create the file if it does not exist, otherwise append to it
if not os.path.exists(baiduFile):
    info = codecs.open(baiduFile, "w", "utf-8")
else:
    info = codecs.open(baiduFile, "a", "utf-8")
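As a side note, the same create-or-append behavior can be sketched more compactly, because opening a file in mode "a" already creates it when it is missing. The helper names append_lines and read_lines and the demo path are hypothetical, and the demo writes into a temporary folder rather than the folder used in the article.

```python
import os
import tempfile


def append_lines(path, lines):
    """Append each line to a UTF-8 text file, creating the folder and file if needed."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # Mode "a" creates the file when it is missing, so no exists() check is needed.
    with open(path, "a", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


def read_lines(path):
    """Read the file back as a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


# Hypothetical demo path inside a throwaway temporary folder.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "DouBan_FilmReview", "demo.txt")
append_lines(demo_path, ["first review"])
append_lines(demo_path, ["second review", "third review"])
print(read_lines(demo_path))
```

Using a with block also closes the file automatically, so nothing is left unflushed if the scraper stops early.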

Writing to the txt file

info.writelines(elem.text + "\n")

2.2.5 Word-cloud analysis

The word-cloud analysis uses the jieba and wordcloud libraries.

Note: a figure at this point in the original post showed how to pick the path for the text font.
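A word cloud sizes each word by how often it occurs, so the step between segmentation and drawing is essentially a frequency count. The sketch below shows that step with the standard library only. The token list is a made-up stand-in for what jieba.lcut returns, and STOP_WORDS and word_frequencies are hypothetical names chosen for this illustration.

```python
from collections import Counter

# Hypothetical token list, shaped like the output of jieba.lcut(text).
tokens = ["电影", "很", "感人", "，", "妈妈", "很", "感人", " ", "电影"]

# Hypothetical stop words: punctuation, whitespace and filler words.
STOP_WORDS = {"，", " ", "很"}


def word_frequencies(token_list, stop_words):
    """Count tokens, skipping stop words and whitespace-only tokens."""
    kept = [t for t in token_list if t.strip() and t not in stop_words]
    return Counter(kept)


freqs = word_frequencies(tokens, STOP_WORDS)
print(freqs.most_common(2))
```

A mapping like freqs can be handed to WordCloud.generate_from_frequencies instead of passing a space-joined string to generate, which gives direct control over which tokens are dropped.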

2.3 The complete code

2.3.1 Scraping code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import codecs
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Fetch the short reviews and append them to a text file
def getFilmReview():
    driver = None
    info = None
    try:
        # Create the folder and the file
        basePathDirectory = "DouBan_FilmReview"
        if not os.path.exists(basePathDirectory):
            os.makedirs(basePathDirectory)
        baiduFile = os.path.join(basePathDirectory, "DouBan_FilmReviews.txt")
        # Create the file if it does not exist, otherwise append to it
        if not os.path.exists(baiduFile):
            info = codecs.open(baiduFile, "w", "utf-8")
        else:
            info = codecs.open(baiduFile, "a", "utf-8")

        # Path to the browser driver
        chromedriver = "E:/software/chromedriver_win32/chromedriver.exe"
        driver = webdriver.Chrome(service=Service(chromedriver))
        # Open the pages
        for k in range(15000):  # roughly 15,000 pages
            # Douban lists 20 comments per page, so step `start` by 20
            g = 20 * k
            driver.get("https://movie.douban.com/subject/34841067/comments?start={}".format(g))
            try:
                # Walk through the comment blocks on the page
                for i in range(21):
                    elem = driver.find_element(By.XPATH, '//*[@id="comments"]/div[{}]/div[2]/p/span'.format(i + 1))
                    print(elem.text)
                    info.writelines(elem.text + "\n")
            except Exception:
                # No further comment block on this page; move on to the next page
                pass

    except Exception as e:
        print("Error:", e)

    finally:
        print("\n")
        # Close the file and the browser even if something failed earlier
        if info is not None:
            info.close()
        if driver is not None:
            driver.quit()

# Entry point
def main():
    print("Scraping started")
    getFilmReview()
    print("Scraping finished")

if __name__ == "__main__":
    main()
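The paging logic of the scraper can be looked at in isolation. The sketch below only builds the list-page URLs and makes no network request. The generator name comment_page_urls and the default page_size of 20 are assumptions made for this illustration; the subject id is the one used in the article's URL.

```python
def comment_page_urls(subject_id, pages, page_size=20):
    """Yield comment-list URLs whose start offset advances by page_size."""
    base = "https://movie.douban.com/subject/{}/comments?start={}"
    for page in range(pages):
        yield base.format(subject_id, page * page_size)


# Print the first three page URLs for the film used in this article.
for url in comment_page_urls("34841067", 3):
    print(url)
```

Keeping the offset arithmetic in one small function makes it easy to check that consecutive pages do not overlap before starting a long crawl.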


2.3.2 Word-cloud code

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import jieba                # Chinese word segmentation
import wordcloud            # word-cloud rendering

# Load the review text
with open("E:/software/PythonProject/DouBan_FilmReview/DouBan_FilmReviews.txt", encoding="utf-8") as f:
    txt = f.read()

# Segment the text and join the tokens with spaces
txt_list = jieba.lcut(txt)
# print(txt_list)
string = " ".join(txt_list)
print(string)

# Draw the word cloud from the collected review text
# mk = imageio.imread(r"path to the mask image")

w = wordcloud.WordCloud(width=1000,
                        height=700,
                        background_color="white",
                        font_path="C:/Windows/Fonts/simsun.ttc",
                        #mask=mk,
                        scale=15,
                        stopwords={" "},
                        contour_width=5,
                        contour_color="red"
                        )

w.generate(string)
w.to_file("DouBan_FilmReviews.png")


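3. Collecting real-time box-office data

3.1 Choosing the site

The data comes from the Maoyan box-office dashboard (https://piaofang.maoyan.com/dashboard), the address used as the Referer in the code below.

3.2 The code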

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import time
import datetime
import requests

class PF(object):
    def __init__(self):
        self.url = "https://piaofang.maoyan.com/dashboard-ajax?orderType=0&uuid=173d6dd20a2c8-0559692f1032d2-393e5b09-1fa400-173d6dd20a2c8&riskLevel=71&optimusCode=10"
        self.headers = {
            "Referer": "https://piaofang.maoyan.com/dashboard",
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36",
        }

    def main(self):
        while True:
            # Clearing the screen only works when this file is run from a Windows command prompt
            os.system("cls")
            result_json = self.get_parse()
            if not result_json:
                break
            results = self.parse(result_json)
            # Server time (UTC) shifted to Beijing time, date included
            calendar = result_json["calendar"]["serverTimestamp"]
            utc_time = datetime.datetime.strptime(calendar.split(".")[0], "%Y-%m-%dT%H:%M:%S")
            t = (utc_time + datetime.timedelta(hours=8)).strftime("%Y-%m-%d %H:%M:%S")
            print("Beijing time:", t)
            x_line = "-" * 155
            # Total box office
            total_box = result_json["movieList"]["data"]["nationBoxInfo"]["nationBoxSplitUnit"]["num"]
            # Unit of the total box office
            total_box_unit = result_json["movieList"]["data"]["nationBoxInfo"]["nationBoxSplitUnit"]["unit"]
            print(f"Today's total box office: {total_box} {total_box_unit}", end=f"\n{x_line}\n")
            print("Film".ljust(14), "Box office".ljust(11), "Box-office share".ljust(13), "Avg. occupancy".ljust(11), "Avg. admissions".ljust(11), "Screenings".ljust(12), "Screening share".ljust(12), "Cumulative".ljust(11), "Days in release", sep="\t", end=f"\n{x_line}\n")
            for result in results:
                print(
                    result["movieName"][:10].ljust(9),  # film title
                    result["boxSplitUnit"][:8].rjust(10),  # box office
                    result["boxRate"][:8].rjust(13),  # box-office share
                    result["avgSeatView"][:8].rjust(13),  # average seat occupancy
                    result["avgShowView"][:8].rjust(13),  # average admissions per screening
                    result["showCount"][:8].rjust(13),  # number of screenings
                    result["showCountRate"][:8].rjust(13),  # screening share
                    result["sumBoxDesc"][:8].rjust(13),  # cumulative box office
                    result["releaseInfo"][:8].rjust(13),  # release info
                    sep="\t", end="\n\n"
                )
                # Stop after the first film in the list
                break
            time.sleep(4)

    def get_parse(self):
        try:
            response = requests.get(self.url, headers=self.headers)
            if response.status_code == 200:
                return response.json()
        except requests.ConnectionError as e:
            print("ERROR:", e)
            return None

    def parse(self, result_json):
        if result_json:
            movies = result_json["movieList"]["data"]["list"]
            # average seat occupancy, average admissions, box-office share, film title,
            # release info (days in release), screenings, screening share, box office, cumulative box office
            ticks = ["avgSeatView", "avgShowView", "boxRate", "movieName",
                     "releaseInfo", "showCount", "showCountRate", "boxSplitUnit", "sumBoxDesc"]
            for movie in movies:
                self.piaofang = {}
                for tick in ticks:
                    # The number and its unit are stored separately and need joining
                    if tick == "boxSplitUnit":
                        movie[tick] = "".join([str(i) for i in movie[tick].values()])
                    # These two fields sit one level deeper, under movieInfo
                    if tick == "movieName" or tick == "releaseInfo":
                        movie[tick] = movie["movieInfo"][tick]
                    if movie[tick] == "":
                        movie[tick] = "no data"
                    self.piaofang[tick] = str(movie[tick])
                yield self.piaofang


if __name__ == "__main__":
    while True:
        pf = PF()
        pf.main()
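The time handling in main() relies on serverTimestamp being a UTC string with a "T" between date and time and a "." before the fractional seconds. Under that assumption, the following sketch converts the date and the time together, so the date also rolls over when the added eight hours cross midnight. The function name to_beijing_time and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is UTC+8 all year round.
BEIJING = timezone(timedelta(hours=8))


def to_beijing_time(server_timestamp):
    """Convert a UTC string such as '2021-02-23T17:30:05.123Z' to Beijing time."""
    # Drop the fractional seconds and any trailing "Z" before parsing.
    core = server_timestamp.split(".")[0].rstrip("Z")
    utc_time = datetime.strptime(core, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(BEIJING).strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical timestamp late in the UTC evening: the Beijing date is the next day.
print(to_beijing_time("2021-02-23T17:30:05.123Z"))  # 2021-02-24 01:30:05
```

Attaching an explicit time zone and calling astimezone avoids adding hours to a bare time value, which would silently keep the wrong date.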

3.3 Results

(The original post showed a screenshot of the console output here.)
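Without the live API, the shape of one output row can still be sketched offline. The record below is made up and only mirrors the nesting that parse() expects (movieInfo holding the title and release info, boxSplitUnit holding number and unit separately). The names sample_movie, flatten_movie and format_row are hypothetical, and only a few of the columns are shown.

```python
# Made-up record, shaped like one entry of result_json["movieList"]["data"]["list"].
sample_movie = {
    "movieInfo": {"movieName": "Hi, Mom", "releaseInfo": "Day 12"},
    "boxSplitUnit": {"num": "1.25", "unit": "亿"},
    "boxRate": "45.1%",
    "sumBoxDesc": "",
}


def flatten_movie(movie):
    """Pull the nested fields up into one flat dict of strings."""
    flat = {
        "movieName": movie["movieInfo"]["movieName"],
        "releaseInfo": movie["movieInfo"]["releaseInfo"],
        # The amount and its unit arrive separately and are joined here.
        "boxSplitUnit": "".join(str(v) for v in movie["boxSplitUnit"].values()),
        "boxRate": movie["boxRate"],
        "sumBoxDesc": movie["sumBoxDesc"],
    }
    # Replace empty values with a visible placeholder.
    return {key: (str(value) if value != "" else "no data") for key, value in flat.items()}


def format_row(flat):
    """Lay the fields out as fixed-width, tab-separated columns."""
    columns = [
        flat["movieName"][:10].ljust(10),
        flat["boxSplitUnit"][:8].rjust(10),
        flat["boxRate"][:8].rjust(10),
        flat["sumBoxDesc"][:8].rjust(10),
        flat["releaseInfo"][:8].rjust(10),
    ]
    return "\t".join(columns)


print(format_row(flatten_movie(sample_movie)))
```

Returning a new dict instead of overwriting the input keeps the raw API record intact, which helps when a field needs to be inspected again later.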

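4. Scraping the production stills

4.1 Choosing the site

The stills come from a photo gallery page on 1905.com (https://www.1905.com/newgallery/hdpic/1495100.shtml), the address passed to get_data in the code below.

4.2 The code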

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import re
import requests
from bs4 import BeautifulSoup

def get_data(url):
    # Request headers
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
    }
    # Request the page
    resp = requests.get(url, headers=headers)
    # Decode the raw HTML bytes into a UTF-8 string
    html = resp.content.decode("utf-8")
    # Parse with BeautifulSoup to narrow the search
    soup = BeautifulSoup(html, "html.parser")
    # Folder the images are saved into
    save_dir = "E:/IMAGES"
    os.makedirs(save_dir, exist_ok=True)
    # Walk through the href of every <a> tag
    for link in soup.find_all("a"):
        a = link.get("href")
        if isinstance(a, str):
            # Keep everything up to the first "jpg" in the link
            b = re.findall("(.*?)jpg", a)
            if not b:
                continue
            img_url = b[0] + "jpg"
            print(img_url)
            try:
                # Request the image
                image = requests.get(img_url, headers=headers).content
                # Save it under the file name taken from the end of the URL
                file_name = img_url.split("/")[-1]
                with open(os.path.join(save_dir, file_name), "wb") as img_file:
                    img_file.write(image)
            except Exception as e:
                print("Error:", e)

# Target page
if __name__ == "__main__":
    get_data("https://www.1905.com/newgallery/hdpic/1495100.shtml")
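Link filtering and file naming are the two places where an image scraper like this tends to go wrong, so here is a small offline sketch of both, using only the standard library instead of BeautifulSoup. The class JpgLinkParser, the helper file_name_for, the example.com addresses and SAMPLE_HTML are all hypothetical and do not reflect the real gallery markup.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class JpgLinkParser(HTMLParser):
    """Collect the href values of <a> tags that point at .jpg files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).path.lower().endswith(".jpg"):
            # Resolve relative and protocol-relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def file_name_for(url):
    """Derive a local file name from the path part of an image URL."""
    return os.path.basename(urlparse(url).path)


# Made-up markup with one protocol-relative link, one relative link and two non-matches.
SAMPLE_HTML = """
<a href="//image.example.com/stills/001.jpg">still 1</a>
<a href="/gallery/002.JPG?size=large">still 2</a>
<a href="/news/index.shtml">news</a>
<a>no href</a>
"""

parser = JpgLinkParser("https://www.example.com/gallery/page.shtml")
parser.feed(SAMPLE_HTML)
for link in parser.links:
    print(link, "->", file_name_for(link))
```

Checking the URL path rather than the whole string keeps query strings from hiding the extension, and taking the base name of the path gives a file name that is always a plain string.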

4.3 Results

(The original post showed the downloaded stills here.)

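5. Summary

However hard you laugh at the start of this film is how hard you cry by the end. Told from the child's point of view, it focuses on the choices the mother makes around love and marriage. By watching her mother, Jia Ling comes to see that her mother's happiness is not what she had assumed, something to be won by marrying the factory director's son. It was her parents' shared choice: no matter how many times she lived it over, the mother would choose without hesitation the life that suited her rather than the happiness other people had in mind. That indirectly tells us something too: in pursuing happiness we should follow our own path instead of living the happiness that exists in other people's eyes and words, because many of life's choices come only once.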


Original article: https://blog.csdn.net/IT_charge/article/details/113979633
