Getting and Analyzing Google Trends Results with Python
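Google Trends is an online search-trends service provided by Google. It makes it easy to see which keywords have been popular recently.
However, analyzing Google Trends at scale through the website is tedious and impractical, and sometimes we cannot even be bothered to open the page.
So how can we use Google Trends more effectively?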
Unofficial API for Google Trends
Allows simple interface for automating downloading of reports from Google Trends. Only good until Google changes their backend again :-P. When that happens feel free to contribute!
This is an unofficial API that lets you download data from Google Trends (it works by scraping).
Install the pytrends package
!pip3 install pytrends
Connect to Google
Import pandas (the Python Data Analysis Library) and create a TrendReq object, which handles the connection to Google:
import pandas as pd
from pytrends.request import TrendReq
pytrend = TrendReq()
Build Payload
Set the keywords, category, time range, region, and search type (Google property) we want to query.
"""Create the payload for related queries, interest over time and interest by region"""
TrendReq.build_payload(self, kw_list, cat=0,
timeframe='today 5-y', geo='', gprop='')
- Parameters:
  - kw_list: keywords to get data for; up to five terms in a list
  - timeframe: time range to get data for, e.g. 'today 5-y' (the default) or 'YYYY-MM-DD YYYY-MM-DD'
  - cat: category to narrow results
  - geo: two-letter country abbreviation, e.g. 'US' or 'TW'
  - gprop: which Google property to filter to, e.g. 'youtube'; the default '' means web search
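Because build_payload accepts at most five keywords, a longer keyword list has to be split across several requests. The sketch below shows one way to batch it; `chunk_keywords` is a hypothetical helper, not part of pytrends. Keep in mind that Google Trends scales each request independently, so values from different batches are not directly comparable.

```python
from typing import Iterator, List

MAX_KEYWORDS = 5  # build_payload accepts up to five terms per request


def chunk_keywords(keywords: List[str], size: int = MAX_KEYWORDS) -> Iterator[List[str]]:
    """Yield consecutive slices of `keywords` with at most `size` items each.

    Hypothetical helper for illustration; it is not part of pytrends.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(keywords), size):
        yield keywords[start:start + size]


if __name__ == "__main__":
    drinks = ['tea', 'coffee', 'coke', 'milk', 'water', 'juice', 'beer']
    for batch in chunk_keywords(drinks):
        print(batch)
        # With a live connection, each batch could be passed in turn to
        # pytrend.build_payload(batch, timeframe='today 12-m', geo='TW')
```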
kw_list = ['tea', 'coffee', 'coke', 'milk', 'water']
# timeframe='today 12-m': data for the past 12 months
# geo='TW': restrict results to Taiwan (two-letter country code, e.g. 'US')
pytrend.build_payload(kw_list, timeframe='today 12-m', geo='TW')
# gprop='youtube': only look at YouTube search trends
#pytrend.build_payload(kw_list, timeframe='today 12-m', geo='TW', gprop='youtube')
# cat=71: restrict results to a single category (category ID 71)
#pytrend.build_payload(kw_list, timeframe='today 12-m', geo='TW', gprop='youtube', cat=71)
Request data (Get results)
- Interest Over Time
- Historical Hourly Interest
- Interest by Region
- Related Topics
- Related Queries
- Trending Searches
- Top Charts
- Suggestions
Interest Over Time
"""Request data from Google's Interest Over Time section and return a dataframe"""
TrendReq.interest_over_time(self)
- Returns: pandas.DataFrame (indexed by date, with one column per keyword plus a boolean isPartial column)
interest_over_time_df = pytrend.interest_over_time()
interest_over_time_df.head()
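The values returned by interest_over_time are a relative index, not raw search counts: Google Trends expresses interest relative to the highest point on the chart for the chosen region and time range, which is set to 100. The sketch below is a simplified illustration of that rescaling, assuming we had raw per-period numbers for several keywords (Google does not expose them, and its real method also normalizes by total search volume). `to_trends_index` is a hypothetical name.

```python
from typing import Dict, List


def to_trends_index(series: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Rescale several series so the single highest value across all of them is 100.

    Simplified illustration only: the real Google Trends index also accounts for
    total search volume, which this sketch ignores.
    """
    # The peak is taken across every keyword, so keywords stay comparable
    peak = max((v for values in series.values() for v in values), default=0)
    if peak <= 0:
        # Nothing to scale against: report zero interest everywhere
        return {name: [0] * len(values) for name, values in series.items()}
    return {
        name: [round(100 * v / peak) for v in values]
        for name, values in series.items()
    }


if __name__ == "__main__":
    # Hypothetical raw numbers, for illustration only
    print(to_trends_index({'tea': [20, 40, 80], 'coffee': [8, 24, 60]}))
```

Because the peak is shared, a keyword whose best week is only half as popular as the overall peak tops out at 50, which is why the lines in the Interest Over Time plot can be compared with each other.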
Plot the result
To display Chinese text in Matplotlib, refer to:
#!pip3 install matplotlib seaborn
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
plt.style.use('fivethirtyeight')
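# Use a CJK font so Chinese labels render correctly
plt.rcParams['font.sans-serif'] = ['Noto Sans Mono CJK TC', 'sans-serif']
# Render minus signs as ASCII hyphens, since many CJK fonts lack the Unicode minus glyph
plt.rcParams['axes.unicode_minus'] = False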
%matplotlib inline
Make plots of a DataFrame using Matplotlib
DataFrame.plot.line(self, x=None, y=None, **kwargs)
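# Plot only the keyword columns; pytrends also adds a boolean isPartial flag column
axes = interest_over_time_df.drop(columns='isPartial', errors='ignore').plot.line(
    figsize=(15, 7),
    title='Interest Over Time')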
axes.set_xlabel('Date')
axes.set_ylabel('Trends Index')
axes.tick_params(axis='both', which='major', labelsize=13)
Google Keyword Suggestions
Return a list of additional suggested keywords that can be used to refine a trend search.
"""Request data from Google's Keyword Suggestions dropdown and return a dictionary"""
TrendReq.suggestions(self, keyword)
- Parameters:
  - keyword: keyword to get suggestions for
- Returns: list of dictionaries, one per suggestion (with mid, title and type keys)
keywords = pytrend.suggestions(keyword='beer')
keywords_df = pd.DataFrame(keywords)
keywords_df.drop(columns='mid')  # 'mid' is Google's internal topic ID; drop it to keep the table readable
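The mid value is not useless: it is Google's identifier for a topic (as opposed to a plain search term), and the pytrends README describes passing such an encoded topic to build_payload in place of a text keyword. The sketch below picks the mid of the first suggestion whose type matches; `pick_topic_mid` is a hypothetical helper, and the sample dictionaries use placeholder values rather than real API output.

```python
from typing import Dict, List, Optional


def pick_topic_mid(suggestions: List[Dict[str, str]], wanted_type: str) -> Optional[str]:
    """Return the 'mid' of the first suggestion whose 'type' matches wanted_type.

    Hypothetical helper; `suggestions` is assumed to be shaped like the list returned
    by pytrend.suggestions(), i.e. dictionaries with 'mid', 'title' and 'type' keys.
    """
    wanted = wanted_type.casefold()  # case-insensitive comparison
    for item in suggestions:
        if item.get('type', '').casefold() == wanted:
            return item.get('mid')
    return None  # no suggestion of the requested type


if __name__ == "__main__":
    # Placeholder data for illustration only, not real Google Trends output
    sample = [
        {'mid': '/m/PLACEHOLDER_1', 'title': 'Beer', 'type': 'Drink'},
        {'mid': '/m/PLACEHOLDER_2', 'title': 'Beer', 'type': 'Song'},
    ]
    mid = pick_topic_mid(sample, 'song')
    print(mid)  # /m/PLACEHOLDER_2
    # With a live connection: pytrend.build_payload(kw_list=[mid])
```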
Related Queries
When users search for a topic, they also search for related content.
Return data for the related keywords to a provided keyword shown on Google Trends’ Related Queries section.
"""Request data from Google's Related Queries section and return a dictionary of dataframes
If no top and/or rising related queries are found,
the value for the key "top" and/or "rising" will be None
"""
TrendReq.related_queries(self)
- Returns: dictionary of pandas.DataFrames
pytrend.build_payload(kw_list=['Coronavirus'])
# Related Queries: returns a dictionary of DataFrames
related_queries = pytrend.related_queries()
related_queries
{'Coronavirus': {'top': query value
0 taiwan coronavirus 100
1 taiwan 94
2 coronavirus update 64
3 coronavirus cases 52
4 coronavirus 中文 37
5 thank you coronavirus helpers 33
6 coronavirus news 31
7 corona 28
8 coronavirus us 27
9 coronavirus map 26
10 武漢 肺炎 26
11 china coronavirus 24
12 coronavirus tips 21
13 world coronavirus 20
14 coronavirus live 17
15 coronavirus usa 17
16 疫情 15
17 new coronavirus 15
18 coronavirus in taiwan 14
19 wuhan coronavirus 14
20 coronavirus worldometer 13
21 taiwan coronavirus cases 13
22 italy coronavirus 13
23 coronavirus symptoms 13
24 corona virus 13,
'rising': query value
0 taiwan coronavirus 806850
1 taiwan 760150
2 coronavirus update 520350
3 coronavirus cases 421850
4 thank you coronavirus helpers 264250
5 coronavirus us 217300
6 coronavirus map 210350
7 武漢 肺炎 207400
8 china coronavirus 190500
9 coronavirus tips 167400
10 world coronavirus 161150
11 coronavirus usa 135500
12 疫情 121000
13 coronavirus in taiwan 113550
14 wuhan coronavirus 111450
15 coronavirus worldometer 107600
16 taiwan coronavirus cases 105500
17 italy coronavirus 104450
18 coronavirus italy 100450
19 taiwan news coronavirus 96200
20 who coronavirus 95600
21 covid 94100
22 taiwan news 91750
23 who 87750
24 coronavirus uk 87600}}
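As the docstring above says, the 'top' or 'rising' entry can be None when Google has no related queries for a keyword, so passing related_queries['Coronavirus']['top'] straight to .plot can fail for less popular terms. A defensive lookup might look like the sketch below; `get_related_table` is a hypothetical helper that works with any nested dictionary shaped like the related_queries() result.

```python
from typing import Any, Dict, Optional


def get_related_table(related: Dict[str, Dict[str, Any]], keyword: str,
                      kind: str = 'top') -> Optional[Any]:
    """Return related[keyword][kind], or None if the keyword, the kind, or the data is missing.

    Hypothetical helper; `related` is assumed to be shaped like the dictionary returned
    by pytrend.related_queries(), e.g. {'Coronavirus': {'top': df, 'rising': df}}.
    """
    if kind not in ('top', 'rising'):
        raise ValueError("kind must be 'top' or 'rising'")
    # Treat a missing or None keyword entry like an empty dictionary
    return (related.get(keyword) or {}).get(kind)


if __name__ == "__main__":
    # Placeholder structure: a string stands in for a pandas.DataFrame here
    related = {'Coronavirus': {'top': 'top-queries-table', 'rising': None}}
    for kind in ('top', 'rising'):
        table = get_related_table(related, 'Coronavirus', kind)
        if table is None:
            print(f"No {kind} related queries found")
        else:
            print(f"{kind}: {table}")
```

In the notebook, assigning the result of get_related_table(related_queries, 'Coronavirus') and checking it against None before calling .plot.barh would skip the plot instead of raising an error.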
COVID_19 = related_queries['Coronavirus']['top']
COVID_19
axes = COVID_19.plot.barh(x='query', y='value', figsize=(10,15))
The Search Trends of COVID-19 in 2020
pytrend.build_payload(kw_list=['Coronavirus'], timeframe='2020-01-01 2020-06-04')
covid_19_interest_over_time_df = pytrend.interest_over_time()
covid_19_interest_over_time_df.head()
axes = covid_19_interest_over_time_df.plot.line(
figsize=(20,5),
title='The Search Trends of COVID-19 in 2020')
axes.set_yticks([0, 25, 50, 75, 100])
axes.set_xlabel('Date')
axes.set_ylabel('Trends Index')
axes.tick_params(axis='both', which='major', labelsize=13)
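The custom range above uses the 'YYYY-MM-DD YYYY-MM-DD' form of the timeframe parameter. When the dates come from code rather than being typed by hand, a small formatter avoids typos and reversed ranges; `make_timeframe` is a hypothetical helper, not part of pytrends.

```python
from datetime import date


def make_timeframe(start: date, end: date) -> str:
    """Format two dates as a 'YYYY-MM-DD YYYY-MM-DD' timeframe string.

    Hypothetical helper for building the timeframe argument of build_payload.
    """
    if start > end:
        raise ValueError("start must not be after end")
    # date.isoformat() yields the YYYY-MM-DD form
    return f"{start.isoformat()} {end.isoformat()}"


if __name__ == "__main__":
    timeframe = make_timeframe(date(2020, 1, 1), date(2020, 6, 4))
    print(timeframe)  # 2020-01-01 2020-06-04
    # With a live connection:
    # pytrend.build_payload(kw_list=['Coronavirus'], timeframe=timeframe)
```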