Social Media
In [1]:
import sys
sys.path.append("../../FinNLP")
Eastmoney
In [2]:
from finnlp.data_sources.social_media.eastmoney_streaming import Eastmoney_Streaming
In [3]:
pages = 3
stock = "600519"
In [4]:
downloader = Eastmoney_Streaming()
downloader.download_streaming_stock(stock, pages)
Downloading ... 0 1 2
In [5]:
downloader.dataframe.shape
Out[5]:
(241, 92)
In [6]:
downloader.dataframe.head(1)
Out[6]:
| | post_id | post_title | stockbar_code | stockbar_name | stockbar_type | user_id | user_nickname | user_extendinfos | post_click_count | post_forward_count | ... | relate_topic | zwpage_flag | source_post_comment_count | post_atuser | reply_list | content_type | repost_state | reptile_state | allow_likes_state | post_is_hot |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1324058647 | 贵州茅台:每股派25.911元 6月30日共计派发现金红利325.49亿元 | 600519 | 贵州茅台吧 | 100.0 | 7344113638256342 | 贵州茅台资讯 | {'user_accreditinfos': None, 'deactive': '0', ... | 3799 | 14 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 rows × 92 columns
In [10]:
selected_columns = ["post_title","user_nickname", "stockbar_name" ,"post_click_count", "post_forward_count", "post_comment_count", "post_publish_time", "post_last_time", "post_display_time"]
downloader.dataframe[selected_columns].head(10)
Out[10]:
| | post_title | user_nickname | stockbar_name | post_click_count | post_forward_count | post_comment_count | post_publish_time | post_last_time | post_display_time |
---|---|---|---|---|---|---|---|---|---|
0 | 贵州茅台:每股派25.911元 6月30日共计派发现金红利325.49亿元 | 贵州茅台资讯 | 贵州茅台吧 | 3799 | 14 | 15 | 2023-06-25 22:17:50 | 2023-06-26 03:12:47 | 2023-06-25 22:17:50 |
1 | 贵州茅台:贵州茅台2022年年度权益分派实施公告 | 贵州茅台资讯 | 贵州茅台吧 | 6423 | 47 | 17 | 2023-06-25 15:32:42 | 2023-06-26 00:57:39 | 2023-06-26 00:00:00 |
2 | 将派发现金红利325.49亿元!贵州茅台上市以来累计分红超2000亿元 | 贵州茅台资讯 | 贵州茅台吧 | 460 | 1 | 0 | 2023-06-25 23:49:07 | 2023-06-25 23:49:07 | 2023-06-25 23:49:07 |
3 | 茅台冰淇淋悄然卖数亿 年轻市场真被抓住了吗 | 贵州茅台资讯 | 贵州茅台吧 | 2612 | 15 | 11 | 2023-06-24 07:03:53 | 2023-06-25 18:48:21 | 2023-06-24 07:03:53 |
4 | 白酒本周跌5.49%原因是什么?下周怎么看? | NaN | NaN | 10197 | 4 | 25 | 2023-06-24 12:29:53 | 2023-06-25 23:12:49 | 2023-06-24 12:29:53 |
5 | 本周持仓与下周交易计划 | 满仓日记 | 财富号评论吧 | 547 | 2 | 1 | 2023-06-25 20:30:54 | 2023-06-26 03:19:08 | 2023-06-25 20:30:54 |
6 | 茅台酒的估值真的是高 | 菩萨小跟班888 | 贵州茅台吧 | 33 | 0 | 0 | 2023-06-26 03:02:14 | 2023-06-26 03:02:14 | 2023-06-26 03:02:14 |
7 | 茅台里面的资金估计要出来支持一些中小微企业政策导向[吃瓜] | 菩萨小跟班888 | 贵州茅台吧 | 24 | 0 | 0 | 2023-06-26 01:50:12 | 2023-06-26 01:50:12 | 2023-06-26 01:50:12 |
8 | 每股市值收益率,还没有银行定期利息高呢。(远离泡沫浮云地震带) | 章鱼帝的智慧 | 贵州茅台吧 | 33 | 0 | 1 | 2023-06-25 22:48:49 | 2023-06-26 01:20:04 | 2023-06-25 22:48:49 |
9 | 6月最后的倔强(浪潮信息,昆仑万维,鸿博股份)赛道复苏。 | 夏夏爱美丽 | 财富号评论吧 | 2459 | 0 | 34 | 2023-06-25 22:16:03 | 2023-06-26 00:45:53 | 2023-06-25 22:16:03 |
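The time columns in the table above look like plain `YYYY-MM-DD HH:MM:SS` strings. The following is a minimal sketch, using only the standard library and two rows copied from that output, of how to measure how long a thread stayed active, taken here as the gap between `post_publish_time` and `post_last_time`. The helper name `active_window` is hypothetical and not part of FinNLP, and reading `post_last_time` as the time of the latest activity is an assumption.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, matching the table above


def active_window(publish_time: str, last_time: str) -> timedelta:
    """Return the time between a post's publication and its latest activity."""
    published = datetime.strptime(publish_time, TIME_FORMAT)
    last_seen = datetime.strptime(last_time, TIME_FORMAT)
    return last_seen - published


# Rows 0 and 2 of the output above: (post_publish_time, post_last_time)
rows = [
    ("2023-06-25 22:17:50", "2023-06-26 03:12:47"),
    ("2023-06-25 23:49:07", "2023-06-25 23:49:07"),
]
windows = [active_window(published, last) for published, last in rows]
for (published, _), window in zip(rows, windows):
    print(published, "->", window)
```

On the full dataframe, converting both columns with `pd.to_datetime` and subtracting them should give the same result per row.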
Facebook: Get Cookies
In [ ]:
from selenium import webdriver
import json
browser = webdriver.ChromiumEdge()
browser.get('https://www.facebook.com')
Please log in to your account in the browser
In [ ]:
cookies = browser.get_cookies()
with open("cookies.json", "w", encoding="utf-8") as cks:
    json.dump(cookies, cks)
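Saved cookies eventually expire, so a stale `cookies.json` may make a later run fail at login. The following is a minimal sketch of a check to run before reusing the file. It assumes the file holds the list of dicts returned by Selenium's `get_cookies()`, where an optional `expiry` key stores a Unix timestamp in seconds. The helper `expired_cookie_names` and the sample entries are hypothetical.

```python
import time


def expired_cookie_names(cookies, now=None):
    """Return the names of cookies whose `expiry` timestamp is in the past.

    Cookies without an `expiry` key (session cookies) are treated as valid.
    """
    now = time.time() if now is None else now
    return [c["name"] for c in cookies if "expiry" in c and c["expiry"] <= now]


# Illustrative data only; real entries come from browser.get_cookies().
sample = [
    {"name": "session_only", "value": "a"},
    {"name": "old", "value": "b", "expiry": 1_000},
    {"name": "fresh", "value": "c", "expiry": 4_000_000_000},
]
stale = expired_cookie_names(sample, now=1_700_000_000)
print(stale)
```

If the returned list is not empty, logging in again and saving the cookies once more is the simplest remedy.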
Facebook
In [2]:
from finnlp.data_sources.social_media.facebook_streaming import Facebook_Streaming
import json
In [4]:
# load cookies
with open("cookies.json", "r", encoding="utf-8") as cks:
    cookies = json.load(cks)
In [5]:
config = {
    "cookies": cookies,
    "headless": False,
    "stealth_path": "../../FinNLP/finnlp/data_sources/social_media/stealth.min.js",
}
pages = 3
stock = "AAPL"
In [6]:
downloader = Facebook_Streaming(config)
downloader.download_streaming_stock(stock, pages)
100%|██████████| 17/17 [00:57<00:00, 3.37s/it]
Only support the first page now!
In [7]:
downloader.dataframe
Out[7]:
| | content | date |
---|---|---|
6 | AAPL (Stock Market) | 4h |
8 | Day 7\nIntroduction to Stock Market\nWhat you ... | 6h |
11 | US: AAPL new high and breakout from two-year r... | 1d |
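The `date` column above holds relative labels ("4h", "6h", "1d") rather than timestamps, so the values are only meaningful together with the time of the scrape. The following is a minimal sketch of converting them to approximate datetimes. The pattern `<number><unit>` and the unit letters other than `h` and `d` are assumptions, and `relative_to_datetime` and `scrape_time` are hypothetical names.

```python
import re
from datetime import datetime, timedelta

# Assumed unit suffixes; only "h" and "d" appear in the output above.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def relative_to_datetime(label, now):
    """Convert a label such as "4h" or "1d" to an approximate datetime.

    Returns None when the label does not match the assumed pattern.
    """
    match = re.fullmatch(r"(\d+)\s*([mhdw])", label.strip())
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(**{_UNITS[unit]: amount})


scrape_time = datetime(2023, 6, 26, 12, 0, 0)  # hypothetical scrape time
for label in ["4h", "6h", "1d"]:
    print(label, "->", relative_to_datetime(label, scrape_time))
```

Recording the scrape time next to the dataframe when it is downloaded keeps this conversion reproducible later.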
Xueqiu / 雪球
In [2]:
from finnlp.data_sources.social_media.xueqiu_streaming import Xueqiu_Streaming
In [3]:
pages = 3
stock = "茅台"
In [4]:
downloader = Xueqiu_Streaming()
downloader.download_streaming_stock(stock, pages)
Downloading ... 0 1 2
In [5]:
downloader.dataframe.shape
Out[5]:
(29, 53)
In [6]:
downloader.dataframe.head(1)
Out[6]:
| | blocked | blocking | canEdit | commentId | controversial | created_at | description | donate_count | donate_snowcoin | editable | ... | truncated_by | type | user | user_id | view_count | firstImg | pic_sizes | edited_at | quote_cards | symbol_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | False | False | True | 0 | False | 2023-06-25 12:15:07 | <a href="http://xueqiu.com/S/SZ000860" target=... | 0 | 0 | True | ... | 0 | 2 | {'allow_all_stock': False, 'block_status': 0, ... | 8364804052 | 471 | NaN | NaN | NaN | NaN | NaN |
1 rows × 53 columns
In [7]:
selected_columns = ["created_at", "description", "title", "text", "target", "source", "user"]
downloader.dataframe[selected_columns].head(10)
Out[7]:
| | created_at | description | title | text | target | source | user |
---|---|---|---|---|---|---|---|
0 | 2023-06-25 12:15:07 | <a href="http://xueqiu.com/S/SZ000860" target=... | | <a href="http://xueqiu.com/S/SZ000860" target=... | /8364804052/253976413 | Android | {'allow_all_stock': False, 'block_status': 0, ... |
1 | 2023-06-25 12:14:22 | <a href="http://xueqiu.com/S/SH600519" target=... | | <p><a href="http://xueqiu.com/S/SH600519" targ... | /4631817224/253976390 | 雪球 | {'allow_all_stock': False, 'block_status': 0, ... |
2 | 2023-06-25 12:13:01 | ...提高。白酒:五粮液、迎驾贡酒、<span class='highlight'>茅台</... | 6.25 赛道和白马的机会 | <p>这个假期外围的环境不太好,已经是基本共识了。明天开盘大A承压低开也基本是一致预期。这么... | /4322952939/253976335 | 雪球 | {'allow_all_stock': False, 'block_status': 0, ... |
3 | 2023-06-25 11:58:55 | 茅台发生活费了 | | 茅台发生活费了<br/><img class="ke_img" src="https://x... | /4653939718/253975764 | iPhone | {'allow_all_stock': False, 'block_status': 0, ... |
4 | 2023-06-25 11:54:05 | ...业绩及股价,形成正反馈。当年<span class='highlight'>茅台</s... | 持仓吹票,共同致富 | <p><a href="http://xueqiu.com/k?q=%23%E4%BB%A5... | /8113901491/253975613 | Android | {'allow_all_stock': False, 'block_status': 0, ... |
5 | 2023-06-25 11:50:11 | 微酒酒业快讯,6月25日,酒业新闻一览-·企业动态·-01<span class='high... | 6.25:<span class='highlight'>茅</span><span cla... | <p><img class="ke_img" src="https://xqimg.imed... | /3615583399/253975485 | 雪球 | {'allow_all_stock': False, 'block_status': 0, ... |
6 | 2023-06-25 11:48:42 | <a href="http://xueqiu.com/S/SH603027" target=... | | <a href="http://xueqiu.com/S/SH603027" target=... | /2659542807/253975430 | iPhone | {'allow_all_stock': False, 'block_status': 0, ... |
7 | 2023-06-25 11:45:54 | 段永平说:我不鼓励小散投<a href="https://xueqiu.com/S/AAPL... | | 段永平说:我不鼓励小散投<a href="https://xueqiu.com/S/AAPL... | /9456980430/253975338 | iPhone | {'allow_all_stock': False, 'block_status': 0, ... |
8 | 2023-06-25 11:33:01 | 泸州老窖酒传统酿制技艺第二十三代传承人·国窖1573·曾娜大师鉴藏版,端午举杯小酒。<br/... | | 泸州老窖酒传统酿制技艺第二十三代传承人·国窖1573·曾娜大师鉴藏版,端午举杯小酒。<br/... | /9893982765/253974916 | Android | {'allow_all_stock': False, 'block_status': 0, ... |
9 | 2023-06-25 11:25:44 | ...酒店中,白酒卖得最好的往往不是<span class='highlight'>茅台</... | 街头没生意的烟酒店,为什么不会倒闭 | <p><img class="ke_img" src="https://xqimg.imed... | /5497522856/253974630 | 雪球 | {'allow_all_stock': False, 'block_status': 0, ... |
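The `description`, `title`, and `text` values above still contain HTML markup such as `<a>`, `<p>`, `<br/>`, and the `<span class='highlight'>` wrapper around the search term. The following is a minimal sketch of stripping the tags with the standard library before any text analysis. The helper `strip_html` is hypothetical and not part of FinNLP, and the sample string is an illustrative fragment modeled on the output rather than a value copied from it.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the visible text of an HTML fragment, with tags removed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


sample = "白酒:五粮液、<span class='highlight'>茅台</span><br/>持仓"
print(strip_html(sample))
```

When applying this to a dataframe column, missing values (NaN) are not strings and would need to be skipped or filled first.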
Stocktwits Streaming
In [6]:
from finnlp.data_sources.social_media.stocktwits_streaming import Stocktwits_Streaming
In [9]:
pages = 3
stock = "AAPL"
config = {
    "use_proxy": "us_free",
    "max_retry": 5,
    "proxy_pages": 2,
}
In [10]:
downloader = Stocktwits_Streaming(config)
downloader.download_streaming_stock(stock, pages)
Checking ips: 100%|██████████| 30/30 [01:07<00:00, 2.24s/it]
Get proxy ips: 30. Usable proxy ips: 29.
100%|██████████| 3/3 [00:05<00:00, 1.68s/it]
In [9]:
df = downloader.dataframe
df.head(2)
Out[9]:
| | id | body | created_at | user | source | symbols | prices | mentioned_users | entities | liked_by_self | reshared_by_self | links | reshare_message | conversation | likes | reshares | network |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 522005335 | NANCY PELOSI JUST BOUGHT 10,000 SHARES OF APPL... | 2023-04-07T15:24:22Z | {'id': 4744627, 'username': 'JavierAyala', 'na... | {'id': 1149, 'title': 'StockTwits for iOS', 'u... | [{'id': 686, 'symbol': 'AAPL', 'symbol_mic': '... | [{'id': 686, 'symbol': 'AAPL', 'symbol_mic': '... | [] | {'sentiment': None} | False | False | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 522004768 | $AAPL $SPY \n \nhttps://amp.scmp.com/news/chi... | 2023-04-07T15:17:43Z | {'id': 6330207, 'username': 'PlainFacts_2121',... | {'id': 2269, 'title': 'StockTwits Web', 'url':... | [{'id': 686, 'symbol': 'AAPL', 'symbol_mic': '... | [{'id': 686, 'symbol': 'AAPL', 'symbol_mic': '... | [] | {'sentiment': None} | False | False | [{'title': 'China officials who abused health ... | NaN | NaN | NaN | NaN | NaN |
In [10]:
selected_columns = ["created_at", "body"]
df[selected_columns].head(10)
Out[10]:
| | created_at | body |
---|---|---|
0 | 2023-04-07T15:24:22Z | NANCY PELOSI JUST BOUGHT 10,000 SHARES OF APPL... |
1 | 2023-04-07T15:17:43Z | $AAPL $SPY \n \nhttps://amp.scmp.com/news/chi... |
2 | 2023-04-07T15:17:25Z | $AAPL $GOOG $AMZN I took a Trump today. \n\nH... |
3 | 2023-04-07T15:16:54Z | $SPY $AAPL will take this baby down, time for ... |
4 | 2023-04-07T15:11:37Z | $SPY $3T it ALREADY DID - look at the pre-COV... |
5 | 2023-04-07T15:10:29Z | $AAPL $QQQ $STUDY We are on to the next one! A... |
6 | 2023-04-07T15:06:00Z | $AAPL was analyzed by 48 analysts. The buy con... |
7 | 2023-04-07T14:54:29Z | $AAPL both retiring. \n \nCraig.... |
8 | 2023-04-07T14:40:06Z | $SPY $QQQ $TSLA $AAPL SPY 500 HAS STARTED🚀😍 BI... |
9 | 2023-04-07T14:38:57Z | Nancy 🩵 (Tim) $AAPL |
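Most `body` values above mention several tickers as cashtags (`$AAPL`, `$SPY`), and `created_at` is an ISO-style UTC string. The following is a minimal sketch of pulling both into a usable form with the standard library. The cashtag pattern (a `$` followed by letters, with an optional dotted suffix) is an assumption, so tokens such as `$3T` are skipped on purpose. The helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed pattern: "$" + letters, optionally followed by ".letters".
CASHTAG = re.compile(r"\$([A-Za-z]+(?:\.[A-Za-z]+)?)")


def extract_cashtags(body):
    """Return the unique tickers written as $TICKER, in order of appearance."""
    seen = []
    for ticker in CASHTAG.findall(body):
        ticker = ticker.upper()
        if ticker not in seen:
            seen.append(ticker)
    return seen


def parse_created_at(value):
    """Parse a string like 2023-04-07T15:24:22Z into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


print(extract_cashtags("$SPY $QQQ $TSLA $AAPL SPY 500 HAS STARTED"))
print(parse_created_at("2023-04-07T15:24:22Z"))
```

The dataframe also carries a structured `symbols` column, so parsing the text is mainly useful as a cross-check or when only `body` has been kept.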
Reddit Wallstreetbets Streaming
In [2]:
from finnlp.data_sources.social_media.reddit_streaming import Reddit_Streaming
In [3]:
pages = 3
config = {
    # "use_proxy": "us_free",
    "max_retry": 5,
    "proxy_pages": 2,
}
In [17]:
downloader = Reddit_Streaming(config)
downloader.download_streaming_all(pages)
Downloading by pages...: 100%|██████████| 3/3 [00:08<00:00, 2.83s/it]
In [18]:
df = downloader.dataframe
df.head(2)
Out[18]:
| | id | numComments | created | score | distinguishType | isLocked | isStickied | thumbnail | title | author | ... | postEventInfo | predictionTournament | reactedFrom | removedBy | removedByCategory | subreddit | suggestedCommentSort | topAwardedType | url | whitelistStatus |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | t3_12epaq0 | 8 | 1680881974000 | 0 | None | False | False | {'url': 'https://b.thumbs.redditmedia.com/W8hd... | Y’all making me feel like spooderman | ghostwholags | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | t3_zr9v10 | 0 | 1671595782000 | 2 | None | True | False | {'url': 'https://b.thumbs.redditmedia.com/dJqb... | Do you track your investments in a spreadsheet... | sharesight | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 rows × 100 columns
In [20]:
import pandas as pd
df["created"] = pd.to_datetime(df["created"], unit = "ms")
df.head(2)
Out[20]:
| | id | numComments | created | score | distinguishType | isLocked | isStickied | thumbnail | title | author | ... | postEventInfo | predictionTournament | reactedFrom | removedBy | removedByCategory | subreddit | suggestedCommentSort | topAwardedType | url | whitelistStatus |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | t3_12epaq0 | 8 | 2023-04-07 15:39:34 | 0 | None | False | False | {'url': 'https://b.thumbs.redditmedia.com/W8hd... | Y’all making me feel like spooderman | ghostwholags | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | t3_zr9v10 | 0 | 2022-12-21 04:09:42 | 2 | None | True | False | {'url': 'https://b.thumbs.redditmedia.com/dJqb... | Do you track your investments in a spreadsheet... | sharesight | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 rows × 100 columns
In [22]:
selected_columns = ["created", "title"]
df[selected_columns].head(10)
Out[22]:
| | created | title |
---|---|---|
0 | 2023-04-07 15:39:34 | Y’all making me feel like spooderman |
1 | 2022-12-21 04:09:42 | Do you track your investments in a spreadsheet... |
2 | 2022-12-21 04:09:42 | Do you track your investments in a spreadsheet... |
3 | 2023-04-07 15:29:23 | Can a Blackberry holder get some help 🥺 |
4 | 2023-04-07 14:49:55 | The week of CPI and FOMC Minutes… 4-6-23 SPY/ ... |
5 | 2023-04-07 14:19:22 | Well let’s hope your job likes you, thanks Jerome |
6 | 2023-04-07 14:06:32 | Does anyone else feel an overwhelming sense of... |
7 | 2023-04-07 13:47:59 | Watermarked Jesus explains the market being cl... |
8 | 2023-04-07 13:26:23 | Jobs report shows 236,000 gain in March. Hot l... |
9 | 2023-04-07 13:07:15 | The recession is over! Let's buy more stocks! |
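Rows 1 and 2 above carry the same timestamp and title, which suggests that one post can show up more than once in the downloaded pages. The following is a minimal sketch of order-preserving de-duplication on plain records, keyed on the two visible fields. The helper `drop_repeated` is hypothetical, and the records are the first four rows of the output above.

```python
def drop_repeated(records, key_fields):
    """Keep the first record for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


posts = [
    {"created": "2023-04-07 15:39:34", "title": "Y’all making me feel like spooderman"},
    {"created": "2022-12-21 04:09:42", "title": "Do you track your investments in a spreadsheet..."},
    {"created": "2022-12-21 04:09:42", "title": "Do you track your investments in a spreadsheet..."},
    {"created": "2023-04-07 15:29:23", "title": "Can a Blackberry holder get some help 🥺"},
]
unique_posts = drop_repeated(posts, ["created", "title"])
print(len(posts), "->", len(unique_posts))
```

On the dataframe itself, `df.drop_duplicates(subset=["created", "title"])` is the pandas counterpart. Restricting the check to a few scalar columns matters here because some columns (for example `thumbnail`) hold dicts, which a full-row comparison may not be able to hash.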
Weibo Date Range
In [23]:
from finnlp.data_sources.social_media.weibo_date_range import Weibo_Date_Range
In [24]:
start_date = "2016-01-01"
end_date = "2016-01-02"
stock = "茅台"
config = {
    "use_proxy": "china_free",
    "max_retry": 5,
    "proxy_pages": 5,
    "cookies": "Your_Login_Cookies",
}
In [25]:
downloader = Weibo_Date_Range(config)
downloader.download_date_range_stock(start_date, end_date, stock = stock)
Gathering free ips by pages...: 100%|██████████| 5/5 [00:09<00:00, 1.95s/it]
Checking ips: 100%|██████████| 75/75 [01:23<00:00, 1.11s/it]
获取到的代理ip数量: 75 。Get proxy ips: 75. 能用的代理数量: 13。Usable proxy ips: 13.
Downloading by dates...: 100%|██████████| 2/2 [01:03<00:00, 31.56s/it]
In [31]:
df = downloader.dataframe
df = df.drop_duplicates()
df.head(10)
Out[31]:
| | date | date_content | source | content |
---|---|---|---|---|
0 | 2016-01-01 | 2016年01月01日23:41 | Moto X | #舆论之锤#唯品会发声明证实销售假茅台-手机腾讯网O网页链接分享来自浏览器! |
2 | 2016-01-01 | 2016年01月01日22:57 | 新浪博客 | 2016元旦节快乐酒粮网官方新品首发,茅台镇老酒,酱香原浆酒:酒粮网茅台镇白酒酱香老酒纯粮原... |
6 | 2016-01-01 | 2016年01月01日22:56 | 新浪博客 | 2016元旦节快乐酒粮网官方新品首发,茅台镇老酒,酱香原浆酒:酒粮网茅台镇白酒酱香老酒纯粮原... |
17 | 2016-01-01 | 2016年01月01日22:40 | 五蕴皆崆Android | 开心,今天喝了两斤酒(茅台+扎二)三个人,开心! |
18 | 2016-01-01 | NaN | NaN | 一家专卖假货的网站某宝,你该学学了!//【唯品会售假茅台:供货商被刑拘顾客获十倍补偿】O唯品... |
19 | 2016-01-01 | NaN | NaN | 一家专卖假货的网站//【唯品会售假茅台:供货商被刑拘顾客获十倍补偿】O唯品会售假茅台:供货商... |
20 | 2016-01-01 | 2016年01月01日21:46 | 360安全浏览器 | 前几天说了几点不看好茅台的理由,今年过节喝点茅台支持下,个人口感,茅台比小五好喝,茅台依然是... |
21 | 2016-01-01 | 2016年01月01日21:44 | 华为P8 | 老杜酱酒已到货,从明天起正式在甘肃武威开卖。可以不相信我说的话,但一定不要怀疑@杜子建的为人... |
22 | 2016-01-01 | 2016年01月01日21:24 | 华为Ascend P7 | 【唯品会售假茅台后续:供货商被刑拘顾客获十倍补偿】此前,有网友投诉其在唯品会购买的茅台酒质量... |
23 | 2016-01-01 | 2016年01月01日21:16 | 实得惠省钱网 | 唯品会卖假茅台,供货商被刑拘,买家获十倍补偿8888元|此前,有网友在网络论坛发贴(唯品会宣... |
In [32]:
df.shape
Out[32]:
(60, 4)
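The `date_content` column above stores the post time as a Chinese-formatted string such as `2016年01月01日23:41`, and some rows have no value at all (NaN). The following is a minimal sketch of parsing it into a `datetime` with the standard library. The format string is inferred from the values shown and may not cover every variant Weibo returns, and `parse_date_content` is a hypothetical helper.

```python
from datetime import datetime

# Assumed from values such as 2016年01月01日23:41 in the output above.
WEIBO_FORMAT = "%Y年%m月%d日%H:%M"


def parse_date_content(value):
    """Parse a `date_content` string; return None for missing or odd values."""
    if not isinstance(value, str):
        return None  # e.g. NaN, as in rows 18 and 19
    try:
        return datetime.strptime(value.strip(), WEIBO_FORMAT)
    except ValueError:
        return None


samples = ["2016年01月01日23:41", float("nan"), "2016年01月01日21:16"]
parsed = [parse_date_content(value) for value in samples]
print(parsed)
```

Returning `None` instead of raising keeps the conversion usable with `Series.map` on the whole column.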
Weibo Streaming
In [4]:
from finnlp.data_sources.social_media.weibo_streaming import Weibo_Streaming
In [5]:
rounds = 3
stock = "茅台"
config = {
    "use_proxy": "china_free",
    "max_retry": 5,
    "proxy_pages": 5,
    "cookies": "Your_Login_Cookies",
}
In [6]:
downloader = Weibo_Streaming(config)
downloader.download_streaming_stock(stock = stock, rounds = rounds)
Gathering free ips by pages...: 100%|██████████| 5/5 [00:09<00:00, 1.98s/it]
Checking ips: 100%|██████████| 75/75 [01:26<00:00, 1.15s/it]
获取到的代理ip数量: 75 。Get proxy ips: 75. 能用的代理数量: 19。Usable proxy ips: 19.
Processing the text content and downloading the full passage...: 100%|██████████| 9/9 [00:00<00:00, 64.89it/s]
Processing the text content and downloading the full passage...: 100%|██████████| 10/10 [00:09<00:00, 1.07it/s]
Processing the text content and downloading the full passage...: 100%|██████████| 10/10 [00:02<00:00, 4.93it/s]
Downloading by page..: 100%|██████████| 3/3 [00:19<00:00, 6.46s/it]
In [10]:
df = downloader.dataframe
df.head(2)
Out[10]:
| | card_type | display_followbtn | mblog | itemid | actionlog | cate_id | display_arrow | show_type | scheme | container_color | container_color_dark | content_short | content |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9 | False | {'attitudes_count': 0, 'can_edit': False, 'com... | seqid:187118896\|type:61\|t:\|pos:1-0-0\|q:茅台\|srid... | {'act_code': 554, 'ext': 'seqid:187118896\|type... | 31 | 0 | 1 | https://m.weibo.cn/status/MAWMprpPp?mblogid=MA... | #EEEEEE | #151515 | 事情做好做精,还可以赚大钱的生意才是好生意,而不是忙忙碌碌,最后一算账没赚多少!比如苹果的市... | 事情做好做精,还可以赚大钱的生意才是好生意,而不是忙忙碌碌,最后一算账没赚多少!比如苹果的市... |
1 | 9 | False | {'attitudes_count': 0, 'can_edit': False, 'com... | seqid:187118896\|type:61\|t:\|pos:1-0-1\|q:茅台\|srid... | {'act_code': 554, 'ext': 'seqid:187118896\|type... | 31 | 0 | 1 | https://m.weibo.cn/status/MAWHVDm0H?mblogid=MA... | #EEEEEE | #151515 | 茅台茅台成都收4瓶飞天,自提 | 茅台茅台成都收4瓶飞天,自提 |
In [11]:
selected_columns = ["content_short", "content"]
df[selected_columns].head(10)
Out[11]:
| | content_short | content |
---|---|---|
0 | 事情做好做精,还可以赚大钱的生意才是好生意,而不是忙忙碌碌,最后一算账没赚多少!比如苹果的市... | 事情做好做精,还可以赚大钱的生意才是好生意,而不是忙忙碌碌,最后一算账没赚多少!比如苹果的市... |
1 | 茅台茅台成都收4瓶飞天,自提 | 茅台茅台成都收4瓶飞天,自提 |
2 | 我可太喜欢茅台这个防伪了 | 我可太喜欢茅台这个防伪了 |
3 | 没想到 4S店的二楼 是卖茅台的吧 | 没想到 4S店的二楼 是卖茅台的吧 |
4 | 买不起茅台,砸锅卖铁也得买得起茅台冰淇淋 许昌·胖东来时代广场 | 买不起茅台,砸锅卖铁也得买得起茅台冰淇淋 许昌·胖东来时代广场 |
5 | xxx给我枇杷xxx给我蜂蜜 xxx偷茅台喝(假的)。我很喜欢自己家的产品,感觉很无害纯天然... | xxx给我枇杷xxx给我蜂蜜 xxx偷茅台喝(假的)。我很喜欢自己家的产品,感觉很无害纯天然... |
6 | 茅台 奎屯出一只兔茅 | 茅台 奎屯出一只兔茅 |
7 | 2022胡润酒类品牌榜发布 2022胡润酒类品牌榜发布点评:与我印象中的有点出入。不出茅台和... | 2022胡润酒类品牌榜发布 2022胡润酒类品牌榜发布点评:与我印象中的有点出入。不出茅台和... |
8 | 41岁,很美妙!“爸爸生日快乐,吃个蛋糕🍰”小奶音听着上头。爱人,亲戚,朋友,草莓🍓,茅台+... | 41岁,很美妙!“爸爸生日快乐,吃个蛋糕🍰”小奶音听着上头。爱人,亲戚,朋友,草莓🍓,茅台+... |
0 | 吃到了茅台冰激淋也 | 吃到了茅台冰激淋也 |