word frequency count
2022-04-23 18:00:00 【Round programmer】
# Import libraries
import re  # Regular expressions
import collections  # Counter, for word frequency statistics
import numpy as np  # NumPy, used to turn the mask image into an array
import jieba  # jieba Chinese word segmentation
import wordcloud  # Word cloud generation
from PIL import Image  # Image processing (Pillow)
import matplotlib.pyplot as plt  # Plotting
# Read the file
with open('article.txt', encoding='utf-8') as fn:  # Open the file (assumes UTF-8 text); it is closed automatically
    string_data = fn.read()  # Read the whole file into one string
# Text preprocessing
pattern = re.compile(r'\t|\n|\.|-|:|;|\)|\(|\?|"')  # Tabs, newlines and some ASCII punctuation to strip (raw string avoids invalid-escape warnings)
string_data = pattern.sub('', string_data)  # Delete every match
# Word segmentation
seg_list_exact = jieba.cut(string_data, cut_all=False)  # Accurate-mode segmentation (returns a generator)
object_list = []
remove_words = [u'的', u'，', u'和', u'是', u'随着', u'对于', u'对', u'等', u'能', u'都', u'。', u' ', u'、', u'中', u'在', u'了',
                u'通常', u'如果', u'我们', u'需要']  # Custom stop-word list
for word in seg_list_exact:  # Loop over each segmented word
    if word not in remove_words:  # Skip stop words
        object_list.append(word)  # Keep the word
# Word frequency statistics
word_counts = collections.Counter(object_list)  # Count how often each word occurs
word_counts_top10 = word_counts.most_common(10)  # The 10 most frequent words
print(word_counts_top10)  # Print for a quick check
# Word cloud display
mask = np.array(Image.open('wordcloud.jpg'))  # Mask image that defines the cloud's shape
wc = wordcloud.WordCloud(
    font_path='C:/Windows/Fonts/simhei.ttf',  # A font with Chinese glyphs (SimHei on Windows)
    mask=mask,  # Shape of the cloud
    max_words=200,  # Maximum number of words shown
    max_font_size=100  # Largest font size
)
wc.generate_from_frequencies(word_counts)  # Build the cloud from the word -> count mapping
image_colors = wordcloud.ImageColorGenerator(mask)  # Color scheme taken from the mask image
wc.recolor(color_func=image_colors)  # Recolor the words using that scheme
plt.imshow(wc)  # Draw the word cloud
plt.axis('off')  # Hide the axes
plt.show()  # Show the figure
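The statistics half of the script (regex cleanup, stop-word filtering, `Counter.most_common`) can be tried without jieba, an input file or a mask image. The following is a minimal sketch, not the author's code: the function name `top_words`, the sample text and the stop-word set are hypothetical, and splitting on whitespace is only a stand-in for `jieba.cut`, which the script uses to segment Chinese text.

```python
import re
from collections import Counter

# Same cleanup pattern as the script, written as a raw string.
PATTERN = re.compile(r'\t|\n|\.|-|:|;|\)|\(|\?|"')


def top_words(text, stop_words, n=10):
    """Return the n most common tokens after cleanup and stop-word removal.

    Assumption: tokens are separated by whitespace. This is a hypothetical
    stand-in for jieba.cut, which the real script uses on Chinese text.
    """
    cleaned = PATTERN.sub('', text)  # drop tabs, newlines and the listed punctuation
    tokens = cleaned.split()  # stand-in for jieba.cut(cleaned, cut_all=False)
    kept = [t for t in tokens if t not in stop_words]  # filter stop words (set lookup)
    return Counter(kept).most_common(n)  # ties keep first-seen order


if __name__ == '__main__':
    sample = 'data (cleaning) is step one. data counting is step two.\n'  # made-up text
    print(top_words(sample, stop_words={'is'}, n=3))
```

Using a set for the stop words makes each membership test constant time, which matters more as the list grows. Note that the pattern deletes newlines outright instead of replacing them with a space, so on English text words at the end and start of adjacent lines would be glued together; for Chinese, which has no spaces between words, this matters much less.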
Copyright notice
This article was written by [Round programmer]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230545315832.html