word frequency count
2022-04-23 18:00:00 【Round programmer】
# Import third-party and standard libraries
import re  # Regular expressions
import collections  # Counter, used for word frequency counting
import numpy as np  # NumPy, used to turn the mask image into an array
import jieba  # jieba Chinese word segmentation
import wordcloud  # Word cloud rendering
from PIL import Image  # Image loading
import matplotlib.pyplot as plt  # Plotting
# Read the file
with open('article.txt') as fn:  # Open the file; it is closed automatically on exit
    string_data = fn.read()  # Read the whole file into one string
# Text preprocessing
pattern = re.compile(r'\t|\n|\.|-|:|;|\)|\(|\?|"')  # Pattern matching tabs, newlines and punctuation to strip
string_data = re.sub(pattern, '', string_data)  # Remove every character that matches the pattern
# Word segmentation
seg_list_exact = jieba.cut(string_data, cut_all=False)  # Segment in accurate mode
object_list = []
remove_words = [u'的', u',', u'和', u'是', u'随着', u'对于', u'对', u'等', u'能', u'都', u'。', u' ', u'、', u'中', u'在', u'了',
                u'通常', u'如果', u'我们', u'需要']  # Custom stop-word list: common Chinese function words and punctuation
for word in seg_list_exact:  # Loop over every segmented word
    if word not in remove_words:  # Keep the word only if it is not a stop word
        object_list.append(word)  # Append it to the list
# Word frequency statistics
word_counts = collections.Counter(object_list)  # Count how often each word occurs
word_counts_top10 = word_counts.most_common(10)  # Get the 10 most frequent words
print(word_counts_top10)  # Print them as a check
# Word cloud display
mask = np.array(Image.open('wordcloud.jpg'))  # Load the mask image that defines the cloud's shape
wc = wordcloud.WordCloud(
    font_path='C:/Windows/Fonts/simhei.ttf',  # Font file (SimHei, which can render Chinese)
    mask=mask,  # Mask image
    max_words=200,  # Maximum number of words to display
    max_font_size=100  # Maximum font size
)
wc.generate_from_frequencies(word_counts)  # Generate the word cloud from the frequency dictionary
image_colors = wordcloud.ImageColorGenerator(mask)  # Build a color scheme from the mask image
wc.recolor(color_func=image_colors)  # Recolor the words to match the mask image
plt.imshow(wc)  # Render the word cloud
plt.axis('off')  # Hide the axes
plt.show()  # Show the figure
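The cleaning, filtering and counting steps can be tried on their own, without jieba or wordcloud installed. The following is only a minimal sketch under stated assumptions: the helper name `top_words` and the sample sentence are made up for illustration, and whitespace splitting stands in for `jieba.cut`, so it suits space-separated text rather than Chinese.

```python
import collections
import re

# Same punctuation pattern as in the script, written as a raw string.
PATTERN = re.compile(r'\t|\n|\.|-|:|;|\)|\(|\?|"')


def top_words(text, stop_words, n=10):
    """Return the n most common tokens in text, skipping stop words.

    Assumption: tokens are separated by whitespace. Chinese text would
    need jieba.cut() here instead of str.split().
    """
    cleaned = PATTERN.sub('', text)  # Strip the matched characters
    tokens = [tok for tok in cleaned.split() if tok not in stop_words]
    return collections.Counter(tokens).most_common(n)


sample = 'the cat sat. the dog sat; the end?'  # Hypothetical sample text
result = top_words(sample, stop_words={'the'}, n=2)
print(result)  # The first entry is ('sat', 2)
```

Passing the stop words as a set keeps each membership test constant-time, which matters more as the stop-word list grows.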
Copyright notice
This article was written by [Round programmer]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230545315832.html