当前位置:网站首页>Redis Bloom Filter
Redis Bloom Filter
2022-08-08 20:34:00 【Li Bin's Blog】
Related Links
redis+ Bloom filter idea: https://kdocs.cn/l/sgVaSvSB3Z0s
A few small cases
①. Originally there were 1 billion numbers, but now there are 100,000 numbers. It is necessary to quickly and accurately determine whether these 100,000 numbers are in the 1 billion number database?
Solution 1: Store 1 billion numbers in the database and query the database, the accuracy is good, but the speed will be relatively slow.
Solution 2: Put 1 billion numbers into memory, such as Redis cache, here we calculate the memory size: 1 billion * 8 bytes = 8 GB, through memory query, accuracy and speed are available,But about 8gb of memory space, quite a waste of memory space.
②. Those who have been in contact with crawlers should have such a demand. There are thousands of websites that need crawlers. For a new website url, how do we judge whether we have already crawled the url?
The solution is still the above two, obviously, not very good.
③. Similarly, there is also the filtering of junk mailboxes.
So for such a large data collection, how to accurately and quickly determine whether a certain data is in a large data collection without occupying memory, the Bloom filter came into being.
Redis+Bloom filter simple architecture diagram
1, data structure
Bloom filter: A data structure that consists of a long string of binary vectors, which can be viewed as a binary array.Since it is binary, it stores either 0 or 1, but the initial default value is 0.
2, hash map
1. Bloom needs to record the data he has seen. The records here need to be hashed with the hash function to get the array subscript and store it in the BIT array as 1.Such a record only occupies 1BIT space
2. When judging whether it exists: give the Bloom filter a data, perform hash to get the subscript, and get the data from the BIT array. If it is 1, it means the data exists, if it is 0, it means it does not exist
3. Accuracy
The hash algorithm may collide, so different data may be hashed as a subscript data, so in order to improve the accuracy, it is necessary to use multiple hash algorithms to mark a data, and increase the size of the BIT array
Because of this, the Bloom filter judges that [data exists], maybe the data does not exist, but if it is judged to be [data does not exist], then the data must not exist.
4. Delete is not supported
Bloom filter can only insert data to judge whether it exists, but cannot delete it, and can only guarantee the absolute accuracy of [non-existent] judgment
It is not difficult to see from the above that if adds a counter to each BIT bit of the array, +1 when inserting and -1 when deleting can achieve deletion.
But the implementation of the up-counter is problematic:
Due to the hash collision problem, the Bloom filter cannot accurately determine whether the data exists, so it cannot be deleted at will.Secondly, the wraparound problem of the counter also needs to be considered.
5. Advantages and disadvantages of Bloom filter
Advantages: The advantages are obvious, the binary array occupies very little memory, and the insertion and query speed are fast enough.
Disadvantages: With the increase of data, the false positive rate will increase; there is no way to judge that the data must exist; another important disadvantage is that the data cannot be deleted.
Install bloom filter module for redis
1. Download:
Address: https://github.com/RedisBloom/RedisBloomDownload ZIP file, upload to linuxRedisBloom-master.zip
2, decompress and compile
Command:unzip RedisBloom-master.zipcd RedisBloom-mastermakeAfter scanning the above command, a file name is generated in the folder: redisbloom.so
3. Load the module when starting redis
Two waysFirst, modify the configuration file directly# Enter the reids directory, which is more convenient to configure in redis.conf>> vim redis.conf# Modify the redis configuration file and add the redisboom module>> loadmodule /path/to/redisbloom.soSecond, the module is loaded when redis starts# redis-server add Bloom filter# Reference documentation: https://github.com/RedisBloom/RedisBloom#building-and-loading-redisbloom>> redis-server --loadmodule /path/to/redisbloom.so
4, redis restart
# Kill the redis process[[email protected] etc]# ps aux|grep redisroot 16679 0.0 0.1 153896 2440 ? Ssl 15:10 0:00 ./redis-server 0.0.0.0:6379root 30971 0.0 0.0 112712 980 pts/0 R+ 15:17 0:00 grep --color=auto redis[[email protected] etc]# kill -9 16679[[email protected] etc]# ps aux|grep redisroot 31436 0.0 0.0 112712 980 pts/0 R+ 15:17 0:00 grep --color=auto redis# Start the redis service[[email protected] etc]# cd ../bin## start redis service[[email protected] bin]# ./redis-server /usr/local/redis/etc/redis.conf32294:C 22 Jan 2021 15:17:40.648 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo32294:C 22 Jan 2021 15:17:40.648 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=32294, just started32294:C 22 Jan 2021 15:17:40.648 # Configuration loaded[[email protected] bin]# ps aux|grep redisroot 32295 0.0 0.1 156020 2672 ? Ssl 15:17 0:00 ./redis-server 0.0.0.0:6379root 32637 0.0 0.0 112712 984 pts/0 R+ 15:17 0:00 grep --color=auto redis
5. Test whether the installation is successful
127.0.0.1:6379> bf.add mym meituan(integer) 1127.0.0.1:6379> bf.exists mym meituan(integer) 1127.0.0.1:6379> bf.exists mym baidu(integer) 0
Basic commands
bf.add adds elements to bloom filter, bf.add can only add onebf.exists determines whether the element is in the bloom filterbf.madd adds multiple elements to bloom filterbf.mexists determines whether multiple elements are in bloom filter
Example:
> bf.add rurl www.baidu.com> bf.exists rurl www.baidu.com> bf.madd rurl www.sougou.com www.jd.com> bf.mexists rurl www.jd.com www.taobao.com
边栏推荐
猜你喜欢
推荐系统如何可信?罗格斯大学最新《可信推荐系统》综述,43页pdf阐述可信RS组成与技术
CVPR 2022 ActivityNet竞赛冠军:中科院深圳先进院提出高低分双模态行为识别框架
Superman is coming!Flutter realizes full-screen power animation!
LeetCode_2_两数相加
图的几种存储方式
深度学习初步认知
com.alibaba.fastjson.JSONException: default constructor not found. class
方舟开服务器教程——开服配置常见问题及解决方法
How can recommender systems be trusted?A review of the latest "Trusted Recommender System" from Rutgers University, a 43-page pdf explaining the composition and technology of trusted RS
IJCAI 2022 | Can Graph Neural Networks Detect Anomalies?
随机推荐
深度学习初步认知
门外汉掌握数据分析处理技术的路线图
如何用WebSocket打造Web端IM即时通讯聊天
Some useful frameworks in Kotlin
记一次坎坷的调试|Mosquitto通过TLS连接EMQ时阻塞的问题
Web3到底是什么?
流媒体后视镜前装搭载小幅下滑,远峰与镜泰排位争夺白热化
0-1 背包问题
What are the latest developments in the handling of false information?KDD2022 "Fighting Misinformation and Responding to Media Bias" tutorial, 161 pages ppt
【翻译】用Argo CD揭开企业规模的持续交付的秘密成分
C语言初阶-指针
openEuler 资源利用率提升之道02:典型应用下的效果
LeetCode_2_两数相加
差点被ECCV错过的论文:视频理解新框架,仅用微调的「成本」,达到预训练的「全能」...
nacos作用
实践篇1:深度学习之----LetNet之tensorflow2的实现
自然堂品牌焕新升级,携手代言人王一博彰显美妆年轻新态度
IJCAI 2022 | 图神经网络可以检测到异常吗?
小白如何购买基金产品?
文件上传接入阿里云OSS