当前位置:网站首页>Redis Bloom Filter
Redis Bloom Filter
2022-08-08 20:34:00 【Li Bin's Blog】
Related Links
redis+ Bloom filter idea: https://kdocs.cn/l/sgVaSvSB3Z0s
A few small cases
①. Originally there were 1 billion numbers, but now there are 100,000 numbers. It is necessary to quickly and accurately determine whether these 100,000 numbers are in the 1 billion number database?
Solution 1: Store 1 billion numbers in the database and query the database, the accuracy is good, but the speed will be relatively slow.
Solution 2: Put 1 billion numbers into memory, such as Redis cache, here we calculate the memory size: 1 billion * 8 bytes = 8 GB, through memory query, accuracy and speed are available,But about 8gb of memory space, quite a waste of memory space.
②. Those who have been in contact with crawlers should have such a demand. There are thousands of websites that need crawlers. For a new website url, how do we judge whether we have already crawled the url?
The solution is still the above two, obviously, not very good.
③. Similarly, there is also the filtering of junk mailboxes.
So for such a large data collection, how to accurately and quickly determine whether a certain data is in a large data collection without occupying memory, the Bloom filter came into being.
Redis+Bloom filter simple architecture diagram
1, data structure
Bloom filter: A data structure that consists of a long string of binary vectors, which can be viewed as a binary array.Since it is binary, it stores either 0 or 1, but the initial default value is 0.
2, hash map
1. Bloom needs to record the data he has seen. The records here need to be hashed with the hash function to get the array subscript and store it in the BIT array as 1.Such a record only occupies 1BIT space
2. When judging whether it exists: give the Bloom filter a data, perform hash to get the subscript, and get the data from the BIT array. If it is 1, it means the data exists, if it is 0, it means it does not exist
3. Accuracy
The hash algorithm may collide, so different data may be hashed as a subscript data, so in order to improve the accuracy, it is necessary to use multiple hash algorithms to mark a data, and increase the size of the BIT array
Because of this, the Bloom filter judges that [data exists], maybe the data does not exist, but if it is judged to be [data does not exist], then the data must not exist.
4. Delete is not supported
Bloom filter can only insert data to judge whether it exists, but cannot delete it, and can only guarantee the absolute accuracy of [non-existent] judgment
It is not difficult to see from the above that if adds a counter to each BIT bit of the array, +1 when inserting and -1 when deleting can achieve deletion.
But the implementation of the up-counter is problematic:
Due to the hash collision problem, the Bloom filter cannot accurately determine whether the data exists, so it cannot be deleted at will.Secondly, the wraparound problem of the counter also needs to be considered.
5. Advantages and disadvantages of Bloom filter
Advantages: The advantages are obvious, the binary array occupies very little memory, and the insertion and query speed are fast enough.
Disadvantages: With the increase of data, the false positive rate will increase; there is no way to judge that the data must exist; another important disadvantage is that the data cannot be deleted.
Install bloom filter module for redis
1. Download:
Address: https://github.com/RedisBloom/RedisBloomDownload ZIP file, upload to linuxRedisBloom-master.zip
2, decompress and compile
Command:unzip RedisBloom-master.zipcd RedisBloom-mastermakeAfter scanning the above command, a file name is generated in the folder: redisbloom.so
3. Load the module when starting redis
Two waysFirst, modify the configuration file directly# Enter the reids directory, which is more convenient to configure in redis.conf>> vim redis.conf# Modify the redis configuration file and add the redisboom module>> loadmodule /path/to/redisbloom.soSecond, the module is loaded when redis starts# redis-server add Bloom filter# Reference documentation: https://github.com/RedisBloom/RedisBloom#building-and-loading-redisbloom>> redis-server --loadmodule /path/to/redisbloom.so
4, redis restart
# Kill the redis process[[email protected] etc]# ps aux|grep redisroot 16679 0.0 0.1 153896 2440 ? Ssl 15:10 0:00 ./redis-server 0.0.0.0:6379root 30971 0.0 0.0 112712 980 pts/0 R+ 15:17 0:00 grep --color=auto redis[[email protected] etc]# kill -9 16679[[email protected] etc]# ps aux|grep redisroot 31436 0.0 0.0 112712 980 pts/0 R+ 15:17 0:00 grep --color=auto redis# Start the redis service[[email protected] etc]# cd ../bin## start redis service[[email protected] bin]# ./redis-server /usr/local/redis/etc/redis.conf32294:C 22 Jan 2021 15:17:40.648 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo32294:C 22 Jan 2021 15:17:40.648 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=32294, just started32294:C 22 Jan 2021 15:17:40.648 # Configuration loaded[[email protected] bin]# ps aux|grep redisroot 32295 0.0 0.1 156020 2672 ? Ssl 15:17 0:00 ./redis-server 0.0.0.0:6379root 32637 0.0 0.0 112712 984 pts/0 R+ 15:17 0:00 grep --color=auto redis
5. Test whether the installation is successful
127.0.0.1:6379> bf.add mym meituan(integer) 1127.0.0.1:6379> bf.exists mym meituan(integer) 1127.0.0.1:6379> bf.exists mym baidu(integer) 0
Basic commands
bf.add adds elements to bloom filter, bf.add can only add onebf.exists determines whether the element is in the bloom filterbf.madd adds multiple elements to bloom filterbf.mexists determines whether multiple elements are in bloom filter
Example:
> bf.add rurl www.baidu.com> bf.exists rurl www.baidu.com> bf.madd rurl www.sougou.com www.jd.com> bf.mexists rurl www.jd.com www.taobao.com
边栏推荐
- 1259 Alice and Bob
- 【2023秋招面经】20220805安恒信息实习
- 实践篇2:深度学习之----LetNet之tensorflow2的实现
- 学习笔记:线性表的顺序表示和实现(二级指针实现)
- 自然堂品牌焕新升级,携手代言人王一博彰显美妆年轻新态度
- How can recommender systems be trusted?A review of the latest "Trusted Recommender System" from Rutgers University, a 43-page pdf explaining the composition and technology of trusted RS
- 亚洲首个!朱永官院士荣获2022年国际土壤科学联合会李比希奖
- 利用shell脚本同时编译生成多个cmake工程
- LeetCode_67_二进制求和
- Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception
猜你喜欢
关于Mac终端自定义命令和Mysql命令问题
我们仍未知道那天踩的MultipartFile file为null的大坑是为什么
【翻译】用Argo CD揭开企业规模的持续交付的秘密成分
DCT变换
头条二面:你确定ThreadLocal真的会造成内存泄露?
JSP第二篇 -----JSP浅聊EL表达式第二篇:EL表达式中的运算符
Gartner:2022年全球半导体收入增长预计将放缓至7%,远低于2021年26.3%
Mei cole studio OpenHarmony equipment development training notes - the first learn notes
SushiSwap「SUSHI」下降了 93%,但还没有完全消失
随手记:laravel、updateOrCreate 和 updateOrInsert 的区别
随机推荐
Kotlin委托属性知识点
文件上传接入阿里云OSS
rk3588使用npu进行模型转换和推理,加速AI应用落地
有幸与美团大佬共同探讨单节点连接数超1.5W的问题
Kotlin笔记-ForEach与ForEachIndexed区别
riscv-gnu-toolchain下载安装
推荐系统如何可信?罗格斯大学最新《可信推荐系统》综述,43页pdf阐述可信RS组成与技术
五大理由告诉你为什么开发人员选择代码质量静态分析工具Klocwork来实现软件安全
买股票安全吗 资金能取出来吗
JMeter测试接口并发场景
Categorized input and output, Go lang1.18 introductory refining tutorial, from Bai Ding to Hongru, go lang basic data types and input and output EP03
源码分析Canal专栏
Kotlin-学习的第五天之Handler
【翻译】宣布Kubernetes策略管理论文
学习笔记:XML
tar zcf是单线程瓶颈
Yarn 总结(未完待续)
Notes: The difference between laravel, updateOrCreate and updateOrInsert
Cesium中自定义材质material
监控工具普罗米修斯(Prometheus)的介绍与安装