Grafana series (IX): introduction to Loki, an open source cloud native logging solution
2022-04-22 15:54:00 【East wind whistling】
URL: https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/
Introduction
About Grafana Labs
Grafana is the de facto dashboarding solution for time series data, and it supports nearly 100 data sources. Grafana Labs wants to move from being a dashboarding solution to being an observability platform, the first place you go when you need to debug your system.
Full observability
Observability has many definitions. Here, observability means visibility into your systems and into how they behave and perform. In the usual model, observability is split into three parts (or pillars): metrics, logs, and traces. The three complement each other and help you find problems quickly.
The following picture appears repeatedly in Grafana Labs blog posts and talks:
Today's reality: different systems, different data
Slack alerts me that there is a problem, and I open the relevant Grafana dashboard for the service. If I spot an anomaly in a panel or graph, I open the query in the Prometheus UI to dig deeper. For example, if I find that one of the services is throwing 500 errors, I try to find out whether a particular handler or route is throwing the error, whether all instances are throwing it, and so on.
Next, once I have a rough mental model of what is going wrong, I look at the logs (for example, in Splunk). Before Loki, I used kubectl to get the relevant logs, see what the error was, and decide whether I could do something about it. This works well for errors, but it helps less when the problem is high latency. In that case I turn to traces (for example, AppD) to learn more about what is slow: which method, operation, or function. Alternatively, I use Jaeger to get the trace information.
Although these tools do not always tell me directly what is wrong, they usually get me close enough to look at the code and work out what went wrong. Then I can either scale up the service (if it is overloaded) or deploy a fix.
Loki project background
Prometheus works well, Jaeger is getting better, and kubectl is decent. The label model is powerful enough for me to get to the root cause of a failing service. If I find that the ingester service is failing, I run kubectl --namespace prod logs -l name=ingester | grep XXX to get the relevant logs and grep through them.
If I find that a particular instance is failing, or I want to tail the logs of a service, I have to use an individual pod for tailing, because kubectl does not let you tail based on a label selector. This is not ideal, but it works for most use cases.
This works as long as the pod has not crashed or been replaced. If the pod or the node is terminated, the logs are lost forever. In addition, kubectl only stores recent logs, so we are blind when we want logs from the day before or earlier. Having to jump from Grafana to the CLI and back is not ideal either. We needed a solution that reduced context switching, and many of the solutions we explored were very expensive or did not scale well.
That was to be expected, because they do far more than select + grep, which is essentially all we needed. After looking at the existing solutions, Grafana Labs decided to build its own.
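The "select + grep" model mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not Loki's implementation: the streams live in memory, the label names and log lines are made up, and the helper names (make_labels, select, grep) are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# A stream is identified by its label set; the lines themselves are not indexed.
Labels = Tuple[Tuple[str, str], ...]


def make_labels(**labels: str) -> Labels:
    """Turn keyword arguments into a hashable, order-independent label set."""
    return tuple(sorted(labels.items()))


def select(streams: Dict[Labels, List[str]], **matchers: str) -> Iterator[str]:
    """Yield the lines of every stream whose labels contain all the matchers."""
    for labels, lines in streams.items():
        label_map = dict(labels)
        if all(label_map.get(key) == value for key, value in matchers.items()):
            yield from lines


def grep(lines: Iterable[str], pattern: str) -> List[str]:
    """Keep only the lines that match the regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Hypothetical sample data.
STREAMS = {
    make_labels(namespace="prod", name="ingester"): [
        "level=info msg=started",
        'level=error msg="flush failed"',
    ],
    make_labels(namespace="prod", name="querier"): ["level=error msg=timeout"],
    make_labels(namespace="dev", name="ingester"): ["level=error msg=ignored"],
}

matches = grep(select(STREAMS, namespace="prod", name="ingester"), r"level=error")
print(matches)
```

The label lookup narrows the search to a handful of streams, and only those streams are scanned line by line. That split is what keeps the approach cheap compared with indexing every line.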
Loki
Unhappy with the existing open source solutions, Grafana Labs started talking to people and found that many of them had the same problem. In fact, Grafana Labs realized that even today many developers still SSH into machines and grep/tail the logs. The solutions they were using were either too expensive or not stable enough. People were even being asked to log less, which Grafana Labs considers an anti-pattern for logging. Grafana Labs thought it could build something that it could use internally and that the wider open source community could use too. Grafana Labs had one main goal:
• Keep it simple. Just support grep!
This tweet from @alicegoldfuss is not an endorsement of Loki; it just illustrates the problem Loki is trying to solve.
Grafana Labs also had other goals:
• Logs should be cheap. Nobody should be asked to log less.
• Easy to operate and scale
• Metrics, logs (and later traces) need to work together
The last point is very important. Grafana Labs already collects metadata from Prometheus for its metrics, so it wanted to use the same metadata to correlate logs. For example, Prometheus tags each metric with the namespace, service name, instance IP, and so on. When you get an alert, you use that metadata to figure out where to look for logs. If the logs are tagged with the same metadata, we can switch between metrics and logs seamlessly. You can see the internal design document written by Grafana Labs here [1]. Here is the link to the Loki demo video:
Loki demo video [2]
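To make the metrics-to-logs switch concrete, here is a minimal sketch that turns the labels attached to an alert into a label selector for the matching log streams. The {key="value"} form mirrors the selector syntax Prometheus uses; the alert labels and the log_selector helper are hypothetical, and label values containing quotes would need escaping, which this sketch skips.

```python
from typing import Dict, Iterable


def log_selector(metric_labels: Dict[str, str], keep: Iterable[str]) -> str:
    """Build a {key="value", ...} stream selector from a metric's labels.

    Only the labels listed in `keep` are used, because labels such as the
    alert name describe the alert rather than the log stream.
    """
    parts = [
        f'{key}="{metric_labels[key]}"'
        for key in sorted(keep)
        if key in metric_labels
    ]
    return "{" + ", ".join(parts) + "}"


# Hypothetical labels carried by a firing alert.
alert_labels = {
    "alertname": "HighErrorRate",
    "namespace": "prod",
    "name": "ingester",
    "instance": "10.0.0.12:80",
}

selector = log_selector(alert_labels, keep=("namespace", "name"))
print(selector)  # {name="ingester", namespace="prod"}
```

Because both systems share the same label names and values, no lookup table is needed between the alert and the logs.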
Architecture
Drawing on its experience building and running Cortex, the horizontally scalable, distributed version of Prometheus that it runs as a service, Grafana Labs came up with the following architecture:
Loki architecture
Because matching metadata between metrics and logs is so important, Grafana Labs initially decided to target only Kubernetes. The idea is to run a log collection agent on each node, use it to collect logs, talk to the Kubernetes API to figure out the right metadata for the logs, and send them to a central service that can be used to show the collected logs in Grafana.
The agent supports the same configuration (relabelling rules) as Prometheus, to make sure the metadata matches. We call this agent promtail.
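As a rough illustration of what a relabelling rule does, the sketch below applies Prometheus-style "replace" rules to the metadata discovered for a pod. It is a simplified model rather than promtail's code: only the replace action is covered, the replacement uses Python's \1 backreference where Prometheus writes $1, and the RelabelRule class and sample values are hypothetical. The __meta_kubernetes_* names follow the convention used by Prometheus Kubernetes service discovery.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class RelabelRule:
    """A simplified 'replace' rule (hypothetical class, for illustration)."""
    source_labels: Sequence[str]
    target_label: str
    regex: str = "(.*)"
    replacement: str = r"\1"  # Python backreference; Prometheus writes $1
    separator: str = ";"


def relabel(labels: Dict[str, str], rules: List[RelabelRule]) -> Dict[str, str]:
    """Apply each rule in order, then drop internal labels starting with '__'."""
    out = dict(labels)
    for rule in rules:
        value = rule.separator.join(out.get(name, "") for name in rule.source_labels)
        match = re.fullmatch(rule.regex, value)  # the regex must match the whole value
        if match:
            out[rule.target_label] = match.expand(rule.replacement)
    return {key: value for key, value in out.items() if not key.startswith("__")}


# Hypothetical metadata discovered for one pod.
discovered = {
    "__meta_kubernetes_namespace": "prod",
    "__meta_kubernetes_pod_name": "ingester-7d9f",
    "__meta_kubernetes_pod_label_name": "ingester",
}

rules = [
    RelabelRule(["__meta_kubernetes_namespace"], "namespace"),
    RelabelRule(["__meta_kubernetes_pod_label_name"], "name"),
    RelabelRule(["__meta_kubernetes_pod_name"], "instance"),
]

stream_labels = relabel(discovered, rules)
print(stream_labels)
```

If the metrics pipeline and the log agent run the same rules over the same discovered metadata, they produce the same label set, which is what makes the later correlation possible.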
A closer look at Loki, the scalable log collection engine:
Loki internal architecture
The write path and the read path (query) are decoupled from each other, so they are explained separately:
Loki write path
Distributor
Once promtail collects the logs and sends them to Loki, the Distributor is the first component to receive them. Loki may receive millions of writes per second, and we do not want to write them to a database as they arrive, since that would kill any database. The data needs to be batched and compressed as it comes in.
Grafana Labs does this by building compressed chunks of data, gzipping the logs as they arrive. The ingester is a stateful component responsible for building the chunks and then flushing them. Loki runs multiple ingesters, and the logs belonging to each stream should always end up in the same ingester, so that all related entries end up in the same chunk. This is done by building a ring of ingesters and using consistent hashing. When an entry comes in, the Distributor hashes the labels of the log and uses the hash value to look up which ingester to send the entry to.
Loki Distributor component
In addition, for redundancy and resilience, Loki replicates each entry n times (3 by default).
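The ring lookup and the replication step can be sketched as follows. This is a generic consistent-hash ring written for illustration, not the ring code Loki inherits from Cortex: the SHA-256 based hash, the number of tokens per ingester, and the Ring class itself are assumptions.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def _hash(value: str) -> int:
    """Map a string to a 32-bit position on the ring (assumed hash function)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class Ring:
    """A consistent-hash ring in which each ingester owns several tokens."""

    def __init__(self, ingesters: List[str], tokens_per_ingester: int = 64) -> None:
        self._tokens: List[Tuple[int, str]] = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._keys = [token for token, _ in self._tokens]

    def owners(self, labels: Dict[str, str], replication_factor: int = 3) -> List[str]:
        """Return the distinct ingesters that should receive this stream."""
        # Sorting the labels makes the key independent of label order.
        stream_key = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        start = bisect.bisect_right(self._keys, _hash(stream_key))
        owners: List[str] = []
        # Walk clockwise from the stream's position, collecting distinct ingesters.
        for offset in range(len(self._tokens)):
            _, name = self._tokens[(start + offset) % len(self._tokens)]
            if name not in owners:
                owners.append(name)
            if len(owners) == replication_factor:
                break
        return owners


ring = Ring([f"ingester-{i}" for i in range(5)])
owners = ring.owners({"namespace": "prod", "name": "api"})
print(owners)
```

Because the key is derived only from the labels, every entry of a stream maps to the same set of ingesters, which is what lets each of them build a complete chunk for that stream.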
Ingester
Now the Ingester receives the entries and starts building chunks.
Loki Ingester building chunks
This is basically gzipping the logs and appending them. Once a chunk "fills up", we flush it to the database. We use separate databases for the chunks (object storage) and for the index, because they store different types of data.
Loki Ingester builds chunks, flushes the index to the index store, and flushes the chunks to the chunk store
After flushing a chunk, the Ingester creates a new empty chunk and adds new entries to it.
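The append, fill, and flush cycle described above might look roughly like this. It is a sketch under stated assumptions, not Loki's chunk format: the "timestamp line" text encoding, the size threshold, the chunk ID scheme, and the in-memory dictionaries standing in for the object store and the index store are all made up for illustration.

```python
import gzip
import zlib
from typing import Dict, List, Optional, Tuple


class Chunk:
    """Entries of one stream, gzip-compressed incrementally as they arrive."""

    def __init__(self) -> None:
        self._compressor = zlib.compressobj(wbits=31)  # wbits=31 selects the gzip format
        self._parts: List[bytes] = []
        self.raw_bytes = 0
        self.first_ts: Optional[int] = None
        self.last_ts: Optional[int] = None

    def append(self, ts: int, line: str) -> None:
        data = f"{ts} {line}\n".encode()
        self._parts.append(self._compressor.compress(data))
        self.raw_bytes += len(data)
        if self.first_ts is None:
            self.first_ts = ts
        self.last_ts = ts

    def close(self) -> bytes:
        self._parts.append(self._compressor.flush())
        return b"".join(self._parts)


class Ingester:
    def __init__(self, max_raw_bytes: int) -> None:
        self.max_raw_bytes = max_raw_bytes
        self.open_chunks: Dict[str, Chunk] = {}
        self.chunk_store: Dict[str, bytes] = {}  # stands in for object storage
        self.index: List[Tuple[str, Optional[int], Optional[int], str]] = []  # index store

    def push(self, stream: str, ts: int, line: str) -> None:
        if stream not in self.open_chunks:
            self.open_chunks[stream] = Chunk()  # a fresh, empty chunk
        chunk = self.open_chunks[stream]
        chunk.append(ts, line)
        if chunk.raw_bytes >= self.max_raw_bytes:  # the chunk is "full"
            self.flush(stream)

    def flush(self, stream: str) -> None:
        chunk = self.open_chunks.pop(stream)
        chunk_id = f"{stream}/{chunk.first_ts}-{chunk.last_ts}"
        self.chunk_store[chunk_id] = chunk.close()
        self.index.append((stream, chunk.first_ts, chunk.last_ts, chunk_id))


STREAM = '{name="ingester"}'
ingester = Ingester(max_raw_bytes=40)
ingester.push(STREAM, 1, "level=info msg=started")
ingester.push(STREAM, 2, "level=warn msg=slowing")  # crosses the threshold, so it flushes
ingester.push(STREAM, 3, "level=error msg=failed")  # lands in a new chunk
print(ingester.index)
print(gzip.decompress(ingester.chunk_store[STREAM + "/1-2"]).decode())
```

Only the small index row (stream, time range, chunk ID) goes to the index store, while the compressed bytes go to the chunk store, matching the split between the two databases described above.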
Querier
The read path is quite simple, with the Querier doing most of the heavy lifting. Given a time range and label selectors, it looks at the index to find the matching chunks, greps through them, and gives you the results. It also talks to the ingesters to get the recent data that has not been flushed yet.
Note that, as of the 2019 version, a single Querier greps through all the relevant logs for each query. Grafana Labs has already implemented query parallelization in Cortex using a front end, and the same approach can be extended to Loki to provide distributed grep, which would make large queries fast enough.
Loki Querier component
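The read path described in this section can be sketched as a single function: look up candidate chunks in the index by labels and time range, decompress and grep them, and merge in the entries that ingesters still hold in memory. The data layout (index rows, chunk encoding, the unflushed list) and all names are assumptions made for this example, and deduplication of replicated entries is left out.

```python
import gzip
import re
from typing import Dict, List, Tuple

Entry = Tuple[int, str]                        # (timestamp, line)
IndexRow = Tuple[Dict[str, str], int, int, str]  # (labels, first_ts, last_ts, chunk_id)


def encode_chunk(entries: List[Entry]) -> bytes:
    return gzip.compress("".join(f"{ts} {line}\n" for ts, line in entries).encode())


def decode_chunk(blob: bytes) -> List[Entry]:
    entries = []
    for row in gzip.decompress(blob).decode().splitlines():
        ts, line = row.split(" ", 1)
        entries.append((int(ts), line))
    return entries


def _matches(labels: Dict[str, str], matchers: Dict[str, str]) -> bool:
    return all(labels.get(key) == value for key, value in matchers.items())


def query(
    index: List[IndexRow],
    chunk_store: Dict[str, bytes],
    unflushed: List[Tuple[Dict[str, str], List[Entry]]],
    matchers: Dict[str, str],
    start: int,
    end: int,
    pattern: str,
) -> List[Entry]:
    regex = re.compile(pattern)
    candidates: List[Entry] = []
    # 1. Use the index to find chunks whose labels match and whose range overlaps.
    for labels, first_ts, last_ts, chunk_id in index:
        if _matches(labels, matchers) and first_ts <= end and last_ts >= start:
            candidates.extend(decode_chunk(chunk_store[chunk_id]))
    # 2. Ask the ingesters for entries that have not been flushed yet.
    for labels, entries in unflushed:
        if _matches(labels, matchers):
            candidates.extend(entries)
    # 3. Grep: keep entries inside the time range whose line matches the pattern.
    return sorted(
        (ts, line)
        for ts, line in candidates
        if start <= ts <= end and regex.search(line)
    )


# Hypothetical sample data.
index = [
    ({"name": "ingester"}, 1, 2, "c1"),
    ({"name": "querier"}, 1, 2, "c2"),
]
chunk_store = {
    "c1": encode_chunk([(1, "level=info msg=started"), (2, "level=error msg=flush failed")]),
    "c2": encode_chunk([(1, "level=error msg=timeout")]),
}
unflushed = [({"name": "ingester"}, [(3, "level=error msg=retry failed")])]

results = query(index, chunk_store, unflushed, {"name": "ingester"}, 1, 10, "level=error")
print(results)
```

Each chunk can be scanned independently of the others, which is why splitting a large query across several workers is a natural next step.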
Scalability
1. Loki puts the chunk data into object storage, which scales.
2. Loki puts the index into Cassandra/Bigtable/DynamoDB or Loki's built-in index db, which also scales.
3. Distributors and Queriers are stateless components that can scale horizontally.
The ingester is a stateful component, but Loki includes the full sharding and resharding lifecycle. When a rollout finishes, or when ingesters are scaled up or down, the ring topology changes and the ingesters redistribute their chunks to match the new topology. This is mostly code taken from Cortex, which has been running in production for more than 5 years.
Summary
Loki: like Prometheus, but for logs.
Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost-effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.
Grafana series articles
Grafana series articles [3]
References
[1] here: https://docs.google.com/document/d/11tjK_lvp1-SVsFZjgOTr1vV3-q6vBAsZYIQ5ZeYBkyM/edit#heading=h.c90a30a5yw3i
[2] Loki demo video: https://youtu.be/7n342UsAMo0
[3] Grafana series articles: https://ewhisper.cn/tags/Grafana/
Copyright notice
This article was written by [East wind whistling]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204221544198950.html