How to quickly streamline the container image release process?

2022-08-09 10:15:00 InfoQ

Recently, Q-eye published a post on image slimming that attracted a lot of attention, and many people have asked about the image slimming approach. That post opened with an example to demonstrate the importance and effectiveness of image optimization and served well as a conversation starter, but it was short and light on rationale. This article lays out the image slimming process more systematically and uses practical cases to explain the approach further.

Containerized delivery packages the application together with the environment it depends on (such as the JRE, dynamic libraries, environment variables, and system directories) into an image. This solves the release consistency problem, that is, Build Once, Run Everywhere. The side effect is that images become very large. Container images provide a layering mechanism, so only the layers that have changed need to be transferred each time, which reduces the transfer volume to some extent.

To explain how container images are compressed and transferred, here is a brief introduction to container layering first:

Container Layered File System


When we create a new container on a host, a separate file system is created for it. This layered file system treats the file layers in the image as read-only layers and stacks a new read-write layer on top of them. All operations inside the container happen in the read-write layer. This achieves the following:
  • The image files are read-only, so N containers can be created from one image, each isolated from the others
  • Multiple containers share the read-only layers, so the file system cache can be used to speed up data reads

Building a new image works in a similar way. A read-write layer is stacked on top of the base image's file layers, the new content is written into that layer, and the read-write layer is then turned into a new read-only layer, forming a new image. A question newcomers often run into is: I used rm in my Dockerfile to remove some files from the base image, so why did the final image not get smaller? Once you understand the read-write layer mechanism, the answer is clear. Every operation happens in the read-write layer, so deleting a file cannot actually remove the file stored in an earlier layer. The deletion only places a special marker in the read-write layer that makes the file invisible to the file system, so the image does not shrink.
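The effect of deleting a file in a later layer can be illustrated with a toy model. This is only a sketch: the `WHITEOUT` marker, the paths, and the sizes are hypothetical stand-ins, not how any real storage driver represents layers.

```python
# Toy model of image layers: each layer maps a path to a size,
# and a deletion is recorded as a marker instead of removing stored data.
WHITEOUT = None  # hypothetical stand-in for a real deletion marker


def visible_files(layers):
    """Merge layers bottom-up and return the files a container would see."""
    merged = {}
    for layer in layers:
        for path, size in layer.items():
            if size is WHITEOUT:
                merged.pop(path, None)  # hidden from view, space not freed
            else:
                merged[path] = size
    return merged


def image_size(layers):
    """The image size is the sum of the data stored in every layer."""
    return sum(
        size
        for layer in layers
        for size in layer.values()
        if size is not WHITEOUT
    )


base_layer = {"/usr/lib/big.so": 300, "/etc/os-release": 1}
rm_layer = {"/usr/lib/big.so": WHITEOUT}  # what an rm in a later layer records

layers = [base_layer, rm_layer]
print(sorted(visible_files(layers)))  # the deleted file is no longer visible
print(image_size(layers))             # the stored size is the same as before
```

The file disappears from the merged view, yet the total stored size equals that of the base layer alone, which is why the image does not shrink.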

Therefore, the layering mechanism has the following disadvantages:
  • Every change creates a new layer, so the number of layers keeps growing
  • Once a file has been written into an image, later images built on that image cannot reclaim the space the file occupies

Image Layered Transfer Mechanism


A container image consists of a description file and a series of data files, with each data file corresponding to one file layer. When we pull or push an image, the description file is fetched first. Based on it, the client determines which layers already exist locally (or already exist on the remote end) and transfers only the missing layers, which greatly reduces the transfer volume. However, if the two systems cannot connect to each other directly, there is no way to determine which image layers the peer already has, and the mechanism no longer works.
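The layer selection step can be sketched as a simple set comparison. The function name and the shortened digests below are illustrative assumptions; a real registry client compares content digests taken from the description file.

```python
def layers_to_transfer(manifest_layers, existing_digests):
    """Return the layers listed in the description file that the receiver lacks."""
    existing = set(existing_digests)
    missing = []
    for digest in manifest_layers:
        if digest not in existing and digest not in missing:
            missing.append(digest)
    return missing


# Hypothetical digests, shortened for readability.
manifest = ["sha256:os", "sha256:jre", "sha256:app-v2"]
already_on_target = ["sha256:os", "sha256:jre", "sha256:app-v1"]

# Connected case: only the changed application layer is sent.
print(layers_to_transfer(manifest, already_on_target))

# Disconnected case: nothing is known about the receiver, so every layer is sent.
print(layers_to_transfer(manifest, []))
```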

With these principles in mind, we can go on to discuss how to optimize image size.

Optimization in the Dockerfile


To reduce the size of a container image, keep the following experience in mind when writing a Dockerfile:

  • Teams should use a unified base image wherever possible. Establish and maintain a company-wide list of base images.
  • Reduce the number of lines in the Dockerfile and chain multiple commands with &&, because each command line generates a layer.
  • Put the actions that add files and the actions that clean them up on the same line, for example yum install and yum clean all. If they are split into two lines, the cleanup on the second line cannot actually delete the files.
  • Copy only the files that are needed. If you copy an entire directory, check it carefully for unwanted content such as hidden files and temporary files.
  • The container image already has its own compression mechanism, so compressing files into a zip archive, putting it in the image, and extracting it at container startup brings little benefit.
  • Avoid putting unnecessary tools into production images. For example, some teams install sshd in the image; this should not be done because it increases security risk.
  • Keep what is installed to a minimum. For example, install only the runtime of a tool, without help documentation, source code, samples, and so on.

Optimization on image layering


A typical Java application image is around 500M, of which about 300M is the base image (OS, JRE, and so on) and the other 200M is application-related files. That does not look like much, but in a microservice architecture there are many applications. Suppose each version we release contains 30 images; the total transfer volume is then 300M + 200M * 30, roughly 6G, which is still large.

The volume of each release can be reduced by layering the application further. There are two ways to split the layers:

1. Several similar products share a base image, with the common packages placed in the shared base image layer
Suppose products A, B, and C all use the same technical architecture, for example the same 20 jar packages totaling 100M. If the three products create a shared intermediate base image and each builds its own image on top of it, the amount of data released for the application layers changes from the original 200M * 3 to 100M + 100M * 3, which reduces the release volume.

2. Separate hot and cold content, and turn the part that does not change into a base image
Suppose application A totals 200M, of which third-party jar packages account for 180M and its own code only 20M. An intermediate base image can then be created for the third-party jars. Each time a version is released, if the third-party jars have not changed, the intermediate layer does not need to be retransmitted and only the 20M of the application's own code is transferred, which also reduces the release volume.
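The figures in this section can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration that works in megabytes and ignores compression; it only restates the numbers given in the text.

```python
def release_volume_mb(shared_mb, per_image_mb, image_count):
    """Total transfer volume: shared layers are sent once, the rest once per image."""
    return shared_mb + per_image_mb * image_count


# 30 images on one 300M base image, each with 200M of application files.
full_release = release_volume_mb(300, 200, 30)

# Option 1: products A, B, and C move 100M of common jars into a shared layer.
option1_before = release_volume_mb(0, 200, 3)
option1_after = release_volume_mb(100, 100, 3)

# Option 2: application A keeps 180M of third-party jars in a cold layer;
# when the jars are unchanged, only the 20M of its own code is sent.
option2_before = release_volume_mb(0, 200, 1)
option2_after = release_volume_mb(0, 20, 1)

print(full_release)                   # 6300
print(option1_before, option1_after)  # 600 400
print(option2_before, option2_after)  # 200 20
```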

I usually call a layer that is split out to reduce release volume an offload layer. The effect is very direct, but offload layers also bring quite a few management problems:

For the first type, where multiple products share one offload layer, close communication with those products must be maintained. If the version of a component in the offload layer needs to be updated, all of these products have to be changed and released together. Any inconsistency may introduce problems.

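The second type also faces the problem of updating the offload layer. For example, Java applications today commonly use maven to manage dependencies; how can the offload layer be updated in time to match maven's output? If it is not updated in time, inconsistencies arise as well.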

Relying on management processes or manual checks to keep the offload layer up to date is not reliable. We recommend using the offload layer like a cache, implemented as follows:
  • Extract common files and cold files into the offload layer according to business characteristics.
  • When building the image, use the multi-stage builds mechanism: first copy the latest full content into a Stage, then run a comparison in the Stage. If the offload layer's copy is up to date, use it; if not, use the latest content instead.

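Taking Java as an example, the Dockerfile is written as follows:

  • Stage: copy all lib packages into the /app/lib/ directory, then RUN a script that compares the files under /app/lib/ with the files under /app/lib_shared/ in the offload layer. If two files are identical, delete the file and create a soft link in place of the actual file.
  • Build: copy the /app/lib/ directory from the Stage to produce the final real layer.
This way, even if a dependency package has been updated and the offload layer has not been updated in time, the only consequence is that the offload does not take effect for that image and the image is larger. It does not cause a failure.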

Note: we have found that for tomcat to support soft links, the allowLinking switch must be turned on; otherwise it fails.

Image transfer optimization
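The comparison script run in the Stage could look roughly like the sketch below. The original script is not shown in this article, so the function name, the byte-for-byte comparison, and the demo file names are all assumptions; only the /app/lib/ and /app/lib_shared/ roles come from the description above.

```python
import filecmp
import os
import tempfile


def offload_duplicates(lib_dir, shared_dir):
    """Replace files in lib_dir that are byte-identical to the copy in
    shared_dir with soft links, and return the names that were replaced."""
    replaced = []
    for name in sorted(os.listdir(lib_dir)):
        lib_path = os.path.join(lib_dir, name)
        shared_path = os.path.join(shared_dir, name)
        if os.path.islink(lib_path) or not os.path.isfile(lib_path):
            continue  # skip existing links and subdirectories
        if os.path.isfile(shared_path) and filecmp.cmp(
            lib_path, shared_path, shallow=False
        ):
            os.remove(lib_path)
            os.symlink(shared_path, lib_path)
            replaced.append(name)
    return replaced


def demo():
    """Build throwaway lib and lib_shared directories and run the comparison."""
    with tempfile.TemporaryDirectory() as root:
        lib_dir = os.path.join(root, "lib")
        shared_dir = os.path.join(root, "lib_shared")
        os.makedirs(lib_dir)
        os.makedirs(shared_dir)
        files = {
            (lib_dir, "common.jar"): b"same bytes",
            (shared_dir, "common.jar"): b"same bytes",
            (lib_dir, "updated.jar"): b"version 2",
            (shared_dir, "updated.jar"): b"version 1",
            (lib_dir, "own.jar"): b"application code",
        }
        for (directory, name), content in files.items():
            with open(os.path.join(directory, name), "wb") as handle:
                handle.write(content)
        return offload_duplicates(lib_dir, shared_dir)


print(demo())  # only the identical file is replaced by a link
```

In an image build the two arguments would be /app/lib/ and /app/lib_shared/. A file that is outdated or absent in the offload layer is left in place as a real file, which matches the cache-like behavior described above: a stale offload layer only makes the image larger.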


As mentioned earlier, layered image transfer requires that the networks of the source registry and the target registry can reach each other. Real situations are often more complicated. For example, many projects enforce strict network controls and do not allow servers to access the external network directly, and many international projects have poor network connections to China, with low speed and frequent packet loss. So the cases need to be discussed separately.

Online Compression and Transmission Solutions


Suppose a machine can be found that has access to both the source and target registries. Docker can be installed on that machine and docker pull/push used directly; both pulling and pushing are incremental transfers. However, this approach requires installing Docker and takes up file system space (the image is temporarily stored locally). We recommend our open source image-transmit tool (https://github.com/wct-devops/image-transmit). It connects to the source registry on one end and the target registry on the other and forwards the incremental data layers directly, without writing intermediate data to disk, which is the most efficient approach. The tool is also a portable GUI application that is simple to use (a command line is supported as well), is only a few megabytes when compressed, consumes few resources, and runs stably on secured jump hosts.

Offline compression and transmission scheme


In many scenarios online transfer is not possible. The image has to be saved as a file first and then delivered to the site by some other means, for example via Baidu cloud disk or by saving it to a USB drive and sending it by courier. In offline transfer mode, the main concerns are how to compress the image package as small as possible and how long compression and decompression take. Several modes are described below:

  • docker save | gzip. This is the most basic method: find a machine with Docker installed, pull the image locally, then use the save command to export it and compress it. This approach is very time-consuming, and the resulting package is also the largest.
  • Offline packaging with the image-transmit tool mentioned above, which uses the tar algorithm by default. The tool downloads the image data files directly from the registry and merges them into one compressed package. Compared with the previous method, it skips saving to the local store and the export, compression, and decompression steps, so it can be 20 times faster. In addition, when multiple images are packed at once, identical image layers are stored only once, which reduces the package size. Taking our own image release as an example, the package produced by the tool is 1/3 to 1/5 the size of the docker save result.
  • image-transmit also provides a squashfs algorithm. It goes a step further than the previous method: each image layer is decompressed and the individual files are compressed together as a solid archive. For example, product A and product B both use demo.jar, but this jar is not in the base image layer. The previous method cannot detect this duplicate, whereas squashfs compression recognizes it and stores a single copy, which further reduces the package size. Taking our own image package as an example, this method shrinks the package by another 30%~50% compared with the previous method. However, the algorithm has to decompress all image layers and then recompress them, which makes compression very slow: a package that the previous method compresses in 5 minutes takes about an hour this way.
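The difference between the last two modes comes down to the granularity at which duplicates are detected. The sketch below models that idea only: it is not how image-transmit or squashfs is implemented, the contents and sizes are made up, and compression itself is ignored.

```python
import hashlib


def layer_digest(layer):
    """Identify a layer by hashing its paths and contents in a stable order."""
    hasher = hashlib.sha256()
    for path in sorted(layer):
        hasher.update(path.encode())
        hasher.update(b"\0")
        hasher.update(layer[path])
        hasher.update(b"\0")
    return hasher.hexdigest()


def size_with_layer_dedup(images):
    """Store each distinct layer once, as when packing several images together."""
    unique = {}
    for layers in images.values():
        for layer in layers:
            unique[layer_digest(layer)] = sum(len(c) for c in layer.values())
    return sum(unique.values())


def size_with_file_dedup(images):
    """Store each distinct file content once, regardless of which layer holds it."""
    unique = {}
    for layers in images.values():
        for layer in layers:
            for content in layer.values():
                unique[hashlib.sha256(content).hexdigest()] = len(content)
    return sum(unique.values())


base = {"/opt/jre": b"J" * 300}
demo_jar = b"D" * 50
images = {
    "A": [base, {"/app/lib/demo.jar": demo_jar, "/app/a.jar": b"A" * 20}],
    "B": [base, {"/app/lib/demo.jar": demo_jar, "/app/b.jar": b"B" * 20}],
}

print(size_with_layer_dedup(images))  # 440: shared base counted once
print(size_with_file_dedup(images))   # 390: demo.jar also counted once
```

Layer-level comparison sees the two application layers as different because each contains one distinct file, so demo.jar is stored twice; file-level comparison stores it once.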

Choose the compression algorithm according to the actual situation, or try both and pick whichever suits best overall. Because offline mode cannot connect to the registry directly, the methods above save all image layers into the offline package. So how can incremental release be achieved in offline mode?

image-transmit implements the following incremental release method: when building a package, it can use the information from the previous package to automatically skip image layers that have already been sent and include only the incremental changes, which further reduces the package size. This approach has drawbacks too. For example, the versions must be applied strictly in order; skipping a version could leave the on-site registry missing some data layers and cause a failure. If releases are frequent, a scheme like the following can be used to avoid the risk:
  • Release a full version package every month
  • Build the daily/weekly incremental packages against the full package from the beginning of the month

Combining near-real-time incremental synchronization with periodic full synchronization reduces the synchronization volume while avoiding transfer failures caused by missing layers.
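Why building increments against the monthly full package is safer than chaining them can be seen in a small model. The layer names and helper functions here are hypothetical and do not reflect image-transmit's actual package format.

```python
def incremental_package(image_layers, baseline_layers):
    """Layers to pack: everything the baseline package did not already contain."""
    baseline = set(baseline_layers)
    return [layer for layer in image_layers if layer not in baseline]


def missing_on_site(site_layers, package_layers, image_layers):
    """Layers the image needs that are neither on site nor in the package."""
    available = set(site_layers) | set(package_layers)
    return [layer for layer in image_layers if layer not in available]


monthly_full = ["os", "jre", "app-1"]
day2_image = ["os", "jre", "lib-2", "app-2"]
day3_image = ["os", "jre", "lib-2", "app-3"]

# The site applied the monthly full package but skipped the day 2 package.
site = list(monthly_full)

# Chained increments: day 3 is built against day 2, so lib-2 is left out.
chained = incremental_package(day3_image, day2_image)
print(chained, missing_on_site(site, chained, day3_image))

# Increments against the monthly full package: a skipped day does no harm.
anchored = incremental_package(day3_image, monthly_full)
print(anchored, missing_on_site(site, anchored, day3_image))
```

The anchored package is somewhat larger than the chained one, which is the price paid for tolerating a skipped version.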

Summary and follow-up planning


There are still some optimization directions for image transfer to be studied, for example:
  • Many images in the industry contain only the application binaries in the container, so the image is no larger than a traditional application release. However, problem analysis with such images depends on external tools, which raises the threshold for fault analysis. Troubleshooting tools can be provided through a sidecar.
  • An oversized image affects more than releases. In scenarios where containers are re-created, such as version upgrades and container switchover, it consumes a lot of network bandwidth and disk IO, and the image registry can easily become a bottleneck. A P2P distribution scheme similar to Dragonfly needs to be considered.
  • We currently use mainly Centos as the base image, and we are considering switching to a lighter base image designed specifically for containers.

Image slimming is really a systematic undertaking. It requires multiple teams to work together, from the technology platform to the business applications, to put it into practice.

Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/221/202208090947463647.html