How Not to Be the Engineer Running 3.5GB Docker Images

Last updated: 2022-04-01 21:48:09

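## Translator information

* Email: hanxiaotongtong@163.com
* Blog: [www.zimug.com](http://www.zimug.com)
* Original article: https://www.datawire.io/not-engineer-running-3-5gb-docker-images/
* Please credit the translator and the source when reposting

## Translation

Let's cut to the chase: you're adopting a microservices architecture and you plan to use Docker. It's all the rage, it solves a great many problems, and it has zero negative impact on our projects, right? Really?

As with every tool, technology, and paradigm that gets pushed at us, if we want to keep our sanity we first need to understand the gotchas.

To do that, I like to start with a simple question: how is this new monster going to bite me in the ass, and what can I do to avoid being bitten?

I want to tackle a problem that I've discussed time and again with teams and organisations adopting Docker.

### Behemoth Docker Images

```
$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED              SIZE
awesome-micro-service   latest   61562a134d38   About a minute ago   3.5 GB
```

Whoa! Look at the size of that image. 3.5GB for awesome-micro-service! So much for micro. I'm sure you've seen this before, and I'm sure you want to change it. That's okay: nobody gets it right the first time.

### What exactly is a Docker image?

To understand why our images are so big, we first need to understand what an image is.

A Docker image is the output of a Docker build. The build runs every instruction in the Dockerfile, and executing each instruction creates a layer. A layer encapsulates the file system changes made by that instruction. A Docker image is a collection of layers.

Let's look a little deeper so that we can describe a Docker image in more detail.

* Example:

Suppose we want to introduce Docker into our PHP workflow. To run our PHP application, we need a Debian system with PHP installed.

We need to describe the environment that the Docker container running our application requires.

```
# Dockerfile
FROM debian:jessie

RUN echo "Building ..."
RUN DEBIAN_FRONTEND=noninteractive apt-get update
# -y answers apt's confirmation prompt, which a non-interactive build cannot do
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y php5-cli
```

Super simple. Super intuitive. Super awesome. It's completely useless until we build it, though. A build takes a Dockerfile and a context and produces a Docker image.

The context is a directory whose files are made available to the build for the instructions that need them, such as ADD or COPY.

```
# docker build -t <image-name> -f <dockerfile> <context>
# -f can be omitted when the Dockerfile is at the root of the context
```

```
$ docker build -t my-docker-php:latest -f Dockerfile .
...
$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED              SIZE
my-docker-php   latest   61562a134d38   About a minute ago   163.5 MB
```

So what's actually going on? What's inside my Docker image?

It's a file system. Nothing fancy. When you run `apt-get install vim`, all you're telling the computer to do is put some files on your hard drive. The Docker image encapsulates that and keeps track of all new, modified, and deleted files. These file system changes are tracked in layers. Each layer is the encapsulation of the file system changes made by one instruction in your Dockerfile.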
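The instruction-to-layer mapping described above can be sketched by annotating a Dockerfile like the PHP example. This is only an illustration: the `COPY` line, the `./app` directory, and the `CMD` are hypothetical additions that are not part of the original example.

```dockerfile
# Annotated sketch: what each instruction contributes to the image.
# The COPY line, the ./app directory and the CMD are hypothetical.

# Base image layers: their size is fixed unless you choose a different base.
FROM debian:jessie

# Changes no files: still recorded in the image history, but adds 0 B.
RUN echo "Building ..."

# Downloads package lists: the files written to disk become a layer.
RUN DEBIAN_FRONTEND=noninteractive apt-get update

# Installs PHP and its dependencies: another layer holding those files.
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y php5-cli

# Adds the application code: a layer as large as the contents of ./app.
COPY ./app /var/www

# Metadata only: records the default command without touching the file system.
CMD ["php", "-v"]
```

The total image size is the sum of these layers, so the instructions that write files are the ones worth scrutinising.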
Docker provides a command to visualise our Docker images: `docker history`. As you'll see in the output below:

* We have no control over the size of our base image, other than changing the base image. This is the `<missing>` layer at the bottom of the list.
* Some keywords cost us nothing. Examples include CMD, USER, WORKDIR, etc.

```
$ docker history my-docker-php
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
b4e7e4004eeb   4 seconds ago    /bin/sh -c #(nop) CMD ["vim"]                   0 B
d2a8ad35f9f4   4 seconds ago    /bin/sh -c echo                                 0 B
6fc559885751   36 minutes ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   38.37 MB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
```

Note: If your command makes no changes to the file system (like our `RUN echo "Building ..."`), a layer is still created. It just has a zero-byte size.

So in order to keep our images micro, we need to keep the output of our layers to a minimum.

## Gotchas

### 1. File Ownership & Permissions

Never, and I mean it, never change the ownership or permissions of a file inside a Dockerfile unless you absolutely NEED to. When you do NEED to, try to modify as few files as possible.

Although comparisons can be made, Docker isn't like Git. It doesn't know what changed inside a file, only which files were affected. A change to a file's owner or mode therefore makes Docker store a complete copy of that file in the new layer. This can cause your image to double in size if you're modifying particularly large files, or worse, every file!

Example:

```
# Dockerfile
FROM debian:jessie

ADD large_file /var/www/large_file
RUN chown www-data /var/www/large_file
RUN chmod 756 /var/www/large_file

$ docker build -t gotcha-1 .
...
$ docker images gotcha-1
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
gotcha-1     latest   49b4a4ea228a   About a minute ago   3.346 GB

$ docker history gotcha-1
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
49b4a4ea228a   36 seconds ago   /bin/sh -c chmod 756 /var/www/large_file        1.074 GB
09d77316932b   2 minutes ago    /bin/sh -c chown www-data /var/www/large_file   1.074 GB
7adb7c72c3ef   2 minutes ago    /bin/sh -c #(nop) ADD file:a86f6dedfb4ba54972   1.074 GB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
```

Tip: If you're having problems with permissions inside your container, modify them using your entrypoint script, or modify the user id to reflect what you need. Do not modify the files in the Dockerfile.

Example: changing the user id of www-data to match yours. Tweak as necessary:

`RUN usermod -u 1000 www-data`

Or run your container with an entrypoint script:

```
$ cat my-script
#!/bin/bash
# Fix ownership when the container starts, then launch the server
chown www-data -R /var/www/
apache2

# Options such as --entrypoint must come before the image name
$ docker run --entrypoint=/bin/my-script my-docker-php
```

### 2. Clean up after untidy commands

Some commands leave a trail of garbage behind them and couldn't care less about the size of your images. We accept this on our desktops and preach "cache" and "performance". Inside our images, it's just pure filth.

Example:

```
# Dockerfile
FROM debian:jessie

RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim

$ docker build -t debian .
...
$ docker history debian
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
ae5a25410c0d   10 seconds ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   28.68 MB
aaf5660234d3   21 minutes ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   9.694 MB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
```

As you can see from the output above, our `apt-get update` costs us about 10MB and our `apt-get install` costs us about 30MB.
Obviously these are trivial examples, but in larger builds this space will accumulate!

First, let's examine what each command is doing to our image. To do this, start an interactive Docker container and bash in:

```
$ docker run -ti --rm --name live debian:jessie bash
```

You'll be live inside the innards of a Debian container and at a bash prompt. Next, let's get a second terminal window open and inspect the container:

```
$ docker diff live
$
```

No output. That's good, because we've not done anything yet. `docker diff` allows us to see what's changed inside our container. So let's run our first command.

Note: "$ " is my local prompt and "root@4552beab7001:/#" is inside the container.

```
root@4552beab7001:/# apt-get update

$ docker diff live
C /var
C /var/lib
C /var/lib/apt
C /var/lib/apt/lists
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release.gpg
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_InRelease
A /var/lib/apt/lists/lock
A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_InRelease
```

Oooh, we've just discovered where our 10MB is going. Let's fix it by tweaking our Dockerfile to delete our apt cache after installing vim. Your initial thought may be to tweak it as:

```
# Dockerfile
FROM debian:jessie

RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim
RUN rm -rf /var/lib/apt
```

Unfortunately, this will only add another layer and not affect the previous layers. So although we're deleting the files, the earlier layers still contain them and the image is no smaller.
The common trick is to chain our commands at the shell level. This way, the files don't exist when the RUN is finished and they never exist in our history.

```
# Dockerfile
FROM debian:jessie

# The variable prefix only applies to the command it precedes, so repeat it
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y vim \
    && rm -rf /var/lib/apt

$ docker history debian
IMAGE          CREATED         CREATED BY                                      SIZE       COMMENT
be6afc32bd37   5 seconds ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   28.68 MB
f50f9524513f   8 weeks ago     /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago     /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
```

Much better.

You can repeat that process for every RUN inside your Dockerfile and really cut the fat out of your image. Just be careful not to delete anything that you need!

## Tips

### Tip #1. Create and maintain your own base images, preferably on Alpine!

Alpine Linux is tiny (under 5MB!) and has a really strong package manager. If you can, use it and keep your base images lean.

Why is creating and maintaining your own base image ideal? Most "official" images are quite bloated and try to be as general as possible. You know what you need. It's like compiling your own kernel, only not as dangerous 😀

### Tip #2. ONBUILD. Use it.

When crafting base images, ONBUILD gives you a great way to reuse the same image for both development and production. ONBUILD tells Docker that when the image is used as a base, it should perform some extra instructions, such as the following, which puts our code into the image for a production build.

`ONBUILD ADD . /var/www`

As this only runs when the image is used as a base, our docker-compose.yml, used for development, can instead mount a volume into the container, getting our code changes into the container without a rebuild.

```
services:
  application:
    image: my-base
    volumes:
      - .:/var/www
```

### Tip #3. Be careful using community images.

They disappear. Often. Fork and maintain your own if it's mission critical. You're also putting your trust in the maintainer to protect your attack surface, but that's a security issue and another post for next time.
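To make Tip #2 concrete, here is a minimal sketch of the two Dockerfiles involved. It assumes the base image is tagged `my-base`, the name used in the compose file above, and that it installs `php5-cli` as in the earlier example. Those choices are illustrative. The clean-up step removes only `/var/lib/apt/lists/*` rather than all of `/var/lib/apt`, which is a slightly more conservative variant of the chaining trick.

```dockerfile
# Dockerfile for the base image; build it with: docker build -t my-base .
FROM debian:jessie

# Install and clean up in one RUN so the package lists never reach a layer
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y php5-cli \
    && rm -rf /var/lib/apt/lists/*

# Deferred: does nothing in this build; it runs when another Dockerfile
# names this image in its FROM line
ONBUILD ADD . /var/www
```

```dockerfile
# Dockerfile for the production image, kept next to the application code.
# The ONBUILD trigger fires right after FROM, adding the build context
# (the application code) to /var/www.
FROM my-base
```

In development, the compose file runs `my-base` directly, so the trigger never fires and the mounted volume supplies the code instead.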