How I reduced the size of my very first published docker image by 40% - A lesson in dockerizing shell scripts
I interact with Dockerfiles every day at work, have written a few myself, built containers, and all that, but I had never published one on the Docker Hub registry. I wanted to make ugit, a tool to undo git commands (written as a shell script), available to folks who don't like installing random shell scripts from the internet.
Yeah, I know, I know. REWRITE IT IN GO/RUST/MAGICLANG. The script is now more than 500 lines of bash. I am not rewriting it in any other language unless someone holds a gun to my head (or maybe ends up sponsoring??). Moreover, ugit is close to being feature complete (only a few commands left to undo, which are not that commonly used).
Anyway, the rest of the article is about how I went about writing the official Dockerfile for ugit (a shell script) and reduced the image size by almost 40% (going from 31.4 MB to 17.6 MB) by performing step-by-step guided optimization attempts. I hope this motivates other shell enthusiasts to also publish their scripts as docker images!
PS: I am not a DevOps or Docker expert, so if you are and see something wrong or something that could be done better, please let me know in the comments below or reach out somewhere. The final docker image is available on docker hub.
- The very first Dockerfile ~attempt~
- Path to optimization - reducing image size by 40%
- 2nd attempt - alpine on alpine
- 3rd attempt - Using scratch at 2nd stage
- Everything needed to run ugit
- Could we reduce the size further?
- You didn't try docker-slim?
- You didn't try docker-squash?
- Learnings
- Acknowledgements
- Resources
The very first Dockerfile ~attempt~
# Use an official Alpine runtime as a parent image
FROM alpine:3.18
# Set the working directory in the container to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install dependencies
RUN apk add --no-cache \
bash \
gawk \
findutils \
coreutils \
git \
ncurses \
fzf
# Set permissions and move the script to path
RUN chmod +x ugit && mv ugit /usr/local/bin/
# Run ugit when the container launches
CMD ["ugit"]
Looks pretty simple, right? It is. You could copy-paste this Dockerfile and build the image yourself, assuming you have ugit cloned in the current directory:
docker build -t ugit .
docker run --rm -it -v $(pwd):/app ugit
This should successfully run ugit inside the container.
ugit requires binaries like bash (>=4.0), awk, xargs, git, fzf, tput, cut, tr, nl.
- We had to install findutils because it ships with xargs.
- We had to install coreutils because it ships with tr, cut and nl.
- ncurses is required by tput (which is used to get the terminal info).
That's all we need to run ugit on a UNIX-based machine or, well, a container. The image size sits at 31.4 MB at this point. Not bad for a first attempt. Let's see how we can reduce it further.
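A quick way to confirm that an image really has the binaries listed above is to loop over them with `command -v`. The following is only a sketch: `missing_cmds` is a hypothetical helper name, not something that ships with ugit.

```shell
# Print every command from the argument list that is NOT found on PATH.
# `missing_cmds` is a hypothetical helper name, not part of ugit.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Example: the binaries ugit needs, per the list above.
missing_cmds bash awk xargs git fzf tput cut tr nl
```

An empty output means everything was found. Run inside a container, this shows what is still missing before the script ever starts.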
Path to optimization - reducing image size by 40%
During our (desperate) micro-optimization attempts in the upcoming sections, we will be covering the following high-level goals:
- Use multi-stage builds to reduce image size.
- Get rid of binaries like sleep, watch, du etc. Anything that is not required by ugit to run.
- Get rid of unnecessary dependencies that these binaries bring in.
- Get a minimum version of all dependencies that are required by ugit to run.
- Load up only the required binaries and their dependencies in the final image.
2nd attempt - alpine on alpine
I now choose to create a multi-stage build. The 2nd stage will be used to copy only the required binaries and their dependencies. I again choose to use alpine as the base image for this stage.
# First stage: Install packages
FROM alpine:3.18 as builder
RUN apk add --no-cache \
bash \
gawk \
findutils \
coreutils \
git \
ncurses \
fzf
# Copy only the ugit script into the container at /app
WORKDIR /app
COPY ugit .
# Set permissions and move the script to path
RUN chmod +x ugit && mv ugit /usr/local/bin/
# Second stage: Copy only necessary binaries and their dependencies
FROM alpine:3.18
COPY --from=builder /usr/local/bin/ugit /usr/bin/
COPY --from=builder /usr/bin/git /usr/bin/
COPY --from=builder /usr/bin/fzf /usr/bin/
COPY --from=builder /usr/bin/tput /usr/bin/
COPY --from=builder /usr/bin/cut /usr/bin/
COPY --from=builder /usr/bin/tr /usr/bin/
COPY --from=builder /usr/bin/nl /usr/bin/
COPY --from=builder /usr/bin/gawk /usr/bin/
COPY --from=builder /usr/bin/xargs /usr/bin/
COPY --from=builder /usr/bin/env /bin/
COPY --from=builder /bin/bash /bin/
WORKDIR /app
# Run ugit when the container launches
CMD ["ugit"]
With just a straightforward multi-stage build, we were able to reduce the image size to an impressive 20.6 MB. Now the image builds successfully, but it won't run ugit yet.
Error loading shared library libreadline.so.8: No such file or directory (needed by /bin/bash)
Turns out, we are missing transitive dependencies. More about this on our 3rd attempt.
Looks like xargs and awk came free?
Turns out that both xargs and awk are present by default on the Alpine image. You can verify this by running the following commands:
docker run -it alpine /bin/sh -c "awk --help"
docker run -it alpine /bin/sh -c "xargs --help"
Rookie mistake. Let's scratch gawk and findutils from our Dockerfile.
# First stage: Install packages
FROM alpine:3.18 as builder
RUN apk add --no-cache \
bash \
coreutils \
git \
ncurses \
fzf
# Copy only the ugit script into the container at /app
WORKDIR /app
COPY ugit .
# Set permissions and move the script to path
RUN chmod +x ugit && mv ugit /usr/local/bin/
# Second stage: Copy only necessary binaries and their dependencies
FROM alpine:3.18
COPY --from=builder /usr/local/bin/ugit /usr/bin/
COPY --from=builder /usr/bin/git /usr/bin/
COPY --from=builder /usr/bin/fzf /usr/bin/
COPY --from=builder /usr/bin/tput /usr/bin/
COPY --from=builder /usr/bin/cut /usr/bin/
COPY --from=builder /usr/bin/tr /usr/bin/
COPY --from=builder /usr/bin/nl /usr/bin/
COPY --from=builder /usr/bin/env /bin/
COPY --from=builder /bin/bash /bin/
WORKDIR /app
# Run ugit when the container launches
CMD ["ugit"]
The image size is now down to 20 MB. We are getting there. ugit still won't run, though.
3rd attempt - Using scratch at 2nd stage
This is the part that most folks don't attempt. It's a bit scary and requires a huge commitment. I was a bit scared to go this route as well because I knew everything could be SCRATCHED.
A SCRATCH docker image is just an empty file system. It doesn't have anything at all. To try out a SCRATCH image, you can refer to docker hub's README on it.
The only thing you need to know is that everything has to be put together by us. Let's keep our hands on our hearts and replace alpine with scratch.
# First stage: Install packages
FROM alpine:3.18 as builder
RUN apk add --no-cache \
bash \
coreutils \
git \
ncurses \
fzf
# Copy only the ugit script into the container at /app
COPY ugit .
# Set permissions and move the script to path
RUN chmod +x ugit && mv ugit /usr/local/bin/
# Second stage: Copy only necessary binaries and their dependencies
FROM scratch
COPY --from=builder /usr/local/bin/ugit /usr/bin/
COPY --from=builder /usr/bin/git /usr/bin/
COPY --from=builder /usr/bin/fzf /usr/bin/
COPY --from=builder /usr/bin/tput /usr/bin/
COPY --from=builder /usr/bin/cut /usr/bin/
COPY --from=builder /usr/bin/tr /usr/bin/
COPY --from=builder /usr/bin/nl /usr/bin/
COPY --from=builder /usr/bin/env /bin/
COPY --from=builder /bin/bash /bin/
WORKDIR /app
# Run ugit when the container launches
CMD ["ugit"]
Doing this reduced the size of our image to 12.4 MB, a 60% reduction?? Did we just rickroll ourselves? Let's try to run ugit.
$ docker run --rm -it -v $(pwd):/app ugit-a3
exec /usr/bin/ugit: no such file or directory
$ docker run --rm -it --entrypoint /bin/bash ugit-a4
exec /bin/bash: no such file or directory
Turns out, we broke the bash binary by not shipping its dependencies. Let's see what we can do about it.
Identifying transitive dependencies
Okay, time to talk about transitive dependencies. Our script relies on binaries like git, tput, bash; some of these utils have dependencies of their own.
These dependencies are called shared libraries. Shared libraries are .so (or .dll on Windows, or .dylib on OS X) files. ldd is a great tool to identify these dependencies. It lists all the libraries needed by a binary to execute. For example, if we run ldd /bin/ls and ldd /bin/bash on an Alpine container with bash installed, we get the following output:
/ # ldd /bin/ls
/lib/ld-musl-aarch64.so.1 (0xffff8c9c0000)
libc.musl-aarch64.so.1 => /lib/ld-musl-aarch64.so.1 (0xffff8c9c0000)
/ # ldd /bin/bash
/lib/ld-musl-aarch64.so.1 (0xffffb905a000)
libreadline.so.8 => /usr/lib/libreadline.so.8 (0xffffb8f06000)
libc.musl-aarch64.so.1 => /lib/ld-musl-aarch64.so.1 (0xffffb905a000)
libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0xffffb8e95000)
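To turn output like this into a plain list of files to copy, one option is a small filter that keeps only the resolved paths and removes duplicates. This is a sketch, not the command used for ugit: `resolved_libs` is a hypothetical helper name, and it assumes `ldd` prints the two line shapes shown above.

```shell
# Read `ldd` output on stdin and print each resolved library path once.
# `resolved_libs` is a hypothetical helper name used only for illustration.
resolved_libs() {
  awk '
    $2 == "=>" && $3 ~ /^\// { print $3 }   # "name => /path (addr)" lines
    NF == 2 && $1 ~ /^\//    { print $1 }   # "/path (addr)" lines (the loader)
  ' | LC_ALL=C sort -u
}

# Example: feed it the output for every binary we plan to ship.
# for bin in /bin/bash /usr/bin/git; do ldd "$bin"; done | resolved_libs
```

Piping the output of several `ldd` calls through the same filter gives one de-duplicated list for the whole image.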
Primer on shared libraries
- Each lib has a soname, which is made of the prefix lib, the library name, .so, and a major version number. For example, libpcre2-8.so.0 is a soname. The same name without the version, libpcre2-8.so, is the linker name used at build time.
- A fully qualified lib includes the directory where it is located. For example, /usr/lib/libpcre2-8.so.0.
- Each lib has a real name, which is the name of the library file with the full version number. For example, libpcre2-8.so.0.3.6 could be a real name.
- The hexadecimal is the base memory address where this library is loaded in memory. Not useful for us. We are only interested in the library names. Let's do it for all the binaries we need to run ugit.
- The path to the right-hand side of the => symbol indicates where that shared library was found.
- The lib name starting with libc is the C library for that architecture. Remember our "exec /bin/bash: no such file or directory" error? This is the reason we got it. We didn't ship the C library for our architecture (on musl the same file is also the dynamic loader, as the ldd output shows).
Below is an excerpt from Douglas Creager's article on shared library versions, which sums up shared libraries. Please read the full article if you want to learn more.
With a shared library, you compile the library once and install it into a shared location in the filesystem (typically /usr/lib on Linux systems). Any project that depends on that shared library can use that shared, already compiled representation as-is.
Most Linux distributions further reduce compile times by distributing binary packages of popular libraries, where the distribution's packaging system has compiled the code for you. By installing the package, you download a (hopefully signed) copy of the compiled library, and place it into the shared location, all without ever having to invoke the compiler (or any other part of the build chain that produced the library).
We use a similar approach to identify all the unique dependencies for all the binaries we need to run ugit.
ℹ️ We did this by running a command like:
for cmd in /bin/*; do echo "$cmd"; ldd "$cmd"; done
# copy lib files
COPY --from=ugit-ops /usr/lib/libncursesw.so.6 /usr/lib/
COPY --from=ugit-ops /usr/lib/libncursesw.so.6.4 /usr/lib/
COPY --from=ugit-ops /usr/lib/libpcre* /usr/lib/
COPY --from=ugit-ops /usr/lib/libreadline* /usr/lib/
COPY --from=ugit-ops /lib/libacl.so.1 /lib/
COPY --from=ugit-ops /lib/libattr.so.1 /lib/
COPY --from=ugit-ops /lib/libc.musl-* /lib/
COPY --from=ugit-ops /lib/ld-musl-* /lib/
COPY --from=ugit-ops /lib/libutmps.so.0.1 /lib/
COPY --from=ugit-ops /lib/libskarnet.so.2.13 /lib/
COPY --from=ugit-ops /lib/libz.so.1 /lib/
Notice that we are copying libc.musl-* and ld-musl-* from the first stage using wildcards. This is because the file names of these libs include the architecture (aarch64 in the ldd output above), which depends on the machine the image is built on.
Shebangs #! are useless
If you look at the very first line of ugit, you'll see a shebang #!/usr/bin/env bash. Using env is considered good practice when writing shell scripts: it tells the OS to look up the interpreter on the PATH instead of hard-coding its location. This is ideal on an everyday dev machine. Linux (and older versions of macOS) ship with sh and bash, and on top of that, folks install zsh etc.
But since the shebang only matters when a script is executed directly, and we already copy the bash binary, we can just invoke our script with it, which means env no longer has to be shipped. This saves us a couple of bytes in the image size as well. Close to 1.9 MB to be precise.
# Run ugit when the container launches
CMD ["/bin/bash", "/bin/ugit"]
# or
# ENTRYPOINT ["/bin/bash", "/bin/ugit"]
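The same idea can be tried outside Docker. The sketch below assumes bash is on the PATH; the temporary file and its contents are made up for the demo. It runs a script that has no shebang and no execute bit, simply by handing the file to the interpreter.

```shell
# Create a throwaway script with NO shebang and NO execute bit.
script="$(mktemp)"
printf 'echo "ran without a shebang"\n' > "$script"

# Passing the file to the interpreter explicitly is enough to run it.
output="$(bash "$script")"
printf '%s\n' "$output"

rm -f "$script"
```

This is what the CMD above relies on: bash reads the file as input, so the kernel never has to interpret the shebang line.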
A look at terminfo db
ugit has colors, thanks to tput. We load up a bare-bones Alpine image with bash and head over to the /etc/terminfo directory. This directory contains the terminal info database.
37a1a77f70ed:/app# cd /etc/terminfo/
37a1a77f70ed:/etc/terminfo# ls
a d g k l p r s t v x
Each of these single-letter directories groups the $TERM types whose names start with that letter. For example, xterm is a terminal type, and it lives under x. If you run tput -T xterm colors on your local machine, you'll get the number of colors that terminal type supports. For xterm it should be 8, and in the case of xterm-256color it should be 256.
Now here's our chance to support only 1 terminal type amongst the 40+ that are present in the terminfo db. We can get rid of the rest of the terminal types. This saves us another 97 KB, very little but needed to clear up the clutter.
# copy terminfo database for only xterm-256color
COPY --from=ugit-ops /etc/terminfo/x/xterm-256color /usr/share/terminfo/x/
# Gib me all the colors
ENV TERM=xterm-256color
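The x/ in the COPY path above follows from how the terminfo db is laid out in the listing: an entry sits in a directory named after the first character of the terminal type. A small sketch of that mapping, where `terminfo_entry` is a hypothetical helper name and the first-letter layout is assumed, as seen on Alpine:

```shell
# Print the path of a terminfo entry relative to the terminfo root.
# `terminfo_entry` is a hypothetical helper name; it assumes the
# first-letter directory layout shown in the listing above.
terminfo_entry() {
  term="$1"
  first="${term%"${term#?}"}"   # first character of the name, e.g. "x"
  printf '%s/%s\n' "$first" "$term"
}

terminfo_entry xterm-256color   # prints x/xterm-256color
```

Supporting a different terminal type would mean copying the entry this prints for that type and changing TERM to match.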
Everything needed to run ugit
The final Docker image sits at 17.6 MB with no security vulnerabilities (as reported by docker scout, at the time of writing this article). We have successfully reduced the image size by 40% compared to our first attempt.
Here's the final Dockerfile:
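As a quick check of that figure, the saving can be recomputed from the sizes quoted in this article. The helper name `pct_reduction` is made up for this sketch, and awk is used only because the shell has no floating-point arithmetic.

```shell
# Percentage saved when going from $1 MB to $2 MB, to one decimal place.
# `pct_reduction` is a hypothetical helper name.
pct_reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_reduction 31.4 17.6   # first attempt -> final image, prints 43.9
pct_reduction 31.4 20.6   # first attempt -> multi-stage on alpine, prints 34.4
```

So "40%" is a rounded-down figure. The numbers quoted here work out to a little under 44%.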
FROM alpine:3.18 as ugit-ops
RUN apk add --no-cache \
bash \
coreutils \
git \
ncurses \
curl
# Download fzf binary from GitHub, pin to 0.46.0, ugit requires minimum 0.21.0
RUN curl -L -o fzf.tar.gz https://github.com/junegunn/fzf/releases/download/0.46.0/fzf-0.46.0-linux_amd64.tar.gz && \
tar -xzf fzf.tar.gz && \
mv fzf /usr/bin/
# Copy only the ugit script into the container at /app
COPY ugit .
# Set permissions and move the script to path
RUN chmod +x ugit && mv ugit /usr/bin/
# Second stage: Copy only necessary binaries and their dependencies
FROM scratch
COPY --from=ugit-ops /usr/bin/ugit /bin/
COPY --from=ugit-ops /usr/bin/git /usr/bin/
COPY --from=ugit-ops /usr/bin/fzf /usr/bin/
COPY --from=ugit-ops /usr/bin/tput /usr/bin/
COPY --from=ugit-ops /usr/bin/nl /usr/bin/
COPY --from=ugit-ops /usr/bin/awk /usr/bin/
COPY --from=ugit-ops /usr/bin/xargs /usr/bin/
COPY --from=ugit-ops /usr/bin/cut /usr/bin/cut
COPY --from=ugit-ops /usr/bin/tr /usr/bin/tr
COPY --from=ugit-ops /bin/bash /bin/
COPY --from=ugit-ops /bin/sh /bin/
# copy lib files
COPY --from=ugit-ops /usr/lib/libncursesw.so.6 /usr/lib/
COPY --from=ugit-ops /usr/lib/libncursesw.so.6.4 /usr/lib/
COPY --from=ugit-ops /usr/lib/libpcre* /usr/lib/
COPY --from=ugit-ops /usr/lib/libreadline* /usr/lib/
COPY --from=ugit-ops /lib/libacl.so.1 /lib/
COPY --from=ugit-ops /lib/libattr.so.1 /lib/
COPY --from=ugit-ops /lib/libc.musl-* /lib/
COPY --from=ugit-ops /lib/ld-musl-* /lib/
COPY --from=ugit-ops /lib/libutmps.so.0.1 /lib/
COPY --from=ugit-ops /lib/libskarnet.so.2.13 /lib/
COPY --from=ugit-ops /lib/libz.so.1 /lib/
# copy terminfo database
COPY --from=ugit-ops /etc/terminfo/x/xterm-256color /usr/share/terminfo/x/
# Gib me all the colors
ENV TERM=xterm-256color
WORKDIR /app
# Run ugit when the container launches
CMD ["/bin/bash", "/bin/ugit"]
I decided to pin the version of fzf to 0.46.0 (the latest at the time of writing this article) because ugit requires a minimum 0.21.0 to run, and I figured what the heck, let's pin it to the latest version.
ℹ️ docker run --rm -it -v $(pwd):/app ugit will run the ugit container. Make sure your current directory is a git repo.
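If you want that check done automatically, a thin wrapper can refuse to start the container outside a repository. This is a sketch only: `in_git_repo` and `run_ugit` are hypothetical names, and it assumes git and docker are installed on the host and that the image is tagged ugit as above.

```shell
# Succeeds only when git reports that we are inside a work tree.
in_git_repo() {
  [ "$(git rev-parse --is-inside-work-tree 2>/dev/null)" = "true" ]
}

# Hypothetical wrapper: refuse to start the container outside a repo.
run_ugit() {
  if in_git_repo; then
    docker run --rm -it -v "$(pwd)":/app ugit
  else
    echo "not a git repository: $(pwd)" >&2
    return 1
  fi
}
```

Calling run_ugit from a project directory behaves like the plain docker run command. Anywhere else it prints a message and returns a non-zero status.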
This is what the final file system tree looks like:
[Figure: final file system tree of the image]
This is everything to make our shell app work. No more, no less. Time for a beer? 🍺
PS: The final docker image came down to 16.2 MB. You can find the updated Dockerfile here. For the sake of this article, I kept the image size at 17.6 MB.
Could we reduce the size further?
Yes, but there are 2 reasons why I didn't go any further:
Reason 1: Pin minimum version of fzf?
At the time of writing ugit, the script relied on fzf minimum version 0.20.0, and it's a given that the latest version is going to be larger than the minimum required version. So we should pin the old version, right? No, because that introduces security vulnerabilities through fzf's dependencies, i.e., Golang. As reported by docker scout quickview, the older version of Golang has a total of 66 security issues. Maybe they affect the image, maybe they don't. But I am not taking that risk; I want to keep the image as clean as possible.
ℹ️ In the Alpine ecosystem, it is generally not advised to pin minimum versions of packages.
Reason 2: Use the latest bash features?
At the time of writing ugit, I relied on bash version 4.0. Both tr and cut could be replaced if I shifted to a newer version of bash, i.e., 5.0. And that, my friends, is a breaking change. Getting rid of tr and cut would have saved me a couple of bytes, but I didn't want to break the script for folks who are still on bash 4.0. It doesn't matter if I am the only one left, my machine still has bash.
You didn't try docker-slim?
- I did, it did slim down my image, but it also broke the script with missing dependencies. Slim is great, the reader should check it out. Unfortunately, I couldn't get it to work for ugit in my limited time.
- Moreover, I wanted to learn how to do it with my own hands, rather than rely on a tool to do it for me.
You didn't try docker-squash?
I did, and the size optimization was nearly negligible. Here's a log of the squash process, which I ran on a Linux (AMD) machine (ignore the size of the image; since we are on a different architecture, the image size is different):
Learnings
- Linux is awesome. Everything, every design decision, every tool inspiration that came out of it inherits the same awesomeness.
- Never shy away from going into deep rabbit holes of micro-optimizations. You learn a lot. There's always something new to learn.
Acknowledgements
- Big thanks to the authors of ldd, and everyone in the docker & alpine linux community.
- Thanks to folks on the developersIndia discord for helping out with advice & suggestions.
- dive for helping me visualize the image layers.
Resources
- TLDP.org - Shared Libraries
- Dockerfile best practices
- About docker storage drivers
- How to reduce Docker image size for IoT devices
I hope you guys got something interesting to read today. Until next time, happy hacking & before you go please give ugit a star & a docker pull? 🧡
Read what the community is saying on hackernews & reddit.