
Docker COPY files using glob pattern?

I have a monorepo managed by Yarn. I'd like to take advantage of Docker's layer cache to speed up my builds: first copy the package.json and yarn.lock files, run yarn install, and then copy the rest of the files.

This is my repo structure:

packages/one/package.json
packages/one/index.js
packages/two/package.json
packages/two/index.js
package.json
yarn.lock

And this is the relevant part of the Dockerfile:

COPY package.json .
COPY yarn.lock .
COPY packages/**/package.json ./
RUN yarn install --pure-lockfile
COPY . .

The problem is that the third COPY command doesn't copy anything. How can I achieve the expected result?

Basically, what you'd like to do can't work as is, because the specified target folder is the same for several files that share the same name (package.json). Similarly, the Bash command cp packages/*/package.json ./ wouldn't yield anything sensible. So I believe you should hard-code the paths of folders one and two in your Dockerfile...
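For instance, a minimal sketch of that hard-coded approach for the two packages in the question, with each package.json path spelled out explicitly:

COPY package.json yarn.lock ./
COPY packages/one/package.json packages/one/
COPY packages/two/package.json packages/two/
RUN yarn install --pure-lockfile
COPY . .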
so, which solution did you choose?
I haven't chosen any of them. I can't use external scripts in my environment.

mbelsky

There is a solution based on the multi-stage build feature:

FROM node:12.18.2-alpine3.11

WORKDIR /app
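# Step 1: Copy the root manifests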
COPY ["package.json", "yarn.lock", "./"]
# Step 2: Copy whole app
COPY packages packages

# Step 3: Find and remove non-package.json files
RUN find packages -mindepth 2 -maxdepth 2 \! -name "package.json" -print | xargs rm -rf

# Step 4: Define second build stage
FROM node:12.18.2-alpine3.11

WORKDIR /app
# Step 5: Copy files from the first build stage.
COPY --from=0 /app .

RUN yarn install --frozen-lockfile

COPY . .

# To restore workspaces symlinks
RUN yarn install --frozen-lockfile

CMD yarn start

At Step 5, the layer cache will be reused even if files in the packages directory have changed, as long as none of the package.json files themselves changed.
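A quick way to check this, assuming the image is tagged app (a hypothetical name):

docker build -t app .         # first build
touch packages/one/index.js   # change something other than a package.json
docker build -t app .         # the "RUN yarn install" layer is taken from cache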


KYDronePilot

As mentioned in the official Dockerfile reference for COPY <src> <dest>:

The COPY instruction copies new files or directories from <src> and adds them to the filesystem of the container at the path <dest>.

For your case:

Each <src> may contain wildcards and matching will be done using Go’s filepath.Match rules.

These are the rules; among them:

'*' matches any sequence of non-Separator characters

So try to use * instead of ** in your pattern.
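For the layout in the question, that would be something like the following sketch. Note, per the comments below, that all matched files end up flattened into the single destination directory:

COPY packages/*/package.json ./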


Thanks for the reply. I tried that as well, but it does the same (nothing).
I just tried it with this Dockerfile, and it works: FROM ubuntu WORKDIR /app COPY */*.csproj /app/ When I ran it, here is the correct output: $ docker run --rm -ti temp ls /app foo.csproj bar.csproj
Any idea how to make the folder structure match when it gets copied? Using just this makes it all go to the current directory
@GiovanniBassi, the script in your comment doesn’t work as expected. Each .csproj should be copied to the appropriate subfolder (e.g. app/foo/foo.csproj), not to the root app/foo.csproj.
ErikMD

If you can't technically enumerate all the subdirectories at stake in the Dockerfile (namely, writing COPY packages/one/package.json packages/one/ for each one), but want to copy all the files in two steps and take advantage of Docker's caching feature, you can try the following workaround:

Devise a wrapper script (say, in bash) that copies the required package.json files to a separate directory (say, .deps/) built with a similar hierarchy, then call docker build …

Adapt the Dockerfile to copy (and rename) the separate directory beforehand, and then call yarn install --pure-lockfile…

All things put together, this could lead to the following files:

./build.bash:

#!/bin/bash

tag="image-name:latest"

rm -f -r .deps  # optional, to be sure that there is
# no extraneous "package.json" from a previous build

find . -type d \( -path \*/.deps \) -prune -o \
  -type f \( -name "package.json" \) \
  -exec bash -c 'dest=".deps/$1" && \
    mkdir -p -- "$(dirname "$dest")" && \
    cp -av -- "$1" "$dest"' bash '{}' \;
# instead of mkdir + cp, you may also want to use
# rsync if it is available in your environment...

sudo docker build -t "$tag" .

and

./Dockerfile:

FROM …

WORKDIR /usr/src/app

# COPY package.json .  # subsumed by the following command
COPY .deps .
# and not "COPY .deps .deps", to avoid doing an extra "mv"
COPY yarn.lock .
RUN yarn install --pure-lockfile

COPY . .
# Notice that "COPY . ." will also copy the ".deps" folder; this is
# maybe a minor issue, but it could be avoided by passing more explicit
# paths than just "." (or by adapting the Dockerfile and the script and
# putting them in the parent folder of the Yarn application itself...)
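With both files in place, a typical invocation would be (assuming build.bash has been made executable; image-name:latest is the placeholder tag from the script):

chmod +x build.bash
./build.bash   # populates .deps/ and then runs: docker build -t image-name:latest .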

Joost

Using Docker's new BuildKit executor, it has become possible to use a bind mount into the Docker context, from which you can then copy any files as needed.

For example, the following snippet copies all package.json files from the Docker context into the image's /app/ directory (the workdir in the example below).

Unfortunately, changing any file in the mount still results in a layer cache miss. This can be worked around using the multi-stage approach presented by @mbelsky, but this time the explicit deletion is no longer needed.

# syntax = docker/dockerfile:1.2
FROM ... AS packages

WORKDIR /app/
RUN --mount=type=bind,target=/docker-context \
    cd /docker-context/; \
    find . -mindepth 0 -maxdepth 4 -name "package.json" -exec cp --parents "{}" /app/ \;

FROM ...

WORKDIR /app/
COPY --from=packages /app/ .

The mindepth/maxdepth arguments are specified to reduce the number of directories to search; they can be adjusted or removed as desired for your use case.

It may be necessary to enable the BuildKit executor using environment variable DOCKER_BUILDKIT=1, as the traditional executor silently ignores the bind mounts.
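For example (my-image is just a placeholder tag):

DOCKER_BUILDKIT=1 docker build -t my-image .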

More information about BuildKit and bind mounts can be found here.


Thanks. I've tried it, and it's worth mentioning that changing any file (not only package.json) will cause the copy step to run again, so in that sense it has no advantage over just copying the whole code and running npm install.
@Arik oh, that is interesting! And a bit surprising to me; I would expect that the resulting image would have the same SHA, hence allowing subsequent layers to be reused. I have seen this work but you may be correct that it was only when nothing else changed. Needs more investigation to see if this can be made to work then!
@Arik Some experiments led me to believe that the multi-stage trick is still necessary to achieve the desired layer caching. I have updated the example accordingly. Thanks for your observation and comment!
I've added my solution as an answer
This is absolutely amazing!!! @Joost, this completely solved my problem and I will apply this to A LOT of images! thank you!
Arik

Following @Joost's suggestion, I've created a Dockerfile that utilizes the power of BuildKit to achieve the following:

Faster npm install by moving npm's cache directory to the build cache

Skipping npm install if nothing changed in package.json files since last successful build

Pseudo Code:

Get all package.json files from the build context

Compare them to the package.json files from the last successful build

If changes were found, run npm install and cache the package.json files + node_modules folder

Copy the node_modules (fresh or cached) to the desired location in the image

# syntax = docker/dockerfile:1.2
FROM node:14-alpine AS builder

# https://github.com/opencollective/opencollective/issues/1443
RUN apk add --no-cache ncurses

# must run as root
RUN npm config set unsafe-perm true

WORKDIR /app

# get a temporary copy of the package.json files from the build context
RUN --mount=id=website-packages,type=bind,target=/tmp/builder \
    cd /tmp/builder/ && \
    mkdir /tmp/packages && \
    chown 1000:1000 /tmp/packages && \
    find ./ -mindepth 0 -maxdepth 6 -name "package.json" -exec cp --parents "{}" /tmp/packages/ \;

# check if package.json files were changed since the last successful build
RUN --mount=id=website-build-cache,type=cache,target=/tmp/builder,uid=1000 \
    mkdir -p /tmp/builder/packages && \
    cd /tmp/builder/packages && \
    (diff -qr ./ /tmp/packages/ || (touch /tmp/builder/.rebuild && echo "Found an updated package.json"));

USER node

COPY --chown=node:node . /app

# run `npm install` if package.json files were changed, or use the cached node_modules/
RUN --mount=id=website-build-cache,type=cache,target=/tmp/builder,uid=1000 \
    echo "Creating NPM cache folders" && \
    mkdir -p /tmp/builder/.npm && \
    mkdir -p /tmp/builder/modules && \
    echo "Copying latest package.json files to NPM cache folders" && \
    /bin/cp -rf /tmp/packages/* /tmp/builder/modules && \
    cd /tmp/builder/modules && \
    echo "Using NPM cache folders" && \
    npm config set cache /tmp/builder/.npm && \
    if test -f /tmp/builder/.rebuild; then (echo "Installing NPM packages" && npm install --no-fund --no-audit --no-optional --loglevel verbose); fi && \
    echo "copy cached NPM packages" && \
    /bin/cp -rfT /tmp/builder/modules/node_modules /app/node_modules && \
    rm -rf /tmp/builder/packages && \
    mkdir -p /tmp/builder/packages && \
    cd /app && \
    echo "Caching package.json files" && \
    find ./ -mindepth 0 -maxdepth 6 -name "package.json" -exec cp --parents "{}" /tmp/builder/packages/ \; && \
    (rm /tmp/builder/.rebuild 2> /dev/null || true);

Note: I'm only using the node_modules of the root folder since, in my case, all the packages from inner folders are hoisted to the root.


Darren Ha

Just use .dockerignore to filter out the files you don't need; refer to this reference.

In your case, add this to your .dockerignore:

# any files you want to skip copying, for example:
*.js

Suppose your files are located like /home/package.json and you want to copy those files to /dest in the image.

The Dockerfile would then look like this: COPY /home /dest

This will copy the whole /home directory to /dest, except the files listed in .dockerignore.


How am I supposed to copy the rest of the files then?
@FezVrasta It will recursively copy the entire directory, except the paths listed in the .dockerignore file.
Exactly. Please read the question: I need to copy that whole directory, but in two steps.