Troubleshooting permission issues when building Docker containers

Docker containers run as root by default. This is generally not a recommended practice, as it poses a serious security risk. The risk can be mitigated by configuring a non-root user to run the container, for example with the USER instruction in the Dockerfile. While running a container as a non-root user is the right thing to do, it can often be problematic, as insufficient permissions can lead to hard-to-diagnose errors. This post uses an example node application to discuss a few permission-related issues that can pop up when building a non-root container, along with some strategies that can help troubleshoot these kinds of issues.

Example

Let’s start with a very simple Dockerfile for a node application:

FROM node:16
WORKDIR /usr/app
COPY . .
RUN npm install
CMD [ "node", "/usr/app/index.js" ]

The problem with this Dockerfile is that any container created from the resulting image will run as root.
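This is easy to verify by building the image and checking which user the container runs as (the node-app tag below is just an illustrative name, not part of the original example):

```shell
# Build the image and check the user the container runs as
docker build -t node-app .
docker run --rm node-app whoami
# prints: root
```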

To fix that we can modify the Dockerfile to create a new user (let’s call it app-user) and move the application to a subdirectory of that user’s home directory like this:

FROM node:16
ENV HOME=/home/app-user
RUN useradd -m -d $HOME -s /bin/bash app-user
RUN chown -R app-user:app-user $HOME
USER app-user
WORKDIR $HOME/app
COPY . .
RUN npm install
CMD [ "node", "index.js" ] 

Unfortunately, introducing these changes makes it impossible to build the Docker image – npm install now errors out due to insufficient permissions:

Step 8/9 : RUN npm install
 ---> Running in a0800340b850
npm ERR! code EACCES
npm ERR! syscall mkdir
npm ERR! path /home/app-user/app/node_modules
npm ERR! errno -13
npm ERR! Error: EACCES: permission denied, mkdir '/home/app-user/app/node_modules'
npm ERR!  [Error: EACCES: permission denied, mkdir '/home/app-user/app/node_modules'] {
npm ERR!   errno: -13,
npm ERR!   code: 'EACCES',
npm ERR!   syscall: 'mkdir',
npm ERR!   path: '/home/app-user/app/node_modules'
npm ERR! }
...

Inspecting the app directory shows that it is owned by root and other users don’t have the write permission:

app-user@d0b48aa18141:~$ ls -l ~
total 4
drwxr-xr-x 1 root root 4096 Jan 15 05:48 app

The error is related to using the WORKDIR instruction to set the working directory to $HOME/app. That is not a problem by itself – it’s actually recommended to use WORKDIR to set the working directory. The problem is that because the directory didn’t exist, WORKDIR created it, but made root the owner. The issue can be easily fixed by explicitly creating the working directory with the right ownership before the WORKDIR instruction runs, so that WORKDIR doesn’t have to create it. The new Dockerfile that contains this fix looks as follows:

FROM node:16
ENV HOME=/home/app-user
RUN useradd -m -d $HOME -s /bin/bash app-user
RUN mkdir -p $HOME/app
RUN chown -R app-user:app-user $HOME
USER app-user
WORKDIR $HOME/app
COPY . .
RUN npm install
CMD [ "node", "index.js" ]

Unfortunately, this doesn’t seem to be enough. Building the image still fails due to a different permission issue:

Step 10/11 : RUN npm install
 ---> Running in 860132289a60
npm ERR! code EACCES
npm ERR! syscall open
npm ERR! path /home/app-user/app/package-lock.json
npm ERR! errno -13
npm ERR! Error: EACCES: permission denied, open '/home/app-user/app/package-lock.json'
npm ERR!  [Error: EACCES: permission denied, open '/home/app-user/app/package-lock.json'] {
npm ERR!   errno: -13,
npm ERR!   code: 'EACCES',
npm ERR!   syscall: 'open',
npm ERR!   path: '/home/app-user/app/package-lock.json'
npm ERR! }
...

The error message indicates that this time the problem is that npm install cannot access the package-lock.json file. Listing the files shows again that all copied files are owned by root and other users don’t have the write permission:

ls -l
total 12
-rw-r--r-- 1 root root  71 Jan 15 02:03 index.js
-rw-r--r-- 1 root root 849 Jan 15 01:36 package-lock.json
-rw-r--r-- 1 root root 266 Jan 15 05:21 package.json

Apparently, the COPY instruction runs with root privileges by default, so the files will be owned by root even if the COPY instruction appears after the USER instruction. An easy fix is to change the Dockerfile to copy the files before configuring file ownership (alternatively, it is possible to specify a different owner for the copied files with the --chown flag):

FROM node:16
ENV HOME=/home/app-user
RUN useradd -m -d $HOME -s /bin/bash app-user
RUN mkdir -p $HOME/app
COPY . .
RUN chown -R app-user:app-user $HOME
USER app-user
WORKDIR $HOME/app
RUN npm install
CMD [ "node", "index.js" ]

Annoyingly, this still doesn’t work – we get yet another permission error:

Step 9/10 : RUN npm install
 ---> Running in d4ebcec114cb
npm ERR! code EACCES
npm ERR! syscall mkdir
npm ERR! path /node_modules
npm ERR! errno -13
npm ERR! Error: EACCES: permission denied, mkdir '/node_modules'
npm ERR!  [Error: EACCES: permission denied, mkdir '/node_modules'] {
npm ERR!   errno: -13,
npm ERR!   code: 'EACCES',
npm ERR!   syscall: 'mkdir',
npm ERR!   path: '/node_modules'
npm ERR! }
...

This time the error indicates that npm install tried to create the node_modules directory directly in the root directory of the file system. This is unexpected, as the WORKDIR instruction was supposed to set the working directory to the app directory inside the newly created user’s home directory. The problem is that the last fix was not completely correct. Before, COPY was executed after WORKDIR, so it copied the files to the expected location. The fix moved the COPY instruction before the WORKDIR instruction, which resulted in copying the application files to the container’s root directory – clearly not what was intended. Preserving the relative order of these two instructions fixes the error:

FROM node:16
ENV HOME=/home/app-user
RUN useradd -m -d $HOME -s /bin/bash app-user
RUN mkdir -p $HOME/app
WORKDIR $HOME/app
COPY . .
RUN chown -R app-user:app-user $HOME
USER app-user
RUN npm install
CMD [ "node", "index.js" ]

Indeed, building an image with this Dockerfile finally yields:

Successfully built b36ac6c948d3

Yay!

The application also runs as expected.
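As mentioned earlier, an alternative to the explicit chown step is to set the owner directly when copying the files, using COPY’s --chown flag. A sketch of what that variant might look like (otherwise equivalent to the final Dockerfile above):

```Dockerfile
FROM node:16
ENV HOME=/home/app-user
RUN useradd -m -d $HOME -s /bin/bash app-user
# Create the working directory up front so WORKDIR doesn't create it as root
RUN mkdir -p $HOME/app && chown -R app-user:app-user $HOME
USER app-user
WORKDIR $HOME/app
# --chown makes app-user the owner of the copied files
COPY --chown=app-user:app-user . .
RUN npm install
CMD [ "node", "index.js" ]
```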

Debugging strategies

Reading about someone else’s errors is one thing; figuring the errors out oneself is another. Below are a few debugging strategies I used to understand the errors described in the first part of the post. Even though I mention them in the context of permission errors, they can be applied in a much broader set of scenarios.

Carefully read error messages

All error messages we looked at were very similar, yet each signaled a different problem. While the errors didn’t point directly to the root cause, the small hints were very helpful in understanding where to look to investigate the problem.

Check Docker documentation

Sometimes our assumptions about how a given instruction runs may not be correct. Docker documentation is the best place to verify these assumptions and understand whether a wrong assumption could be the culprit (e.g. the incorrect assumption that COPY will make the current user the owner of the copied files).

Add additional debug info to Dockerfile

Sometimes it is helpful to print additional debug information when building a docker image. Some commands I used were:

  • RUN ls -al
  • RUN pwd
  • RUN whoami

They allowed me to understand the state the container was in at a given point. One caveat is that, by default, Docker caches intermediate layers when building images, which may result in the debug information not being printed when re-building an image if no changes were made, as the step was cached.
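The caching caveat can be worked around by disabling the cache, and with BuildKit the plain progress mode makes sure the output of each RUN step is always shown:

```shell
# Rebuild without using cached layers so every RUN (including debug steps) executes
docker build --no-cache .

# With BuildKit, plain progress output prints the output of each RUN step
DOCKER_BUILDKIT=1 docker build --progress=plain .
```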

Run the failing command manually and/or inspect the container

This is the ultimate debugging strategy – manually reproduce the error and inspect the container state. One way to do this is to comment out all the steps starting from the failing one and then build the image. Once the image is built, start a container like this (replace IMAGE with the image id):

docker run -d IMAGE tail -f /dev/null

This will start a container whose state is just as it was before the failing step executed. The command will also keep the container running, which makes it possible to launch bash inside the container (replace CONTAINER with the container id returned by the previous command):

docker exec -it CONTAINER /bin/bash

Once inside the container you can run the command that was failing (e.g. npm install). Since the container is in the same state it was when it failed to build you should be able to reproduce the failure. You can also easily check for the factors that caused the failure.
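For example, to investigate the errors from this post, the inspection session inside the container might look something like this:

```shell
whoami        # confirms which user the build step runs as
pwd           # confirms the working directory set by WORKDIR
ls -la        # shows ownership and permissions of the files in it
npm install   # reproduces the failing step interactively
```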

Conclusion

This post showed how to create a Docker container that does not run as root and discussed a few permission issues encountered in the process. It also described a few debugging strategies that can help troubleshoot a wide range of issues, including issues related to permissions. The code for this post is available on GitHub in my docker-permissions repo.

Craigslist automation

Update: this project is now available on npm: https://www.npmjs.com/package/craigslist-automation

A long time ago, Craigslist allowed accessing posts via RSS. It was possible to append &format=rss to a Craigslist URL’s query string to get programmatic access to posts. Unfortunately, Craigslist stopped supporting RSS a few years ago, and it does not seem like it (or a replacement) is going to be available anytime soon, if ever. With RSS gone, the community stepped up and created python-craigslist – a Python package that allows accessing Craigslist posts from a Python program. I remember experimenting with it some time ago and it worked pretty well. I tried it again last night and, to my surprise, I couldn’t get any results for my queries. I checked the project’s repo and quickly found an issue that looked exactly like mine. The issue points out that the HTML Craigslist returns no longer contains posts but a message saying that a browser with JavaScript support is required to see the page. This breaks the python-craigslist library, as it just sends HTTP requests and parses the returned HTML. It seems Craigslist no longer serves results as plain old HTML but uses JavaScript to build the post gallery dynamically. Not being a web developer, I was surprised to see the same behavior in a browser – out of curiosity, I loaded the “cars+trucks” for sale post gallery, checked the page source, and saw the same message as mentioned in the GitHub issue. However, after inspecting the DOM with the built-in developer tools, I could see individual posts.

For my experiment, python-craigslist was not an option anymore and I needed a different solution. I spent a few minutes looking at the network requests Craigslist was sending, and it was clear that making sense of them would require a lot of effort. What I wanted was something that acts the same way as a browser, only driven programmatically.

Enter the headless browser 

When I described what I wanted, I realized it was the exact definition of a headless browser – a browser that can run without a graphical user interface. I knew Chrome could run in headless mode and could be controlled from a Node.js project, as I had played with it a few years earlier. Because it had been a while, I wanted to check how people do this these days. Sure enough, I quickly found puppeteer – a Node.js library for controlling headless Chrome. I quickly created a new Node.js project, configured it to use TypeScript and voilà – with a few lines of code:

import * as puppeteer from "puppeteer";

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(
    "https://seattle.craigslist.org/search/cta?query=blazer%20k5",
    {
      waitUntil: "networkidle0",
    }
  );
  let elements = await page.$$("a.post-title");
  console.log(elements.length);
  await Promise.all(
    elements.map(async (e) => {
      let href = await e.getProperty("href");
      console.log(await href.jsonValue());
    })
  );
  await browser.close();
})();

I was able to get links to the listings matching my query.

Obviously, this is only a simple prototype, but it could be useful for conducting simple experiments.

The SignalR for ASP.NET Core JavaScript Client, Part 2 – Outside the Browser

Last time we looked at using the ASP.NET Core SignalR TypeScript/JavaScript client in the browser. I mentioned, however, that the new client no longer has dependencies that prevent using it outside the browser. So, today we will take the client outside the browser and use it in a NodeJS application. We will add a NodeJS client for the SignalR Chat service we created last time. Initially we will write the client in JavaScript and then we will convert it to TypeScript.

Let’s start by creating a new folder in the SignalRChat repo and adding a new node project:

mkdir SignalRChatNode
cd SignalRChatNode
npm init

We will call the application signarlchatnode and we will leave all other options set to default values. (6425ec1)

Our application will read messages typed by the user and send them to the server. To handle user input we will use node’s readline module. To see that things work, let’s just add code that prompts the user for their name and displays it in the console. We will use it as the starting point of our application (34bc493).

const readline = require('readline');

let rl = readline.createInterface(process.stdin, process.stdout);

rl.question('Enter your name: ', name => {
  console.log(name);
  rl.close();
});

To communicate with the SignalR server we need to add the SignalR JavaScript client to the project using the following command (7875c07):

npm install @aspnet/signalr-client --save

We can now try starting the connection like this (3228a10):

const readline = require('readline');
const signalR = require('@aspnet/signalr-client');

let rl = readline.createInterface(process.stdin, process.stdout);

rl.question('Enter your name: ', name => {
  console.log(name);

  let connection = new signalR.HubConnection('http://localhost:5000/chat');
  connection.start()
    .catch(error => {
      console.error(error);
      rl.close();
    });
});

The code looks good but if you try running it, it will immediately fail with the following error:

Error: Failed to start the connection. ReferenceError: XMLHttpRequest is not defined
ReferenceError: XMLHttpRequest is not defined

What happened? The new JavaScript client no longer depends on the browser, but it still uses standard browser APIs like XmlHttpRequest or WebSocket to communicate with the server. If these APIs are not provided, the client will fail. Fortunately, the required functionality can be easily polyfilled in the NodeJS environment. For now, we will just stick the polyfills on the global object. It’s not beautiful by any means but it will do the trick. We are discussing how to make this better in the future, but at the moment this is the way to go.

Depending on the features of SignalR you plan to use, you will need to provide the appropriate polyfills. Currently the absolute minimum is XmlHttpRequest. The SignalR client uses it to send the initial OPTIONS HTTP request, which initializes the connection on the server side, and for the long polling transport. So, if you use the long polling transport only, XmlHttpRequest is the only polyfill you will need to provide. If you want to use the WebSockets transport, you will need a WebSocket polyfill in addition to XmlHttpRequest. (We are thinking about skipping the OPTIONS request for WebSockets. If this is implemented, you will not need the XmlHttpRequest polyfill when using the WebSockets transport.) For the ServerSentEvents transport you will need an EventSource polyfill. Finally, if you happen to use binary protocols (e.g. MessagePack) over the ServerSentEvents transport, you will need polyfills for the atob/btoa functions. For simplicity, we will use the WebSocket transport in our application, so we will only add polyfills for XmlHttpRequest and WebSocket:

npm install websocket xmlhttprequest --save

and make them available globally via:

XMLHttpRequest = require('xmlhttprequest').XMLHttpRequest;
WebSocket = require('websocket').w3cwebsocket;

If we run the code now we will see something like this:

moozzyk:~/source/SignalRChat/SignalRChatNode$ node index.js
Enter your name: moozzyk
moozzyk
Information: WebSocket connected to ws://localhost:5000/chat?id=0d015ce4-3a78-4313-9343-cb6183a5e8ea
Information: Using HubProtocol 'json'.

which tells us that the client was able to connect successfully to the server. (946f85d)

Now we need to add some code to handle user input and interact with the server, and our Node SignalR Chat client is ready. (I admit the user interface is not very robust, but it should be enough for the purpose of this post.) You can now talk to browser clients from your node client and vice versa (0f7f71f).
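The commit linked above contains the full code; a minimal sketch of what the JavaScript client might look like at this point (the !q quit command and the broadcastMessage/send hub method names follow the chat service from the previous post; treat the details as illustrative):

```javascript
XMLHttpRequest = require('xmlhttprequest').XMLHttpRequest;
WebSocket = require('websocket').w3cwebsocket;

const readline = require('readline');
const signalR = require('@aspnet/signalr-client');

let rl = readline.createInterface(process.stdin, process.stdout);

rl.question('Enter your name: ', name => {
  let connection = new signalR.HubConnection('http://localhost:5000/chat');

  // Print messages broadcast by the server
  connection.on('broadcastMessage', (name, message) => {
    console.log(`${name}: ${message}`);
    rl.prompt(true);
  });

  connection.start()
    .then(() => {
      rl.prompt();
      // Each line typed by the user is sent to the chat hub; '!q' quits
      rl.on('line', input => {
        if (input === '!q') {
          connection.stop();
          rl.close();
          return;
        }
        connection.send('send', name, input);
      });
    })
    .catch(error => {
      console.error(error);
      rl.close();
    });
});
```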


Now let’s convert our client to TypeScript. We will start by creating a new TypeScript project with tsc --init. In the generated tsconfig.json file we will change the target to es6. We will also add an empty index.ts file and delete the existing index.js file (we will no longer need it, since it will now be generated by compiling the newly created index.ts). (b83cf92) If you now run tsc, you should see an empty index.js file created as a result of compiling the index.ts file. The last thing to do is to actually convert our JavaScript code to TypeScript. We could just translate it one-to-one, but we can do a little better. TypeScript supports async/await, which makes writing asynchronous code much easier. Since many of the SignalR client methods return Promises, we can just await these calls instead of using .then/.catch functions. Here is what our node SignalRChat client written in TypeScript looks like (2a6d0e9):

import * as readline from "readline"
import * as signalR from "@aspnet/signalr-client"

(<any>global).XMLHttpRequest = require("xmlhttprequest").XMLHttpRequest;
(<any>global).WebSocket = require("websocket").w3cwebsocket;

let rl = readline.createInterface(process.stdin, process.stdout);

rl.question("Enter your name: ", async name => {
  console.log(name);
  let connection = new signalR.HubConnection("http://localhost:5000/chat");

  connection.on("broadcastMessage", (name, message) => {
    console.log(`${name}: ${message}`);
    rl.prompt(true);
  });

  try {
    await connection.start();
    rl.prompt();

    rl.on("line", async input => {
      if (input === "!q") {
        console.log("Stopping connection...");
        connection.stop();
        rl.close();
        return;
      }
      await connection.send("send", name, input);
    });
  }
  catch (error) {
    console.error(error);
    rl.close();
  }
});

You can run it by executing the following commands:
tsc
node index.js

Today we learned how to use the ASP.NET Core SignalR client in the NodeJS environment. We created a small node JavaScript application that was able to communicate with browser clients. Finally, we converted the JavaScript code to TypeScript and learned a little bit about TypeScript’s async/await feature.