Testing (if there are no) pinned Virtual Threads

In the previous post we were testing whether Java’s Virtual Threads are used in a way that makes them pinned. We care about this because we want our Platform Threads (especially those in the ForkJoin Pool Co.) to keep carrying more and more new Virtual Threads, instead of just keeping the engine running while VTs are pinned. (At least as long as we can’t provide our own thread pools to carry VTs.)
Spoiler alert: we’re going to use Testcontainers and Toxiproxy today!

The “WHY”

Things can usually be tested in many ways. There’s no “one-size-fits-all” solution here. Sometimes unit tests are just enough, sometimes we need something more. Testing for pinned Virtual Threads using JfrUnit is a nice approach, but there might be times when something more opaque is needed, and something that resembles the actual production environment much better. So today we’re going to achieve a similar effect using Testcontainers, for the sake of education and some fun!

The previous approach with a unit test had two obvious flaws.

First, it was using a fixed port, 8080. There have been quite a few posts and talks explaining why this is one of these Really Bad Ideas™. I shall say: if you haven’t encountered issues while testing something and using a fixed port for that, chances are you haven’t been testing things for real and at a bigger scale. Just one hint: have you considered parallel builds on the same machine? Will that work? Okay, another one: what if you discover (hopefully not on a Friday afternoon) that you can’t run the system with the same port in the production environment and oopsies, there’s no way to configure our system to use a different one?
Luckily for us, Testcontainers supports port randomization by default.
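
To illustrate the idea outside of Docker, here is a minimal plain-JVM sketch of "let the OS pick the port". It is not how Testcontainers works internally (there, Docker maps the container port to a free host port), and the FreePort class is a made-up name used only for this example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper, for illustration only: asks the OS for a free port
// instead of hard-coding 8080.
class FreePort {

    static int findFreePort() {
        // Port 0 means "any free port"; the OS assigns an ephemeral one.
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Free port chosen by the OS: " + findFreePort());
    }
}
```

With Testcontainers we don't need such a helper: once a container is started, we can ask it where a given container port ended up on the host, e.g. nginx.getMappedPort(80).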

Second, the network. We all know that a) it’s always the DNS, and b) the latency is never zero. (And we also know a few other things about networks, don’t we? ;-)) Let’s be honest, the idea of enforcing the latency using a heavy calculation like random.ints(15_000_000).sum() is really dumb and wasteful. When was the last time you could control the latency by adding and removing millions of ints here and there in a real production system? Exactly…
And in case you’re tempted by Thread.sleep(), ask yourself the same question.
To emulate a there’s-always-some-latency network we’re going to use something more clever, which is Toxiproxy.

There are also more benefits of using Testcontainers, of course. The most important one might be relying on Docker from within the code! Basically, instead of manually creating a server, or installing a 3rd party solution on every developer’s machine and in every CI environment (and daemons know where else), we can simply declare (via our code): “hey, fetch system ABC, add some files, attach it to a network, set the latency to be more or less this” and then use this whole sophisticated machinery to test our stuff.

In the test, we’re going to use three test containers. We will write our test relying on the awesome JUnit 5 library, but despite the “unit” in its name, it’s not going to be a unit test. Basically, we’re only going to drive our test using JUnit. To make things somewhat simpler for our friends from non-Java ecosystems, let’s try not to rely on the JUnit integration that already exists in Testcontainers.

Preparation

The first container we’re going to use is going to be nginx. Yes, instead of creating an HTTP server “manually”, let’s use a solution that’s already out there. Also, to host something, let’s create a simple test/resources/index.html file (in the modern Java ecosystem it’s a resource that’s going to be used only while testing). Then we’re going to reference this file as follows, to copy it to a container later on:

var index = MountableFile.forClasspathResource("index.html");

Declaring the nginx container can look like this. We’re going to use the "nginx:1.23.1" version (it’s good to be specific) and copy the index.html file to /usr/share/nginx/html/ (because that’s where nginx hosts files from). We’re also going to tell Testcontainers that nginx is not ready to be used in testing until a request sent to the index gets a response with code 200: for this we’ll be waitingFor.

var nginx = new NginxContainer<>("nginx:1.23.1")
    .withCopyFileToContainer(index, "/usr/share/nginx/html/index.html")
    .waitingFor(new HttpWaitStrategy());

One of the reasons I joined AtomicJar was to be close to Testcontainers, because this library made my life so much easier… (And would make in all those places where I couldn’t use it, because reasons.) However, somehow I managed to “avoid” Toxiproxy in Testcontainers. That is, until Oleg showed it to me ;-)

So let me show it to you and let’s use it, which isn’t difficult:

var toxiproxy = new ToxiproxyContainer("ghcr.io/shopify/toxiproxy:2.5.0")
    .withNetworkAliases("toxiproxy");

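As you can see, we’re still specifying the version (because it’s good) and we can even grab images from registries other than Docker Hub. We also alias this very container as toxiproxy (for now, think of it as the host name). Keep in mind that such an alias can be resolved only by containers attached to the same Docker network.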

Next we’re going to start the containers. And to make things happen faster, let’s start both at the same time:

Stream.of(nginx, toxiproxy).parallel().forEach(GenericContainer::start);

(Here we could use some JUnit integration, like @Container or even something darker, if you’re using JDBC… If you use that in your tests already, there’s no need to touch this.)

Fun will now commence

Now it’s time to be toxic to commence the fun!

Let’s intoxicate the connection that our System Under Test is going to utilise:

var proxy = toxiproxy.getProxy(nginx, 80);
proxy.toxics()
    .latency("latency", ToxicDirection.DOWNSTREAM, 500)
    .setJitter(50);

This way we’re delaying the data flowing from nginx back to the client by ~500ms +/- 50ms.

Container-izing our stuff

The artifact we’d like to release to the world (so the world could admire our creation) after the build is the target/concurrency-1.0-SNAPSHOT.jar file (target is Maven’s standard output directory). We could simply run it, of course, because we’re trying to migrate our system to Java 19, so we, the developers, have this version of Java installed. However, the CI/CD systems might not (because “LTS only policy blah blah blah yadda yadda” or because having many Java versions might be tricky). Also, to make sure that the kraken we’re about to release doesn’t change by even a single bit, we might want to build it before testing and test exactly that in an isolated container.

Therefore, we’re going to create yet another container, just to container-ize our gem, and make sure it can run using Java 19:

var jar = MountableFile.forHostPath(Paths.get("target/concurrency-1.0-SNAPSHOT.jar"));
var container = new GenericContainer<>("eclipse-temurin:19-alpine")
    .withCopyFileToContainer(jar, "/tmp/test.jar")
    .withExposedPorts(8000)
    .withCommand("jwebserver");

We’re copying our gem to /tmp/test.jar, to call it later. The last two lines aren’t going to be used in our test directly; they are a way of telling this Java container not to exit immediately after starting. Starting happens like this: container.start(); Because jwebserver keeps running, the container stays alive, and Testcontainers won’t return from start() before jwebserver is listening on this port inside the container.
To put it in other words: there’s no magic here. Testcontainers needs to know when the container it’s starting can be considered as started. There are various WaitStrategy implementations; exposing a port is a trick which makes TC wait for that port.
Finally, we’re composing our stuff without Yelling At My Laptop™ ;-)
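
To make the "wait for that port" idea more tangible, here is a conceptual plain-JVM sketch of such a check. It is only an illustration and not Testcontainers' actual implementation; the PortWaiter name, the 200 ms connect timeout and the 100 ms pause are all assumptions of mine:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;

// Hypothetical helper, for illustration only.
class PortWaiter {

    // Returns true as soon as a TCP connection to host:port succeeds,
    // or false if the timeout elapses (or the thread is interrupted) first.
    static boolean waitForPort(String host, int port, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 200);
                return true; // something is listening
            } catch (IOException notListeningYet) {
                try {
                    Thread.sleep(100); // short pause before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A call like PortWaiter.waitForPort("127.0.0.1", 8000, Duration.ofSeconds(5)) would block until something accepts connections on port 8000 or five seconds pass.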

Telling our client to call over the intoxicated network

Before we tell our client that it should call somewhere, we need to know the where. The whole idea is not to call the nginx from the container directly, but over the intoxicated network. Therefore, the URI to call should be:

var uriFromContainer = String.format("http://%s:%d/", "toxiproxy", proxy.getOriginalProxyPort());
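
The important detail is that this address is the one seen from inside the Docker network: the alias plays the role of the host name and the port is the proxy's port within that network, not a port mapped to our host. A small stand-alone sketch of the composition follows; the ProxyUri class is a made-up name, and 8666 is just an illustrative value standing in for whatever proxy.getOriginalProxyPort() returns:

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
class ProxyUri {

    // Builds the address as seen from inside the Docker network.
    static URI fromContainer(String networkAlias, int inNetworkPort) {
        return URI.create(String.format("http://%s:%d/", networkAlias, inNetworkPort));
    }

    public static void main(String[] args) {
        // 8666 is an assumed example value, not something to hard-code in a real test
        System.out.println(fromContainer("toxiproxy", 8666)); // prints http://toxiproxy:8666/
    }
}
```

Going through URI.create has the small benefit of failing fast if the alias or port ever produce a malformed address.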

Okay, now we can call the client (it was copied to /tmp/test.jar), tell it to connect to uriFromContainer, and let’s not forget about the most important thing, to log if any of the Virtual Threads has been pinned! As per JEP-425, if we tell our Java process to log the occurrences of pinned VTs using -Djdk.tracePinnedThreads=full, we shall expect full stack traces in the standard output, telling us where it took place exactly.

So this is how we run our artifact inside the Java container:

var result = container
    .execInContainer("java", "--enable-preview", "-Djdk.tracePinnedThreads=full", "-jar", "/tmp/test.jar", uriFromContainer);

After the process exits, we can assert that the exit code of our program was still zero (i.e. the process exited normally):

Assertions.assertEquals(0, result.getExitCode());

And that there was no onPinned in the standard output (which would appear, if any of our Virtual Threads got, well, pinned):

MatcherAssert.assertThat(result.getStdout(), not(containsString("onPinned")));
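
When the assertion does fail, it helps to see which frames caused the pinning. Below is a minimal sketch of scanning the captured standard output. It assumes that frames holding a monitor are marked with "<== monitors" in the trace, which is how the JDK 19 output looked as far as I can tell; the PinnedTraceScanner name is made up for this example:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
class PinnedTraceScanner {

    // The same check the test does: any pinned virtual thread leaves "onPinned" in the trace.
    static boolean hasPinnedThreads(String stdout) {
        return stdout.contains("onPinned");
    }

    // Assumption: frames holding a monitor carry a "<== monitors" marker.
    static List<String> framesHoldingMonitors(String stdout) {
        return stdout.lines()
            .map(String::trim)
            .filter(line -> line.contains("<== monitors"))
            .collect(Collectors.toList());
    }
}
```

Feeding result.getStdout() to framesHoldingMonitors and putting the returned list into the assertion message could save a trip to the logs.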

And that’s really it!

Let’s recap the whole setup, which is really quite simple: we build our programme, then tell it to call an HTTP service via a Proxy, which is guaranteed to be toxic.

One could ask: was it really worth it? I’d say that in some cases: definitely YES.

Okay, this example might seem quite simple (if not simplistic), because our external dependency (the HTTP service) was really trivial and easy to spin up in some mock/stub/fake fashion. But what if that wasn’t the case? What if it was a more sophisticated thing: a queue, a message broker, a database, some cache, another microservice of ours? Sure, we could keep spending more and more time to make our mocks/stubs/fakes more and more sophisticated, to write this “one more unit test”. And then we could maintain them, updating their behaviour to mimic the real dependency, based on every documented change. Oh, and undocumented too ;-) (If you haven’t yet wasted two weeks of your life due to an undocumented 3rd party change, I “envy” you the journeys ahead!)
Did I also mention that testing your stuff against many different versions of external dependencies is also waaaay easier this way? Instead of keeping mocks/stubs/fakes for the latest five versions of PostgreSQL (because they’re supported and our product is run on-premises by the people who pay us and use any of the five) to ensure our ORM and/or DB queries are okay, we can simply parametrize our tests with these versions. Maintaining such a test base is way easier and cheaper.

This is the beauty of Testcontainers: these libraries allow us to test against the real thing, in all stages of the software delivery process, starting from our own machine, without worrying about keeping all dependencies installed everywhere. And if you’re still not sure, how are you going to mock/stub/fake the non-zero and tunable network latency?

A bit more

Should you be interested in the whole code, it’s available below or in my GitHub repo for Java 19. Please keep in mind that this demo assumes that this is the only Testcontainers test in our suite. Should that not be the case, we could perhaps apply some improvements, which I shall describe some other time.

Should you want the test to fail, make sure some Virtual Threads get pinned (which normally you don’t want to happen) by making the GreetingObtainer.getGreeting method a synchronized one. (After this you may want to rebuild the artifact, e.g. by MAVEN_OPTS="--enable-preview" ./mvnw package ;-) )
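
For context: according to JEP 425, a virtual thread that blocks while inside a synchronized method or block stays pinned to its carrier, whereas guarding the same code with a ReentrantLock lets it unmount. A minimal sketch of both variants follows. The GreetingGuardDemo class and its method names are made up (this is not the actual GreetingObtainer), and the Supplier stands in for the blocking HTTP call:

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical class, for illustration only; not the GreetingObtainer from the repo.
class GreetingGuardDemo {

    private final ReentrantLock lock = new ReentrantLock();
    private final Supplier<String> blockingCall; // stands in for the blocking HTTP call

    GreetingGuardDemo(Supplier<String> blockingCall) {
        this.blockingCall = blockingCall;
    }

    // Blocking inside synchronized: on Java 19 the virtual thread gets pinned.
    synchronized String getGreetingSynchronized() {
        return blockingCall.get();
    }

    // Same mutual exclusion with ReentrantLock: the virtual thread can unmount while blocked.
    String getGreetingWithLock() {
        lock.lock();
        try {
            return blockingCall.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Both methods return the same value; the difference shows up only in what the carrier thread can do while the call is blocked, which is exactly what the test above detects.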

In the test I’m using try-with-resources to make Ryuk less busy. This is a Java idiom which closes the resources opened in the try section, regardless of the exit point. In general using it to close stuff isn’t a bad idea, I’d say.
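
If the idiom is new to you, here is a tiny stand-alone sketch of it (the CloseOrderDemo name is made up). The two lambdas stand in for the containers; note that resources get closed in the reverse order of their declaration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo, for illustration only.
class CloseOrderDemo {

    static List<String> run() {
        List<String> log = new ArrayList<>();
        // Each lambda is an AutoCloseable whose close() records that it was called.
        try (AutoCloseable outer = () -> log.add("closed outer");
             AutoCloseable inner = () -> log.add("closed inner")) {
            log.add("body");
        } catch (Exception e) {
            // AutoCloseable.close() is declared to throw Exception
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [body, closed inner, closed outer]
    }
}
```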

The code:

// we're going to copy this file from resources to nginx container
var index = MountableFile.forClasspathResource("index.html");

try (
    // network aliases resolve only between containers attached to the same Docker network
    var network = Network.newNetwork();
    var nginx = new NginxContainer<>("nginx:1.23.1")
        .withCopyFileToContainer(index, "/usr/share/nginx/html/index.html")
        .waitingFor(new HttpWaitStrategy())
        .withNetwork(network);
    var toxiProxy = new ToxiproxyContainer("ghcr.io/shopify/toxiproxy:2.5.0")
        .withNetwork(network)
        .withNetworkAliases("toxiproxy")
) {
    // starting both containers in parallel
    Stream.of(nginx, toxiProxy).parallel().forEach(GenericContainer::start);

    // creating intoxicated connection to be used between our client and nginx
    var proxy = toxiProxy.getProxy(nginx, 80);
    proxy.toxics().latency("latency", ToxicDirection.DOWNSTREAM, 500).setJitter(50);

    // preparing the artifact to be copied
    var jar = MountableFile.forHostPath(Paths.get("target/concurrency-1.0-SNAPSHOT.jar"));

    // the client joins the same network, so it can resolve the "toxiproxy" alias
    try (var container = new GenericContainer<>("eclipse-temurin:19-alpine")
        .withNetwork(network)
        .withCopyFileToContainer(jar, "/tmp/test.jar")
        .withExposedPorts(8000)
        .withCommand("jwebserver")) {

        // starting container for the client with the client already copied
        container.start();

        // where the client should call
        var uriFromContainer = String.format("http://%s:%d/", "toxiproxy", proxy.getOriginalProxyPort());

        Assertions.assertDoesNotThrow(() -> {

            // running the client which should call the nginx using intoxicated proxy
            var result = container.execInContainer(
                "java", "--enable-preview", "-Djdk.tracePinnedThreads=full", "-jar", "/tmp/test.jar", uriFromContainer);

            // eventually it should exit successfully
            Assertions.assertEquals(0, result.getExitCode());

            // and there should be no virtual threads pinned
            MatcherAssert.assertThat(result.getStdout(), not(containsString("onPinned")));
        });
    }
}
