Building a Video Analysis and Transcription Chatbot with the GenAI Stack
https://www.docker.com/blog/building-a-video-analysis-and-transcription-chatbot-with-the-genai-stack/ Thu, 28 Mar 2024 14:32:22 +0000

Videos are full of valuable information, but tools are often needed to help find it. From educational institutions seeking to analyze lectures and tutorials to businesses aiming to understand customer sentiment in video reviews, transcribing and understanding video content is crucial for informed decision-making and innovation. Recently, advancements in AI/ML technologies have made this task more accessible than ever.

Developing GenAI technologies with Docker opens up endless possibilities for unlocking insights from video content. By leveraging transcription, embeddings, and large language models (LLMs), organizations can gain deeper understanding and make informed decisions using diverse and raw data such as videos. 

In this article, we’ll dive into a video transcription and chat project that leverages the GenAI Stack, along with seamless integration provided by Docker, to streamline video content processing and understanding. 


High-level architecture 

The application’s architecture is designed to facilitate efficient processing and analysis of video content, leveraging cutting-edge AI technologies and containerization for scalability and flexibility. Figure 1 shows an overview of the architecture, which uses Pinecone to store and retrieve the embeddings of video transcriptions. 

Two-part illustration showing “yt-whisper” process on the left, which involves downloading audio, transcribing it using Whisper (an audio transcription system), computing embeddings (mathematical representations of the audio features), and saving those embeddings into Pinecone. On the right side (labeled "dockerbot"), the process includes computing a question embedding, completing a chat with the question combined with provided transcriptions and knowledge, and retrieving relevant transcriptions.
Figure 1: Schematic diagram outlining a two-component system for processing and interacting with video data.

The application’s high-level service architecture includes the following:

  • yt-whisper: A local service, run by Docker Compose, that interacts with the remote OpenAI and Pinecone services. Whisper is an automatic speech recognition (ASR) system developed by OpenAI, representing a significant milestone in AI-driven speech processing. Trained on an extensive dataset of 680,000 hours of multilingual and multitask supervised data sourced from the web, Whisper demonstrates remarkable robustness and accuracy in English speech recognition. 
  • Dockerbot: A local service, run by Docker Compose, that interacts with the remote OpenAI and Pinecone services. The service takes the question of a user, computes a corresponding embedding, and then finds the most relevant transcriptions in the video knowledge database. The transcriptions are then presented to an LLM, which takes the transcriptions and the question and tries to provide an answer based on this information.
  • OpenAI: The OpenAI API provides an LLM service, which is known for its cutting-edge AI and machine learning technologies. In this application, OpenAI’s technology is used to generate transcriptions from audio (using the Whisper model) and to create embeddings for text data, as well as to generate responses to user queries (using GPT and chat completions).
  • Pinecone: A vector database service optimized for similarity search, used for building and deploying large-scale vector search applications. In this application, Pinecone is employed to store and retrieve the embeddings of video transcriptions, enabling efficient and relevant search functionality within the application based on user queries.

Getting started

The application is a chatbot that can answer questions from a video. Additionally, it provides timestamps from the video that can help you find the sources used to answer your question.

To get started, complete the steps in the following sections.

Clone the repository 

The first step is to clone the repository:

git clone https://github.com/dockersamples/docker-genai.git

The project contains the following directories and files:

├── docker-genai/
│   ├── docker-bot/
│   ├── yt-whisper/
│   ├── .env.example
│   ├── .gitignore
│   ├── LICENSE
│   ├── README.md
│   └── docker-compose.yaml

Specify your API keys

In the /docker-genai directory, create a text file called .env, and specify your API keys inside. The following snippet shows the contents of the .env.example file that you can refer to as an example.

#-------------------------------------------------------------
# OpenAI
#-------------------------------------------------------------
OPENAI_TOKEN=your-api-key # Replace your-api-key with your personal API key

#-------------------------------------------------------------
# Pinecone
#--------------------------------------------------------------
PINECONE_TOKEN=your-api-key # Replace your-api-key with your personal API key

Build and run the application

In a terminal, change directory to your docker-genai directory and run the following command:

docker compose up --build

Next, Docker Compose builds and runs the application based on the services defined in the docker-compose.yaml file. When the application is running, you’ll see the logs of two services in the terminal.

In the logs, you’ll see the services are exposed on ports 8503 and 8504. The two services are complementary to each other.
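If you want to confirm the port mappings yourself, the standard Docker Compose commands work from the project directory, for example:

docker compose ps       # lists both services and their published ports
docker compose logs -f  # follows the logs of the running services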

The yt-whisper service is running on port 8503. This service feeds the Pinecone database with videos that you want to archive in your knowledge database. The next section explores the yt-whisper service.

Using yt-whisper

The yt-whisper service is a YouTube video processing service that uses the OpenAI Whisper model to generate transcriptions of videos and stores them in a Pinecone database. The following steps outline how to use the service.

Open a browser and access the yt-whisper service at http://localhost:8503. Once the application appears, specify a YouTube video URL in the URL field and select Submit. The example shown in Figure 2 uses a video from David Cardozo.

Screenshot showing example of processed content with "download transcription" option for a video from David Cardozo on how to "Develop ML interactive gpu-workflows with Visual Studio Code, Docker and Docker Hub."
Figure 2: A web interface showcasing processed video content with a feature to download transcriptions.

Submitting a video

The yt-whisper service downloads the audio of the video, then uses Whisper to transcribe it into a WebVTT (*.vtt) format (which you can download). Next, it uses the “text-embedding-3-small” model to create embeddings and finally uploads those embeddings into the Pinecone database.
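The service handles these steps internally, but as an illustration of the transcription step, a WebVTT transcript like the one described above can be produced by calling OpenAI's audio transcription endpoint directly. This is a sketch rather than the project's source code; audio.mp3 is a placeholder for the downloaded audio file and $OPENAI_API_KEY stands in for your key:

# Illustrative sketch, not taken from the project's source code:
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F model="whisper-1" \
  -F response_format="vtt" \
  -F file="@audio.mp3"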

After the video is processed, a video list appears in the web app that informs you which videos have been indexed in Pinecone. It also provides a button to download the transcript.

Accessing Dockerbot chat service

You can now access the Dockerbot chat service on port 8504 and ask questions about the videos as shown in Figure 3.

Screenshot of Dockerbot interaction with user asking a question about Nvidia containers and Dockerbot responding with links to specific timestamps in the video.
Figure 3: Example of a user asking Dockerbot about NVIDIA containers and the application giving a response with links to specific timestamps in the video.
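Under the hood, each question follows the Dockerbot flow outlined earlier: the question is embedded, the closest transcription chunks are retrieved from Pinecone, and both are handed to the LLM. As an illustrative sketch (not the project's source code), the embedding step for a question like the one in Figure 3 could be performed against OpenAI's embeddings API with the same text-embedding-3-small model the yt-whisper service uses for indexing:

# Illustrative sketch; $OPENAI_API_KEY is a placeholder for your key.
curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "How do I use NVIDIA containers?"}'

The vector returned by this call is what gets compared against the stored transcription embeddings to find the most relevant passages.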

Conclusion

In this article, we explored the exciting potential of GenAI technologies combined with Docker for unlocking valuable insights from video content. We showed how the integration of cutting-edge AI models like Whisper, coupled with efficient database solutions like Pinecone, empowers organizations to transform raw video data into actionable knowledge.

Whether you’re an experienced developer or just starting to explore the world of AI, the provided resources and code make it simple to embark on your own video-understanding projects. 

Learn more

containerd vs. Docker: Understanding Their Relationship and How They Work Together
https://www.docker.com/blog/containerd-vs-docker/ Wed, 27 Mar 2024 13:41:27 +0000

During the past decade, containers have revolutionized software development by introducing higher levels of consistency and scalability. Now, developers can work without many of the challenges that once came with dependency management, environment consistency, and collaborative workflows.

When developers explore containerization, they might learn about container internals, architecture, and how everything fits together. And, eventually, they may find themselves wondering about the differences between containerd and Docker and how they relate to one another.

In this blog post, we’ll explain what containerd is, how Docker and containerd work together, and how their combined strengths can improve developer experience.


What’s a container?

Before diving into what containerd is, I should briefly review what containers are. Simply put, containers are processes with added isolation and resource management. Rather than shipping a full virtualized operating system, containers share the host’s kernel while getting their own isolated view of the operating system and controlled access to host system resources.

Containers also use operating system kernel features. They use namespaces to provide isolation and cgroups to limit and monitor resources like CPU, memory, and network bandwidth. As you can imagine, container internals are complex, and not everyone has the time or energy to become an expert in the low-level bits. This is where container runtimes, like containerd, can help.

What’s containerd?

In short, containerd is a runtime built to run containers. This open source tool builds on top of operating system kernel features and improves container management with an abstraction layer, which manages namespaces, cgroups, union file systems, networking capabilities, and more. This way, developers don’t have to handle the complexities directly. 

In March 2017, Docker pulled its core container runtime into a standalone project called containerd and donated it to the Cloud Native Computing Foundation (CNCF).  By February 2019, containerd had reached the Graduated maturity level within the CNCF, representing its significant development, adoption, and community support. Today, developers recognize containerd as an industry-standard container runtime known for its scalability, performance, and stability.

Containerd is a high-level container runtime with many use cases. It’s perfect for handling container workloads across small-scale deployments, but it’s also well-suited for large, enterprise-level environments (including Kubernetes). 

A key component of containerd’s robustness is its default use of Open Container Initiative (OCI)-compliant runtimes. By using runtimes such as runc (a lower-level container runtime), containerd ensures standardization and interoperability in containerized environments. It also efficiently deals with core operations in the container life cycle, including creating, starting, and stopping containers.
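Because containerd is a complete runtime in its own right, you can drive this life cycle directly, without Docker, using its bundled ctr debugging CLI. The following commands are an illustrative sketch (the image reference is arbitrary, and ctr typically requires root privileges):

# Pull an image and run a container straight through containerd:
sudo ctr image pull docker.io/library/hello-world:latest
sudo ctr run --rm docker.io/library/hello-world:latest demo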

How is containerd related to Docker?

But how is containerd related to Docker? To answer this, let’s take a high-level look at Docker’s architecture (Figure 1). 

Containerd facilitates operations on containers by directly interfacing with your operating system. The Docker Engine sits on top of containerd and provides additional functionality and developer experience enhancements.

Figure 1: The Docker Engine builds on top of containerd, which interfaces with the operating system to run containers.

How Docker interacts with containerd

To better understand this interaction, let’s talk about what happens when you run the docker run command:

  • After you press Enter, the Docker CLI will send the run command and any command-line arguments to the Docker daemon (dockerd) via a REST API call.
  • dockerd will parse and validate the request, and then it will check that things like container images are available locally. If they’re not, it will pull the image from the specified registry.
  • Once the image is ready, dockerd will shift control to containerd to create the container from the image.
  • Next, containerd will set up the container environment. This process includes tasks such as setting up the container file system, networking interfaces, and other isolation features.
  • containerd will then delegate running the container to runc using a shim process. This will create and start the container.
  • Finally, once the container is running, containerd will monitor the container status and manage the lifecycle accordingly.

Docker and containerd: Better together 

Docker has played a key role in the creation and adoption of containerd, from its inception to its donation to the CNCF and beyond. This involvement helped standardize container runtimes and bolster the open source community’s involvement in containerd’s development. Docker continues to support the evolution of the open source container ecosystem by continuously maintaining and evolving containerd.

Containerd specializes in the core functionality of running containers. It’s a great choice for developers needing access to lower-level container internals and other advanced features. Docker builds on containerd to create a cohesive developer experience and comprehensive toolchain for building, running, testing, verifying, and sharing containers.

Build + Run

In development environments, tools like Docker Desktop, Docker CLI, and Docker Compose allow developers to easily define, build, and run single or multi-container environments, and to integrate seamlessly with their favorite editors or IDEs and even their CI/CD pipelines.

Test

One of the largest developer experience pain points involves testing and environment consistency. With Testcontainers, developers don’t have to worry about reproducibility across environments (for example, dev, staging, testing, and production). Testcontainers also allows developers to use containers for isolated dependency management, parallel testing, and simplified CI/CD integration.

Verify

By analyzing your container images and creating a software bill of materials (SBOM), Docker Scout works with Docker Desktop, Docker Hub, or Docker CLI to help organizations shift left. It also empowers developers to find and fix software vulnerabilities in container images, ensuring a secure software supply chain.

Share

Docker Registry serves as a store for developers to push container images to a shared repository securely. This functionality streamlines image sharing, making maintaining consistency and efficiency in development and deployment workflows easier. 

With Docker building on top of containerd, the software development lifecycle benefits at every stage, from the inner loop and testing through to secure deployment in production.

Wrapping up

In this article, we discussed the relationship between Docker and containerd. We showed how containers, as isolated processes, leverage operating system features to provide efficient and scalable development and deployment solutions. We also described what containerd is and explained how Docker leverages containerd in its stack. 

Docker builds upon containerd to enhance the developer experience, offering a comprehensive suite of tools for the entire development lifecycle across building, running, verifying, sharing, and testing containers. 

Start your next project with containerd and other container components by checking out Docker’s open source projects and most popular open source tools.

Learn more

Is Your Container Image Really Distroless?
https://www.docker.com/blog/is-your-container-image-really-distroless/ Wed, 27 Mar 2024 13:25:43 +0000

Containerization helped drastically improve the security of applications by providing engineers with greater control over the runtime environment of their applications. However, a significant time investment is required to maintain the security posture of those applications, given the daily discovery of new vulnerabilities as well as regular releases of languages and frameworks.

The concept of “distroless” images offers the promise of greatly reducing the time needed to keep applications secure by eliminating most of the software contained in typical container images. This approach also reduces the amount of time teams spend remediating vulnerabilities, allowing them to focus only on the software they are using. 

In this article, we explain what makes an image distroless, describe tools that make the creation of distroless images practical, and discuss whether distroless images live up to their potential.


What’s a distro?

A Linux distribution is a complete operating system built around the Linux kernel, comprising a package management system, GNU tools and libraries, additional software, and often a graphical user interface.

Common Linux distributions include Debian, Ubuntu, Arch Linux, Fedora, Red Hat Enterprise Linux, CentOS, and Alpine Linux (which is more common in the world of containers). These Linux distributions, like most Linux distros, treat security seriously, with teams working diligently to release frequent patches and updates to known vulnerabilities. A key challenge that all Linux distributions must face involves the usability/security dilemma. 

On its own, the Linux kernel is not very usable, so many utility commands are included in distributions to cover a large array of use cases. Having the right utilities included in the distribution without having to install additional packages greatly improves a distro’s usability. The downside of this increase in usability, however, is an increased attack surface area to keep up to date. 

A Linux distro must strike a balance between these two elements, and different distros have different approaches to doing so. A key aspect to keep in mind is that a distro that emphasizes usability is not “less secure” than one that does not emphasize usability. What it means is that the distro with more utility packages requires more effort from its users to keep it secure.

Multi-stage builds

Multi-stage builds allow developers to separate build-time dependencies from runtime ones. Developers can now start from a full-featured build image with all the necessary components installed, perform the necessary build step, and then copy only the result of those steps to a more minimal or even an empty image, called “scratch”. With this approach, there’s no need to clean up dependencies and, as an added bonus, the build stages are also cacheable, which can considerably reduce build time. 

The following example shows a Go program taking advantage of multi-stage builds. Because the Golang runtime is compiled into the binary, only the binary and root certificates need to be copied to the blank slate image.

FROM golang:1.21.5-alpine as build
WORKDIR /
COPY go.* .
RUN go mod download
COPY . .
RUN go build -o my-app


FROM scratch
COPY --from=build \
  /etc/ssl/certs/ca-certificates.crt \
  /etc/ssl/certs/ca-certificates.crt
COPY --from=build /my-app /usr/local/bin/my-app
ENTRYPOINT ["/usr/local/bin/my-app"]

BuildKit

BuildKit, the current engine used by docker build, helps developers create minimal images thanks to its extensible, pluggable architecture. It provides the ability to specify alternative frontends (with the default being the familiar Dockerfile) to abstract and hide the complexity of creating distroless images. These frontends can accept more streamlined and declarative inputs for builds and can produce images that contain only the software needed for the application to run. 

The following example shows the input for a frontend for creating Python applications called mopy by Julian Goede.

#syntax=cmdjulian/mopy
apiVersion: v1
python: 3.9.2
build-deps:
  - libopenblas-dev
  - gfortran
  - build-essential
envs:
  MYENV: envVar1
pip:
  - numpy==1.22
  - slycot
  - ./my_local_pip/
  - ./requirements.txt
labels:
  foo: bar
  fizz: ${mopy.sbom}
project: my-python-app/

So, is your image really distroless?

Thanks to new tools for creating container images like multi-stage builds and BuildKit, it is now a lot more practical to create images that only contain the required software and its runtime dependencies. 

However, many images claiming to be distroless still include a shell (usually Bash) and/or BusyBox, which provides many of the commands a Linux distribution does — including wget — that can leave containers vulnerable to Living off the land (LOTL) attacks. This raises the question, “Why would an image trying to be distroless still include key parts of a Linux distribution?” The answer typically involves container initialization. 
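A quick, illustrative way to check whether one of your own images still ships a shell is to try invoking one (the image name below is a placeholder):

# If this fails with an "executable file not found" error, the image has no /bin/sh:
docker run --rm --entrypoint sh your-image:tag -c "echo this image still contains a shell"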

Developers often have to make their applications configurable to meet the needs of their users. Most of the time, those configurations are not known at build time so they need to be configured at run time. Often, these configurations are applied using shell initialization scripts, which in turn depend on common Linux utilities such as sed, grep, cp, etc. When this is the case, the shell and utilities are only needed for the first few seconds of the container’s lifetime. Luckily, there is a way to create true distroless images while still allowing initialization using tools available from most container orchestrators: init containers.

Init containers

In Kubernetes, an init container is a container that starts and must complete successfully before the primary container can start. By using a non-distroless container as an init container that shares a volume with the primary container, the runtime environment and application can be configured before the application starts. 

The lifetime of that init container is short (often just a couple seconds), and it typically doesn’t need to be exposed to the internet. Much like multi-stage builds allow developers to separate the build-time dependencies from the runtime dependencies, init containers allow developers to separate initialization dependencies from the execution dependencies. 

The concept of init container may be familiar if you are using relational databases, where an init container is often used to perform schema migration before a new version of an application is started.

Kubernetes example

Here are two examples of using init containers. First, using Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: kubecon-postgress-pod
  labels:
    app.kubernetes.io/name: KubeConPostgress
spec:
  containers:
  - name: postgress
    image: laurentgoderre689/postgres-distroless
    securityContext:
      runAsUser: 70
      runAsGroup: 70
    volumeMounts:
    - name: db
      mountPath: /var/lib/postgresql/data/
  initContainers:
  - name: init-postgress
    image: postgres:alpine3.18
    env:
      - name: POSTGRES_PASSWORD
        valueFrom:
          secretKeyRef:
            name: kubecon-postgress-admin-pwd
            key: password
    command: ['docker-ensure-initdb.sh']
    volumeMounts:
    - name: db
      mountPath: /var/lib/postgresql/data/
  volumes:
  - name: db
    emptyDir: {}

- - - 

> kubectl apply -f pod.yml && kubectl get pods
pod/kubecon-postgress-pod created
NAME                    READY   STATUS     RESTARTS   AGE
kubecon-postgress-pod   0/1     Init:0/1   0          0s
> kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
kubecon-postgress-pod   1/1     Running   0          10s

Docker Compose example

The init container concept can also be emulated in Docker Compose for local development using service dependencies and conditions.

services:
 db:
   image: laurentgoderre689/postgres-distroless
   user: postgres
   volumes:
     - pgdata:/var/lib/postgresql/data/
   depends_on:
     db-init:
       condition: service_completed_successfully

 db-init:
   image: postgres:alpine3.18
   environment:
     POSTGRES_PASSWORD: example
   volumes:
     - pgdata:/var/lib/postgresql/data/
   user: postgres
   command: docker-ensure-initdb.sh

volumes:
 pgdata:

- - - 
> docker-compose up 
[+] Running 4/0
 ✔ Network compose_default      Created                                                                                                                      
 ✔ Volume "compose_pgdata"      Created                                                                                                                     
 ✔ Container compose-db-init-1  Created                                                                                                                      
 ✔ Container compose-db-1       Created                                                                                                                      
Attaching to db-1, db-init-1
db-init-1  | The files belonging to this database system will be owned by user "postgres".
db-init-1  | This user must also own the server process.
db-init-1  | 
db-init-1  | The database cluster will be initialized with locale "en_US.utf8".
db-init-1  | The default database encoding has accordingly been set to "UTF8".
db-init-1  | The default text search configuration will be set to "english".
db-init-1  | [...]
db-init-1 exited with code 0
db-1       | 2024-02-23 14:59:33.191 UTC [1] LOG:  starting PostgreSQL 16.1 on aarch64-unknown-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit
db-1       | 2024-02-23 14:59:33.191 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db-1       | 2024-02-23 14:59:33.191 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db-1       | 2024-02-23 14:59:33.194 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db-1       | 2024-02-23 14:59:33.196 UTC [9] LOG:  database system was shut down at 2024-02-23 14:59:32 UTC
db-1       | 2024-02-23 14:59:33.198 UTC [1] LOG:  database system is ready to accept connections

As demonstrated by the previous example, an init container can be used alongside a container to remove the need for general-purpose software and allow the creation of true distroless images. 

Conclusion

This article explained how Docker build tools allow for the separation of build-time dependencies from run-time dependencies to create “distroless” images. It also showed how init containers let developers separate the logic needed to configure a runtime environment from the environment itself, resulting in a more secure container. This approach also helps teams focus their efforts on the software they use and find a better balance between security and usability.

Learn more

11 Years of Docker: Shaping the Next Decade of Development
https://www.docker.com/blog/docker-11-year-anniversary/ Thu, 21 Mar 2024 15:31:36 +0000

Eleven years ago, Solomon Hykes walked onto the stage at PyCon 2013 and revealed Docker to the world for the first time. The problem Docker was looking to solve? “Shipping code to the server is hard.”

And the world of application software development changed forever.


Docker was built on the shoulders of giants: the Linux kernel, copy-on-write file systems, and developer-friendly git semantics. The result? Docker has fundamentally transformed how developers build, share, and run applications. By “dockerizing” an app and its dependencies into a standardized, open format, Docker dramatically lowered the friction between devs and ops, enabling devs to focus on their apps — what’s inside the container — and ops to focus on deploying any app, anywhere — what’s outside the container, in a standardized format. Furthermore, this standardized “unit of work” that abstracts the app from the underlying infrastructure enables an “inner loop” for developers of code, build, test, verify, and debug, which results in 13X more frequent releases of higher-quality, more secure updates.

The subsequent energy over the past 11 years from the ecosystem of developers, community, open source maintainers, partners, and customers cannot be overstated, and we are so thankful and appreciative of your support. This has shown up in many ways, including the following:

  • Ranked #1 “most-wanted” tool/platform by Stack Overflow’s developer community for the past four years
  • 26 million monthly active IPs accessing 15 million repos on Docker Hub, pulling them 25 billion times per month
  • 17 million registered developers
  • Moby project has 67.5k stars, 18.5k forks, and more than 2,200 contributors; Docker Compose has 32.1k stars and 5k forks
  • A vibrant network of 70 Docker Captains across 25 countries serving 167 community meetup groups with more than 200k members and 4800 total meetups
  • 79,000+ customers

The next decade

In our first decade, we changed how developers build, share, and run any app, anywhere — and we’re only accelerating in our second!

Specifically, you’ll see us double down on meeting development teams where they are to enable them to rapidly ship high-quality, secure apps via the following focus areas:

  • Dev Team Productivity. First, we’ll continue to help teams take advantage of the right tech for the right job — whether that’s Linux containers, Windows containers, serverless functions, and/or Wasm (Web Assembly) — with the tools they love and the skills they already possess. Second, by bringing together the best of local and the best of cloud, we’ll enable teams to discover and address issues even faster in the “inner loop,” as you’re already seeing today with our early efforts with Docker Scout, Docker Build Cloud, and Testcontainers Cloud.
  • GenAI. This tech is ushering in a “golden age” for development teams, and we’re investing to help in two areas: First, our GenAI Stack — built through collaboration with partners Ollama, LangChain, and Neo4j — enables dev teams to quickly stand up secure, local GenAI-powered apps. Second, our Docker AI is uniquely informed by anonymized data from dev teams using Docker, which enables us to deliver automations that eliminate toil and reduce security risks.
  • Software Supply Chain. The software supply chain is heterogeneous, extensive, and complex for dev teams to navigate and secure, and Docker will continue to help simplify, make more visible, and manage it end-to-end. Whether it’s the trusted content “building blocks” of Docker Official Images (DOI) in Docker Hub, the transformation of ingredients into runnable images via BuildKit, verifying and securing the dev environment with digital signatures and enhanced container isolation, consuming metadata feedback from running containers in production, or making the entire end-to-end supply chain visible and issues actionable in Docker Scout, Docker has it covered and helps make a more secure internet!

Dialing it past 11

While our first decade was fantastic, there’s so much more we can do together as a community to serve app development teams, and we couldn’t be more excited as our second decade together gets underway and we dial it past 11! If you haven’t already, won’t you join us today?!

How has Docker influenced your approach to software development? Share your experiences with the community and join the conversation on LinkedIn.

Let’s build, share, and run — together!

Docker Partners with NVIDIA to Support Building and Running AI/ML Applications
https://www.docker.com/blog/docker-nvidia-support-building-running-ai-ml-apps/ Mon, 18 Mar 2024 22:06:12 +0000

The domain of GenAI and LLMs has been democratized, and tasks that were once purely the province of AI/ML developers must now be reasoned about by regular application developers and built into everyday products and business logic. This is leading to new products and services across banking, security, healthcare, and more with generative text, images, and videos. Moreover, GenAI’s potential economic impact is substantial, with estimates it could add trillions of dollars annually to the global economy.

Docker offers an ideal way for developers to build, test, run, and deploy the NVIDIA AI Enterprise software platform — an end-to-end, cloud-native software platform that brings generative AI within reach for every business. The platform is available to use in Docker containers, deployable as microservices. This enables teams to focus on cutting-edge AI applications where performance isn’t just a goal — it’s a necessity.

This week, at the NVIDIA GTC global AI conference, the latest release of NVIDIA AI Enterprise was announced, providing businesses with the tools and frameworks necessary to build and deploy custom generative AI models with NVIDIA AI foundation models, the NVIDIA NeMo framework, and the just-announced NVIDIA NIM inference microservices, which deliver enhanced performance and efficient runtime. 

This blog post summarizes some of the Docker resources available to customers today.


Docker Hub

Docker Hub is the world’s largest repository for container images with an extensive collection of AI/ML development-focused container images, including leading frameworks and tools such as PyTorch, TensorFlow, Langchain, Hugging Face, and Ollama. With more than 100 million pull requests for AI/ML-related images, Docker Hub’s significance to the developer community is self-evident. It not only simplifies the development of AI/ML applications but also democratizes innovation, making AI technologies accessible to developers across the globe.

NVIDIA’s Docker Hub library offers a suite of container images that harness the power of accelerated computing, supplementing NVIDIA’s API catalog. Docker Hub’s vast audience — which includes approximately 27 million monthly active IPs, showcasing an impressive 47% year-over-year growth — can use these container images to enhance AI performance. 

Docker Hub’s extensive reach, underscored by an astounding 26 billion monthly image pulls, suggests immense potential for continued growth and innovation.

Docker Desktop with NVIDIA AI Workbench

Docker Desktop on Windows and Mac helps deliver a smooth experience for NVIDIA AI Workbench developers on local and remote machines.

NVIDIA AI Workbench is an easy-to-use toolkit that allows developers to create, test, and customize AI and machine learning models on their PC or workstation and scale them to the data center or public cloud. It simplifies interactive development workflows while automating technical tasks that halt beginners and derail experts. AI Workbench makes workstation setup and configuration fast and easy. Example projects are also included to help developers get started even faster with their own data and use cases.   

Docker engineering teams are collaborating with NVIDIA to improve the user experience with NVIDIA GPU-accelerated platforms through recent improvements to the AI Workbench installation on WSL2.

Check out the video “NVIDIA AI Workbench | Fine Tuning Generative AI” to see how NVIDIA AI Workbench can be used locally to tune a generative image model to produce more accurate prompted results.

In a near-term update, AI Workbench will use the Container Device Interface (CDI) to govern local and remote GPU-enabled environments. CDI is a CNCF-sponsored project led by NVIDIA and Intel, which exposes NVIDIA GPUs inside of containers to support complex device configurations and CUDA compatibility checks. This simplifies how research, simulation, GenAI, and ML applications utilize local and cloud-native GPU resources.  

With Docker Desktop 4.29 (which includes Moby 25), developers can configure CDI support in the daemon and then easily make all NVIDIA GPUs available in a running container by using the --device option via support for CDI devices.

docker run --device nvidia.com/gpu=all <image> <command>
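
For example, assuming the NVIDIA Container Toolkit has generated a CDI specification on the host, a quick (illustrative) way to verify that GPUs are visible inside a container is to run nvidia-smi, which the NVIDIA-generated CDI spec typically injects into the container:

docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi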

LLM-powered apps with Docker GenAI Stack

The Docker GenAI Stack lets teams easily integrate NVIDIA accelerated computing into their AI workflows. This stack, designed for seamless component integration, can be set up on a developer’s laptop using Docker Desktop for Windows. It helps deliver the power of NVIDIA GPUs and NVIDIA NIM to accelerate LLM inference, providing tangible improvements in application performance. Developers can experiment and modify five pre-packaged applications to leverage the stack’s capabilities.

Accelerate AI/ML development with Docker Desktop

Docker Desktop facilitates an accelerated machine learning development environment on a developer’s laptop. By tapping NVIDIA GPU support for containers, developers can leverage tools distributed via Docker Hub, such as PyTorch and TensorFlow, to see significant speed improvements in their projects, underscoring the efficiency gains possible with NVIDIA technology on Docker.

Securing the software supply chain

Securing the software supply chain is a crucial aspect of continuously developing ML applications that can run reliably and securely in production. Building with verified, trusted content from Docker Hub and staying on top of security issues through actionable insights from Docker Scout is key to improving security posture across the software supply chain. By following these best practices, customers can minimize the risk of security issues hitting production, improving the overall reliability and integrity of applications running in production. This comprehensive approach not only accelerates the development of ML applications built with the Docker GenAI Stack but also allows for more secure images when building on images sourced from Hub that interface with LLMs, such as LangChain. Ultimately, this provides developers with the confidence that their applications are built on a secure and reliable foundation.

“With exploding interest in AI from a huge range of developers, we are excited to work with NVIDIA to build tooling that helps accelerate building AI applications. The ecosystem around Docker and NVIDIA has been building strong foundations for many years and this is enabling a new community of enterprise AI/ML developers to explore and build GPU accelerated applications.”

Justin Cormack, Chief Technology Officer, Docker

“Enterprise applications like NVIDIA AI Workbench can benefit enormously from the streamlining that Docker Desktop provides on local systems. Our work with the Docker team will help improve the AI Workbench user experience for managing GPUs on Windows.”

Tyler Whitehouse, Principal Product Manager, NVIDIA

Learn more 

By leveraging Docker Desktop and Docker Hub with NVIDIA technologies, developers are equipped to harness the revolutionary power of AI, grow their skills, and seize opportunities to deliver innovative applications that push the boundaries of what’s possible. Check out NVIDIA’s Docker Hub library  and NVIDIA AI Enterprise to get started with your own AI solutions.

Rev Up Your Cloud-Native Journey: Join Docker at KubeCon + CloudNativeCon EU 2024 for Innovation, Expert Insight, and Motorsports
https://www.docker.com/blog/docker-at-kubecon-eu-2024/ Wed, 13 Mar 2024 13:42:19 +0000

We are racing toward the finish line at KubeCon + CloudNativeCon Europe, March 19 – 22, 2024 in Paris, France. Join the Docker “pit crew” at Booth #J3 for an incredible racing experience, new product demos, and limited-edition SWAG.

Meet us at our KubeCon booth, sessions, and events to learn about the latest trends in AI productivity and best practices in cloud-native development with Docker. At our KubeCon booth (#J3), we’ll show you how building in the cloud accelerates development and simplifies multi-platform builds with a side-by-side demo of Docker Build Cloud. Learn how Docker and Testcontainers Cloud provide a seamless integration within the testing framework to improve the quality and speed of application delivery.

It’s not all work, though — join us at the booth for our Megennis Motorsport Racing experience and try to beat the best!

Take advantage of this opportunity to connect with the Docker team, learn from the experts, and contribute to the ever-evolving cloud-native landscape. Let’s shape the future of cloud-native technologies together at KubeCon!


Deep dive sessions from Docker experts

Is Your Image Really Distroless? — Docker software engineer Laurent Goderre will dive into the world of “distroless” Docker images on Wednesday, March 20. In this session, Goderre will explain the significance of separating build-time and run-time dependencies to enhance container security and reduce vulnerabilities. He’ll also explore strategies for configuring runtime environments without compromising security or functionality. Don’t miss this must-attend session for KubeCon attendees keen on fortifying their Docker containers.

Simplified Inner and Outer Cloud Native Developer Loops — Docker Staff Community Relations Manager Oleg Šelajev and Diagrid Customer Success Engineer Alice Gibbons tackle the challenges of developer productivity in cloud-native development. On Wednesday, March 20, they will present tools and practices to bridge the gap between development and production environments, demonstrating how a unified approach can streamline workflows and boost efficiency across the board.

Engage, learn, and network at these events

Security Soiree: Hands-on cloud-native security workshop and party — Join Sysdig, Snyk, and Docker on March 19 for cocktails, team photos, music, prizes, and more at the Security Soiree. Listen to a compelling panel discussion led by industry experts, including Docker’s Director of Security, Risk & Trust, Rachel Taylor, followed by an evening of networking and festivities. Get tickets to secure your invitation.

Docker Meetup at KubeCon: Development & data productivity in the age of AI — Join us at our meetup during KubeCon on March 21 and hear insights from Docker, Pulumi, Tailscale, and New Relic. This networking mixer at Tonton Becton Restaurant promises candid discussions on enhancing developer productivity with the latest AI and data technologies. Reserve your spot now for an evening of casual conversation, drinks, and delicious appetizers.

See you March 19 – 22 at KubeCon + CloudNativeCon Europe

We look forward to seeing you in Paris — safe travels and prepare for an unforgettable experience!

Learn more

Are Containers Only for Microservices? Myth Debunked
https://www.docker.com/blog/are-containers-only-for-microservices-myth-debunked/ Wed, 06 Mar 2024 14:44:54 +0000

In the ever-evolving software delivery landscape, containerization has emerged as a transformative force, reshaping how organizations build, test, deploy, and manage their applications.

Whether you are maintaining a monolithic legacy system, navigating the complexities of Service-Oriented Architecture (SOA), or orchestrating your digital strategy around application programming interfaces (APIs), containerization offers a pathway to increased efficiency, resilience, and agility. 

In this post, we’ll debunk the myth that containerization is solely the domain of microservices by exploring its applicability and advantages across different architectural paradigms. 


Containerization across architectures

Although containerization is commonly associated with microservices architecture because of its agility and scalability, the potential extends far beyond, offering compelling benefits to a variety of architectural styles. From the tightly integrated components of monolithic applications to the distributed nature of SOA and the strategic approach of API-led connectivity, containerization stands as a universal tool, adaptable and beneficial across the board.

Beyond the immediate benefits of improved resource utilization, faster deployment cycles, and streamlined maintenance, the true value of containerization lies in its ability to ensure consistent application performance across varied environments. This consistency is a cornerstone for reliability and efficiency, pivotal in today’s fast-paced software delivery demands.

Here, we will provide examples of how this technology can be a game-changer for your digital strategy, regardless of your adopted style. Through this exploration, we invite technology leaders and executives to broaden their perspective on containerization, seeing it not just as a tool for one architectural approach but as a versatile ally in the quest for digital excellence.

1. Event-driven architecture

Event-driven architecture (EDA) represents a paradigm shift in how software components interact, pivoting around the concept of events — such as state changes or specific action occurrences — as the primary conduit for communication. This architectural style fosters loose coupling, enabling components to operate independently and react asynchronously to events, thereby augmenting system flexibility and agility. EDA’s intrinsic support for scalability, by allowing components to address fluctuating workloads independently, positions it as an ideal candidate for handling dynamic system demands.

Within the context of EDA, containerization emerges as a critical enabler, offering a streamlined approach to encapsulate applications alongside their dependencies. This encapsulation guarantees that each component of an event-driven system functions within a consistent, isolated environment — a crucial factor when managing components with diverse dependency requirements. Containers’ scalability becomes particularly advantageous in EDA, where fluctuating event volumes necessitate dynamic resource allocation. By deploying additional container instances in response to increased event loads, systems maintain high responsiveness levels.

Moreover, containerization amplifies the deployment flexibility of event-driven components, ensuring consistent event generation and processing across varied infrastructures (Figure 1). This adaptability facilitates the creation of agile, scalable, and portable architectures, underpinning the deployment and management of event-driven components with a robust, flexible infrastructure. Through containerization, EDA systems achieve enhanced operational efficiency, scalability, and resilience, embodying the principles of modern, agile application delivery.

 Illustration of event-driven architecture showing Event Broker connected to Orders Service, Fulfillment Service, and Notification Service.
Figure 1: Event-driven architecture.

2. API-led architecture

API-led connectivity represents a strategic architectural approach focused on the design, development, and management of APIs to foster seamless connectivity and data exchange across various systems, applications, and services within an organization (Figure 2). This methodology champions a modular and scalable framework ideal for the modern digital enterprise.

The principles of API-led connectivity — centering on system, process, and experience APIs — naturally harmonize with the benefits of containerization. By encapsulating each API within its container, organizations can achieve unparalleled modularity and scalability. Containers offer an isolated runtime environment for each API, ensuring operational independence and eliminating the risk of cross-API interference. This isolation is critical, as it guarantees that modifications or updates to one API can proceed without adversely affecting others, which is a cornerstone of maintaining a robust API-led ecosystem.

Moreover, the dual advantages of containerization — ensuring consistent execution environments and enabling easy scalability — align perfectly with the goals of API-led connectivity. This combination not only simplifies the deployment and management of APIs across diverse environments but also enhances the resilience and flexibility of the API infrastructure. Together, API-led connectivity and containerization empower organizations to develop, scale, and manage their API ecosystems more effectively, driving efficiency and innovation in application delivery.

Illustration of API-led architecture showing separate layers for Channel APIs, Orchestration APIs, and System APIs.
Figure 2: API-led architecture.

3. Service-oriented architecture

Service-oriented architecture (SOA) is a design philosophy that emphasizes the use of discrete services within an architecture to provide business functionalities. These services communicate through well-defined interfaces and protocols, enabling interoperability and facilitating the composition of complex applications from independently developed services. SOA’s focus on modularity and reusability makes it particularly amenable to the benefits offered by containerization.

Containerization brings a new dimension of flexibility and efficiency to SOA by encapsulating these services into containers. This encapsulation provides an isolated environment for each service, ensuring consistent execution regardless of the deployment environment. Such isolation is crucial for maintaining the integrity and availability of services, particularly in complex, distributed architectures where services must communicate across different platforms and networks.

Moreover, containerization enhances the scalability and manageability of SOA-based systems. Containers can be dynamically scaled to accommodate varying loads, enabling organizations to respond swiftly to changes in demand. This scalability, combined with the ease of deployment and rollback provided by container orchestration platforms, supports the agile delivery and continuous improvement of services.

The integration of containerization with SOA essentially results in a more resilient, scalable, and manageable architecture. It enables organizations to leverage the full potential of SOA by facilitating faster deployment, enhancing performance, and simplifying the lifecycle management of services. Together, SOA and containerization create a powerful framework for building flexible, future-proof applications that can adapt to the evolving needs of the business.

4. Monolithic applications

Contrary to common perceptions, monolithic applications stand to gain significantly from containerization. This technology can encapsulate the full application stack — including the core application, its dependencies, libraries, and runtime environment within a container. This encapsulation ensures uniformity across various stages of the development lifecycle, from development and testing to production, effectively addressing the infamous ‘it works on my machine’ challenge. Such consistency streamlines the deployment process and simplifies scaling efforts, which is particularly beneficial for applications that need to adapt quickly to changing demands.

Moreover, containerization fosters enhanced collaboration among development teams by standardizing the operational environment, thereby minimizing discrepancies that typically arise from working in divergent development environments. This uniformity is invaluable in accelerating development cycles and improving product reliability.

Perhaps one of the most strategic benefits of containerization for monolithic architectures is the facilitation of a smoother transition to microservices. By containerizing specific components of the monolith, organizations can incrementally decompose their application into more manageable, loosely coupled microservices. This approach not only mitigates the risks associated with a full-scale migration but also allows teams to gradually adapt to microservices’ architectural patterns and principles.

Containerization presents a compelling proposition for monolithic applications, offering a pathway to modernization that enhances deployment efficiency, operational consistency, and the flexibility to evolve toward a microservices-oriented architecture. Through this lens, containerization is not just a tool for new applications but a bridge that allows legacy applications to step into the future of software development.

Conclusion

The journey of modern software development, with its myriad architectural paths, is markedly enhanced by the adoption of containerization. This technology transcends architectural boundaries, bringing critical advantages such as isolation, scalability, and portability to the forefront of application delivery. Whether your environment is monolithic, service-oriented, event-driven, or API-led, containerization aligns perfectly with the ethos of modern, distributed, and cloud-native applications. 

By embracing the adaptability and transformative potential of containerization, you can open your architectures to a future where agility, efficiency, and resilience are not just aspirations but achievable realities. Begin your transformative journey with Docker Desktop today and redefine what’s possible within the bounds of your existing architectural framework.

Learn more

Filter Out Security Vulnerability False Positives with VEX
https://www.docker.com/blog/filter-out-security-vulnerability-false-positives-with-vex/ Tue, 05 Mar 2024 13:57:23 +0000

Development and security teams are becoming overwhelmed by an ever-growing backlog of security vulnerabilities requiring their attention. Although these vulnerability insights are essential to safeguard organizations and their customers from potential threats, the findings are often bloated with a high volume of noise, especially from false positives.

The 2022 Cloud Security Alert Fatigue Report states that more than 40% of alerts from security tools are false positives, which means that teams can be inundated with vulnerabilities that pose no actual risk. The impact of these false positives includes delayed releases, wasted productivity, internal friction, burnout, and eroding customer trust, all of which accumulate significant financial loss for organizations.

How can developers and security professionals cut through the noise so that they can more effectively manage vulnerabilities and focus on what truly matters? That is where the Vulnerability Exploitability eXchange (VEX) comes in.

In this article, we’ll explain how VEX works with Docker Scout and walk through how you can get started. 


What is VEX?

VEX, developed by the National Telecommunications and Information Administration (NTIA), is a specification aimed at capturing and conveying information about exploitable vulnerabilities within a product. Among other details, the framework classifies vulnerability status into four key categories, forming the core of a VEX document:

  • Not affected — No remediation is required regarding this vulnerability.
  • Affected — Actions are recommended to remediate or address this vulnerability.
  • Fixed — These product versions contain a fix for the vulnerability.
  • Under Investigation — Whether these product versions are affected by the vulnerability is still unknown. An update will be provided in a later release.

By ingesting the context from VEX, organizations can distinguish the noise from the confirmed exploitable vulnerabilities to get a more accurate picture of their attack surface and bring focus to their remediation activities. For example, vulnerabilities assigned a “not affected” status in the VEX document may potentially be ruled out as false positives and hidden from tool outputs to minimize distraction.
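
To make this concrete, here is a minimal, hand-written sketch of an OpenVEX document containing a single “not affected” statement. Every identifier below is a placeholder, and the field names follow the OpenVEX v0.2.0 specification referenced later in this article:

cat > example.vex.json <<'EOF'
{
  "@context": "https://openvex.dev/ns/v0.2.0",
  "@id": "https://example.com/vex/example-2024-001",
  "author": "Example Security Team",
  "timestamp": "2024-03-05T00:00:00Z",
  "version": 1,
  "statements": [
    {
      "vulnerability": { "name": "CVE-2024-0001" },
      "products": [ { "@id": "pkg:oci/example-image" } ],
      "status": "not_affected",
      "justification": "vulnerable_code_not_in_execute_path"
    }
  ]
}
EOF

A tool that understands VEX can use the not_affected status and its justification to suppress that CVE for the listed product.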

Although the practice of documenting software vulnerability context is not novel per se, VEX itself represents an advancement over solutions that have traditionally ruled over the vulnerability management processes, such as emails, spreadsheets, Confluence pages, and Jira tickets. 

What sets VEX apart are its standardized and machine-readable features, which make it much better suited for integration and automation within an organization’s vulnerability ecosystem, resulting in a more streamlined and effective approach to vulnerability management without unnecessary resource drain. However, to yield these results — repeatedly and at scale — the technology landscape surrounding VEX must first evolve to deliver tools and experiences that can successfully put VEX data into action in verifiable, automatable, and meaningful ways. 

For more information on VEX, refer to the one-page summary (PDF) by NTIA.

Want to get started with VEX? Docker can help

The implementation of VEX is still nascent in the industry and widespread utilization and adoption will be key in unleashing its full potential. Docker, too, is early in its VEX journey, but read on for how we’re helping our users get started.

Use Docker Scout with local VEX documents

If you want to try out VEX with Docker Scout, the quickest way to get up and running is to create a local VEX document with a tool of your choice, such as vexctl, and incorporate it into your image analysis with the --vex-location flag of the docker scout cves command.

$ mkdir -p /usr/local/share/vex
$ vexctl create [options] --file /usr/local/share/vex/example.vex.json
$ docker scout cves --vex-location /usr/local/share/vex <image-reference>
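
As a rough illustration of filling in those options, a complete vexctl create invocation might look like the following. The product identifier, CVE, and justification are placeholders, so adjust them to your own image and findings:

# Placeholder product purl, CVE, and justification; replace with your own values
$ vexctl create \
    --author="security-team@example.com" \
    --product="pkg:oci/example-image@sha256%3A<digest>" \
    --vuln="CVE-2023-12345" \
    --status="not_affected" \
    --justification="vulnerable_code_not_in_execute_path" \
    --file /usr/local/share/vex/example.vex.json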

Embed VEX documents as attestations

The new docker scout attestation add command lets you attach VEX documents to images as in-toto attestations, which means VEX statements are available on and distributed together with the image.

$ docker scout attestation add \
  --file /usr/local/share/vex/example.vex.json \
  --predicate-type https://openvex.dev/ns/v0.2.0 \
  <image>

Docker Scout automatically incorporates any VEX attestations into the results when you analyze images on the CLI. It also works with attestations signed with Sigstore and attached using vexctl attest --attach --sign.

Automatically create VEX documents with Sysdig

The Sysdig integration for Docker Scout detects what packages are being loaded into memory in your runtime environment and automatically creates VEX statements for filtering out non-applicable CVEs.

Try it out

We are working on embedding the above capability and more into Docker Scout so that users can effortlessly generate and apply VEX to vanquish their false positives for good. Simultaneously, we are exploring VEX for Docker Official Images to allow upstream maintainers to indicate non-applicable CVEs in their images, which can improve tooling (e.g., scanner) accuracy if VEX is taken into account. 

In the meantime, if you are curious about how this all works in practice, we’ve created a guide that walks you through the steps of creating a VEX document, applying it to image analysis, and creating VEX attestations. 

Learn more

Revolutionize Your CI/CD Pipeline: Integrating Testcontainers and Bazel https://www.docker.com/blog/revolutionize-your-ci-cd-pipeline-integrating-testcontainers-and-bazel/ Thu, 29 Feb 2024 15:00:00 +0000 https://www.docker.com/?p=51429 One of the challenges in modern software development is being able to release software often and with confidence. This can only be achieved when you have a good CI/CD setup in place that can test your software and release it with minimal or even no human intervention. But modern software applications also use a wide range of third-party dependencies and often need to run on multiple operating systems and architectures. 

In this post, I will explain how the combination of Bazel and Testcontainers helps developers build and release software by providing a hermetic build system.


Using Bazel and Testcontainers together

Bazel is an open source build tool developed by Google to build and test multi-language, multi-platform projects. Several big IT companies have adopted monorepos for various reasons, such as:

  • Code sharing and reusability 
  • Cross-project refactoring 
  • Consistent builds and dependency management 
  • Versioning and release management

With its multi-language support and focus on reproducible builds, Bazel shines in building such monorepos.

A key concept of Bazel is hermeticity: when all inputs are declared, the build system can tell when an output needs to be rebuilt. This brings determinism, because given the same source code and product configuration, the build always produces the same output, isolated from changes on the host system.

Testcontainers is an open source framework for provisioning throwaway, on-demand containers for development and testing use cases. Testcontainers makes it easy to work with databases, message brokers, web browsers, or just about anything that can run in a Docker container.

Using Bazel and Testcontainers together offers the following features:

  • Bazel can build projects using different programming languages like C, C++, Java, Go, Python, Node.js, etc.
  • Bazel can dynamically provision the isolated build/test environment with desired language versions.
  • Testcontainers can provision the required dependencies as Docker containers so that your test suite is self-contained. You don’t have to manually pre-provision the necessary services, such as databases, message brokers, and so on. 
  • All the test dependencies can be expressed through code using Testcontainers APIs, and you avoid the risk of breaking hermeticity by sharing such resources between tests.

Let’s see how we can use Bazel and Testcontainers to build and test a monorepo with modules using different languages.
We are going to explore a monorepo with a customers module, which uses Java, and a products module, which uses Go. Both modules interact with relational databases (PostgreSQL) and use Testcontainers for testing.

Getting started with Bazel

To begin, let’s get familiar with Bazel’s basic concepts. The best way to install Bazel is by using Bazelisk; follow the official installation instructions to install it. Once it’s installed, you should be able to run the bazel version command, which prints both the Bazelisk and Bazel versions:

$ brew install bazelisk
$ bazel version

Bazelisk version: v1.12.0
Build label: 7.0.0

Before you can build a project using Bazel, you need to set up its workspace. 

A workspace is a directory that holds your project’s source files and contains the following files:

  • The WORKSPACE.bazel file, which identifies the directory and its contents as a Bazel workspace and lives at the root of the project’s directory structure.
  • A MODULE.bazel file, which declares dependencies on Bazel plugins (called “rulesets”).
  • One or more BUILD (or BUILD.bazel) files, which describe the sources and dependencies for different parts of the project. A directory within the workspace that contains a BUILD file is a package.

In the simplest case, a MODULE.bazel file can be an empty file, and a BUILD file can contain one or more generic targets as follows:

genrule(
    name = "foo",
    outs = ["foo.txt"],
    cmd_bash = "sleep 2 && echo 'Hello World' >$@",
)

genrule(
    name = "bar",
    outs = ["bar.txt"],
    cmd_bash = "sleep 2 && echo 'Bye bye' >$@",
)

Here, we have two targets: foo and bar. Now we can build those targets using Bazel as follows:

$ bazel build //:foo <- builds only the foo target; // indicates the workspace root
$ bazel build //:bar <- builds only the bar target
$ bazel build //... <- builds all targets

Configuring the Bazel build in a monorepo

We are going to explore using Bazel in the testcontainers-bazel-demo repository. This repository is a monorepo with a customers module using Java and a products module using Go. Its structure looks like the following:

testcontainers-bazel-demo
|____customers
| |____BUILD.bazel
| |____src
|____products
| |____go.mod
| |____go.sum
| |____repo.go
| |____repo_test.go
| |____BUILD.bazel
|____MODULE.bazel

Bazel uses different rules for building different types of projects: rules_java for Java packages, rules_go for Go packages, rules_python for Python packages, and so on.

We may also need to load rules that provide extra features. For building Java packages, we may want to use external Maven dependencies and JUnit 5 for running tests. In that case, we should load rules_jvm_external to be able to use Maven dependencies.

We are going to use Bzlmod, the new external dependency subsystem, to load the external dependencies. In the MODULE.bazel file, we can load the additional rules_jvm_external and contrib_rules_jvm as follows:

bazel_dep(name = "contrib_rules_jvm", version = "0.21.4")
bazel_dep(name = "rules_jvm_external", version = "5.3")

maven = use_extension("@rules_jvm_external//:extensions.bzl", "maven")
maven.install(
   name = "maven",
   artifacts = [
       "org.postgresql:postgresql:42.6.0",
       "ch.qos.logback:logback-classic:1.4.6",
       "org.testcontainers:postgresql:1.19.3",
       "org.junit.platform:junit-platform-launcher:1.10.1",
       "org.junit.platform:junit-platform-reporting:1.10.1",
       "org.junit.jupiter:junit-jupiter-api:5.10.1",
       "org.junit.jupiter:junit-jupiter-params:5.10.1",
       "org.junit.jupiter:junit-jupiter-engine:5.10.1",
   ],
)
use_repo(maven, "maven")

Let’s understand the above configuration in the MODULE.bazel file:

  • We have loaded the rules_jvm_external rules from Bazel Central Registry and loaded extensions to use third-party Maven dependencies.
  • We have configured all our Java application dependencies using Maven coordinates in the maven.install artifacts configuration.
  • We are loading the contrib_rules_jvm rules, which support running JUnit 5 tests as a suite.

Now, we can run the @maven//:pin program to create a JSON lockfile of the transitive dependencies, in a format that rules_jvm_external can use later:

bazel run @maven//:pin

Rename the generated file rules_jvm_external~4.5~maven~maven_install.json to maven_install.json. Now update the MODULE.bazel to reflect that we pinned the dependencies.
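
For reference, the rename itself is a single shell command run from the workspace root; the MODULE.bazel changes are described next:

$ mv rules_jvm_external~4.5~maven~maven_install.json maven_install.json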

Add a lock_file attribute to the maven.install() and update the use_repo call to also expose the unpinned_maven repository used to update the dependencies:

maven.install(
    ...
    lock_file = "//:maven_install.json",
)

use_repo(maven, "maven", "unpinned_maven")

Now, when you update any dependencies, you can run the following command to update the lock file:

bazel run @unpinned_maven//:pin

Let’s configure our build targets in the customers/BUILD.bazel file, as follows:

load(
 "@bazel_tools//tools/jdk:default_java_toolchain.bzl",
 "default_java_toolchain", "DEFAULT_TOOLCHAIN_CONFIGURATION", "BASE_JDK9_JVM_OPTS", "DEFAULT_JAVACOPTS"
)

default_java_toolchain(
 name = "repository_default_toolchain",
 configuration = DEFAULT_TOOLCHAIN_CONFIGURATION,
 java_runtime = "@bazel_tools//tools/jdk:remotejdk_17",
 jvm_opts = BASE_JDK9_JVM_OPTS + ["--enable-preview"],
 javacopts = DEFAULT_JAVACOPTS + ["--enable-preview"],
 source_version = "17",
 target_version = "17",
)

load("@rules_jvm_external//:defs.bzl", "artifact")
load("@contrib_rules_jvm//java:defs.bzl", "JUNIT5_DEPS", "java_test_suite")

java_library(
   name = "customers-lib",
   srcs = glob(["src/main/java/**/*.java"]),
   deps = [
       artifact("org.postgresql:postgresql"),
       artifact("ch.qos.logback:logback-classic"),
   ],
)

java_library(
   name = "customers-test-resources",
   resources = glob(["src/test/resources/**/*"]),
)

java_test_suite(
   name = "customers-lib-tests",
   srcs = glob(["src/test/java/**/*.java"]),
   runner = "junit5",
   test_suffixes = [
       "Test.java",
       "Tests.java",
   ],
   runtime_deps = JUNIT5_DEPS,
   deps = [
       ":customers-lib",
       ":customers-test-resources",
       artifact("org.junit.jupiter:junit-jupiter-api"),
       artifact("org.junit.jupiter:junit-jupiter-params"),
       artifact("org.testcontainers:postgresql"),
   ],
)

Let’s understand this BUILD configuration:

  • We have loaded default_java_toolchain and then configured the Java version to 17.
  • We have configured a java_library target with the name customers-lib that will build the production jar file.
  • We have defined a java_test_suite target with the name customers-lib-tests to define our test suite, which will execute all the tests. We also configured the dependencies on the other target customers-lib and external dependencies.
  • We also defined another target with the name customers-test-resources to add non-Java sources (e.g., logging config files) to our test suite target as a dependency.

In the customers package, we have a CustomerService class that stores and retrieves customer details in a PostgreSQL database. And we have CustomerServiceTest that tests CustomerService methods using Testcontainers. Take a look at the GitHub repository for the complete code.

Note: You can use Gazelle, which is a Bazel build file generator, to generate the BUILD.bazel files instead of manually writing them.
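
For example, assuming you have defined a gazelle target at the workspace root (as the Gazelle documentation describes), regenerating the BUILD.bazel files is a single command:

# Assumes a //:gazelle target is declared in the root BUILD.bazel
$ bazel run //:gazelle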

Running Testcontainers tests

For running Testcontainers tests, we need a Testcontainers-supported container runtime. Let’s assume you have Docker installed locally using Docker Desktop.
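
A quick way to confirm that the Docker daemon is reachable before running the tests is:

# Prints basic daemon information; fails if Docker is not running
$ docker info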

Now, with our Bazel build configuration, we are ready to build and test the customers package:

# to run all build targets of customers package
$ bazel build //customers/...

# to run a specific build target of customers package
$ bazel build //customers:customers-lib

# to run all test targets of customers package
$ bazel test //customers/...

# to run a specific test target of customers package
$ bazel test //customers:customers-lib-tests

When you run the build for the first time, it will take some time to download the required dependencies and execute the targets. But if you build or test again without any code or configuration changes, Bazel will not re-run the build or tests and will show the cached result instead. Bazel has a powerful caching mechanism that detects code changes and runs only the targets that are affected.
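
If you ever want to bypass the cached results, for example while debugging a flaky test, you can force Bazel to re-run the tests:

# Re-runs tests even if Bazel has cached results for them
$ bazel test --cache_test_results=no //customers/...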

While using Testcontainers, you define the required dependencies in code using Docker image names along with tags, such as postgres:16. So, unless you change the code (e.g., the Docker image name or tag), Bazel will cache the test results.

Similarly, we can use rules_go and Gazelle for configuring Bazel build for Go packages. Take a look at the MODULE.bazel and products/BUILD.bazel files to learn more about configuring Bazel in a Go package.

As mentioned earlier, we need a Testcontainers-supported container runtime for running Testcontainers tests. Installing Docker on some CI platforms might be challenging, and you might need to use a complicated Docker-in-Docker setup. Additionally, some Docker images might not be compatible with the operating system architecture (e.g., Apple M1). 

Testcontainers Cloud solves these problems by eliminating the need to have Docker on the local host or CI runners and by transparently running the containers on cloud VMs.

Here is an example of running the Testcontainers tests using Bazel on Testcontainers Cloud using GitHub Actions:

name: CI

on:
  push:
    branches:
      - '**'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure Testcontainers Cloud
        uses: atomicjar/testcontainers-cloud-setup-action@main
        with:
          wait: true
          token: ${{ secrets.TC_CLOUD_TOKEN }}

      - name: Cache Bazel
        uses: actions/cache@v3
        with:
          path: |
            ~/.cache/bazel
          key: ${{ runner.os }}-bazel-${{ hashFiles('.bazelversion', '.bazelrc', 'WORKSPACE', 'WORKSPACE.bazel', 'MODULE.bazel') }}
          restore-keys: |
            ${{ runner.os }}-bazel-

      - name: Build and Test
        run: bazel test --test_output=all //...

GitHub Actions runners already come with Bazelisk installed, so we can use Bazel out of the box. We have configured the TC_CLOUD_TOKEN environment variable through Secrets and started the Testcontainers Cloud agent. If you check the build logs, you can see that the tests are executed using Testcontainers Cloud.

Summary

We have shown how to use the Bazel build system to build and test monorepos with multiple modules using different programming languages. Combined with Testcontainers, you can make the builds self-contained and hermetic.

Although Bazel and Testcontainers help us have a self-contained build, we need to take extra measures to make it a hermetic build: 

  • Bazel can be configured to use a specific version of SDK, such as JDK 17, Go 1.20, etc., so that builds always use the same version instead of what is installed on the host machine. 
  • For Testcontainers tests, using the Docker tag latest for container dependencies may result in non-deterministic behavior, and some Docker image publishers overwrite existing images using the same tag. To make builds and tests deterministic, always use the Docker image digest so that they run against exactly the same image versions, which gives reproducible and hermetic builds (see the example after this list).
  • Using Testcontainers Cloud for running Testcontainers tests reduces the complexity of Docker setup and gives a deterministic container runtime environment.
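
As a sketch of the digest-pinning advice above, you can resolve a tag to its current digest and then reference that digest in your Testcontainers code instead of the tag. The postgres:16 image and the digest value are only examples:

# Inspect the image to find its current digest (output abbreviated, digest is a placeholder)
$ docker buildx imagetools inspect postgres:16
Name:      docker.io/library/postgres:16
Digest:    sha256:<digest>
# Then reference the image as postgres:16@sha256:<digest> in your tests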

Visit the Testcontainers website to learn more, and get started with Testcontainers Cloud by creating a free account.

Learn more

Azure Container Registry and Docker Hub: Connecting the Dots with Seamless Authentication and Artifact Cache https://www.docker.com/blog/azure-container-registry-and-docker-hub-connecting-the-dots-with-seamless-authentication-and-artifact-cache/ Thu, 29 Feb 2024 14:48:05 +0000 https://www.docker.com/?p=52583 By leveraging the wide array of public images available on Docker Hub, developers can accelerate development workflows, enhance productivity, and, ultimately, ship scalable applications that run like clockwork. When building with public content, acknowledging the potential operational risks associated with using that content without proper authentication is crucial. 

In this post, we will describe best practices for mitigating these risks and ensuring the security and reliability of your containers.


Import public content locally

There are several advantages to importing public content locally. Doing so improves the availability and reliability of your public content pipeline and protects you from failed CI builds. By importing your public content, you can easily validate, verify, and deploy images to help run your business more reliably.

For more information on this best practice, check out the Open Container Initiative’s guide on Consuming Public Content.
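
For example, with Azure Container Registry you can import a public image from Docker Hub into your own registry using the Azure CLI. The registry name and image below are placeholders, and supplying Docker Hub credentials for the import is recommended:

# Placeholder registry name, image, and credentials
$ az acr import \
    --name myregistry \
    --source docker.io/library/nginx:latest \
    --image nginx:latest \
    --username <dockerhub-username> \
    --password <dockerhub-access-token>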

Configure Artifact Cache to consume public content

Another best practice is to configure Artifact Cache to consume public content. Azure Container Registry’s (ACR) Artifact Cache feature allows you to cache your container artifacts in your own Azure Container Registry, even for private networks. This approach limits the impact of rate limits and dramatically increases pull reliability when combined with geo-replicated ACR, allowing you to pull artifacts from the region closest to your Azure resource. 

Additionally, ACR offers various security features, such as private networks, firewall configuration, service principals, and more, which can help you secure your container workloads. For complete information on using public content with ACR Artifact Cache, refer to the Artifact Cache technical documentation.
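
As a sketch of what this looks like with the Azure CLI (the registry, rule name, and credential set are placeholders, and a credential set for Docker Hub must be created beforehand; see the Artifact Cache documentation for the exact steps):

# Placeholder names; requires an existing credential set for Docker Hub
$ az acr cache create \
    --registry myregistry \
    --name dockerhub-ubuntu \
    --source-repo docker.io/library/ubuntu \
    --target-repo ubuntu \
    --cred-set dockerhub-creds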

Authenticate pulls with public registries

We recommend authenticating your image pulls from Docker Hub using subscription credentials. Docker Hub offers developers the ability to authenticate when building with public library content. Authenticated users also have access to pull content directly from private repositories. For more information, visit the Docker subscriptions page. Microsoft Artifact Cache also supports authenticating with other public registries, providing an additional layer of security for your container workloads.
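
In CI, for example, an authenticated pull typically looks like the following, with the Docker Hub username and an access token stored as secrets rather than hard-coded:

# DOCKERHUB_USERNAME and DOCKERHUB_TOKEN are assumed to be provided as secrets
$ echo "$DOCKERHUB_TOKEN" | docker login --username "$DOCKERHUB_USERNAME" --password-stdin
$ docker pull <your-namespace>/<image>:<tag>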

Following these best practices when using public content from Docker Hub can help mitigate security and reliability risks in your development and operational cycles. By importing public content locally, configuring Artifact Cache, and setting up preferred authentication methods, you can ensure your container workloads are secure and reliable.

Learn more about securing containers

Additional resources for improving container security for Microsoft and Docker customers
