Visual disk-usage analyser for docker images

Overview

whaler

PyPI versions PyPI versions

What? A command-line tool for visually investigating the disk usage of docker images

Why? Large images are slow to move and expensive to store. They cost developer productivity by lengthening devops tasks and often contain unnecessary data

Who is this for? Primarily for engineers working with images containing Python packages.

User Stories

This tool should allow you to answer questions such as:

  1. Which file types are occupying the most disk space?
  2. Which are my largest Python packages?
  3. What are my unknown causes of high disk usage?

Quick start

pip install whaler

Run against a local directory

➜ whaler .venv
Running bash -c cd .venv && du -a -k
Done. Serving output at http://localhost:8000 (ctrl+c to exit)
Running python3 -m http.server 8000 --directory=_whaler/html

Run against a docker image

The tool will pull the image first if it is not present.

whaler --image='hl:latest' /
Running docker run --rm --entrypoint=du --workdir=/ hl:latest -a -k
Ignoring what seems to be non-fatal error(s):
du: cannot access './proc/1/task/1/fd/4': No such file or directory
du: cannot access './proc/1/task/1/fdinfo/4': No such file or directory
du: cannot access './proc/1/fd/3': No such file or directory
du: cannot access './proc/1/fdinfo/3': No such file or directory


Done. Serving output at http://localhost:8000 (ctrl+c to exit)
Running python3 -m http.server 8000 --directory=_whaler/html

HTML Report

Play with a hosted demo

Limitations

  1. Platform: whaler uses du to gather disk usage data. It must be present in your docker image
  2. Scale: I have tested the web UI with up to 500,000 file system nodes with du output of up to ~100MB.

Alternatives/Complements to this tool:

  1. Whaler can tell you what is taking up space in the final layer of your Docker image, but you may have intermediate layers which are contributing to the image size. For diving through the layers, use dive
    • Related: read up on multi-stage builds to understand how to mitigate the problem of intermediate layers bloating your image.
  2. For investigating disk usage in non-docker directories, Disk Inventory X is a great tool on OS X which I have based whaler on.
  3. GoogleContainerTools/distroless for base images not containing standard linux distro contents

See also

  1. a good blog post on making small python images

Developing

See .github/workflows/test.yml for the development platform and setup.

For UI, see whaler-ui

You might also like...
Lima is an alternative to using Docker Desktop on your Mac.
Lima is an alternative to using Docker Desktop on your Mac.

lima-xbar-plugin Table of Contents Description Installation Dependencies Lima is an alternative to using Docker Desktop on your Mac. Description This

Build Netbox as a Docker container

netbox-docker The Github repository houses the components needed to build Netbox as a Docker container. Images are built using this code and are relea

Some automation scripts to setup a deployable development database server (with docker).

Postgres-Docker Database Initializer This is a simple automation script that will create a Docker Postgres database with a custom username, password,

A job launching library for docker, EC2, GCP, etc.

doodad A library for packaging dependencies and launching scripts (with a focus on python) on different platforms using Docker. Currently supported pl

A repository containing a short tutorial for Docker (with Python).
A repository containing a short tutorial for Docker (with Python).

Docker Tutorial for IFT 6758 Lab In this repository, we examine the advtanges of virtualization, what Docker is and how we can deploy simple programs

Bitnami Docker Image for Python using snapshots for the system packages repositories

Python Snapshot packaged by Bitnami What is Python Snapshot? Python is a programming language that lets you work quickly and integrate systems more ef

Hatch plugin for Docker containers

hatch-containers CI/CD Package Meta This provides a plugin for Hatch that allows

Create pinned requirements.txt inside a Docker image using pip-tools
Create pinned requirements.txt inside a Docker image using pip-tools

Pin your Python dependencies! pin-requirements.py is a script that lets you pin your Python dependencies inside a Docker container. Pinning your depen

This repository contains useful docker-swarm-tools.

docker-swarm-tools This repository contains useful docker-swarm-tools. swarm-guardian This Docker image is intended to be used in a multihost docker e

Comments
  • Bounding box on selected node barely visible

    Bounding box on selected node barely visible

    This is due to the canvas not being fully re-drawn upon selection.

    The solution is probably to ensure the canvas can be re-drawn quickly and creating a fresh render every time

    bug 
    opened by alex-treebeard 0
Releases(0.1.2)
Owner
Treebeard Technologies
Helping with Data Infra
Treebeard Technologies
Lima is an alternative to using Docker Desktop on your Mac.

lima-xbar-plugin Table of Contents Description Installation Dependencies Lima is an alternative to using Docker Desktop on your Mac. Description This

Joe Block 68 Dec 22, 2022
A basic instruction for Kubernetes setup and understanding.

A basic instruction for Kubernetes setup and understanding Module ID Module Guide - Install Kubernetes Cluster k8s-install 3 Docker Core Technology mo

648 Jan 02, 2023
A Python Implementation for Git for learning

A pure Python implementation for Git based on Buliding Git

shidenggui 42 Jul 13, 2022
Deploy a simple Multi-Node Clickhouse Cluster with docker-compose in minutes.

Simple Multi Node Clickhouse Cluster I hate those single-node clickhouse clusters and manually installation, I mean, why should we: Running multiple c

Nova Kwok 11 Nov 18, 2022
Oncall is a calendar tool designed for scheduling and managing on-call shifts. It can be used as source of dynamic ownership info for paging systems like http://iris.claims.

Oncall See admin docs for information on how to run and manage Oncall. Development setup Prerequisites Debian/Ubuntu - sudo apt-get install libsasl2-d

LinkedIn 928 Dec 22, 2022
This project shows how to serve an TF based image classification model as a web service with TFServing, Docker, and Kubernetes(GKE).

Deploying ML models with CPU based TFServing, Docker, and Kubernetes By: Chansung Park and Sayak Paul This project shows how to serve a TensorFlow ima

Chansung Park 104 Dec 28, 2022
Docker Container wallstreetbets-sentiment-analysis

Docker Container wallstreetbets-sentiment-analysis A docker container using restful endpoints exposed on port 5000 "/analyze" to gather sentiment anal

145 Nov 22, 2022
Run Oracle on Kubernetes with El Carro

El Carro is a new project that offers a way to run Oracle databases in Kubernetes as a portable, open source, community driven, no vendor lock-in container orchestration system. El Carro provides a p

Google Cloud Platform 205 Dec 30, 2022
Nagios status monitor for your desktop.

Nagstamon Nagstamon is a status monitor for the desktop. It connects to multiple Nagios, Icinga, Opsview, Centreon, Op5 Monitor/Ninja, Checkmk Multisi

Henri Wahl 361 Jan 05, 2023
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Apache Airflow Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. When workflows are define

The Apache Software Foundation 28.6k Jan 01, 2023
Get Response Of Container Deployment Kube with python

get-response-of-container-deployment-kube 概要 get-response-of-container-deployment-kube は、例えばエッジコンピューティング環境のコンテナデプロイメントシステムにおいて、デプロイ元の端末がデプロイ先のコンテナデプロイ

Latona, Inc. 3 Nov 05, 2021
IP address management (IPAM) and data center infrastructure management (DCIM) tool.

NetBox is an IP address management (IPAM) and data center infrastructure management (DCIM) tool. Initially conceived by the network engineering team a

NetBox Community 11.8k Jan 07, 2023
Create pinned requirements.txt inside a Docker image using pip-tools

Pin your Python dependencies! pin-requirements.py is a script that lets you pin your Python dependencies inside a Docker container. Pinning your depen

4 Aug 18, 2022
Ralph is the CMDB / Asset Management system for data center and back office hardware.

Ralph Ralph is full-featured Asset Management, DCIM and CMDB system for data centers and back offices. Features: keep track of assets purchases and th

Allegro Tech 1.9k Jan 01, 2023
Google Kubernetes Engine (GKE) with a Snyk Kubernetes controller installed/configured for Snyk App

Google Kubernetes Engine (GKE) with a Snyk Kubernetes controller installed/configured for Snyk App This example provisions a Google Kubernetes Engine

Pas Apicella 2 Feb 09, 2022
Blazingly-fast :rocket:, rock-solid, local application development :arrow_right: with Kubernetes.

Gefyra Gefyra gives Kubernetes-("cloud-native")-developers a completely new way of writing and testing their applications. Over are the times of custo

Michael Schilonka 352 Dec 26, 2022
sysctl/sysfs settings on a fly for Kubernetes Cluster. No restarts are required for clusters and nodes.

SysBindings Daemon Little toolkit for control the sysctl/sysfs bindings on Kubernetes Cluster on the fly and without unnecessary restarts of cluster o

Wallarm 19 May 06, 2022
Wiremind Kubernetes helper

Wiremind Kubernetes helper This Python library is a high-level set of Kubernetes Helpers allowing either to manage individual standard Kubernetes cont

Wiremind 3 Oct 09, 2021
Hubble - Network, Service & Security Observability for Kubernetes using eBPF

Network, Service & Security Observability for Kubernetes What is Hubble? Getting Started Features Service Dependency Graph Metrics & Monitoring Flow V

Cilium 2.4k Jan 04, 2023