← Ascendy 한국어

backend

ERROR showed, INFO vanished — the two traps that swallowed our Celery logs

· Ascendy Engineering


TL;DR

Background — “not running” and “running but muted” are different

While chasing another issue, I looked at the worker logs. Task-start INFO lines — there were none. My first suspicion was “the worker died and tasks aren’t running.” But ERROR lines were printing just fine.

ERROR shows, INFO doesn’t. That asymmetry points away from “tasks aren’t running” and toward “tasks run, but the INFO level is muted.” So I separated the two first — with an active probe that dispatches a task with a non-existent id to force a single logger.error(...).

# Probe to tell whether the dispatch + consume path is alive:
# a non-existent id triggers logger.error(...) via the not-found path
docker compose exec backend python -c "
from app.tasks.your_task import your_task
your_task.delay(99999)  # not-found path triggers logger.error
"
docker compose logs worker --since 30s | grep your_task

ERROR appeared in the worker’s stdout. That meant at least the probe task’s dispatch+consume path was alive, and suspicion narrowed straight to “INFO is being filtered out.”

Cause 1 — Python logging never initialized in the worker

logging.basicConfig(level=logging.INFO) only ran when the web app was imported. But the worker is a different door into the same code — it started with the root logger at the default WARNING, muting every logger.info(...) in the app modules.

On top of that, Celery prefork child processes can have their logging level re-touched by Celery’s own setup even after fork, so configuring once in the parent wasn’t enough. Three pieces together kept INFO alive down into the prefork children.

# At the top of the worker/beat entry module: keep INFO alive in every process
import logging
from celery.signals import setup_logging, worker_process_init


def _configure_app_logging() -> None:
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
        force=True,
    )


_configure_app_logging()


@setup_logging.connect
def _skip_celery_logging_setup(**kwargs):
    """If any handler is connected to this signal, Celery skips its own logging setup."""
    return


@worker_process_init.connect
def _init_logging_in_prefork_child(**kwargs):
    """Re-apply once per prefork child — fork inherits handlers, but Celery's
    per-child setup can revert the level."""
    _configure_app_logging()

Cause 2 — a YAML block scalar chopped the command apart

INFO was back. But something was still off — the worker’s --concurrency setting was being ignored. Inspecting the container’s actual CMD revealed that --loglevel, --queues, --concurrency, and --max-memory-per-child were running as separate (and failing) shell commands inside the container.

The culprit was command: |- combined with our container’s entrypoint. A literal block scalar (|) preserves the LF on every line. The key condition: Compose’s command doesn’t go through a shell automatically — our image’s entrypoint ran this command string via bash -lc, and once that string reaches bash, it treats the unquoted LFs as command separators. So in reality only celery ... worker ran, with zero arguments, and every other line scattered into separate (failing) commands. A single bug was simultaneously dropping concurrency, the memory limit, and the log level.

# BAD: literal |- preserves LFs → in an entrypoint that runs this via bash -lc, bash treats LF as a command separator
command: |-
  celery -A app.celery_app:celery worker
    --loglevel=INFO
    --queues=default,heavy
    --concurrency=1

# OK 1: folded `>-` folds LFs into spaces → arrives as a single command
command: >-
  celery -A app.celery_app:celery worker
    --loglevel=INFO
    --queues=default,heavy
    --concurrency=1

# OK 2: list form is explicit exec-style (shell semantics still depend on the image entrypoint)
command:
  - celery
  - -A
  - app.celery_app:celery
  - worker
  - --loglevel=INFO
  - --queues=default,heavy
  - --concurrency=1

How to avoid the same traps next time

What’s next


Authorship & citation: This post was written by Ascendy Engineering and may be re-cited with attribution. If you find an error, please let us know via a GitHub issue.


Tags: celery, python-logging, docker-compose, yaml, observability, debugging