
collect: compute size before creating tarball, reject if not enough fs space #42

Open
bloom1 wants to merge 7 commits into master from ITEM-342-compute-size-before-collect

bloom1 (Member) commented May 25, 2026

Note

Low Risk
Operational guardrails on a support/debug CLI; failures are explicit before heavy I/O, with no auth or production traffic impact.

Overview
wazo-debug collect now estimates how much data will be gathered and fails early with a clear RuntimeError if disk space is insufficient, instead of filling /tmp or the output path mid-run.

Before gather_facts runs, check_free_space checks the free space on the filesystems hosting the temp dir and the output file (via shutil.disk_usage, with st_dev used to detect whether both live on the same filesystem). It assumes a worst-case tarball as large as the gathered tree, because tar caf picks the compression from the output file extension and a plain .tar is not compressed at all. When temp and output share one filesystem, it requires room for both the staging copy and the archive (required_bytes * 2), plus a 500 MiB buffer. Error messages use human-readable sizes from _format_bytes.
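A minimal sketch of the free-space check described above. The signature check_free_space(temp_dir, output_file, required_bytes, buffer_bytes) is hypothetical, and so is the assumption that the 500 MiB buffer is added on every filesystem checked (the description does not say how the buffer is applied when the paths are on different filesystems). It is an illustration, not the PR's code.

```python
import os
import shutil

BUFFER_BYTES = 500 * 1024 * 1024  # 500 MiB safety margin, per the description


def check_free_space(temp_dir: str, output_file: str, required_bytes: int,
                     buffer_bytes: int = BUFFER_BYTES) -> None:
    """Raise RuntimeError if the temp dir or output filesystem lacks space.

    Hypothetical sketch: worst case assumes an uncompressed archive as large
    as the gathered tree, and the buffer is assumed to apply per filesystem.
    """
    # The archive does not exist yet, so inspect the directory that will hold it.
    output_dir = os.path.dirname(os.path.abspath(output_file))
    same_fs = os.stat(temp_dir).st_dev == os.stat(output_dir).st_dev

    if same_fs:
        # Staging copy and archive coexist on a single filesystem.
        needs = {temp_dir: required_bytes * 2 + buffer_bytes}
    else:
        needs = {
            temp_dir: required_bytes + buffer_bytes,
            output_dir: required_bytes + buffer_bytes,
        }

    for directory, needed in needs.items():
        free = shutil.disk_usage(directory).free
        if free < needed:
            raise RuntimeError(
                f'Not enough free space on filesystem hosting "{directory}": '
                f'{free} bytes available, but {needed} bytes are required'
            )
```

Comparing st_dev is a cheap way to avoid double-counting: when both paths resolve to the same device, a single disk_usage result has to cover both the staging tree and the archive.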

compute_gathering_size walks the same log, config, and engine-info paths that rsync copies; these paths are now centralized in _log_source_paths, _config_source_paths, and _engine_info_source_paths. Sizes are computed from st_blocks * 512 (allocated blocks, not the logical size) and mirror the Asterisk rsync rules: only top-level full* files under /var/log/asterisk are counted, and subdirectories are skipped. The gather commands were updated to use the same helpers, so the estimate matches what is actually copied.
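The Asterisk rule can be sketched in isolation as below. asterisk_log_size is a hypothetical helper written for illustration; in the PR this logic lives inside _path_size.

```python
import fnmatch
import os
import stat


def asterisk_log_size(log_dir: str) -> int:
    """Allocated size of top-level full* regular files in log_dir.

    Hypothetical helper: subdirectories, symlinks, and names that do not
    match full* are ignored, mirroring the rsync filter described above.
    """
    total = 0
    with os.scandir(log_dir) as entries:
        for entry in entries:
            if not fnmatch.fnmatch(entry.name, 'full*'):
                continue
            st = entry.stat(follow_symlinks=False)
            # Skip symlinks, directories, and anything else that is not a regular file.
            if not stat.S_ISREG(st.st_mode):
                continue
            total += st.st_blocks * 512  # st_blocks counts 512-byte units
    return total
```

Using os.scandir instead of os.walk keeps the scan to the top level by construction, so there is no need to prune dirs.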

Reviewed by Cursor Bugbot for commit 903f3e1. Bugbot is set up for automated code reviews on this repo.

bloom1 added the mergeit label May 25, 2026

wazo-community-zuul (Contributor)

Build succeeded.
https://zuul.wazo.community/zuul/t/local/buildset/cdd14d73fcf44c1db5581f4cfeee1984

✔️ tox-linters SUCCESS in 2m 19s
✔️ debian-packaging-bookworm SUCCESS in 2m 18s

Comment thread wazo_debug/collect.py
wazo-community-zuul (Contributor)

Build succeeded.
https://zuul.wazo.community/zuul/t/local/buildset/b6b0bbbcea8e4d738920926676c48db9

✔️ tox-linters SUCCESS in 2m 21s
✔️ debian-packaging-bookworm SUCCESS in 2m 15s

Comment thread wazo_debug/collect.py Outdated
with tempfile.TemporaryDirectory(prefix='wazo-debug-') as temp_directory:
    logger.info('Created temporary directory: "%s"', temp_directory)

    check_free_space(temp_directory)

Should check space for both temp_directory and parsed_args.output_file, since they can live on different filesystems.

Since parsed_args.output_file is a compressed file, the space it requires should be less than for temp_directory (possibly use a standard or worst-case compression ratio, e.g. 80%).
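The ratio-based estimate suggested here could look like the sketch below. The 0.8 ratio is only the example value from this comment, not a measured figure, and estimated_archive_bytes is a hypothetical name. The sketch also shows the weak spot the author raises later in the thread: with a plain .tar there is no compression, so no ratio below 1 is safe.

```python
from pathlib import Path

# Example ratio from the review comment: archive assumed to be at most 80%
# of the gathered data. This is an assumption, not a measurement.
ASSUMED_COMPRESSION_RATIO = 0.8


def estimated_archive_bytes(gathered_bytes: int, output_file: str,
                            ratio: float = ASSUMED_COMPRESSION_RATIO) -> int:
    """Estimate the archive size from the output file extension (illustrative)."""
    if Path(output_file).suffix == '.tar':
        # Plain .tar is uncompressed: the archive is roughly as large as the data.
        return gathered_bytes
    return int(gathered_bytes * ratio)
```

Because the user chooses the extension and some data (already-compressed logs, for instance) barely compresses, a fixed ratio can underestimate; the worst-case assumption avoids that guess.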

Comment thread wazo_debug/collect.py Outdated
if free_bytes < needed_bytes:
    raise RuntimeError(
        f'Not enough free space on filesystem hosting "{target_directory}": '
        f'{free_bytes} bytes available, but {needed_bytes} bytes are required '

Proposition: rewrite the byte counts as human-readable values. For example, "178291293 bytes available, but 629145600 bytes are required" is less readable than "170.03 MiB available, but 600 MiB are required".

Suggested change
- f'{free_bytes} bytes available, but {needed_bytes} bytes are required '
+ f'{_format_bytes(free_bytes)} available, but {_format_bytes(needed_bytes)} are required '

where _format_bytes is a small helper (Claude-generated):

def _format_bytes(n: int) -> str:
    size = float(n)
    for unit in ('B', 'KiB', 'MiB', 'GiB'):
        if abs(size) < 1024:
            return f'{size:.1f} {unit}'
        size /= 1024
    return f'{size:.1f} TiB'

Comment thread wazo_debug/collect.py
Comment on lines +99 to +127
def _path_size(path):
    if os.path.islink(path):
        return 0
    if os.path.isfile(path):
        try:
            return os.path.getsize(path)
        except OSError:
            return 0
    if not os.path.isdir(path):
        return 0

    is_asterisk_log_dir = path == ASTERISK_LOG_DIR
    total = 0
    for root, dirs, files in os.walk(path):
        if is_asterisk_log_dir and root != path:
            # rsync filter excludes asterisk subdirectories (only keeps full* at top level)
            dirs[:] = []
            continue
        for name in files:
            if is_asterisk_log_dir and not fnmatch.fnmatch(name, 'full*'):
                continue
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                continue
    return total

os.path.getsize returns the logical size. Since we want the actual disk space needed, it would be safer to count the allocated blocks instead, which os.lstat exposes as st_blocks.

This requires importing stat.

Suggested change
def _path_size(path: str) -> int:
    try:
        st = os.lstat(path)
    except OSError:
        return 0
    if stat.S_ISLNK(st.st_mode):
        return 0
    if stat.S_ISREG(st.st_mode):
        # st_blocks is counted in 512-byte units on Linux, regardless of the filesystem block size
        return st.st_blocks * 512
    if not stat.S_ISDIR(st.st_mode):
        return 0
    is_asterisk_log_dir = path == ASTERISK_LOG_DIR
    total = 0
    for root, dirs, files in os.walk(path, followlinks=False):
        if is_asterisk_log_dir and root != path:
            # rsync filter excludes asterisk subdirectories (only keeps full* at top level)
            dirs[:] = []
            continue
        for name in files:
            if is_asterisk_log_dir and not fnmatch.fnmatch(name, 'full*'):
                continue
            try:
                entry_st = os.lstat(os.path.join(root, name))
            except OSError:
                continue
            if stat.S_ISLNK(entry_st.st_mode):
                continue
            total += entry_st.st_blocks * 512
    return total
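To see why logical and allocated sizes can differ, the sketch below creates a sparse file: truncate() extends the logical size without writing data, so on filesystems that support sparse files the allocated size is typically far smaller. The reverse also happens, since a tiny file still occupies at least one filesystem block. This is a standalone illustration, not code from the PR.

```python
import os
import tempfile

# Create a 10 MiB sparse file: truncate() sets the logical size without writing data.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(10 * 1024 * 1024)
    path = f.name

try:
    st = os.lstat(path)
    logical = st.st_size            # what os.path.getsize() reports
    allocated = st.st_blocks * 512  # what the suggested change counts
    print(f'logical={logical} allocated={allocated}')
finally:
    os.unlink(path)
```

The exact allocated value depends on the filesystem, which is why this is shown as an illustration rather than with a fixed expected output.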

wazo-community-zuul (Contributor)

Build succeeded.
https://zuul.wazo.community/zuul/t/local/buildset/d116a57801d74ba38ec515db8bf0cd3d

✔️ tox-linters SUCCESS in 2m 24s
✔️ debian-packaging-bookworm SUCCESS in 2m 19s

wazo-community-zuul (Contributor)

Build succeeded.
https://zuul.wazo.community/zuul/t/local/buildset/900624ec18f348659cdda93ca2b920f3

✔️ tox-linters SUCCESS in 2m 22s
✔️ debian-packaging-bookworm SUCCESS in 2m 19s

Comment thread wazo_debug/collect.py Outdated
Why: the compression method depends on the filename given by the user. If the
extension is only .tar, there is no compression at all, so assuming double the
size is the safe alternative.
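A quick way to confirm that an uncompressed tar never shrinks its input: archive incompressible random data with Python's tarfile in plain 'w' mode (comparable to what tar caf produces for a .tar name) and compare sizes. The tar headers and record padding make the archive slightly larger than the data. This is an illustrative check, not part of the PR.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as d:
    data_path = os.path.join(d, 'data.bin')
    with open(data_path, 'wb') as f:
        f.write(os.urandom(100 * 1024))  # random bytes are effectively incompressible

    archive_path = os.path.join(d, 'out.tar')
    # Mode 'w' writes an uncompressed tar archive.
    with tarfile.open(archive_path, 'w') as tar:
        tar.add(data_path, arcname='data.bin')

    data_size = os.path.getsize(data_path)
    archive_size = os.path.getsize(archive_path)
    print(data_size, archive_size)
```

Together with the staging copy on the same filesystem, this is why required_bytes * 2 is a conservative bound.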
wazo-community-zuul (Contributor)

Build succeeded.
https://zuul.wazo.community/zuul/t/local/buildset/1efe090a38d3412091719428d5f9a255

✔️ tox-linters SUCCESS in 2m 23s
✔️ debian-packaging-bookworm SUCCESS in 2m 17s

Comment thread wazo_debug/collect.py
wazo-community-zuul (Contributor)

Build succeeded.
https://zuul.wazo.community/zuul/t/local/buildset/c6ce100a8a9e4badb5ad330975684e61

✔️ tox-linters SUCCESS in 2m 23s
✔️ debian-packaging-bookworm SUCCESS in 2m 21s

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 3e8d937.

Comment thread wazo_debug/collect.py Outdated
Why: remove redundant shutil calls and a noisy info-level log message.
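One way to avoid repeated shutil calls, sketched under the assumption that several paths may share a filesystem: key the results by st_dev and query shutil.disk_usage only once per device. free_bytes_by_filesystem is a hypothetical helper; the commit does not show how the calls were actually consolidated.

```python
import os
import shutil


def free_bytes_by_filesystem(paths):
    """Map st_dev -> free bytes, calling shutil.disk_usage once per filesystem.

    Hypothetical helper; every path must already exist.
    """
    free = {}
    for path in paths:
        dev = os.stat(path).st_dev
        if dev not in free:  # skip paths on a filesystem already queried
            free[dev] = shutil.disk_usage(path).free
    return free
```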
bloom1 force-pushed the ITEM-342-compute-size-before-collect branch from 3e8d937 to 903f3e1 on May 28, 2026 20:29
wazo-community-zuul (Contributor)

Build succeeded.
https://zuul.wazo.community/zuul/t/local/buildset/d5f6c01e454b49c4a5a58e3b432100ac

✔️ tox-linters SUCCESS in 2m 20s
✔️ debian-packaging-bookworm SUCCESS in 2m 46s
