ST014: NoLargeFiles

Overview

Property   Value
ID         ST014
Name       NoLargeFiles
Group      structure
Severity   WARNING

Description

Checks that the repository does not contain large files anywhere, and that the source directory (src/) does not contain binary files.

Large files and binary files can:

  • Bloat repository size and slow down cloning
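  • Cause merge conflicts that are difficult to resolve, since Git cannot merge binary content line by line
  • Make version control inefficient for files that change frequently, because every committed revision is kept in history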

What it checks

The check scans the repository for:

  1. Large files (>1MB by default): Any file exceeding the size threshold
  2. Binary files in src/: Files with binary extensions in the source directory, including:
    • Pickle files: .pkl, .pickle
    • HDF5 files: .h5, .hdf5
    • Binary data: .bin
    • Executables: .exe, .dll, .so, .dylib
    • Python bytecode and compiled extension modules: .pyc, .pyo, .pyd
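The two criteria above can be sketched as a per-file test. This is a minimal illustration, not pycmdcheck's code: the name `classify_file`, reading "1MB" as 1 MiB (1,048,576 bytes), the case-insensitive extension match, and interpreting "in src/" as "under the top-level src directory" are all assumptions.

```python
from pathlib import PurePosixPath

# Extensions the check lists as binary (.pyd is a compiled extension module).
BINARY_EXTENSIONS = frozenset({
    ".pkl", ".pickle",                # pickle
    ".h5", ".hdf5",                   # HDF5
    ".bin",                           # raw binary data
    ".exe", ".dll", ".so", ".dylib",  # executables and shared libraries
    ".pyc", ".pyo", ".pyd",           # bytecode and extension modules
})

# Assumption: "1MB" means 1 MiB; the real tool may use 1,000,000 bytes.
BYTES_PER_MB = 1024 * 1024


def classify_file(rel_path: str, size_bytes: int, max_file_size_mb: float = 1.0) -> list[str]:
    """Return the reasons (possibly none) a file would be flagged.

    rel_path is relative to the repository root and uses "/" separators.
    """
    path = PurePosixPath(rel_path)
    reasons = []
    if size_bytes > max_file_size_mb * BYTES_PER_MB:
        reasons.append("large")
    # Assumption: "in src/" means the first path component is "src".
    if path.parts[:1] == ("src",) and path.suffix.lower() in BINARY_EXTENSIONS:
        reasons.append("binary-in-src")
    return reasons
```

For example, `classify_file("src/pkg/model.pkl", 512)` returns `["binary-in-src"]`, while a 3 MiB `data/big.csv` returns `["large"]`; a file can be flagged for both reasons at once.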

The check respects .gitignore patterns and always ignores common directories like .git, __pycache__, .tox, .nox, and .pytest_cache.
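The directory walk can be sketched with `os.walk`, pruning the always-ignored directories in place. The `.gitignore` handling here is deliberately simplified and hypothetical: it reads only the root `.gitignore` and matches plain glob patterns against the relative path or the file name (no `!` negation, no anchoring, no nested `.gitignore` files), so it is not a faithful implementation of Git's rules.

```python
import fnmatch
import os
from pathlib import Path

# Directories the check always skips.
ALWAYS_IGNORED_DIRS = {".git", "__pycache__", ".tox", ".nox", ".pytest_cache"}


def load_ignore_patterns(root: Path) -> list[str]:
    """Read patterns from root/.gitignore, dropping blanks and comments."""
    gitignore = root / ".gitignore"
    if not gitignore.is_file():
        return []
    patterns = []
    for line in gitignore.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # Simplification: leading/trailing slashes are dropped.
            patterns.append(line.strip("/"))
    return patterns


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Simplified match: a pattern matches the full relative path or its basename."""
    name = rel_path.rsplit("/", 1)[-1]
    return any(fnmatch.fnmatch(rel_path, p) or fnmatch.fnmatch(name, p) for p in patterns)


def iter_candidate_files(root: Path):
    """Yield (relative path, size in bytes) for each file the check would inspect."""
    patterns = load_ignore_patterns(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")
        prefix = "" if rel_dir == "." else rel_dir + "/"
        # Modify dirnames in place so os.walk never descends into ignored dirs.
        dirnames[:] = [
            d for d in dirnames
            if d not in ALWAYS_IGNORED_DIRS and not is_ignored(prefix + d, patterns)
        ]
        for filename in filenames:
            rel_path = prefix + filename
            if not is_ignored(rel_path, patterns):
                yield rel_path, os.path.getsize(os.path.join(dirpath, filename))
```

When exact Git semantics matter, `git ls-files --cached --others --exclude-standard` lists tracked files plus untracked files that are not ignored, which is a more reliable source of candidate paths than a hand-written matcher.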

How to fix

For large files

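  1. Stop tracking and add to .gitignore: If the file shouldn't be in version control. git rm --cached removes the file from the index but keeps your local copy; earlier commits still contain it.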

    git rm --cached large_file.dat
    echo "large_file.dat" >> .gitignore
    git add .gitignore
    git commit -m "Stop tracking large_file.dat"
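  2. Use Git LFS: For large files that need version control. git lfs track only affects files staged after it runs; files already committed stay in history as regular Git objects unless you rewrite history with git lfs migrate import.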

    git lfs install
    git lfs track "*.dat"
    git add .gitattributes
  3. Store elsewhere: Keep large datasets in external storage (for example S3) and download them when needed instead of committing them
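For the "store elsewhere" option, a common pattern is to download the dataset on first use and verify it against a known checksum. The sketch below uses only the standard library; `fetch_dataset`, the URL, and the digest are placeholders, and it assumes the SHA-256 of the published file is known in advance.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def _sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_dataset(url: str, dest: Path, sha256: str) -> Path:
    """Download url to dest unless an intact copy already exists; return dest."""
    if dest.is_file() and _sha256_of(dest) == sha256:
        return dest  # cached copy is present and matches the expected digest
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.parent / (dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    if _sha256_of(tmp) != sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {url}")
    tmp.replace(dest)  # move into place only after verification succeeds
    return dest
```

A call such as `fetch_dataset("https://example.com/data.csv", Path(".cache/data.csv"), "<sha256>")` downloads once and reuses the cached copy afterwards; adding the cache directory to .gitignore keeps it out of the repository.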

For binary files in src/

  1. Move test fixtures out of src/ into a tests/fixtures/ directory
  2. Ship runtime data as package resources and load it with importlib.resources (or the older pkg_resources API)
  3. Generate binary files during the build or install step instead of committing them
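If the runtime data can be represented as text (JSON here, which is an assumption about your data), it avoids the binary extensions listed earlier and can still be loaded from the installed package with `importlib.resources.files()` (Python 3.9+). The helper `load_json_resource` and the names `mypkg.data` and `settings.json` are hypothetical.

```python
import json
from importlib import resources


def load_json_resource(package: str, name: str) -> dict:
    """Load a JSON data file shipped inside `package` (e.g. "mypkg.data")."""
    # files() returns a Traversable for the package, independent of install location.
    text = resources.files(package).joinpath(name).read_text(encoding="utf-8")
    return json.loads(text)
```

Usage: `settings = load_json_resource("mypkg.data", "settings.json")`. Because the lookup goes through the package rather than a path relative to the current directory, it keeps working after the package is installed.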

Configuration

Skip this check

[tool.pycmdcheck]
skip = ["ST014"]

CLI

pycmdcheck --skip ST014

Configure file size threshold

[tool.pycmdcheck.options]
max_file_size_mb = 2.0  # Change threshold to 2MB

CLI with custom threshold

pycmdcheck --check-config ST014.max_file_size_mb=2.0
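The option name max_file_size_mb and the ST014.max_file_size_mb=2.0 syntax come from the examples above; everything else here is assumed. The sketch parses CHECK.option=value strings, lets the command line override [tool.pycmdcheck.options], and converts megabytes to bytes treating 1 MB as 1,048,576 bytes. pycmdcheck's actual precedence and unit may differ, and both helper names are hypothetical.

```python
# Assumption: 1 MB = 1024 * 1024 bytes; the real tool may use 1,000,000.
BYTES_PER_MB = 1024 * 1024
DEFAULT_MAX_FILE_SIZE_MB = 1.0


def parse_check_config(values: list[str]) -> dict[str, dict[str, str]]:
    """Parse "CHECK.option=value" strings into {"CHECK": {"option": "value"}}."""
    parsed: dict[str, dict[str, str]] = {}
    for item in values:
        key, sep, value = item.partition("=")
        check_id, dot, option = key.partition(".")
        if not sep or not dot or not check_id or not option:
            raise ValueError(f"expected CHECK.option=value, got {item!r}")
        parsed.setdefault(check_id, {})[option] = value
    return parsed


def max_file_size_bytes(file_options: dict, cli_values: list[str]) -> int:
    """Resolve the ST014 threshold: CLI override, then config file, then default."""
    mb = float(file_options.get("max_file_size_mb", DEFAULT_MAX_FILE_SIZE_MB))
    cli = parse_check_config(cli_values).get("ST014", {})
    if "max_file_size_mb" in cli:
        mb = float(cli["max_file_size_mb"])
    return int(mb * BYTES_PER_MB)
```

Under these assumptions, max_file_size_mb = 2.0 yields a threshold of 2,097,152 bytes, and passing ST014.max_file_size_mb=0.5 on the command line lowers it to 524,288 bytes.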