ST014: NoLargeFiles
Overview
| Property | Value |
|---|---|
| ID | ST014 |
| Name | NoLargeFiles |
| Group | structure |
| Severity | WARNING |
Description
Checks that the repository does not contain large files or binary files in the source directory.
Large files and binary files can:
- Bloat repository size and slow down cloning
- Cause merge conflicts that are difficult to resolve
- Make version control inefficient for files that change frequently
What it checks
The check scans the repository for:
- Large files (>1MB by default): Any file exceeding the size threshold
- Binary files in src/: Files with binary extensions in the source directory, including:
  - Pickle files: .pkl, .pickle
  - HDF5 files: .h5, .hdf5
  - Binary data: .bin
  - Executables: .exe, .dll, .so, .dylib
  - Python bytecode: .pyc, .pyo, .pyd
The check respects .gitignore patterns and always ignores common directories like .git, __pycache__, .tox, .nox, and .pytest_cache.
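The scan described above can be approximated with a short script. This is a minimal sketch, not pycmdcheck's actual implementation: the function name `find_offending_files`, the `src_dir` parameter, and the choice of 1 MB = 1024 * 1024 bytes are assumptions made for illustration, and .gitignore handling is omitted for brevity.

```python
import os
from pathlib import Path

# Extensions and directories taken from the lists in this document.
BINARY_EXTENSIONS = {
    ".pkl", ".pickle", ".h5", ".hdf5", ".bin",
    ".exe", ".dll", ".so", ".dylib", ".pyc", ".pyo", ".pyd",
}
IGNORED_DIRS = {".git", "__pycache__", ".tox", ".nox", ".pytest_cache"}


def find_offending_files(root, max_file_size_mb=1.0, src_dir="src"):
    """Return (large_files, binary_files) as sorted relative POSIX paths.

    Hypothetical helper: it mirrors the documented behaviour but does not
    read .gitignore patterns.
    """
    root = Path(root)
    # Assumption: one megabyte is counted as 1024 * 1024 bytes.
    max_bytes = int(max_file_size_mb * 1024 * 1024)
    large, binary = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk skips them entirely.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root)
            # lstat() does not follow symlinks, so broken links do not raise.
            if path.lstat().st_size > max_bytes:
                large.append(rel.as_posix())
            # Binary extensions are only flagged under the source directory.
            if rel.parts[0] == src_dir and path.suffix.lower() in BINARY_EXTENSIONS:
                binary.append(rel.as_posix())
    return sorted(large), sorted(binary)
```

Calling `find_offending_files(".")` from the repository root gives a rough preview of which files are likely to trigger the warning before running the real check.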
How to fix
For large files
- Remove and add to .gitignore: If the file shouldn’t be in version control
    git rm --cached large_file.dat
    echo "large_file.dat" >> .gitignore
- Use Git LFS: For large files that need version control
    git lfs install
    git lfs track "*.dat"
    git add .gitattributes
- Store elsewhere: Use external storage (S3, package data downloads) for large datasets
For binary files in src/
- Move test fixtures to a tests/fixtures/ directory
- Use package resources or importlib.resources for runtime data
- Generate binary files during build/install instead of storing them
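For the importlib.resources option, a minimal sketch of loading a bundled data file at runtime might look like the following. The helper name `load_runtime_data` and the package and file names in the usage line are hypothetical. `importlib.resources.files` requires Python 3.9 or later.

```python
from importlib import resources


def load_runtime_data(package, filename):
    """Read a data file bundled inside an importable package as bytes.

    Hypothetical helper for illustration; `package` is a dotted package
    name and `filename` is a file that ships inside that package.
    """
    # files() returns a Traversable, so this does not rely on __file__ paths.
    return resources.files(package).joinpath(filename).read_bytes()
```

A call such as `load_runtime_data("mypackage.data", "defaults.json")` (names assumed) reads the file through the import system instead of building a path by hand.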
Configuration
Skip this check
[tool.pycmdcheck]
skip = ["ST014"]
CLI
pycmdcheck --skip ST014
Configure file size threshold
[tool.pycmdcheck.options]
max_file_size_mb = 2.0 # Change threshold to 2MB
CLI with custom threshold
pycmdcheck --check-config ST014.max_file_size_mb=2.0