Smart Cleanup
Claudux automatically identifies and removes obsolete documentation using semantic analysis rather than simple pattern matching.
The Documentation Rot Problem
Traditional approaches fail:
- Manual cleanup is time-consuming and error-prone
- Regex-based tools are either too aggressive or miss subtle issues
- Stale content accumulates, confusing users
Claudux solution: AI-powered semantic analysis identifies truly obsolete content with high confidence before removal.
How Smart Cleanup Works
🧠 Semantic Analysis
Claudux analyzes documentation content against your current codebase:
Code cross-referencing:
- Function and class names mentioned in docs
- API endpoints documented vs. currently implemented
- Configuration options described vs. actually supported
- Dependencies referenced vs. currently installed
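Code cross-referencing can be illustrated with a deliberately crude textual check. Claudux's own analysis is AI-driven, and the shell sketch below does not reproduce it. The sketch pulls every name written as name() out of a Markdown page. It then reports the names that have no matching "function name", "def name" or shell-style "name() {" definition under a source directory. The helper names (doc_function_refs, is_defined_in_source, missing_refs) are hypothetical. The patterns miss arrow functions, class methods and most other definition styles.

```bash
# List the distinct names a doc mentions in the form name().
doc_function_refs() {
  grep -oE '[A-Za-z_][A-Za-z0-9_]*\(\)' "$1" | sed 's/()$//' | sort -u
}

# Succeed if name $1 looks defined somewhere under directory $2:
# "function name", "def name", or shell-style "name() {".
is_defined_in_source() {
  grep -rqE "(function|def)[[:space:]]+$1[^A-Za-z0-9_]|(^|[^A-Za-z0-9_])$1[[:space:]]*\(\)[[:space:]]*\{" "$2"
}

# Print every name referenced in doc $1 that has no definition under $2.
missing_refs() {
  doc_function_refs "$1" | while IFS= read -r name; do
    is_defined_in_source "$name" "$2" || printf '%s\n' "$name"
  done
}
```

For example, missing_refs docs/guide/old-setup.md src would print setupV1 if no definition of setupV1 remains under src/.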
Context understanding:
- Distinguishes between deprecated and removed features
- Identifies redirect targets for moved content
- Preserves historical information that's still valuable
- Maintains links between related concepts
Confidence Scoring
Cleanup decisions use confidence thresholds:
🧹 Cleanup Analysis:
docs/api/legacy-auth.md
  References deleted AuthService class (95% confidence)
  Documents removed /auth/token endpoint (98% confidence)
  ✅ REMOVE: High confidence obsolete content
docs/guide/old-setup.md
  ⚠️ References deprecated setupV1() (75% confidence)
  ℹ️ KEEP: May still be relevant for migration users
Threshold policy:
- ≥95% confidence: Automatic removal
- 75-94% confidence: Flag for manual review
- <75% confidence: Preserve content
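The threshold policy is a simple banding of the confidence score. The sketch below maps a score to its band. It assumes the score arrives as an integer percentage, and cleanup_action is a hypothetical name:

```bash
# Map an integer confidence percentage (0-100) to the action named in the
# threshold policy: remove, review or keep.
cleanup_action() {
  if [ "$1" -ge 95 ]; then
    echo remove      # >=95%: automatic removal
  elif [ "$1" -ge 75 ]; then
    echo review      # 75-94%: flag for manual review
  else
    echo keep        # <75%: preserve content
  fi
}

for c in 98 95 94 75 74; do
  printf '%s%% -> %s\n' "$c" "$(cleanup_action "$c")"
done
```

Each band includes its lower bound, so a score of exactly 95% is removed and a score of exactly 75% is flagged for review.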
What Gets Cleaned Up
🗑️ Automatically Removed
Dead code references:
## Using the authenticate() method (REMOVED)
The authenticate() method was removed in v2.0...
Broken API documentation:
### POST /api/v1/legacy (404)
This endpoint returns user session data...
Obsolete configuration:
# This section references deleted config.legacy.yml
legacy:
  enabled: true
⚠️ Preserved Content
Migration documentation:
## Migrating from v1 to v2
While v1 APIs are deprecated, they remain supported...
Troubleshooting guides:
## Common Issues with Legacy Setups
If you're still using the old configuration...
Historical context:
## Design Decision: Why We Moved Away from X
In v1, we used X for Y, but discovered...
Cleanup Process
Detection Phase
- Content inventory: Catalogs all documentation files
- Reference extraction: Finds code/API references in each doc
- Cross-validation: Checks references against current codebase
- Confidence calculation: Scores likelihood of obsolescence
🧹 Removal Phase
- High-confidence removal: Deletes files with ≥95% obsolescence confidence
- Partial cleanup: Removes obsolete sections within otherwise valid files
- Link updates: Updates internal links affected by removals
- Navigation cleanup: Removes dead links from sidebar/nav
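Partial cleanup amounts to deleting one section of a page while leaving the rest intact. The sketch below assumes ATX-style headings (## Title) and uses a hypothetical remove_section helper. The removed section runs from the exact heading line given to just before the next heading of the same or higher level, so its subsections go with it. The helper prints the result instead of editing in place. It does not understand fenced code blocks, so a # comment inside a fence is treated as a heading.

```bash
# Print Markdown file $1 with the section headed exactly "$2" removed
# (hypothetical helper; the file itself is not modified).
remove_section() {
  awk -v target="$2" '
    # Heading level of a line: "## Title" gives 2, a non-heading gives 0.
    function level(line) {
      if (match(line, /^#+ /)) return RLENGTH - 1
      return 0
    }
    {
      lvl = level($0)
      # A heading at the same or a higher level ends the skipped section.
      if (skipping && lvl > 0 && lvl <= skip_level) skipping = 0
      if (!skipping && $0 == target) { skipping = 1; skip_level = lvl; next }
      if (!skipping) print
    }
  ' "$1"
}
```

For example, remove_section docs/config.md "## Legacy configuration" prints docs/config.md without that section and its subsections. The heading text here is illustrative.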
Validation Phase
- Link integrity: Ensures no broken internal links remain
- Content gaps: Identifies missing documentation after cleanup
- Structure validation: Confirms navigation hierarchy is intact
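A link-integrity pass can be approximated by resolving every relative Markdown link against the linking file's directory. The sketch below is hypothetical: check_links and list_broken_links are not claudux commands. It assumes links spell out the .md file. VitePress-style extensionless links and site-absolute routes such as /guide/setup are skipped rather than resolved.

```bash
# Print "file: broken link -> link" for each relative Markdown link under
# directory $1 whose target file does not exist.
list_broken_links() {
  find "$1" -type f -name '*.md' | while IFS= read -r file; do
    # Pull out inline link targets: ](target) becomes target.
    grep -oE '[]][(][^)[:space:]]+[)]' "$file" | sed -E 's/^[]][(]//; s/[)]$//' |
      while IFS= read -r link; do
        target=${link%%#*}                          # drop any #anchor
        case $target in
          ''|/*|http://*|https://*|mailto:*) continue ;;  # not a relative file
        esac
        [ -e "$(dirname "$file")/$target" ] ||
          printf '%s: broken link -> %s\n' "$file" "$link"
      done
  done
}

# Report broken links and return 1 if any were found, 0 otherwise.
check_links() {
  local broken
  broken=$(list_broken_links "$1")
  [ -z "$broken" ] && return 0
  printf '%s\n' "$broken"
  return 1
}
```

Running check_links docs after a cleanup pass lists any link that still points at a removed page. The non-zero status makes it usable as a CI gate.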
Example Cleanup Session
🧹 Smart cleanup starting...
Analysis Results:
• 47 documentation files scanned
• 156 code references validated
• 3 obsolete files identified
• 12 outdated sections found
🗑️ Removing obsolete content:
  docs/api/v1-authentication.md (96% confidence - API removed)
  docs/guide/old-deployment.md (98% confidence - process changed)
  Section in docs/config.md about legacy.yml (94% confidence)
Updating affected links:
  Updated 8 internal references
  Removed 3 navigation entries
✅ Cleanup complete! 3 files removed, 8 files updated
Manual Override Options
Protected Content
Use skip markers to prevent cleanup of specific content:
<!-- skip -->
## Legacy API Reference (Keep for Migration Users)
This documents the old v1 API that some users still rely on...
<!-- /skip -->
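It helps to look at the complement of a skip block: the lines a cleanup pass is still allowed to consider. The awk sketch below prints that complement. It assumes each marker sits on its own line and that blocks are not nested. unprotected_content is a hypothetical name, not part of claudux.

```bash
# Print the lines of Markdown file $1 that lie outside
# <!-- skip --> ... <!-- /skip --> blocks (hypothetical helper).
unprotected_content() {
  awk '
    /^[ \t]*<!-- skip -->[ \t]*$/  { inside = 1; next }   # block opens
    /^[ \t]*<!-- \/skip -->[ \t]*$/ { inside = 0; next }  # block closes
    !inside { print }                                     # unprotected line
  ' "$1"
}
```

Applied to the example above, everything between the two markers, including the Legacy API Reference heading, is left out of the output.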
Confidence Threshold Control
Adjust cleanup aggressiveness by describing it in the update message:
# Conservative cleanup (≥98% confidence)
claudux update -m "Conservative cleanup - only remove obviously obsolete content"
# Aggressive cleanup (≥85% confidence)
claudux update -m "Aggressive cleanup - remove likely obsolete content"
Dry Run Analysis
Preview what would be cleaned up without making changes:
claudux update -m "Analyze obsolete content but don't remove anything"
Integration with Generation
Smart cleanup runs automatically during claudux update:
- Pre-generation cleanup: Remove high-confidence obsolete files
- Content generation: Create/update documentation
- Post-generation validation: Final link check and structure validation
This ensures the generation process works with clean, accurate existing content.
Safety Features
🛡️ Protected Paths
Cleanup never touches protected directories:
- notes/, private/
- .git/, node_modules/
- Any path matching protection patterns in lib/content-protection.sh
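A protected-path check can be as simple as a handful of glob patterns. The sketch below hard-codes the directories listed above. It assumes they match at any depth, so docs/private/ is protected too. The actual patterns in lib/content-protection.sh may differ, and is_protected_path is a hypothetical name.

```bash
# Succeed (status 0) if path $1 lies under a protected directory.
# Patterns mirror the list above and match at any depth (an assumption).
is_protected_path() {
  case $1 in
    notes/*|*/notes/*|private/*|*/private/*) return 0 ;;
    .git/*|*/.git/*|node_modules/*|*/node_modules/*) return 0 ;;
    *) return 1 ;;
  esac
}

for p in docs/guide/setup.md notes/todo.md docs/private/draft.md; do
  if is_protected_path "$p"; then
    echo "protected: $p"
  else
    echo "eligible:  $p"
  fi
done
```

Checking the path before any other analysis keeps protected files out of scoring entirely, rather than relying on a low confidence score to spare them.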
Change Logging
All cleanup actions are logged:
Cleanup Summary:
🗑️ Removed: docs/api/legacy.md (confidence: 96%)
Updated: docs/guide/setup.md (removed legacy section)
Fixed: 5 broken internal links
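If you wrap claudux in your own scripts and want a persistent audit trail alongside the summary, appending one tab-separated line per action is enough. CLEANUP_LOG, log_action and the line format are all hypothetical; this is not the format claudux itself writes.

```bash
# Hypothetical audit log: one tab-separated line per cleanup action.
# Columns: UTC timestamp, action, path, free-form detail.
CLEANUP_LOG=${CLEANUP_LOG:-cleanup.log}

log_action() {
  printf '%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3" >> "$CLEANUP_LOG"
}
```

For example, log_action removed docs/api/legacy.md "confidence: 96%" appends a single line. Tab-separated columns keep the log easy to filter with cut or awk later.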
Reversible Actions
Since claudux works with git repositories:
- All changes are trackable via git history
- Easy to revert before the changes are committed: git checkout -- docs/
- Commit-by-commit review of cleanup decisions
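Cleanup results are ordinary working-tree changes, so standard git commands can review and undo them. The demo below runs in a throwaway repository, so it is safe to execute. It commits a doc, deletes it as a cleanup pass might, shows the deletion with git status, and restores it with git checkout. In a real project you would run git status, git diff -- docs/ and git checkout -- docs/ directly from the repository root. Note that git checkout -- docs/ restores tracked files only. Any new files added during generation stay in place as untracked files.

```bash
# Demo in a throwaway repository: commit a doc, delete it the way a cleanup
# pass might, review the change, then undo it.
demo_revert() {
  local repo status
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q
    mkdir -p docs/api
    echo '# Legacy auth' > docs/api/legacy-auth.md
    git add docs
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -qm 'initial docs'

    rm docs/api/legacy-auth.md          # stands in for a cleanup removal
    git status --short -- docs/         # lists the file as deleted (" D ...")
    git checkout -- docs/               # restore uncommitted changes under docs/
    test -f docs/api/legacy-auth.md && echo restored
  )
  status=$?
  rm -rf "$repo"
  return "$status"
}

demo_revert
```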
Best Practices
Regular cleanup:
# Weekly/monthly cleanup
claudux update # Includes cleanup automatically
Before major releases:
claudux update -m "Thorough cleanup before v2.0 release"
Migration periods:
claudux update -m "Clean up v1 docs but preserve migration guides"
The smart cleanup feature keeps your documentation lean, accurate, and trustworthy with minimal manual maintenance overhead.