# Testing and Validating Skills

This guide helps you validate skills before adding them to the repository or using them in production.

## Quick Validation Checklist

Run through this checklist before submitting a skill:

```
Metadata
[ ] SKILL.md exists
[ ] YAML frontmatter is valid
[ ] Name ≤ 64 characters
[ ] Description ≤ 1024 characters
[ ] Description includes trigger scenarios

Content Quality
[ ] "When to Use This Skill" section present
[ ] At least one concrete example
[ ] Examples are runnable/testable
[ ] File references are accurate
[ ] No sensitive data hardcoded

Triggering Tests
[ ] Triggers on target scenarios
[ ] Doesn't trigger on unrelated scenarios
[ ] No conflicts with similar skills

Security
[ ] No credentials or API keys
[ ] No personal information
[ ] Safe file system access only
[ ] External dependencies verified
```

## Detailed Testing Process

### 1. Metadata Validation

#### Test YAML Parsing

Try parsing the frontmatter:

```bash
# Extract the YAML frontmatter (everything between the first pair of --- lines)
sed -n '/^---$/,/^---$/p' SKILL.md
```

Verify:

- YAML is valid (no syntax errors)
- Both `name` and `description` are present
- Values are within character limits

#### Character Limits

```bash
# Count characters in name (must be ≤ 64);
# tr strips the trailing newline so wc doesn't count it
grep "^name:" SKILL.md | sed 's/name: //' | tr -d '\n' | wc -m

# Count characters in description (must be ≤ 1024)
grep "^description:" SKILL.md | sed 's/description: //' | tr -d '\n' | wc -m
```

### 2. Content Quality Testing

#### Check Required Sections

```bash
# Verify "When to Use This Skill" section exists
grep -i "when to use" SKILL.md

# Verify examples exist
grep -i "example" SKILL.md
```

#### Test File References

If the skill references other files, verify they exist:

```bash
# Find markdown links
grep -o '\[.*\]([^)]*\.md)' SKILL.md

# Check that each referenced file exists
# (manually verify each one, or see the link checker
# under "Documentation Testing" below)
```

#### Validate Examples

For each example in the skill:

1. Try running the code/commands
2. Verify output matches expectations
3. Check for edge cases
4. Ensure examples are complete (no placeholders)

### 3. Trigger Testing

This is the most important validation step.

#### Create Test Scenarios

**Positive Tests (SHOULD trigger)**

Create a list of scenarios where the skill should activate:

```markdown
Test Scenario 1: [Describe task that should trigger]
Expected: Skill activates
Actual: [Test result]

Test Scenario 2: [Another trigger case]
Expected: Skill activates
Actual: [Test result]
```

**Negative Tests (SHOULD NOT trigger)**

Create scenarios where the skill should NOT activate:

```markdown
Test Scenario 3: [Similar but different task]
Expected: Skill does NOT activate
Actual: [Test result]

Test Scenario 4: [Unrelated task]
Expected: Skill does NOT activate
Actual: [Test result]
```

#### Example Testing Session

For a "Python Testing with pytest" skill:

**Should Trigger:**

- "Help me write tests for my Python function"
- "How do I use pytest fixtures?"
- "Create unit tests for this class"

**Should NOT Trigger:**

- "Help me test my JavaScript code" (different language)
- "Debug my pytest installation" (installation, not testing)
- "Explain what unit testing is" (concept, not implementation)

#### Run Tests with Claude

1. Load the skill
2. Ask Claude each test question
3. Observe whether the skill triggers (check the response for skill context)
4. Document results (see the sketch below)
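To keep results consistent across skills and test runs, you can generate the report skeleton from a plain scenarios file. The following is a minimal sketch, assuming a `scenarios.txt` you maintain yourself with one tab-separated `expected<TAB>prompt` pair per line; neither the file nor its format is part of any skill tooling.

```bash
#!/bin/bash
# generate-trigger-report.sh -- minimal sketch; scenarios.txt and its
# tab-separated "expected<TAB>prompt" layout are assumptions, not a standard.
# Example line:
#   activates<TAB>Help me write tests for my Python function
echo "## Trigger Test Results"
n=0
while IFS=$'\t' read -r expected prompt; do
  n=$((n + 1))
  echo ""
  echo "### Scenario $n"
  echo "- Prompt: $prompt"
  echo "- Expected: skill $expected"   # "activates" or "does NOT activate"
  echo "- Actual: [record after asking Claude]"
done < scenarios.txt
```

Run it once per skill (`./generate-trigger-report.sh > report.md`) and fill in each `Actual:` line by hand after asking Claude the prompt; keeping a human judging whether the skill context actually appeared is the point of the exercise.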
### 4. Token Efficiency Testing

#### Measure Content Size

```bash
# Rough estimate: tokens ≈ words × 1.3
wc -w SKILL.md

# Alternative rough estimate: tokens ≈ characters ÷ 4
wc -c SKILL.md

# Or compute the character-based estimate directly
# (for exact numbers, use a proper token counter)
awk '{ chars += length($0) + 1 } END { printf "~%d tokens\n", chars / 4 }' SKILL.md
```

#### Evaluate Split Points

Ask yourself:

- Is content loaded only when needed?
- Could mutually exclusive sections be split?
- Are examples concise but complete?
- Is reference material in separate files?

Target sizes:

- **SKILL.md**: Under 3000 tokens (core workflows)
- **Additional files**: Load only when referenced
- **Total metadata**: ~100 tokens

### 5. Security Validation

#### Automated Checks

```bash
# Check for potential secrets
grep -iE "(password|api[_-]?key|secret|token|credential)" SKILL.md

# Check for hardcoded paths
grep -E "(/Users/|/home/|C:\\\\)" SKILL.md

# Check for sensitive file extensions
grep -E "\.(key|pem|cert|p12|pfx)( |$)" SKILL.md
```

#### Manual Review

Review each file for:

- [ ] No credentials in examples
- [ ] No personal information
- [ ] File paths are generic/relative
- [ ] Network access is documented
- [ ] External dependencies are from trusted sources
- [ ] Scripts don't make unsafe system changes

### 6. Cross-Skill Conflict Testing

If you have multiple skills installed:

1. **Similar domain overlap**: Test that the specific skill triggers, not a generic one
2. **Keyword conflicts**: Check whether multiple skills trigger on the same query
3. **Description clarity**: Ensure each skill's domain is distinct

Example conflict to avoid:

- "Python Helper" (too generic) vs "Python Testing with pytest" (specific)
- Both trigger on "Help with Python" → fix by making the descriptions more specific

## Testing Workflows

### Quick Test (5 minutes)

For minor updates or simple skills:

1. ✓ Validate metadata (YAML, character limits)
2. ✓ Check one example works
3. ✓ Test one positive trigger
4. ✓ Test one negative trigger
5. ✓ Scan for secrets

### Standard Test (15 minutes)

For new skills or significant changes:

1. ✓ Complete metadata validation
2. ✓ Test all examples
3. ✓ Run 3-5 trigger tests (positive + negative)
4. ✓ Check token efficiency
5. ✓ Full security review
6. ✓ Verify file references

### Comprehensive Test (30+ minutes)

For complex skills or pre-release:

1. ✓ All standard tests
2. ✓ Test with different Claude models
3. ✓ Test conflict scenarios with other skills
4. ✓ Have someone else try the skill
5. ✓ Test edge cases in examples
6. ✓ Review progressive disclosure strategy
7. ✓ Load test (simulate typical usage)

## Common Issues and Fixes

### Skill Doesn't Trigger

**Symptoms**: Claude doesn't load the skill context when expected

**Diagnose**:

1. Is the description too vague?
2. Is the description missing trigger keywords?
3. Is the name too generic?

**Fix**:

```yaml
# Before
description: Python development helpers

# After
description: Create Python projects using Hatch and Hatchling for dependency management. Use when initializing new Python packages or configuring build systems.
```

### Skill Triggers Too Often

**Symptoms**: Skill loads for unrelated queries

**Diagnose**:

1. Is the description too broad?
2. Are the keywords too common?

**Fix**:

```yaml
# Add specificity and exclusions
description: Debug Swift applications using LLDB for crashes, memory issues, and runtime errors. Use when investigating Swift bugs or analyzing app behavior. NOT for general Swift coding or learning.
```

### Examples Don't Work

**Symptoms**: Users can't reproduce examples

**Diagnose**:

1. Missing prerequisites?
2. Placeholders not explained?
3. Environment-specific code?

**Fix**:

- Add a prerequisites section
- Make examples self-contained
- Use generic paths and values
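A quick heuristic scan can catch the most common culprits before users hit them. The grep patterns below are assumptions about typical placeholder and machine-specific conventions, not a definitive list; tune them to your own examples.

```bash
# Heuristic scan for incomplete or environment-specific examples.
# The patterns are assumptions about common conventions; adjust as needed.
grep -nE '(TODO|FIXME|<your[-_]|YOUR_|REPLACE[_-]?ME)' SKILL.md
grep -nE '(/Users/[A-Za-z]+|/home/[A-Za-z]+|C:\\Users)' SKILL.md
```

A hit isn't automatically a failure (a clearly explained placeholder is fine), but every match is worth a second look.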
### High Token Usage

**Symptoms**: Skill loads too much content

**Diagnose**:

1. Too much in SKILL.md?
2. No progressive disclosure?
3. Verbose examples?

**Fix**:

- Split reference material into separate files
- Link to external resources
- Condense examples
- Move advanced content to on-demand files

## Automated Testing (Advanced)

For repositories with many skills, consider automation:

### Validate All Skills

```bash
#!/bin/bash
# validate-skills.sh

for skill_dir in */; do
  if [ -f "$skill_dir/SKILL.md" ]; then
    echo "Validating $skill_dir..."

    # Check frontmatter exists
    if ! grep -q "^---$" "$skill_dir/SKILL.md"; then
      echo "❌ Missing YAML frontmatter"
    fi

    # Check name length
    name=$(grep "^name:" "$skill_dir/SKILL.md" | sed 's/name: //')
    if [ ${#name} -gt 64 ]; then
      echo "❌ Name too long: ${#name} chars"
    fi

    # Check for secrets
    if grep -qiE "(password|api[_-]?key|secret)" "$skill_dir/SKILL.md"; then
      echo "⚠️ Potential secrets found"
    fi

    echo "✓ $skill_dir validated"
  fi
done
```

### CI/CD Integration

Add to GitHub Actions or similar:

```yaml
name: Validate Skills
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run validation
        run: |
          chmod +x validate-skills.sh
          ./validate-skills.sh
```

## Documentation Testing

Ensure documentation is accurate:

1. **Links work**: All markdown links resolve
2. **Paths are correct**: File references are accurate
3. **Examples are current**: Code samples match latest versions
4. **Formatting is consistent**: Markdown renders correctly

```bash
# Check for broken internal links
# (assumes link targets are relative to the current directory)
grep -rhoE '\]\([^)]+\.md\)' --include='*.md' . |
  sed 's/^](\(.*\))$/\1/' | sort -u |
  while read -r target; do
    [ -e "$target" ] || echo "Broken link: $target"
  done
```

## User Acceptance Testing

The ultimate test is real usage:

1. **Give the skill to others**: Have colleagues test it
2. **Monitor usage**: See when it triggers in practice
3. **Gather feedback**: Ask users about clarity and usefulness
4. **Iterate**: Refine based on real-world usage

## Testing Checklist Template

Copy this for each skill you test:

```markdown
# Testing Report: [Skill Name]

Date: [YYYY-MM-DD]
Tester: [Name]

## Metadata
- [ ] YAML valid
- [ ] Name ≤ 64 chars
- [ ] Description ≤ 1024 chars
- [ ] Trigger scenarios in description

## Content
- [ ] "When to Use" section present
- [ ] Examples runnable
- [ ] File references accurate
- [ ] No secrets

## Triggering
Positive tests:
1. [Scenario] - Result: [ ] Pass [ ] Fail
2. [Scenario] - Result: [ ] Pass [ ] Fail

Negative tests:
1. [Scenario] - Result: [ ] Pass [ ] Fail
2. [Scenario] - Result: [ ] Pass [ ] Fail

## Security
- [ ] No credentials
- [ ] No personal data
- [ ] Safe file access
- [ ] Dependencies verified

## Overall
- [ ] Ready for production
- [ ] Needs revision
- [ ] Rejected

Notes: [Any additional observations]
```

## Resources

- [claude-skills/SKILL.md](./claude-skills/SKILL.md) - Best practices guide
- [claude-skills/checklist.md](./claude-skills/checklist.md) - Quality checklist
- [CONTRIBUTING.md](./CONTRIBUTING.md) - Contribution guidelines

---

**Remember**: Testing isn't just about finding bugs; it's about ensuring your skill provides real value and triggers at the right time.