# Testing and Validating Skills

This guide helps you validate skills before adding them to the repository or using them in production.

## Quick Validation Checklist

Run through this checklist before submitting a skill:

```
Metadata
[ ] SKILL.md exists
[ ] YAML frontmatter is valid
[ ] Name ≤ 64 characters
[ ] Description ≤ 1024 characters
[ ] Description includes trigger scenarios

Content Quality
[ ] "When to Use This Skill" section present
[ ] At least one concrete example
[ ] Examples are runnable/testable
[ ] File references are accurate
[ ] No sensitive data hardcoded

Triggering Tests
[ ] Triggers on target scenarios
[ ] Doesn't trigger on unrelated scenarios
[ ] No conflicts with similar skills

Security
[ ] No credentials or API keys
[ ] No personal information
[ ] Safe file system access only
[ ] External dependencies verified
```

## Detailed Testing Process

### 1. Metadata Validation

#### Test YAML Parsing

Print the frontmatter and inspect it:

```bash
# Print the frontmatter (the lines between the first two "---" delimiters)
awk '/^---$/ { n++; next } n == 1' SKILL.md
```

Verify:
- YAML is valid (no syntax errors)
- Both `name` and `description` are present
- Values are within character limits
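
The presence check in the list above can be scripted. The following is a minimal sketch, assuming single-line `name` and `description` values; `check_frontmatter` is a hypothetical helper name, and the check is textual rather than a real YAML parse.

```bash
# check_frontmatter FILE
# Hypothetical helper: succeeds only if FILE opens with "---" and its
# frontmatter has non-empty "name" and "description" keys.
# This is a text-level check, not a full YAML parse.
check_frontmatter() {
    file=$1
    if [ "$(head -n 1 "$file")" != "---" ]; then
        echo "No frontmatter at the top of $file"
        return 1
    fi
    # Frontmatter is everything between the first two "---" lines.
    frontmatter=$(awk '/^---$/ { n++; next } n == 1' "$file")
    status=0
    for key in name description; do
        if ! printf '%s\n' "$frontmatter" | grep -q "^$key: *[^ ]"; then
            echo "Missing or empty '$key' in $file"
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_frontmatter SKILL.md` prints nothing and returns 0 when both keys are present.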

#### Character Limits

```bash
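# Count characters in name (must be ≤ 64)
# tr strips the trailing newline, which wc would otherwise count
grep -m 1 "^name:" SKILL.md | sed 's/^name: *//' | tr -d '\n' | wc -m

# Count characters in description (must be ≤ 1024)
grep -m 1 "^description:" SKILL.md | sed 's/^description: *//' | tr -d '\n' | wc -m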
```
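
To turn the counts into a pass/fail result, the length can be compared against the limit directly. This is a sketch under the assumption that each value sits on a single line; `field_length` and `within_limit` are hypothetical names, and `${#value}` counts characters in bash but may count bytes in other shells.

```bash
# field_length FILE KEY: print the length of a single-line frontmatter value.
# Command substitution drops the trailing newline, so it is not counted.
field_length() {
    value=$(grep -m 1 "^$2:" "$1" | sed "s/^$2: *//")
    printf '%s' "${#value}"
}

# within_limit FILE KEY MAX: succeed when the value is at most MAX long.
within_limit() {
    [ "$(field_length "$1" "$2")" -le "$3" ]
}
```

For example, `within_limit SKILL.md name 64 && within_limit SKILL.md description 1024` succeeds only when both values fit.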

### 2. Content Quality Testing

#### Check Required Sections

```bash
# Verify "When to Use This Skill" section exists
grep -i "when to use" SKILL.md

# Verify examples exist
grep -i "example" SKILL.md
```

#### Test File References

If the skill references other files, verify they exist:

```bash
# Find markdown links to .md files (one match per link, even with several links on a line)
grep -o '\[[^]]*]([^)]*\.md)' SKILL.md

# Check if referenced files exist
# (manually verify each one)
```
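
The existence check can also be automated. The sketch below assumes link targets are relative to the directory of the file that contains them and contain no spaces; `check_links` is a hypothetical helper name.

```bash
# check_links FILE: report relative .md link targets that do not exist.
# Assumes targets contain no whitespace.
check_links() {
    file=$1
    dir=$(dirname "$file")
    status=0
    # Reduce each "](path.md)" match to the bare target "path.md".
    for target in $(grep -o ']([^)]*\.md)' "$file" | sed 's/^](//; s/)$//'); do
        case $target in
            http://*|https://*) continue ;;   # skip external URLs
        esac
        if [ ! -e "$dir/$target" ]; then
            echo "Missing: $target (referenced in $file)"
            status=1
        fi
    done
    return "$status"
}
```

`check_links SKILL.md` returns non-zero and lists each missing target, so it can gate a commit or a CI step.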

#### Validate Examples

For each example in the skill:
1. Try running the code/commands
2. Verify output matches expectations
3. Check for edge cases
4. Ensure examples are complete (no placeholders)
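
Before running examples by hand, it can help to pull them out of the skill file. The sketch below extracts every fenced block tagged with a given language; `extract_blocks` is a hypothetical helper, and it assumes fences start at the beginning of a line with the tag written exactly as given.

```bash
# extract_blocks FILE LANG: print the contents of every fenced block tagged LANG.
extract_blocks() {
    awk -v fence='```'"$2" '
        $0 == fence      { inside = 1; next }   # opening fence with the tag
        inside && /^```/ { inside = 0; next }   # closing fence
        inside           { print }              # a line of example code
    ' "$1"
}
```

For shell examples, `extract_blocks SKILL.md bash | bash -n` checks syntax without executing anything; actually running the extracted code is still a manual decision.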

### 3. Trigger Testing

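This is the most important validation step: a skill that never loads, or loads for the wrong requests, delivers no value regardless of its content.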

#### Create Test Scenarios

**Positive Tests (SHOULD trigger)**

Create a list of scenarios where the skill should activate:

```markdown
Test Scenario 1: [Describe task that should trigger]
Expected: Skill activates
Actual: [Test result]

Test Scenario 2: [Another trigger case]
Expected: Skill activates
Actual: [Test result]
```

**Negative Tests (SHOULD NOT trigger)**

Create scenarios where the skill should NOT activate:

```markdown
Test Scenario 3: [Similar but different task]
Expected: Skill does NOT activate
Actual: [Test result]

Test Scenario 4: [Unrelated task]
Expected: Skill does NOT activate
Actual: [Test result]
```

#### Example Testing Session

For a "Python Testing with pytest" skill:

**Should Trigger:**
- "Help me write tests for my Python function"
- "How do I use pytest fixtures?"
- "Create unit tests for this class"

**Should NOT Trigger:**
- "Help me test my JavaScript code" (different language)
- "Debug my pytest installation" (installation, not testing)
- "Explain what unit testing is" (concept, not implementation)

#### Run Tests with Claude

1. Load the skill
2. Ask Claude each test question
3. Observe whether the skill triggers (check the response for skill context)
4. Document results
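
One way to document results is a small tab-separated log that a script can summarize. The format here is an assumption made for illustration (prompt, expected `yes`/`no`, actual `yes`/`no`), and `summarize_results` is a hypothetical helper; the observations themselves still come from manual sessions with Claude.

```bash
# summarize_results FILE
# FILE holds one test per line, tab-separated:
#   prompt <TAB> expected (yes|no) <TAB> actual (yes|no)
# Prints each mismatch and a summary; fails if any test failed.
summarize_results() {
    awk -F '\t' '
        $2 == $3 { pass++; next }
                 { fail++; print "FAIL: " $1 " (expected " $2 ", got " $3 ")" }
        END      { printf "%d passed, %d failed\n", pass, fail; exit (fail > 0) }
    ' "$1"
}
```

Each row records one prompt from the positive or negative list, so the same file doubles as the test plan.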

### 4. Token Efficiency Testing

#### Measure Content Size

```bash
# Rough estimate 1: tokens ≈ words × 1.3
wc -w SKILL.md

# Rough estimate 2: tokens ≈ characters ÷ 4
# (both are approximations; use a real tokenizer when you need an exact count)
wc -c SKILL.md
```

#### Evaluate Split Points

Ask yourself:
- Is content loaded only when needed?
- Could mutually exclusive sections be split?
- Are examples concise but complete?
- Is reference material in separate files?

Target sizes:
- **SKILL.md**: Under 3000 tokens (core workflows)
- **Additional files**: Load only when referenced
- **Total metadata**: ~100 tokens
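
The two rough estimates and the SKILL.md target can be combined into a single check. This is a sketch, not a tokenizer: it takes the larger of the two approximations to stay conservative, and `estimate_tokens` and `check_token_budget` are hypothetical names.

```bash
# estimate_tokens FILE: print the larger of the two rough estimates.
estimate_tokens() {
    words=$(wc -w < "$1" | tr -d ' ')
    chars=$(wc -c < "$1" | tr -d ' ')
    by_words=$(( words * 13 / 10 ))   # words × 1.3 in integer arithmetic
    by_chars=$(( chars / 4 ))         # characters ÷ 4
    if [ "$by_words" -gt "$by_chars" ]; then
        echo "$by_words"
    else
        echo "$by_chars"
    fi
}

# check_token_budget FILE [BUDGET]: fail when the estimate reaches BUDGET.
# The default of 3000 is the SKILL.md target from this guide.
check_token_budget() {
    tokens=$(estimate_tokens "$1")
    budget=${2:-3000}
    if [ "$tokens" -ge "$budget" ]; then
        echo "$1: ~$tokens tokens (budget: under $budget)"
        return 1
    fi
    echo "$1: ~$tokens tokens, within budget"
}
```

Run `check_token_budget SKILL.md` for the default target, or pass a second argument to test a different budget.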

### 5. Security Validation

#### Automated Checks

```bash
# Check for potential secrets
grep -iE "(password|api[_-]?key|secret|token|credential)" SKILL.md

# Check for hardcoded paths
grep -E "(/Users/|/home/|C:\\\\)" SKILL.md

# Check for sensitive file extensions
grep -E "\.(key|pem|cert|p12|pfx)( |$)" SKILL.md
```
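
The three checks can be wrapped in one function that reports line numbers and returns a failing status when anything matches. This is a sketch using the same patterns as above; `scan_skill` is a hypothetical name, and a match is only a prompt for manual review (the word "token", for instance, often appears in harmless prose).

```bash
# scan_skill FILE: print suspicious lines with line numbers; fail if any match.
scan_skill() {
    status=0
    # Possible secrets
    grep -niE '(password|api[_-]?key|secret|token|credential)' "$1" && status=1
    # Hardcoded user or drive paths
    grep -nE '(/Users/|/home/|C:\\)' "$1" && status=1
    # References to key or certificate files
    grep -nE '\.(key|pem|cert|p12|pfx)( |$)' "$1" && status=1
    return "$status"
}
```

Because it returns non-zero on any hit, `scan_skill SKILL.md` can be used in a loop over all files in a skill directory.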

#### Manual Review

Review each file for:
- [ ] No credentials in examples
- [ ] No personal information
- [ ] File paths are generic/relative
- [ ] Network access is documented
- [ ] External dependencies are from trusted sources
- [ ] Scripts don't make unsafe system changes

### 6. Cross-Skill Conflict Testing

If you have multiple skills installed:

1. **Similar domain overlap**: Test that specific skills trigger (not generic ones)
2. **Keyword conflicts**: Check whether multiple skills trigger on the same query
3. **Description clarity**: Ensure each skill's domain is distinct

Example conflicts to avoid:
- "Python Helper" (too generic) vs "Python Testing with pytest" (specific)
- Both trigger on "Help with Python" → Fix by making descriptions more specific
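
A quick way to spot likely keyword conflicts is to list the words two descriptions share. The sketch below is a crude heuristic, assuming single-line `description` values; `description_words` and `shared_keywords` are hypothetical names, and common words such as "with" or "when" will show up alongside meaningful overlaps.

```bash
# description_words FILE: print the unique lowercase words (4+ letters)
# from the first "description:" line, one per line, sorted.
description_words() {
    grep -m 1 '^description:' "$1" | sed 's/^description: *//' |
        tr 'A-Z' 'a-z' | tr -cs 'a-z' '\n' | awk 'length($0) >= 4' | sort -u
}

# shared_keywords FILE_A FILE_B: print the words both descriptions contain.
shared_keywords() {
    a=$(mktemp) && b=$(mktemp) || return 1
    description_words "$1" > "$a"
    description_words "$2" > "$b"
    comm -12 "$a" "$b"    # lines common to both sorted lists
    rm -f "$a" "$b"
}
```

A long list of shared domain words between two skills suggests their descriptions need to be made more distinct.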

## Testing Workflows

### Quick Test (5 minutes)

For minor updates or simple skills:

1. ✓ Validate metadata (YAML, character limits)
2. ✓ Check one example works
3. ✓ Test one positive trigger
4. ✓ Test one negative trigger
5. ✓ Scan for secrets

### Standard Test (15 minutes)

For new skills or significant changes:

1. ✓ Complete metadata validation
2. ✓ Test all examples
3. ✓ Run 3-5 trigger tests (positive + negative)
4. ✓ Check token efficiency
5. ✓ Full security review
6. ✓ Verify file references

### Comprehensive Test (30+ minutes)

For complex skills or pre-release:

1. ✓ All standard tests
2. ✓ Test with different Claude models
3. ✓ Test conflict scenarios with other skills
4. ✓ Have someone else try the skill
5. ✓ Test edge cases in examples
6. ✓ Review progressive disclosure strategy
7. ✓ Load test (simulate typical usage)

## Common Issues and Fixes

### Skill Doesn't Trigger

**Symptoms**: Claude doesn't load skill context when expected

**Diagnose**:
1. Description too vague?
2. Description missing trigger keywords?
3. Name too generic?

**Fix**:
```yaml
# Before
description: Python development helpers

# After
description: Create Python projects using Hatch and Hatchling for dependency management. Use when initializing new Python packages or configuring build systems.
```

### Skill Triggers Too Often

**Symptoms**: Skill loads for unrelated queries

**Diagnose**:
1. Description too broad?
2. Keywords too common?

**Fix**:
```yaml
# Add specificity and exclusions
description: Debug Swift applications using LLDB for crashes, memory issues, and runtime errors. Use when investigating Swift bugs or analyzing app behavior. NOT for general Swift coding or learning.
```

### Examples Don't Work

**Symptoms**: Users can't reproduce examples

**Diagnose**:
1. Missing prerequisites?
2. Placeholders not explained?
3. Environment-specific code?

**Fix**:
- Add prerequisites section
- Make examples self-contained
- Use generic paths and values

### High Token Usage

**Symptoms**: Skill loads too much content

**Diagnose**:
1. Too much in SKILL.md?
2. No progressive disclosure?
3. Verbose examples?

**Fix**:
- Split reference material to separate files
- Link to external resources
- Condense examples
- Move advanced content to on-demand files

## Automated Testing (Advanced)

For repositories with many skills, consider automation:

### Validate All Skills

```bash
#!/bin/bash
# validate-skills.sh: exits non-zero if any skill fails a check

failed=0
for skill_dir in */; do
    file="${skill_dir}SKILL.md"
    [ -f "$file" ] || continue
    echo "Validating $skill_dir..."
    ok=1

    # Check frontmatter exists (the file must open with a "---" line)
    if [ "$(head -n 1 "$file")" != "---" ]; then
        echo "❌ Missing YAML frontmatter"
        ok=0
    fi

    # Check name length (first "name:" line only)
    name=$(grep -m 1 "^name:" "$file" | sed 's/^name: *//')
    if [ ${#name} -gt 64 ]; then
        echo "❌ Name too long: ${#name} chars"
        ok=0
    fi

    # Check for secrets (warning only; matches need manual review)
    if grep -qiE "(password|api[_-]?key|secret)" "$file"; then
        echo "⚠️  Potential secrets found"
    fi

    # Report success only when every check passed
    if [ "$ok" -eq 1 ]; then
        echo "✓ $skill_dir validated"
    else
        failed=1
    fi
done
exit "$failed"
```

### CI/CD Integration

Add to GitHub Actions or similar:

```yaml
name: Validate Skills
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run validation
        run: |
          chmod +x validate-skills.sh
          ./validate-skills.sh
```

## Documentation Testing

Ensure documentation is accurate:

1. **Links work**: All markdown links resolve
2. **Paths are correct**: File references are accurate
3. **Examples are current**: Code samples match latest versions
4. **Formatting is consistent**: Markdown renders correctly

```bash
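# Check for broken internal links (targets are resolved relative to the linking file)
grep -ro --include='*.md' ']([^)]*\.md)' . | while IFS= read -r match; do
    file=${match%%:*}                             # file that contains the link
    target=${match#*:}                            # the match itself: "](path.md)"
    target=${target#']('}; target=${target%')'}   # strip the link syntax
    case $target in http://*|https://*) continue ;; esac   # skip external URLs
    [ -e "$(dirname "$file")/$target" ] || echo "Broken link in $file: $target"
done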
```

## User Acceptance Testing

The ultimate test is real usage:

1. **Give the skill to others**: Have colleagues test it
2. **Monitor usage**: See when it triggers in practice
3. **Gather feedback**: Ask users about clarity and usefulness
4. **Iterate**: Refine based on real-world usage

## Testing Checklist Template

Copy this for each skill you test:

```markdown
# Testing Report: [Skill Name]

Date: [YYYY-MM-DD]
Tester: [Name]

## Metadata
- [ ] YAML valid
- [ ] Name ≤ 64 chars
- [ ] Description ≤ 1024 chars
- [ ] Trigger scenarios in description

## Content
- [ ] "When to Use" section present
- [ ] Examples runnable
- [ ] File references accurate
- [ ] No secrets

## Triggering
Positive tests:
1. [Scenario] - Result: [ ] Pass [ ] Fail
2. [Scenario] - Result: [ ] Pass [ ] Fail

Negative tests:
1. [Scenario] - Result: [ ] Pass [ ] Fail
2. [Scenario] - Result: [ ] Pass [ ] Fail

## Security
- [ ] No credentials
- [ ] No personal data
- [ ] Safe file access
- [ ] Dependencies verified

## Overall
- [ ] Ready for production
- [ ] Needs revision
- [ ] Rejected

Notes:
[Any additional observations]
```

## Resources

- [claude-skills/SKILL.md](./claude-skills/SKILL.md) - Best practices guide
- [claude-skills/checklist.md](./claude-skills/checklist.md) - Quality checklist
- [CONTRIBUTING.md](./CONTRIBUTING.md) - Contribution guidelines

---

**Remember**: Testing isn't just about finding bugs—it's about ensuring your skill provides real value and triggers at the right time.