
audio_transcribe

Work in Progress: This skill is currently under development and may change significantly.

Type: Shared Skill
Scope: All agents
Location: /skills/audio_transcribe/SKILL.md


Copy This Skill

---
name: audio_transcribe
description: Transcribe audio files into text for processing
---

# Audio Transcribe

Transcribes audio files (voice messages, recordings) into text for processing by the agent.

## Usage

Invoke with: /audio_transcribe [audio_url_or_attachment]

## Examples

- /audio_transcribe https://example.com/meeting-recording.mp3
- /audio_transcribe [attached voice message]

## Supported Formats

- MP3
- WAV
- OGG
- M4A
- WebM
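
As a rough illustration of how a caller might pre-check an input against the list above, here is a minimal sketch. The function name `is_supported_audio` and the mapping from format names to file extensions are assumptions for illustration; the skill itself may detect formats differently (for example, by inspecting file contents).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions inferred from the Supported Formats list (an assumption of this sketch).
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".m4a", ".webm"}


def is_supported_audio(source: str) -> bool:
    """Return True if a URL or file name ends in a supported audio extension."""
    # For URLs, look only at the path so query strings do not hide the extension.
    path = urlparse(source).path or source
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

An extension check like this only filters obvious mismatches before upload; it does not guarantee that the file actually decodes.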

## Output

The agent will return:

## Transcription

[Full transcribed text of the audio]

---

**Duration**: 3:42
**Confidence**: 94%
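
If a downstream step needs the transcript and its metadata separately, the template above could be parsed along these lines. This is a sketch that assumes responses follow the template exactly; `TranscriptionResult` and `parse_result` are hypothetical names, not part of the skill.

```python
import re
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    text: str
    duration_seconds: int
    confidence: float  # fraction between 0.0 and 1.0


def parse_result(response: str) -> TranscriptionResult:
    """Split a response shaped like the template into text and metadata."""
    # Everything before the last "---" rule is the transcript; the rest is the footer.
    body, _, footer = response.rpartition("\n---\n")
    text = body.strip()
    if text.startswith("## Transcription"):
        text = text[len("## Transcription"):].strip()

    # Accept both "Duration: 3:42" and "**Duration**: 3:42".
    duration = re.search(r"Duration\**:\s*(\d+(?::\d+)*)", footer)
    confidence = re.search(r"Confidence\**:\s*(\d+(?:\.\d+)?)%", footer)
    if duration is None or confidence is None:
        raise ValueError("response does not match the expected template")

    # Fold "m:ss" or "h:mm:ss" into a total number of seconds.
    seconds = 0
    for part in duration.group(1).split(":"):
        seconds = seconds * 60 + int(part)

    return TranscriptionResult(text, seconds, float(confidence.group(1)) / 100)
```

With the sample values from the template, a duration of 3:42 becomes 222 seconds and a confidence of 94% becomes 0.94.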

## Options

Append a flag to the command to request additional processing, for example /audio_transcribe [audio] --summarize. Available flags:

- --summarize: Provide a summary of the transcription
- --extract-action-items: Extract action items from the audio
- --speaker-identification: Identify different speakers
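
To make the command shape concrete, the sketch below splits an invocation into its audio source and flags. `parse_command` is a hypothetical helper, and two behaviors are assumptions of this sketch rather than documented rules: that several flags may be combined in one command, and that unknown flags are rejected. It covers only the URL form, since an attachment would not appear in the command text.

```python
import shlex

# Flags listed in the Options section.
KNOWN_FLAGS = {"--summarize", "--extract-action-items", "--speaker-identification"}


def parse_command(command: str) -> tuple[str, set[str]]:
    """Return (audio_source, flags) for an /audio_transcribe command line."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "/audio_transcribe":
        raise ValueError("not an /audio_transcribe command")

    # Anything starting with "--" is treated as a flag; the rest is the source.
    flags = {t for t in tokens[1:] if t.startswith("--")}
    sources = [t for t in tokens[1:] if not t.startswith("--")]

    unknown = flags - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    if len(sources) != 1:
        raise ValueError("expected exactly one audio source")
    return sources[0], flags
```

For example, `parse_command("/audio_transcribe https://example.com/meeting-recording.mp3 --summarize")` yields the URL together with the set `{"--summarize"}`.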

## Notes

- Maximum file size: 50MB
- For longer recordings, the transcription may take several minutes
- Quality depends on audio clarity and background noise
- Speaker identification works best with distinct voices 
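
The size limit can be checked before submitting a file, as in the sketch below. The helper names are hypothetical, and the sketch assumes "50MB" means 50 × 1024 × 1024 bytes; the source does not say whether the limit is binary or decimal, so adjust the constant if it is the latter.

```python
import os

# Assumption: "50MB" is read as 50 MiB. Use 50 * 1000 * 1000 for decimal megabytes.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def within_size_limit(size_bytes: int, limit: int = MAX_FILE_SIZE_BYTES) -> bool:
    """Return True if a size in bytes does not exceed the limit."""
    return 0 <= size_bytes <= limit


def file_within_size_limit(path: str) -> bool:
    """Check a local file on disk against the limit."""
    return within_size_limit(os.path.getsize(path))
```

Recordings above the limit would need to be split or compressed before transcription; how the skill itself reports an oversized file is not specified here.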
