Test/Supertonic

LGram16 77af47274c initial commit

2026-01-25 18:58:40 +09:00

.gitignore
helper.js
index.html
main.js
package.json
README.md
style.css
vite.config.js

README.md

Supertonic Web Example

This example demonstrates how to use Supertonic in a web browser using ONNX Runtime Web.

📰 Update News

2026.01.06 - 🎉 Supertonic 2 released with multilingual support! Now supports English (en), Korean (ko), Spanish (es), Portuguese (pt), and French (fr). Demo | Models

2025.12.10 - Added 6 new voice styles (M3, M4, M5, F3, F4, F5). See Voices for details.

2025.12.08 - Optimized ONNX models via OnnxSlim now available on Hugging Face Models

2025.11.23 - Enhanced text preprocessing with comprehensive normalization, emoji removal, symbol replacement, and punctuation handling for improved synthesis quality.

2025.11.19 - Added speed control slider to adjust speech synthesis speed (default: 1.05, recommended range: 0.9-1.5).

2025.11.19 - Added automatic text chunking for long-form inference. Long texts are split into chunks and synthesized with natural pauses.
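
The chunking described in the last entry can be sketched roughly as follows. This is a minimal illustration, not the code in helper.js: the function names `chunkText` and `joinWithSilence`, the 300-character limit, and the 0.3 second pause are all assumptions chosen for the example.

```javascript
// Hypothetical helper: split text into chunks of at most maxLen characters,
// preferring sentence boundaries. The 300-character default is an assumption.
function chunkText(text, maxLen = 300) {
  // Split after sentence-ending punctuation that is followed by whitespace.
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? current + " " + sentence : sentence;
    if (candidate.length <= maxLen) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      // A single sentence longer than maxLen is kept whole in this sketch.
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Hypothetical helper: concatenate per-chunk audio (Float32Array) with a short
// silent gap between chunks. The 0.3 s default pause is an assumption.
function joinWithSilence(audioChunks, sampleRate, pauseSeconds = 0.3) {
  const gap = Math.round(sampleRate * pauseSeconds);
  const samples = audioChunks.reduce((n, c) => n + c.length, 0);
  const total = samples + gap * Math.max(0, audioChunks.length - 1);
  const out = new Float32Array(total); // zero-filled, so the gaps are silent
  let offset = 0;
  audioChunks.forEach((chunk, i) => {
    if (i > 0) offset += gap;
    out.set(chunk, offset);
    offset += chunk.length;
  });
  return out;
}
```

Each chunk would be synthesized separately and the resulting audio joined, which keeps per-inference input length bounded.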

Features

🌐 Runs entirely in the browser (no server required for inference)
🚀 WebGPU support with automatic fallback to WebAssembly
🌍 Multilingual support: English (en), Korean (ko), Spanish (es), Portuguese (pt), French (fr)
⚡ Pre-extracted voice styles for instant generation
🎨 Modern, responsive UI
🎭 Multiple voice style presets (5 Male, 5 Female)
💾 Download generated audio as WAV files
📊 Detailed generation statistics (audio length, generation time)
⏱️ Real-time progress tracking
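
The generation statistics listed above can be derived from the output sample count and a timer. The sketch below is illustrative only: `generationStats` is a hypothetical name, and the real-time factor is an extra derived figure that this README does not say the demo displays.

```javascript
// Hypothetical helper: derive audio length and generation time.
// numSamples: length of the generated mono waveform
// sampleRate: samples per second of the generated audio
// elapsedMs:  wall-clock time spent generating, e.g. from performance.now()
function generationStats(numSamples, sampleRate, elapsedMs) {
  const audioSeconds = numSamples / sampleRate;
  const generationSeconds = elapsedMs / 1000;
  return {
    audioSeconds,
    generationSeconds,
    // Below 1 means generation ran faster than playback would take.
    realTimeFactor: generationSeconds / audioSeconds,
  };
}
```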

Requirements

Node.js (for development server)
Modern web browser (Chrome, Edge, Firefox, Safari)

Installation

Install dependencies:

npm install

Running the Demo

Start the development server:

npm run dev

This starts a local development server (usually at http://localhost:3000) and opens the demo in your browser.

Usage

Wait for Models to Load: The app will automatically load models and the default voice style (M1)
Select Voice Style: Choose from available voice presets
- Male 1-5 (M1-M5): Male voice styles
- Female 1-5 (F1-F5): Female voice styles
Select Language: Choose the language that matches your input text
- English (en): Default language
- 한국어 (ko): Korean
- Español (es): Spanish
- Português (pt): Portuguese
- Français (fr): French
Enter Text: Type or paste the text you want to convert to speech
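Adjust Settings (optional):
- Total Steps: Number of denoising steps. More steps generally give better quality but slower generation (default: 5)
- Speed: Adjusts speech synthesis speed (default: 1.05, recommended range: 0.9-1.5)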
Generate Speech: Click the "Generate Speech" button
View Results:
- See the full input text
- View audio length and generation time statistics
- Play the generated audio in the browser
- Download as WAV file
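
For the WAV download step, one common approach is to wrap the generated samples in a 16-bit PCM WAV container. The following is a minimal sketch under assumptions: `encodeWav` is a hypothetical name, the audio is assumed to be mono floats in the range -1 to 1, and the sample rate must come from the model configuration rather than from this example.

```javascript
// Hypothetical helper: encode mono float samples as a 16-bit PCM WAV file.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + sample data
  const view = new DataView(buffer);
  const writeAscii = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// In a browser, the result can be offered as a download, for example:
//   const blob = new Blob([encodeWav(samples, sampleRate)], { type: "audio/wav" });
//   link.href = URL.createObjectURL(blob);
```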

Multilingual Support

Supertonic 2 supports multiple languages. Make sure to select the correct language for your input text to get the best results. The model will automatically handle text preprocessing and pronunciation for the selected language.
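
As a rough idea of what this kind of preprocessing involves, the sketch below applies Unicode normalization, emoji removal, a small symbol replacement table, and basic punctuation handling. It is an illustration only: `preprocessText` is a hypothetical name, the replacement rules are an assumed subset, and the demo's actual rules (including any language-specific handling) may differ.

```javascript
// Hypothetical helper: illustrative text cleanup before synthesis.
function preprocessText(text) {
  // Fold compatibility characters (e.g. full-width forms) into canonical ones.
  let out = text.normalize("NFKC");
  // Remove emoji and pictographs, plus variation selectors and zero-width joiners.
  out = out.replace(/[\p{Extended_Pictographic}\uFE0F\u200D]/gu, "");
  // Replace curly quotes and long dashes with plain ASCII (assumed subset).
  out = out
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u2013\u2014]/g, "-");
  // Collapse runs of whitespace and trim the ends.
  out = out.replace(/\s+/g, " ").trim();
  // Close the text with sentence punctuation if it has none.
  if (out && !/[.!?]$/.test(out)) out += ".";
  return out;
}
```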

Technical Details

Browser Compatibility

This demo uses:

ONNX Runtime Web: For running models in the browser
Web Audio API: For playing generated audio
Vite: For development and bundling

Notes

The ONNX models must be accessible at assets/onnx/ relative to the web root
Voice style JSON files must be accessible at assets/voice_styles/ relative to the web root
Pre-extracted voice styles enable instant generation without audio processing
Ten voice style presets are provided (M1-M5, F1-F5)
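
To make the path requirement concrete, here is a minimal sketch of resolving and fetching a voice style preset. The names `voiceStyleUrl` and `loadVoiceStyle` are hypothetical, and the `<name>.json` file naming (for example `M1.json`) is an assumption; check the actual files under assets/voice_styles/.

```javascript
// Preset names as listed in this README.
const VOICE_STYLES = ["M1", "M2", "M3", "M4", "M5", "F1", "F2", "F3", "F4", "F5"];
const VOICE_STYLE_DIR = "assets/voice_styles/";

// Hypothetical helper: build the URL of a preset relative to the web root.
// The "<name>.json" naming is an assumption.
function voiceStyleUrl(name, baseUrl = "/") {
  if (!VOICE_STYLES.includes(name)) {
    throw new Error(`Unknown voice style: ${name}`);
  }
  const root = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
  return `${root}${VOICE_STYLE_DIR}${name}.json`;
}

// Hypothetical helper: fetch and parse a preset, failing loudly on HTTP errors.
async function loadVoiceStyle(name, fetchImpl = fetch) {
  const url = voiceStyleUrl(name);
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Failed to load ${url}: HTTP ${res.status}`);
  }
  return res.json();
}
```

A 404 from `loadVoiceStyle` usually means the assets directory is not being served from the web root.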

Troubleshooting

Models not loading

Check browser console for errors
Ensure assets/onnx/ path is correct and models are accessible
Check CORS settings if serving from a different domain

WebGPU not available

WebGPU is enabled by default in Chrome and Edge 113+ on desktop; support in other browsers depends on version and platform
The app will automatically fall back to WebAssembly if WebGPU is not available
Check the backend badge to see which execution provider is being used
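
The fallback behaviour can be sketched as a feature check followed by an ordered list of execution providers. This is a minimal illustration, not the demo's code: `detectWebGPU` and `providerOrder` are hypothetical names, and the way the list is passed to ONNX Runtime Web in the trailing comment assumes a build that includes the WebGPU provider.

```javascript
// Hypothetical helper: true only if the browser exposes WebGPU and an adapter
// can actually be obtained (navigator.gpu may exist without a usable GPU).
async function detectWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch (err) {
    return false;
  }
}

// Hypothetical helper: preferred execution providers, most capable first.
function providerOrder(hasWebGPU) {
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}

// With ONNX Runtime Web, the list would typically be passed when creating a
// session (modelUrl is a placeholder):
//   const executionProviders = providerOrder(await detectWebGPU());
//   const session = await ort.InferenceSession.create(modelUrl, { executionProviders });
```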

Out of memory errors

Try shorter text inputs
Reduce the number of denoising steps (the Total Steps setting)
Use a browser with more available memory
Close other tabs to free up memory

Audio quality issues

Try different voice style presets
Increase denoising steps for better quality

Slow generation

If using WebAssembly, try a browser that supports WebGPU
Ensure no other heavy processes are running
Consider using fewer denoising steps for faster (but lower quality) results