VOX to SPH Converter

Convert Dialogic VOX to NIST SPHERE format


Research Corpus Ready

NIST SPHERE is the standard for speech research data. Your VOX telephony recordings become training material.

Telephony to Research

Real call center audio in research format — valuable for building telephony-specific speech recognition models.

Bulk Processing

Convert entire collections of VOX recordings to SPH for corpus building.

How to convert VOX to SPH

1. Select files from your computer, Google Drive, Dropbox, or a URL, or drag them onto the page.

2. Choose sph, or any other format you need, as the output (more than 200 formats are supported).

3. Let the file convert, then download your sph file.

About formats

VOX is a headerless audio format built around Dialogic ADPCM encoding, widely adopted in telephony, interactive voice response (IVR) systems, and voice mail platforms since the 1980s. Each audio sample is compressed into 4 bits using an algorithm developed by Oki Electric and implemented in hardware on Dialogic Corporation's telephony interface cards. VOX files typically use a sampling rate of 6000 or 8000 Hz, producing extremely compact recordings optimized for speech intelligibility rather than musical fidelity. Because the format carries no header, playback software must know the sample rate and encoding parameters in advance, a trade-off that reduces overhead but demands careful file management. The primary advantage of VOX is storage efficiency: at 4 bits per sample, a one-minute voice recording at 8 kHz occupies roughly 240 KB (8000 samples/s × 0.5 bytes × 60 s), making it practical for systems storing thousands of prompts. Dialogic ADPCM is a vendor-specific scheme and is not bit-compatible with the ITU-T G.726 ADPCM standard, so decoders must implement the Dialogic variant explicitly. Even as modern call centers migrate to IP-based systems with codecs like Opus, vast libraries of VOX recordings persist in legacy IVR deployments and compliance archives worldwide.
Initial release: 1983
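As a rough illustration of the 4-bit scheme described above, the following Python sketch decodes Dialogic-style ADPCM nibbles into 12-bit samples and reproduces the 240 KB-per-minute figure. The step table and update rule follow the commonly published description of Dialogic/OKI ADPCM. The function names (decode_vox, vox_size_bytes) are invented for this example, and the high-nibble-first ordering is an assumption that should be checked against your own recordings.

```python
from typing import List

# Step sizes of the 49-level Dialogic/OKI ADPCM quantizer.
STEP_SIZES = [
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66,
    73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253,
    279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876,
    963, 1060, 1166, 1282, 1411, 1552,
]
# How the step index moves after each nibble (indexed by the 3 magnitude bits).
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


def decode_vox(data: bytes) -> List[int]:
    """Decode raw VOX bytes into signed 12-bit samples (-2048..2047)."""
    sample = 0  # decoder state: last reconstructed sample
    index = 0   # decoder state: position in STEP_SIZES
    out = []
    for byte in data:
        # Assumption: the high nibble holds the earlier sample.
        for nibble in (byte >> 4, byte & 0x0F):
            step = STEP_SIZES[index]
            diff = step >> 3
            if nibble & 4:
                diff += step
            if nibble & 2:
                diff += step >> 1
            if nibble & 1:
                diff += step >> 2
            if nibble & 8:  # sign bit
                diff = -diff
            sample = max(-2048, min(2047, sample + diff))
            index = max(0, min(48, index + INDEX_ADJUST[nibble & 7]))
            out.append(sample)
    return out


def vox_size_bytes(seconds: float, sample_rate: int = 8000) -> int:
    """Size of a VOX recording: 4 bits (half a byte) per sample."""
    return int(seconds * sample_rate) * 4 // 8
```

Because the file has no header, the caller supplies the sample rate. To store the result as 16-bit PCM, each 12-bit sample can be multiplied by 16.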
SPH is the file extension for audio stored in the NIST SPHERE (SPeech HEader REsources) format, a standard created by the U.S. National Institute of Standards and Technology around 1990. Built for speech research, SPH files carry a 1024-byte ASCII header packed with metadata (database identifiers, channel counts, sample rates, byte ordering, and sample coding), making every recording self-describing. The underlying audio is typically 16-bit linear PCM sampled at 16 kHz, though other configurations are permitted. Researchers at NIST, DARPA, and universities worldwide rely on SPH for distributing speech corpora such as TIMIT, Switchboard, and other LDC collections that underpin modern automatic speech recognition systems. A key advantage is that the human-readable header lets scripts parse recording metadata without binary decoding. The format's standardization also reduces ambiguity when sharing datasets across institutions and platforms. When an SPH file stores uncompressed PCM, it preserves the source audio without further loss, which matters when training acoustic models where even small artifacts can skew results. The format also allows other sample codings, such as mu-law and Shorten-compressed data, which the header declares in its sample_coding field.
Initial release: 1990
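To make the self-describing header concrete, here is a minimal Python sketch that builds and parses a SPHERE-style header for 16-bit little-endian PCM. It assumes the common 1024-byte header and covers only a handful of standard fields. The helper names (build_sphere_header, parse_sphere_header) are hypothetical and are not part of any NIST tool.

```python
HEADER_SIZE = 1024  # assumption: the common fixed header size


def build_sphere_header(sample_count: int, sample_rate: int = 8000,
                        channels: int = 1) -> bytes:
    """Build a minimal SPHERE header for 16-bit little-endian PCM."""
    lines = [
        "NIST_1A",                    # format identifier
        f"{HEADER_SIZE:>7}",          # header size, right-aligned
        f"channel_count -i {channels}",
        f"sample_count -i {sample_count}",
        f"sample_rate -i {sample_rate}",
        "sample_n_bytes -i 2",
        "sample_byte_format -s2 01",  # "01" = low byte first
        "sample_sig_bits -i 16",
        "sample_coding -s3 pcm",
        "end_head",
    ]
    text = "\n".join(lines) + "\n"
    # Pad with spaces so the audio data starts at byte 1024.
    return text.encode("ascii").ljust(HEADER_SIZE, b" ")


def parse_sphere_header(raw: bytes) -> dict:
    """Parse 'name -type value' header lines into a dictionary."""
    lines = raw[:HEADER_SIZE].decode("ascii").split("\n")
    if lines[0] != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        name, kind, value = line.split(None, 2)
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:  # "-sN": string value of N characters
            fields[name] = value
    return fields
```

A converter would write this header followed by the decoded samples. Reading it back needs nothing more than text splitting, which is the practical benefit of the ASCII header.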

Frequently Asked Questions

Why convert VOX to SPH?

SPHERE is the standard for speech research corpora. Converting VOX creates telephony training data for speech recognition.

What can open SPH files?

The NIST SPHERE tools, SoX, and HTK read SPH files. Kaldi recipes typically read them by piping through the sph2pipe utility.

Is telephony VOX good for research?

Real-world telephony audio is valuable for training speech recognition — it represents actual call conditions.

Can I batch-convert for corpus building?

Upload multiple VOX files and convert to SPH simultaneously — efficient for building telephony speech corpora.
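For collections kept on a local machine, the same batch idea can be sketched with SoX, which the answer above lists as SPH-capable. The Python sketch below only builds the command lines. It assumes SoX is installed with its vox and sph format handlers, that all recordings are mono, and that they share one sample rate. The helper names (sox_command, plan_batch) are hypothetical.

```python
from pathlib import Path
from typing import List


def sox_command(src: Path, dst_dir: Path, sample_rate: int = 8000) -> List[str]:
    """Build a SoX command converting one VOX file to SPH."""
    dst = dst_dir / (src.stem + ".sph")
    return [
        "sox",
        "-t", "vox",             # input is headerless Dialogic ADPCM
        "-r", str(sample_rate),  # rate must be given: VOX has no header
        "-c", "1",               # assumption: mono telephony audio
        str(src),
        "-t", "sph",
        str(dst),
    ]


def plan_batch(src_dir: str, dst_dir: str, sample_rate: int = 8000) -> List[List[str]]:
    """Return one SoX command per .vox file found in src_dir."""
    files = sorted(Path(src_dir).glob("*.vox"))
    return [sox_command(f, Path(dst_dir), sample_rate) for f in files]
```

Each returned command can be passed to subprocess.run(cmd, check=True). It is worth checking a few outputs by ear, since a wrong sample rate produces audio at the wrong speed and pitch without raising an error.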

Is SPH the same as NIST?

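In practice, yes. SPH is the file extension, SPHERE is the name of the format, and NIST is the organization that created it, so "NIST file" and "SPH file" usually mean the same thing.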