Overview

The image below shows a classical speaker diarization process. The computation time of each step is displayed on the left of the image.

This speaker diarization system was developed for broadcast news data. We do not supply a state-of-the-art system dedicated to telephone data, although the system can be adapted to process it. The only telephone-data system we have was developed specifically for the MEDIA data (corpora ELRA-S0272 and ELRA-e0024 in the ELRA catalogue). That system is optimized to minimize only the WER (word error rate), not the DER (diarization error rate).
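WER and DER measure different things. WER counts word substitutions, deletions and insertions against the number of reference words, regardless of who spoke, while DER scores speaker attribution over time. As a reminder of what "minimizing the WER" targets, here is a minimal, hypothetical sketch of a word-level WER using the standard Levenshtein edit distance. It assumes whitespace tokenization and no text normalization, and it is not the scoring tool used for the MEDIA system.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words.

    Hypothetical illustration; assumes whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)


print(wer("le chat est noir", "le chat noir"))  # 0.25 (one deletion out of four words)
```

A diarization output can leave this number unchanged while still confusing speakers, which is why a system tuned for WER is not necessarily good in terms of DER.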

Single-show and cross-show diarization evaluation

The test corpora come from three French evaluation campaigns:

  • ESTER 2 test: 26 broadcast news shows, prepared speech, 15 h
  • ETAPE test: 15 French TV and radio shows, spontaneous or very spontaneous speech, 7 h
  • REPERE test: 28 French TV shows, balanced between prepared and spontaneous speech, 3 h

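The table shows the results obtained with the same thresholds on every corpus (not the best thresholds for each corpus). CLR refers to the cross-likelihood ratio clustering and ILP to the integer linear programming clustering.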

Type   Corpus    Single-show DER   Cross-show DER
CLR    ESTER 2   11.27 %           20.43 %
CLR    ETAPE     21.57 %           27.79 %
CLR    REPERE    17.19 %           23.95 %
ILP    ESTER 2    8.35 %           17.51 %
ILP    ETAPE     24.49 %           26.31 %
ILP    REPERE    15.46 %           19.59 %
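DER is conventionally defined as missed speech plus false-alarm speech plus speaker confusion, divided by the total reference speech time, after mapping each hypothesis speaker to at most one reference speaker so that their overlap is maximal. The sketch below illustrates that definition on frame-level labels. It is a hypothetical, simplified example (function names are illustrative, None marks non-speech, no forgiveness collar, no overlapping speech, brute-force mapping suitable only for a handful of speakers); scoring tools such as NIST md-eval handle these details and this is not the tool used to produce the figures above.

```python
from collections import Counter
from itertools import permutations


def best_mapping(ref, hyp):
    """One-to-one map from hypothesis to reference speakers maximizing overlap.

    Brute force over permutations: only practical for a few speakers.
    """
    overlap = Counter((r, h) for r, h in zip(ref, hyp) if r is not None and h is not None)
    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    # Pad with None so a hypothesis speaker may stay unmapped.
    candidates = ref_spk + [None] * len(hyp_spk)
    best, best_score = {}, -1
    for perm in permutations(candidates, len(hyp_spk)):
        score = sum(overlap[(r, h)] for h, r in zip(hyp_spk, perm) if r is not None)
        if score > best_score:
            best_score = score
            best = {h: r for h, r in zip(hyp_spk, perm) if r is not None}
    return best


def frame_der(ref, hyp):
    """(missed + false alarm + confusion) / reference speech frames."""
    mapping = best_mapping(ref, hyp)
    total = missed = false_alarm = confusion = 0
    for r, h in zip(ref, hyp):
        if r is not None:
            total += 1
            if h is None:
                missed += 1  # reference speech, no hypothesis speaker
            elif mapping.get(h) != r:
                confusion += 1  # wrong speaker after mapping
        elif h is not None:
            false_alarm += 1  # hypothesis speech where reference has none
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


ref = ["A", "A", "A", "B", "B", None, None, "B"]
hyp = ["s1", "s1", "s2", "s2", "s2", "s2", None, None]
print(frame_der(ref, hyp))  # 0.5: 1 missed + 1 false alarm + 1 confusion over 6 frames
```

For the cross-show condition, one can picture the same computation run over all shows of a corpus at once with a single speaker mapping, so a recurring speaker who receives different labels in different shows adds confusion time; this is one reason the cross-show DER is higher than the single-show DER in every row of the table.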

Official evaluation campaign results

  • ESTER 2[1] test – 2008
    • 1st, single-show, CLR based system – DER 10.8%
  • ETAPE test – 2012
    • 2nd, single-show, ILP based system – DER 18.46%
    • 3rd, single-show, CLR based system – DER 18.89%
    • 1st, cross-show, ILP+CLR based system – DER 19.71%
    • 2nd, cross-show, CLR+CLR based system – DER 21.63%
  • REPERE test – 2013
    • 1st, single-show, ILP based system – DER 11.1%
    • 1st, cross-show, ILP+ILP based system – DER 14.2%

[1] Galliano, S., Gravier, G., Chaubard, L.: The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts. In: INTERSPEECH, 2009.
overview.txt · Last modified: 2014/08/27 10:53 by meignier