Hearing Aid Research Data Set for Acoustic Environment Recognition (HEAR-DS)
============================================================================

The HEAR-DS provides binaural audio material recorded in acoustic
environments that are typical for hearing aid users. Its goal is to
support researchers in training and testing algorithms, e.g. deep
neural networks, in environments relevant for hearing aids.

 * To refer to HEAR-DS in a publication
Please cite this paper with DOI 10.1109/ICASSP40776.2020.9053611:
[Hearing Aid Research Data Set for Acoustic Environment Recognition](https://ieeexplore.ieee.org/document/9053611)
(Andreas Hüwel, Dr. Kamil Adiloğlu and Dr. Jörg-Hendrik Bach), published at
[ICASSP2020](https://2020.ieeeicassp.org) 

 * Download 
[HEAR-DS download link](https://download.hoertech.de/hear-ds-data/HEAR-DS/)

 * Parts of HEAR-DS

HEAR-DS consists of these parts. For the licensing of each part, see LICENSE.txt in its subfolder:

 - HEAR-DS/RawAudioCuts
 - HEAR-DS/AudioSnippets 
 - HEAR-DS/Code


Acoustic Environments Overview
==============================

 | Background environment  | Speech in background environment  |
 |-------------------------|-----------------------------------|
 | Cocktail party          |                                   |
 | Interfering speakers    |                                   |
 | In traffic              | Speech in traffic                 |
 | In vehicle              | Speech in vehicle                 |
 | Music                   | Speech in music                   |
 | Quiet indoors           | Speech in quiet indoors           |
 | Reverberant environment | Speech in reverberant environment |
 | Wind turbulence         | Speech in wind turbulence         |


* AudioSnippets with speech in background SNR variations:

The AudioSnippets folder currently contains the 10s audio snippets
from the in-the-canal (ITC) hearing aid, downsampled to 16kHz. For the
speech in background environments, speech was randomly mixed with the
background at SNRs from -21dBA to +21dBA in 3dBA steps.

As described in the paper, some audio material comes from third
parties and thus cannot be provided here. All of the needed data is
accessible online, however, and with the provided scripts anyone can
regenerate the whole data set themselves.

Audio for interfering speakers comes from
[CHiME5](http://spandh.dcs.shef.ac.uk/chime_challenge/CHiME5/data.html),
and the speech material mixed into the speech in background
environments comes from CHiME2. For [CHiME2 (2013) and CHiME5
(2018)](http://spandh.dcs.shef.ac.uk/chime_challenge), please contact
the organizers to get access to the data sets.
Audio for music comes from
[GTZAN](http://marsyas.info/downloads/datasets.html).

To create the needed audio snippets, e.g. for the Interfering Speakers
environment, you can use the Jupyter notebooks in the folder Code/CHiME2018.

To recreate all speech in background environments yourself at your own
sample rate and SNR combinations, you can use the
CHiME 2013 Track1 data for speech and mix it with the
HEAR-DS/RawAudioCuts as described in the paper (we can provide our
code and assistance for this, too).
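
The mixing step can be illustrated with a minimal, dependency-free sketch.
It is not the authors' code: the function names are hypothetical, the
signals are plain lists of samples, and the level is measured as
unweighted RMS, whereas the SNRs in this README are given in dBA
(A-weighted). It also assumes that the speech is scaled and the
background is left untouched.

```python
import math
from typing import List, Sequence

# SNR grid named in this README: -21 .. +21 in steps of 3.
SNR_STEPS_DB = list(range(-21, 22, 3))


def rms(samples: Sequence[float]) -> float:
    """Unweighted root-mean-square level of a mono signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech: Sequence[float],
               background: Sequence[float],
               snr_db: float) -> List[float]:
    """Scale `speech` to the requested SNR relative to `background`, then add.

    Assumption: the background keeps its level and only the speech is scaled.
    """
    if len(speech) != len(background):
        raise ValueError("speech and background must have the same length")
    speech_rms, background_rms = rms(speech), rms(background)
    if speech_rms == 0.0 or background_rms == 0.0:
        raise ValueError("cannot set an SNR for a silent signal")
    # Gain that makes 20*log10(rms(gain*speech) / rms(background)) == snr_db.
    gain = (background_rms / speech_rms) * 10.0 ** (snr_db / 20.0)
    return [b + gain * s for s, b in zip(speech, background)]
```

For real audio the same arithmetic applies per channel, and an A-weighting
filter would have to be applied before measuring the levels to obtain dBA.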


Data and Format
===============

An acoustic environment holds audio from different recording
situations. Each recording situation has a unique id (rec_id) and
contains one or more recording sessions.

From the raw audio of each recording session we manually cut suitable
audio pieces (the cuts) to fill its recording situation with audio
material. Each cut has a locally unique cut_id.

To generate the actual data set for training machine learning systems, we
performed a further processing step that produces the 10s audio
samples for every acoustic environment, as described in the
subsection HEAR-DS - Audio Snippet Samples.

[comment]: # (acoustic environment -> recording situation -> recording session -> cut -> snippet)


HEAR-DS Raw Audio Cuts
----------------------

* Folder Structure

For each recording situation one folder holds all cut wav files.

    ├── CocktailParty
    │   ├── rec_001_HDH_1
    │   ├── rec_002_HDH_2_bistro
    │   ├── rec_003_ParishHall_1
    │   └── rec_004_UniCafete_1
    ├── InTraffic
    │   ├── rec_id_551_rush_hour_outskirts_mainroad_1
    │   ├── rec_id_552_busstop_city_mainroad_2
    │   ├── rec_id_553_city_secondaryroad_3
    │   └── rec_id_554_rush_hour_city_mainhub_1
    ├── InVehicle
    │   ├── rec_id_501_berlingo_II_diesel_1
    │   ├── rec_id_502_skoda_fabia_ottoengine_1
    │   └── rec_id_503_vw_t5_diesel_caravelle_1
    ├── QuietIndoors
    │   ├── rec_id_401_quiet_rural_home_1
    │   ├── rec_id_402_quiet_smalltown_home_1
    │   └── rec_id_403_quiet_city_home_1
    ├── ReverberantEnvironment
    │   ├── rec_005_Oldenburg_Church_1
    │   ├── rec_006_Rheine_Railstation_Hall_1
    │   └── rec_007_Staircase_1
    └── WindTurbulence
        ├── rec_id_202_wind_suburban_garden_02
        ├── rec_id_203_wind_before_rural_house_01
        └── rec_id_204_marie_curie_parkplace


 * The naming scheme is:

    rec_id_<REC_ID>_cut_<CUT_I>_<DESCRIPTION>_<TRACKNAME>_<EXPORTFORMAT>.wav

<REC_ID> is a 3 digit number and <CUT_I> a 2 digit number. The
<DESCRIPTION> could e.g. be "startengine_driveoff" for InVehicle or
"bell" in ReverberantEnvironment. <TRACKNAME> stands for one of the used
hearing aid microphones [Mic_BTE_L_front, Mic_BTE_L_rear,
Mic_BTE_R_front, Mic_BTE_R_rear, Mic_ITC_L, Mic_ITC_R]. <EXPORTFORMAT>
is the name of the used audio exporter, currently "raw_48kHz32bit".

Due to the manual process of audio cutting (with Ardour5 software), the length of the cuts varies.
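
As an illustration of the naming scheme for raw cuts, the following sketch
splits a filename into its fields. The function name and the example
filename are hypothetical (the filename is assembled from the components
named in this README, not taken from the data set). Because the
description, the track name and the export format can all contain
underscores, the sketch anchors on the six known track names.

```python
import re
from typing import Dict

# The six hearing aid microphone tracks listed in this README.
TRACKNAMES = (
    "Mic_BTE_L_front", "Mic_BTE_L_rear",
    "Mic_BTE_R_front", "Mic_BTE_R_rear",
    "Mic_ITC_L", "Mic_ITC_R",
)

CUT_NAME_RE = re.compile(
    r"^rec_id_(?P<rec_id>\d{3})_cut_(?P<cut_i>\d{2})_"
    r"(?P<description>.+)_"
    r"(?P<trackname>" + "|".join(TRACKNAMES) + r")_"
    r"(?P<exportformat>.+)\.wav$"
)


def parse_cut_filename(filename: str) -> Dict[str, str]:
    """Split a raw cut filename into its fields (ids stay zero-padded strings)."""
    match = CUT_NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a HEAR-DS raw cut filename: {filename!r}")
    return match.groupdict()


# Hypothetical filename, built from the pieces named in this README.
fields = parse_cut_filename(
    "rec_id_501_cut_03_startengine_driveoff_Mic_ITC_L_raw_48kHz32bit.wav"
)
```

Here `fields["description"]` is `"startengine_driveoff"` and
`fields["trackname"]` is `"Mic_ITC_L"`.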


HEAR-DS - Audio Snippet Samples
-------------------------------

In this processing step the raw audio cuts were further sliced into
10s snippets. These 10s snippets are either used directly as background
samples or are further mixed with random speech at multiple SNRs (-21, -18,
.. 0, .. 18, 21) to create audio samples for the speech in background
environments. The binaural speech source material comes from five
different directions, from which we choose randomly. The start and
end time of the source speech and the start time of the
background snippet are also randomized.

These 10s samples finally form the HEAR-DS audio material for training
machine learning systems, e.g. as input for the feature extraction
step of deep neural networks.


* Audio Sample Snippet File Format

The naming scheme for snippets is:

    <ENV_ID>_<REC_ID>_<CUT_ID>_<SNIP_ID>_<TRACKNAME>_<SAMPLERATE>.wav

 - <ENV_ID>: 2 digit id of acoustical environment, where each speech
   in background environment has its own id, separated from the pure
   background environment.
 
 - <REC_ID>: 3 digit id of the recording situation

 - <CUT_ID>: 2 digit id of the cut within the recording situation
   (unique across all sessions of that situation)

 - <SNIP_ID>: 3 digit id of the snippet of this cut.

 - <TRACKNAME>: as described above.

 - <SAMPLERATE>: the sample rate, e.g. 48kHz or 16kHz
  
For example, for ReverberantEnvironment, recording situation "Oldenburg
Church", first cut, first snippet, 16kHz version, the snippet
filename is 06_005_00_000_BTE_L_front_16kHz.wav
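
A snippet filename can be taken apart with a short regular expression.
This is only a sketch: the class and function names are hypothetical, and
the ids are kept as strings so that leading zeros survive.

```python
import re
from typing import NamedTuple


class SnippetName(NamedTuple):
    env_id: str      # 2 digit acoustic environment id
    rec_id: str      # 3 digit recording situation id
    cut_id: str      # 2 digit cut id
    snip_id: str     # 3 digit snippet id
    trackname: str   # microphone track, may contain underscores
    samplerate: str  # e.g. "16kHz"


SNIPPET_NAME_RE = re.compile(
    r"^(\d{2})_(\d{3})_(\d{2})_(\d{3})_(.+)_(\d+kHz)\.wav$"
)


def parse_snippet_filename(filename: str) -> SnippetName:
    """Split a snippet filename into the fields of the naming scheme."""
    match = SNIPPET_NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a HEAR-DS snippet filename: {filename!r}")
    return SnippetName(*match.groups())


# The example filename given in this README.
name = parse_snippet_filename("06_005_00_000_BTE_L_front_16kHz.wav")
```

For this filename, `name.env_id` is `"06"`, `name.rec_id` is `"005"` and
`name.trackname` is `"BTE_L_front"`.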

Folder Structure of HEAR-DS AudioSnippets
-----------------------------------------

In the speech folders, the snippet files in the different SNR
subfolders share the same filenames, but the content of those audio
snippets differs: for each SNR subfolder a different (random)
speech piece was mixed at **that** SNR with the same background
audio snippet.

    AudioSnippets-ITC-16kHz
    ├── CocktailParty
    │   ├── All
    │   └── Background
    ├── InterferingSpeakers
    │   └── Background
    ├── InTraffic
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    ├── InVehicle
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    ├── Music
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    ├── QuietIndoors
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    ├── ReverberantEnvironment
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    └── WindTurbulence
        ├── All
        │   ├── 0
        │   ├── 12
        │   ├── -12
        │   ├── 15
        │   ├── -15
        │   ├── 18
        │   ├── -18
        │   ├── 21
        │   ├── -21
        │   ├── 3
        │   ├── -3
        │   ├── 6
        │   ├── -6
        │   ├── 9
        │   └── -9
        ├── Background
        └── Speech
            ├── 0
            ├── 12
            ├── -12
            ├── 15
            ├── -15
            ├── 18
            ├── -18
            ├── 21
            ├── -21
            ├── 3
            ├── -3
            ├── 6
            ├── -6
            ├── 9
            └── -9
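
To work with this layout programmatically, the snippet files can be
grouped by environment, subfolder and SNR. The sketch below is one
possible way to do that with the standard library. The function name is
hypothetical, and it assumes that the wav files lie directly inside the
folders shown above (the tree lists directories only).

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# (environment, subfolder, SNR or None for folders without SNR subfolders)
Key = Tuple[str, str, Optional[int]]


def index_snippets(root: Path) -> Dict[Key, List[Path]]:
    """Group all wav files below `root` by environment, subfolder and SNR."""
    index: Dict[Key, List[Path]] = {}
    for wav in sorted(root.rglob("*.wav")):
        parts = wav.relative_to(root).parts
        if len(parts) == 3:
            # <Environment>/<Subfolder>/<file>.wav, e.g. InTraffic/Background
            key: Key = (parts[0], parts[1], None)
        elif len(parts) == 4:
            # <Environment>/<Subfolder>/<SNR>/<file>.wav, e.g. InTraffic/Speech/-21
            try:
                key = (parts[0], parts[1], int(parts[2]))
            except ValueError:
                continue  # subfolder name is not an SNR value, skip it
        else:
            continue  # unexpected depth, skip it
        index.setdefault(key, []).append(wav)
    return index
```

With an index built from the AudioSnippets-ITC-16kHz folder, a lookup
such as `index[("InTraffic", "Speech", -21)]` would list the snippets
stored under InTraffic/Speech/-21.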


Acknowledgments
===============

This work was supported by the German Ministry of Education and
Science (BMBF), FZK 02K16C202 Audio-PSS.

The authors would like to thank Marei Typlt and the partners in the
AUDIO-PSS project for support in designing the acoustic environments
and Audifon GmbH for providing the hearing aid dummies.