Hearing Aid Research Data Set for Acoustic Environment Recognition (HEAR-DS)
============================================================================

HEAR-DS provides binaural audio material recorded in acoustic environments that are typical for hearing aid users. Its goal is to help researchers train and test algorithms, e.g. deep neural networks, in environments relevant for hearing aids.

* To refer to HEAR-DS in a publication

Please cite this paper with DOI 10.1109/ICASSP40776.2020.9053611:
[Hearing Aid Research Data Set for Acoustic Environment Recognition](https://ieeexplore.ieee.org/document/9053611)
(Andreas Hüwel, Dr. Kamil Adiloğlu and Dr. Jörg-Hendrik Bach), published at [ICASSP2020](https://2020.ieeeicassp.org)

* Download

[HEAR-DS download link](https://download.hoertech.de/hear-ds-data/HEAR-DS/)

* Parts of HEAR-DS

HEAR-DS consists of these parts; for the licensing of each part, see LICENSE.txt in its subfolder:

- HEAR-DS/RawAudioCuts
- HEAR-DS/AudioSnippets
- HEAR-DS/Code

Acoustic Environments Overview
==============================

| Background environment  | Speech in background environment  |
|-------------------------|-----------------------------------|
| Cocktail party          |                                   |
| Interfering speakers    |                                   |
| In traffic              | Speech in traffic                 |
| In vehicle              | Speech in vehicle                 |
| Music                   | Speech in music                   |
| Quiet indoors           | Speech in quiet indoors           |
| Reverberant environment | Speech in reverberant environment |
| Wind turbulence         | Speech in wind turbulence         |

* AudioSnippets having Speech in Background SNR Variations:

The AudioSnippets folder currently contains the 10 s audio snippets from the in-the-channel (ITC) hearing aid, downsampled to 16 kHz, with speech randomly mixed with the background at SNRs from -21 dBA to +21 dBA in 3 dBA steps.

As described in the paper, some audio material came from third parties and therefore cannot be provided here. However, all the needed data is accessible online, and with our provided scripts everyone can regenerate the whole data set themselves.
Audio for interfering speakers comes from [CHiME5](http://spandh.dcs.shef.ac.uk/chime_challenge/CHiME5/data.html), and the speech material that is mixed into the speech in background environments comes from CHiME2. For [CHiME2 (2013) and CHiME5 (2018)](http://spandh.dcs.shef.ac.uk/chime_challenge), please contact the organizers to get access to the data sets. Audio for music comes from [GTZAN](http://marsyas.info/downloads/datasets.html).

To create the needed audio snippets, e.g. for the Interfering Speakers environment, you can use the Jupyter notebooks in the folder Code/CHiME2018. To recreate all speech in background environments yourself, at your own sample rate and SNR combinations, you can use the CHiME 2013 Track 1 data for speech and mix it with HEAR-DS/RawAudioCuts as described in the paper (we can provide our code and assistance for this, too).

Data and Format
===============

An acoustic environment holds audio from different recording situations. Each recording situation has a unique id (rec_id) and contains one or more recording sessions. From the raw audio of each recording session we manually cut suitable audio pieces (the cuts) to fill its recording situation with audio material; each cut has a locally unique cut_id. To generate the actual data set for training machine learning systems, we performed a further processing step that produces all 10 s audio samples for every acoustic environment, as described in the subsection Audio Snippet Samples.

[comment]: # (acoustic environment -> recording situation -> recording session -> cut -> snippet)

HEAR-DS Raw Audio Cuts
----------------------

* Folder Structure

For each recording situation one folder holds all cut wav files.


    ├── CocktailParty
    │   ├── rec_001_HDH_1
    │   ├── rec_002_HDH_2_bistro
    │   ├── rec_003_ParishHall_1
    │   └── rec_004_UniCafete_1
    ├── InTraffic
    │   ├── rec_id_551_rush_hour_outskirts_mainroad_1
    │   ├── rec_id_552_busstop_city_mainroad_2
    │   ├── rec_id_553_city_secondaryroad_3
    │   └── rec_id_554_rush_hour_city_mainhub_1
    ├── InVehicle
    │   ├── rec_id_501_berlingo_II_diesel_1
    │   ├── rec_id_502_skoda_fabia_ottoengine_1
    │   └── rec_id_503_vw_t5_diesel_caravelle_1
    ├── QuietIndoors
    │   ├── rec_id_401_quiet_rural_home_1
    │   ├── rec_id_402_quiet_smalltown_home_1
    │   └── rec_id_403_quiet_city_home_1
    ├── ReverberantEnvironment
    │   ├── rec_005_Oldenburg_Church_1
    │   ├── rec_006_Rheine_Railstation_Hall_1
    │   └── rec_007_Staircase_1
    └── WindTurbulence
        ├── rec_id_202_wind_suburban_garden_02
        ├── rec_id_203_wind_before_rural_house_01
        └── rec_id_204_marie_curie_parkplace

Due to the manual process of audio cutting (with Ardour5 software), the length of the cuts varies.

* The naming scheme is:

`rec_id_<rec_id>_cut_<cut_id>_<description>_<microphone>_<exporter>.wav`

Here `<rec_id>` is a 3 digit number and `<cut_id>` a 2 digit number. The `<description>` could e.g. be "startengine_driveoff" for InVehicle or "bell" in ReverberantEnvironment. `<microphone>` stands for one of the used hearing aid microphones [Mic_BTE_L_front, Mic_BTE_L_rear, Mic_BTE_R_front, Mic_BTE_R_rear, Mic_ITC_L, Mic_ITC_R]. `<exporter>` is the name of the used audio exporter, currently "raw_48kHz32bit".

HEAR-DS - Audio Snippet Samples
-------------------------------

In this processing step the raw audio cuts were further sliced into 10 s snippets. These 10 s snippets are either used directly as background samples or are further mixed with random speech at multiple SNRs (-21, -18, .. 0, .. 18, 21) to create audio samples for the speech in background environments. The binaural speech source material comes from five different directions, from which we choose randomly; the start and end times of the source speech and the start time of the background snippet are also randomized.
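The mixing step described above can be illustrated with a short sketch. It is not the code used to build HEAR-DS: the names `mix_at_snr` and `random_segment` are hypothetical, the SNR is computed from plain mean power rather than from A-weighted levels (dBA), and the directional rendering of the binaural speech is not modelled.

```python
import numpy as np

# The SNR grid named in the text: -21, -18, ..., 18, 21 (3 dB steps).
SNRS_DB = list(range(-21, 22, 3))


def random_segment(signal, length, rng):
    """Return a randomly positioned slice of `length` samples (along axis 0)."""
    if len(signal) < length:
        raise ValueError("signal is shorter than the requested segment")
    start = rng.integers(0, len(signal) - length + 1)  # upper bound is exclusive
    return signal[start:start + length]


def mix_at_snr(speech, background, snr_db):
    """Scale `speech` to the requested SNR relative to `background`, then add both.

    Assumption: SNR is the ratio of unweighted mean powers, not dBA.
    """
    speech = np.asarray(speech, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    if speech.shape != background.shape:
        raise ValueError("speech and background must have the same shape")
    p_speech = np.mean(speech ** 2)
    p_background = np.mean(background ** 2)
    if p_speech == 0 or p_background == 0:
        raise ValueError("signals must not be silent")
    # Gain that makes 10*log10(power(gain * speech) / power(background)) == snr_db.
    gain = np.sqrt(p_background / p_speech * 10 ** (snr_db / 10))
    return background + gain * speech
```

With 16 kHz audio, a 10 s snippet corresponds to `length = 160000` samples; calling `mix_at_snr` once per entry in `SNRS_DB`, each time with a freshly drawn speech segment, mirrors the idea that every SNR uses a different random speech piece.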
These 10 s samples finally form the HEAR-DS audio material for training machine learning systems, e.g. as input for the feature extraction step of deep neural networks.

* Audio Sample Snippet File Format

The naming scheme for snippets is:

`<env_id>_<rec_id>_<cut_id>_<snippet_id>_<microphone>_<samplerate>.wav`

- `<env_id>`: 2 digit id of the acoustic environment, where each speech in background environment has its own id, separate from the pure background environment.
- `<rec_id>`: 3 digit id of the recording situation.
- `<cut_id>`: 2 digit id of the cut of the recording situation (unique across all sessions of that situation).
- `<snippet_id>`: 3 digit id of the snippet of this cut.
- `<microphone>`: as described above.
- `<samplerate>`: e.g. one of [48kHz, 16kHz].

For example, for ReverberantEnvironment, recording situation "Oldenburg Church", first cut, first snippet, in the 16 kHz version, the snippet filename is 06_005_00_000_BTE_L_front_16kHz.wav

Folder Structure of HEAR-DS AudioSnippets
-----------------------------------------

In the speech folders, the snippet files in the different SNR subfolders share the same filenames, but the content of those audio snippets differs: for each SNR subfolder, a different (random) speech piece was mixed at **that** SNR with the same background audio snippet.

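The snippet naming scheme and the shared filenames across SNR subfolders lend themselves to a small helper. The following is a minimal sketch, not part of the HEAR-DS code: `SnippetName`, `parse_snippet_name` and `snr_variants` are hypothetical names, and the regular expression assumes that the sample rate is the last underscore-separated token while the microphone name may itself contain underscores.

```python
import re
from pathlib import Path
from typing import NamedTuple


class SnippetName(NamedTuple):
    env_id: str       # 2 digit id of the acoustic environment
    rec_id: str       # 3 digit id of the recording situation
    cut_id: str       # 2 digit id of the cut
    snippet_id: str   # 3 digit id of the snippet
    microphone: str   # e.g. "BTE_L_front"
    samplerate: str   # e.g. "16kHz"


_SNIPPET_RE = re.compile(
    r"^(?P<env_id>\d{2})_(?P<rec_id>\d{3})_(?P<cut_id>\d{2})_(?P<snippet_id>\d{3})"
    r"_(?P<microphone>.+)_(?P<samplerate>[^_]+)\.wav$"
)


def parse_snippet_name(filename):
    """Split a snippet filename into its fields; ids stay zero-padded strings."""
    match = _SNIPPET_RE.match(Path(filename).name)
    if match is None:
        raise ValueError(f"not a snippet filename: {filename!r}")
    return SnippetName(**match.groupdict())


def snr_variants(speech_dir, filename):
    """Map SNR (int, taken from the subfolder name) to the path of `filename`.

    `speech_dir` is expected to contain one subfolder per SNR, e.g. "-12" or "9".
    """
    variants = {}
    for sub in Path(speech_dir).iterdir():
        candidate = sub / filename
        if not (sub.is_dir() and candidate.is_file()):
            continue
        try:
            snr = int(sub.name)
        except ValueError:
            continue  # skip folders that are not named after an SNR
        variants[snr] = candidate
    return dict(sorted(variants.items()))
```

Keeping the ids as strings preserves the zero padding, so the fields can be joined back into the original filename.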

    AudioSnippets-ITC-16kHz
    ├── CocktailParty
    │   ├── All
    │   └── Background
    ├── InterferingSpeakers
    │   └── Background
    ├── InTraffic
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    ├── InVehicle
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    ├── Music
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    ├── QuietIndoors
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    ├── ReverberantEnvironment
    │   ├── All
    │   │   ├── 0
    │   │   ├── 12
    │   │   ├── -12
    │   │   ├── 15
    │   │   ├── -15
    │   │   ├── 18
    │   │   ├── -18
    │   │   ├── 21
    │   │   ├── -21
    │   │   ├── 3
    │   │   ├── -3
    │   │   ├── 6
    │   │   ├── -6
    │   │   ├── 9
    │   │   └── -9
    │   ├── Background
    │   └── Speech
    │       ├── 0
    │       ├── 12
    │       ├── -12
    │       ├── 15
    │       ├── -15
    │       ├── 18
    │       ├── -18
    │       ├── 21
    │       ├── -21
    │       ├── 3
    │       ├── -3
    │       ├── 6
    │       ├── -6
    │       ├── 9
    │       └── -9
    └── WindTurbulence
        ├── All
        │   ├── 0
        │   ├── 12
        │   ├── -12
        │   ├── 15
        │   ├── -15
        │   ├── 18
        │   ├── -18
        │   ├── 21
        │   ├── -21
        │   ├── 3
        │   ├── -3
        │   ├── 6
        │   ├── -6
        │   ├── 9
        │   └── -9
        ├── Background
        └── Speech
            ├── 0
            ├── 12
            ├── -12
            ├── 15
            ├── -15
            ├── 18
            ├── -18
            ├── 21
            ├── -21
            ├── 3
            ├── -3
            ├── 6
            ├── -6
            ├── 9
            └── -9

Acknowledgments
===============

This work was supported by the German Ministry of Education and Science (BMBF), FZK 02K16C202 Audio-PSS. The authors would like to thank Marei Typlt and the partners in the AUDIO-PSS project for support in designing the acoustic environments and Audifon GmbH for providing the hearing aid dummies.