100 Deep Learning Cases | Day 41 - Convolutional Neural Network (CNN): UrbanSound8K Audio Classification (Speech Recognition)
2022-04-23 16:21:00 【Classmate K】
- Running environment: Python 3
- Author: Classmate K
- From the column: 《Deep Learning 100 Examples》
- Selected column: 《Novice Introduction and Deep Learning》
- Recommended column: 《Matplotlib Course》
- 🧿 Excellent column: 《Python Introduction 100 Topics》
My environment:
- Language: Python 3.6.5
- Editor: Jupyter Notebook
- Deep learning framework: TensorFlow 2.4.1
- Graphics card (GPU): NVIDIA GeForce RTX 3080
- Data address: 【Portal】
I. Preparation
Hello everyone, I'm Classmate K!
Today I'd like to share a practical audio classification case.
The dataset used is UrbanSound8K. It contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The data is stored in ten folders, fold1 through fold10.
In addition to the sound excerpts, a CSV file is provided that contains metadata about each excerpt.
Method overview
- There are 3 basic ways to extract features from audio files:
  a) Use the MFCCs (mel-frequency cepstral coefficients) of the audio file.
  b) Use a spectrogram image of the audio and convert it into data points (just as you would with an image). This is easy to do with Librosa's librosa.feature.melspectrogram function.
  c) Combine the two kinds of features to build a better model (reading and extracting the data takes a lot of time).
  I chose the second method.
- The labels have been converted to categorical (one-hot) data for classification.
- A CNN is used as the main layer type for classifying the data.
1. Import the required libraries
# Basic Libraries
import pandas as pd
import numpy as np
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense, MaxPool2D, Dropout
from tensorflow.keras.utils import to_categorical
import os,glob,skimage,librosa
import librosa.display
import warnings
warnings.filterwarnings("ignore") # Ignore the warning
2. Analyze the data types and format
Analyzing the CSV data
df = pd.read_csv("./41-data/UrbanSound8K.csv")
df.head()
| | slice_file_name | fsID | start | end | salience | fold | classID | class |
|---|---|---|---|---|---|---|---|---|
| 0 | 100032-3-0-0.wav | 100032 | 0.0 | 0.317551 | 1 | 5 | 3 | dog_bark |
| 1 | 100263-2-0-117.wav | 100263 | 58.5 | 62.500000 | 1 | 5 | 2 | children_playing |
| 2 | 100263-2-0-121.wav | 100263 | 60.5 | 64.500000 | 1 | 5 | 2 | children_playing |
| 3 | 100263-2-0-126.wav | 100263 | 63.0 | 67.000000 | 1 | 5 | 2 | children_playing |
| 4 | 100263-2-0-137.wav | 100263 | 68.5 | 72.500000 | 1 | 5 | 2 | children_playing |
Column descriptions
- slice_file_name: The name of the audio file, in the format [fsID]-[classID]-[occurrenceID]-[sliceID].wav
  - [fsID]: The Freesound ID of the recording from which this excerpt (slice) was taken
  - [classID]: The numeric identifier of the sound class
  - [occurrenceID]: A numeric identifier that distinguishes different occurrences of the sound within the original recording
  - [sliceID]: A numeric identifier that distinguishes different slices taken from the same occurrence
- fsID: The Freesound ID of the recording from which this excerpt (slice) was taken
- start: The start time of the slice in the original Freesound recording (in seconds)
- end: The end time of the slice in the original Freesound recording (in seconds)
- salience: A (subjective) salience rating of the sound. 1 = foreground, 2 = background.
- fold: The fold number (1-10) the file is assigned to; each fold has its own folder
- classID: The numeric identifier of the sound class:
  - 0 = air_conditioner
  - 1 = car_horn
  - 2 = children_playing
  - 3 = dog_bark
  - 4 = drilling
  - 5 = engine_idling
  - 6 = gun_shot
  - 7 = jackhammer
  - 8 = siren
  - 9 = street_music
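The naming format described above can be taken apart with a few lines of standard-library Python. This is only a minimal sketch: SliceName and parse_slice_file_name are hypothetical helper names that are not part of the original notebook, and the code assumes every file name follows the four-field pattern exactly.

```python
from typing import NamedTuple


class SliceName(NamedTuple):
    """The four fields encoded in a slice file name (hypothetical helper type)."""
    fs_id: int
    class_id: int
    occurrence_id: int
    slice_id: int


def parse_slice_file_name(name: str) -> SliceName:
    """Split '[fsID]-[classID]-[occurrenceID]-[sliceID].wav' into integer fields."""
    stem = name.rsplit(".", 1)[0]  # drop the '.wav' extension
    fs_id, class_id, occurrence_id, slice_id = (int(part) for part in stem.split("-"))
    return SliceName(fs_id, class_id, occurrence_id, slice_id)


print(parse_slice_file_name("100263-2-0-117.wav"))
# SliceName(fs_id=100263, class_id=2, occurrence_id=0, slice_id=117)
```

The class_id parsed this way should agree with the classID column of the CSV, which makes it a cheap consistency check on the metadata.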
Use Librosa to analyze a random sound sample
a, b = librosa.load() returns two values (by default, librosa.load resamples the audio to 22050 Hz and mixes it down to mono):
- a: The audio signal values, as an ndarray
- b: The sampling rate
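To make the two return values concrete without needing an audio file, the sketch below builds a synthetic stand-in for them using NumPy only. The 440 Hz tone, the one-second length, and the variable names are illustrative assumptions; the only point is the relationship between the signal array, the sampling rate, and the clip duration.

```python
import numpy as np

sampling_rate = 22050   # stand-in for b; assumed to match librosa.load's default rate
duration_s = 1.0        # illustrative clip length in seconds

# Stand-in for a: a 440 Hz sine wave stored as a 1-D float32 array
t = np.arange(int(sampling_rate * duration_s)) / sampling_rate
signal = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

print(signal.shape, signal.dtype)      # one value per sample
print(len(signal) / sampling_rate)     # duration in seconds = samples / rate
```

The same division, len(a) / b, gives the duration of any clip loaded with librosa.load.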
3. Data presentation
import IPython.display as ipd
ipd.Audio('./41-data/fold5/100263-2-0-117.wav')
dataSample, sampling_rate = librosa.load('./41-data/fold5/100032-3-0-0.wav')
plt.figure(figsize=(10, 3))
D = librosa.amplitude_to_db(np.abs(librosa.stft(dataSample)), ref=np.max)
librosa.display.specshow(D, y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
plt.show()
arr = np.array(df["slice_file_name"])
fold = np.array(df["fold"])
cla = np.array(df["class"])
for i in range(192, 197, 2):
    plt.figure(figsize=(8, 2))
    path = './41-data/fold' + str(fold[i]) + '/' + arr[i]
    data, sampling_rate = librosa.load(path)
    D = librosa.amplitude_to_db(np.abs(librosa.stft(data)), ref=np.max)
    librosa.display.specshow(D, y_axis='linear')
    plt.colorbar(format='%+2.0f dB')
    plt.title(cla[i])
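The plots above show magnitudes on a decibel scale relative to the loudest bin. The following NumPy sketch illustrates that conversion as I understand it; amplitude_to_db_sketch is a hypothetical helper, and the amin and top_db values are assumptions meant to resemble librosa's defaults rather than a copy of its implementation.

```python
import numpy as np


def amplitude_to_db_sketch(magnitude, amin=1e-5, top_db=80.0):
    """Convert magnitudes to dB relative to the largest magnitude (illustrative only)."""
    magnitude = np.abs(magnitude)
    ref = max(float(magnitude.max()), amin)
    # Floor tiny values at amin so that log10 never sees zero
    db = 20.0 * np.log10(np.maximum(magnitude, amin) / ref)
    # Limit the dynamic range so near-silent bins do not stretch the color scale
    return np.maximum(db, db.max() - top_db)


spec = np.array([[1.0, 0.1], [0.01, 0.0]])
print(amplitude_to_db_sketch(spec))
```

With the maximum as reference, the loudest bin sits at 0 dB and everything else is negative, which is why the color bars in the plots run downward from 0 dB.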
II. Feature extraction and dataset construction
Let's look at the shape of the data extracted by the librosa.feature.melspectrogram() function:
arr = librosa.feature.melspectrogram(y=data, sr=sampling_rate)
arr.shape
(128, 173)
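The shape (128, 173) can be reproduced with a little arithmetic. The sketch below assumes a full 4-second clip, a sampling rate of 22050 Hz, 128 mel bands, and a hop length of 512 samples with centered frames; these are my understanding of the library defaults, not values stated in the original post, and expected_frames is a hypothetical helper.

```python
def expected_frames(n_samples: int, hop_length: int = 512) -> int:
    """Frame count for a centered short-time transform: one frame per hop, plus one."""
    return 1 + n_samples // hop_length


sampling_rate = 22050   # assumed: default rate used by librosa.load
clip_seconds = 4        # assumed: a full-length UrbanSound8K excerpt
n_mels = 128            # assumed: default number of mel bands

print((n_mels, expected_frames(sampling_rate * clip_seconds)))
# (128, 173)
```

Shorter excerpts produce fewer frames, so the second dimension varies from clip to clip while the first stays at 128.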
1. Data feature extraction
feature = []
label = []
def parser():
    # Load each audio file and extract its features
    for i in range(8732):
        if i % 1000 == 0:
            print("Extracted features from %d files" % i)
        file_name = './41-data/fold' + str(df["fold"][i]) + '/' + df["slice_file_name"][i]
        X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # Compute the mel spectrogram and average it over time,
        # which gives one 128-dimensional vector per clip
        mels = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        feature.append(mels)
        label.append(df["classID"][i])
    print("Feature extraction complete!")
    return [feature, label]
temp = parser()
Extracted features from 0 files
Extracted features from 1000 files
Extracted features from 2000 files
Extracted features from 3000 files
Extracted features from 4000 files
Extracted features from 5000 files
Extracted features from 6000 files
Extracted features from 7000 files
Extracted features from 8000 files
Feature extraction complete!
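Because clips differ in length, their mel spectrograms have different numbers of frames; averaging over the time axis turns each one into a fixed-length vector. The NumPy sketch below shows the effect of that step on a random stand-in array (the random data and the variable names are illustrative, not taken from the dataset).

```python
import numpy as np

rng = np.random.default_rng(0)
mel = rng.random((128, 173))   # stand-in for a (n_mels, n_frames) mel spectrogram

# Same expression as in parser(): transpose to (n_frames, n_mels), then average over frames
per_band_mean = np.mean(mel.T, axis=0)

print(per_band_mean.shape)     # (128,)
```

The result keeps one value per mel band and discards all timing information, which is a deliberate simplification of the "spectrogram as image" idea.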
# temp is [feature, label]; convert each part on its own, because stacking the
# ragged pair with np.array(temp) raises an error on recent NumPy versions
X = np.array(temp[0])
Y = to_categorical(np.array(temp[1]))
print(X.shape, Y.shape)
(8732, 128) (8732, 10)
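The label array has 10 columns because each class ID is turned into a one-hot row. A minimal NumPy sketch of that encoding follows; one_hot is a hypothetical helper written for illustration and is not the Keras implementation.

```python
import numpy as np


def one_hot(class_ids, num_classes):
    """Return a (len(class_ids), num_classes) array with a single 1.0 per row."""
    class_ids = np.asarray(class_ids, dtype=int)
    encoded = np.zeros((class_ids.size, num_classes), dtype=np.float32)
    encoded[np.arange(class_ids.size), class_ids] = 1.0
    return encoded


labels = one_hot([3, 2, 9], num_classes=10)
print(labels.shape)            # (3, 10)
print(labels.argmax(axis=1))   # recovers the original class IDs
```

Taking argmax along the last axis reverses the encoding, which is also how predicted class IDs are usually read off the softmax output.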
2. Dataset construction
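# With the default test_size of 0.25 this yields 6549 training and 2183 test samples
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state = 1)
# Reshape each 128-dimensional vector into a 16x8 single-channel "image"
X_train = X_train.reshape(-1, 16, 8, 1)
X_test = X_test.reshape(-1, 16, 8, 1)
input_dim = (16, 8, 1)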
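The sample counts and the 16x8 layout can both be checked by hand. This sketch assumes scikit-learn's default test fraction of 0.25 (rounded up for the test set) and uses a simple counting vector in place of real features, so the numbers in the array are just mel band indices.

```python
import math
import numpy as np

n_samples = 8732
test_size = 0.25                              # assumed: the default when no size is given
n_test = math.ceil(n_samples * test_size)     # test set size is rounded up
n_train = n_samples - n_test
print(n_train, n_test)                        # 6549 2183

vector = np.arange(128)                       # stand-in: value = mel band index
image = vector.reshape(16, 8, 1)
print(image[1, :, 0])                         # second row holds bands 8..15
```

Each row of the 16x8 grid holds 8 consecutive mel bands, so vertical neighbors in the "image" are 8 bands apart. The convolution kernels therefore see a folded version of the frequency axis, not a time-frequency picture.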
III. Build and train the model
model = Sequential()
model.add(Conv2D(64, (3, 3), padding = "same", activation = "tanh", input_shape = input_dim))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), padding = "same", activation = "tanh"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.1))
model.add(Flatten())
model.add(Dense(1024, activation = "tanh"))
model.add(Dense(10, activation = "softmax"))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
model.fit(X_train, Y_train, epochs = 90, batch_size = 50, validation_data = (X_test, Y_test))
Epoch 1/90
131/131 [==============================] - 3s 4ms/step - loss: 1.5368 - accuracy: 0.4717 - val_loss: 1.3617 - val_accuracy: 0.5144
Epoch 2/90
131/131 [==============================] - 0s 2ms/step - loss: 1.1502 - accuracy: 0.6091 - val_loss: 1.1119 - val_accuracy: 0.6326
......
131/131 [==============================] - 0s 2ms/step - loss: 0.0481 - accuracy: 0.9835 - val_loss: 0.8535 - val_accuracy: 0.8653
Epoch 89/90
131/131 [==============================] - 0s 2ms/step - loss: 0.0511 - accuracy: 0.9818 - val_loss: 0.7716 - val_accuracy: 0.8694
Epoch 90/90
131/131 [==============================] - 0s 2ms/step - loss: 0.0502 - accuracy: 0.9829 - val_loss: 0.8673 - val_accuracy: 0.8630
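The loss values in the log come from the softmax output combined with categorical cross-entropy. The NumPy sketch below illustrates the two formulas on a toy example; the function names, the clipping constant, and the example logits are illustrative assumptions, not the Keras internals.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-probability assigned to the true class, per sample."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


probs = softmax(np.array([[2.0, 1.0, 0.1]]))
loss = categorical_crossentropy(np.array([[1.0, 0.0, 0.0]]), probs)
print(probs.sum(axis=-1), loss)

# A uniform guess over 10 classes gives a loss of ln(10), roughly 2.30
uniform_loss = categorical_crossentropy(np.eye(10)[:1], np.full((1, 10), 0.1))
print(uniform_loss)
```

Against that baseline of about 2.30, the first-epoch training loss of 1.5368 already reflects some learning, and the gap between the final training loss (about 0.05) and validation loss (about 0.87) suggests the model fits the training data much more closely than the held-out data.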
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 16, 8, 64)         640
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 8, 4, 64)          0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 8, 4, 128)         73856
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 2, 128)         0
_________________________________________________________________
dropout (Dropout)            (None, 4, 2, 128)         0
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0
_________________________________________________________________
dense (Dense)                (None, 1024)              1049600
_________________________________________________________________
dense_1 (Dense)              (None, 10)                10250
=================================================================
Total params: 1,134,346
Trainable params: 1,134,346
Non-trainable params: 0
_________________________________________________________________
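The parameter counts in the summary can be verified by hand: a convolution layer has one weight per kernel position and input channel plus one bias for each filter, and a dense layer has one weight per input plus one bias for each unit. The helper names below are hypothetical and exist only for this check.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias, times filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """One weight per input plus one bias, for every output unit."""
    return (in_units + 1) * out_units


counts = [
    conv2d_params(3, 3, 1, 64),        # conv2d
    conv2d_params(3, 3, 64, 128),      # conv2d_1
    dense_params(4 * 2 * 128, 1024),   # dense, fed by the flattened 4x2x128 tensor
    dense_params(1024, 10),            # dense_1
]
print(counts, sum(counts))
# [640, 73856, 1049600, 10250] 1134346
```

The first dense layer accounts for over 90% of the parameters, so it is the natural place to look when trying to shrink the model or reduce overfitting.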
predictions = model.predict(X_test)
score = model.evaluate(X_test, Y_test)
print(score)
69/69 [==============================] - 0s 1ms/step - loss: 0.8673 - accuracy: 0.8630
[0.8672816753387451, 0.8630325198173523]
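The step counters in the logs (131/131 during training, 69/69 during evaluation) follow from the sample counts and batch sizes. The sketch assumes that model.evaluate used its default batch size of 32, which the original call does not set explicitly; steps is a hypothetical helper.

```python
import math


def steps(n_samples, batch_size):
    """Number of batches needed to cover all samples; the last batch may be partial."""
    return math.ceil(n_samples / batch_size)


print(steps(6549, 50))   # training: 6549 samples, batch_size = 50
print(steps(2183, 32))   # evaluation: 2183 samples, assumed default batch size of 32
```

Note that the evaluation score matches the last epoch's val_loss and val_accuracy exactly because the same test split was passed as validation_data during training.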
Copyright notice
This article was written by [Classmate K]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231612566942.html