Tensorflow Experiment 4 -- Boston house price forecast
2022-04-22 09:41:00 【Alone.】
Boston house price forecast
The Boston housing dataset contains 506 samples. Each sample has 12 feature variables plus the average house price of the area. Since the price clearly depends on multiple feature variables rather than a single one, this is not a univariate (simple) linear regression problem; instead, we select multiple feature variables and build a linear equation, making it a multivariable (multiple) linear regression problem.
Dataset description

CRIM: per-capita crime rate by town
ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS: proportion of non-retail business land in the town
CHAS: 1 if the tract bounds the river, 0 otherwise
NOX: nitric oxide concentration
RM: average number of rooms per dwelling
AGE: proportion of owner-occupied units built before 1940
DIS: weighted distance to five Boston employment centers
RAD: index of accessibility to radial highways
TAX: full-value property tax rate per $10,000
PTRATIO: pupil-teacher ratio by town
LSTAT: percentage of lower-status population
MEDV: average house price, in thousands of dollars
Reading data
import tensorflow as tf  # the code below uses TF2 eager-mode APIs throughout
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
from sklearn.utils import shuffle
from sklearn.preprocessing import scale
print("Tensorflow The version is :",tf.__version__)

Import the data with Pandas
df = pd.read_csv("E:/wps/boston.csv",header=0) # The path here is the absolute path where you store the Boston house price file
print(df.describe())
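As a quick check that the file's columns line up with the 13 fields listed above (a sanity-check snippet I added; the expected shape comes from the dataset description):

# Expect 13 columns: the 12 features plus MEDV
print(df.columns.tolist())
print(df.shape)  # expected: (506, 13)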

Preview the data with Pandas
Display the first three rows
df.head(3)

Display the last three rows
df.tail(3)

Data set partitioning
Data preparation
ds = df.values
print(ds.shape)
print(ds)

Split the features from the labels
x_data = ds[:,:12]
y_data = ds[:,12]
print('x_data shape=',x_data.shape)
print('y_data shape=',y_data.shape)

Split into training, validation, and test sets
train_num = 300
valid_num = 100
test_num = len(x_data) - train_num - valid_num

x_train = x_data[:train_num]
y_train = y_data[:train_num]
x_valid = x_data[train_num:train_num+valid_num]
y_valid = y_data[train_num:train_num+valid_num]
x_test = x_data[train_num+valid_num:train_num+valid_num+test_num]
y_test = y_data[train_num+valid_num:train_num+valid_num+test_num]
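Note that this split simply takes the first 300, the next 100, and the remaining samples in file order. The shuffle imported from sklearn.utils above can optionally randomize the sample order first; a minimal sketch (run it before the split above; random_state=42 is an arbitrary seed chosen for reproducibility):

# Optional: shuffle samples so each subset is representative of the whole set
x_data, y_data = shuffle(x_data, y_data, random_state=42)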
Convert data type
x_train = tf.cast(scale(x_train),dtype=tf.float32)
x_valid = tf.cast(scale(x_valid),dtype=tf.float32)
x_test = tf.cast(scale(x_test),dtype=tf.float32)
Note that the scale() function is used here to standardize the features. If it is omitted, training becomes abnormal and train_loss and valid_loss produce no valid values (NaN).
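For reference, sklearn's scale() standardizes each column to zero mean and unit variance, which keeps the gradients well behaved. A minimal sketch of the equivalent computation (toy values of my own):

import numpy as np
from sklearn.preprocessing import scale

a = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
# scale() subtracts each column's mean and divides by its standard deviation
print(scale(a))
print((a - a.mean(axis=0)) / a.std(axis=0))  # same result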

Build the model
Define the model
The multiple linear regression model is still a simple linear function of the form 𝑦=𝑤∗𝑥+𝑏, except that here 𝑤 and 𝑏 are no longer scalars, so their shapes differ. By this model definition, the computation is a matrix multiplication, which is why the tf.matmul() function is called here.
def model(x, w, b):
    return tf.matmul(x, w) + b
Create variables to be optimized
W = tf.Variable(tf.random.normal([12,1],mean=0.0,stddev=1.0,dtype=tf.float32))
B = tf.Variable(tf.zeros(1),dtype = tf.float32)
print(W)
print(B)
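As a quick sanity check (a snippet I added, not part of the original experiment), applying the model to two training samples should give predictions of shape (2, 1):

# W has shape (12, 1), so a batch of shape (n, 12) maps to predictions of shape (n, 1)
print(model(x_train[:2], W, B).shape)  # expected: (2, 1)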

Model training
Set hyperparameters
This example uses the mini-batch gradient descent algorithm (MBGD) for optimization
training_epochs = 50
learning_rate = 0.001
batch_size = 10
batch_size is a hyperparameter that controls how many samples each mini-batch update uses; with train_num = 300 and batch_size = 10, each epoch performs 30 updates
Define the mean squared error loss function
def loss(x, y, w, b):
    # align y's shape and dtype with the (N, 1) float32 model output;
    # without this, (N, 1) - (N,) broadcasts to an (N, N) matrix
    y = tf.reshape(tf.cast(y, tf.float32), [-1, 1])
    err = model(x, w, b) - y
    squared_err = tf.square(err)
    return tf.reduce_mean(squared_err)
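A quick way to exercise the loss function (an illustrative snippet with made-up values): it should return a single scalar tensor for any batch:

# Two made-up samples with 12 features each; the y values are arbitrary
x_toy = tf.ones([2, 12], dtype=tf.float32)
y_toy = np.array([24.0, 20.0])
print(loss(x_toy, y_toy, W, B))  # a 0-dimensional (scalar) tensor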
Define the gradient calculation function
def grad(x, y, w, b):
    with tf.GradientTape() as tape:
        loss_ = loss(x, y, w, b)
    return tape.gradient(loss_, [w, b])
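tf.GradientTape() records the operations executed inside its context so gradients can be computed afterwards; a minimal standalone illustration (not part of the original experiment):

v = tf.Variable(3.0)
with tf.GradientTape() as tape:
    f = tf.square(v)        # f = v^2
print(tape.gradient(f, v))  # df/dv = 2v = 6.0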

Choose the optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate)
tf.keras.optimizers.SGD() declares a gradient descent optimizer (Optimizer) whose learning rate is given as an argument. The optimizer updates the model parameters from the computed gradients so as to minimize the loss function; concretely, this is done by calling its apply_gradients() method.
Iterative training
loss_list_train = []
loss_list_valid = []
total_step = int(train_num / batch_size)

for epoch in range(training_epochs):
    for step in range(total_step):
        xs = x_train[step*batch_size:(step+1)*batch_size, :]
        ys = y_train[step*batch_size:(step+1)*batch_size]
        grads = grad(xs, ys, W, B)                     # gradients for this mini-batch
        optimizer.apply_gradients(zip(grads, [W, B]))  # update W and B
    loss_train = loss(x_train, y_train, W, B).numpy()  # record full-set losses once per epoch
    loss_valid = loss(x_valid, y_valid, W, B).numpy()
    loss_list_train.append(loss_train)
    loss_list_valid.append(loss_valid)
    print("epoch={:3d}, train_loss={:.4f}, valid_loss={:.4f}".format(epoch+1, loss_train, loss_valid))


After this run, both train_loss and valid_loss show valid values
Visualize the loss values
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.plot(loss_list_train,'blue',label="Train Loss")
plt.plot(loss_list_valid,'red',label='Valid Loss')
plt.legend(loc=1)

Note that the more times you execute the training here, the larger the loss values get and the wider the gap between the two curves becomes

View the loss of the test set
print("Test_loss:{:.4f}".format(loss(x_test,y_test,W,B).numpy()))
Pick a sample at random from the test set
test_house_id = np.random.randint(0, test_num)
y = y_test[test_house_id]
y_pred = model(x_test, W, B)[test_house_id]
y_pred_value = tf.reshape(y_pred, ()).numpy()  # extract the scalar prediction
print("House id", test_house_id, "Actual value", y, "Predicted value", y_pred_value)

That's it!
Copyright notice
This article was created by [Alone.]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220932045825.html