df['sale_yr'] = pd.to_numeric(df.date.str.slice(0, 4))
df['sale_month'] = pd.to_numeric(df.date.str.slice(4, 6))
df['sale_day'] = pd.to_numeric(df.date.str.slice(6, 8))
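As a quick sanity check, the slicing approach above can be exercised on a toy frame (the date strings here are made up for illustration, in the same YYYYMMDD-prefixed format as the dataset):

```python
import pandas as pd

# Hypothetical sample rows: dates stored as strings starting with YYYYMMDD
toy = pd.DataFrame({'date': ['20141013T000000', '20150225T000000']})

# Same fixed-position slicing as above: year, month, day
toy['sale_yr'] = pd.to_numeric(toy.date.str.slice(0, 4))
toy['sale_month'] = pd.to_numeric(toy.date.str.slice(4, 6))
toy['sale_day'] = pd.to_numeric(toy.date.str.slice(6, 8))

print(toy[['sale_yr', 'sale_month', 'sale_day']].values.tolist())
# [[2014, 10, 13], [2015, 2, 25]]
```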
X = df[['sale_yr','sale_month','sale_day',
'bedrooms','bathrooms','sqft_living','sqft_lot','floors',
'condition','grade','sqft_above','sqft_basement','yr_built',
'zipcode','lat','long','sqft_living15','sqft_lot15']]
y = df['price']
We use train_test_split from scikit-learn to split the data into train and test sets, and then split the training set again to carve out a validation set. To achieve a 60%, 20%, 20% split into training, validation, and test sets, we first hold out 20% of the data for testing, then split the remaining 80% into 25% validation and 75% training (0.25 x 0.8 = 0.2 of the original data).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.25,
                                                      random_state=2018)
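To confirm that the two-stage split really yields the intended 60/20/20 proportions, here is a small check on synthetic data (the array of 1000 rows is arbitrary; only the resulting sizes matter):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 1000 synthetic rows; we only inspect the split sizes
X_demo = np.arange(1000).reshape(-1, 1)
y_demo = np.arange(1000)

# First split: hold out 20% for testing
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
# Second split: 25% of the remaining 80% -> 20% of the original data
X_tr, X_va, y_tr, y_va = train_test_split(X_tr, y_tr, test_size=0.25, random_state=2018)

print(len(X_tr), len(X_va), len(X_te))  # 600 200 200
```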
In this example, we use mean squared error as the loss function to evaluate the goodness of the model at each step, we use the efficient gradient-descent optimizer RMSprop for training the model, and we use mean absolute error as the metric to report the regression accuracy:
from keras import metrics

kmodel.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=[metrics.mae])
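The two error measures used above differ only in how they penalize residuals: squared error makes large misses dominate, while absolute error weights every miss linearly. A minimal NumPy sketch (with hypothetical prices and predictions) makes the distinction concrete:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: large residuals dominate the loss
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean absolute error: every residual contributes linearly
    return np.mean(np.abs(y_true - y_pred))

# Hypothetical sale prices and model predictions (in thousands)
y_true = np.array([300.0, 450.0, 500.0, 550.0])
y_pred = np.array([310.0, 440.0, 530.0, 540.0])

print(mse(y_true, y_pred))  # 300.0 -- the single 30-unit miss dominates
print(mae(y_true, y_pred))  # 15.0  -- each miss counts in proportion to its size
```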