[ Can a Convolutional Neural Network diagnose COVID-19 from lung CT scans? ]

Unfortunately, we did not get a significant improvement: the model still overfits after 10–15 epochs, where the training loss keeps decreasing while the validation loss starts to increase. Another problem is that the validation loss is low right from the start, which suggests we simply got lucky with the initial parameters, so the model may not be robust (remember that since we don't have a test set, we want to re-evaluate the model on a new split). If we check the model summary, we will see that the model has 4,987,361 parameters, a huge number for such a small dataset. Let's try to reduce that count by adding more convolutional layers with max-pooling: each extra pooling layer halves the feature maps, so the flattened input to the dense layers (where most of the parameters live) gets much smaller. We will also add several dense layers to see whether this improves performance:

def create_model():
    model = Sequential([
        Conv2D(16, 1, padding='same', activation='relu',
               input_shape=(img_height, img_width, 1)),
        MaxPooling2D(),
        Conv2D(32, 3, padding='same', activation='relu'),
        MaxPooling2D(),
        Conv2D(64, 5, padding='same', activation='relu'),
        MaxPooling2D(),
        Conv2D(64, 5, padding='same', activation='relu'),
        MaxPooling2D(),
        Conv2D(64, 5, padding='same', activation='relu'),
        MaxPooling2D(),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.4),
        Dense(64, activation='relu'),
        Dropout(0.5),
        Dense(8, activation='relu'),
        Dropout(0.3),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer=OPTIMIZER,
                  loss='binary_crossentropy',
                  metrics=['accuracy', 'Precision', 'Recall'])
    return model

Now the model has 671,185 parameters, a significantly smaller number. However, if we train it, we run into the opposite problem: the model becomes too "pessimistic" and predicts COVID for every patient. It appears that we made the model too simple. Let's remove one of the convolution and pooling blocks, which leaves a larger feature map at the Flatten layer and gives the dense layers more capacity:

def create_model():
    model = Sequential([
        Conv2D(16, 1, padding='same', activation='relu',
               input_shape=(img_height, img_width, 1)),
        MaxPooling2D(),
        Conv2D(32, 3, padding='same', activation='relu'),
        MaxPooling2D(),
        Conv2D(64, 5, padding='same', activation='relu'),
        MaxPooling2D(),
        Conv2D(64, 5, padding='same', activation='relu'),
        MaxPooling2D(),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.4),
        Dense(64, activation='relu'),
        Dropout(0.5),
        Dense(8, activation='relu'),
        Dropout(0.3),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer=OPTIMIZER,
                  loss='binary_crossentropy',
                  metrics=['accuracy', 'Precision', 'Recall'])
    return model

This model has 2,010,513 parameters: roughly three times more than the "not complex enough" model, but less than half of the "too complex" one. It is therefore computationally cheaper and easier to train than the original.

Now we start seeing quite good results. During training, the model still passes through the "predict positive for everyone" stage, but it overcomes it and returns to making proper predictions. It still starts to overfit after around 40 epochs (and we see the same picture with every resplit of the data), so we will train the model for 40 epochs and evaluate it afterwards.
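Here is a minimal sketch of what that final training run could look like. It reuses create_model() from above; the names train_ds and val_ds are assumptions standing in for the training and validation pipelines built earlier in the article:

model = create_model()

# Assumed names: train_ds / val_ds are the datasets prepared earlier.
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=40  # stop right before the ~40-epoch overfitting point
)

# Evaluate on the validation split, since we have no separate test set.
# evaluate() returns values in the order given to compile().
loss, accuracy, precision, recall = model.evaluate(val_ds)
print(f'loss={loss:.3f} accuracy={accuracy:.3f} '
      f'precision={precision:.3f} recall={recall:.3f}')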
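And since the conclusion rests on seeing the same picture with every resplit, the check could look roughly like the sketch below. The directory path 'ct_scans/' and the loader settings are assumptions for illustration, not the article's actual data-loading code:

import tensorflow as tf

for seed in (1, 2, 3):
    # Hypothetical path and loader settings; swap in the real ones.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        'ct_scans/', validation_split=0.2, subset='training', seed=seed,
        label_mode='binary', color_mode='grayscale',
        image_size=(img_height, img_width))
    val_ds = tf.keras.utils.image_dataset_from_directory(
        'ct_scans/', validation_split=0.2, subset='validation', seed=seed,
        label_mode='binary', color_mode='grayscale',
        image_size=(img_height, img_width))

    model = create_model()  # fresh random weights for every resplit
    model.fit(train_ds, validation_data=val_ds, epochs=40, verbose=0)
    print(f'seed {seed}:', model.evaluate(val_ds, verbose=0))

If the metrics stay in the same ballpark across seeds, the result does not depend on one lucky split.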