Grayscale_rnn

 
Samples of grayscale_rnn's output after approximately 1000 training epochs

'grayscale_rnn' is a neural network I built and trained to generate grayscale art. See the project on GitHub, and adapt it for your own use.

I restricted the output to the grayscale domain simply to reduce computational cost. I chose to train 'grayscale_rnn' on sumi-e artworks, i.e. ink wash paintings with origins in the Far East (sumi-e being the Japanese term specifically, though I can't vouch for the authenticity of my dataset), particularly those depicting natural scenes.

I chose these artworks firstly because I thought they would retain much of their quality when reduced to plain grayscale (though in the end I would say some small aspects of their character were definitely lost; my theory is that this is down to the slight tinges of blue in the ink and yellow in the canvas of the originals).

Another reason for using ink wash paintings as training data is that they have a fairly clear aesthetic - there are certain features you would expect to see that indicate the style of work you're looking at - but within that there's space for a bit of abstraction. That means there's a reasonably solid standard by which to judge the output, but also some room for AI weirdness.

I resized my training images to 60x90 pixels, again in the interest of time and computational cost. The code goes through the dataset and extracts every pattern of 60 consecutive pixels (over 240,000 in total). Each sequence of 60 pixel colour values forms a single input sample, and the next pixel value (i.e. the 61st) is the corresponding expected output.
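The extraction step can be sketched roughly like this (a minimal numpy version; the function name, normalisation, and the random test image are my own assumptions, not the project's actual code):

```python
import numpy as np

SEQ_LEN = 60  # pixels per input sequence, matching the 60-pixel window above

def extract_sequences(image, seq_len=SEQ_LEN):
    """Slide a window over the flattened image, collecting
    (sequence of seq_len pixels, next pixel) training pairs."""
    pixels = image.flatten().astype(np.float32) / 255.0  # normalise to [0, 1]
    X, y = [], []
    for i in range(len(pixels) - seq_len):
        X.append(pixels[i:i + seq_len])
        y.append(pixels[i + seq_len])
    return np.array(X), np.array(y)

# A 60x90 image flattens to 5400 pixels, giving 5400 - 60 = 5340 pairs,
# so a few dozen images yield a dataset in the hundreds of thousands.
img = np.random.randint(0, 256, size=(90, 60), dtype=np.uint8)
X, y = extract_sequences(img)
print(X.shape, y.shape)  # (5340, 60) (5340,)
```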

Samples of grayscale_rnn's output after approximately 1000 training epochs

After experimenting with the model a little, I found that an LSTM-based model with two hidden layers of 512 units each gave the best results in the least time. Similarly, a batch size of 450 seemed the best choice of those I tried. The optimiser was RMSProp, and though I initially started with a fixed learning rate, I found that after a few hundred epochs it benefited from being decreased.
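In Keras, that architecture would look something like the sketch below. The two 512-unit LSTM layers and the RMSProp optimiser come from the description above; the single-channel input shape, sigmoid output, MSE loss, and initial learning rate are assumptions for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 60  # length of each input pixel sequence

# Two hidden LSTM layers of 512 units each, predicting the next pixel value
model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, 1)),        # one grayscale channel per step
    layers.LSTM(512, return_sequences=True),
    layers.LSTM(512),
    layers.Dense(1, activation="sigmoid"),   # next pixel value in [0, 1]
])

# RMSProp, with a learning rate that can be lowered after a few hundred
# epochs (the starting value here is a guess, not the project's setting)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss="mse")
model.summary()
```

Training would then proceed with `model.fit(X[..., None], y, batch_size=450, ...)` on the sequence/next-pixel pairs described above.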

I didn't use cross-validation data because, in my opinion, the output speaks for itself with regard to the model's performance, but the code on GitHub accommodates cross-validation.

The end results can sometimes be quite spectacular. Equally often, though, the prediction model gets stuck in a run of white pixels and produces total rubbish, with everything in between. Perhaps a temperature setting would allow it to break out of those moments when it seems stuck in a loop. Nonetheless, when it does produce its best work, it shows an understanding of features ranging from mountains and coastlines to trees and even people, and generally grasps the compositional relationships these objects should have to each other.
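For reference, temperature sampling (the standard char-rnn trick) would look something like this. This is a generic sketch, not part of the project, and it assumes the model outputs a probability distribution over the 256 grayscale levels rather than a single value:

```python
import numpy as np

def sample_pixel(probs, temperature=1.0, rng=None):
    """Sample a pixel value from a distribution over the 256 grayscale
    levels. Temperature < 1 sharpens the distribution towards the model's
    favourite; temperature > 1 flattens it, which is what lets the sampler
    escape a run of near-certain white pixels."""
    rng = rng or np.random.default_rng()
    logits = np.log(np.clip(probs, 1e-8, 1.0)) / temperature
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    p = exp / exp.sum()
    return rng.choice(len(p), p=p)

# A model that is almost certain the next pixel is white (level 255):
probs = np.full(256, 1e-4)
probs[255] = 1.0 - probs[:255].sum()

# At low temperature the sampler almost always picks white; at high
# temperature darker pixels get a real chance, breaking the white loop.
print(sample_pixel(probs, temperature=0.5))
print(sample_pixel(probs, temperature=2.0))
```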