Gram Matrix

The Gram matrix arises from a set of functions in a finite-dimensional space: its entries are the inner products of the basis functions of the finite-dimensional subspace. We need it to compute the style loss, but we have not yet been shown why the style loss is computed using the Gram matrix. The short answer is that the Gram matrix captures the "distribution of features" of a set of feature maps in a given layer: each entry measures how strongly two feature maps are activated together, regardless of where in the image that happens.
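As a minimal sketch of the computation, assuming a layer's activations are stored as a NumPy array of shape (height, width, channels), the Gram matrix can be obtained by flattening the spatial dimensions and taking all pairwise inner products between feature maps. The function name gram_matrix and the toy input are illustrative and not part of the original code.

```python
import numpy as np


def gram_matrix(features):
    """Return the (channels, channels) Gram matrix of a (H, W, C) activation array."""
    height, width, channels = features.shape
    # Each column of `flat` is one feature map unrolled into a vector
    flat = features.reshape(height * width, channels)
    # Entry (i, j) is the inner product of feature maps i and j
    return flat.T @ flat


# Toy activations: a 2x2 spatial grid with 2 feature maps
features = np.array(
    [[[1.0, 0.0], [2.0, 1.0]],
     [[0.0, 3.0], [1.0, 1.0]]]
)
gram = gram_matrix(features)
print(gram)
```

The result is symmetric, and its size depends only on the number of feature maps, not on the spatial size of the layer.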

Note: We don't think the above question has been answered satisfactorily, so let us take a shot at explaining it more intuitively. Suppose we have two sets of feature maps. For simplicity, each set contains only three feature maps, and two of them are entirely inactive. In the first set, the first feature map looks like a nature picture; in the second set, the first feature map looks like a dark cloud. If we then calculate the content and style losses between the two sets manually, the style loss sees no difference while the content loss does.

In other words, we have not lost style information between the two feature map sets. However, the content is different.
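The same effect can be reproduced numerically. The sketch below uses made-up 4x4 activations (not the values from the example above): both sets have one active feature map and two inactive ones, and the active map of the second set holds the same activations as the first at different spatial positions. The content loss is taken here as a mean squared difference of activations and the style loss as a mean squared difference of Gram matrices; both definitions, and the helper name gram_matrix, are simplifying assumptions.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of a (H, W, C) activation array, shape (C, C)."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat


# Set A: one active feature map (a left-to-right gradient), two inactive maps
set_a = np.zeros((4, 4, 3))
set_a[:, :, 0] = np.tile(np.arange(4.0), (4, 1))

# Set B: the same activations, rearranged into a top-to-bottom gradient
set_b = np.zeros((4, 4, 3))
set_b[:, :, 0] = set_a[:, :, 0].T

content_loss = np.mean((set_a - set_b) ** 2)
style_loss = np.mean((gram_matrix(set_a) - gram_matrix(set_b)) ** 2)

print("content loss:", content_loss)  # non-zero: the spatial layout differs
print("style loss:", style_loss)      # zero: the Gram matrices are identical
```

Because the Gram matrix sums over all spatial positions, rearranging where the activations occur changes the content comparison but leaves the style comparison untouched.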

Understanding the style loss

Final loss

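The final loss is defined as a weighted sum of the content loss and the style loss:

L_total = α · L_content + β · L_style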

Here, α and β are user-defined hyperparameters, and β has absorbed the M^l normalization factor defined earlier. By controlling α and β, we can control the amount of content and style injected into the generated image. The paper also shows a nice visualization of the effects of different α and β values.
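To make the role of the two weights concrete, here is a small sketch of the weighted sum. The function name total_loss and the loss values are hypothetical and chosen only to show how the ratio between the weights shifts the balance.

```python
def total_loss(content_loss, style_loss, alpha, beta):
    """Weighted sum of the content and style losses."""
    return alpha * content_loss + beta * style_loss


# Hypothetical loss values for one generated image
content_loss, style_loss = 4.0, 0.5

# A larger beta relative to alpha makes the style term dominate the objective
balanced = total_loss(content_loss, style_loss, alpha=1.0, beta=1.0)
style_heavy = total_loss(content_loss, style_loss, alpha=1.0, beta=100.0)
print(balanced, style_heavy)
```

With equal weights the content term contributes most of the total in this toy case; raising β makes the optimizer care far more about reducing the style term.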

Defining the optimizer

Next, we use the Adam optimizer to minimize the loss of the network.
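For intuition about what the optimizer does at each step, the following is a minimal NumPy sketch of the Adam update rule applied to a toy problem: pulling a small "generated" array towards a fixed target by minimizing a squared error. It is not the TensorFlow optimizer used in this tutorial; the function name adam_step, the toy loss, and the hyperparameter values are illustrative assumptions.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


target = np.array([0.2, 0.5, 0.9])  # stand-in for the values we want to reach
generated = np.zeros(3)             # stand-in for the generated image
m, v = np.zeros(3), np.zeros(3)

initial_loss = np.sum((generated - target) ** 2)
for t in range(1, 301):
    grad = 2 * (generated - target)  # gradient of the squared error
    generated, m, v = adam_step(generated, grad, m, v, t)
final_loss = np.sum((generated - target) ** 2)

print(initial_loss, final_loss)
```

The per-element scaling by the running squared gradient is what lets Adam take similarly sized steps for pixels whose gradients differ widely in magnitude.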

Defining the input pipeline

Here we describe the full input pipeline. tf.data provides an easy-to-use and intuitive interface for implementing input pipelines. For most image manipulation tasks we can use the tf.image API; however, the ability of tf.image to handle dynamically sized images is limited.

For example, if we want to crop and resize images dynamically, it is better to do so in a generator, as implemented below.

We have defined two input pipelines: one for style and one for content. The content input pipeline looks only for jpg images whose names start with content_, whereas the style pipeline looks for images whose names start with style_.

import os
from random import shuffle

import numpy as np
import tensorflow as tf
from PIL import Image


def image_gen_func(data_dir, file_match_str, do_shuffle=True):
    """
    This function returns a processed image and the color channel mean values.
    This is a generator function that is used by the tf.data API.
    """
    # Load the filenames that start with the given prefix
    files = [f for f in os.listdir(data_dir) if f.startswith(file_match_str)]
    if do_shuffle:
        shuffle(files)

    # Per-channel mean with shape (1, 1, 3); vgg_mean is defined elsewhere
    mean = np.array([[vgg_mean]])

    # For each file, preprocess the image
    for f in files:
        img = Image.open(os.path.join(data_dir, f))

        width, height = img.size

        # Here, we crop the image to a square by cropping on the longer axis
        if width < height:
            left, right = 0, width
            top, bottom = (height - width) // 2, ((height - width) // 2) + width
        elif width > height:
            top, bottom = 0, height
            left, right = (width - height) // 2, ((width - height) // 2) + height
        else:
            # Already square: the crop box covers the whole image
            left, top, right, bottom = 0, 0, width, height

        # image_size is defined elsewhere
        arr = np.array(
            img.crop((left, top, right, bottom)).resize((image_size, image_size))
        ).astype(np.float32)
        yield (arr, mean)


def load_images_iterator(gen_func, zero_mean=False):
    """This function returns a dataset iterator of the tf.data API."""
    # input_shape is defined elsewhere; input_shape[1:] is the shape of one image
    image_dataset = tf.data.Dataset.from_generator(
        gen_func,
        output_types=(tf.float32, tf.float32),
        output_shapes=(tf.TensorShape(input_shape[1:]), tf.TensorShape([1, 1, 3]))
    )

    # If true, the mean will be subtracted
    # Assumed completion (not shown in the listing): subtract the mean,
    # serve one image per batch, and return a TF 1.x one-shot iterator.
    if zero_mean:
        image_dataset = image_dataset.map(lambda x, y: (x - y, y))

    image_dataset = image_dataset.batch(1)
    return image_dataset.make_one_shot_iterator()
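The center-crop arithmetic used by the generator above can be checked in isolation. The helper below is a hypothetical stand-alone version (the name center_crop_box is not part of the original code) that returns the (left, top, right, bottom) box in the order expected by PIL's Image.crop, using integer division so that the coordinates stay whole numbers.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


print(center_crop_box(6, 10))  # portrait image: cropped along the height
print(center_crop_box(10, 6))  # landscape image: cropped along the width
print(center_crop_box(5, 5))   # already square: the box covers the whole image
```

Handling the square case through the same formula avoids a separate branch, since the offsets simply become zero.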

Defining the computational graph

We now define the full computational graph. This involves the following steps:

1. Define iterators which provide inputs
2. Define the inputs and CNN variables
3. Define the content, style, and total loss
4. Define the optimization operation

from functools import partial

import tensorflow as tf

# The helper functions (define_inputs, define_tf_weights, define_content_loss,
# define_style_loss) and the globals (input_shape, vgg_layers, pool_inds,
# alpha, beta) are assumed to be defined earlier.
config = tf.ConfigProto(allow_soft_placement=True)

# 1. Define the input pipeline in this step
part_style_gen_func = partial(image_gen_func, 'data', "style_")
part_content_gen_func = partial(image_gen_func, 'data', "content_")

style_iter = load_images_iterator(part_style_gen_func, zero_mean=True)
content_iter = load_images_iterator(part_content_gen_func, zero_mean=True)

# 2. Defining the inputs and weights
inputs = define_inputs(input_shape)
define_tf_weights()

layer_ids = list(vgg_layers.keys())

# gen_ph is a placeholder used to initialize the generated image with pixel
# values (for example, try initializing it with white noise).
# init_generated is the corresponding initialization operation.
gen_ph = tf.placeholder(shape=input_shape, dtype=tf.float32)
init_generated = tf.assign(inputs["generated"], gen_ph)

# 3. Loss
# 3.1 Content loss
c_loss = define_content_loss(
    inputs=inputs,
    layer_ids=layer_ids, pool_inds=pool_inds, c_weight=alpha
)

# 3.2 Style loss
layer_weights_ph = tf.placeholder(shape=[len(layer_ids)], dtype=tf.float32, name='layer_weights')
s_loss = define_style_loss(
    inputs=inputs,
    layer_ids=layer_ids, pool_inds=pool_inds, s_weight=beta, layer_weights=None
)
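The use of partial in step 1 deserves a brief illustration. tf.data.Dataset.from_generator expects a callable that can be invoked without arguments (unless its args parameter is used), so the directory and filename prefix are bound in advance. The sketch below shows the same pattern with a hypothetical stand-in generator, list_matching, which filters a list of names instead of reading a directory.

```python
from functools import partial


def list_matching(names, prefix):
    """Yield the names that start with `prefix` (stand-in for an image generator)."""
    for name in names:
        if name.startswith(prefix):
            yield name


names = ["style_1.jpg", "content_1.jpg", "style_2.jpg"]

# Bind both positional arguments; the resulting callables take no arguments
style_gen_func = partial(list_matching, names, "style_")
content_gen_func = partial(list_matching, names, "content_")

style_names = list(style_gen_func())
content_names = list(content_gen_func())
print(style_names)
print(content_names)
```

Each call to the bound function creates a fresh generator, which is what allows a dataset to restart iteration when needed.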
