Tensorflow - part 4: Graph in Tensorflow

Before starting, you should read this post: Eager execution vs Graph execution.

These are the two types of execution in Tensorflow. Eager execution is easier to use, but Graph execution is faster. Moreover, the latest versions of Tensorflow already offer a way to implement a model in Eager mode and execute it in Graph mode, so we can efficiently get the best of both worlds.

First, let's import necessary packages:

import tensorflow as tf
import timeit
from datetime import datetime

Intuitive example

The code below gives you an intuition about how graph execution runs in Tensorflow. All the operations inside a_python_function run in eager execution mode. Tensorflow provides a simple way to execute this function in graph mode: we just need to wrap a_python_function with tf.function.

def a_python_function(tensor_a, tensor_x, tensor_b): # Eager execution
  tensor_x = tf.matmul(tensor_a, tensor_x)
  tensor_x = tensor_x + tensor_b
  return tensor_x

a_tensorflow_function = tf.function(a_python_function) # Graph execution

# Inputs
tensor_input_a = tf.constant([[1.0, 2.0, 3.0]])
tensor_input_x = tf.constant([[4.0], [5.0], [6.0]])
tensor_input_b = tf.constant(7.0)

python_value = a_python_function(tensor_input_a, tensor_input_x, tensor_input_b)
tensorflow_value = a_tensorflow_function(tensor_input_a, tensor_input_x, tensor_input_b)
print(python_value)
print(tensorflow_value)
print(python_value.numpy())
print(tensorflow_value.numpy())

Output

tf.Tensor([[39.]], shape=(1, 1), dtype=float32)
tf.Tensor([[39.]], shape=(1, 1), dtype=float32)
[[39.]]
[[39.]]

According to the Tensorflow guide, tf.function returns a Function that encapsulates several tf.Graphs behind one API. From this, we get the graph advantages of speed and deployability.
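A quick, illustrative check of this (the exact class path printed may differ between Tensorflow versions):

print(type(a_python_function))     # <class 'function'> - a plain Python function
print(type(a_tensorflow_function)) # a Tensorflow Function object that manages the underlying graphs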

tf.function as a decorator

Another way to utilize tf.function is to use it as a decorator on a function that you want to execute in graph mode. tf.function takes effect on that function and all the sub-functions it calls.

@tf.function
def a_python_function_2(tensor_b, tensor_c, tensor_d, tensor_e, tensor_f, tensor_g):
  tensor_a = tf.concat([tensor_c, tensor_d], axis=1)
  tensor_x = tf.stack([tensor_e, tensor_f, tensor_g])

  return a_python_function(tensor_a, tensor_x, tensor_b)


tensor_input_b = tf.constant(7.0)
tensor_input_c = tf.constant([[1.0]])
tensor_input_d = tf.constant([[2.0, 3.0]])
tensor_input_e = tf.constant([4.0])
tensor_input_f = tf.constant([5.0])
tensor_input_g = tf.constant([6.0])

tensorflow_value = a_python_function_2(tensor_input_b, tensor_input_c, tensor_input_d, tensor_input_e, tensor_input_f, tensor_input_g)
print(tensorflow_value)

Output

tf.Tensor([[39.]], shape=(1, 1), dtype=float32)

Converting Python functions to graphs

The statements in a Python function may be Tensorflow operations or native Python constructs (if-else, return, break, continue, ...). Tensorflow operations are effortlessly captured into the graph by tf.Graph.

To become a member of the graph, however, a Python construct needs to be passed through AutoGraph (tf.autograph), which tf.function runs internally.

def leaky_relu(z, alpha):
  z = tf.cast(z, dtype=tf.float32)
  if tf.greater(z, 0):
    return z
  else:
    return tf.math.multiply(alpha, z)

tensorflow_leaky_relu = tf.function(leaky_relu)
alpha = tf.constant(0.6)
z1 = tf.constant(2, dtype=tf.float32)
z2 = tf.constant(-2, dtype=tf.float32)
print("Case 1 - z > 0: ", tensorflow_leaky_relu(z1, alpha))
print("Case 2 - z < 0: ", tensorflow_leaky_relu(z2, alpha))

Output

Case 1 - z > 0:  tf.Tensor(2.0, shape=(), dtype=float32)
Case 2 - z < 0:  tf.Tensor(-1.2, shape=(), dtype=float32)

Viewing the AutoGraph form of leaky_relu

print(tf.autograph.to_code(leaky_relu))

Look at if_body(), which represents the if clause above, and else_body(), which represents the else clause.

Output

def tf__leaky_relu(z, alpha):
    with ag__.FunctionScope('leaky_relu', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()
        z = ag__.converted_call(ag__.ld(tf).cast, (ag__.ld(z),), dict(dtype=ag__.ld(tf).float32), fscope)

        def get_state():
            return (do_return, retval_)

        def set_state(vars_):
            nonlocal retval_, do_return
            (do_return, retval_) = vars_

        def if_body():
            nonlocal retval_, do_return
            try:
                do_return = True
                retval_ = ag__.ld(z)
            except:
                do_return = False
                raise

        def else_body():
            nonlocal retval_, do_return
            try:
                do_return = True
                retval_ = ag__.converted_call(ag__.ld(tf).math.multiply, (ag__.ld(alpha), ag__.ld(z)), None, fscope)
            except:
                do_return = False
                raise
        ag__.if_stmt(ag__.converted_call(ag__.ld(tf).greater, (ag__.ld(z), 0), None, fscope), if_body, else_body, get_state, set_state, ('do_return', 'retval_'), 2)
        return fscope.ret(retval_, do_return)

To get the exact graph

print(tensorflow_leaky_relu.get_concrete_function(tf.constant(-2, dtype=tf.float32), tf.constant(0.6)).graph.as_graph_def())

You can see there are many types of nodes in the output.

Output

node {
  name: "z"
  op: "Placeholder"
  attr {
    key: "_user_specified_name"
    value {
      s: "z"
    }
  }
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "shape"
    value {
      shape {
      }
    }
  }
}
node {
  name: "alpha"
  op: "Placeholder"
  attr {
    key: "_user_specified_name"
    value {
      s: "alpha"
    }
  }
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "shape"
    value {
      shape {
      }
    }
  }
}
node {
  name: "Greater/y"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_FLOAT
        tensor_shape {
        }
        float_val: 0.0
      }
    }
  }
}
...

versions {
  producer: 716
  min_consumer: 12
}
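If the full graph proto is too verbose, here is a compact sketch that prints just the op type and name of each node of the same concrete function:

graph = tensorflow_leaky_relu.get_concrete_function(tf.constant(-2, dtype=tf.float32), tf.constant(0.6)).graph
for node in graph.as_graph_def().node:
  print(node.op, "->", node.name) # e.g. Placeholder -> z, Const -> Greater/y, ...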

According to the Tensorflow guide, tf.function works for most cases, but there are some caveats.

You can find more help in Better performance with tf.function and the AutoGraph reference.

Polymorphism: one Function, many graphs

Recall that tf.function converts a Python function into a Function for graph execution. This Function can then be seen as a tf.Graph creator: with each new set of function arguments (new in dtype or new in shape), Function creates a new tf.Graph for that set. Because of this, a Function is said to be polymorphic.

Notes:

  • Only when there is an argument with a new dtype or a new shape does Function make a new graph.
  • This dtype and shape combination is called the "signature" of the inputs.
  • The tf.Graph corresponding to each signature is wrapped in a ConcreteFunction.

print(tensorflow_leaky_relu(tf.constant(3), tf.constant(0.6)))
print(tensorflow_leaky_relu(tf.constant(3.3), tf.constant(0.6)))

Output

tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(3.3, shape=(), dtype=float32)

To check the versions of tensorflow_leaky_relu

print(tensorflow_leaky_relu.pretty_printed_concrete_signatures())

Output

leaky_relu(z, alpha)
  Args:
    z: float32 Tensor, shape=()
    alpha: float32 Tensor, shape=()
  Returns:
    float32 Tensor, shape=()

leaky_relu(z, alpha)
  Args:
    z: int32 Tensor, shape=()
    alpha: float32 Tensor, shape=()
  Returns:
    float32 Tensor, shape=()

You can see that there are 2 versions of leaky_relu: one where z is float32 and one where z is int32.
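Calling again with an already-seen signature reuses the existing graph instead of tracing a new one. A quick sketch to confirm:

tensorflow_leaky_relu(tf.constant(5.0), tf.constant(0.6)) # float32 scalars again: no new graph
print(tensorflow_leaky_relu.pretty_printed_concrete_signatures()) # still shows only the 2 versions above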

Now, let's try when z is a multi-value tensor.

print(tensorflow_leaky_relu(tf.constant([1, 2]), tf.constant(0.6)))

This raises a ValueError:

Output

ValueError: The condition of if statement expected to be tf.bool scalar, got Tensor("Greater:0", shape=(2,), dtype=bool); to check for None, use is not None.

The function cannot be used with a multi-value tensor, because the condition of the if statement is no longer a scalar, as the quick check below shows. Let's modify it.
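A quick check of what the condition evaluates to on a multi-value input:

print(tf.greater(tf.constant([1, 2]), 0))
# tf.Tensor([ True  True], shape=(2,), dtype=bool) -- a boolean vector, while the if statement needs a scalar tf.bool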

Defining a leaky_relu that can receive a multi-value tensor as input

We know from above that leaky_relu cannot receive a multi-value tensor as input. Let's define a function multi_leaky_relu that uses tf.map_fn to deal with this problem.

A basic call receives 2 important arguments: tf.map_fn(fn, elems). The functionality of tf.map_fn is to apply the function fn to each element in elems. If fn requires more than one argument to operate, elems has to be a tuple of multiple arguments, each of which is a multi-value tensor; the values of these arguments are then processed in an element-wise way.
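Here is a minimal sketch of tf.map_fn with a tuple of inputs; fn receives one element from each tensor at a time, so this computes the element-wise products [1*4, 2*5, 3*6]:

tensor_p = tf.constant([1.0, 2.0, 3.0])
tensor_q = tf.constant([4.0, 5.0, 6.0])
print(tf.map_fn(lambda x: x[0] * x[1], (tensor_p, tensor_q), dtype=tf.float32))
# tf.Tensor([ 4. 10. 18.], shape=(3,), dtype=float32)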

def multi_leaky_relu(z, alpha):
  elems = (z, alpha) # a tuple, because leaky_relu requires two arguments
  result = tf.map_fn(lambda x: leaky_relu(x[0], x[1]), elems, dtype=tf.float32) # apply leaky_relu element-wise over the (z, alpha) pairs
  # Check these 2 links for using ```tf.map_fn``` when the function used to map has multiple arguments:
  # https://stackoverflow.com/questions/42892347/can-i-apply-tf-map-fn-to-multiple-inputs-outputs
  # https://stackoverflow.com/questions/37086098/does-tensorflow-map-fn-support-taking-more-than-one-tensor
  return result

tensorflow_multi_leaky_relu = tf.function(multi_leaky_relu)
print(tensorflow_multi_leaky_relu(tf.constant([3]), tf.constant([0.6]))) # Cannot be scalars like above; inputs must be at least 1-dimensional tensors
print(tensorflow_multi_leaky_relu(tf.constant([3.3]), tf.constant([0.6])))
print(tensorflow_multi_leaky_relu(tf.constant([1, 2]), tf.constant([0.6, 0.6])))
print(tensorflow_multi_leaky_relu(tf.constant([-1, -2]), tf.constant([0.6, 0.6])))
print(tensorflow_multi_leaky_relu(tf.constant([-1.0, -2.0]), tf.constant([0.6, 0.6])))

Output

tf.Tensor([3.], shape=(1,), dtype=float32)
tf.Tensor([3.3], shape=(1,), dtype=float32)
tf.Tensor([1. 2.], shape=(2,), dtype=float32)
tf.Tensor([-0.6 -1.2], shape=(2,), dtype=float32)
tf.Tensor([-0.6 -1.2], shape=(2,), dtype=float32)

Check the versions of tensorflow_multi_leaky_relu

print(tensorflow_multi_leaky_relu.pretty_printed_concrete_signatures())

Output

multi_leaky_relu(z, alpha)
  Args:
    z: int32 Tensor, shape=(1,)
    alpha: float32 Tensor, shape=(1,)
  Returns:
    float32 Tensor, shape=(1,)

multi_leaky_relu(z, alpha)
  Args:
    z: float32 Tensor, shape=(1,)
    alpha: float32 Tensor, shape=(1,)
  Returns:
    float32 Tensor, shape=(1,)

multi_leaky_relu(z, alpha)
  Args:
    z: int32 Tensor, shape=(2,)
    alpha: float32 Tensor, shape=(2,)
  Returns:
    float32 Tensor, shape=(2,)

multi_leaky_relu(z, alpha)
  Args:
    z: float32 Tensor, shape=(2,)
    alpha: float32 Tensor, shape=(2,)
  Returns:
    float32 Tensor, shape=(2,)

Notice the arguments and return values of each version of multi_leaky_relu: they differ in the dtype and shape of each tensor.

Optimally defining multi_leaky_relu

More optimally, we can avoid repeating the same value in the alpha argument by using tf.tile on alpha inside multi_leaky_relu. tf.tile creates an alpha tensor that has the same shape as z.
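A quick sketch of what tf.tile does here; it repeats the single alpha value to match the length of z:

print(tf.tile(tf.constant([0.6]), [3])) # tf.Tensor([0.6 0.6 0.6], shape=(3,), dtype=float32)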

def multi_leaky_relu(z, alpha):
  alpha = tf.tile(alpha, tf.shape(z)) # tf.tile creates an alpha tensor with the same shape as z
  elems = (z, alpha)
  result = tf.map_fn(lambda x:leaky_relu(x[0], x[1]), elems, dtype=tf.float32)
  return result

tensorflow_multi_leaky_relu = tf.function(multi_leaky_relu)

print(tensorflow_multi_leaky_relu(tf.constant([1, 2]), tf.constant([0.6]))) # Now, we only pass the value to the ```alpha``` argument once
print(tensorflow_multi_leaky_relu(tf.constant([-1, -2]), tf.constant([0.6])))
print(tensorflow_multi_leaky_relu(tf.constant([-1.0, -2.0]), tf.constant([0.6])))

Output

tf.Tensor([1. 2.], shape=(2,), dtype=float32)
tf.Tensor([-0.6 -1.2], shape=(2,), dtype=float32)
tf.Tensor([-0.6 -1.2], shape=(2,), dtype=float32)

Check the versions of tensorflow_multi_leaky_relu

print(tensorflow_multi_leaky_relu.pretty_printed_concrete_signatures())

Output

multi_leaky_relu(z, alpha)
  Args:
    z: int32 Tensor, shape=(2,)
    alpha: float32 Tensor, shape=(1,)
  Returns:
    float32 Tensor, shape=(2,)

multi_leaky_relu(z, alpha)
  Args:
    z: float32 Tensor, shape=(2,)
    alpha: float32 Tensor, shape=(1,)
  Returns:
    float32 Tensor, shape=(2,)

Understanding more about Eager execution and Graph execution

To understand Eager execution and Graph execution more clearly, let's carry out the following experiment. The function below runs in graph mode; note the decorator @tf.function.

@tf.function
def add_and_sum(tensor_a, tensor_b):
  print("Calculating sum")
  tensor_add = tf.add(tensor_a, tensor_b)
  return tf.reduce_sum(tensor_add)

tensor_a = tf.constant([1, 2, 3])
tensor_b = tf.constant([4, 5, 6])

sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)

Output

Calculating sum

add_and_sum is called 4 times, but "Calculating sum" is printed only once.

By default, a Function uses Graph execution, but we can make it use Eager execution by setting tf.config.run_functions_eagerly to True.

tf.config.run_functions_eagerly(True)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
tf.config.run_functions_eagerly(False) # Remember to set back to False

Now, there are exactly 4 printings of "Calculating sum".

Output

Calculating sum
Calculating sum
Calculating sum
Calculating sum

Explanation:

  • In the first case of Graph execution, Function initially has to run the Python code once to create a graph. The "Calculating sum" line is executed during this stage. In particular, this "tracing" process builds the graph by choosing which operations can be integrated into it. print, which is a Python function, is therefore not in the graph, as the sketch after this list shows.
  • In the latter case of Eager execution, all 4 calls execute eagerly in the Python context, so all 4 printings happen normally.
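A related sketch: tracing happens again whenever the input signature changes (remember that a Function is polymorphic), so the Python print fires once more for a new dtype:

sum = add_and_sum(tf.constant([1.0, 2.0, 3.0]), tf.constant([4.0, 5.0, 6.0])) # "Calculating sum" prints again: the float32 inputs trigger a new trace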

For printing in graph mode, we have to use tf.print

@tf.function
def add_and_sum(tensor_a, tensor_b):
  tf.print("Calculating sum") # Use tf.print
  tensor_add = tf.add(tensor_a, tensor_b)
  return tf.reduce_sum(tensor_add)

tensor_a = tf.constant([1, 2, 3])
tensor_b = tf.constant([4, 5, 6])

sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)

Output

Calculating sum
Calculating sum
Calculating sum
Calculating sum

Now, there are 4 lines of printing.

To compare the speed of eager execution and graph execution, use timeit

def add_and_sum(tensor_a, tensor_b):
  tensor_add = tf.add(tensor_a, tensor_b)
  return tf.reduce_sum(tensor_add)

print("[+] Eager execution: ", timeit.timeit(lambda: add_and_sum(tensor_a, tensor_b), number=1000))

add_and_sum_as_graph = tf.function(add_and_sum)
print("[+] Graph execution: ", timeit.timeit(lambda: add_and_sum_as_graph(tensor_a, tensor_b), number=1000))

Output

[+] Eager execution:  0.05073176199999807
[+] Graph execution:  0.3550005580000004

Is graph execution slow here because we are running on a CPU? No.

Graph execution is known to be faster than eager execution, but it requires a high upfront time cost for creating the graph at the beginning.

In the 1000 executions of each mode above, the operations themselves are so cheap that the cost of creating the graph and of dispatching each call outweighs the time of the operations, so we gain no benefit from using graph execution.

Only when the operations account for a greater portion of the total time, compared to the process of creating the graph, can we perceive the real advantage of using Graph execution.

x = tf.random.uniform(shape=[10, 10], minval=-1, maxval=2, dtype=tf.dtypes.int32)

def power(x, y):
  result = tf.eye(10, dtype=tf.dtypes.int32)
  for _ in range(y):
    result = tf.matmul(x, result)
  return result

print("Eager execution:", timeit.timeit(lambda: power(x, 100), number=1000))

power_as_graph = tf.function(power)
print("Graph execution:", timeit.timeit(lambda: power_as_graph(x, 100), number=1000))

Output

Eager execution: 2.267755313000009
Graph execution: 0.6547386279999898

You should try raising the power from 10 to 100, 1000, and 10000 to gradually see the performance advantage of graph execution increase.
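A quick sketch for running the comparison across several exponents (timings will vary by machine; also note that each new Python value of y makes the Function trace a new graph, so the first timed call of each pair includes that tracing cost):

for y in [10, 100, 1000]:
  eager_time = timeit.timeit(lambda: power(x, y), number=100)
  graph_time = timeit.timeit(lambda: power_as_graph(x, y), number=100)
  print("y =", y, "| eager:", eager_time, "| graph:", graph_time)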

The end