Tensorflow - part 4: Graph in Tensorflow
Before starting, you should read this post: Eager execution vs Graph execution.
They are the two types of execution in Tensorflow. Eager execution is easier to use, but Graph execution is faster. Moreover, the latest versions of Tensorflow offer a way to implement a model in Eager mode and execute it in Graph mode, so we can efficiently get the best of both worlds.
First, let's import necessary packages:
import tensorflow as tf
import timeit
from datetime import datetime
Intuitive example:
The code below gives you an intuition about how graph execution runs in Tensorflow. All the operations inside a_python_function run in eager execution mode. To execute this function in graph mode, Tensorflow provides a simple way: we just need to wrap tf.function around a_python_function.
def a_python_function(tensor_a, tensor_x, tensor_b): # Eager execution
    tensor_x = tf.matmul(tensor_a, tensor_x)
    tensor_x = tensor_x + tensor_b
    return tensor_x
a_tensorflow_function = tf.function(a_python_function) # Graph execution
# Inputs
tensor_input_a = tf.constant([[1.0, 2.0, 3.0]])
tensor_input_x = tf.constant([[4.0], [5.0], [6.0]])
tensor_input_b = tf.constant(7.0)
python_value = a_python_function(tensor_input_a, tensor_input_x, tensor_input_b)
tensorflow_value = a_tensorflow_function(tensor_input_a, tensor_input_x, tensor_input_b)
print(python_value)
print(tensorflow_value)
print(python_value.numpy())
print(tensorflow_value.numpy())
Output
tf.Tensor([[39.]], shape=(1, 1), dtype=float32)
tf.Tensor([[39.]], shape=(1, 1), dtype=float32)
[[39.]]
[[39.]]
According to the Tensorflow guide, tf.function returns a Function that encapsulates several tf.Graphs behind one API. From this, we get the graph advantages of speed and deployability.
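As a quick, hedged check (the exact printed class name varies across Tensorflow versions), wrapping changes the object's type but not the way we call it:
print(type(a_python_function))      # <class 'function'>: a plain Python function
print(type(a_tensorflow_function))  # a Tensorflow Function object that manages the graphs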
tf.function as a decorator
Another way to utilize tf.function is to use it as a decorator on a function that you want to execute in graph mode. tf.function takes effect on that function and on all the sub-functions called inside its scope.
@tf.function
def a_python_function_2(tensor_b, tensor_c, tensor_d, tensor_e, tensor_f, tensor_g):
    tensor_a = tf.concat([tensor_c, tensor_d], axis=1)
    tensor_x = tf.stack([tensor_e, tensor_f, tensor_g])
    return a_python_function(tensor_a, tensor_x, tensor_b)
tensor_input_b = tf.constant(7.0)
tensor_input_c = tf.constant([[1.0]])
tensor_input_d = tf.constant([[2.0, 3.0]])
tensor_input_e = tf.constant([4.0])
tensor_input_f = tf.constant([5.0])
tensor_input_g = tf.constant([6.0])
tensorflow_value = a_python_function_2(tensor_input_b, tensor_input_c, tensor_input_d, tensor_input_e, tensor_input_f, tensor_input_g)
print(tensorflow_value)
Output
tf.Tensor([[39.]], shape=(1, 1), dtype=float32)
Converting Python functions to graphs
The statements in a Python function may be Tensorflow operations or Python constructs (if-else, return, break, continue, ...). Tensorflow operations are effortlessly transferred to the graph by tf.Graph. For a Python construct to become a member of the graph, however, it needs to be passed through AutoGraph (tf.autograph), which tf.function executes internally.
def leaky_relu(z, alpha):
    z = tf.cast(z, dtype=tf.float32)
    # tensor_zero = tf.zeros(tf.shape(z), dtype=tf.float32)
    if tf.greater(z, 0):
        return z
    else:
        return tf.math.multiply(alpha, z)
tensorflow_leaky_relu = tf.function(leaky_relu)
alpha = tf.constant(0.6)
z1 = tf.constant(2, dtype=tf.float32)
z2 = tf.constant(-2, dtype=tf.float32)
print("Case 1 - z > 0: ", tensorflow_leaky_relu(z1, alpha))
print("Case 2 - z < 0: ", tensorflow_leaky_relu(z2, alpha))
Output
Case 1 - z > 0: tf.Tensor(2.0, shape=(), dtype=float32)
Case 2 - z < 0: tf.Tensor(-1.2, shape=(), dtype=float32)
Viewing the AutoGraph form of leaky_relu
print(tf.autograph.to_code(leaky_relu))
Look at if_body(), which represents the if clause above, and else_body(), which represents the else clause above.
Output
def tf__leaky_relu(z, alpha):
    with ag__.FunctionScope('leaky_relu', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()
        z = ag__.converted_call(ag__.ld(tf).cast, (ag__.ld(z),), dict(dtype=ag__.ld(tf).float32), fscope)

        def get_state():
            return (do_return, retval_)

        def set_state(vars_):
            nonlocal retval_, do_return
            (do_return, retval_) = vars_

        def if_body():
            nonlocal retval_, do_return
            try:
                do_return = True
                retval_ = ag__.ld(z)
            except:
                do_return = False
                raise

        def else_body():
            nonlocal retval_, do_return
            try:
                do_return = True
                retval_ = ag__.converted_call(ag__.ld(tf).math.multiply, (ag__.ld(alpha), ag__.ld(z)), None, fscope)
            except:
                do_return = False
                raise
        ag__.if_stmt(ag__.converted_call(ag__.ld(tf).greater, (ag__.ld(z), 0), None, fscope), if_body, else_body, get_state, set_state, ('do_return', 'retval_'), 2)
        return fscope.ret(retval_, do_return)
To get the exact graph
print(tensorflow_leaky_relu.get_concrete_function(tf.constant(-2, dtype=tf.float32), tf.constant(0.6)).graph.as_graph_def())
You can see there are many types of nodes in the output.
Output
node {
  name: "z"
  op: "Placeholder"
  attr {
    key: "_user_specified_name"
    value {
      s: "z"
    }
  }
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "shape"
    value {
      shape {
      }
    }
  }
}
node {
  name: "alpha"
  op: "Placeholder"
  attr {
    key: "_user_specified_name"
    value {
      s: "alpha"
    }
  }
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "shape"
    value {
      shape {
      }
    }
  }
}
node {
  name: "Greater/y"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_FLOAT
        tensor_shape {
        }
        float_val: 0.0
      }
    }
  }
}
...
versions {
  producer: 716
  min_consumer: 12
}
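If the full proto is too verbose, here is a shorter sketch (using the graph's get_operations() method) that lists only the operation types:
graph = tensorflow_leaky_relu.get_concrete_function(tf.constant(-2, dtype=tf.float32), tf.constant(0.6)).graph
print([op.type for op in graph.get_operations()])  # e.g. ['Placeholder', 'Placeholder', 'Const', 'Greater', ...]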
According to the Tensorflow guide, tf.function will work for most cases, with a few cautions.
You can find help in Better performance with tf.function and the AutoGraph reference.
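As a sketch of one such caution from the guide, Python side effects (like appending to a list) run only while the function is being traced, not on every call:
collected = []

@tf.function
def collect(x):
    collected.append(x)  # Python side effect: executed during tracing only
    return x * 2

collect(tf.constant(1))
collect(tf.constant(2))  # same signature as before, so no retracing
print(len(collected))    # 1, not 2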
Polymorphism: one Function, many graphs
Recall that tf.function first converts a Python function to a Function for graph execution.
This Function can then be seen as a tf.Graph creator: with each new set of function arguments (new in dtype or new in shape), the Function creates a new tf.Graph for that set.
=> Because of this, a Function is said to be polymorphic.
Notes:
- Only when there is a new dtype or a new shape of argument does the Function make a new graph.
- This dtype and shape pair is called the "signature" of the inputs.
- The corresponding tf.Graph of each signature is wrapped in a ConcreteFunction.
print(tensorflow_leaky_relu(tf.constant(3), tf.constant(0.6)))
print(tensorflow_leaky_relu(tf.constant(3.3), tf.constant(0.6)))
Output
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(3.3, shape=(), dtype=float32)
To check the versions of tensorflow_leaky_relu
print(tensorflow_leaky_relu.pretty_printed_concrete_signatures())
Output
leaky_relu(z, alpha)
  Args:
    z: float32 Tensor, shape=()
    alpha: float32 Tensor, shape=()
  Returns:
    float32 Tensor, shape=()

leaky_relu(z, alpha)
  Args:
    z: int32 Tensor, shape=()
    alpha: float32 Tensor, shape=()
  Returns:
    float32 Tensor, shape=()
You can see that there are 2 versions of leaky relu: 1) the type of z is float32, 2) the type of z is int32.
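As a minimal sketch, each of these versions can be fetched as a ConcreteFunction by passing a matching signature to get_concrete_function:
# Fetch the concrete function traced for scalar float32 inputs
concrete_float = tensorflow_leaky_relu.get_concrete_function(
    tf.TensorSpec(shape=(), dtype=tf.float32),
    tf.TensorSpec(shape=(), dtype=tf.float32))
print(concrete_float(tf.constant(-2.0), tf.constant(0.6)))  # tf.Tensor(-1.2, shape=(), dtype=float32)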
Now, let's try when z is a multi-value tensor.
print(tensorflow_leaky_relu(tf.constant([1, 2]), tf.constant(0.6)))
There will be a ValueError: The condition of if statement expected to be tf.bool scalar, got Tensor("Greater:0", shape=(2,), dtype=bool); to check for None, use is not None.
Output
ValueError
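To see why this fails, note that tf.greater applied to a multi-value tensor returns a vector of booleans, while the converted if statement needs a scalar tf.bool (a small check):
print(tf.greater(tf.constant([1, 2]), 0))  # tf.Tensor([ True  True], shape=(2,), dtype=bool): not a scalar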
The function cannot be used for a multi-value tensor. Let's modify it.
Defining a leaky_relu that can receive a multi-value tensor as input
We know from above that leaky_relu cannot receive a multi-value tensor as input. Let's define a function multi_leaky_relu that uses tf.map_fn to deal with this problem.
A basic map_fn receives 2 important arguments: tf.map_fn(fn, elems).
The functionality of tf.map_fn is to apply the function fn to each of the elements in elems. If fn requires more than one argument to operate, elems has to be a tuple of multiple arguments. Each argument is a multi-value tensor, and the values of the arguments are processed in an element-wise way, as the sketch below shows.
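Here is a minimal sketch of tf.map_fn with a tuple of elems (the tensors a and b are made up for illustration):
a = tf.constant([1.0, 2.0, 3.0])
b = tf.constant([10.0, 20.0, 30.0])
# fn receives one pair (a[i], b[i]) at a time and returns their sum
print(tf.map_fn(lambda x: x[0] + x[1], (a, b), dtype=tf.float32))  # tf.Tensor([11. 22. 33.], shape=(3,), dtype=float32)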
def multi_leaky_relu(z, alpha):
    elems = (z, alpha)
    # Apply leaky_relu element-wise over the pairs (z[i], alpha[i])
    result = tf.map_fn(lambda x: leaky_relu(x[0], x[1]), elems, dtype=tf.float32)
    # Check these 2 links for using tf.map_fn when the mapped function has multiple arguments:
    # https://stackoverflow.com/questions/42892347/can-i-apply-tf-map-fn-to-multiple-inputs-outputs
    # https://stackoverflow.com/questions/37086098/does-tensorflow-map-fn-support-taking-more-than-one-tensor
    return result
tensorflow_multi_leaky_relu = tf.function(multi_leaky_relu)
print(tensorflow_multi_leaky_relu(tf.constant([3]), tf.constant([0.6]))) # Cannot be a scalar like above, must be 1+ dimensional Tensors
print(tensorflow_multi_leaky_relu(tf.constant([3.3]), tf.constant([0.6])))
print(tensorflow_multi_leaky_relu(tf.constant([1, 2]), tf.constant([0.6, 0.6])))
print(tensorflow_multi_leaky_relu(tf.constant([-1, -2]), tf.constant([0.6, 0.6])))
print(tensorflow_multi_leaky_relu(tf.constant([-1.0, -2.0]), tf.constant([0.6, 0.6])))
Output
tf.Tensor([3.], shape=(1,), dtype=float32)
tf.Tensor([3.3], shape=(1,), dtype=float32)
tf.Tensor([1. 2.], shape=(2,), dtype=float32)
tf.Tensor([-0.6 -1.2], shape=(2,), dtype=float32)
tf.Tensor([-0.6 -1.2], shape=(2,), dtype=float32)
Check the versions of tensorflow_multi_leaky_relu
print(tensorflow_multi_leaky_relu.pretty_printed_concrete_signatures())
Output
multi_leaky_relu(z, alpha)
  Args:
    z: int32 Tensor, shape=(1,)
    alpha: float32 Tensor, shape=(1,)
  Returns:
    float32 Tensor, shape=(1,)

multi_leaky_relu(z, alpha)
  Args:
    z: float32 Tensor, shape=(1,)
    alpha: float32 Tensor, shape=(1,)
  Returns:
    float32 Tensor, shape=(1,)

multi_leaky_relu(z, alpha)
  Args:
    z: int32 Tensor, shape=(2,)
    alpha: float32 Tensor, shape=(2,)
  Returns:
    float32 Tensor, shape=(2,)

multi_leaky_relu(z, alpha)
  Args:
    z: float32 Tensor, shape=(2,)
    alpha: float32 Tensor, shape=(2,)
  Returns:
    float32 Tensor, shape=(2,)
You should notice the arguments and returns of each version of multi_leaky_relu: they are different in the type and shape of each tensor.
Optimally defining multi_leaky_relu
More optimally, we can avoid repeating the same value in the alpha argument by using tf.tile inside multi_leaky_relu. tf.tile helps to create an alpha tensor which has the same shape as z, as the short sketch below shows.
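A tiny sketch of what tf.tile does here (the constants are made up for illustration):
print(tf.tile(tf.constant([0.6]), [3]))  # repeat the 1-element alpha 3 times -> tf.Tensor([0.6 0.6 0.6], shape=(3,), dtype=float32)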
def multi_leaky_relu(z, alpha):
    alpha = tf.tile(alpha, tf.shape(z))  # tf.tile creates an alpha tensor with the same shape as z
    elems = (z, alpha)
    result = tf.map_fn(lambda x: leaky_relu(x[0], x[1]), elems, dtype=tf.float32)
    return result
tensorflow_multi_leaky_relu = tf.function(multi_leaky_relu)
print(tensorflow_multi_leaky_relu(tf.constant([1, 2]), tf.constant([0.6]))) # Now, we pass the alpha value only once
print(tensorflow_multi_leaky_relu(tf.constant([-1, -2]), tf.constant([0.6])))
print(tensorflow_multi_leaky_relu(tf.constant([-1.0, -2.0]), tf.constant([0.6])))
Output
tf.Tensor([1. 2.], shape=(2,), dtype=float32)
tf.Tensor([-0.6 -1.2], shape=(2,), dtype=float32)
tf.Tensor([-0.6 -1.2], shape=(2,), dtype=float32)
Check the versions of tensorflow_multi_leaky_relu
print(tensorflow_multi_leaky_relu.pretty_printed_concrete_signatures())
Output
multi_leaky_relu(z, alpha)
  Args:
    z: int32 Tensor, shape=(2,)
    alpha: float32 Tensor, shape=(1,)
  Returns:
    float32 Tensor, shape=(2,)

multi_leaky_relu(z, alpha)
  Args:
    z: float32 Tensor, shape=(2,)
    alpha: float32 Tensor, shape=(1,)
  Returns:
    float32 Tensor, shape=(2,)
Understanding more about Eager execution and Graph execution
To understand Eager execution and Graph execution more clearly, let's carry out an experiment. The function below runs in graph mode; note the decorator @tf.function.
@tf.function
def add_and_sum(tensor_a, tensor_b):
    print("Calculating sum")
    tensor_add = tf.add(tensor_a, tensor_b)
    return tf.reduce_sum(tensor_add)
tensor_a = tf.constant([1, 2, 3])
tensor_b = tf.constant([4, 5, 6])
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
Output
Calculating sum
add_and_sum is called 4 times, but there is only 1 line of "Calculating sum".
By default, a Function uses Graph execution, but we can make it use Eager execution by setting tf.config.run_functions_eagerly to True:
tf.config.run_functions_eagerly(True)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
tf.config.run_functions_eagerly(False) # Remember to set back to False
Now, there are exactly 4 printings of "Calculating sum".
Output
Calculating sum
Calculating sum
Calculating sum
Calculating sum
Explanation:
- In the first case of Graph execution, the Function needs to run the Python code once to create a graph; the line "Calculating sum" is printed during this stage. In particular, this "tracing" process builds the graph by choosing which operations can be integrated into it. The print, which is a Python function, is therefore not in the graph (see the small check after this list).
- In the latter case of Eager execution, all 4 calls eagerly execute in the Python context, so the 4 printings appear normally.
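To verify that the Python print runs once per trace, a small check: calling add_and_sum with a new input signature forces a retrace, so "Calculating sum" appears one more time.
# float32 inputs form a new signature, so the Function retraces
# and the Python print runs once more during tracing
sum_float = add_and_sum(tf.constant([1.0, 2.0, 3.0]), tf.constant([4.0, 5.0, 6.0]))  # prints "Calculating sum"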
For printing in graph mode, we have to use tf.print.
@tf.function
def add_and_sum(tensor_a, tensor_b):
    tf.print("Calculating sum")  # Use tf.print instead of print
    tensor_add = tf.add(tensor_a, tensor_b)
    return tf.reduce_sum(tensor_add)
tensor_a = tf.constant([1, 2, 3])
tensor_b = tf.constant([4, 5, 6])
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
sum = add_and_sum(tensor_a, tensor_b)
Output
Calculating sum
Calculating sum
Calculating sum
Calculating sum
Now, there are 4 lines of printing.
To compare the speed of eager execution and graph execution, use timeit:
def add_and_sum(tensor_a, tensor_b):
    # tf.print("Calculating sum")  # printing disabled for timing
    tensor_add = tf.add(tensor_a, tensor_b)
    return tf.reduce_sum(tensor_add)

import timeit  # already imported above; repeated here for convenience
# tf.config.run_functions_eagerly(True)
print("[+] Eager execution: ", timeit.timeit(lambda: add_and_sum(tensor_a, tensor_b), number=1000))
# tf.config.run_functions_eagerly(False)
add_and_sum_as_graph = tf.function(add_and_sum)
print("[+] Graph execution: ", timeit.timeit(lambda: add_and_sum_as_graph(tensor_a, tensor_b), number=1000))
Output
[+] Eager execution: 0.05073176199999807
[+] Graph execution: 0.3550005580000004
Is it because running on CPU makes graph execution slow? No.
Graph execution is known to be faster than eager execution, but this speed comes at an upfront cost: the Python function first has to be traced to create the graph, and every call pays some extra dispatch overhead.
There are 1000 executions in each mode above. The operations themselves (one add and one reduce_sum) are so cheap that the tracing and per-call overhead outweigh the time spent on actual computation, so we gain no benefit from using graph execution.
Only when the operations account for a greater portion of the total time, compared to the overhead of creating and invoking the graph, can we perceive the real advantage of Graph execution.
x = tf.random.uniform(shape=[10, 10], minval=-1, maxval=2, dtype=tf.dtypes.int32)

def power(x, y):
    result = tf.eye(10, dtype=tf.dtypes.int32)
    for _ in range(y):
        result = tf.matmul(x, result)
    return result
print("Eager execution:", timeit.timeit(lambda: power(x, 100), number=1000))
power_as_graph = tf.function(power)
print("Graph execution:", timeit.timeit(lambda: power_as_graph(x, 100), number=1000))
Output
Eager execution: 2.267755313000009
Graph execution: 0.6547386279999898
You should try power with y = 10, 100, 1000, and 10000 to see the performance gap gradually grow, as in the sketch below.
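A minimal sketch of that experiment (number is reduced to 100 so the larger exponents finish quickly; each new Python value of y triggers one extra trace of power_as_graph):
for y in [10, 100, 1000]:
    eager_time = timeit.timeit(lambda: power(x, y), number=100)
    graph_time = timeit.timeit(lambda: power_as_graph(x, y), number=100)
    print("y =", y, "| eager:", eager_time, "| graph:", graph_time)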