
A Guide to TensorFlow (Part 1)

TensorFlow is an open source software library for numerical computation using data flow graphs. It is an extremely popular symbolic math library and is widely used for machine learning applications such as neural networks.

Series Introduction

This blog is part of "A Guide To TensorFlow", where we will explore the TensorFlow API and use it to build multiple machine learning models for real-life examples. In this blog we shall uncover the TensorFlow graph, understand the concept of tensors, and explore TensorFlow data types. (Instructions for installing TensorFlow: tensorflow.org)

TensorFlow Graph

At the heart of a TensorFlow program is the computation graph described in code. A computation graph is essentially a series of functions chained together, each passing its output to zero, one, or more functions further along in the chain. Using graphs, a user can construct a complex transformation on data out of blocks of smaller, well-understood mathematical functions.
The two fundamental building blocks of graphs are nodes and edges:

  • Nodes represent some sort of computation or action being done on or with data in the graph's context, and are typically drawn as circles.
  • Edges are the actual values that get passed to and from Operations, and are typically drawn as arrows. Conceptually, we can think of edges as the link between different Operations as they carry information from one node to the next.

Let us look at a simple graph. Here we have three types of nodes: input, add, and multiply. It is good practice to define an input node explicitly; this allows us to relay a single input value to a large number of nodes in the graph.
A Simple Dataflow Graph
The above graph can be written using a series of equations.
$$
a = \text{input}_1;\quad b = \text{input}_2 \\
c = a + b;\quad d = a \cdot b \\
e = c + d
$$
With our inputs 5 and 3, we get \(c = 8\) and \(d = 15\), so the output of \(e\) is 23.

Graph Dependency

Before we start to write code for our first TensorFlow graph, we need to understand that certain types of connections between nodes are not allowed; the most common is one that creates an unresolved circular dependency. Dependencies can be seen in the graph above: node \(c\) is dependent on \(a\) and \(b\), and similarly \(d\) is dependent on \(a\) and \(b\). However, nodes \(c\) and \(d\) are independent of each other. Also note that although nodes \(a\) and \(b\) do not have an edge going directly to node \(e\), since \(c\) and \(d\) are its dependencies, \(a\) and \(b\) become indirect dependencies of \(e\).

Now suppose we consider a graph like this, where the output of node \(e\) is fed back to a node in the graph.
Graph Circular Dependency
For such a graph, the problem is that node \(b\) now has node \(e\) as a direct dependency, while at the same time, node \(e\) is dependent on node \(b\) (as we saw previously). The result of this is that neither \(b\) nor \(e\) can execute, as they are both waiting for the other node to complete its computation.

If you think that assigning an initial value to node \(e\) can solve this, you need to think beyond the first pass: the graph would be set in an endless feedback loop, where each time the output changes value, the graph output is computed again. Depending on the graph's operations, the program might end up in an overflow or an underflow. This makes it impossible for truly circular dependencies to be modelled in TensorFlow, which is not really a huge limitation. In practice, if we need to model a similar iteration, we place copies of the graph side by side and feed them into one another in sequence; this process is called "unrolling" the graph. It is a fundamental concept you will encounter when studying recurrent neural networks.
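To make "unrolling" concrete, here is a minimal sketch (it uses the tf.constant, tf.add and tf.multiply Ops introduced in the next section, so feel free to revisit it later). Instead of feeding \(e\) back into the graph, we build two copies of the same operations and feed the first copy's output into the second:

import tensorflow as tf

# One copy of our example graph: e = (a + b) + (a * b)
def step(a, b):
    c = tf.add(a, b)
    d = tf.multiply(a, b)
    return tf.add(c, d)

a = tf.constant(5, name="input_a")
b = tf.constant(3, name="input_b")

e1 = step(a, b)   # first copy: (5 + 3) + (5 * 3) = 23
e2 = step(a, e1)  # second copy, fed the first copy's output: (5 + 23) + (5 * 23) = 143

sess = tf.Session()
print(sess.run(e2))  # 143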

Building Your First Graph

We start off by importing TensorFlow in our Python console:

import tensorflow as tf

Now, in TensorFlow, any computation node in the graph is called an Operation, or Op for short. Ops take in zero or more Tensor objects as input and output zero or more Tensor objects. TensorFlow provides several operations that we can use to generate constants; some of them are listed below, with a short usage sketch after the list:

  • tf.zeros: Creates a tensor with all elements set to zero.
  • tf.ones: Creates a tensor with all elements set to one.
  • tf.constant: Creates a constant tensor.
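
As a quick illustration, here is a minimal sketch of how these constructors might be called (the shapes and names here are arbitrary, chosen just for the example):

import tensorflow as tf

zeros = tf.zeros([2, 3])                       # 2-by-3 tensor filled with 0.0
ones = tf.ones([3], dtype=tf.int32)            # vector [1, 1, 1]
const = tf.constant([5, 3], name="my_const")   # constant built from a Python list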

To create an Operation, we call its associated Python constructor - in this case, tf.constant() creates a "constant" Op. We create our nodes \(a\) and \(b\), and also pass in an optional string name parameter. It is often convenient to give names to each component of an element, for example, if they represent different features of a training example. Names are also used in TensorBoard to nicely label edges. (Don't worry if you don't know about TensorBoard, it will be covered in future blogs)

a = tf.constant(5, name="input_a")
b = tf.constant(3, name="input_b")

Now node \(c\) uses tf.add and node \(d\) uses tf.multiply; these are two of the many TensorFlow operations we can use to add basic arithmetic operators to our graph. tf.multiply takes in two inputs and outputs the result of multiplying them together, and tf.add() outputs the result of adding two inputs together. Other arithmetic operations, such as tf.subtract, tf.divide, and tf.pow, work the same way.

c = tf.add(a, b, name="add_c")
d = tf.multiply(a, b, name="multiply_d")

Node \(e\) uses tf.add, just like node \(c\):

e = tf.add(c, d, name="add_e")

With this, our graph is defined. However, it won't produce any output yet; in fact, it won't even perform the operations we have defined. A TensorFlow program is written in two parts:

  1. Defining the computation graph
  2. Running the graph (with data)

To execute the graph, we need to run the following two statements. For now, just assume that this is how we run our code; we shall cover sessions in a future blog.

sess = tf.Session()
output = sess.run(e)

Note: If you want to run this code as a Python file and not in the console, just print the output variable and execute the Python file.
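
Putting everything from this section together, the complete script looks like this:

import tensorflow as tf

a = tf.constant(5, name="input_a")
b = tf.constant(3, name="input_b")
c = tf.add(a, b, name="add_c")
d = tf.multiply(a, b, name="multiply_d")
e = tf.add(c, d, name="add_e")

sess = tf.Session()
output = sess.run(e)
print(output)  # 23
sess.close()   # release the session's resources when done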

Here's the link to the final code, and also a small peek at TensorBoard

Tensors

Up till now, we have used a single number as the data in each node; in other words, we were dealing with simple scalar numbers. It is good to keep things simple when learning; however, TensorFlow allows us to use n-dimensional arrays instead of those simple scalars. This n-dimensional array is called a tensor, and tensors are a superset of matrices: a 1-D tensor is equivalent to a vector, a 2-D tensor is a matrix, and above that we just say N-D tensor. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.

When writing a TensorFlow program, the main object we manipulate and pass around is the tf.Tensor. A tf.Tensor object represents a partially defined computation that will eventually produce a value. TensorFlow programs work by first building a graph of tf.Tensor objects, detailing how each tensor is computed based on the other available tensors, and then running parts of this graph to achieve the desired results. A tf.Tensor has two properties: a data type (float32, int32, or string, for example) and a shape. Each element in the Tensor has the same data type, and the data type is always known. The shape is the number of dimensions the tensor has and the size of each dimension.

So in our previous example, we can replace nodes \(a\) and \(b\) with a single node that sends in a 1-D tensor, or vector. (Note: a scalar is a 0-D tensor.)

Graph using Tensors

Now, instead of having two separate input nodes, we have a single node that can take in a vector (or 1-D tensor) of numbers. Using tensors here is a better way to define our graph, since the client has to worry about only a single node; moreover, we reduce the number of dependencies for the downstream nodes as well. Allowing multi-dimensional data also gives us a great amount of flexibility. We can even have the graph enforce a strict requirement and force inputs to be of length two (or any length we'd like).

We can implement this change in TensorFlow by modifying our previous code:

import tensorflow as tf

a = tf.constant([5, 3], name="input_a")   # one 1-D tensor replaces both input nodes
b = tf.reduce_sum(a, name="sum_b")        # collapses the vector by summing its elements
c = tf.reduce_prod(a, name="prod_c")      # collapses the vector by multiplying its elements
d = tf.add(b, c, name="add_d")

sess = tf.Session()
output = sess.run(d)
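
Running this and printing output again gives 23, the same result as our original two-input graph: reduce_sum produces 8, reduce_prod produces 15, and the final add combines them.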

You might have noticed in the graph that we have replaced 'add' with 'sum' and 'mul' with 'prod'; a similar change appears in the code as well. tf.reduce_sum computes the sum of elements across dimensions of a tensor, and tf.reduce_prod computes the product of elements across dimensions of a tensor. Both of these take additional parameters besides the usual tensor and name: axis (for which reduction_indices is a deprecated alias) selects the dimensions to reduce, and the boolean keep_dims retains the reduced dimensions with length 1. You can read more about them in the TensorFlow API documentation.
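
For example, with a 2-D tensor the axis parameter controls which dimension is collapsed (a small sketch; the values are arbitrary):

import tensorflow as tf

x = tf.constant([[1, 2, 3],
                 [4, 5, 6]])

total = tf.reduce_sum(x)               # 21: reduce across all dimensions
col_sums = tf.reduce_sum(x, axis=0)    # [5, 7, 9]: collapse the rows
row_prods = tf.reduce_prod(x, axis=1)  # [6, 120]: multiply along each row

sess = tf.Session()
print(sess.run([total, col_sums, row_prods]))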

Some types of tensors are special, and these will be covered in future blogs. The main ones are:

  • tf.Variable
  • tf.constant
  • tf.placeholder
  • tf.SparseTensor

With the exception of tf.Variable, the value of a tensor is immutable, which means that in the context of a single execution, tensors have only a single value. However, evaluating the same tensor twice can return different values; for example, that tensor can be the result of reading data from disk, or of generating a random number.
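
A small sketch illustrating this, using tf.random_uniform to generate random numbers:

import tensorflow as tf

r = tf.random_uniform([2])  # a tensor that draws new random values when evaluated

sess = tf.Session()
print(sess.run(r))  # one pair of values
print(sess.run(r))  # almost certainly a different pair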

Shape

The shape, in TensorFlow terminology, describes both the number of dimensions in a tensor and the length of each dimension. Shapes can be represented with the tf.TensorShape object, or via Python lists or tuples of ordered integers. For example, the list [2, 3] describes the shape of a 2-D tensor of length 2 in its first dimension and length 3 in its second dimension. The following table illustrates more examples of shapes.

Dimension | Example | Description
0-D | s_0 = [] | Shape that specifies a 0-D tensor (scalar)
1-D | s_1 = [3] | Shape that describes a vector of length 3
2-D | s_2 = (3, 2) | Shape that describes a 3-by-2 matrix

In addition to being able to specify fixed lengths for each dimension, we can also assign a flexible length by passing in None as a dimension’s value.

Dimension | Example | Description
1-D | s_1_flex = [None] | Shape for a vector of any length
2-D | s_2_flex = (None, 3) | Shape for a matrix that is any number of rows tall and 3 columns wide
3-D | s_3_flex = [2, None, None] | Shape of a 3-D tensor with length 2 in its first dimension and variable length in its second and third dimensions
Any | s_any = None | Shape that could be any tensor
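
Flexible shapes are most useful with tf.placeholder, which we cover properly in a future blog; as a quick preview, here is a minimal sketch of a (None, 3)-shaped input accepting batches of different sizes:

import tensorflow as tf

# A matrix input that is any number of rows tall and 3 columns wide
x = tf.placeholder(tf.float32, shape=(None, 3))
row_sums = tf.reduce_sum(x, axis=1)

sess = tf.Session()
print(sess.run(row_sums, feed_dict={x: [[1, 2, 3]]}))             # one row: [6.]
print(sess.run(row_sums, feed_dict={x: [[1, 2, 3], [4, 5, 6]]}))  # two rows: [6. 15.]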

We can use the tf.shape Op to find the shape of a tensor. It simply takes in the Tensor object you’d like to find the shape for and returns it as an int32 vector.

shape = tf.shape(your_tensor, name="unknown_shape")
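
For example (a small sketch with a 2-by-3 constant):

import tensorflow as tf

t = tf.constant([[1, 2, 3],
                 [4, 5, 6]])
shape = tf.shape(t, name="shape_of_t")

sess = tf.Session()
print(sess.run(shape))  # [2 3]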

Datatypes

TensorFlow can take in Python numbers, booleans, strings, or lists of any of the above. Single values will be converted to a 0-D Tensor (or scalar), lists of values will be converted to a 1-D Tensor (vector), lists of lists of values will be converted to a 2-D Tensor (matrix), and so on. It is not possible to have a tf.Tensor with more than one data type. Here's a small table of the available data types:

Data type | Python type | Description
DT_FLOAT | tf.float32 | 32-bit floating point.
DT_DOUBLE | tf.float64 | 64-bit floating point.
DT_INT8 | tf.int8 | 8-bit signed integer.
DT_INT16 | tf.int16 | 16-bit signed integer.
DT_INT32 | tf.int32 | 32-bit signed integer.
DT_INT64 | tf.int64 | 64-bit signed integer.
DT_UINT8 | tf.uint8 | 8-bit unsigned integer.
DT_UINT16 | tf.uint16 | 16-bit unsigned integer.
DT_STRING | tf.string | Variable-length byte array. Each element of a Tensor is a byte array.
DT_BOOL | tf.bool | Boolean.
DT_COMPLEX64 | tf.complex64 | Complex number made of two 32-bit floating points: real and imaginary parts.
DT_COMPLEX128 | tf.complex128 | Complex number made of two 64-bit floating points: real and imaginary parts.
DT_QINT8 | tf.qint8 | 8-bit signed integer used in quantized Ops.
DT_QINT32 | tf.qint32 | 32-bit signed integer used in quantized Ops.
DT_QUINT8 | tf.quint8 | 8-bit unsigned integer used in quantized Ops.
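
Most of these types can be requested explicitly when creating a tensor by passing the dtype parameter (a small sketch):

import tensorflow as tf

f64 = tf.constant([1.0, 2.0], dtype=tf.float64)  # DT_DOUBLE
i8 = tf.constant([1, 2], dtype=tf.int8)          # DT_INT8
s = tf.constant(["hello", "world"])              # DT_STRING, inferred from the values
flags = tf.constant([True, False])               # DT_BOOL, inferred from the values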

It is possible to cast tf.Tensors from one datatype to another using tf.cast:

float_tensor = tf.cast(tf.constant([1, 2, 3]), dtype=tf.float32)

To inspect a tf.Tensor's data type we can use the Tensor.dtype property.

>>> x = tf.Variable(tf.random_normal([256, 100]))
>>> x.dtype
<dtype: 'float32_ref'>

TensorFlow and NumPy

TensorFlow is very tightly integrated with NumPy; TensorFlow's data types are based on those from NumPy. To verify this, you can run the statement np.int64 == tf.int64, and it will return True.
NumPy arrays can be passed into any TensorFlow Op, and the beauty is that we can easily specify the data type we need with minimal effort.
The TensorFlow API is designed to make it easy to convert data to and from NumPy arrays. For example, while initializing a tensor with a constant value, we can pass a NumPy array to the tf.constant Op, and the result will be a TensorFlow tensor with that value. We can also use tf.convert_to_tensor to convert Python objects of various types to Tensor objects; it accepts Tensor objects, NumPy arrays, Python lists, and Python scalars.
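
A short sketch of this interoperability:

import numpy as np
import tensorflow as tf

arr = np.array([[1, 2], [3, 4]], dtype=np.int64)

t1 = tf.constant(arr)                   # a NumPy array passed straight into an Op
t2 = tf.convert_to_tensor([5.0, 6.0])   # a Python list converted to a Tensor

sess = tf.Session()
print(sess.run(t1))  # [[1 2] [3 4]]
print(t1.dtype)      # <dtype: 'int64'>, carried over from the NumPy array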

Note: Although NumPy is able to automatically detect data types, it really is best to start getting in the habit of being explicit about the numeric properties you want your Tensor objects to have. This matters most when dealing with huge graphs, where you may have to hunt down the object causing a TypeMismatchError.

That's it for this guide. In the next blog we shall learn more about TensorFlow Ops, manage multiple graphs, and get acquainted with Sessions in TensorFlow.