Normal (batch) Gradient Descent computes the gradients from the entire training set: the summation in the error term runs over every training example before a single parameter update is made, which can be resource-intensive for large datasets.
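
To make this concrete, here is a minimal NumPy sketch of full-batch gradient descent on a synthetic linear-regression problem (the dataset, learning rate, and variable names are illustrative assumptions, not taken from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                    # 1000 samples, 3 features (synthetic)
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr = 0.1
for step in range(200):
    # The gradient of the mean squared error sums over *every* training
    # sample before a single parameter update is made.
    grad = 2.0 / len(X) * X.T @ (X @ w - y)
    w -= lr * grad

print(w)  # should end up close to true_w
```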

Stochastic Gradient Descent avoids this cost by computing each gradient from a single randomly chosen training example (or, in the mini-batch variant, a small random subset of the training set), trading exact gradients for much cheaper updates.
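
For contrast, a sketch of the stochastic variant on the same kind of synthetic problem, where each update is computed from a single randomly drawn sample, so the cost of one step does not grow with the size of the training set (again, the data and hyperparameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr = 0.01
for epoch in range(20):
    for i in rng.permutation(len(X)):      # visit the samples in a random order
        xi, yi = X[i], y[i]
        # The gradient is computed from this single sample only, so each
        # update is cheap but noisy compared to the full-batch gradient.
        grad = 2.0 * xi * (xi @ w - yi)
        w -= lr * grad

print(w)  # a noisier path, but it also ends up close to [2.0, -1.0, 0.5]
```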