# How to implement the Softmax derivative independently from any loss function?

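Mathematically, the derivative of the softmax output $\sigma_j$ with respect to the logit $z_i$ (for example, $z_i = W_i X$) is

$$\frac{\partial \sigma_j}{\partial z_i} = \sigma_j \left( \delta_{ij} - \sigma_i \right)$$

where $\delta_{ij}$ is the Kronecker delta.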
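
Splitting on the two values of the Kronecker delta may make the structure easier to see. As a sketch, writing $\sigma_j$ for the $j$-th softmax output and $z_i$ for the $i$-th logit (notation assumed here), the diagonal and off-diagonal entries of the Jacobian are:

$$
\frac{\partial \sigma_j}{\partial z_i} =
\begin{cases}
\sigma_i \, (1 - \sigma_i) & \text{if } i = j, \\
-\sigma_j \, \sigma_i & \text{if } i \neq j.
\end{cases}
$$

For instance, with two classes and softmax outputs $(0.5, 0.5)$, both diagonal entries are $0.5 \cdot (1 - 0.5) = 0.25$ and both off-diagonal entries are $-0.5 \cdot 0.5 = -0.25$.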

If you implement it iteratively:

```python
import numpy as np

def softmax_grad(s):
    # Take the derivative of softmax element…
```
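
The original snippet is cut off, so what follows is only a minimal sketch of how an iterative version could look, not the author's code. It assumes `s` is a 1-D NumPy array that already holds the softmax outputs (not the logits). The names `softmax` and `softmax_jacobian_iterative` are hypothetical and introduced here for illustration.

```python
import numpy as np

def softmax(z):
    # Hypothetical helper: numerically stable softmax of a 1-D logit vector.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian_iterative(s):
    # Assumption: s is a 1-D array of softmax outputs.
    # Builds J with J[i, j] = d sigma_j / d z_i = s[j] * (delta_ij - s[i]).
    n = s.shape[0]
    jac = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            delta = 1.0 if i == j else 0.0  # Kronecker delta
            jac[i, j] = s[j] * (delta - s[i])
    return jac

s = softmax(np.array([1.0, 2.0, 3.0]))
jac = softmax_jacobian_iterative(s)

# Vectorized equivalent, for comparison: diag(s) - outer(s, s)
jac_vec = np.diag(s) - np.outer(s, s)
```

The double loop mirrors the formula one entry at a time, which is easy to check but costs O(n²) Python-level iterations; the vectorized line computes the same matrix. Each row of the Jacobian sums to zero because the softmax outputs always sum to one.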