Abstract

The problem of reinforcement learning with continuous action sets remains largely unsolved. This paper offers a possible solution: softmax action selection is modeled in the continuous case by a probability distribution whose moments are adjusted using the TD-error update. Appropriate updates for all moments of the distribution are derived, and an actor-critic implementation of the method is described. The effectiveness of the approach is demonstrated in a set of experiments.
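The derivations summarized above are not reproduced in the abstract, but the core idea, an actor whose action distribution has its moments nudged by the TD error, can be sketched in miniature. The sketch below is an illustration under assumptions of my own, not the paper's algorithm: it uses a Gaussian policy (so only the first two moments appear), a one-step continuous-armed bandit with a quadratic reward, and a scalar critic as a baseline; all names, constants, and the reward shape are hypothetical.

```python
import random

def train_gaussian_actor_critic(target=2.0, steps=20000, seed=0):
    """Continuous-armed bandit: reward is highest at action == target.

    Actor: Gaussian policy N(mu, sigma^2). Both moments are updated in
    proportion to the TD error, illustrating the moment-update idea.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0            # first- and second-moment parameters
    v = 0.0                         # critic: baseline estimate of reward
    a_mu, a_sigma, a_v = 0.02, 0.005, 0.05   # learning rates (illustrative)
    for _ in range(steps):
        a = rng.gauss(mu, sigma)    # sample an action from the policy
        r = -(a - target) ** 2      # quadratic reward, maximal at target
        delta = r - v               # TD error (one-step bandit form)
        v += a_v * delta            # critic update
        # Moment updates: score-function directions of log N(a; mu, sigma),
        # each scaled by the TD error (computed from the pre-update mu).
        grad_mu = a - mu
        grad_sigma = ((a - mu) ** 2 - sigma ** 2) / sigma
        mu += a_mu * delta * grad_mu
        sigma += a_sigma * delta * grad_sigma
        sigma = min(max(sigma, 0.1), 2.0)    # keep exploration bounded
    return mu, sigma, v

mu, sigma, v = train_gaussian_actor_critic()
```

In this toy setting the mean drifts toward the reward-maximizing action while the standard deviation shrinks as the TD error favors actions near the mean, which is the qualitative behavior one would expect from TD-driven moment updates.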