As a wrap-up of the last posts about optimization of a Python simulation, here are the results of several different optimization techniques, all the way down to the ultimate, undesirable silver bullet: converting the application to C.
Vanilla version: around 4:40
Vanilla version with some duplicated calculations elided: 2:17
Optimized with Psyco (psyco.full()): 0:23
Black-Scholes functions converted to Cython, with static typing, and without Psyco: 0:40
Psyco + Cython initial optimization: 0:10 (good enough, this is the version in current use)
random.gauss() (which profiling showed to be expensive) converted to Cython: 0:07
Some max() calls in Cython code replaced by a Cython local function, plus a power of 0.5 replaced by sqrt(): 0:06
Cython cdivision flag set to True: 0:05.7
Cython-written module compiled with -O9: 0:05.4
Trimmed the fat in the stop-loss function, avoiding the if..elif chain over the several strategies: 0:05.1
Simplified operation value calculation, returns only total value instead of each option's value (which is only necessary when logging/debugging): 0:04.8
Optimization regarding volatility skew (save one call to gauss() when there is none) and avoid value recalculation on stop loss when possible: 0:04.4
Whole simulation converted to C: 0:02.1
C version compiled with -O9: 0:01.5 (guess what, it does make a difference!)
Save a call to gauss() when there is no volatility skew: 0:01.3
Whole simulation converted to asm: ... just kidding :)
Note that several "optimizations" are in fact performance bugs that were fixed.
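To illustrate the kind of "performance bug" in question, here is a minimal sketch of the gauss() saving from the list above. It is not the simulation's real code: step_volatility(), CountingRandom, base_vol and skew are hypothetical names made up for this example, and the skew is assumed to be a simple multiplier on a standard normal draw.

```python
import random


class CountingRandom(random.Random):
    """random.Random subclass that counts gauss() calls (illustration only)."""

    def __init__(self, seed=None):
        super().__init__(seed)
        self.gauss_calls = 0

    def gauss(self, mu=0.0, sigma=1.0):
        self.gauss_calls += 1
        return super().gauss(mu, sigma)


def step_volatility(base_vol, skew, rng):
    """Volatility for one simulation step (hypothetical model).

    When skew is zero the random draw would be multiplied by zero anyway,
    so the call to gauss() is skipped altogether.
    """
    if skew == 0.0:
        return base_vol
    return base_vol + skew * rng.gauss(0.0, 1.0)


rng = CountingRandom(42)

# 1000 steps without skew: gauss() should never be called.
flat = [step_volatility(0.2, 0.0, rng) for _ in range(1000)]
calls_without_skew = rng.gauss_calls

# 1000 steps with skew: one gauss() call per step.
skewed = [step_volatility(0.2, 0.05, rng) for _ in range(1000)]
calls_with_skew = rng.gauss_calls - calls_without_skew

print(calls_without_skew, calls_with_skew)  # prints "0 1000"
```

The counter is only there to make the saving visible. In the simulation the gain came from not paying for the random draw on every step that has no skew.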