## Sunday, May 6, 2012

### Trade Series Optimization by Regression Filter

I'm starting a series on something I call Trade Series Optimization by Regression Filter, or TSOBRF. Any trading system will produce a time series of trades which may or may not be profitable. Even if it's unprofitable, as long as some of the trades are profitable, it may be possible to make the overall system profitable by screening out or filtering the losing trades. To construct a filter you need some data, as much as possible, that may give an indication of whether a particular trade is likely to be a winner or a loser. It is important that the data we are going to use to make the filter (X) be completely independent of the trade series (Y). What this means is that the X data must precede the Y data in time. If you wish to try to predict what the market will do today, you may only use data from the previous day or data that existed prior to the market open for the same day. Note that this is not the same as saying the Y values must be independent of the X values. We hope the opposite is true. The spreadsheet image below shows a simple example of how to set up the filter calculation using Excel.
The data in this example are all random numbers and have no connection with any market or trading. It's just to show how to construct a filter.
Excel has a built-in function called LINEST. LINEST uses least-squares analysis to find the first-order (linear) equation that best fits known data. Its inputs are the independent variables (X values), which hopefully are drivers of the dependent variable (Y values). In this example, the X values are in the columns labeled X1 through X5, rows 12 through 19. The dependent or Y values are in the Y column in the corresponding rows. When we apply the LINEST function to our data set, its output is stored in the spreadsheet cells from B2 down to G6. The LINEST function returns a lot of values, but we are only concerned with the ones in the first row. The values in the first row are the coefficients of an equation of the form `y = C1*X1 + C2*X2 + C3*X3 + C4*X4 + C5*X5 + b`, where b is a constant. LINEST returns the coefficients in reverse order of the inputs, so I have reversed them in cells A8 through F8 so that they correspond directly to the X values in the cells below. So now we have an equation which fits our data as well as a first-order equation can.
We then apply the equation to the X values one row at a time and put the result of this calculation in the column Yr. If our curve fit were perfect, each value in column Yr would be exactly the same as the corresponding value in column Y. However, because these are just random numbers, it's hard to get any kind of fit at all. But, as a check on the calculation, we sum the values in columns Y and Yr for rows 12 through 19 and see that the sums are exactly the same. (Because the fit includes the constant term b, least squares forces the residuals to sum to zero, so the two totals must agree.) If these sums are not the same, then something is wrong with the calculation.
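
For readers who would rather follow along outside Excel, here is a minimal sketch of the same calculation in Python with NumPy. It is not the spreadsheet itself: the data are freshly generated random numbers (the seed and value range are my own assumptions), and `np.linalg.lstsq` stands in for LINEST. It fits the five coefficients plus the constant, computes the Yr column, and runs the sum check.

```python
import numpy as np

# Hypothetical stand-in data: 8 rows of 5 random X values and one random Y
# per row. These are NOT the numbers from the spreadsheet.
rng = np.random.default_rng(0)
X = rng.integers(-10, 11, size=(8, 5)).astype(float)
Y = rng.integers(-10, 11, size=8).astype(float)

# Append a column of ones so the fit includes the constant term b.
A = np.column_stack([X, np.ones(len(X))])

# Least-squares fit of y = C1*X1 + ... + C5*X5 + b.
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
C, b = coef[:-1], coef[-1]

# LINEST lists the slopes in reverse order (C5 ... C1) followed by b.
linest_style_row = np.concatenate([C[::-1], [b]])

# Apply the equation row by row to get the Yr column.
Yr = X @ C + b

# Sanity check: with a constant term, the column sums must match.
print("sum(Y)  =", Y.sum())
print("sum(Yr) =", Yr.sum())
```

If the two printed sums differ by more than floating-point rounding, something is wrong in the setup, just as in the spreadsheet.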

What is happening is that the equation produced by the regression analysis is trying to predict what the Y value should be for a given set of X values. Because the data are random, it can't really do this, but it is still trying to give us an indication of what we might expect Y to be based on a given set of data. If the Y data were trading results, then we would want to take trades when the Yr values are positive, i.e. when the regression predicts a profitable trade. So, we now filter the trades by taking only the ones where the Yr values are greater than zero: we copy the value from column Y into column Yf whenever Yr is greater than zero. When this is done and we sum up the Yf values, we see the total is plus 15, whereas the total of all the trades is minus 9. If you get to this point and the sum of column Yf is still negative, then there is little more you can do. Your theory about the relationship between the X and Y values must have been wrong. But even if your theory is wrong, you may still see a correlation, as we do here with totally random data. In the next step we put the filter to the true test by applying it to data it has not previously seen.
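
The filtering step itself is a one-liner. The sketch below uses small made-up `Y` and `Yr` arrays (hypothetical values chosen only for illustration, not the spreadsheet's numbers) to show how the Yf column and its total are built.

```python
import numpy as np

# Hypothetical trade results (Y) and regression predictions (Yr).
# Illustrative values only; they are not taken from the spreadsheet.
Y = np.array([5.0, -3.0, 2.0, -6.0, 4.0, -1.0, -2.0, 3.0])
Yr = np.array([2.0, -1.0, 0.5, -2.5, 1.5, 0.4, -0.6, 1.7])

# Keep a trade only when its predicted value is greater than zero;
# skipped trades contribute nothing to the total.
Yf = np.where(Yr > 0, Y, 0.0)

print("all trades:     ", Y.sum())
print("filtered trades:", Yf.sum())
```

Note that one of the selected trades here is a loser (the -1): the filter acts on the prediction, not on the outcome, so it can only improve the total to the extent that the predictions carry real information.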

In rows 21 and 22 we have two more data points which were not used to develop the filter. If we apply the filter to this out-of-sample (OOS) data and the filtered trades sum to a positive total, then we may have something. In this case our OOS trades were +3 and -9, but the Yr values are both greater than zero, so both trades are selected, i.e. in actual trading we would have taken these trades. However, the sum of the trades is -6, so the filtering process did not produce profitable trades, and it's back to the drawing board.
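
The out-of-sample test can be sketched the same way. The important point is that the coefficients are fitted on the in-sample rows only and then applied unchanged to the new rows. As before, this is a rough illustration on random data; the seed, the value range, and the 8/2 split are my own assumptions mirroring the layout described above.

```python
import numpy as np

# Hypothetical random data: 10 rows, of which the first 8 build the filter
# and the last 2 are held back as out-of-sample (OOS) rows.
rng = np.random.default_rng(1)
X = rng.integers(-10, 11, size=(10, 5)).astype(float)
Y = rng.integers(-10, 11, size=10).astype(float)

X_in, Y_in = X[:8], Y[:8]
X_oos, Y_oos = X[8:], Y[8:]

# Fit the coefficients (five slopes plus the constant) on in-sample rows only.
A_in = np.column_stack([X_in, np.ones(len(X_in))])
coef, *_ = np.linalg.lstsq(A_in, Y_in, rcond=None)

# Apply the already-fitted equation to the OOS rows; no refitting here.
A_oos = np.column_stack([X_oos, np.ones(len(X_oos))])
Yr_oos = A_oos @ coef

# Take only the OOS trades whose predicted value is greater than zero.
take = Yr_oos > 0
oos_total = Y_oos[take].sum()

print("OOS predictions:    ", Yr_oos)
print("OOS trades taken:   ", Y_oos[take])
print("OOS filtered total: ", oos_total)
```

With random inputs there is no reason to expect the filtered OOS total to come out positive, which is exactly the point of the test.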

In the next post I will show an example with real-world data.