Learning Center

Data mining is a procedure to find relationships among a group of variables in a database. SAFE TOOLBOXES® comes with two data mining models: forward stepwise regression and backward stepwise regression. Both methods are based on running multivariate linear regression models multiple times, but they differ in how the variables are included or excluded.

Forward stepwise regression starts from an empty model, while backward stepwise regression starts from the complete model, with every candidate variable included. In both cases, variables can be added and removed at each step. The result of the process is a multivariate linear regression that supposedly contains the “best” set of explanatory variables.
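As an illustration of the add-and-remove idea described above, here is a minimal Python sketch of a forward stepwise selection driven by two p-value thresholds. It is not the SAFE TOOLBOXES® implementation, which this page does not document. The function names (`invert`, `ols_pvalues`, `forward_stepwise`, `make_demo_data`), the synthetic data, and the use of a normal approximation for the p-values are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(m)
    a = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k:] for row in a]


def ols_pvalues(y, columns):
    """Fit y = c + sum(b_j * x_j) by least squares; return the p-value of each b_j.

    Assumption: p-values use a normal approximation to the t distribution,
    which is only reasonable when the sample is much larger than the model.
    """
    n, k = len(y), len(columns) + 1
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    pvals = []
    for j in range(1, k):  # skip the intercept
        se = math.sqrt(sigma2 * inv[j][j])
        z = abs(beta[j] / se) if se > 0 else math.inf
        pvals.append(2.0 * (1.0 - NormalDist().cdf(z)))
    return pvals


def forward_stepwise(y, candidates, p_enter, p_leave):
    """Start from an empty model; add and remove variables by p-value."""
    selected = []
    for _ in range(2 * len(candidates) + 2):  # cap iterations to rule out cycling
        changed = False
        # Entry step: add the unused candidate with the smallest p-value below p_enter.
        best_name, best_p = None, p_enter
        for name in candidates:
            if name in selected:
                continue
            cols = [candidates[s] for s in selected] + [candidates[name]]
            p = ols_pvalues(y, cols)[-1]
            if p < best_p:
                best_name, best_p = name, p
        if best_name is not None:
            selected.append(best_name)
            changed = True
        # Removal step: drop the included variable with the largest p-value above p_leave.
        if selected:
            pvals = ols_pvalues(y, [candidates[s] for s in selected])
            worst = max(range(len(selected)), key=lambda j: pvals[j])
            if pvals[worst] > p_leave:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return selected


def make_demo_data(n=300, seed=0):
    """Synthetic data: Y depends on X1 and X2 only; X3 is unrelated noise."""
    rng = random.Random(seed)
    cands = {name: [rng.gauss(50.0, 10.0) for _ in range(n)]
             for name in ("X1", "X2", "X3")}
    y = [2.0 + 3.0 * a - 1.5 * b + rng.gauss(0.0, 5.0)
         for a, b in zip(cands["X1"], cands["X2"])]
    return y, cands


if __name__ == "__main__":
    y, cands = make_demo_data()
    print(forward_stepwise(y, cands, p_enter=0.1, p_leave=0.2))
```

On this synthetic data the sketch should select X1 and X2; the unrelated X3 can still enter occasionally, because a threshold of 0.1 admits noise variables by chance. A backward variant would start with `selected` holding every candidate and otherwise reuse the same two steps.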

Now, let’s illustrate this tool with an example.

Suppose that you have the following database:

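|     | A    | B    | C    | D     | E     | F     |
|-----|------|------|------|-------|-------|-------|
| 1   | Y    | X1   | X2   | X3    | X4    | X5    |
| 2   | 80.9 | 21.3 | 79.3 | 92.1  | 96.0  | 191.8 |
| 3   | 46.3 | 11.9 | 46.8 | 135.8 | 100.5 | 89.9  |
| 4   | 61.2 | 23.6 | 62.1 | 101.3 | 83.7  | 241.8 |
| 5   | 61.5 | 25.7 | 62.8 | 91.9  | 80.9  | 231.7 |
| 6   | 78.0 | 11.4 | 77.4 | 78.1  | 68.6  | 325.9 |
| 7   | 58.6 | 5.5  | 57.6 | 104.0 | 65.7  | 213.1 |
| 8   | 95.2 | 12.2 | 97.4 | 73.8  | 72.4  | 136.7 |
| 9   | 47.0 | 21.3 | 47.7 | 148.5 | 87.2  | 77.1  |
| 10  | 95.3 | 25.3 | 92.7 | 63.3  | 86.6  | 187.3 |
| 11  | 48.8 | 8.5  | 50.3 | 102.2 | 79.1  | 197.0 |
| 12  | 74.8 | 14.8 | 74.1 | 86.3  | 87.0  | 391.9 |
| 13  | 57.7 | 10.2 | 57.0 | 141.0 | 85.6  | 207.2 |
| 14  | 59.8 | 17.0 | 57.6 | 70.8  | 70.9  | 169.3 |
| ... | ...  | ...  | ...  | ...   | ...   | ...   |
| 288 | 59.2 | -0.7 | 60.7 | 99.3  | 44.9  | 240.0 |
| 289 | 65.2 | 13.8 | 65.8 | 118.7 | 90.0  | 198.9 |
| 290 | 62.0 | 16.4 | 64.6 | 143.7 | 59.6  | 187.7 |
| 291 | 92.8 | 26.7 | 91.7 | 115.8 | 62.9  | 230.1 |
| 292 | 88.9 | 3.6  | 88.5 | 73.7  | 100.8 | 182.2 |
| 293 | 63.7 | 1.6  | 62.7 | 27.4  | 89.1  | 78.3  |
| 294 | 73.3 | 18.7 | 72.3 | 121.3 | 71.1  | 92.6  |
| 295 | 37.7 | 7.0  | 38.0 | 82.0  | 90.1  | 135.8 |
| 296 | 56.3 | 16.5 | 57.9 | 92.0  | 101.0 | 86.2  |
| 297 | 97.9 | 3.5  | 97.4 | 66.5  | 94.1  | 203.6 |
| 298 | 51.1 | 8.5  | 51.5 | 61.0  | 79.9  | 249.9 |
| 299 | 37.2 | 27.3 | 37.7 | 87.4  | 76.7  | 218.4 |
| 300 | 60.5 | 21.7 | 60.4 | 167.9 | 105.6 | 214.5 |
| 301 | 44.9 | 16.6 | 44.3 | 68.2  | 98.8  | 178.2 |
| 302 |      |      |      |       |       |       |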

If you want, for instance, to run a forward stepwise regression, follow these steps:

  1. Select the Econometrics Toolbox tab
  2. Select any cell in the Y series (for example, cell A5), then hold down the CTRL key
  3. Still holding the CTRL key, add the other series one by one by selecting a single cell in each. This step defines the set of candidate variables for the best model
  4. Adjust the start sample to 0% and the final sample to 100% (to use all 300 sample points)
  5. Under the “Analysis” group, set Category = “Data mining” and item = “ForwardStepwise”, then click on the button to confirm your choice. This will add the following text to the command window: ForwardStepwise(PValueEnter,PValueLeave): Y[ ] = C + X1[ ] + X2[ ] + X3[ ] + X4[ ] + X5[ ]. Alternatively, you can type the equation directly in the command window
  6. Replace the placeholders PValueEnter and PValueLeave with the desired parameter values for the forward stepwise model, for example, ForwardStepwise(0.1,0.2)
  7. Click on the button to run the regression
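The substitution in steps 5 and 6 amounts to filling in a text template. The short Python sketch below composes the same command-window text from its parts, which may be handy if you prefer to type the equation directly. The helper `stepwise_command` is hypothetical: it is plain string formatting written for this example, not part of SAFE TOOLBOXES®.

```python
def stepwise_command(method, p_enter, p_leave, dependent, regressors):
    """Compose command-window text in the format shown in step 5.

    Hypothetical helper: plain string formatting, not a SAFE TOOLBOXES API.
    """
    for p in (p_enter, p_leave):
        if not 0.0 < p < 1.0:
            raise ValueError("p-value thresholds must lie strictly between 0 and 1")
    # The constant term C comes first, followed by one "name[ ]" term per series.
    terms = ["C"] + [f"{name}[ ]" for name in regressors]
    return f"{method}({p_enter},{p_leave}): {dependent}[ ] = " + " + ".join(terms)


if __name__ == "__main__":
    print(stepwise_command("ForwardStepwise", 0.1, 0.2, "Y",
                           ["X1", "X2", "X3", "X4", "X5"]))
    # prints: ForwardStepwise(0.1,0.2): Y[ ] = C + X1[ ] + X2[ ] + X3[ ] + X4[ ] + X5[ ]
```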

Examining the final equation model of the regression in tab , we will find:

© 2016 Safe Quantitative Technologies, ltd. All rights reserved.