Effects of the Swiss Franc/Euro Exchange Rate Floor on the Calibration of Probability Forecasts

Brian D. Deaton
Walter F. and Virginia Johnson School of Business, McMurry University, Abilene, TX 79697, USA; [email protected]; Tel.: +1-325-793-3854

Received: 26 March 2018; Accepted: 26 April 2018; Published: 2 May 2018

Abstract: Probability forecasts of the Swiss franc/euro (CHF/EUR) exchange rate are generated before, surrounding and after the placement of a floor on the CHF/EUR by the Swiss National Bank (SNB). The goal is to determine whether the exchange rate floor has a positive, negative or insignificant effect on the calibration of the probability forecasts from three time-series models: a vector autoregression (VAR) model, a VAR model augmented with the LiNGAM causal learning algorithm, and a univariate autoregressive model built on the independent components (ICs) of an independent component analysis (ICA). Score metric rankings of forecasts and plots of calibration functions are used in an attempt to identify the preferred time-series model based on forecast performance. The study not only finds evidence that the floor on the CHF/EUR has a negative impact on the forecasting performance of all three time-series models but also that the policy change by the SNB altered the causal structure underlying the six major currencies.

Keywords: probability forecasting; calibration; evaluating forecasts; causality; exchange rates; vector autoregression models

1. Introduction

On 6 September 2011, the Swiss National Bank (SNB) began intervening in the Swiss franc/euro (CHF/EUR) exchange rate market to prohibit the franc from appreciating beyond 1.20 francs per euro, and it continued this intervention throughout 2012 [1,2].
The objective of this study is to assess the impact of this currency intervention on the probability forecasts of the CHF/EUR from three time-series models: a vector autoregression (VAR) model, a VAR model augmented with the LiNGAM causal learning algorithm, and a univariate autoregressive model built on the independent components (ICs) of an independent component analysis (ICA). One-step-ahead forecasts of the CHF/EUR probability distribution are generated from each time-series model and are based on a series of intraday data for six exchange rates (all versus the Swiss franc). The forecasted probability distributions are tested for calibration and ranked with two different scoring techniques in periods of time before, surrounding, after and long after the beginning of the CHF/EUR exchange rate intervention.

In contrast to other literature on exchange rate forecasting that examines point forecasts of exchange rates, this study follows the example set by [3] and evaluates forecasted probability distributions. A brief summary of the most relevant literature concerning the exchange rate forecasting performance of multivariate time-series models is as follows. Reference [4] determines that the forecasting accuracy of restricted VAR models is better than that of unrestricted VAR models for forecasting the US dollar/yen, US dollar/Canadian dollar and US dollar/Deutsche Mark monthly exchange rates. Reference [5] uses VAR, Bayesian VAR and vector error correction (VEC) models to forecast the Australian dollar/United States dollar monthly exchange rate and concludes that the VEC model exhibits superior forecasting performance.
Reference [6] uses a VAR, restricted VAR, Bayesian VAR, VEC and Bayesian VEC to forecast five Central and Eastern European monthly exchange rates and concludes that none of the models outperform the others for three-month forecasts and that the Bayesian models tend to perform better than the others for five-month forecasts. Reference [7] forecasts the monthly exchange rates of 33 currencies against the US dollar using a large Bayesian VAR model; the results indicate that the Bayesian VAR model forecasts better than a random walk model for most of the currencies.

There are many other techniques used to forecast exchange rates in addition to VAR and VEC models. For instance, Reference [8] surveys the literature on exchange rate forecasting and reports that factor-based models and time-varying parameter models outperform a variety of other models, but the results are sensitive to the chosen sample periods and time horizons. Machine learning algorithms are also popular for forecasting foreign exchange. Reference [9] uses artificial neural network, k-nearest neighbor, decision tree, and naïve Bayesian classifier learning algorithms to predict the USD/GBP daily exchange rate. All algorithms had similar performance, and there was a high degree of correlation between their predictions. Reference [10] compared the performance of several machine learning algorithms, including multi-layer perceptron, support vector regression, and the gamma classifier, to the performance of more traditional time-series models, including autoregressive, autoregressive moving-average, and autoregressive integrated moving-average models. Results were mixed and depended upon which exchange rate (MXN/USD, JPY/USD, or USD/GBP) was being forecasted. Other studies such as [11] and [12] have focused on forecasting exchange rates using various artificial neural network models.

Forecasting 2019, 1, 3–25; doi:10.3390/forecast1010002; www.mdpi.com/journal/forecasting

2. Materials and Methods

2.1. Probabilistic Forecasting

Let x_t = (x_{1t}, . . . , x_{mt}) be the observed values of an m × 1 vector time series X_t at time period t. Suppose that at any time n, the forecaster knows the values x_t, t = 1, . . . , n and must issue a set of probability distributions P_{n+1} for the next observation X_{n+1}. A prequential forecasting system (PFS) is a rule which associates a choice of P_{n+1} with each value of n and with any possible set of outcomes x_t, t = 1, . . . , n [13]. A PFS is so named because it is the combination of probability forecasting and sequential prediction; this concept is also known as "probabilistic forecasting" or "density forecasting".

Reference [13] suggests that the adequacy of a PFS as a probabilistic explanation of the data should depend only on the sequence of forecasts that the PFS in fact made; this is called the prequential principle. In practice, the prequential principle is implemented by using the calibration criterion to judge whether or not a PFS issues adequate probabilities. For a PFS to be well calibrated according to the calibration criterion, the PFS must assign a probability to each event that matches that event's ex post relative frequency.

Formal testing of calibration relies on the probability integral transform as shown in [13] and summarized as follows. For a continuous random variable X_{i,t+1} (i.e., the one-period forecast for time series i), let F_{i,t+1} be the continuous distribution function of P_{i,t+1} and let U_{i,t+1} = F_{i,t+1}(X_{i,t+1}). Under P_{i,t+1}, the U_{i,t+1} are independent uniform U[0, 1] random variables, so P_{i,t+1} is considered to be well calibrated if the observed sequence of fractiles u_{i,t+1} = F_{i,t+1}(x_{i,t+1}) "looks like" a random sample from U[0, 1]. In other words, the PFS is well calibrated if the observed sequence u_{i,t+1} = F_{i,t+1}(x_{i,t+1}) has cumulative distribution function G(u_{i,t+1}) = u_{i,t+1}.

The cumulative distribution function G(U_{i,t+1}) for U_{i,t+1} is estimated by arranging the observed sequence u_{i,t+1} = F_{i,t+1}(x_{i,t+1}), t = 1, . . . , N in order of ascending value u_{i,t+1}(1), . . . , u_{i,t+1}(N) and calculating

Ĝ[u_{i,t+1}(j)] = j/N, j = 1, . . . , N. (1)

Calibration performance can be shown graphically as a plot of the PFS's observed fractiles (the u_{i,t+1}'s) on the x-axis against the estimated cumulative distribution function Ĝ(U_{i,t+1}) on the y-axis. This calibration plot will be approximately a 45-degree line for a well-calibrated PFS.

In practice, a chi-squared goodness-of-fit test can be performed to test a PFS for calibration. This test uses the sequence of observed fractiles (the u_{i,t+1}'s) from the sequence of probability forecasts P_{i,t+1}. Under the null hypothesis that the forecasts are well calibrated, the distribution of a sequence of N observed fractiles is a uniform distribution on the interval [0, 1], whereas the alternative hypothesis is that the distribution of observed fractiles is not uniform. If the interval [0, 1] is divided into J nonoverlapping subintervals of length L_j (where 0 ≤ L_j ≤ 1), the goodness-of-fit statistic is calculated as

X² = Σ_{j=1}^{J} (a_j − L_j N)² / (L_j N) (2)

where a_j is the actual number of observed fractiles in interval j and L_j is the length of interval j [3]. The goodness-of-fit statistic is compared to the chi-squared distribution with J − 1 degrees of freedom. This test and all other chi-squared goodness-of-fit tests share a common form, which is a sum of terms containing the square of a difference between an observed count and an expected count divided by the expected count:

Σ (observed − expected)² / expected. (3)

For more information on the goodness-of-fit test see [14].

2.2. Scoring Forecasts

In addition to calibration plots and calibration tests, prequential forecasting systems can be evaluated by metrics such as the mean-squared error (MSE) criterion or the probability score (Brier 1950) [15]. The MSE criterion is most often used to evaluate point forecasts, but it can also be used to evaluate predictive distributions [3].
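As a concrete illustration, the goodness-of-fit calibration test in Equation (2) can be sketched in Python. This is only a sketch, not the author's MATLAB implementation, and the simulated fractile sequences are hypothetical stand-ins for the observed u_{i,t+1}'s.

```python
import numpy as np

def calibration_chi2(fractiles, J=10):
    """Goodness-of-fit statistic of Equation (2) for a sequence of fractiles.

    Divides [0, 1] into J equal-length subintervals, counts the fractiles
    falling in each interval (the a_j), and compares the counts with the
    expected counts L_j * N under the uniform null hypothesis.
    """
    u = np.asarray(fractiles)
    N = len(u)
    counts, _ = np.histogram(u, bins=J, range=(0.0, 1.0))   # a_j
    expected = N / J                                        # L_j * N with L_j = 1/J
    return np.sum((counts - expected) ** 2 / expected)      # compare to chi2, J - 1 df

# A well-calibrated PFS yields fractiles that look like draws from U[0, 1];
# an overconfident PFS bunches them in the middle of the unit interval.
rng = np.random.default_rng(0)
print(calibration_chi2(rng.uniform(size=1000)))
print(calibration_chi2(rng.beta(5, 5, size=1000)))
```

With J = 10 the statistic is compared against the chi-squared distribution with 9 degrees of freedom (95th percentile ≈ 16.92), so the uniform sample should typically pass while the bunched sample fails by a wide margin.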
The MSE is calculated for probability forecasts by using the expected value of the forecast distribution. Let P_{i,n+1}, n = 1, . . . , K be a sequence of probability forecasts for the ith element X_{i,n+1} of the random time-series vector X_{n+1}, and let M_{i,n+1} be the expected value of the distribution P_{i,n+1}. The MSE of the forecasts for X_{i,n+1} is calculated as follows:

MSE = (1/K) Σ_{n=1}^{K} (x_{i,n+1} − M_{i,n+1})² (4)

where x_{i,n+1} is the observed value of X_{i,n+1}. The sequence of forecasts with the smallest MSE is preferred; a PFS P is chosen over an alternative PFS Q if the PFS P has the smaller MSE.

In contrast to the MSE, the probability score evaluates the entire forecasted probability distribution (Brier 1950) [15]. On any occasion n + 1, suppose that there are R possible outcomes for X_{i,n+1} with probabilities f_j^{n+1}, j = 1, . . . , R, so that

Σ_{j=1}^{R} f_j^{n+1} = 1, n = 1, . . . , K. (5)

The probability score is defined as

PS = (1/K) Σ_{j=1}^{R} Σ_{n=1}^{K} (f_j^{n+1} − E_j^{n+1})² (6)

where E_j^{n+1} takes the value 1 if outcome j occurred and 0 otherwise. The usage of the probability score is similar to that of the MSE; the sequence of forecasts with the smallest probability score is preferred. A PFS P is chosen over an alternative PFS Q if the PFS P has the smaller probability score.

2.3. Independent Component Analysis

In basic independent component analysis, there are n observed variables x_1, . . . , x_n that are linear combinations of underlying statistically mutually independent source variables s_1, . . . , s_n:

x_i = a_{i1}s_1 + a_{i2}s_2 + . . . + a_{in}s_n for all i = 1, . . . , n, (7)

which in vector-matrix form is written as

x = As (8)

where A is the unknown mixing coefficient matrix and s is a vector of unobserved independent components. The observed variables x are used to estimate both A and s. Both x and s can be assumed to have zero mean; if this is not true, then the preprocessing step

x = x_o − E(x_o) (9)

will center the original observed variables x_o if they are not already centered.
The independent components will then also have zero mean since

E(s) = A⁻¹E(x). (10)

Basic ICA model estimation relies on the following assumptions [16]:

1. The independent components are assumed to be statistically independent, but this does not need to be exactly true in application.
2. The mixing matrix A is assumed to be square and invertible for the sake of convenience and simplicity.
3. The independent components must have non-Gaussian distributions.

Many ICA models differ from the basic ICA model and have their own assumptions. For additional details see [16].

The independent components s are not only uncorrelated, but they are also as statistically independent as possible. Because achieving this requires more information than a correlation matrix can provide, the estimation of independent components uses higher-order moments or other information, such as the autocovariance structure for time-series variables, in addition to correlation information.

The observed random variables x can be linearly transformed into uncorrelated variables that have unit variances via a process called whitening. The whitened vector z is computed as

z = Vx (11)

where the decorrelating matrix V is

V = D^{−1/2}E^T. (12)

In the above equation, E = (e_1 . . . e_n) is the matrix whose columns are the unit-norm eigenvectors of the covariance matrix C_x = E[xx^T], and D = diag(d_1, . . . , d_n) is the diagonal matrix of the eigenvalues of C_x. Basic ICA estimation requires the higher-order moments of non-Gaussian distributions because there are an infinite number of matrices V that can create decorrelated components.

2.4. ICA Time Series

If the independent components are time series, as opposed to independent random variables as in the basic ICA model, then the ICA model takes the following form [16]:

x(t) = As(t), t = 1, . . . , T (13)

where t is the time index.
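The centering and whitening steps of Section 2.3 (Equations (9), (11) and (12)) can be sketched numerically as follows. This is an illustrative sketch on simulated data; the mixing matrix and sources are hypothetical, not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated observations: 5000 draws of a 3-dimensional x = A s with a
# hypothetical mixing matrix A and independent non-Gaussian sources s.
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])
s = rng.laplace(size=(5000, 3))      # non-Gaussian sources
x = s @ A.T
x -= x.mean(axis=0)                  # centering step, Equation (9)

# Whitening: V = D^{-1/2} E^T from the eigendecomposition of C_x.
Cx = np.cov(x, rowvar=False)
d, E = np.linalg.eigh(Cx)            # eigenvalues d, unit-norm eigenvectors E
V = np.diag(d ** -0.5) @ E.T         # Equation (12)
z = x @ V.T                          # z = V x, Equation (11)

# The whitened components are uncorrelated with unit variances,
# i.e., the sample covariance of z is (numerically) the identity matrix.
print(np.round(np.cov(z, rowvar=False), 3))
```

Any orthogonal rotation of z is equally white, which is exactly why ICA needs higher-order or time-structure information to pin down the final rotation.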
Since time-series variables have more structure than independent random variables, the time-series autocovariances may be used for estimation instead of the higher-order information that is required in the basic ICA model.

The AMUSE algorithm provides one method to estimate the time-series ICA model [16]. This algorithm requires the time-lagged covariance matrix in place of the higher-order moments used in the basic ICA model. The time-lagged covariance matrix is computed as

C_x^τ = E[x(t)x(t − τ)^T] (14)

where τ is a lag constant, τ = 1, 2, 3, . . .. This matrix contains the autocovariances of each signal and the covariances between signals.

The algorithm is based on the fact that the instantaneous and lagged covariances of s(t) are zero due to independence. Hence, the time-lagged covariance matrix is used to find a matrix B so that all of the instantaneous and lagged covariances of

y(t) = Bx(t) (15)

are equal to zero.

The AMUSE algorithm assumes that all of the ICs have autocovariances different from zero and different from each other. This assumption replaces the assumption of the basic ICA model that the independent components must have non-Gaussian distributions.

The AMUSE algorithm uses whitened, zero-mean data z(t) as input and generates the separating matrix W as output so that

Wz(t) = s(t) (16)
Wz(t − τ) = s(t − τ). (17)

The time-lagged covariance matrix is modified to be symmetric by the following computation

C̄_z^τ = (1/2)[C_z^τ + (C_z^τ)^T] (18)

so that an eigenvalue decomposition on this new symmetric matrix is well defined. The steps of the AMUSE algorithm are as follows [16]:

1. Center and whiten the observed data x(t) to obtain z(t).
2. Compute the eigenvalue decomposition of the symmetric, time-lagged covariance matrix (Equation (18)) for some time lag τ.
3. The rows of the estimated separating matrix Ŵ are given by the eigenvectors.
4. The estimated separating matrix for the unwhitened data x is B̂ = ŴV, in which V is defined in Equation (12).

Time-series models are typically built using observed returns, which are represented in vector form by the notation

R(t) = (R_1(t), . . . , R_N(t))^T, t = 1, . . . , T (19)

where R_i(t) is the return on a particular asset i ∈ {1, . . . , N} at time t ∈ {1, . . . , T}. In the following discussion, the vector of observed time-series variables is the vector of observed returns, i.e., x(t) = R(t). A prequential forecasting system can be created with the independent components by building on the forecasting method described in [17]. The following procedure is used to create a prequential forecasting system for a set of observed returns:

1. Compute the independent components using the estimated separating matrix

ŝ(t) = B̂R(t), t = 1, . . . , T. (20)

2. Model each independent component with an autoregressive (AR) model

s_i(t) = c + Σ_{τ=1}^{k} φ_τ s_i(t − τ) + ε_i(t), i ∈ {1, . . . , N} (21)

where c is a constant, k is the number of time delays (lags) of the autoregression, the φ_τ are coefficients, and ε_i(t) is the innovation process.

3. Compute the estimates of the innovation process as follows

ε̂_i(t) = s_i(t) − c − Σ_{τ=1}^{k} φ_τ s_i(t − τ), i ∈ {1, . . . , N} (22)

and estimate the probability distributions of the innovations with a method such as kernel density estimation. For an overview of kernel density estimation see [18].

4. Obtain samples from the estimated probability distributions of the innovations with a sampling technique such as Latin hypercube sampling. A stratified sampling technique such as Latin hypercube sampling is generally more accurate when there are low-probability outcomes, which is likely to be the case in this application [19].

5. Use the samples of the innovations in conjunction with historical data and parameter estimates to compute the estimated probability distribution for the one-step-ahead independent components using Equation (21).
6. Finally, transform the samples of the estimated probability distributions of the independent components into estimated probability distributions of the original variables

x̂(t) = Âŝ(t), t = 1, . . . , T. (23)

2.5. LiNGAM Algorithm

The LiNGAM algorithm assumes that the observed variables can be arranged in a causal order so that the data-generating process can be represented by a directed acyclic graph (DAG), that the value assigned to each variable is a linear function of values assigned to variables positioned earlier in the causal order, that there are no latent common causes, and that the disturbance terms are mutually independent with non-Gaussian distributions and non-zero variances [20]. The non-Gaussian assumption is important because it allows LiNGAM to estimate the full causal model with no undetermined parameters.

LiNGAM assumes that the observed variables are linear functions of the disturbance variables. When the mean is subtracted from each variable, this is expressed as

x = Bx + e. (24)

Solving for x, this becomes

x = Ae (25)

where A = (I − B)⁻¹. Equation (24), in addition to the assumption that the disturbance terms are independent and have non-Gaussian distributions, is the independent component analysis model. The ICA model has two indeterminacies that must be resolved before a graphical model can be constructed: neither the order nor the scaling of the independent components is defined. LiNGAM resolves both of these issues by permuting and normalizing the ICA output (i.e., the mixing matrix) to obtain a matrix B containing the DAG connection strengths. The graphical representation of this matrix is the causal DAG model.

Because LiNGAM uses the non-Gaussian information contained in the disturbance terms, its output is just one DAG instead of the class of equivalent DAGs found by most causal learning algorithms.
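A minimal numerical sketch of the relationship between Equations (24) and (25) follows. The coefficient matrix B and the disturbances are hypothetical, chosen only to illustrate that an acyclic linear model with non-Gaussian disturbances is exactly an ICA model with mixing matrix A = (I − B)⁻¹; this is not LiNGAM's estimation procedure itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical causal model over three variables in causal order x0 -> x1 -> x2.
# Entry (i, j) of B is the direct effect of x_j on x_i; acyclicity means B is
# strictly lower triangular when the variables are listed in causal order.
B = np.array([[0.0,  0.0, 0.0],
              [0.8,  0.0, 0.0],
              [0.3, -0.5, 0.0]])

# Mutually independent, non-Gaussian disturbances (the LiNGAM assumption).
e = rng.laplace(size=(10000, 3))

# x = B x + e  solved as  x = A e with A = (I - B)^{-1}, Equation (25).
A = np.linalg.inv(np.eye(3) - B)
x = e @ A.T

# Sanity check: the structural form and the reduced (ICA) form agree exactly.
print(np.allclose(x, x @ B.T + e))
```

LiNGAM works in the opposite direction: it estimates A with ICA and then permutes and rescales A⁻¹ to recover a B whose graph is acyclic.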
As noted earlier, this output includes parameter estimates for the linear model. The LiNGAM procedure is implemented both in MATLAB (version 7.7), as provided by [20], and in the TETRAD IV software package (version 4.3.10), provided by [21]. In the application below, the MATLAB code is used to produce coefficient estimates, and TETRAD IV is used to produce DAG illustrations.

2.6. VAR Models

A vector autoregression (VAR) built using a time series of return observations (Equation (19)) is written as

R(t) = Σ_{τ=1}^{k} M_τ R(t − τ) + ν(t) (26)

where k is the number of time delays (lags) of the autoregression, the M_τ are n × n matrices of coefficients, and ν(t) is the innovation process.

To find an estimate ν̂(t) of the innovation process, estimate the vector autoregressive model using any least squares method and compute the estimate of the innovation process as

ν̂(t) = R(t) − Σ_{τ=1}^{k} M̂_τ R(t − τ). (27)

In the application below, the VAR model is used as a one-step-ahead prequential forecasting system by using a multivariate normal distribution as the distribution of the innovations ν̂(t). Estimates of the expected value vector and covariance matrix of ν̂(t) are used as parameters of the multivariate normal distribution. The multivariate normal distribution of the innovations is used in Equation (26) with historical data and parameter estimates to create a probability distribution for the one-step-ahead return vector R(t).

2.7. Dynamic Directed Graph Discovery (VAR-LiNGAM)

LiNGAM can be combined with the VAR model in a specific way so that the VAR model becomes fully identified, as described in [22]; in the following text, this combined model is called VAR-LiNGAM. The VAR-LiNGAM model is a combination of an autoregressive model with time delays and a structural equation model, which does not consider the time-series structure in the data.
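Before turning to the combined model, the VAR forecasting procedure of Section 2.6 can be sketched as follows. The sketch uses a simulated bivariate VAR(1) with hypothetical coefficients rather than the paper's SAS/MATLAB pipeline, and it uses plain multivariate normal draws where the paper uses Latin hypercube samples.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate a bivariate VAR(1) return series R(t) = M1 R(t-1) + nu(t),
# with a hypothetical coefficient matrix M1_true.
M1_true = np.array([[0.2, 0.1],
                    [0.0, 0.3]])
T = 2000
R = np.zeros((T, 2))
for t in range(1, T):
    R[t] = M1_true @ R[t - 1] + rng.normal(scale=0.01, size=2)

# Least-squares estimate of M1: regress R(t) on R(t-1), Equation (26).
X, Y = R[:-1], R[1:]
M1_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

# Estimated innovations nu_hat(t) = R(t) - M1_hat R(t-1), Equation (27).
resid = Y - X @ M1_hat.T
mu, Sigma = resid.mean(axis=0), np.cov(resid, rowvar=False)

# One-step-ahead predictive distribution: the deterministic part M1_hat R(T)
# plus multivariate normal innovation draws with the estimated moments.
draws = M1_hat @ R[-1] + rng.multivariate_normal(mu, Sigma, size=5000)
print("one-step-ahead forecast mean:", draws.mean(axis=0))
```

The `draws` array is an empirical one-step-ahead predictive distribution; its empirical CDF is what the calibration machinery of Section 2.1 is applied to.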
The autoregressive portion of VAR-LiNGAM is

R(t) = Σ_{τ=1}^{k} B_τ R(t − τ) + e(t) (28)

where k is the number of time delays (lags) of the autoregression, the B_τ are n × n matrices of coefficients, and e(t) is the innovation process. The structural equation portion of VAR-LiNGAM is

R = BR + e (29)

where e is a vector of disturbances and the diagonal of B is defined to be zero.

The complete VAR-LiNGAM model is the combination of Equations (28) and (29):

R(t) = Σ_{τ=0}^{k} B_τ R(t − τ) + e(t) (30)

where k is the number of time delays (lags) of the autoregression, the B_τ are the n × n matrices containing the causal effects between returns R(t − τ) with time lag τ = 0, . . . , k, and e(t) are random disturbances. The B_τ matrices for τ > 0 correspond to effects from the past to the present, while B_0 corresponds to instantaneous effects. The VAR-LiNGAM model is based on three assumptions:

1. The e(t) are mutually independent and temporally uncorrelated, both with each other and over time.
2. The e(t) are non-Gaussian.
3. The matrix B_0 corresponds to an acyclic graph.

The model is estimated in two stages. First, estimate a traditional vector autoregressive model and compute the residuals of the model as described above. Then perform a LiNGAM analysis on the estimate of the innovation process to obtain an estimate of the matrix B_0, which is the solution to the instantaneous causal model

ν̂(t) = B_0 ν̂(t) + e(t). (31)

Finally, use B_0 to compute the B_τ for τ > 0:

B̂_τ = (I − B̂_0)M̂_τ for τ > 0 (32)

where the M̂_τ are the estimated coefficient matrices of the VAR model in Equation (26).

The VAR-LiNGAM model becomes a prequential forecasting system for the one-step-ahead return vector R(t) with the following procedure. Compute an estimate of the independent components ê(t) from the estimates of the innovations ν̂(t):

ê(t) = (I − B̂_0)ν̂(t). (33)

Because there is essentially no stochastic dependence between the independent components, the probability distributions of the individual independent components can be estimated with a univariate estimation method such as kernel density estimation.

Next, obtain samples from the estimated probability distributions of the individual independent components with a sampling technique such as Latin hypercube sampling. Transform the samples of the independent components into samples of the innovations

ν̂(t) = (I − B̂_0)⁻¹ê(t). (34)

Finally, samples of the innovations in conjunction with historical data and parameter estimates are used to compute the estimated probability distribution for the one-step-ahead return vector R(t) using Equation (26).

2.8. Application

In the remainder of the paper, probability forecasts of the CHF/EUR exchange rate are generated from the three time-series models. Forecast calibration is evaluated with calibration plots and goodness-of-fit calibration tests. The mean-squared error and the probability score metrics are then used to compare the forecasting accuracy of the models. The code used for forecast generation, calibration, and scoring metrics was programmed and executed with MATLAB [23].

2.9. Description of the Data

Data is obtained from the Sierra Chart historical data service using Sierra Chart software (version 842) [24]. Both spot and futures data are available from the data service, and virtually identical model estimation and forecast evaluation results are obtained regardless of which is used. The results presented later in the paper are all reported using futures data.
The rationale for presenting these results is that the futures data originates from a globally accessible exchange, whereas the Sierra Chart spot data consists of transactions between a small forex dealer and its clients.

The data consists of futures contracts that are traded on the CME Group exchange for the Australian dollar (AUD), Canadian dollar (CAD), euro (EUR), Great Britain pound sterling (GBP), Japanese yen (JPY) and the Swiss franc (CHF). These currencies are chosen because they had the largest market turnover rates in 2010 according to the Triennial Central Bank Survey [25].

Sierra Chart software is used to join each currency's futures contracts into a single continuous time series for the corresponding currency; for instance, all futures contracts for the AUD (June 2010, . . ., July 2012) were joined in sequence to form a single continuous time series for the AUD. The original data has one-minute periodicity and is aggregated across time into fifteen-minute intervals so that the resulting data used in this analysis has fifteen-minute periodicity. A fifteen-minute periodicity is used because it is large enough to give the currencies plenty of time to respond to each other and small enough to provide the LiNGAM algorithm with a sufficient number of observations.

The exchange rates for the six currencies are converted to direct quotations where the domestic currency is the CHF, so that the data used for the analysis consists of observations of the AUD, CAD, EUR, GBP, JPY and USD quoted as CHF/X, where X is one of the stated currencies.

Missing data is replaced by the most recent observation in each currency series. Log returns are then computed by taking the natural logarithm and first-differencing the exchange rates (in that order). All log returns in all time periods are stationary based on Dickey–Fuller tests.

2.10. Brief History of the Swiss Franc

During the second and third quarters of 2011, the SNB became worried that the appreciation of the franc against the euro was hurting the Swiss economy and increasing the risk of deflation. In August, the SNB drove interest rates to nearly zero and flooded the market with liquidity in an attempt to mitigate the franc's appreciation, but neither of these actions was completely effective. Finally, the franc's appreciation was halted in September when the SNB placed a floor on the CHF/EUR exchange rate. The sequence of SNB actions was as follows [1]:

• 3 August 2011: the SNB lowered the upper limit of its target range for the three-month Libor to 0–0.25 percent (from 0 to 0.75 percent).
• 10 August 2011: the SNB announced additional measures to increase liquidity and reduce the appreciation of the franc. These included pumping more liquidity into the Swiss money market and conducting foreign exchange swap transactions (a policy last used in late 2008).
• 11 August 2011: an SNB official said that a temporary peg to the euro was possible.
• 6 September 2011: the SNB announced that it was establishing a floor on the CHF/EUR exchange rate (a ceiling on the EUR/CHF exchange rate). The franc would not be allowed to appreciate beyond 1.20 francs per euro.

2.11. Model Estimation

To analyze forecasts surrounding the establishment of the floor on the CHF/EUR exchange rate, the futures contract time-series data is segmented into four two-month data sets. These four forecast data sets have corresponding estimation data sets on which estimates of the econometric models are made. Note that it is the forecast data sets (not the estimation data sets) that are arranged around the 11 August 2011 intervention announcement, while the matching estimation data sets simply contain data from the prior six months. The names and descriptions of these four forecast data sets are as follows. In the before data set, the CHF/EUR exchange rate is unencumbered.
The surrounding data set begins on 11 August 2011, when an SNB official announced that a temporary peg was possible; the SNB formally established a floor on the CHF/EUR exchange rate near the middle of this data set, on 6 September 2011. The after data set begins after the floor has been in effect for just more than a month. The long after data set begins six months after the exchange rate floor has been in place. The exact dates of the forecast data sets and the dates of their accompanying estimation data sets are shown in Table 1. Expected values of the currency log returns in each forecast data set are shown in Table A1, and correlation matrices of the currency log returns in the estimation and forecast data sets are shown in Tables A2 and A3. The estimation results for each of the models on all the estimation data sets are reported in Tables A4–A7.

Table 1. The table shows the data set starting and ending dates.

Data Set | Starting Date | Ending Date
Estimation Data Sets
before | 11 December 2010 | 10 June 2011
surrounding | 11 February 2011 | 10 August 2011
after | 11 April 2011 | 10 October 2011
long after | 7 September 2011 | 6 March 2012
Forecast Data Sets
before | 11 June 2011 | 10 August 2011
surrounding | 11 August 2011 | 10 October 2011
after | 11 October 2011 | 10 December 2011
long after | 7 March 2012 | 6 May 2012

Model estimation is performed using SAS software, Version 9.2 [26]. The lag lengths for the estimated VAR models are chosen by using the Hannan–Quinn information criterion and Schwarz's Bayesian criterion [27]. For the VAR models in all estimation data sets, both the Hannan–Quinn information criterion and Schwarz's Bayesian criterion are best (most negative) for lag 1. The VAR model estimates of the autoregressive matrices M_1 for the estimation data sets are shown in Table A4.

Each VAR-LiNGAM model is built on an estimated VAR model by applying the LiNGAM structural learning algorithm to the VAR model's estimated innovation processes.
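Since both LiNGAM and the ICA model require non-Gaussian components, a normality screen of the kind used in this study can be sketched with SciPy's one-sample Kolmogorov–Smirnov test. The data here are simulated stand-ins for estimated components, and standardizing with estimated moments makes the nominal p-value only approximate (a Lilliefors-type correction would be needed for exactness), which is acceptable for a rough screen.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def ks_normality_pvalue(series):
    """Kolmogorov-Smirnov test of a component against a fitted normal.

    Standardizes the series with its sample mean and standard deviation and
    compares it with N(0, 1); a small p-value rejects normality, supporting
    the non-Gaussian assumption required by LiNGAM and ICA.
    """
    z = (series - series.mean()) / series.std(ddof=1)
    return stats.kstest(z, "norm").pvalue

heavy_tailed = rng.laplace(size=5000)   # stand-in for an estimated component
gaussian = rng.normal(size=5000)        # a component that would fail the screen
print(ks_normality_pvalue(heavy_tailed), ks_normality_pvalue(gaussian))
```

A heavy-tailed (Laplace-like) component is decisively rejected as normal, while a genuinely Gaussian component is not; in the paper, rejection for every component supports the non-Gaussian assumption.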
As evidence that the VAR-LiNGAM non-Gaussian assumption holds on every estimation data set, a Kolmogorov–Smirnov test performed on each currency's corresponding independent factor confirms that the null hypothesis of normality is rejected with a p-value less than 0.01 for each factor. The VAR-LiNGAM model estimates of the autoregressive matrices M_1 correspond to those of the VAR model and are shown in Table A4. The VAR-LiNGAM model estimates of the causal effect matrices B_0 are shown in Table A5.

Independent component analysis is performed on the currency time series, and the independent components are modeled with univariate autoregressive processes. The separating matrices B̂ found by the AMUSE algorithm are shown in Table A6. Independent components are computed using the separating matrices as described in Equation (20). A Kolmogorov–Smirnov test is performed on each independent component to verify the ICA model's non-Gaussian assumption; the test's null hypothesis of normality is rejected with a p-value less than 0.01 for each independent component.

The lag lengths for the estimated AR models are chosen by using Schwarz's Bayesian criterion. For the AR models in all estimation data sets, Schwarz's Bayesian criterion is best (most negative) for lag 1. Thus, the independent components are modeled with AR(1) processes, whose parameter estimates are shown in Table A7.

3. Results

3.1. Forecast Generation

A multivariate normal distribution is used to model the one-step-ahead probability distribution of the VAR model innovation process. Latin hypercube samples from the multivariate normal distribution in conjunction with the VAR model parameter estimates and historical data are used to compute one-step-ahead probability distributions for the exchange rate returns.

An estimate of the independent factor process of the VAR-LiNGAM model is obtained from its estimated innovation process.
Kernel density estimation with a normal probability window isused to estimate the probability distributions of the VAR-LiNGAM independent factor processes.Latin hypercube samples from the independent factor process distributions are transformed intoForecasting 2019, 1 13one-step-ahead distributions of the VAR-LiNGAM innovation processes. The innovation processdistribution samples plus the VAR-LiNGAM model parameter estimates and historical data are usedto compute one-step-ahead probability distributions for the exchange rate returns.Kernel density estimation with a normal probability window is used to estimate the probabilitydistribution of each AR innovation process. Latin hypercube samples from the innovation processdistributions plus the AR model estimates and historical data are used to compute one-step-aheadprobability distributions for the independent components. The forecasted probability distributions ofthe independent components are transformed into forecasted probability distributions of the exchangerate returns as described in Equation (23).Sample one-step-ahead cumulative predictive distributions in each of the forecast data sets forthe VAR-LiNGAM model are shown in Figure 1. These sample predictive cdfs are similar to thosegenerated by the VAR and AR models. Forecasting 2018, 1, x FOR PEER REVIEW 11 of 23(a) (b)(c) (d)Figure 1. Sample Cumulative Predictive Distributions. The plots show the sample one-step-aheadcumulative predictive distributions generated by the VAR-LiNGAM model in the before (a),surrounding (b), after (c) and long after (d) forecast data sets.3.2. Forecast EvaluationThe only forecasts considered here are those for the CHF/EUR exchange rate; the forecasts ofother currencies are not evaluated. For the computation of calibration functions, the fractile of eachoutcome is determined by comparing the outcome to the estimated cumulative predictivedistribution. 
These fractiles are used in conjunction with the estimated cumulative predictive distributions to compute the calibration functions. The calibration functions are both plotted and used to compute goodness-of-fit test statistics.

Calibration plots of the CHF/EUR for the before, surrounding, after and long after forecast data sets are in Figures 2 and 4–6.
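A minimal sketch of this evaluation pipeline (outcome fractiles, an empirical calibration function, and a chi-squared goodness-of-fit test against uniformity) is given below, assuming the predictive distributions are available as sample draws; all numbers here are synthetic placeholders, not the paper's forecasts.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(1)

# Synthetic stand-ins: 500 predictive sample sets and realized outcomes.
forecast_samples = rng.normal(0.0, 1.0, size=(500, 1000))
outcomes = rng.normal(0.0, 1.0, size=500)

# Fractile of each outcome: its position in the empirical predictive cdf.
fractiles = (forecast_samples <= outcomes[:, None]).mean(axis=1)

# Empirical calibration function: share of fractiles at or below each level p.
# A well-calibrated forecaster maps onto the 45-degree line.
levels = np.linspace(0.05, 0.95, 19)
calibration = np.array([(fractiles <= p).mean() for p in levels])

# Chi-squared goodness-of-fit test of the fractiles against uniformity
# (10 equal-width bins); a p-value near zero signals miscalibration.
observed, _ = np.histogram(fractiles, bins=10, range=(0.0, 1.0))
stat, pvalue = chisquare(observed)
print(f"chi-squared = {stat:.2f}, p-value = {pvalue:.3f}")
```

Since the synthetic outcomes are drawn from the same distribution as the forecasts, this toy case is calibrated by construction; on real miscalibrated forecasts the p-value collapses toward zero.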
The calibration plots for the AR, VAR and VAR-LiNGAM models inForecasting 2019, 1 14a particular forecast data set in addition to a 45-degree line for reference are shown in each figure.Underconfidence in probability assessments is indicated where the calibration function maps abovethe 45-degree line, while overconfidence in assessments is indicated where the calibration functionmaps below the 45-degree line. Forecasting 2018, 1, x FOR PEER REVIEW 12 of 23(a)(b) (c)Figure 2. CHF/EUR Calibration Functions in the Before Forecast Data Set. The plots show calibrationfunctions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a), VAR (b)and VAR-LiNGAM (c) models in the before forecast data set (11 June 2011–10 August 2011). A modelis well calibrated if it maps onto the 45-degree reference line.(a)Figure 2. CHF/EUR Calibration Functions in the Before Forecast Data Set. The plots show calibrationfunctions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a), VAR (b) andVAR-LiNGAM (c) models in the before forecast data set (11 June 2011–10 August 2011). A model iswell calibrated if it maps onto the 45-degree reference line.Forecasting 2018, 1, x FOR PEER REVIEW 12 of 23(a)(b) (c)Figure 2. CHF/EUR Calibration Functions in the Before Forecast Data Set. The plots show calibrationfunctions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a), VAR (b)and VAR-LiNGAM (c) models in the before forecast data set (11 June 2011–10 August 2011). A modelis well calibrated if it maps onto the 45-degree reference line.(a)Figure 3. Cont.Forecasting Forecasting2019 2018 , 1, 1, x FOR PEER REVIEW 13 of 1523(b) (c)Figure 3. CHF/EUR Calibration Functions in the Surrounding Forecast Data Set. 
The plots showcalibration functions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a),VAR (b) and VAR-LiNGAM (c) models in the surrounding data set (11 August 2011–10 October 2011).A model is well calibrated if it maps onto the 45-degree reference line.(a)(b) (c)Figure 4. CHF/EUR Calibration Functions in the After Forecast Data Set. The plots show calibrationfunctions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a), VAR (b)and VAR-LiNGAM (c) models in the after data set (11 October 2011–10 December 2011). A model iswell calibrated if it maps onto the 45-degree reference line.Figure 4. CHF/EUR Calibration Functions in the Surrounding Forecast Data Set. The plots showcalibration functions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a),VAR (b) and VAR-LiNGAM (c) models in the surrounding data set (11 August 2011–10 October 2011).A model is well calibrated if it maps onto the 45-degree reference line.Forecasting 2018, 1, x FOR PEER REVIEW 13 of 23(b) (c)Figure 3. CHF/EUR Calibration Functions in the Surrounding Forecast Data Set. The plots showcalibration functions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a),VAR (b) and VAR-LiNGAM (c) models in the surrounding data set (11 August 2011–10 October 2011).A model is well calibrated if it maps onto the 45-degree reference line.(a)(b) (c)Figure 4. CHF/EUR Calibration Functions in the After Forecast Data Set. The plots show calibrationfunctions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a), VAR (b)and VAR-LiNGAM (c) models in the after data set (11 October 2011–10 December 2011). A model iswell calibrated if it maps onto the 45-degree reference line.Figure 5. CHF/EUR Calibration Functions in the After Forecast Data Set. 
The plots show calibrationfunctions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a), VAR (b) andVAR-LiNGAM (c) models in the after data set (11 October 2011–10 December 2011). A model is wellcalibrated if it maps onto the 45-degree reference line.Forecasting 2019, 1 16Forecasting 2018, 1, x FOR PEER REVIEW 14 of 23(a)(b) (c)Figure 5. CHF/EUR Calibration Functions in the Long after Forecast Data Set. The plots showcalibration functions for the CHF/EUR exchange rate that are generated by forecasts from the AR (a),VAR (b) and VAR-LiNGAM (c) models in the long after data set (7 March 2012–6 May 2012). A modelis well calibrated if it maps onto the 45-degree reference line.For the before forecast data set, each model exhibits underconfidence on the lower end of thecalibration function and overconfidence on the upper end. For the surrounding forecast data set, theAR and VAR models exhibit overconfidence on the lower end of the calibration function andunderconfidence on the upper end; the extreme ends of both of these calibration functions show theopposite behavior. The calibration function for the VAR-LiNGAM model on the surrounding dataset displays the opposite behavior of the AR and VAR models with underconfidence on the lowerend and overconfidence on the upper end. For the after and long after data sets, the calibrationfunctions for all models exhibit a large degree of overconfidence on the lower end and a large degreeof underconfidence on the upper end.Overall, the calibration plots show that all models are better calibrated (i.e., map closer to the 45-degree line) in the before and surrounding data sets than in the after and long after data sets. 
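The reading of the calibration plots used here (above the 45-degree line is underconfident, below it is overconfident) can be expressed as a small helper. The function, its tolerance band and the sample curve below are illustrative additions, not part of the paper's method.

```python
import numpy as np

def confidence_regions(levels, calibration, tol=0.02):
    """Label each point of a calibration function as under- or overconfident.

    Following the reading used in the text: points above the 45-degree line
    indicate underconfidence, points below it overconfidence.  `tol` is an
    illustrative dead band around the diagonal, not a value from the paper.
    """
    labels = []
    for p, c in zip(levels, calibration):
        if c > p + tol:
            labels.append("underconfident")
        elif c < p - tol:
            labels.append("overconfident")
        else:
            labels.append("well calibrated")
    return labels

# Illustrative calibration curve: above the diagonal on the low end,
# below it on the high end (the after/long after pattern is the reverse).
levels = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
calibration = np.array([0.18, 0.36, 0.50, 0.63, 0.82])
print(confidence_regions(levels, calibration))
# → ['underconfident', 'underconfident', 'well calibrated',
#    'overconfident', 'overconfident']
```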
Forecasts are less calibrated after the placement of the floor on the CHF/EUR exchange rate; it appears that the Swiss National Bank's market intervention had a negative effect on the calibration of the time-series models in the longer run.

Chi-squared goodness-of-fit tests are performed to test each time-series model for calibration during each forecast data set. The null hypothesis that the forecasts are well calibrated is rejected with a p-value near zero in every data set for every time-series model; no time-series model's forecasts are well calibrated in any of the time periods under consideration. Some of the calibration functions appear to map closely to the 45-degree reference line, such as in Figure 2a,b. Nevertheless, none of the calibration functions shown in Figures 2 and 4–6 reflect forecasts that are well calibrated according to the goodness-of-fit test.

In some of Figures 2 and 4–6, the calibration problems appear to be in the tails of the distributions, such as in Figure 2a,b. Generating forecasts with distributions estimated via kernel density estimation with a normal probability window might be the source of this bad tail behavior.
In the calibration plots that show bad tail behavior, the miscalibration of each tail is in the opposite direction; for example, in Figure 2b, the calibration function shows underconfidence on the low end and overconfidence on the upper end. If the normal probability window were to blame for this poor tail performance, it would likely produce tails that were too heavy or too light at both ends of the distribution. For instance, if kernel density estimation with a normal probability window produced a distribution with tails that were too light to reflect the distribution of returns, then the corresponding calibration function would show underconfidence at both ends of the plot. Additionally, since other figures show that the problem with calibration is more in the central part of the distribution than in the tails, such as Figure 4a,b, it is unlikely that the normal probability window is the culprit for bad calibration.

In addition to the calibration tests, the mean-squared error (MSE) and the probability score metrics are used to rank the probability forecasting systems. The mean-squared errors of each model's forecasts are reported in Table 2, and the probability scores of each model's forecasts are reported in Table 3. The VAR and VAR-LiNGAM models both have the same MSE on each data set because they are both driven by the innovations of the VAR model (see Equation (26)).

Table 2. The table shows the mean-squared errors of the CHF/EUR forecasts from the AR, VAR and VAR-LiNGAM models on each forecast data set.

Data Set       AR             VAR & VAR-LiNGAM
before         1.614 × 10^-6  1.610 × 10^-6
surrounding    2.982 × 10^-6  2.983 × 10^-6
after          3.409 × 10^-7  3.415 × 10^-7
long after     2.873 × 10^-8  2.722 × 10^-8

Table 3. The table shows the probability scores of the CHF/EUR forecasts from the AR, VAR and VAR-LiNGAM models on each forecast data set.

Data Set       AR        VAR       VAR-LiNGAM
before         0.99876   0.99860   0.99914
surrounding    0.99820   0.99803   0.99877
after          0.99715   0.99713   0.99776
long after     0.99700   0.99697   0.99696

The MSE results indicate that no model consistently outperforms the others. The VAR and VAR-LiNGAM models perform the best in the before and long after data sets, while the AR model performs the best in the surrounding and after data sets. This may indicate that all models have roughly the same forecasting performance or that the VAR and VAR-LiNGAM models perform better in periods isolated from structural change.

In contrast, the probability score rankings show that the VAR model outperforms the other models in all but the long after data set, in which the VAR-LiNGAM's performance is slightly better. Because the simple VAR model outperforms the other models that are built using independent components, the probability score results indicate that there is no gain in forecasting performance from using independent components. Additionally, the probability score ranks the AR forecasts higher than the VAR-LiNGAM forecasts in all periods but the last; this may indicate that in some cases the multivariate VAR-LiNGAM model provides no advantage over the univariate AR model.

The VAR and VAR-LiNGAM models generate better forecasts in the long after period according to the MSE and the probability score. This is some indication that the VAR-LiNGAM model performs better than the AR model after market intervention has been in effect for some period of time.

3.3. Change in the Causal Structure

The results from the LiNGAM algorithm show that there is evidence that the causal relationships among the exchange rates changed after the intervention by the Swiss National Bank. Table A5 reports the causal effect matrices for the different estimation data sets.
These matrices show the causal effects from the currencies listed in the columns to the currencies listed in the rows. For example, the first row of Table A5a shows that the AUD exchange rate is positively affected by the CAD, EUR, GBP and USD and negatively affected by the JPY. The causal effects contained in these matrices can be represented graphically by directed acyclic graphs. The causal structure of the currencies before the SNB intervention is shown in Figure 7a, and the structure after the intervention is shown in Figure 7b.

Figure 7. Causal effects represented as directed acyclic graphs in the before (a) and the long after (b) estimation data sets. These correspond to the before and long after estimation data sets.
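To illustrate how a causal effect matrix in the convention of Table A5 maps to a directed acyclic graph like Figure 7, the sketch below lists the signed edges of a hypothetical B0. The numbers are made up; only the first-row sign pattern mirrors the AUD example above.

```python
import numpy as np

# Hypothetical causal effect matrix B0 in the convention of Table A5:
# entry (i, j) is the causal effect from the column-j currency onto the
# row-i currency (illustrative values, not the paper's estimates).
currencies = ["AUD", "CAD", "EUR", "GBP", "JPY", "USD"]
B0 = np.array([
    [0.00, 0.41, 0.22, 0.15, -0.10, 0.30],
    [0.00, 0.00, 0.18, 0.00,  0.00, 0.25],
    [0.00, 0.00, 0.00, 0.12,  0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00,  0.00, 0.20],
    [0.00, 0.00, 0.00, 0.00,  0.00, 0.35],
    [0.00, 0.00, 0.00, 0.00,  0.00, 0.00],
])

# List the directed edges of the corresponding acyclic graph; a triangular
# B0 (up to a permutation of the variables) guarantees acyclicity.
edges = [
    (currencies[j], currencies[i], float(B0[i, j]))
    for i in range(len(currencies))
    for j in range(len(currencies))
    if B0[i, j] != 0.0
]
for cause, effect, weight in edges:
    sign = "positively" if weight > 0 else "negatively"
    print(f"{cause} -> {effect}: affects {sign} ({weight:+.2f})")
```

Comparing the edge lists produced from the before and long after matrices is one direct way to see the structural change reported above.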
