Article

Dynamical Analysis of the Dow Jones Index Using Dimensionality Reduction and Visualization

1 LAETA/INEGI, Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
2 Department of Electrical Engineering, Institute of Engineering, Polytechnic of Porto, Rua Dr. António Bernardino de Almeida, 431, 4249-015 Porto, Portugal
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2021, 23(5), 600; https://doi.org/10.3390/e23050600
Submission received: 7 April 2021 / Revised: 8 May 2021 / Accepted: 9 May 2021 / Published: 13 May 2021
(This article belongs to the Special Issue Processes with Memory in Natural and Social Sciences)

Abstract

Time-series generated by complex systems (CS) are often characterized by phenomena such as chaoticity, fractality and memory effects, which pose difficulties in their analysis. The paper explores the dynamics of multidimensional data generated by a CS. The Dow Jones Industrial Average (DJIA) index is selected as a test-bed. The DJIA time-series is normalized and segmented into several time window vectors. These vectors are treated as objects that characterize the DJIA dynamical behavior. The objects are then compared by means of different distances to generate proper inputs to dimensionality reduction and information visualization algorithms. These computational techniques produce meaningful representations of the original dataset according to the (dis)similarities between the objects. The time is displayed as a parametric variable and the non-locality can be visualized by the corresponding evolution of points and the formation of clusters. The generated portraits reveal a complex nature, which is further analyzed in terms of the emerging patterns. The results show that the adoption of dimensionality reduction and visualization tools for processing complex data is a key modeling option with the current computational resources.

    1. Introduction

    Complex systems (CS) are composed of several autonomous entities, described by simple rules, that interact with each other and their environment. The CS give rise to a collective behavior that exhibits much richer dynamical phenomena than those presented by the individual elements. Often, CS exhibit evolution, adaptation, self-organization, emergence of new orders and structures, long-range correlations in the time–space domain, chaoticity, fractality, and memory effects [1,2,3,4]. The CS are not only pervasive in nature, but also in human-related activities, and include molecular dynamics, living organisms, ecosystems, celestial mechanics, financial markets, computational systems, transportation and social networks, and world and country economies, as well as many others [5,6,7,8].
    Time-series analysis has been successfully adopted to study CS [9,10]. The CS outputs are measured over time and the data collected are interpreted as manifestations of the CS dynamics. Therefore, the study of the time-series allows for conclusions about the CS behavior to be reached [11,12]. Nonetheless, real-world time-series may be affected by noise, distortion and incompleteness, requiring advanced processing methods for the extraction of significant information from the data [13]. Information visualization plays a key role in time-series analysis, as it provides an insight into the data characteristics. Information visualization corresponds to the computer generation of visual representations of a dataset. Its main goal is to expose features hidden in the data, in order to understand the system that generated such data [14,15]. Dimensionality reduction [16] plays a key role in information visualization, since the numerical data are often multidimensional. Dimensionality reduction-based schemes try to preserve, in lower-dimensional representations, the information present in the original datasets. They include linear methods, such as classic multidimensional scaling (MDS) [17], principal component [18], canonical correlation [19], linear discriminant [20] and factor analysis [21], as well as nonlinear approaches, such as non-classic MDS, or Sammon’s projection [22], isomap [23], Laplacian eigenmap [24], diffusion map [25], t-distributed stochastic neighbor embedding (t-SNE) [26] and uniform manifold approximation and projection (UMAP) [27].
    Financial time series have a complex nature and their dynamic characterization is challenging. The Dow Jones Industrial Average (DJIA) is an important financial index and is adopted in this paper as a dataset generated by a CS. The paper explores an alternative strategy to the classical time-domain analysis, by combining the concepts of distance and dimensionality reduction with computational visualization tools. The DJIA time-series of daily close values is normalized and segmented, yielding a number of objects that characterize the DJIA dynamics. These objects are vectors, whose time-length and partial time-overlap represent a compromise between time resolution and memory length. The objects are compared using various distances and their dissimilarities are used as the input to different dimensionality reduction and information visualization algorithms, namely hierarchical clustering (HC), MDS, t-SNE and UMAP. The aforementioned algorithms construct representations of the original dataset, where time is a parametric variable. The structure of the plots is further analyzed in terms of the emerging patterns. The formation of clusters and the evolution of the patterns over time map a dynamical behavior with discontinuities for periods where the memory is somehow lost. Numerical experiments illustrate the feasibility and effectiveness of the method for the processing of complex data.
    The paper is organized as follows. Section 2 reviews fundamental mathematical concepts, namely the distances and the algorithms adopted in the study for processing and visualizing data. Section 3 introduces the DJIA dataset. Section 4 analyses the data and interprets the results in the light of the distances used. Section 5 assesses the effect of the time-length and overlap of the segmenting window. Section 6 presents the conclusions.

    2. Mathematical Concepts and Tools

    2.1. Distances

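    Given two points $v_i$ and $v_j$ in a set $X$, the function $d(v_i,v_j): X\times X\to[0,+\infty)$ represents a distance between the points if, and only if, it satisfies the conditions: identity of indiscernibles, symmetry and triangle inequality [28].
    In this paper, the distances {Arccosine, Canberra, Dice, Divergence, Euclidean, Jaccard, Lorentzian, Manhattan, Sørenson, Generalized} $=\{d_1,\ldots,d_{10}\}$ are considered. Therefore, given $v_i=(v_{i1},\ldots,v_{iP})$ and $v_j=(v_{j1},\ldots,v_{jP})$ in a $P$-dimensional space, $\mathbb{R}^P$, the 10 distances are given by [28]:
    $$\text{Arccosine:}\quad d_1(v_i,v_j)=\arccos\frac{\sum_{k=1}^{P}v_{ik}\,v_{jk}}{\sqrt{\sum_{k=1}^{P}v_{ik}^{2}\,\sum_{k=1}^{P}v_{jk}^{2}}}, \tag{1}$$
    $$\text{Canberra:}\quad d_2(v_i,v_j)=\sum_{k=1}^{P}\frac{|v_{ik}-v_{jk}|}{|v_{ik}|+|v_{jk}|}, \tag{2}$$
    $$\text{Dice:}\quad d_3(v_i,v_j)=\frac{\sum_{k=1}^{P}(v_{ik}-v_{jk})^{2}}{\sum_{k=1}^{P}v_{ik}^{2}+\sum_{k=1}^{P}v_{jk}^{2}}, \tag{3}$$
    $$\text{Divergence:}\quad d_4(v_i,v_j)=2\sum_{k=1}^{P}\frac{(v_{ik}-v_{jk})^{2}}{(v_{ik}+v_{jk})^{2}}, \tag{4}$$
    $$\text{Euclidean:}\quad d_5(v_i,v_j)=\sqrt{\sum_{k=1}^{P}(v_{ik}-v_{jk})^{2}}, \tag{5}$$
    $$\text{Jaccard:}\quad d_6(v_i,v_j)=\frac{\sum_{k=1}^{P}(v_{ik}-v_{jk})^{2}}{\sum_{k=1}^{P}v_{ik}^{2}+\sum_{k=1}^{P}v_{jk}^{2}-\sum_{k=1}^{P}v_{ik}\,v_{jk}}, \tag{6}$$
    $$\text{Lorentzian:}\quad d_7(v_i,v_j)=\sum_{k=1}^{P}\ln\left(1+|v_{ik}-v_{jk}|\right), \tag{7}$$
    $$\text{Manhattan:}\quad d_8(v_i,v_j)=\sum_{k=1}^{P}|v_{ik}-v_{jk}|, \tag{8}$$
    $$\text{Sørenson:}\quad d_9(v_i,v_j)=\frac{\sum_{k=1}^{P}|v_{ik}-v_{jk}|}{\sum_{k=1}^{P}\left(|v_{ik}|+|v_{jk}|\right)}, \tag{9}$$
    $$\text{Generalized:}\quad d_{10}(v_i,v_j)=\sum_{r=1}^{9}\lambda_r\frac{d_r(v_i,v_j)}{\max[d_r(v_i,v_j)]}, \tag{10}$$
    where $\lambda_r\in\mathbb{R}$, $\sum_{r=1}^{9}\lambda_r=1$.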
    The distances (1)–(9) each have advantages and disadvantages, meaning that each unravels specific features embedded in the data, while neglecting others. Therefore, the ‘generalized’ distance $d_{10}$ may capture multi-perspective information by combining (1)–(9) in a complementary form.
    Other techniques [29] and distances [28] can also be adopted to compare the data. However, a more extensive overview and utilization of a larger number of alternatives is out of the scope of the paper.
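    As an illustration of how the distances and the ‘generalized’ combination can be computed, the following is a minimal Python sketch covering only four of the nine base distances (Canberra, Euclidean, Lorentzian and Manhattan). The function names are hypothetical, and two points are assumptions rather than statements from the paper: Canberra terms with a zero denominator are skipped, and the maximum in the generalized distance is taken over all pairs of objects.

```python
import math

def canberra(u, v):
    # Sum of |u_k - v_k| / (|u_k| + |v_k|); zero-denominator terms are skipped (assumption).
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if abs(a) + abs(b) > 0)

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def lorentzian(u, v):
    return sum(math.log(1 + abs(a - b)) for a, b in zip(u, v))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def distance_matrix(objects, dist):
    # Symmetric N x N matrix D = [d(v_i, v_j)].
    n = len(objects)
    return [[dist(objects[i], objects[j]) for j in range(n)] for i in range(n)]

def generalized(objects, dists, weights=None):
    # Weighted sum of the distance matrices, each divided by its own maximum
    # (maximum over all pairs: an assumption of this sketch).
    if weights is None:
        weights = [1.0 / len(dists)] * len(dists)  # identical weights
    n = len(objects)
    total = [[0.0] * n for _ in range(n)]
    for lam, dist in zip(weights, dists):
        D = distance_matrix(objects, dist)
        d_max = max(max(row) for row in D)
        if d_max == 0:
            continue  # all objects coincide for this distance
        for i in range(n):
            for j in range(n):
                total[i][j] += lam * D[i][j] / d_max
    return total

objects = [[0.0, 0.0], [3.0, 4.0]]
G = generalized(objects, [euclidean, manhattan])
```

    Dividing each matrix by its maximum keeps distances with very different numerical ranges from dominating the combination.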

    2.2. Dimensionality Reduction and Visualization

    In the next subsections, the dimensionality reduction and visualization techniques that are adopted for data processing are presented, namely the HC, MDS, t-SNE and UMAP.
    Given a set of $N$ objects, $v_i$, $i=1,\ldots,N$, in space $\mathbb{R}^P$, all methods require the definition of a distance $d(v_i,v_j)$, $i,j=1,\ldots,N$, between the objects $i$ and $j$.

    2.2.1. The Hierarchical Clustering

    The HC groups similar objects and represents them in a 2-dim locus. The algorithm involves two main steps [30]. In the first, the HC constructs a matrix of distances, $D=[d(v_i,v_j)]$, of dimension $N\times N$, where $d(v_i,v_j)=d(v_j,v_i)$. In the second step, the algorithm arranges the objects in a hierarchical structure and depicts them in a graphical portrait, namely, a hierarchical tree or a dendrogram. This is achieved by means of two alternative techniques: the divisive and the agglomerative procedures. In the divisive scheme, all objects start in one single cluster and end in separate clusters. This is done by iteratively removing the ‘outsiders’ from the least cohesive cluster. In the agglomerative scheme, each object starts in its own cluster and all end in one single cluster. This is accomplished by successive iterations that join the most similar clusters. The HC requires the specification of a linkage criterion for measuring the dissimilarity between clusters. Often, the average-linkage, $d_{av}(R,S)$, is adopted [31], where $R$ and $S$ represent two clusters. Therefore, denoting by $d(v_R,v_S)$ the distance between a pair of objects $v_R\in R$ and $v_S\in S$, in the clusters $R$ and $S$, respectively, we have:
    $$d_{av}(R,S)=\frac{1}{|R|\,|S|}\sum_{v_R\in R,\,v_S\in S}d(v_R,v_S). \tag{11}$$
    The reliability of the clustering can be assessed by the cophenetic coefficient $cc$ [32]
    $$cc=\frac{\sum_{i<j}\left[d(v_i,v_j)-av\left(d(v_i,v_j)\right)\right]\left[\hat{d}(t_i,t_j)-av\left(\hat{d}(t_i,t_j)\right)\right]}{\sqrt{\sum_{i<j}\left[d(v_i,v_j)-av\left(d(v_i,v_j)\right)\right]^{2}\sum_{i<j}\left[\hat{d}(t_i,t_j)-av\left(\hat{d}(t_i,t_j)\right)\right]^{2}}}, \tag{12}$$
    where $\{v_i,v_j\}$ and $\{t_i,t_j\}$ stand for the original objects and their HC representations, respectively, $av(\cdot)$ denotes the average of the input argument, and $\hat{d}(t_i,t_j)$ represents the cophenetic distance between $t_i$ and $t_j$. We always obtain $0\le cc\le 1$, with the limits corresponding to bad and good clustering, respectively. Additionally, the original and the cophenetic distances can be represented in a scatter plot called the Shepard diagram. A good clustering corresponds to points located close to a 45° line.
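    The two quantities above can be sketched in a few lines of Python. This is only an illustration with hypothetical function names: the clusters are given as lists of object indices, and the original and cophenetic distances are given as flat lists over the pairs $i<j$. The tree construction itself is not shown.

```python
import math

def average_linkage(D, R, S):
    # Mean of the distances between every object in cluster R and every object in cluster S.
    return sum(D[r][s] for r in R for s in S) / (len(R) * len(S))

def cophenetic_coefficient(d, d_hat):
    # Correlation between original distances d and cophenetic distances d_hat (pairs i<j).
    m = len(d)
    av_d = sum(d) / m
    av_h = sum(d_hat) / m
    num = sum((x - av_d) * (y - av_h) for x, y in zip(d, d_hat))
    den = math.sqrt(sum((x - av_d) ** 2 for x in d) * sum((y - av_h) ** 2 for y in d_hat))
    return num / den

# Three objects: 0 and 1 merge first (distance 1), then object 2 joins at the average-linkage height.
D = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
height = average_linkage(D, [0, 1], [2])
cc = cophenetic_coefficient([1, 4, 3], [1, height, height])
```

    In this toy case the cophenetic distances of the pairs (0,2) and (1,2) are both equal to the merge height, which is why the coefficient is high but below 1.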

    2.2.2. The Multidimensional Scaling

    The MDS includes a class of methods that represent high-dimensional data in a lower dimensional space, while preserving the inter-point distances as much as possible. The matrix $D=[d(v_i,v_j)]$ feeds the MDS dimensionality reduction and visualization algorithm. The MDS tries to find the positions of $Q$-dimensional objects, $t_i$, with $i=1,\ldots,N$, represented by points in space $\mathbb{R}^Q$, so that $Q<P$, while producing a matrix $T=[\hat{d}(t_i,t_j)]$ that approximates $D$. This is accomplished by means of an optimization procedure that tries to minimize a fitness function. Usually, the stress cost function, $S$, is adopted
    $$S=\left[\sum_{i<j}\left(d(v_i,v_j)-\hat{d}(t_i,t_j)\right)^{2}\right]^{\frac{1}{2}}. \tag{13}$$
    The Sammon criterion is an alternative, yielding
    $$S=\left[\frac{\sum_{i<j}\left(d(v_i,v_j)-\hat{d}(t_i,t_j)\right)^{2}}{\sum_{i<j}d(v_i,v_j)^{2}}\right]^{\frac{1}{2}}. \tag{14}$$
    The ‘quality’ of the MDS is assessed by comparing the original and the reproduced information. This can be accomplished by means of the Shepard diagram, which depicts $d(v_i,v_j)$ versus $\hat{d}(t_i,t_j)$. Additionally, since the stress $S$ decreases monotonically with the dimension $Q$, the user can establish a compromise between the two variables. Often, the practical choice is $Q=2$ or $Q=3$, since those values yield a direct graphical representation in the embedding space. Nevertheless, if the MDS locus is unclear, then the user must adopt another measure $d(v_i,v_j)$ until a suitable representation is obtained.
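    To make the two cost functions concrete, the sketch below evaluates them for a given embedding. It is an illustration only: the function names are hypothetical, the embedded distances are assumed to be Euclidean, and the optimization that actually moves the points is not included.

```python
import math

def pairwise(points):
    # Euclidean distances between embedded points, listed over pairs i<j.
    n = len(points)
    return [math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]

def stress(d, d_hat):
    # Square root of the summed squared differences between original and embedded distances.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d, d_hat)))

def sammon_stress(d, d_hat):
    # Same numerator, normalized by the summed squared original distances.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d, d_hat)) / sum(x ** 2 for x in d))

d = [3.0, 4.0, 5.0]                     # original distances for pairs (0,1), (0,2), (1,2)
t = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]  # a 2-dim embedding that reproduces them exactly
d_hat = pairwise(t)
```

    For this embedding both cost functions are zero, and plotting `d` against `d_hat` would give a Shepard diagram with all points on the 45° line.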

    2.2.3. The t-Distributed Stochastic Neighbor Embedding

    The t-SNE [26] is a technique for visualizing high-dimensional datasets, with applications including computer security [33], music analysis [34], bioinformatics [35] and other areas [36,37].
    The algorithm comprises two main stages. In the first, for each pair of objects $(v_i,v_j)$, $i,j=1,\ldots,N$, the t-SNE constructs a joint probability distribution $p_{ij}$ measuring the similarity between $v_i$ and $v_j$, in such a way that similar (dissimilar) objects are assigned a higher (lower) probability
    $$p_{ij}=\frac{p_{j|i}+p_{i|j}}{2N}, \tag{15}$$
    $$p_{j|i}=\begin{cases}\dfrac{\exp\left[-d(v_i,v_j)^{2}/(2\sigma_i^{2})\right]}{\sum_{k\neq i}\exp\left[-d(v_i,v_k)^{2}/(2\sigma_i^{2})\right]}, & j\neq i\\[2mm] 0, & j=i,\end{cases} \tag{16}$$
    where $p_{ij}=p_{ji}$, $p_{ii}=0$, $\sum_{i,j}p_{ij}=1$ and $\sum_{j}p_{j|i}=1$, $\forall i,j$. The parameter $\sigma_i^{2}$ represents the variance of the Gaussian kernel that is centered on $v_i$. A particular value of $\sigma_i$ induces a probability distribution $P_i$ over all of the other datapoints. In other words, $P_i$ represents the conditional probability distribution over all other datapoints given the datapoint $v_i$. The t-SNE searches for the value of $\sigma_i$ that generates a distribution $P_i$ with the value of perplexity specified by
    $$\mathrm{perplexity}(P_i)=2^{H(P_i)}, \tag{17}$$
    where $H(P_i)$ is the Shannon entropy of $P_i$
    $$H(P_i)=-\sum_{j}p_{j|i}\log_2\left(p_{j|i}\right). \tag{18}$$
    As a result, the variance of the Gaussian kernel is adapted to the density of the data, meaning that smaller (larger) values of $\sigma_i$ are used in denser (sparser) parts of the data space. The perplexity can be interpreted as a smooth measure of the effective number of neighbors of $v_i$. Typical values of $\mathrm{perplexity}(P_i)$ are in the interval $[5,50]$.
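    The search for the kernel width that matches a prescribed perplexity can be sketched as follows. This is a minimal illustration rather than the implementation used by any particular library: the function names, the bracketing interval and the geometric bisection are assumptions of the sketch.

```python
import math

def conditional_probabilities(dist_row, i, sigma):
    # dist_row[j] holds the distance from object i to object j; the entry for j == i is set to 0.
    exponents = [-(d * d) / (2.0 * sigma * sigma) for d in dist_row]
    # Shift by the largest off-diagonal exponent so that the sum never underflows to zero.
    shift = max(e for j, e in enumerate(exponents) if j != i)
    weights = [0.0 if j == i else math.exp(e - shift) for j, e in enumerate(exponents)]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(p):
    # 2 raised to the Shannon entropy (base 2) of the distribution p.
    h = -sum(x * math.log2(x) for x in p if x > 0)
    return 2.0 ** h

def find_sigma(dist_row, i, target, lo=1e-3, hi=1e3, iters=60):
    # Bisection on a logarithmic scale; the perplexity grows with sigma.
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probabilities(dist_row, i, mid)) > target:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)

row = [0.0, 1.0, 2.0, 3.0, 4.0]   # distances from object 0 to all objects
sigma = find_sigma(row, 0, target=2.0)
p = conditional_probabilities(row, 0, sigma)
```

    With four other objects the perplexity can only range between 1 and 4, so a target of 2 corresponds to roughly two effective neighbors.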
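    In the second stage, the t-SNE calculates the similarities between pairs of points in $\mathbb{R}^Q$
    $$q_{ij}=\frac{q_{j|i}+q_{i|j}}{2N}, \tag{19}$$
    $$q_{ij}=\begin{cases}\dfrac{\left(1+\|t_i-t_j\|^{2}\right)^{-1}}{\sum_{k\neq l}\left(1+\|t_k-t_l\|^{2}\right)^{-1}}, & j\neq i\\[2mm] 0, & j=i,\end{cases} \tag{20}$$
    where the symbol $\|\cdot\|$ denotes the 2-norm of the argument, $q_{ij}=q_{ji}$, $q_{ii}=0$, $\sum_{i,j}q_{ij}=1$ and $\sum_{j}q_{j|i}=1$, $\forall i,j$.
    The t-SNE performs an optimization, while attempting to minimize the Kullback–Leibler (KL) divergence between the Gaussian distribution of the points in space $\mathbb{R}^P$ and the Student $t$-distribution of the points in the embedding space $\mathbb{R}^Q$:
    $$KL=\sum_{i\neq j}p_{ij}\ln\frac{p_{ij}}{q_{ij}}. \tag{21}$$
    The minimization scheme starts with a given initial set of points in $\mathbb{R}^Q$, and the algorithm uses the gradient descent
    $$\frac{\partial KL}{\partial t_i}=4\sum_{j}(p_{ij}-q_{ij})(t_i-t_j)\left(1+\|t_i-t_j\|^{2}\right)^{-1}. \tag{22}$$
    The $KL$ divergence between the modeled input and output distributions is often used as a measure of the quality of the results.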
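    The second stage can be illustrated with a short Python sketch that evaluates the Student-t similarities, the KL divergence and its gradient for a fixed set of embedded points. The function names are hypothetical, and the sketch only evaluates these quantities; the iterative update of the points is left out.

```python
import math

def student_t_similarities(points):
    # Normalized heavy-tailed kernel over all ordered pairs k != l; the diagonal is zero.
    n = len(points)
    w = [[0.0 if i == j else 1.0 / (1.0 + math.dist(points[i], points[j]) ** 2)
          for j in range(n)] for i in range(n)]
    z = sum(sum(row) for row in w)
    return [[x / z for x in row] for row in w]

def kl_divergence(P, Q):
    # Sum of p * ln(p / q) over the off-diagonal entries (entries with p = 0 contribute nothing).
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)

def kl_gradient(P, Q, points, i):
    # Gradient of the KL divergence with respect to the embedded point i.
    grad = [0.0] * len(points[i])
    for j, tj in enumerate(points):
        if j == i:
            continue
        c = 4.0 * (P[i][j] - Q[i][j]) / (1.0 + math.dist(points[i], tj) ** 2)
        for k in range(len(grad)):
            grad[k] += c * (points[i][k] - tj[k])
    return grad

t = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
Q = student_t_similarities(t)
```

    When the input similarities coincide with `Q`, the divergence and every gradient component vanish, which is the fixed point the optimization aims for.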

    2.2.4. The Uniform Manifold Approximation and Projection

    The UMAP is a recent technique [27] for clustering and visualizing high-dimensional datasets, which seeks to accurately represent both the local and global structures embedded in the data [38,39].
    Given a distance, $d(v_i,v_j)$, between pairs of objects $v_i$ and $v_j$, $i,j=1,\ldots,N$, and the number of neighbors to consider, $k$, the UMAP starts by computing the $k$-nearest neighbors of $v_i$, $\mathcal{N}_i$, with respect to $d(v_i,v_j)$. Then, the algorithm calculates the parameters $\rho_i$ and $\sigma_i$, for each datapoint $v_i$. The parameter $\rho_i$ represents a nonzero distance from $v_i$ to its nearest neighbor and is given by
    $$\rho_i=\min_{j\in\mathcal{N}_i}\left\{d(v_i,v_j)\mid d(v_i,v_j)>0\right\}. \tag{23}$$
    The parameter $\rho_i$ is important to ensure the local connectivity of the manifold. This means that it yields a locally adaptive exponential kernel for each point.
    The constant $\sigma_i$ must satisfy the condition
    $$\log_2 k=\sum_{j\in\mathcal{N}_i}\exp\left[-\frac{\max\left(0,d(v_i,v_j)-\rho_i\right)}{\sigma_i}\right], \tag{24}$$
    determined using binary search.
    The algorithm constructs a joint probability distribution $p_{ij}$ measuring the similarity between $v_i$ and $v_j$, in such a way that similar (dissimilar) objects are assigned a higher (lower) probability
    $$p_{ij}=p_{j|i}+p_{i|j}-p_{j|i}\,p_{i|j}, \tag{25}$$
    $$p_{j|i}=\begin{cases}\exp\left[-\dfrac{\max\left(0,d(v_i,v_j)-\rho_i\right)}{\sigma_i}\right], & j\neq i\\[2mm] 0, & j=i,\end{cases} \tag{26}$$
    where $p_{ij}=p_{ji}$, $p_{ii}=0$, $\sum_{i,j}p_{ij}=1$ and $\sum_{j}p_{j|i}=1$, $\forall i,j$.
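    The first UMAP stage can be sketched for a single datapoint as follows. The function names, the search interval and the logarithmic bisection are assumptions made for illustration; they are not taken from the UMAP code used later in the paper.

```python
import math

def umap_rho(neighbor_dists):
    # Smallest strictly positive distance from the point to its neighbors.
    return min(d for d in neighbor_dists if d > 0)

def umap_membership(neighbor_dists, rho, sigma):
    # Exponential kernel shifted by rho: the nearest neighbor always receives the value 1.
    return [math.exp(-max(0.0, d - rho) / sigma) for d in neighbor_dists]

def umap_sigma(neighbor_dists, rho, k, lo=1e-6, hi=1e6, iters=100):
    # Binary search (on a logarithmic scale) for the sigma whose memberships sum to log2(k).
    target = math.log2(k)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if sum(umap_membership(neighbor_dists, rho, mid)) > target:
            hi = mid   # kernel too wide
        else:
            lo = mid   # kernel too narrow
    return math.sqrt(lo * hi)

def umap_symmetrize(p_j_given_i, p_i_given_j):
    # Combine the two directed memberships into one symmetric value.
    return p_j_given_i + p_i_given_j - p_j_given_i * p_i_given_j

dists = [1.0, 2.0, 3.0, 4.0, 5.0]   # distances from one point to its k = 5 nearest neighbors
rho = umap_rho(dists)
sigma = umap_sigma(dists, rho, k=5)
p = umap_membership(dists, rho, sigma)
```

    The search relies on the sum of memberships increasing monotonically with the kernel width, from the number of neighbors at distance $\rho_i$ up to $k$.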
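    In the second stage, the UMAP computes the similarities between each pair of points in the space $\mathbb{R}^Q$
    $$q_{ij}=q_{j|i}+q_{i|j}-q_{j|i}\,q_{i|j}, \tag{27}$$
    $$q_{ij}=\begin{cases}\left(1+a\|t_i-t_j\|^{2b}\right)^{-1}, & j\neq i\\ 0, & j=i,\end{cases} \tag{28}$$
    where $q_{ij}=q_{ji}$, $q_{ii}=0$, $\sum_{i,j}q_{ij}=1$ and $\sum_{j}q_{j|i}=1$, $\forall i,j$. The constants $a,b\in\mathbb{R}$ are either user-defined or are determined by the algorithm given the desired separation between close points, $\delta\in\mathbb{R}^{+}$, in the embedding space $\mathbb{R}^Q$
    $$\left(1+a\|t_i-t_j\|^{2b}\right)^{-1}\approx\begin{cases}1, & \|t_i-t_j\|\le\delta\\ \exp\left[-\left(\|t_i-t_j\|-\delta\right)\right], & \|t_i-t_j\|>\delta.\end{cases} \tag{29}$$
    The UMAP performs an optimization while attempting to minimize the cross-entropy $CE$ between the distribution of points in $\mathbb{R}^P$ and $\mathbb{R}^Q$
    $$CE=\sum_{i\neq j}\left[p_{ij}\ln\frac{p_{ij}}{q_{ij}}+(1-p_{ij})\ln\frac{1-p_{ij}}{1-q_{ij}}\right]. \tag{30}$$
    The minimization scheme starts with a given initial set of points in $\mathbb{R}^Q$. The UMAP uses the Graph Laplacian to assign initial low-dimensional coordinates, and then proceeds with the optimization using the gradient descent
    $$\frac{\partial CE}{\partial t_i}=\sum_{j}\left[\frac{2ab\,[d(t_i,t_j)]^{2(b-1)}}{1+a\,[d(t_i,t_j)]^{2b}}\,p_{ij}-\frac{2b}{[d(t_i,t_j)]^{2}\left(1+a\,[d(t_i,t_j)]^{2b}\right)}\,(1-p_{ij})\right](t_i-t_j). \tag{31}$$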
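    As a small illustration of the second stage, the sketch below evaluates the low-dimensional similarity and the contribution of one pair of points to the cross-entropy. The function names are hypothetical, the values $a=b=1$ are chosen only for the example, and the two embedded points are assumed to be distinct so that $q_{ij}<1$.

```python
import math

def low_dim_similarity(ti, tj, a, b):
    # Similarity of two distinct embedded points: 1 / (1 + a * distance^(2b)).
    return 1.0 / (1.0 + a * math.dist(ti, tj) ** (2 * b))

def cross_entropy_term(p, q):
    # Contribution of one pair: attractive part plus repulsive part (requires 0 < q < 1).
    term = 0.0
    if p > 0:
        term += p * math.log(p / q)
    if p < 1:
        term += (1 - p) * math.log((1 - p) / (1 - q))
    return term

q = low_dim_similarity((0.0, 0.0), (1.0, 0.0), a=1.0, b=1.0)   # hypothetical a and b
matched = cross_entropy_term(q, q)          # high- and low-dimensional similarities agree
mismatched = cross_entropy_term(0.9, 0.1)   # similar objects placed far apart
```

    The first part of each term penalizes similar objects that are placed far apart, while the second part penalizes dissimilar objects that are placed close together.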

    3. Description of the Dataset

    The prototype dataset representative of a given CS corresponds to the DJIA daily closing values from 28 December 1959 up to 12 March 2021. Each week includes 5 working days. Occasional missing data are obtained by means of linear interpolation. The resulting time series $x=\{x_k: k=1,\ldots,L\}$ comprises $L=15970$ values, $x_k$, covering approximately six decades.
    Often, we pre-process $x$ in order to reduce the sensitivity to a high variation in the numerical values, yielding $\tilde{x}=\{\Phi_q(x_k): k=1,\ldots,L\}$. Functions $\Phi_q(\cdot)$ that are commonly adopted are the logarithm of the values, the logarithm of the returns and the normalization by the arithmetic mean, $av(x)$, and the standard deviation, $\sigma(x)$, given by
    $$\Phi_1(x_k)=\ln x_k, \tag{32}$$
    $$\Phi_2(x_k)=\begin{cases}\ln\dfrac{x_{k+1}}{x_k}, & k=1,\ldots,L-1\\[2mm] 0, & k=L,\end{cases} \tag{33}$$
    $$\Phi_3(x_k)=\frac{x_k-av(x)}{\sigma(x)}. \tag{34}$$
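    The three pre-processing functions can be sketched in Python as follows. The function names are hypothetical, and the use of the population standard deviation (division by $L$) is an assumption, since the text does not state which estimator is used.

```python
import math

def log_values(x):
    # Logarithm of each value.
    return [math.log(v) for v in x]

def log_returns(x):
    # Logarithm of consecutive ratios; the last element is set to 0 so the length stays L.
    return [math.log(x[k + 1] / x[k]) for k in range(len(x) - 1)] + [0.0]

def standardize(x):
    # Subtract the arithmetic mean and divide by the (population) standard deviation.
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std for v in x]

x = [100.0, 110.0, 121.0]   # toy series with a constant 10% growth
r = log_returns(x)
z = standardize(x)
```

    For the toy series the two non-trivial log-returns are equal, because the growth rate is constant.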
    Figure 1 depicts the evolution of $x$, as well as the logarithm of the returns $\tilde{x}=\{\Phi_2(x_k): k=1,\ldots,L\}$, which reveals a fractal nature. We verify the existence of 13 main periods denoted from $A$ to $M$. For $k\in[1,640]$, corresponding to the periods $A$ and $B$, the values of $x_k$ are small, starting with a decrease, followed by a recovering trend. This behavior is followed by a sustained increase in the DJIA during $k\in[640,1555]$, period $C$. The interval $k\in[1555,5890]$ corresponds to the periods $D$, $E$ and $F$, which are characterized by an overall stagnation between severe crises. For $k\in[5890,7237]$, that is, period $G$, we have an important rising trend, interrupted abruptly, but rapidly recovered, marking the beginning of period $H$ for $k\in[7237,10340]$. For $k\in[10340,11240]$, corresponding to period $I$, the DJIA reveals a decreasing trend. This behavior is followed by the period $J$, during the interval $k\in[11240,12500]$, characterized by a sustained increase in the DJIA values. For $k\in[12500,12840]$, the period $K$ reveals a strong falling trend. Then, recovery initiates and a rising trend is verified during the period $L$, that is, for $k\in[12840,15690]$. This period is interrupted suddenly, but rapidly recovered, signaling the beginning of $M$, corresponding to $k\in[15690,15970]$. Table 1 summarizes the DJIA main periods and some historical events that occurred between 28 December 1959 and 12 March 2021.
    To assess the dynamics of the DJIA, the time-series $\tilde{x}$ is segmented into $N=1+\left\lfloor\frac{L-W}{(1-\alpha)W}\right\rfloor$ windows, where $W$ is the window length, $\alpha\in[0,1)$ stands for the window overlapping factor, and $\lfloor\cdot\rfloor$ denotes the floor function. Therefore, the $i$th window, $i=1,\ldots,N$, consists of the vector $v_i=\{\Phi_q(x_p): p=(i-1)(1-\alpha)W+1,\ldots,(i-1)(1-\alpha)W+W\}$.
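    The segmentation can be sketched as follows. The function names are hypothetical, and the sketch assumes that the step between windows, $(1-\alpha)W$, is an integer number of samples.

```python
def number_of_windows(L, W, alpha):
    # N = 1 + floor((L - W) / ((1 - alpha) * W)), with an integer step between windows (assumption).
    step = round((1 - alpha) * W)
    return 1 + (L - W) // step

def segment(x, W, alpha):
    # Split the series into N windows of length W; consecutive windows share alpha * W samples.
    step = round((1 - alpha) * W)
    n = number_of_windows(len(x), W, alpha)
    return [x[i * step:i * step + W] for i in range(n)]

N = number_of_windows(15970, 60, 0)          # DJIA series with disjoint 60-day windows
windows = segment(list(range(10)), 4, 0.5)   # toy series, 50% overlap
```

    With $L=15970$, $W=60$ and $\alpha=0$ the formula gives 266 windows, and the last 10 samples of the series are left unused because they do not fill a complete window.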
    Figure 2 portrays the histogram of $\tilde{x}=\{\Phi_2(x_k): k=1,\ldots,L\}$ for consecutive disjoint windows ($\alpha=0$) and $W=60$. We verify the existence of fat tails in the statistical distribution, as well as a ‘noisy’ behavior, which are also verified for other functions $\Phi_q$ and values of $\alpha$ and $W$.

    4. Analysis and Visualization of the DJIA

    The DJIA time-series $x$ is normalized using expression (34), yielding $\tilde{x}=\{\Phi_3(x_k): k=1,\ldots,L\}$. Naturally, other types of pre-processing are possible, but the linear transform (34) is common in signal processing [40] and several experiments showed that it yields good results.
    In the next subsections, $\tilde{x}$ is segmented using consecutive disjoint ($\alpha=0$) time windows of length $W=60$ days, which yield $N=266$ objects, $v_i$, with $i=1,\ldots,N$. These objects are processed by the dimensionality reduction and visualization methods, while adopting different distances (1)–(9) to quantify the dissimilarities between objects. For the generalized distance $d_{10}$, given by expression (10), since no a priori preference for a given formula is set, we adopt identical weights, that is, $\lambda_r=\frac{1}{9}$, $r=1,\ldots,9$. The values of $\alpha$ and $W$ were chosen experimentally. Obviously, other values could have been adopted, but those used lead to a good compromise between time resolution and suitable visualization.

    4.1. The HC Analysis and Visualization of the DJIA

    The neighbor-joining method [41] and the successive (agglomerative) clustering using average-linkage are adopted, as implemented by the software Phylip [42] with the option neighbor. Figure 3 depicts the HC trees with the distances $d_2$, $d_3$, $d_5$ and $d_{10}$. The circular marks correspond to objects (window vectors) and the colormap represents the arrow of time. We verify that the HC has difficulty in separating the periods $A$–$F$ and, for distance $d_5$, this difficulty is also observed for the periods $H$–$J$. For other distances, we obtain loci of the same type.
    The HC loci reflect the relationships between objects, but the interpretation of such loci is difficult due to the presence of many objects and because we are constrained to 2-dim visual representations. The reliability of the clustering, that is, how well the hierarchical trees reproduce the original dissimilarities of the original objects in the dataset, was verified. Nevertheless, we do not include the Shepard diagrams for the sake of parsimony.

    4.2. The MDS Analysis and Visualization of the DJIA

    We now visualize the DJIA behavior using the MDS. The Matlab function mdscale with the Sammon nonlinear mapping criterion is adopted. Figure 4 depicts the 3-dim loci obtained for $\alpha=0$ and $W=60$ ($N=266$) with the distances $d_2$, $d_3$, $d_5$ and $d_{10}$.
    The reliability of the 3-dim loci was verified through the standard Shepard and stress plots, which showed that the objects in the embedding space $\mathbb{R}^Q$ reproduce those in the original space $\mathbb{R}^P$. Those diagrams are not depicted for the sake of parsimony. We verify that the MDS unravels patterns compatible with the 13 DJIA periods $A$–$M$. However, the algorithm cannot discriminate between them. The patterns are composed of two ‘segments’ formed by objects that reveal an almost continuous and smooth evolution in time. Each segment translates into DJIA dynamics exhibiting strong memory effects that are captured by the visualization technique with the adopted distance. The transition between segments corresponds to some discontinuity where the memory of past values is somehow lost.
    For other distances, we obtain loci of several types. However, it should be noted that the definition of an adequate distance (in the sense of assessing the dynamical effects) often requires some numerical trials. Different distances can lead to valid visual representations, but may be unable to capture the features of interest. For example, the correlation distance, $d_{11}$, given by
    $$\text{Correlation:}\quad d_{11}(v_i,v_j)=1-\frac{\sum_{k=1}^{P}\left[v_{ik}-av(v_i)\right]\left[v_{jk}-av(v_j)\right]}{\left\{\sum_{k=1}^{P}\left[v_{ik}-av(v_i)\right]^{2}\sum_{k=1}^{P}\left[v_{jk}-av(v_j)\right]^{2}\right\}^{\frac{1}{2}}}, \tag{35}$$
    leads to the loci shown in Figure 5, revealing that neither the HC nor the MDS can capture the memory effects embedded in the dataset.
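    A short sketch of the correlation distance helps to see one of its properties. The function name is hypothetical, and the remark after the code is an observation about the formula itself, not an explanation given in the paper.

```python
import math

def correlation_distance(u, v):
    # One minus the Pearson correlation between the two vectors.
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return 1.0 - num / den

same_shape = correlation_distance([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])  # shifted copy
scaled = correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])         # scaled copy
reversed_trend = correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

    Because the vectors are centered and normalized inside the formula, two windows with the same shape are assigned a distance of zero even when their levels differ, so information about the level of the index within each window is discarded by this measure.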

    4.3. The t-SNE Analysis and Visualization of the DJIA

    The Matlab function tsne was adopted to visualize the dataset $\tilde{x}=\{\Phi_3(x_k): k=1,\ldots,L\}$. The algorithm was set to exact and the value 5 was given to the Exaggeration and the Perplexity. These values were adjusted by trial in order to obtain good visualization. The Exaggeration corresponds to the size of natural clusters in data. A large exaggeration creates relatively more space between clusters in the embedding space $\mathbb{R}^Q$. The Perplexity is related to the number of local neighbors of each point. All other parameters kept their default values. Figure 6 depicts the 3-dim loci obtained for the distances $d_2$, $d_3$, $d_5$ and $d_{10}$. The loci reveal that the t-SNE can arrange objects according to the periods $A$–$M$ and that the plots generated with the different distances are similar.

    4.4. The UMAP Analysis and Visualization of the DJIA

    For implementing the UMAP dimensionality reduction and visualization, we adopted the Matlab UMAP code, version 2.1.3, developed by Stephen Meehan et al. [43]. The function run_umap was used with parameters n_neighbors and min_dist set to 5 and 0.2, respectively, adjusted by trial and error in order to obtain good visualization. These parameters correspond directly to $k$ and $\delta$ introduced in Section 2.2.4. All other parameters are set to their default values. Figure 7 depicts the 3-dim loci obtained for the distances $d_2$, $d_3$, $d_5$ and $d_{10}$.
    The UMAP can organize objects in $\mathbb{R}^Q$ according to their characteristics, identifying well the periods $A$–$M$, independently of the adopted distance. Therefore, we conclude that both the t-SNE and the UMAP perform better than the MDS in representing the DJIA dynamics. The visualization has only slight variations with the distance adopted to compare objects.

    5. Assessing the Effect of $W$ and $\alpha$ in the Visualization of the DJIA Dynamics

    The window width and overlap, $W$ and $\alpha$, represent a compromise between time resolution and memory length. In this section, we study the effect of these parameters on the patterns generated by the HC, MDS, t-SNE and UMAP. The analysis was performed for all distances and several combinations of $W$ and $\alpha$. The results are presented for the Canberra distance, $d_2$, and the cases summarized in Table 2, where $W\in\{90,60,30,10\}$ and $\alpha\in\{0,0.2,0.5\}$. For other distances, we obtain similar conclusions.
    Figure 8, Figure 9, Figure 10 and Figure 11 depict the loci generated. Regarding the HC, we verify that the loci are quite insensitive to the parameter $W$, with the exception of $W=10$. For this value of window length, the HC can discriminate objects in the periods $A$–$F$, although that capability depends on the overlap $\alpha$. For $W=10$ and $\alpha=0.5$, the objects in $A$–$F$ spread out in space, but their clusters are still unclear. Concerning the MDS, besides the density of objects, which, naturally, varies with $N$, the 3-dim loci are almost invariant with respect to the parameters $W$ and $\alpha$. The t-SNE and UMAP reveal a superior ability to generate patterns that correspond to dissimilarities between objects and, therefore, are able to identify the 13 periods $A$–$M$. However, for the t-SNE, this ability is weakened as the number of objects, $N$, increases, meaning small values of $W$ and high values of $\alpha$. For such cases, the generated loci are difficult to interpret. The UMAP reveals the 13 periods $A$–$M$ for all combinations of $W$ and $\alpha$. Moreover, for small values of $W$ several sub-periods are unraveled, which directly relate to the time evolution of the DJIA.

    6. Conclusions

    This paper explored a strategy representing an alternative to the classical time analysis in the study of multidimensional data generated by CS. The DJIA index of daily closing values from 28 December 1959 up to 12 March 2021 was adopted for the numerical experiments. In the proposed scheme, the original time-series was normalized and segmented, yielding a number of objects. These objects are vectors, whose dimension and overlap represent a compromise between time resolution and memory length. The objects were compared using various distances and their dissimilarities were used as the input to the four dimensionality reduction and information visualization algorithms, namely, HC, MDS, t-SNE and UMAP. These algorithms construct representations of the original dataset, where time is a parametric variable, with no a priori requirements. The algorithms are based on the minimization of the difference between the original and approximated data. The plots were analyzed in terms of the emerging patterns. Those graphical representations are composed of a number of ‘segments’, formed by objects with an almost continuous evolution in time, occasionally separated by discontinuities. This translates into DJIA dynamics with phases of visible correlation and, consequently, memory effects, separated by transitions corresponding to discontinuities where the memory of past values is not present. Numerical experiments illustrated the feasibility and effectiveness of the method for processing complex data. The approach can be easily extended to deal with more features and richer descriptions of the data involving a higher number of dimensions.

    Author Contributions

    A.M.L. and J.A.T.M. conceived, designed and performed the experiments, analyzed the data and wrote the paper. All authors have read and agreed to the published version of the manuscript.

    Funding

    This research received no external funding.

    Institutional Review Board Statement

    Not applicable.

    Informed Consent Statement

    Not applicable.

    Data Availability Statement

    Not applicable.

    Conflicts of Interest

    The authors declare no conflict of interest.

    References

    1. Pinto, C.; Mendes Lopes, A.; Machado, J. A review of power laws in real life phenomena. Commun. Nonlinear Sci. Numer. Simul. 2012, 17, 3558–3578.
    2. Tarasova, V.V.; Tarasov, V.E. Concept of dynamic memory in economics. Commun. Nonlinear Sci. Numer. Simul. 2018, 55, 127–145.
    3. Tarasov, V.E. Fractional econophysics: Market price dynamics with memory effects. Phys. A Stat. Mech. Its Appl. 2020, 557, 124865.
    4. Tarasov, V.E.; Tarasova, V.V. Economic Dynamics with Memory: Fractional Calculus Approach; Walter de Gruyter GmbH & Co KG: Berlin, Germany; Boston, MA, USA, 2021; Volume 8.
    5. Lopes, A.M.; Tenreiro Machado, J.; Huffstot, J.S.; Mata, M.E. Dynamical analysis of the global business-cycle synchronization. PLoS ONE 2018, 13, e0191491.
    6. Lopes, A.M.; Tenreiro Machado, J.; Galhano, A.M. Multidimensional scaling visualization using parametric entropy. Int. J. Bifurc. Chaos 2015, 25, 1540017.
    7. Meyers, R.A. Complex Systems in Finance and Econometrics; Springer Science & Business Media: New York, NY, USA, 2010.
    8. Xia, P.; Lopes, A.M.; Restivo, M.T. A review of virtual reality and haptics for product assembly: From rigid parts to soft cables. Assem. Autom. 2013.
    9. Li, J.; Shang, P.; Zhang, X. Financial time series analysis based on fractional and multiscale permutation entropy. Commun. Nonlinear Sci. Numer. Simul. 2019, 78, 104880.
    10. Machado, J.T.; Lopes, A.M. Fractional state space analysis of temperature time series. Fract. Calc. Appl. Anal. 2015, 18, 1518.
    11. Lopes, A.M.; Machado, J.T. Dynamical analysis and visualization of tornadoes time series. PLoS ONE 2015, 10, e0120260.
    12. Lopes, A.M.; Tenreiro Machado, J. Power law behavior and self-similarity in modern industrial accidents. Int. J. Bifurc. Chaos 2015, 25, 1550004.
    13. Nigmatullin, R.R.; Lino, P.; Maione, G. New Digital Signal Processing Methods: Applications to Measurement and Diagnostics; Springer Nature: Cham, Switzerland, 2020.
    14. Ware, C. Information Visualization: Perception for Design; Elsevier: Waltham, MA, USA, 2012.
    15. Spence, R. Information Visualization: An Introduction; Springer: Cham, Switzerland, 2001; Volume 1.
    16. Van Der Maaten, L.; Postma, E.; Van den Herik, J. Dimensionality reduction: A comparative. J. Mach. Learn. Res. 2009, 10, 66–71.
    17. Tenreiro Machado, J.; Lopes, A.M.; Galhano, A.M. Multidimensional scaling visualization using parametric similarity indices. Entropy 2015, 17, 1775–1794.
    18. Dunteman, G.H. Principal Components Analysis; Number 69; Sage: Newbury Park, CA, USA, 1989.
    19. Thompson, B. Canonical correlation analysis. In Encyclopedia of Statistics in Behavioral Science; John Wiley & Sons: Chichester, UK, 2005.
    20. Tharwat, A.; Gaber, T.; Ibrahim, A.; Hassanien, A.E. Linear discriminant analysis: A detailed tutorial. AI Commun. 2017, 30, 169–190.
    21. Child, D. The Essentials of Factor Analysis; Cassell Educational: London, UK, 1990.
    22. France, S.L.; Carroll, J.D. Two-way multidimensional scaling: A review. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2010, 41, 644–661.
    23. Lee, J.A.; Lendasse, A.; Verleysen, M. Nonlinear projection with curvilinear distances: Isomap versus curvilinear distance analysis. Neurocomputing 2004, 57, 49–76.
    24. Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003, 15, 1373–1396.
    25. Coifman, R.R.; Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 2006, 21, 5–30.
    26. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
    27. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 2018, arXiv:1802.03426.
    28. Deza, M.M.; Deza, E. Encyclopedia of Distances; Springer: New York, NY, USA, 2009.
    29. Nigmatullin, R.R. Discrete Geometrical Invariants in 3D Space: How Three Random Sequences Can Be Compared in Terms of “Universal” Statistical Parameters. Front. Phys. 2020, 8, 76.
    30. Hartigan, J.A. Clustering Algorithms; John Wiley & Sons: New York, NY, USA, 1975.
    31. Aggarwal, C.C.; Hinneburg, A.; Keim, D.A. On the Surprising Behavior of Distance Metrics in High Dimensional Space; Springer: Berlin, Germany, 2001.
    32. Sokal, R.R.; Rohlf, F.J. The comparison of dendrograms by objective methods.Taxon1962,11, 33–40. [Google Scholar] [CrossRef]
    33. Hamid, Y.; Sugumaran, M. A t-SNE based non linear dimension reduction for network intrusion detection.Int. J. Inf. Technol.2020,12, 125–134. [Google Scholar] [CrossRef]
    34. Rao, A.; Aditya, A.; Adarsh, B.; Tripathi, S. Supervised Feature Learning for Music Recommendation. InCommunications in Computer and Information Science, Proceedings of the International Symposium on Signal Processing and Intelligent Recognition Systems, Chennai, India, 14–17 October 2020; Springer: Singapore, 2020; pp. 122–130. [Google Scholar]
    35. Li, W.; Cerise, J.E.; Yang, Y.; Han, H. Application of t-SNE to human genetic data.J. Bioinform. Comput. Biol.2017,15, 1750017. [Google Scholar] [CrossRef]
    36. Kobak, D.; Berens, P. The art of using t-SNE for single-cell transcriptomics.Nat. Commun.2019,10, 1–14. [Google Scholar] [CrossRef] [PubMed] [Green Version]
    37. Cieslak, M.C.; Castelfranco, A.M.; Roncalli, V.; Lenz, P.H.; Hartline, D.K. t-Distributed Stochastic Neighbor Embedding (t-SNE): A tool for eco-physiological transcriptomic analysis.Mar. Genom.2020,51, 100723. [Google Scholar] [CrossRef] [PubMed]
    38. Becht, E.; McInnes, L.; Healy, J.; Dutertre, C.A.; Kwok, I.W.; Ng, L.G.; Ginhoux, F.; Newell, E.W. Dimensionality reduction for visualizing single-cell data using UMAP.Nat. Biotechnol.2019,37, 38–44. [Google Scholar] [CrossRef] [PubMed]
    39. Dorrity, M.W.; Saunders, L.M.; Queitsch, C.; Fields, S.; Trapnell, C. Dimensionality reduction by UMAP to visualize physical and genetic interactions.Nat. Commun.2020,11, 1–6. [Google Scholar] [CrossRef] [PubMed] [Green Version]
    40. Papoulis, A.Signal Analysis; McGraw-Hill: New York, NY, USA, 1977. [Google Scholar]
    41. Saitou, N.; Nei, M. The neighbor-joining method: A new method for reconstructing phylogenetic trees.Mol. Biol. Evol.1987,4, 406–425. [Google Scholar]
    42. Felsenstein, J.PHYLIP (Phylogeny Inference Package), Version 3.5 c; University of Washington: Seattle, WA, USA, 1993. [Google Scholar]
    43. Meehan, C.; Ebrahimian, J.; Moore, W.; Meehan, S. Uniform Manifold Approximation and Projection (UMAP). 2021. Available online:https://www.mathworks.com/matlabcentral/fileexchange/71902 (accessed on 12 February 2021).
    Figure 1. The evolution of the time-series $x$ and $\tilde{x} = \{\Phi_2(x_k) : k = 1, \ldots, L\}$ in the period from 28 December 1959 up to 12 March 2021.
    Figure 2. The histogram of $\tilde{x} = \{\Phi_2(x_k) : k = 1, \ldots, L\}$ for consecutive disjoint time windows ($\alpha = 0$) and $W = 60$.
    Figure 3. The hierarchical trees obtained by the HC for $\alpha = 0$ and $W = 60$ ($N = 266$) with four distances: (a) $d_2$; (b) $d_3$; (c) $d_5$; (d) $d_{10}$. The circular marks correspond to objects (window vectors) and the colormap represents the arrow of time.
    Figure 4. The 3-dim loci obtained by the MDS for $\alpha = 0$ and $W = 60$ ($N = 266$) with four distances: (a) $d_2$; (b) $d_3$; (c) $d_5$; (d) $d_{10}$. The circular marks correspond to objects (window vectors) and the colormap represents the arrow of time.
    Figure 5. The loci obtained for $\alpha = 0$ and $W = 60$ ($N = 266$) with the correlation distance $d_{11}$: (a) hierarchical tree; (b) MDS locus.
    Figure 6. The 3-dim maps obtained by the t-SNE for $\alpha = 0$ and $W = 60$ ($N = 266$) with four distances: (a) $d_2$; (b) $d_3$; (c) $d_5$; (d) $d_{10}$. The circular marks correspond to objects (window vectors) and the colormap represents the arrow of time.
    Figure 7. The 3-dim loci obtained by the UMAP for $\alpha = 0$ and $W = 60$ ($N = 266$) with four distances: (a) $d_2$; (b) $d_3$; (c) $d_5$; (d) $d_{10}$. The circular marks correspond to objects (window vectors) and the colormap represents the arrow of time.
    Figure 8. The 3-dim loci obtained for $d_2$ and $W = 90$: (a) HC and $\alpha = 0$ (E1); (b) HC and $\alpha = 0.2$ (E2); (c) HC and $\alpha = 0.5$ (E3); (d) MDS and $\alpha = 0$ (E1); (e) MDS and $\alpha = 0.2$ (E2); (f) MDS and $\alpha = 0.5$ (E3); (g) t-SNE and $\alpha = 0$ (E1); (h) t-SNE and $\alpha = 0.2$ (E2); (i) t-SNE and $\alpha = 0.5$ (E3); (j) UMAP and $\alpha = 0$ (E1); (k) UMAP and $\alpha = 0.2$ (E2); (l) UMAP and $\alpha = 0.5$ (E3).
    Figure 9. The 3-dim loci obtained for $d_2$ and $W = 60$: (a) HC and $\alpha = 0$ (E4); (b) HC and $\alpha = 0.2$ (E5); (c) HC and $\alpha = 0.5$ (E6); (d) MDS and $\alpha = 0$ (E4); (e) MDS and $\alpha = 0.2$ (E5); (f) MDS and $\alpha = 0.5$ (E6); (g) t-SNE and $\alpha = 0$ (E4); (h) t-SNE and $\alpha = 0.2$ (E5); (i) t-SNE and $\alpha = 0.5$ (E6); (j) UMAP and $\alpha = 0$ (E4); (k) UMAP and $\alpha = 0.2$ (E5); (l) UMAP and $\alpha = 0.5$ (E6).
    Figure 10. The 3-dim loci obtained for $d_2$ and $W = 30$: (a) HC and $\alpha = 0$ (E7); (b) HC and $\alpha = 0.2$ (E8); (c) HC and $\alpha = 0.5$ (E9); (d) MDS and $\alpha = 0$ (E7); (e) MDS and $\alpha = 0.2$ (E8); (f) MDS and $\alpha = 0.5$ (E9); (g) t-SNE and $\alpha = 0$ (E7); (h) t-SNE and $\alpha = 0.2$ (E8); (i) t-SNE and $\alpha = 0.5$ (E9); (j) UMAP and $\alpha = 0$ (E7); (k) UMAP and $\alpha = 0.2$ (E8); (l) UMAP and $\alpha = 0.5$ (E9).
    Figure 11. The 3-dim loci obtained for $d_2$ and $W = 10$: (a) HC and $\alpha = 0$ (E10); (b) HC and $\alpha = 0.2$ (E11); (c) HC and $\alpha = 0.5$ (E12); (d) MDS and $\alpha = 0$ (E10); (e) MDS and $\alpha = 0.2$ (E11); (f) MDS and $\alpha = 0.5$ (E12); (g) t-SNE and $\alpha = 0$ (E10); (h) t-SNE and $\alpha = 0.2$ (E11); (i) t-SNE and $\alpha = 0.5$ (E12); (j) UMAP and $\alpha = 0$ (E10); (k) UMAP and $\alpha = 0.2$ (E11); (l) UMAP and $\alpha = 0.5$ (E12).
    Table 1. The main DJIA periods and some historical events that occurred from 28 December 1959 to 12 March 2021.

    | Period | Interval, $k$ | Start Date | End Date | Main Events |
    |---|---|---|---|---|
    | A | [1, 200] | 28 December 1959 | 30 September 1960 | 1961 Berlin Wall; Bay of Pigs |
    | B | [200, 640] | 30 September 1960 | 8 June 1962 | |
    | C | [640, 1555] | 8 June 1962 | 10 December 1965 | 1962 Cuban Missile Crisis; 1963 John F. Kennedy Assassination; 1964 Vietnam War Begins; 1965 The Great Inflation Begins |
    | D | [1555, 2720] | 10 December 1965 | 29 May 1970 | 1967 The Six Day War |
    | E | [2720, 3878] | 29 May 1970 | 6 November 1974 | 1972 Watergate; Munich Olympics Massacre; 1973 U.S. Involvement in Vietnam Ends; Arab Oil Embargo; 1974 President Nixon Resigns |
    | F | [3878, 5890] | 6 November 1974 | 23 July 1982 | 1977 Panama Canal Treaty; 1979 Iran Hostage Crisis; 1980 Iraq-Iran War; 1981 President Reagan Shot; 1982 Falkland Islands War |
    | G | [5890, 7237] | 23 July 1982 | 22 September 1987 | 1983 Grenada Invasion; 1986 U.S. Attacks Libya; Chernobyl Accident; 1987 Financial Panic; Stock Market Crash |
    | H | [7237, 10340] | 22 September 1987 | 13 August 1999 | 1989 U.S. Invades Panama; German Unification; 1991 The Gulf War; Soviet Union Collapse; 1992 Civil War in Bosnia; 1993 World Trade Center Terrorist Attack; 1995 Oklahoma Terrorist Attack; 1997 Asian Currency Crisis; Global Stock Market Rout |
    | I | [10340, 11240] | 13 August 1999 | 24 January 2003 | 2000 Bush-Gore Election Crisis; 2001 Terrorist Attack on World Trade Center & Pentagon; Enron Crisis; 2003 War in Iraq |
    | J | [11240, 12500] | 24 January 2003 | 23 November 2007 | 2004 Global War on Terror; 2005 Record High Oil Prices; 2007 Subprime Mortgage; Credit Debacle |
    | K | [12500, 12840] | 23 November 2007 | 13 March 2009 | 2008 Credit Crisis; Financial Institution Failures |
    | L | [12840, 15690] | 13 March 2009 | 14 February 2020 | 2010 European Union Crisis; Massive Debt; 2011 U.S. Credit Downgrade; 2012 European Debt; 2013 U.S. Government Shutdown; 2014 Oil Price Decline; 2015 Refugee Crisis; 2016 Brexit Referendum; 2017 Trump Administration; 2018 Warnings About Climate Change; U.S.-China Trade War; President Trump Impeachment Process |
    | M | [15690, 15970] | 14 February 2020 | 12 March 2021 | 2020 COVID-19 Pandemic; Black Lives Matter |
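The interval column of Table 1 can serve as a lookup from a sample index $k$ to its period. The following is a minimal Python sketch, not part of the original analysis. The names `PERIOD_LABELS`, `UPPER_BOUNDS` and `period_of` are hypothetical. Because adjacent intervals in the table share their boundary index, the sketch assumes that a shared boundary belongs to the earlier period.

```python
from bisect import bisect_left

# Period letters and the upper bound of each interval, copied from Table 1.
PERIOD_LABELS = "ABCDEFGHIJKLM"
UPPER_BOUNDS = [200, 640, 1555, 2720, 3878, 5890, 7237,
                10340, 11240, 12500, 12840, 15690, 15970]


def period_of(k: int) -> str:
    """Return the Table 1 period letter for the 1-based sample index k."""
    if not 1 <= k <= UPPER_BOUNDS[-1]:
        raise ValueError(f"k must lie in [1, {UPPER_BOUNDS[-1]}], got {k}")
    # Assumption: an index equal to a shared boundary (e.g. k = 200)
    # is assigned to the earlier of the two periods.
    return PERIOD_LABELS[bisect_left(UPPER_BOUNDS, k)]


if __name__ == "__main__":
    for k in (1, 200, 201, 12600, 15970):
        print(k, period_of(k))
```

Binary search over the sorted upper bounds keeps the lookup logarithmic in the number of periods, which is convenient when every window vector has to be labelled by period for plotting.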
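    Table 2. List of experiments varying $W$ and $\alpha$.

    | Experiment | $W$ | $\alpha$ | $N$ | Experiment | $W$ | $\alpha$ | $N$ |
    |---|---|---|---|---|---|---|---|
    | E1 | 90 | 0 | 177 | E7 | 30 | 0 | 532 |
    | E2 | 90 | 0.2 | 221 | E8 | 30 | 0.2 | 665 |
    | E3 | 90 | 0.5 | 353 | E9 | 30 | 0.5 | 1063 |
    | E4 | 60 | 0 | 266 | E10 | 10 | 0 | 1597 |
    | E5 | 60 | 0.2 | 332 | E11 | 10 | 0.2 | 1996 |
    | E6 | 60 | 0.5 | 531 | E12 | 10 | 0.5 | 3193 |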
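The values of $N$ in Table 2 are consistent with sliding a window of length $W$ over $L = 15970$ samples (the last index listed in Table 1) with a step of $W(1-\alpha)$, so that $\alpha$ acts as the overlap fraction between consecutive windows. This reading is an assumption, inferred from the tabulated numbers and from the description of $\alpha = 0$ as disjoint windows. The function name `window_count` is hypothetical. A short Python sketch that checks the table under this assumption:

```python
def window_count(L: int, W: int, alpha: float) -> int:
    """Number of complete windows of length W over L samples.

    Assumption: consecutive windows are shifted by W * (1 - alpha) samples,
    i.e. alpha is the overlap fraction (alpha = 0 gives disjoint windows).
    """
    step = round(W * (1 - alpha))  # round() guards against float noise
    if step < 1:
        raise ValueError("alpha is too close to 1 for this window length")
    return (L - W) // step + 1


L_SAMPLES = 15970  # assumed series length, taken from the last index in Table 1

# (W, alpha, N) for each experiment, copied from Table 2.
TABLE_2 = {
    "E1": (90, 0.0, 177), "E2": (90, 0.2, 221), "E3": (90, 0.5, 353),
    "E4": (60, 0.0, 266), "E5": (60, 0.2, 332), "E6": (60, 0.5, 531),
    "E7": (30, 0.0, 532), "E8": (30, 0.2, 665), "E9": (30, 0.5, 1063),
    "E10": (10, 0.0, 1597), "E11": (10, 0.2, 1996), "E12": (10, 0.5, 3193),
}

if __name__ == "__main__":
    for name, (W, alpha, N) in TABLE_2.items():
        print(name, W, alpha, window_count(L_SAMPLES, W, alpha), N)
```

Under this assumption, halving the step ($\alpha = 0.5$) roughly doubles the number of objects that the HC, MDS, t-SNE and UMAP stages have to process.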
    Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

    Share and Cite

    MDPI and ACS Style

    Lopes, A.M.; Tenreiro Machado, J.A. Dynamical Analysis of the Dow Jones Index Using Dimensionality Reduction and Visualization. Entropy 2021, 23, 600. https://doi.org/10.3390/e23050600

