Head/tail breaks is a clustering algorithm for data with a heavy-tailed distribution, such as power laws and lognormal distributions. A heavy-tailed distribution can be simply referred to as the scaling pattern of far more small things than large ones, or alternatively, numerous smallest, a very few largest, and some in between the smallest and the largest. The classification is done by dividing things into large things (the head) and small things (the tail) around the arithmetic mean or average, and then recursively repeating the division for the head until the notion of far more small things than large ones is no longer valid, or until only more or less similar things remain.

Head/tail breaks is not just for classification, but also for visualization of big data: keeping only the head suffices, since the head is self-similar to the whole. It can be applied not only to vector data such as points, lines, and polygons, but also to raster data such as a digital elevation model (DEM).

Head/tail breaks is motivated by the inability of conventional classification methods, such as equal intervals, quantiles, geometric progressions, standard deviation, and natural breaks (commonly known as Jenks natural breaks optimization or k-means clustering), to reveal the underlying scaling or living structure: the inherent hierarchy (or heterogeneity) characterized by the recurring notion of far more small things than large ones. Note that this notion refers not only to geometric properties, but also to topological and semantic properties. In this connection, it should be interpreted as far more unpopular (or less-connected) things than popular (or well-connected) ones, or far more meaningless things than meaningful ones. Unlike k-means clustering or natural breaks, head/tail breaks uses the mean or average to dichotomize a dataset into small and large values, rather than to characterize classes by their average values.

Given some variable X that demonstrates a heavy-tailed distribution, there are far more small x than large ones. Take the average of all xi and obtain the first mean m1. Then calculate the second mean for those xi greater than m1 and obtain m2. In the same recursive way we can get m3, and so on, depending on whether the ending condition of no longer far more small x than large ones is met. For simplicity, assume there are three means, m1, m2, and m3. This classification leads to four classes: [minimum, m1], (m1, m2], (m2, m3], (m3, maximum]. In general, the process can be represented as a recursive function as follows:

    Recursive function Head/tail Breaks:
        Rank the input data values from the biggest to the smallest
        Break the data (around the mean) into the head and the tail
            // the head for data values greater than the mean
            // the tail for data values less than the mean
        While (the head is a minority, e.g., head/data <= 40%):
            Head/tail Breaks(head)
    End Function

The criterion to stop the recursive classification is that the remaining data (i.e., the head part) are not heavy-tailed, or simply that the head part is no longer a minority (i.e., the proportion of the head part is no longer less than a threshold such as 40%). This threshold is suggested to be 40% by Jiang et al. This process is called head/tail breaks 1.0. The resulting number of classes is referred to as the ht-index, an alternative index to fractal dimension for characterizing the complexity of fractals or geographic features: the higher the ht-index, the more complex the fractal.

Through head/tail breaks, a dataset is seen as a living structure with an inherent hierarchy of far more smalls than larges, or recursively perceived as the head of the head of the head and so on. This opens up new avenues for analyzing data from a holistic and organic point of view while considering different types of scales and scaling in spatial analysis.

[Figure: An illustration of the head/tail breaks classification with 10 numbers.]

[Figure: 1024 cities that follow exactly Zipf's law, which implies that the first largest city is of size 1, the second largest of size 1/2, the third largest of size 1/3, and so on. The left pattern is produced by head/tail breaks, while the right one by natural breaks, also known as Jenks natural breaks optimization.]
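The mean-based recursive splitting described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `head_tail_breaks`, the plain-list representation, and the choice to record a mean only when the resulting head stays a minority are choices of this sketch.

```python
def head_tail_breaks(data, threshold=0.4):
    """Return the break values (the successive means) for data with
    far more small values than large ones."""
    breaks = []
    head = list(data)
    while len(head) > 1:
        mean = sum(head) / len(head)
        new_head = [x for x in head if x > mean]
        # Stop when the head is no longer a minority of the current data.
        if not new_head or len(new_head) / len(head) > threshold:
            break
        breaks.append(mean)
        head = new_head
    return breaks

# Ten numbers following Zipf's law: 1, 1/2, ..., 1/10.
data = [1.0 / k for k in range(1, 11)]
print(head_tail_breaks(data))  # two breaks, near 0.293 and 0.611
```

With this sketch, the ten numbers yield a first break at about 0.293 (the mean of all ten values, with a head of three values) and a second at about 0.611 (the mean of those three), giving three classes, i.e. an ht-index of 3.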
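The 1024-city Zipf example can be checked numerically with a self-contained loop that applies the same mean-based splitting with a 40% stopping threshold; the loop structure and the decision to count only minority heads are assumptions of this sketch, so the resulting class count is what this particular reading of the stopping rule produces.

```python
# Sizes of 1024 cities that follow Zipf's law exactly: 1, 1/2, ..., 1/1024.
sizes = [1.0 / rank for rank in range(1, 1025)]

# Split at the mean, recursing on the head while it remains a minority (<= 40%).
head, splits = sizes, 0
while len(head) > 1:
    mean = sum(head) / len(head)
    new_head = [x for x in head if x > mean]
    if not new_head or len(new_head) / len(head) > 0.4:
        break
    head, splits = new_head, splits + 1

print(splits + 1)  # number of classes, i.e. the ht-index; prints 5 here
```

Each split keeps a sharply shrinking head (136 of the 1024 cities, then 24, then 6, then 2), which is exactly the far-more-small-things-than-large-ones pattern the classification is designed to expose.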