Friday, November 05, 2010

Frequency, Redundancy, or Cellulite

Google’s search capability, especially the subset that is Google Scholar (scholar.google.com), provides a quick way to evaluate the impact of publication A insofar as it is cited by others. It also provides insight into the impact of the publications B, C, D… that cite it. More generally, given a particular concept or research focus, searching for citations of that concept or focus provides some insight into its breadth and depth.


Consider a couple of examples. A search for Mandelbrot fractal produces a maximum of 1000 results (at up to 100 per webpage – selected in Advanced Scholar Search).




The top of the first page may look like this:




Note the Cited by link. Copying each of the ten pages of search results into Excel, combining duplicates, filtering the resulting entries (somewhat fewer than 1000) down to the Cited by ## text, parsing out the numbers, and sorting them from largest to smallest produces:




Publication Index 24 has the most citations, 17,314, and is assigned Rank 1. As might be expected, this publication is Mandelbrot’s The fractal geometry of nature.
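The Excel steps described above (combine duplicates, pull out the Cited by ## number, sort, and rank) can be sketched in a few lines of Python. This is only an illustration, not the workflow actually used: the function name rank_citation_counts, the (title, text) input layout, and the rule of matching duplicates by lower-cased title and keeping the larger count are all assumptions.

```python
import re

# Matches text such as "Cited by 17,314" and captures the number.
CITED_BY = re.compile(r"Cited by\s+([\d,]+)")

def rank_citation_counts(entries):
    """Rank publications by citation count, largest first.

    entries: iterable of (title, text) pairs, where text is the raw
    result text copied from a search page (hypothetical layout).
    Returns a list of (rank, title_key, count) tuples.
    """
    best = {}
    for title, text in entries:
        match = CITED_BY.search(text)
        if match is None:
            continue  # no "Cited by" link on this entry
        count = int(match.group(1).replace(",", ""))
        key = title.strip().lower()  # assumed duplicate rule: same title
        best[key] = max(count, best.get(key, 0))
    ordered = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(rank, key, count)
            for rank, (key, count) in enumerate(ordered, start=1)]
```

Fed an entry whose text contains "Cited by 17,314", the function assigns that publication Rank 1 as long as no other entry has a larger count.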


Charting number of citations (##) versus rank produces this graph:






Large citation counts are confined to the first few articles or books among the publications that contain Mandelbrot and fractal. Converting both axes to a logarithmic scale (base 10) illuminates this effect:






A log-log-linear trend is apparent from the largest number of citations to about the 200th publication, followed by a slight drop-off to about the 700th, beyond which a very rapid drop-off is apparent. Fitting a power function to the citation counts from ranks 1 to 224 produces a curve that is linear on the log-log axes.
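For readers who want to reproduce a fit of this kind outside a spreadsheet, here is a minimal sketch. It assumes an ordinary least-squares line through log10(count) versus log10(rank); the post does not say which fitting method was used, and the function name fit_power_law and its parameters are illustrative only.

```python
import math

def fit_power_law(counts, first_rank=1, last_rank=None):
    """Fit count = a * rank**b by least squares on log10-log10 values.

    counts must be sorted largest to smallest, so counts[0] is Rank 1.
    Only ranks first_rank..last_rank (inclusive) enter the fit.
    Returns (a, b); b is the exponent, negative for a decaying trend.
    """
    if last_rank is None:
        last_rank = len(counts)
    xs, ys = [], []
    for rank in range(first_rank, last_rank + 1):
        count = counts[rank - 1]
        if count > 0:  # the log of zero is undefined, so skip such entries
            xs.append(math.log10(rank))
            ys.append(math.log10(count))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log-log chart
    a = 10 ** (mean_y - b * mean_x)    # intercept converted back
    return a, b
```

Calling fit_power_law(counts, 1, 224) restricts the fit to ranks 1 to 224, the range mentioned above, which keeps the rapid drop-off at the high ranks from dragging the slope down.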






Is there a fractal structure implicit in this observed trend? Undertaking a cell count (how many intervals of a particular count are occupied) reveals a fractal trend.
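One way to read "cell count" is as a one-dimensional box count: divide the citation-count axis into intervals of a given width, count how many intervals contain at least one publication, repeat for several widths, and take the slope of log(occupied cells) versus log(width). The sketch below implements that reading. It is an assumption about the method, not a description of the spreadsheet behind the charts, and the names occupied_cells and cell_count_dimension are made up for illustration.

```python
import math

def occupied_cells(values, size):
    """Number of intervals of width `size` that hold at least one value."""
    return len({int(v // size) for v in values})

def cell_count_dimension(values, sizes):
    """Estimate a box-counting dimension from cell counts.

    Fits log10(occupied cells) against log10(cell size) by least
    squares and returns the negated slope.
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(occupied_cells(values, s)) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -(sxy / sxx)
```

A completely filled axis gives a dimension of 1 and a single isolated value gives 0, so a result between the two indicates values that cluster at some scales and leave gaps at others.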






Combining the two charts:






The fractal dimension according to the cell counts is approximately 0.529 (note that all data points are incorporated into the analysis).


What about citation counts of citations of a particular publication, e.g., Mandelbrot’s The fractal geometry of nature?






Click Cited by 17,316 and get:




We could keep on clicking… But before doing that, analyze the number of citations of each publication that cites Mandelbrot’s classic, as above, to get the final chart:






Note that the log-log-linear fits to Counts versus Rank and to the Fractal cell counts have similar exponents, -0.896 and -0.909, unlike those for the simple Mandelbrot fractal search (this is not the usual observation). Recall what this second search produced: Google Scholar’s results for articles that cite Mandelbrot’s classic book, from which the number of citations of each of those publications was extracted and analyzed (in the chart above).


Let’s go one level deeper and click Cited by 10,771:






Resulting analysis:






There is a departure from a simple log-log-linear curve for the citations of citations, especially at the low ranks, almost as if there were two trends; the low rank citations are close in slope to the fractal trend.


What's going on here? Citations of Mandelbrot's classic book seem to demonstrate a fractal structure, as do citations of the most cited article which cites his book. We might anticipate that such structures would appear in repeated analysis of the most cited articles at each level. Fractals within fractals within fractals... as Swift anticipated:
"So nat'ralists observe, a flea
Hath smaller fleas that on him prey,
And these have smaller fleas that bite 'em,
And so proceed ad infinitum."
