
2023-09-12 21:19:45 Author: 西决丶丶


I've read a bunch of tutorials about the proper way to generate a logarithmic distribution of tag cloud weights. Most of them group the tags into steps. This seems somewhat silly to me, so I developed my own algorithm, based on what I've read, that dynamically distributes each tag's count along the logarithmic curve between the threshold and the maximum. Here's the essence of it in Python:

from math import log
count = [1, 3, 5, 4, 7, 5, 10, 6]
def logdist(count, threshold=0, maxsize=1.75, minsize=.75):
    countdist = []
    # mincount is either the threshold or the minimum if it's over the threshold
    mincount = max(threshold, min(count))
    maxcount = max(count)
    spread = maxcount - mincount
    # the slope of the line (rise over run) between (mincount, minsize) and ( maxcount, maxsize)
    delta = (maxsize - minsize) / float(spread)
    for c in count:
        logcount = log(c - (mincount - 1)) * (spread + 1) / log(spread + 1)
        size = delta * logcount - (delta - minsize)
        countdist.append({'count': c, 'size': round(size, 3)})
    return countdist


Basically, without the logarithmic calculation of the individual count, it would generate a straight line between the points (mincount, minsize) and (maxcount, maxsize).
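For comparison, here is a minimal sketch of that straight-line version. The name `lineardist` is hypothetical (it is not part of the original code), and the sketch assumes maxcount is greater than mincount so the slope is defined.

```python
def lineardist(count, threshold=0, maxsize=1.75, minsize=.75):
    # Hypothetical helper: plain linear interpolation, no logarithm involved.
    mincount = max(threshold, min(count))
    maxcount = max(count)
    # slope of the line between (mincount, minsize) and (maxcount, maxsize)
    delta = (maxsize - minsize) / float(maxcount - mincount)
    return [{'count': c, 'size': round(minsize + delta * (c - mincount), 3)}
            for c in count]

count = [1, 3, 5, 4, 7, 5, 10, 6]
sizes = lineardist(count)
```

With these counts, the smallest tag lands on minsize and the largest on maxsize, with everything else spaced evenly in between.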

The algorithm gives a good approximation of the curve between the two points, but it has one drawback. The mincount is a special case: log(c - (mincount - 1)) becomes log(1), which is zero, so the size at mincount works out to minsize - delta, which is less than minsize. I've tried cooking up numbers to solve this special case, but can't seem to get it right. Currently I just treat the mincount as a special case and add " or 1" to the logcount line.
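The drawback can be reproduced with a small sketch that applies the same arithmetic as `logdist` to a single count. The helper name `first_attempt_size` is made up for illustration, and the counts 1 and 10 are taken from the sample list.

```python
from math import log

def first_attempt_size(c, mincount, maxcount, maxsize=1.75, minsize=.75):
    # Same arithmetic as the logdist function above, for one count.
    spread = maxcount - mincount
    delta = (maxsize - minsize) / float(spread)
    logcount = log(c - (mincount - 1)) * (spread + 1) / log(spread + 1)
    return delta * logcount - (delta - minsize)

low = first_attempt_size(1, mincount=1, maxcount=10)
high = first_attempt_size(10, mincount=1, maxcount=10)
delta = (1.75 - .75) / 9
```

Here `high` comes out at maxsize as intended, while `low` falls short of minsize by exactly `delta`.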


Is there a more correct algorithm to draw a curve between the two points?

Update Mar 3: If I'm not mistaken, I am taking the log of the count and then plugging it into a linear equation. To describe the special case another way: for y = ln x, y = 0 at x = 1. This is what happens at the mincount. But the mincount can't be zero; the tag has not been used 0 times.


Try the code and plug in your own numbers to test. Treating the mincount as a special case is fine by me; I have a feeling it would be easier than whatever the actual solution to this problem is. I just feel that there must be a proper solution and that someone has probably already come up with it.


UPDATE Apr 6: A simple Google search turns up many of the tutorials I've read, but this is probably the most complete example of stepped tag clouds.


UPDATE Apr 28: In response to antti.huima's solution: When graphed, the curve that your algorithm creates lies below the line between the two points. I've been trying to juggle the numbers around but still can't come up with a way to flip that curve to the other side of the line. I'm guessing that if the function were changed to some form of logarithm instead of an exponent, it would do exactly what I need. Is that correct? If so, can anyone explain how to achieve this?



Thanks to antti.huima's help, I rethought what I was trying to do.


Following his method of solving the problem, I want a logarithmic equation whose value at the mincount equals that of the linear equation between the two points, that is, min_weight.

weight(MIN) = ln(MIN-(MIN-1)) + min_weight
min_weight = ln(1) + min_weight


While this gives me a good starting point, I need to make it pass through the point (MAX, max_weight). It's going to need a constant:

weight(x) = ln(x-(MIN-1))/K + min_weight


Solving for K we get:

K = ln(MAX-(MIN-1))/(max_weight - min_weight)
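The step from the weight function to K can be written out as follows. It only restates the requirement that the curve pass through (MAX, max_weight), and it assumes MAX is greater than MIN so that K is nonzero.

```latex
\begin{align*}
\mathrm{weight}(\mathrm{MAX})
  &= \frac{\ln\bigl(\mathrm{MAX} - (\mathrm{MIN} - 1)\bigr)}{K} + \mathit{min\_weight}
   = \mathit{max\_weight} \\
\frac{\ln\bigl(\mathrm{MAX} - (\mathrm{MIN} - 1)\bigr)}{K}
  &= \mathit{max\_weight} - \mathit{min\_weight} \\
K &= \frac{\ln\bigl(\mathrm{MAX} - (\mathrm{MIN} - 1)\bigr)}{\mathit{max\_weight} - \mathit{min\_weight}}
\end{align*}
```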

So, to put this all back into some Python code:

from math import log
count = [1, 3, 5, 4, 7, 5, 10, 6]
def logdist(count, threshold=0, maxsize=1.75, minsize=.75):
    countdist = []
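    # mincount is either the threshold or the minimum if it's over the threshold
    mincount = max(threshold, min(count))
    maxcount = max(count)
    # K from the derivation above: scales the log so that maxcount maps to maxsize
    # (this is zero, and the division below fails, if maxcount == mincount)
    constant = log(maxcount - (mincount - 1)) / (maxsize - minsize)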
    for c in count:
        size = log(c - (mincount - 1)) / constant + minsize
        countdist.append({'count': c, 'size': round(size, 3)})
    return countdist
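
As a quick sanity check, the sketch below evaluates the same formula for single counts and compares it with the straight line through the two endpoints. The helpers `weight` and `line` are hypothetical names introduced only for this check, and the range 1 to 10 comes from the sample counts.

```python
from math import log

def weight(x, mincount, maxcount, minsize=.75, maxsize=1.75):
    # Hypothetical single-count version of
    # weight(x) = ln(x - (MIN - 1)) / K + min_weight
    k = log(maxcount - (mincount - 1)) / (maxsize - minsize)
    return log(x - (mincount - 1)) / k + minsize

def line(x, mincount, maxcount, minsize=.75, maxsize=1.75):
    # Straight line through the same two endpoints, for comparison.
    slope = (maxsize - minsize) / float(maxcount - mincount)
    return minsize + slope * (x - mincount)

lo, hi = 1, 10
endpoints = (weight(lo, lo, hi), weight(hi, lo, hi))
# True when every interior point of the log curve sits above the line
above_line = all(weight(x, lo, hi) > line(x, lo, hi) for x in range(lo + 1, hi))
```

For this range the curve starts at minsize, ends at maxsize, and stays above the line in between, which is the shape asked for in the Apr 28 update.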