Ficdom Structure 2: Author heat map · 2:07am Jun 9th, 2017

In "Author clusters question", I wrote about finding clusters of authors on fimfiction using data on who followed them. What I'm going to show you now is less sophisticated, but visually easier to grasp. Actually I did this first, back in November. See that earlier post for the derivation of m2(A,B), my measure of distance between authors.

(I never explained the adjustment to the ratio in that post; see Math Stuff at the bottom of this post if you care.)

This was my first attempt to find some structure or clustering among the authors on fimfiction, using scraped data from 2015 on who people follow. I selected the 54 authors who had at least 4 stories and 1000 followers and followed at least 10 other authors meeting those requirements. (The 10 other authors requirement was because at first I was studying which authors followed which other authors.)

Heat map

I made a heat map from the distances between them using the R gplots library function heatmap.2, using default parameters. Redder values indicate authors are closer together. heatmap.2 uses these values to build a dendrogram, drawn above and to the left, which clusters authors with similar distance vectors. It does this using hierarchical clustering with complete linkage.

Fig. 1. Heat map showing “distance” between authors.
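For readers who want to see what "hierarchical clustering with complete linkage" does, here is a minimal Python sketch. It is not the R code behind the figure. It assumes, as heatmap.2's defaults appear to do, that each author is first represented by their row of distances and that rows are compared by Euclidean distance. The four authors and their m2 values are made up for illustration.

```python
import math
from itertools import combinations

def row_distances(m):
    """Euclidean distance between every pair of rows of matrix m."""
    return [[math.dist(r1, r2) for r2 in m] for r1 in m]

def complete_linkage(dist, labels):
    """Agglomerative clustering with complete linkage.

    dist is a symmetric matrix of pairwise distances. Returns the tree as
    nested tuples of labels, plus the height of each merge in order.
    """
    # Each cluster is (subtree, list of member indices).
    clusters = [(labels[i], [i]) for i in range(len(labels))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between two clusters is the
            # largest distance between any pair of their members.
            d = max(dist[i][j] for i in clusters[a][1] for j in clusters[b][1])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = ((clusters[a][0], clusters[b][0]),
                  clusters[a][1] + clusters[b][1])
        heights.append(d)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0], heights

# Hypothetical m2-style distances for four made-up authors.
labels = ["A", "B", "C", "D"]
m2 = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
tree, heights = complete_linkage(row_distances(m2), labels)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

The nested tuples are the dendrogram: A and B merge first because their distance vectors are nearly identical, then C and D, and the two pairs join last.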

Unfortunately, the algorithm never produces the tightest clusterings, because which branch of a binary tree it shows on the left and which on the right is arbitrary. It could be improved a lot by taking into account the similarity of the leftmost part of a subtree to the rightmost part of whatever subtree is displayed to its left. By looking for sharp color boundaries that match up to deep cuts in the tree, you can see that the very first split in the tree should be flipped left/right, moving the red block at bottom left (called the “gore” block below) to the top right, and that would bring the 4 red areas in the 4 corners together.
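The left/right flipping idea can be sketched in a few lines of Python. This is a greedy illustration under my own assumptions, not something heatmap.2 does: at every node of the tree it tries flipping each child and keeps the arrangement whose two touching leaves are closest. The tree, the labels, and the distances below are hypothetical.

```python
def symmetric(pairs):
    """Build a dict-of-dicts distance lookup from {(x, y): distance}."""
    d = {}
    for (x, y), v in pairs.items():
        d.setdefault(x, {})[y] = v
        d.setdefault(y, {})[x] = v
    return d

def order_leaves(tree, dist):
    """Return a leaf order for a binary tree given as nested tuples.

    Both children are ordered recursively, then the four combinations of
    flipping each child are tried, and the one with the smallest distance
    across the boundary between the children wins. This is greedy, so it
    does not guarantee the best possible overall ordering.
    """
    if not isinstance(tree, tuple):
        return [tree]
    left = order_leaves(tree[0], dist)
    right = order_leaves(tree[1], dist)
    candidates = [l + r
                  for l in (left, left[::-1])
                  for r in (right, right[::-1])]
    seam = len(left)  # index of the first leaf of the right child
    return min(candidates, key=lambda o: dist[o[seam - 1]][o[seam]])

# Made-up example: "a" is close to "c", so the left subtree should flip.
dist = symmetric({
    ("a", "b"): 0.1, ("c", "d"): 0.1,
    ("a", "c"): 0.2, ("a", "d"): 0.9,
    ("b", "c"): 0.9, ("b", "d"): 0.9,
})
print(order_leaves((("a", "b"), ("c", "d")), dist))  # ['b', 'a', 'c', 'd']
```

The default drawing order would be a, b, c, d, which puts the distant pair b and c side by side. Flipping the left subtree puts a next to c instead.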

The red area in the upper-right I’ll call the "popular authors". Their stories usually have the comedy, romance, adventure, or mature+sex tags. Here also be crackfic territory. The 4x4 dark red block in that block’s center shows the similarity of the set {Marshal Twilight, The Abyss, Shakespearicles, TittySparkles} to itself. It's the core of clop in this dataset.

The large red block near the lower-left I’ll call the ~~"cool kids"~~ character authors. Notice Bad Horse is next to Cold in Gardez and Skywriter. Says Science!

See the small red block in the very lower-left? That shows that {Vengeful Spirit, Pedro Hander, ed2481, Tatsurou, Distorted Flare, MadMaxtheBlack} all have high similarity to each other. They're authors who write stories with the gore tag.

See the nearly-white blocks above and to the right of that red block? That shows that people who read the gore authors are unlikely to read {GhostOfHeraclitus, Friendly Uncle, AbsoluteAnonymous, Cloudy Skies, kits}, who usually write sweet stories. Hardly surprising.

If you follow horizontally over to the right from the gore x gore red block, you'll see it's mostly white & yellow until you come to Mr101, who marks the left edge of the "popular authors". A taste for gore and for popular authors goes together, while a taste for gore and for character-based fiction does not.

Is Science unjust? Are there authors in one block you think belong in another? Write down their names in the comments, and then we'll see what the next graph has to say in another post.

Math stuff
At first, I computed the sample probability that a user who watched user B would watch user A:
P(watches(X,A) | watches(X,B))
That didn't work well--it turned out that the number of watches that users make does not have a Poisson distribution. Instead, some people are just more likely to follow people in general. Most follows are made by a small number of people who follow lots of people.
BUT, those people who follow lots of people can only follow at most 10 of the 10 most-popular authors. Do you see the problem?
No?
Well, I didn't either, but the data eventually made it clear: People who follow just a few people follow a few popular authors, and a few other people. People who follow lots of people follow a few popular authors, and lots of other people. SO, if user X follows user B, and user B is not at all popular, user X is probably one of those people who follows a lot of people--and that means P(watches(X,A) | watches(X,B)) is a lot higher if B is not popular than if B is popular.
I had hoped that P(watches(X, Bad Horse) | watches(X, Cold in Gardez)) would be high, but it wouldn't, because CiG has so many followers that only a small percentage of them are people who follow lots of people. Whereas P(watches(X, Bad Horse) | watches(X, shitfic_author_1337)) would be higher just because the 3 people who watched shitfic_author_1337 each watched thousands of people.
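A toy example makes the bias concrete. The sketch below is in Python and uses invented users and authors ("popular", "obscure", "target"); it is not the scraped data. It computes the sample probability exactly as defined above, from a mapping of each user to the set of authors they watch.

```python
def p_watch_given(watches, a, b):
    """Sample estimate of P(watches(X, a) | watches(X, b)).

    watches maps each user to the set of authors that user watches.
    Returns None if nobody watches b.
    """
    b_watchers = [x for x, followed in watches.items() if b in followed]
    if not b_watchers:
        return None
    return sum(a in watches[x] for x in b_watchers) / len(b_watchers)

# Hypothetical data: ten casual users watch only the popular author,
# two heavy users watch everybody.
watches = {f"casual{i}": {"popular"} for i in range(10)}
watches.update({f"heavy{i}": {"popular", "obscure", "target"}
                for i in range(2)})

print(p_watch_given(watches, "target", "popular"))  # 2/12, about 0.167
print(p_watch_given(watches, "target", "obscure"))  # 1.0
```

The obscure author looks perfectly linked to the target, but only because everyone who watches the obscure author watches nearly everyone.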
So I redefined my measure as a likelihood ratio:
ratio(A,B) = P(watches(X,A) | watches(X,B)) / [P(watches(X,A) | numOfWatches(X) = ave(numOfWatches(Y) | watches(Y,B)))]
where numOfWatches(X) is the number of users that user X watches. This normalizes the earlier probability by how many people B's watchers tend to watch. It messes up the math and coding, though, because a probability always falls in [0..1], whereas a ratio of probabilities can be any positive number.
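Here is one way the ratio could be computed, as a hedged Python sketch. The post does not say how the denominator was estimated, so the approach below is an assumption: it takes every user whose watch count is within a relative `window` of the average for B's watchers and measures how often those users watch A. The `window` parameter, the function name, and the data are all hypothetical.

```python
def ratio(watches, a, b, window=0.5):
    """Likelihood ratio for 'watches a' given 'watches b'.

    Numerator: P(watches a | watches b).
    Denominator (assumed estimator): P(watches a) among users whose watch
    count lies within +/- window of the average watch count of b's watchers.
    Returns None when either probability cannot be estimated.
    """
    b_watchers = [x for x, f in watches.items() if b in f]
    if not b_watchers:
        return None
    numer = sum(a in watches[x] for x in b_watchers) / len(b_watchers)
    avg = sum(len(watches[x]) for x in b_watchers) / len(b_watchers)
    lo, hi = avg * (1 - window), avg * (1 + window)
    peers = [x for x, f in watches.items() if lo <= len(f) <= hi]
    if not peers:
        return None
    denom = sum(a in watches[x] for x in peers) / len(peers)
    return numer / denom if denom > 0 else None

# Made-up data: four heavy users (4 watches each), six casual users.
watches = {
    "heavy1": {"obscure", "A", "p1", "p2"},
    "heavy2": {"obscure", "A", "p1", "p2"},
    "heavy3": {"A", "p1", "p2", "p3"},
    "heavy4": {"p1", "p2", "p3", "p4"},
}
watches.update({f"casual{i}": {"p1"} for i in range(6)})

print(ratio(watches, "A", "obscure"))  # 1.0 / 0.75, about 1.33
```

In this toy data the raw probability P(watches A | watches obscure) is 1.0, which looks like a very strong link. The ratio is only about 1.33, because three of the four users who watch that many people watch A anyway.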

Bad Horse · 1,077 views · #fimfiction #computation #data #graphs #R

Comments ( 19 )


TheJediMasterEd #1 · Jun 9th, 2017 · 9 · ·

This is what Vulcan Fantasy Football looks like.

Georg #2 · Jun 9th, 2017 · · ·

Cool. I’ve always been fascinated by probability, although my math skills top out at multiplication, and only with a calculator.

Admiral Biscuit #3 · Jun 9th, 2017 · · ·

I'm not on the chart, probably because I broke his algorithm.

Trick Question #4 · Jun 9th, 2017 · · ·

Quick nag: the scale seems backwards to me. I’d expect you’re comparing author similarity, in which case 1.0 would be identical. This suggests to the reader that you’re scoring dissimilarity, which is rather strange.

Trick Question #5 · Jun 9th, 2017 · · ·

Also, the map shows that Skywriter, Cold In Gardez, and Bad Horse appear to be very highly similar.

How suspiciously flattering...

Bad Horse #6 · Jun 9th, 2017 · 1 · ·

4565310 A distance metric is a dissimilarity metric. I want next to draw a map of author-space, where being close together on the map means being similar. Can’t do that with a similarity metric.

Trick Question #7 · Jun 9th, 2017 · · ·

4565320
Derp, normalized difference as a dissimilarity metric. I get it.

Trick Question #8 · Jun 9th, 2017 · · ·

4565274
I think it’s just assumed your row and column (except for you) are entirely snow-white.

Southpaw #9 · Jun 9th, 2017 · · ·

4565274 Time to start that SilverPip's Wasteland Journal you were thinking about.

Catalysts Cradle #10 · Jun 9th, 2017 · · ·

Neat. The heatmap does a really nice job of summarizing a lot of information about the similarities between authors' audiences. Are the clusters from the hierarchical clustering pretty similar to the ones you found earlier by PCA and k-means clustering? It looks to be the case, but it'd be interesting to see what the differences were.

GaPJaxie #11 · Jun 9th, 2017 · · ·

As someone not featured in your study, I reject science.

More seriously, very cool. Glad to see a larger version of this posted.

Skywriter #12 · Jun 9th, 2017 · 4 · ·

Bad Horse is next to Cold in Gardez and Skywriter

We must not have seen you coming.

MrNumbers #13 · Jun 9th, 2017 · · ·

Huh. I’m pretty similar to Chuck Finley, Eakin and WandererD

I’m pretty chuffed!

Cold in Gardez #14 · Jun 9th, 2017 · · ·

You need better colors. Everything from .3 to .0 looks identically red.

archonix #15 · Jun 9th, 2017 · · ·

4565653

A grand grouping. Meanwhile, all I can do is bask in a tiny sliver of the reflected glory of my low bacon number to all of you.

Actually now I think of it, personal relationship distance between authors would be an interesting thing to quantify. It'd also be a pain in the arse to quantify.

Bad Horse #16 · Jun 10th, 2017 · · ·

4565963

Actually now I think of it, personal relationship distance between authors would be an interesting thing to quantify. It'd also be a pain in the arse to quantify.

Not so hard. Number of comments on each others' stories. If you ran the website, you could also count PMs between people.

Icy Shake #17 · Jun 10th, 2017 · 1 · ·

Like it, easily grokked. With 4565667 in thinking you need greater color range.

Whom might you add?
Maybe some people who definitely meet the criteria now and were around then, with a good number of stories but below the follower count. Granted, can’t catch full-scale data for them, but could catch what their early adopters have in common with those who already qualified. People I can think of: Horizon, Estee, GaPJaxie, Ponydora Prancypants, Titanium Dragon, Present Perfect.
Get a reviewer/blogger block (though even now many of these would have <1000 follower counts). Examples: Titanium Dragon, Present Perfect, Chris, JohnPerry, Bradel, RBDash47.

Admiral Biscuit #18 · Jun 10th, 2017 · · ·

4565328

I think it’s just assumed your row and column (except for you) are entirely snow-white.

As pure as the driven snow. . . .

4565358

Time to start that SilverPip's Wasteland Journal you were thinking about.

That would be an interesting project.

Titanium Dragon #19 · Jul 5th, 2017 · · ·

4566551
4566891
I strongly suspect that I'd end up in the big red block in the middle-lower-left. I may do a lot of reviewing, and I may follow the other reviewers, but the upper-right hand block, with only a few exceptions (Rainbow Bob, Rated Ponystar, Bad_Seed_72) have almost uniformly not been reviewed by me at all. Conversely, I've reviewed almost everyone in that mid-lower-left block (only Alexstraza and SleeplessBrony have not been reviewed by me - clearly major blind spots for me).

I haven't read a single person in the far lower-left block other than Vengeful Spirit.

I have read a few stories by The Abyss but I have never reviewed any of their stuff.

My guess would be that my taste in what I read and my taste in what I write overlap pretty heavily, so I'd expect to share a lot of followers with most of the lower-left crew.

Though I guess I did get some followers from Rainbow Bob way back in the day.
