Analysis

Note

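All examples assume that a collection has been loaded first:

from ankipandas import Collection
col = Collection()

Alternatively, pass the path explicitly, e.g. col = Collection("/path/to/col.anki2").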

In which deck are the most leeches?

cards = col.cards.merge_notes()
counts = cards[cards.has_tag("leech")]["cdeck"].value_counts()
counts.plot.pie(title="Leeches per deck")
_images/leeches_per_deck.png
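
To see what the filter-then-count step does without an Anki collection at hand, the following sketch replays it on a small hand-made table. The data are invented, and the list-membership test is only a stand-in for has_tag (an assumption for illustration, not the library's implementation); the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical toy data standing in for col.cards.merge_notes()
cards = pd.DataFrame(
    {
        "cdeck": ["physics", "physics", "spanish", "spanish", "spanish"],
        "ntags": [["leech"], [], ["leech", "verb"], ["leech"], ["verb"]],
    }
)

# Stand-in for cards.has_tag("leech"): True where the tag list contains "leech"
is_leech = cards["ntags"].apply(lambda tags: "leech" in tags)

# Keep only the leeches, then count how often each deck occurs
counts = cards[is_leech]["cdeck"].value_counts()
print(counts.to_dict())
```

value_counts sorts by frequency in descending order, so the deck with the most leeches comes first in the result.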

Which deck has the longest average retention length?

grouped = col.cards.groupby("cdeck")
# Select the interval column before averaging so that non-numeric columns are ignored
data = grouped["civl"].mean().sort_values().tail()
ax = data.plot.barh()
ax.set_ylabel("Deck name")
ax.set_xlabel("Average expected retention length/review interval [days]")
ax.set_title("Average retention length per deck")
_images/retention_rate_per_deck.png
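
The group, average, sort, tail chain can be checked on toy data. This is a minimal sketch with made-up intervals; only the column names cdeck and civl are taken from the example above.

```python
import pandas as pd

# Hypothetical toy data: review interval in days (civl) per card
cards = pd.DataFrame(
    {
        "cdeck": ["physics", "physics", "spanish", "spanish", "history"],
        "civl": [10, 30, 100, 200, 5],
    }
)

# Average interval per deck, sorted ascending so the longest ends up last
data = cards.groupby("cdeck")["civl"].mean().sort_values()

# tail(2) keeps the two decks with the longest average interval
print(data.tail(2))
```

Because the values are sorted in ascending order, tail() selects the decks with the longest averages, and a horizontal bar chart draws the last (largest) entry at the top.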

Repetitions vs type

Minimal:

col.cards.hist(column="creps", by="ctype")

Prettier:

axs = col.cards.hist(column="creps", by="ctype", layout=(1, 2), figsize=(12, 3))
for ax in axs:
    ax.set_xlabel("#Reviews")
    ax.set_ylabel("Count")
_images/repetitions_per_type.png

Repetitions vs deck

One-liner:

col.cards.hist(column="creps", by="cdeck")

Prettier:

interesting_decks = list(col.cards.cdeck.unique())
interesting_decks.remove("archived::physics")
selected = col.cards[col.cards.cdeck.isin(interesting_decks)]
axss = selected.hist(
    column="creps",
    by="cdeck",
    sharex=True,
    layout=(5, 4),
    figsize=(15, 15),
    density=True,
)
for axs in axss:
    for ax in axs:
        ax.set_xlabel("#Reviews")
        ax.set_ylabel("Density")  # density=True normalizes each histogram
_images/repetitions_per_deck.png

Retention distribution vs deck

import matplotlib.pyplot as plt
import numpy as np

ax = plt.gca()
for deck in col.cards.cdeck.unique():
    selected = col.cards[col.cards.cdeck == deck]["civl"]
    if len(selected) < 1000:
        continue
    selected.plot.hist(
        ax=ax,
        label=deck,
        histtype="step",
        linewidth=2,
        xlim=(0, 365),
        bins=np.linspace(0, 365, 10),
    )
ax.set_xlabel("Predicted retention length (review interval) [days]")
ax.set_ylabel("Number of cards")
ax.set_title("Expected retention length per deck")
ax.legend(frameon=False)
_images/retention_distribution_vs_deck.png

Reviews vs retention length vs deck

import matplotlib.pyplot as plt
import pandas as pd

xs = []
ys = []
decks = []
for deck in col.cards.cdeck.unique():
    selected = col.cards[col.cards["cdeck"] == deck]
    if len(selected) < 500:
        continue
    decks.append(deck)
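    # Bin cards by review count into up to 15 equally populated quantile bins
    binned = pd.qcut(selected["creps"], 15, duplicates="drop")
    # Mean interval per bin; each bin's midpoint serves as its x coordinate
    results = selected.groupby(binned)["civl"].mean()
    y = results.tolist()
    x = results.index.map(lambda interval: interval.mid).tolist()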
    xs.append(x)
    ys.append(y)

ax = plt.gca()
for i in range(len(xs)):
    ax.plot(xs[i], ys[i], "o-", label=decks[i])
ax.set_xlabel("#Reviews")
ax.set_ylabel("Expected retention length/review interval [days]")
ax.set_title("Number of reviews vs retention length")
ax.legend(frameon=False)
_images/reviews_vs_ease.png
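
The binning step is the least obvious part of this recipe, so here is a minimal sketch of it on invented numbers. It assumes nothing beyond pandas itself; the toy values are hypothetical and chosen so that several quantile edges coincide, and observed=True is added only to skip empty bins.

```python
import pandas as pd

# Hypothetical toy data: number of reviews (creps) and interval in days (civl)
cards = pd.DataFrame(
    {
        "creps": [1, 1, 1, 1, 2, 3, 5, 8, 13, 21],
        "civl": [1, 2, 1, 2, 4, 8, 20, 40, 90, 180],
    }
)

# Ask for 5 equally populated bins. Many cards have exactly one review, so
# two quantile edges coincide; duplicates="drop" merges them instead of raising.
binned = pd.qcut(cards["creps"], 5, duplicates="drop")

# Mean interval per bin, with the bin midpoint as x coordinate
results = cards.groupby(binned, observed=True)["civl"].mean()
x = [interval.mid for interval in results.index]
y = results.tolist()

print(len(results), "bins:", list(zip(x, y)))
```

Because duplicate edges are dropped, the number of bins can be smaller than requested, which is why the recipe above reads the x values from the resulting index rather than assuming 15 bins.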