Analysis

Note

All examples assume that a collection has been loaded with

from ankipandas import Collection
col = Collection()

or with col = Collection("/path/to/col.anki2") to open a specific collection file.

In which deck are the most leeches?

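# Merge the note columns (including the tags) into the card table
cards = col.cards.merge_notes()
# Keep only cards tagged "leech" and count them per deck
counts = cards[cards.has_tag("leech")]["cdeck"].value_counts()
counts.plot.pie(title="Leeches per deck")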
[Figure: Leeches per deck (pie chart)]
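The pandas pipeline above keeps only the cards whose note carries the `leech` tag and then counts how often each deck name occurs. As a minimal plain-Python sketch of that logic, assuming hypothetical rows in which the tags are stored as a list under `ntags` and a card counts as a leech when that exact tag is present (both are assumptions for illustration), it amounts to a `Counter` over the deck names of the filtered rows:

```python
from collections import Counter

# Hypothetical rows standing in for the merged card/note table:
# each card has a deck name ("cdeck") and a list of note tags ("ntags").
cards = [
    {"cdeck": "languages::french", "ntags": ["leech", "verbs"]},
    {"cdeck": "languages::french", "ntags": ["nouns"]},
    {"cdeck": "physics", "ntags": ["leech"]},
    {"cdeck": "languages::french", "ntags": ["leech"]},
    {"cdeck": "physics", "ntags": []},
]


def leeches_per_deck(cards):
    """Count leech-tagged cards per deck, most frequent first (like value_counts)."""
    counts = Counter(card["cdeck"] for card in cards if "leech" in card["ntags"])
    return counts.most_common()


print(leeches_per_deck(cards))  # [('languages::french', 2), ('physics', 1)]
```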

Which deck has the longest average retention length?

# Average review interval ("civl", in days) per deck; keep the five largest.
# Selecting the column before .mean() avoids errors from non-numeric columns.
data = col.cards.groupby("cdeck")["civl"].mean().sort_values().tail()
ax = data.plot.barh()
ax.set_ylabel("Deck name")
ax.set_xlabel("Average expected retention length/review interval [days]")
ax.set_title("Average retention length per deck")
[Figure: Average retention length per deck (horizontal bar chart)]
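The grouped mean followed by `sort_values().tail()` keeps the decks with the largest average interval. A plain-Python sketch of the same aggregation, using hypothetical (deck, interval) pairs in place of the `cdeck` and `civl` columns, accumulates a sum and a count per deck and then sorts by the resulting mean:

```python
from collections import defaultdict

# Hypothetical (deck, interval in days) pairs standing in for "cdeck" and "civl".
cards = [
    ("french", 10), ("french", 30),
    ("physics", 100), ("physics", 60),
    ("history", 5), ("history", 15),
]


def mean_interval_per_deck(cards, n=5):
    """Return the n decks with the largest mean interval, in ascending order like .tail()."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for deck, ivl in cards:
        totals[deck] += ivl
        counts[deck] += 1
    means = {deck: totals[deck] / counts[deck] for deck in totals}
    # Sort ascending by mean, then keep the last n entries (the largest means).
    ranked = sorted(means.items(), key=lambda item: item[1])
    return ranked[-n:]


print(mean_interval_per_deck(cards, n=2))  # [('french', 20.0), ('physics', 80.0)]
```

Because the result is sorted in ascending order and `barh` draws the first entry at the bottom, the deck with the longest average interval ends up at the top of the bar chart.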

Repetitions vs type

Minimal:

col.cards.hist(column="creps", by="ctype")

Prettier:

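# One histogram of review counts per card type, side by side
# (layout=(1, 2) only has room for two distinct card types)
axs = col.cards.hist(column="creps", by="ctype", layout=(1, 2), figsize=(12, 3))
for ax in axs:
    ax.set_xlabel("#Reviews")
    ax.set_ylabel("Count")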
[Figure: Number of reviews per card type (histograms)]
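With `by="ctype"`, pandas draws one histogram per distinct card type, and matplotlib then bins each group (10 equal-width bins by default). Leaving the binning aside, the grouping step can be sketched in plain Python as one tally of review counts per type; the rows below are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical (card type, number of reviews) pairs standing in for "ctype" and "creps".
cards = [("review", 3), ("review", 5), ("review", 3), ("learning", 1), ("learning", 2)]


def review_counts_by_type(cards):
    """Map each card type to a Counter of how many cards had k reviews."""
    per_type = defaultdict(Counter)
    for ctype, reps in cards:
        per_type[ctype][reps] += 1
    return per_type


hist = review_counts_by_type(cards)
print(dict(hist["review"]))  # {3: 2, 5: 1}
```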

Repetitions vs deck

One-liner:

col.cards.hist(column="creps", by="cdeck")

Prettier:

# Exclude a deck we are not interested in (replace the name with one from your collection)
selected = col.cards[col.cards.cdeck != "archived::physics"]
axss = selected.hist(
    column="creps",
    by="cdeck",
    sharex=True,
    layout=(5, 4),  # room for up to 20 decks
    figsize=(15, 15),
    density=True,  # normalize each histogram to unit area
)
for axs in axss:
    for ax in axs:
        ax.set_xlabel("#Reviews")
        ax.set_ylabel("Density")
[Figure: Number of reviews per deck (normalized histograms)]
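Because this example passes `density=True`, each panel shows a normalized histogram, so the y values are densities rather than card counts. Matplotlib, like `numpy.histogram`, divides each bin count by the number of binned values times the bin width, so the bar areas sum to one and decks of very different sizes become comparable. A plain-Python sketch of that normalization, with hypothetical review counts and hand-picked bin edges:

```python
def density_histogram(values, edges):
    """Bin values into [edges[i], edges[i+1]) and normalize so the bar areas sum to 1.

    Follows the density=True convention: count / (number of binned values * bin width).
    The last bin is also closed on the right, as in numpy.histogram; values outside
    the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / (total * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


reps = [1, 2, 2, 3, 7, 8]  # hypothetical review counts of one deck
dens = density_histogram(reps, [0, 4, 8])
# Area check: sum of density * bin width is 1
print(sum(d * 4 for d in dens))
```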

Retention distribution vs deck

import matplotlib.pyplot as plt
import numpy as np

ax = plt.gca()
for deck in col.cards.cdeck.unique():
    selected = col.cards[col.cards.cdeck == deck]["civl"]
    # Skip small decks to keep the plot readable
    if len(selected) < 1000:
        continue
    # Outlined (step) histogram of the review intervals during the first year
    selected.plot.hist(
        ax=ax,
        label=deck,
        histtype="step",
        linewidth=2,
        xlim=(0, 365),
        bins=np.linspace(0, 365, 10),
    )
ax.set_xlabel("Predicted retention length (review interval) [days]")
ax.set_ylabel("Number of cards")
ax.set_title("Expected retention length per deck")
ax.legend(frameon=False)
[Figure: Distribution of expected retention length per deck (step histograms)]
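Note that `np.linspace(0, 365, 10)` returns 10 bin edges, which gives 9 bins of about 40.6 days each, and that cards with an interval above 365 days fall outside the edges and are not drawn at all. The sketch below reproduces the edges in plain Python and measures how much of a deck is cut off; the interval values are hypothetical:

```python
def linspace(start, stop, num):
    """Plain-Python equivalent of numpy.linspace for num >= 2 (endpoint included)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num - 1)] + [stop]


def share_outside(values, lo, hi):
    """Fraction of values outside [lo, hi], which the histogram does not show."""
    outside = sum(1 for v in values if v < lo or v > hi)
    return outside / len(values)


edges = linspace(0, 365, 10)
intervals = [3, 40, 120, 400, 800]  # hypothetical "civl" values in days
print(len(edges) - 1, round(edges[1] - edges[0], 2), share_outside(intervals, 0, 365))
# 9 40.56 0.4
```

For decks with many mature cards, widening the range (or checking this fraction first) avoids silently hiding a large part of the deck.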

Reviews vs retention length vs deck

import matplotlib.pyplot as plt
import pandas as pd

xs = []
ys = []
decks = []
for deck in col.cards.cdeck.unique():
    selected = col.cards[col.cards["cdeck"] == deck]
    # Skip small decks: too few cards for stable quantile bins
    if len(selected) < 500:
        continue
    decks.append(deck)
    # Split the review counts into (up to) 15 quantile bins
    binned = pd.qcut(selected["creps"], 15, duplicates="drop")
    # Mean review interval per bin, plotted at the bin midpoint
    results = selected.groupby(binned, observed=True)["civl"].mean()
    ys.append(results.tolist())
    xs.append(results.index.map(lambda interval: interval.mid).tolist())

ax = plt.gca()
for x, y, deck in zip(xs, ys, decks):
    ax.plot(x, y, "o-", label=deck)
ax.set_xlabel("#Reviews")
ax.set_ylabel("Expected retention length/review interval [days]")
ax.set_title("Number of reviews vs retention length")
ax.legend(frameon=False)
[Figure: Number of reviews vs retention length per deck]
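The `pd.qcut(..., 15, duplicates="drop")` call splits each deck's review counts into up to 15 quantile bins, so that every bin holds roughly the same number of cards. Because review counts are small integers, several quantiles often coincide; `duplicates="drop"` removes the repeated edges and leaves fewer bins. A plain-Python sketch of the binning and per-bin averaging, using `statistics.quantiles` with the `"inclusive"` method (linear interpolation) as a stand-in for the quantiles pandas computes; pandas also rounds the edges and nudges the first one slightly lower, so its midpoints can differ marginally from this sketch:

```python
import statistics


def quantile_bin_means(reps, ivls, q):
    """Sketch of qcut(reps, q, duplicates="drop") followed by a per-bin mean of ivls.

    Returns (bin midpoints, mean interval per bin) for the non-empty bins.
    Bins are right-closed (lo, hi]; the first bin also includes the minimum.
    """
    cuts = statistics.quantiles(reps, n=q, method="inclusive")
    edges = sorted(set([min(reps), *cuts, max(reps)]))  # drop duplicate edges
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for r, ivl in zip(reps, ivls):
        for i in range(n_bins):
            if edges[i] < r <= edges[i + 1] or (i == 0 and r == edges[0]):
                sums[i] += ivl
                counts[i] += 1
                break
    mids = [(edges[i] + edges[i + 1]) / 2 for i in range(n_bins) if counts[i]]
    means = [sums[i] / counts[i] for i in range(n_bins) if counts[i]]
    return mids, means


# Hypothetical review counts ("creps") and intervals ("civl") for one deck
reps = [1, 1, 1, 2, 3, 4, 5, 6]
ivls = [10, 10, 10, 30, 50, 50, 60, 80]
mids, means = quantile_bin_means(reps, ivls, q=2)
print(mids, means)  # [1.75, 4.25] [15.0, 60.0]
```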