
Custom statistics for Anki flashcard reviews

Anki stores a record of every review in the revlog table of its collection database. This powers the built-in stats page, which is quite limited. Luckily, it’s easy to get the data out and analyse it yourself:

  1. File -> Export
  2. Choose Anki Deck Package as the format, then tick Include Scheduling Information and Support older Anki versions
  3. Unzip the resulting file
  4. Extract the review log into a CSV:

    sqlite3 collection.anki21 -header -csv "SELECT * FROM revlog;" > output.csv
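  5. Load the CSV into pandas: import pandas as pd, then df = pd.read_csv('output.csv')
  6. Add datetime, date and week columns (the revlog id is the review time in milliseconds since the Unix epoch):

    df['datetime'] = pd.to_datetime(df['id'], unit='ms')  # naive UTC timestamp
    df['date'] = df['datetime'].dt.date
    df['week'] = df['datetime'].dt.to_period('W')  # Monday-to-Sunday weeks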
  7. Now you can compute things like the median number of cards per day, minutes per day and seconds per card over a recent period (here, the last 21 days that had reviews):

    (df.groupby('date')['time']  # 'time' is the answer time in milliseconds
    .agg(['sum', 'count'])
    .assign(minutes=lambda x: (x['sum'] / 1000 / 60).round().astype(int))
    .assign(seconds_per_card=lambda x: (x['sum'] / x['count'] / 1000).round().astype(int))
    .tail(21)  # last 21 rows, i.e. the last 21 days with at least one review
    ).median()
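If the sqlite3 command-line tool isn't installed, steps 3 and 4 can also be done with Python's standard library. This is a minimal sketch, assuming the exported .apkg is a zip archive containing an uncompressed SQLite file named collection.anki21 (as with Support older Anki versions ticked); the function name revlog_to_csv and the file names are illustrative, not part of Anki.

```python
import csv
import sqlite3
import tempfile
import zipfile


def revlog_to_csv(apkg_path, csv_path, collection_name="collection.anki21"):
    """Hypothetical helper: copy the revlog table of an Anki package into a CSV.

    Assumes the package is a zip archive holding an uncompressed SQLite
    collection stored as `collection_name`. Returns the number of rows written.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Step 3: unzip only the collection database into a temporary folder
        with zipfile.ZipFile(apkg_path) as zf:
            db_path = zf.extract(collection_name, path=tmp)
        # Step 4: same result as `sqlite3 ... -header -csv "SELECT * FROM revlog;"`
        conn = sqlite3.connect(db_path)
        try:
            cursor = conn.execute("SELECT * FROM revlog")
            header = [col[0] for col in cursor.description]
            rows = cursor.fetchall()
        finally:
            conn.close()  # close before the temporary folder is deleted
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Usage: revlog_to_csv("export.apkg", "output.csv"), after which step 5 reads output.csv as before.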

Aggregation

    # One row per review day: total answer time (ms), card count, and unrounded derived columns
    (df.groupby('date')['time']
    .agg(['sum', 'count'])
    .assign(minutes=lambda x: x['sum'] / 1000 / 60)
    .assign(seconds_per_card=lambda x: x['sum'] / x['count'] / 1000)
    .tail(21)
    )
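To make explicit what the pandas chain computes, here is a dependency-free sketch of the same per-day aggregation: reviews are bucketed by the UTC date of their id timestamp, each day gets a total answer time and a card count, only the most recent 21 review days are kept, and the median is taken column by column. The function names daily_stats and median_stats and the (id, time) pair input are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import median


def daily_stats(reviews, last_n_days=21):
    """Per-day totals from (id_ms, time_ms) pairs, like groupby('date')['time'].agg(['sum', 'count'])."""
    totals = defaultdict(lambda: [0, 0])  # date -> [total answer time in ms, review count]
    for id_ms, time_ms in reviews:
        # pd.to_datetime(..., unit='ms') gives UTC timestamps, so bucket by UTC date
        day = datetime.fromtimestamp(id_ms / 1000, tz=timezone.utc).date()
        totals[day][0] += time_ms
        totals[day][1] += 1
    days = sorted(totals)[-last_n_days:]  # like .tail(21): last days that had reviews
    return {
        day: {
            "sum": totals[day][0],
            "count": totals[day][1],
            "minutes": totals[day][0] / 1000 / 60,
            "seconds_per_card": totals[day][0] / totals[day][1] / 1000,
        }
        for day in days
    }


def median_stats(stats):
    """Column-wise median over the kept days, like calling .median() on the aggregated frame."""
    columns = ["sum", "count", "minutes", "seconds_per_card"]
    return {col: median(row[col] for row in stats.values()) for col in columns}
```

Note that days without any reviews never appear, so both this sketch and .tail(21) cover the last 21 review days rather than the last 21 calendar days.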

Custom percentiles (instead of histogram)

    custom_percentiles = [0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99]
    # 'values' stands for any numeric column, e.g. df['time'] / 1000 for seconds per card
    df['values'].describe(percentiles=custom_percentiles)
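Each percentile that describe() reports is read off the sorted data, and by default pandas interpolates linearly between the two nearest data points. The minimal sketch below reproduces that rule with the standard library, which can help when checking a single value by hand; percentile is a hypothetical helper, not a pandas function, and the sample values are made up.

```python
import math


def percentile(values, q):
    """Hypothetical helper: linear-interpolation percentile for 0 <= q <= 1 (pandas' default quantile rule)."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty data")
    pos = (len(data) - 1) * q  # fractional index into the sorted data
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac


custom_percentiles = [0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99]
sample = [4, 8, 15, 16, 23, 42]  # made-up example values
summary = {q: percentile(sample, q) for q in custom_percentiles}
```

With this rule the 50% entry for the sample is 15.5, halfway between 15 and 16, which is also what the median row of describe() should show for the same data.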