← Rahim Nathwani

Custom statistics for Anki flashcard reviews

Anki's built-in stats page shows streaks and review counts, but it won't answer questions like "what's my median session length over the last 3 weeks?" or "how many seconds do I spend per card on average?". Fortunately, Anki stores a complete review log in an SQLite database, so you can run any query you like.

Exporting the data

  1. File → Export
  2. Select Anki Deck Package (.apkg), then check "Include Scheduling Information" and "Support older Anki versions"
  3. Unzip the resulting .apkg file (it's just a zip archive)
  4. Extract the review log into a CSV:
    sqlite3 collection.anki21 -header -csv "SELECT * FROM revlog;" > revlog.csv
    

Note: Checking "Support older Anki versions" produces a collection.anki21 file. Without it you may get collection.anki21b, which uses a different internal format.
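If you'd rather skip the sqlite3 command line, steps 3 and 4 can be sketched in Python with only the standard library plus pandas. This is a minimal sketch, not the only way to do it: load_revlog is a hypothetical helper name, and it assumes the package contains a collection.anki21 member as described in the note above.

```python
import sqlite3
import tempfile
import zipfile

import pandas as pd


def load_revlog(apkg_path, member="collection.anki21"):
    """Extract the SQLite file from an .apkg and return the revlog table.

    `member` is assumed to be the legacy-format collection file; a KeyError
    is raised if the archive does not contain it.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # An .apkg is a zip archive, so zipfile can pull out the database
        with zipfile.ZipFile(apkg_path) as zf:
            db_path = zf.extract(member, path=tmp)
        conn = sqlite3.connect(db_path)
        try:
            return pd.read_sql_query("SELECT * FROM revlog", conn)
        finally:
            # Close before the temporary directory is cleaned up
            conn.close()
```

Calling load_revlog('deck.apkg') (with your own file name) returns the same columns the CSV export would contain, so the rest of the analysis works unchanged.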

The revlog schema

The revlog table has one row per review. The columns you'll use most are:

Column   Description
id       Review timestamp in milliseconds (also serves as a unique ID)
cid      Card ID
ease     Button pressed: 1=Again, 2=Hard, 3=Good, 4=Easy
ivl      New interval after review (positive = days, negative = seconds for learning steps)
factor   Ease factor × 1000 (e.g. 2500 means 2.5×)
time     Time spent on the card in milliseconds
type     Review type: 0=Learning, 1=Review, 2=Relearn, 3=Filtered
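The encodings above are easier to work with once they're decoded into readable columns. Here is one way that could look; the helper name decode_revlog and the new column names (ease_label, type_label, ivl_days, ease_multiplier, seconds) are my own inventions for illustration, not anything Anki defines.

```python
import pandas as pd

# Mappings taken from the column descriptions above
EASE_LABELS = {1: "Again", 2: "Hard", 3: "Good", 4: "Easy"}
TYPE_LABELS = {0: "Learning", 1: "Review", 2: "Relearn", 3: "Filtered"}


def decode_revlog(df):
    """Return a copy of the revlog with human-readable helper columns."""
    out = df.copy()
    out["ease_label"] = out["ease"].map(EASE_LABELS)
    out["type_label"] = out["type"].map(TYPE_LABELS)
    # ivl: positive values are days, negative values are seconds,
    # so convert the negative ones to fractional days
    ivl = out["ivl"].astype(float)
    out["ivl_days"] = ivl.where(ivl >= 0, -ivl / 86400)
    out["ease_multiplier"] = out["factor"] / 1000  # 2500 -> 2.5
    out["seconds"] = out["time"] / 1000            # ms -> s
    return out
```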

Loading into pandas

import pandas as pd

df = pd.read_csv('revlog.csv')

# Convert the millisecond timestamp to datetime (the result is in UTC, not your local time)
df['datetime'] = pd.to_datetime(df['id'], unit='ms')
df['date'] = df['datetime'].dt.date
df['week'] = df['datetime'].dt.to_period('W')
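Because the id column is a Unix timestamp, the dates above are UTC dates, and a late-night review can land on the "wrong" day. If that matters to you, something like the following might help. It's a sketch under two assumptions: you pass your own time zone name as tz, and your Anki day starts at 4am (Anki's default "next day starts at" setting; change rollover_hour if yours differs). add_local_dates is a hypothetical helper name.

```python
import pandas as pd


def add_local_dates(df, tz="UTC", rollover_hour=4):
    """Add local `datetime` and Anki-style `date` columns.

    tz: an IANA zone name such as "Europe/London" (assumption: yours).
    rollover_hour: hour at which a new Anki day starts (assumed 4).
    """
    out = df.copy()
    utc = pd.to_datetime(out["id"], unit="ms", utc=True)
    local = utc.dt.tz_convert(tz)
    out["datetime"] = local
    # Shift back so reviews done before the rollover hour
    # count toward the previous day
    out["date"] = (local - pd.Timedelta(hours=rollover_hour)).dt.date
    return out
```

With this, a review at 01:30 local time is grouped with the previous day's session, which is usually what you want if you study past midnight.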

Session summary statistics

Cards reviewed and time spent per day, with derived columns for minutes and seconds per card:

daily = (
    df.groupby('date')['time']
    .agg(['sum', 'count'])
    .rename(columns={'sum': 'total_ms', 'count': 'cards'})
    .assign(minutes=lambda x: round(x['total_ms'] / 1000 / 60, 1))
    .assign(seconds_per_card=lambda x: round(x['total_ms'] / x['cards'] / 1000, 1))
)

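# Median over the last 21 days that have at least one review
# (days with no reviews have no row in `daily`, so this can span more than 3 weeks)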
daily.tail(21).median()
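One caveat with tail(21): it takes the last 21 rows, and a day with no reviews has no row. If you want a true 3-week calendar window where skipped days count as zero minutes, a sketch along these lines could work. minutes_per_calendar_day is a hypothetical helper; it uses UTC dates and ends the window on the day of your most recent review rather than today.

```python
import pandas as pd


def minutes_per_calendar_day(df, n=21):
    """Minutes studied on each of the last n calendar days (0 for skipped days)."""
    ts = pd.to_datetime(df["id"], unit="ms")
    # Sum review time per calendar day and convert ms -> minutes
    per_day = df["time"].groupby(ts.dt.normalize()).sum() / 1000 / 60
    # Build an unbroken daily index ending at the most recent review day
    window = pd.date_range(end=per_day.index.max(), periods=n, freq="D")
    return per_day.reindex(window, fill_value=0.0)
```

minutes_per_calendar_day(df).median() then gives the median over calendar days, which will be lower than the tail(21) figure if you skipped any days.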

Custom percentiles

describe() only shows a fixed set of percentiles. Pass your own list to see more of the distribution:

custom_percentiles = [0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99]
df['time'].describe(percentiles=custom_percentiles)

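This is useful for spotting outliers: cards where you walked away and left Anki open show up at the 99th percentile with very high times. Anki caps the recorded time at the deck's maximum answer time (60 seconds by default), so these tend to pile up at the cap rather than run to many minutes.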
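Once you've seen where the tail starts, you may want to drop it before computing averages. A minimal sketch, assuming a percentile cutoff is good enough for your purposes (trim_slow_reviews is a hypothetical helper, and 0.99 is just a starting point):

```python
import pandas as pd


def trim_slow_reviews(df, q=0.99):
    """Drop reviews slower than the q-th quantile of `time`.

    Returns the filtered frame and the cutoff in milliseconds.
    """
    cutoff = df["time"].quantile(q)
    return df[df["time"] <= cutoff], cutoff
```

Comparing df['time'].mean() before and after trimming shows how much the walked-away reviews were inflating your seconds-per-card figure.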

Filtering by review type

If you want to exclude learning and relearning steps and look only at regular reviews of graduated cards (both young and mature):

reviews_only = df[df['type'] == 1]
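Anki calls a card "mature" once its interval reaches 21 days, so type == 1 on its own still includes young cards. To narrow things down to mature cards you can also filter on the interval the card had going into the review. The sketch below assumes your export includes the lastIvl column (it's in the revlog table but not in the list of columns above); mature_reviews and retention are hypothetical helper names.

```python
import pandas as pd

MATURE_DAYS = 21  # Anki's threshold for a "mature" card


def mature_reviews(df):
    """Regular reviews of cards whose previous interval was >= 21 days."""
    return df[(df["type"] == 1) & (df["lastIvl"] >= MATURE_DAYS)]


def retention(df):
    """Share of reviews answered with anything other than Again (ease > 1)."""
    return (df["ease"] > 1).mean()
```

retention(mature_reviews(df)) then gives the fraction of mature-card reviews you got right.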