How to plug a scheduling algorithm into Anki

Recently, I've seen a lot of discussion about improving Anki's scheduling algorithm. In this post, I intend to explain how any developer can do it, at least as long as they know what scheduling rules they want to use. I'll also discuss the limits I see.

How to do it

In this section, I'll assume you only want to review cards on the computer. I'll discuss other devices in the "Limits" section.

Take a look at my repo anki-schedule. You can clone it and change the function scheduler in scheduler.py. It should be a function that takes a card as input and returns the interval, in seconds, between the last review and the next review. Note that the next review could well be in the past. For example, if you return 3600, the interval is one hour; if the last review was more than one hour ago, the due time has already passed.
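
To make the "card in, seconds out" contract concrete, here is a toy scheduler written against it. It is only a sketch: FakeCard is a hypothetical stand-in whose ivl and factor fields are modelled on the columns of the same name in Anki's cards table, so the example runs without Anki. In the add-on you receive a real card object, and the toy rule itself (interval times ease) is an illustration, not a recommendation.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400


@dataclass
class FakeCard:
    """Hypothetical stand-in for an Anki card, keeping only the fields used below."""
    ivl: int     # current interval in days, modelled on Anki's cards.ivl (0 for a new card)
    factor: int  # ease in permille, modelled on Anki's cards.factor (2500 means 250%)


def scheduler(card) -> int:
    """Return the interval, in seconds, between the last review and the next one."""
    if card.ivl <= 0:
        # New card (no positive interval yet): ask again one hour after the last review.
        return SECONDS_PER_HOUR
    # Toy rule: grow the current interval by the card's ease factor.
    return int(card.ivl * SECONDS_PER_DAY * card.factor / 1000)
```

For instance, a card with a 2-day interval and a 250% ease gets 432 000 seconds, that is 5 days.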

If you install this add-on and, in the main window, select "Tools > Reschedule", it will do mostly what you are looking for: each card will be scheduled according to your algorithm. If the interval has already passed, the card is set to be reviewed now. If the interval is more than two days, I assume we can set the due date as a day and no longer care about the exact minute. Otherwise, the exact time is used.
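
The three cases of that rule can be written down as a small function. This is a sketch of the rule as described, not the add-on's actual code; the name compute_due and its tagged return values are hypothetical.

```python
from datetime import datetime, timedelta

TWO_DAYS = timedelta(days=2)


def compute_due(last_review: datetime, interval_seconds: int, now: datetime):
    """Apply the due-date rule: review now, due on a day, or due at an exact time."""
    interval = timedelta(seconds=interval_seconds)
    due = last_review + interval
    if due <= now:
        # The interval has already elapsed: the card should be reviewed now.
        return ("now", now)
    if interval > TWO_DAYS:
        # Long interval: only the day matters, not the exact minute.
        return ("day", due.date())
    # Short interval: keep the exact due time.
    return ("exact", due)
```

With a last review at noon and the current time at 14:00, an interval of one hour gives ("now", ...), four hours gives an exact due time of 16:00, and three days gives a due day only.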

When a card is reviewed (or, more generally, saved), the review is logged as usual, and then its new due date is computed with your scheduler.
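
The order matters: log first, then reschedule, so the scheduler sees the review it is reacting to. Below is a self-contained simulation of that flow; SimCard and on_answer are hypothetical names, and the revlog is a plain list. Inside Anki the trigger would be an answer hook (recent versions expose aqt's gui_hooks.reviewer_did_answer_card; check the version you target).

```python
from dataclasses import dataclass


@dataclass
class SimCard:
    """Hypothetical in-memory card with just enough state for this sketch."""
    id: int
    last_review: float = 0.0  # epoch seconds of the last answer
    due: float = 0.0          # epoch seconds of the next review


def on_answer(card, ease, now, revlog, scheduler):
    """Log the answer, then let the custom scheduler pick the next due time."""
    # 1. Log the review first, as Anki does.
    revlog.append({"cid": card.id, "ease": ease, "time": now})
    card.last_review = now
    # 2. Then ask the scheduler for the interval, counted from this review.
    card.due = now + scheduler(card)
```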

Experience

This method has been used by a client of mine. As far as I know, he discovered that the scheduling he wanted to use was not actually as good as he expected. Alas, while my add-on allows you to use any scheduler you want, it does not solve the far bigger problem of designing a good scheduler.

Data that you can use

All the available data is described on the Anki database structure page. This page is not official but, as far as I know, it is accurate. Let me recap what you probably want to use:

  • for each review: the time of the review, the button pressed (between 1 and 4), how much time was taken to answer (usually capped at 60 seconds), and what the card's previous state was (learn, review, relearn, cram)
  • for each card: its creation time (which may be later than the note's), its note, its card type, and its deck
  • for each note: its tags (which could be used to relate cards together), its fields, and its creation time
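
When prototyping outside Anki, this data can be read from a copy of the collection file (collection.anki2 in the profile folder) with Python's sqlite3 module. The column names below follow the unofficial database structure page; the helper names are hypothetical. Work on a copy, not on the file Anki currently has open.

```python
import sqlite3

# revlog.type values as listed on the (unofficial) database structure page.
REVIEW_TYPES = {0: "learn", 1: "review", 2: "relearn", 3: "cram"}


def reviews_for_card(conn: sqlite3.Connection, card_id: int) -> list:
    """Return one card's review history, oldest first."""
    rows = conn.execute(
        "SELECT id, ease, time, type FROM revlog WHERE cid = ? ORDER BY id",
        (card_id,),
    )
    return [
        {
            "reviewed_at_ms": rev_id,  # revlog.id is the review time in epoch milliseconds
            "button": ease,            # button pressed, 1 to 4
            "answer_ms": answer_ms,    # time taken to answer, in milliseconds
            "state": REVIEW_TYPES.get(rev_type, "unknown"),
        }
        for rev_id, ease, answer_ms, rev_type in rows
    ]


def note_tags_and_fields(conn: sqlite3.Connection, note_id: int):
    """Return (tags, fields) for one note."""
    tags, flds = conn.execute(
        "SELECT tags, flds FROM notes WHERE id = ?", (note_id,)
    ).fetchone()
    # Tags are stored as one space-separated string; fields as one string
    # separated by the 0x1f character.
    return tags.split(), flds.split("\x1f")
```

Usage: conn = sqlite3.connect("collection-copy.anki2"), then reviews_for_card(conn, some_card_id).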

Any add-on can also save its own data. That is particularly important if you use machine learning techniques; you will almost certainly want to save precomputed results somewhere.
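
A minimal way to persist such precomputation is a JSON file, for example in the add-on's own folder. The file name and helper names here are hypothetical, and JSON only suits data that is reasonably small and serializable (model weights as lists of numbers, say).

```python
import json
from pathlib import Path


def save_precomputation(path: Path, data: dict) -> None:
    """Write precomputed data as JSON, replacing any previous file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data), encoding="utf-8")
    # Write to a temporary file first, then rename it over the old one, so an
    # interrupted save does not leave a half-written file behind.
    tmp.replace(path)


def load_precomputation(path: Path) -> dict:
    """Return previously saved data, or an empty dict on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```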

Limits

Alas, there are many limits, and they are only partially solvable.

Breaking everything

If you change all due dates and your scheduler is broken, you are going to break your entire review process. That's actually not as bad as it sounds, since in theory you can use the very same technique to compute the original due dates back. It's simply that no one has written the "schedule" function to do it yet, so it would require some work. Or you can use your backup. Or use yet another scheduling algorithm, overriding the broken one.
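
A lighter alternative to a full backup is to save only the scheduling columns before rescheduling, and write them back if the new scheduler turns out to be broken. The sketch below uses a plain sqlite3 connection for self-containment; the function names are hypothetical. A real add-on running inside Anki should go through Anki's own collection object, and would also need to mark the cards as modified (the mod and usn columns) so the restored values are synced.

```python
import sqlite3


def snapshot_schedule(conn: sqlite3.Connection) -> dict:
    """Map each card id to its current (queue, type, due, ivl) scheduling state."""
    return {
        cid: (queue, ctype, due, ivl)
        for cid, queue, ctype, due, ivl in conn.execute(
            "SELECT id, queue, type, due, ivl FROM cards"
        )
    }


def restore_schedule(conn: sqlite3.Connection, snapshot: dict) -> None:
    """Put every card in the snapshot back to its saved scheduling state."""
    conn.executemany(
        "UPDATE cards SET queue = ?, type = ?, due = ?, ivl = ? WHERE id = ?",
        [(queue, ctype, due, ivl, cid)
         for cid, (queue, ctype, due, ivl) in snapshot.items()],
    )
    conn.commit()
```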

Other devices

Add-ons only work on Anki desktop. The good news is that my add-on reschedules cards, so as soon as you sync your other devices with your computer, the correct intervals can be computed. If my add-on were to be really used, I should automate applying the new scheduler to synchronized cards. Right now, you need to tell Anki to do it by clicking on "Reschedule" in the menu. There are still two obvious limits. First, for cards in relearning, you are stuck with Anki relearning them the way Anki does it. The good news is that AnkiDroid is free software, and it would not actually be hard to fork it and add a new scheduler if you are willing to pay for it (and I can be hired to do it). The second limit is that AnkiDroid is written in Java, so you can't directly use the Python code from your add-on.
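
Automating this mostly means finding which cards were reviewed elsewhere since the last reschedule, so that only those are recomputed. A sketch, assuming reviews made on other devices arrive in the revlog table after a sync with their review time (epoch milliseconds) as id, and that the add-on remembers when it last ran; the function name is hypothetical.

```python
import sqlite3


def cards_reviewed_since(conn: sqlite3.Connection, since_ms: int) -> list:
    """Ids of cards with at least one review logged after since_ms (epoch ms)."""
    rows = conn.execute(
        "SELECT DISTINCT cid FROM revlog WHERE id > ? ORDER BY cid", (since_ms,)
    )
    return [cid for (cid,) in rows]
```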

Buttons

Currently, Anki allows between two and four answer buttons during review. You can't easily tell Anki to add more buttons and log them. However, you can limit the number of buttons to two. Under each button, Anki shows the interval you would obtain; with this add-on, that interval will be wrong, since it is computed from the original scheduler's data rather than by your scheduler.
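
Showing the right interval means computing the labels from your own scheduler. The sketch below only produces the labels; how to inject them into Anki's reviewer is left out. next_interval(card, ease) is a hypothetical scheduler variant that also takes the button pressed, and the label format only loosely imitates Anki's.

```python
def format_interval(seconds: int) -> str:
    """Compact label such as '10m', '2h' or '3d'."""
    if seconds < 3600:
        return f"{seconds // 60}m"
    if seconds < 86_400:
        return f"{seconds // 3600}h"
    return f"{seconds // 86_400}d"


def button_labels(card, next_interval, buttons=(1, 2)) -> dict:
    """Label each button with the interval the custom scheduler would give.

    buttons defaults to a two-button setup (1 = fail, 2 = pass).
    """
    return {ease: format_interval(next_interval(card, ease)) for ease in buttons}
```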

Of course, if someone is really interested, the buttons problem can be solved; but I believe it's not a priority when prototyping a new scheduler.

Computation time

Of course, a big limit is computation time. Your code can take as much time as you want, but if it needs 0.1 seconds per card and you have 100 000 cards, recomputing everything takes 10 000 seconds, that is almost 2h47!
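
The arithmetic, as a tiny helper (hypothetical name) that makes it easy to try other per-card costs or collection sizes; the total grows linearly with both.

```python
from datetime import timedelta


def full_reschedule_time(seconds_per_card: float, n_cards: int) -> timedelta:
    """Time needed to run the scheduler once on every card, one card at a time."""
    return timedelta(seconds=seconds_per_card * n_cards)


print(full_reschedule_time(0.1, 100_000))  # 2:46:40
```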

Shared information

One main advantage of centralized systems such as Duolingo is that once the algorithm has been trained on existing users, the training can be used for new users. If you create your own cards and decks, of course, you can't train the scheduling algorithm with other people's data. But if you use shared decks, which is often the case in medicine or languages, then the data exists. The problem is that it is spread across thousands of different computers and AnkiWeb.

I highly doubt Damien Elmes, the author of Anki, would accept either to share the data or to run a training algorithm over it. At the very least because he has European users, and it would be a GDPR nightmare. And probably for thousands of other legal reasons, and maybe even ethical ones. So if we want to do this, we would need to convince people to share their data. An add-on can be created to ensure that data is shared when they install it; syncing review data is not hard. But convincing people to install such an add-on is going to be a huge amount of work, especially since no one can promise them any advantage in return for sharing their data: no one can honestly promise that the data will allow us to create a better scheduling algorithm.
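
The export step of such an add-on could look like the sketch below: it drops local card ids and user identity, and keys each review by a hash of the note's guid and the card's template ordinal (cards.ord), so reviews of the same shared card can be pooled across users. That assumes, and this should be checked, that importing the same shared deck keeps the note guid. Function names and the salt are hypothetical. Hashing with a common salt is pseudonymization rather than anonymization, so it does not by itself remove the GDPR concern above.

```python
import hashlib
import json


def shared_card_key(note_guid: str, card_ord: int, salt: bytes) -> str:
    """Pseudonymous key for a card of a shared deck.

    The same (guid, ord) pair gives the same key on every computer using the
    same salt, so reviews of one shared card can be pooled.
    """
    digest = hashlib.sha256(salt + f"{note_guid}:{card_ord}".encode("utf-8"))
    return digest.hexdigest()[:16]


def export_reviews(rows, salt: bytes) -> str:
    """Serialize reviews without local card ids or user identity.

    rows: iterable of (note_guid, card_ord, review_ms, ease, answer_ms).
    """
    return json.dumps([
        {
            "card": shared_card_key(guid, ord_, salt),
            "reviewed_at_ms": review_ms,
            "button": ease,
            "answer_ms": answer_ms,
        }
        for guid, ord_, review_ms, ease, answer_ms in rows
    ])
```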

Todo:

  • Reschedule during sync and full sync
  • Select cards in the browser and reschedule only them
  • Add better control over the number of buttons, and show the right interval
  • Add a "training" button (for ML-based algorithms, it would restart the training)
