I believe the scheduler problem is the biggest open problem in the Anki/spaced repetition learning community. As with any good research problem, there are two questions to consider: what problems do we want to solve, and how do we solve them? I have no idea how to solve them, but I hope I can at least contribute by clarifying the questions.
As I have written countless times, Anki improved my life drastically. Anki is far superior to any non-assisted learning technique. We have received an impressive amount of very positive feedback. That's great, I'm really proud of my contributions to Anki, and I wish more people used it... but that does not mean that Anki is optimal. I would invite any student, and more generally anyone who loves to learn or has to learn, to use Anki or another spaced repetition system such as SuperMemo (SM). However, given that Anki already exists, is a part of my life, and that I am an important contributor to its ecosystem, the question remains: how to improve the software?
What is a scheduler
The role of the scheduler is basically to decide what the user should review now. Technically it is more complex, but I am trying to get the big picture first. The user may choose to review everything, or just their guitar-related questions because they currently have their guitar at hand, or their geography questions because that is what their next exam is about... The scheduler is in charge of deciding which questions to ask and of recording the answers given to those questions.
There are two questions that a scheduler algorithm must answer: what inputs does it require to make its decisions, and what should it optimize?
The inputs may vary a lot depending on the app. I believe there are two kinds of inputs: those that remain consistent over time and those that change. For example, if the user only has access to a computer during breaks at school, then it is important to know that they can use the system for at most 30 minutes each school day and not at all during holidays... We may also state that some deck is more important, e.g. learning French may be more important than learning Latin because of the way the school grades those subjects.
And then there is the far bigger category of inputs: those that get logged over time. Some inputs are useful but currently nonexistent for most users. For example, SuperMemo users are supposed to log their sleeping hours, because sleep is believed to influence the learning process a lot. Anki does not have this information, and any new scheduler for Anki should take into consideration that the data is not available. Of course, it could ask the user to enter this information, or any other information, in the future, but it should also be able to work without it.
The input usually contains the past reviews. In the case of Anki, each logged review contains the question id, the timestamp, the time taken to answer, and the answer button pressed. The time taken is usually capped at a minute, on the assumption that if a card took more than a minute, the user simply left their computer or phone and stopped using the app, so the time is not useful information. It should be noted that Anki stores the question's unique identifier and not the actual question. This means that if the question has changed for any reason, that information is lost. For example, say you have the question "Who is the prime minister of the U.K.": there is a risk the user changed the answer from "Theresa May" to "Boris Johnson" instead of creating a new card, in which case the scheduler does not know that the answer has changed and that the card is as good as new.
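To make this concrete, here is a minimal sketch of what one logged review could look like. The names (`ReviewLogEntry`, `card_id`, `answer_seconds`, `button`, `log_review`) are hypothetical and are not Anki's actual schema; only the four pieces of information and the one-minute cap come from the description above.

```python
from dataclasses import dataclass

# Assumption: answer times are capped at one minute, as described above.
MAX_ANSWER_SECONDS = 60.0


@dataclass(frozen=True)
class ReviewLogEntry:
    """One past review as a scheduler would see it (hypothetical layout)."""
    card_id: int           # identifier of the question, not its text
    timestamp: float       # when the review happened (Unix seconds)
    answer_seconds: float  # time taken to answer, after capping
    button: int            # which answer button was pressed


def log_review(card_id: int, timestamp: float,
               raw_seconds: float, button: int) -> ReviewLogEntry:
    """Build a log entry, capping the answer time at one minute."""
    capped = min(raw_seconds, MAX_ANSWER_SECONDS)
    return ReviewLogEntry(card_id, timestamp, capped, button)


# The user walked away for five minutes: only 60 seconds are recorded.
entry = log_review(card_id=42, timestamp=1_600_000_000.0,
                   raw_seconds=300.0, button=3)
print(entry.answer_seconds)  # 60.0
```

Since the entry only holds `card_id`, nothing in it changes when the text of the card is edited, which is exactly the "Theresa May" problem.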
The input also contains the set of all questions. Anki uses this information in a very basic way, simply by having a few parameters that can be configured on a deck-by-deck basis. The user has to configure those parameters themselves, which is needlessly complex. Indeed, the scheduler could figure out that if a lot of cards are hard, the next card will probably be hard too, and set the parameters accordingly (for example, by lowering the number of new cards seen each day).
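As an illustration only, such an automatic adjustment could look like the sketch below. Everything in it is an assumption made for the example: the function name, the 15% target failure rate, the proportional scaling rule, and the convention that button 1 means "failed". It is not how Anki works.

```python
def suggest_new_cards_per_day(recent_buttons, current_limit,
                              target_failure_rate=0.15, min_limit=1):
    """Lower the daily new-card limit when recent answers fail too often.

    recent_buttons: answer buttons of recent reviews in one deck.
    Assumption: button 1 means the card was failed.
    """
    if not recent_buttons:
        return current_limit  # no data, keep the user's setting
    failures = sum(1 for button in recent_buttons if button == 1)
    failure_rate = failures / len(recent_buttons)
    if failure_rate <= target_failure_rate:
        return current_limit
    # Scale the limit down in proportion to how far we overshoot the target.
    scaled = int(current_limit * target_failure_rate / failure_rate)
    return max(min_limit, scaled)


# 4 failures out of 10 reviews: the limit of 20 new cards drops to 7.
print(suggest_new_cards_per_day([1, 1, 3, 3, 1, 3, 3, 3, 1, 3], 20))  # 7
```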
It is also easy to imagine that if two cards are siblings (i.e. generated from the same information, such as "Drink(English)"->"Boire(French)" and "Boire(French)"->"Drink(English)"), the easiness of both cards is correlated.
If a card asks "what is a square" and the answer is "a rectangle that is a diamond", then this card is at least as complex as the cards containing the definitions of "rectangle" and "diamond" (or maybe they are actually easier... or they are related in some other way that I do not know).
We can assume that the card "Trinkt(German)"->"Drink(English)" is easier than "Boire(French)"->"Drink(English)". In theory, both questions are really similar, and the computer may have trouble seeing why one is easier, but for a human it is clear that "drink" sounds similar to "trinkt" and not to "boire".
I should note that knowing what the user wants to optimize may also require more input data, but I'll leave that for the next section.
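A crude way for a computer to notice this is to compare spellings. The sketch below uses Python's standard `difflib.SequenceMatcher`; the helper name `spelling_similarity` is hypothetical, and spelling is only a rough stand-in for how words sound, so this is an illustration rather than a real difficulty estimate.

```python
from difflib import SequenceMatcher


def spelling_similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1; higher means more similar spelling."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Compare each foreign word with its English translation.
for foreign in ("trinkt", "boire"):
    score = spelling_similarity(foreign, "drink")
    print(f"{foreign} vs drink: {score:.2f}")
```

"trinkt" scores higher than "boire" here because it shares the run "rink" with "drink". A scheduler could treat such a score as one weak hint among others when guessing the initial difficulty of a card.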
The very complex question that must be solved is: what should the scheduler optimize? Here is an (absolutely non-exhaustive) list of examples I can think of:
Passing an exam
If you have an exam and you know you need at least 12/20 to pass your class, and the exam is on the 28th of June with a second attempt on the 15th of July, then your goal is to have enough knowledge and skills to get at least 12/20 by the 15th of July. Since nothing is certain, I assume the goal should be stated as getting at least 12/20 with 99% probability on the 15th of July. And since you want your month of July mostly free, you can also aim to succeed with 95% probability on the 28th of June. In theory, you want to get exactly 12/20, because any better grade represents time that you could have spent doing more interesting things.
I should note that, unless the exam consists of very basic questions such as "what is the capital of France" and "what is the name of the river in Berlin", it is actually hard to model the grade you'll get. I should also note that the goal is not to get 12/20 on average: if there is a 50% chance that you get 20/20 and a 50% chance that you get 4/20, your average is 12/20 but you still have a 50% chance of failing the exam.
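The difference between "12/20 on average" and "at least 12/20 with high probability" can be checked with a few lines of arithmetic. This is a small sketch: the function name and the second, "steady" distribution are made up for the comparison, while the 20/20 versus 4/20 case is the one described above.

```python
def mean_and_pass_probability(outcomes, passing_grade=12):
    """outcomes: (grade, probability) pairs whose probabilities sum to 1."""
    mean = sum(grade * prob for grade, prob in outcomes)
    p_pass = sum(prob for grade, prob in outcomes if grade >= passing_grade)
    return mean, p_pass


risky = [(20, 0.5), (4, 0.5)]  # the example above
steady = [(12, 1.0)]           # hypothetical: always exactly 12/20

print(mean_and_pass_probability(risky))   # (12.0, 0.5)
print(mean_and_pass_probability(steady))  # (12.0, 1.0)
```

Both students average 12/20, but only the second one is sure to pass, so a scheduler for this goal would have to optimize the probability of reaching the threshold rather than the expected grade.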
Being first at a contest
This problem is harder than the last one, because the actual amount of knowledge and skills you need depends on the other contestants. In the worst case, if everyone uses the same scheduler as you, the scheduler can't guarantee anything, since there is always a loser! So the best approximation to this problem is simply to try to get the maximal score... or at least the maximal score possible while satisfying some other constraints. For example, maybe the maximal score requires spending 3 hours reviewing each day, but you have a job and can't do that; then the scheduler should also take into account the amount of time you have, and possibly the fact that you review some questions in a noisy place on public transit and others in the privacy of your bedroom...
Note that spending 3 hours each day reviewing is probably a bad idea, so a scheduler should probably never suggest spending that much time in order to optimize learning. But the scheduler will have a hard time discovering how tired you are if you are mentally exhausted by anything other than reviewing. Or maybe the scheduler should take into account that if you start to give an unusual number of bad answers, then you should stop and get some rest.
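The last idea could be sketched as follows. The class name, the window of 20 answers, and the rule "more than twice the usual failure rate" are all assumptions chosen for the example; a real scheduler would need to tune or learn them.

```python
from collections import deque


class FatigueMonitor:
    """Flag a session when recent failures are unusually frequent."""

    def __init__(self, baseline_failure_rate, window=20, tolerance=2.0):
        self.baseline = baseline_failure_rate  # the user's usual failure rate
        self.tolerance = tolerance             # how many times the baseline is "unusual"
        self.recent = deque(maxlen=window)     # last `window` answers only

    def record(self, failed: bool) -> bool:
        """Record one answer; return True if a break should be suggested."""
        self.recent.append(failed)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough answers yet to judge
        rate = sum(self.recent) / len(self.recent)
        return rate > self.tolerance * self.baseline


monitor = FatigueMonitor(baseline_failure_rate=0.1, window=10)
answers = [False] * 10 + [True, True, True]  # three failures in a row at the end
suggestions = [monitor.record(failed) for failed in answers]
print(suggestions[-1])  # True: 3 failures in the last 10 answers exceeds 2 x 10%
```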
Learning as quickly as possible
Maybe you don't care about a specific date, and you want to learn something as quickly as possible. Let's say that you want to learn Greek. The Greek alphabet has 24 letters, in upper and lower case, so there are 49 letters you need to learn. Your goal is going to be learning them as quickly as possible, since they are a prerequisite to any further learning of Greek, and you can't wait until the exam to know them. I imagine that this requires a very different scheduler from the one in the previous cases.
Getting a shallow knowledge of a subject
Sometimes, I don't care about knowing a subject well; I just want a shallow knowledge of it. I believe this is what we call "Culture générale" (general culture) in France: we need to show off that we know a lot of things even if we don't know anything well. A lot of people in politics, newspapers... are supposed to be able to discuss any subject; what matters is your style, the way you present your (nonexistent) knowledge by using a few buzzwords. I would expect it to be really useful for salesmen, for job interviews... As you can guess from my writing, I'm not a fan of it, but I know it exists, and that it is a skill which opens a lot of doors. I admit that I have used it myself to prove to some people that I have a basic knowledge of their field of speciality, even if I can never cite anything that is not in the introductory chapter of the 101 lecture on the subject. That is usually sufficient for people to consider that they can seriously discuss with me, even if we can't have a research-level discussion.
I would assume that in this case the scheduler should be really different from the previous ones, since your goal is not actually to know anything well, but just to ensure that you always have a few buzzwords accessible in your mind and that, when someone uses a buzzword, you know which field it belongs to. So you can concentrate on the easy cards and quickly decide to ignore anything that is difficult.
Learning because it may one day be useful
Similarly to the last point, you may also want to learn something in case it turns out to be useful, and still know it well. For example, I can imagine that it is useful to know the countries of the world. I don't expect to ever need the full list, but if I meet someone from a country which is not often in the news, it may be useful to show that I know where that country is. That probably shows more respect for their country than most people give. Or maybe it is just so that, if I need to take a trip there one day, I'll have in mind the list of nearby countries that I may want to visit. As I have no pressing reason to learn this, I can take a lot of time, and in this case it is not easy to see what to optimize. I guess I could state that I want to optimize the usefulness of the time I spend using the spaced-repetition app, maybe...
Optimizing the pleasure
I'm learning guitar and piano. Anki helped me a lot with it. However, I have no goal other than to be able to play music (and maybe cruise using it...). I'm also learning the lyrics of songs, because it often stirs strong emotions in me when I sing (badly) a song I like. There is probably no reason to do all of this. Maybe what should be optimized is the pleasure I get, because it associates the use of Anki with something pleasurable... I'm not really sure of myself here, because Anki allows me to concentrate on the more complex scales (on piano), which is not really pleasurable directly. It only becomes pleasurable later, when I can actually play well the pieces that use the skills developed thanks to the scales.
What to measure
The main trouble with the above proposals is that they offer little way to measure success. Even if students were to log and share their success on all the exams they take
Why this is an imperfect setting
Who we teach
As I recently wrote, Anki is nice for people who are self-motivated. It does not solve in the slightest the far bigger problem of teaching all the children, teenagers and students who don't want to learn, who don't see the point (yet), who do not know how to read or who have trouble doing it. The problems I listed above entirely ignore the crucial question, which is how to ensure people actually use the software!
How to use domain specific knowledge
To teach how to recognize birds or mushrooms, some developers use a huge database of pictures that are known to be correctly labelled, and each time the user uses the app, a new picture is shown. This ensures the user does not learn to recognize a particular picture, but actually learns to recognize the species. This does not even seem to be possible in the current setting.
Similarly, Duolingo teaches foreign languages, and they use a lot of context that is available only for foreign languages. For example, if there is a typo, they can guess which general grammar rule the user forgot, and so they can decide what the user should review based on the exact error that was made. They can show hints for each word independently.
Currently, the best solution I have heard of is the SuperMemo algorithm, version 18. I must acknowledge that I have not spent the time required to understand it correctly and to see whether the article contains enough information to implement it in a distinct piece of software, but that is certainly something that should eventually be tried. Anki currently uses a variation of version 2 of the SuperMemo algorithm, and it is obvious that 18>2.
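For readers who have never seen it, here is a minimal sketch of the original SM-2 rule as I understand it from its published description: grades from 0 to 5, first intervals of 1 and 6 days, then the previous interval multiplied by an "E-Factor" that never drops below 1.3. The names `CardState` and `sm2_review` are mine, and this is not Anki's code; Anki's variant differs in several ways (for example, it uses answer buttons instead of 0 to 5 grades).

```python
import math
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    easiness: float = 2.5   # SM-2's E-Factor, starting at 2.5


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one review graded from 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed: restart the repetitions, leave the E-Factor unchanged.
        return CardState(0, 1, state.easiness)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        # Fractional intervals are rounded up.
        interval = math.ceil(state.interval_days * state.easiness)
    # E-Factor update: +0.1 for a 5, unchanged for a 4, -0.14 for a 3.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    easiness = max(1.3, state.easiness + delta)
    return CardState(state.repetitions + 1, interval, easiness)


state = CardState()
for grade in (5, 5, 4):
    state = sm2_review(state, grade)
    print(state.interval_days)  # 1, then 6, then 17
```

The whole state of a card is three numbers and the only input is the latest grade, which shows how little of the information discussed earlier this family of schedulers actually uses.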
I have a huge admiration for Woz, the creator of SuperMemo, and arguably the creator of spaced repetition. I am not convinced that, simply because he is the creator, has spent 35 years working on spaced repetition, and has access to a huge amount of data, nothing better can be done. I would expect that, even for a genius, there is a limit to what you can do mostly alone. Furthermore, during most of those 35 years, he would not have had access to machine learning and other modern tools to analyze data, and while his insights seem amazing when reading his wiki, it is not obvious to me that he did not miss any other important variables. I already explained that access to the data is potentially complex even if possible, but that does not mean that the problem can't ultimately be solved. Any scheduler requiring data from a lot of users is going to start late in the race against Anki and SuperMemo, but that does not remove all hope.
I should also mention that Duolingo published some research, but it seems doubtful that their system can be used for a non-centralized system. I went to a human learning workshop, but I don't actually know how any general solution can be found from their research. In particular, while I offered to help them interface with Anki and its user base, they never contacted me. It seems a real user base is harder to deal with than a clean data set obtained through Amazon Mechanical Turk, with people in settings you can easily control. But maybe there is more research I missed that would be of interest.
 There are two lower case sigmas
 I guess that, especially in France where researchers are really poorly regarded, anyone showing respect for your research subject is already friendlier than a random person you meet at a social event.
 I believe they used to do so, but they stopped; it is not clear why.