Collaborative decks in Anki

A lot of people want to create collaborative decks for Anki. By September 2018 I had already made quite a few add-ons, so some people contacted me to discuss collaborative decks. The idea has been in the back of my head ever since. I'm going to try to write down every thought I have had about it, and why it seems quite complex.

There are multiple reasons I have not done it yet. The main one may be that I have not yet learned how to do real web/online programming. I taught myself Python and jQuery to modify Anki; I taught myself Android programming to modify AnkiDroid; I could teach myself web programming to create this. The problem is that I would need to do it from scratch, which is much more complex than editing an existing codebase.

The second reason is that there are a lot of non-trivial questions to consider. There are many cases where several very distinct user experiences would each make sense. I would prefer not to implement anything before I know what I would like it to be. I doubt I will create such a website in the near future[1]; instead I will just try to list all of my thoughts, hoping to help anyone who tries to actually implement all of this.

Of course, I'm not the original author of all the thoughts described here; I discussed this with a lot of people and am not able to credit all of them. I at least tried to consider everything mentioned recently in a subreddit post about this subject, and in the related Discord discussion.

What are Collaborative decks

Deck creation

The simplest definition of a collaborative deck is a deck whose content is provided by at least two people. In practice it may mean a lot of different things. There may be multiple authors working together and wanting a nice interface to collaborate. There may also be a main author wanting small feedback from users, like typo corrections, proofreading and double-checking of the content. It may also mean having a deck created once, and allowing anyone to edit it to ensure it contains the newest information. For instance, this could be a deck of the prime minister of each country, which would need to be kept up to date.

Obtaining the deck

Let's assume you find a collaborative deck online that you like. It is mandatory that the tool can add it to your Anki collection really easily, with as few clicks as possible. Similarly, you must be able to check whether the deck has been updated, and get the latest version easily without losing your review history. In particular, if you want to edit the content of a note, you first want to see what other people did recently, so that you don't lose time making a correction somebody else already made.

You also probably want a history of each deck, so that if a deck is defaced by someone malicious, you can revert the changes.

Why collaborative decks

There is a common argument against collaborative decks; I'm going to try to answer it.

This argument is that one primarily learns by taking notes. Writing down ideas and figuring out what is important, what should be learned and what can be omitted are all parts of the learning process. It may be okay to learn the list of countries without making the deck yourself, but you don't want to learn a series of complex definitions without being sure that you understand what they mean. Making the cards yourself ensures that you understand their meaning, and that the definitions use your own words.

This argument is convincing, and indeed, if you have the time, personal decks are far superior to shared decks. This does not stop us at all from sharing decks on AnkiWeb or by email. We don't always have enough time to create notes, so learning notes made by other people may be a quite acceptable compromise. I actually believe that a deck made collaboratively by a bunch of people working together and improving each other's work may even be an improvement over the current shared decks of AnkiWeb. So I still believe that a collaborative tool for decks is worth having.

Existing solutions

I will now list solutions that already exist. Some are fully implemented, some are only projects.

Collaborative decks

Some collaborative decks already exist. I will list the solutions they use:

Ultimate Geography

The deck Ultimate Geography lists all countries, their flags, their capitals, the continents, the seas... I would have assumed that this deck does not change too much, and I would have been quite wrong.

All data are saved in a CSV file. In order to edit this file, you need a GitHub account. Then you can simply edit the file on GitHub and click on the "submit" button. You can also fork the repo, clone it on your computer, edit it, and open a pull request. This last option is probably easier if you're used to dealing with CSV and have software to do it efficiently.

Whatever choice you make, to make your correction you need to find the correct line (out of 322) and the correct column (out of 34), and put the change there. Of course, with the proper software it's not that hard to do. But in no case is it as quick a process as correcting a typo on Wikipedia. And you also need to learn how to add or change an image when a flag changes, when you have a better map, and so on...
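
To make this concrete, here is a minimal Python sketch of what such a correction amounts to once the file is on your computer. The column names `Country` and `Capital` and the two sample rows are assumptions made up for the illustration; they are not the real layout of the deck's file.

```python
import csv
import io


def update_cell(csv_text, key_column, key, column, new_value):
    """Return csv_text with one cell changed in the row whose key_column equals key."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    matches = [row for row in rows if row[key_column] == key]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one row for {key!r}, found {len(matches)}")
    matches[0][column] = new_value
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical two-row sample, not the real file.
sample = "Country,Capital\nFrance,Pariss\nJapan,Tokyo\n"
fixed = update_cell(sample, "Country", "France", "Capital", "Paris")
```

The code itself is short; the friction comes from everything around it (account, fork, clone, commit, pull request), which is exactly what a casual contributor should not have to learn.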

AnKingMed

AnKingMed[2] is a person/team/website/series of shared decks related to US medical school. According to The AnKing, there are a lot of contributors. Contributing is far easier, as it suffices to edit a Google[3] document. He then spends a lot of time porting the information from the document into his deck.

They also had to make a YouTube video to explain how to download updates of their decks while letting users keep some personal notes, using an add-on[4]. It shows that the process is still not really practical, since you need to learn how to use this tool.

Tools

Here are the tools available nowadays.

Exporting and importing

Decks can be exported and imported, which allows for simple sharing. Furthermore, importing a deck containing notes that are already in your collection may update those notes.

CrowdAnki

CrowdAnki is an add-on which allows you to create decks and collaborate through GitHub. It's pretty nice, I assume, but it still takes a lot of clicks to make any change. You need to change your deck, export it, commit, push, open a pull request, etc. And you need to understand the ideas behind git. The readme is far too long for something that should be usable by virtually everyone.

Note that to get translations of Ultimate Geography, you need to use CrowdAnki.

Card Overflow

Card Overflow is a project which may eventually solve the problem of collaborative decks. As it is, it's still a work in progress, and it is hard to tell whether it will actually work. However, since its author mentioned it on Discord and Reddit, I believe it should be mentioned here too.

Brain Brew

Brain Brew was advertised as a solution on Discord. I must admit that I cannot really understand what it is supposed to do and how we should use it. It may sound pretentious, but I fear that it means it can't be used by the average user today.

Google Sheets

On Discord, Conan reports having a solution using Google Sheets in an add-on.

Mandatory features

I am now going to list what I believe to be mandatory features of collaborative decks.

Correcting small mistakes

The key aspect is that correcting a typo or simple mistake should be extremely easy. Correcting a typo in Anki is easy: you open the editor, correct the typo, and close the editor. There even exists an add-on which allows you to skip the part where you open the editor[5]. Sending a correction should not be any more complex. For instance, a small button could appear in Anki's note editor to let you send the correction. At worst, the editor would give you a link to a webpage where you can make the change. The process could also ask you to explain why you are suggesting this change, so that moderators, maintainers and/or other users can check that you are contributing positively to the deck.

Updating the decks

Getting the latest version of the collaborative deck should be an easy process as well. Ideally it would be automated, so that each correction gets shared quickly with every user of the deck. This would also ensure that people don't suggest correcting a mistake they saw on their device that was actually already corrected by someone else in the shared deck. It would also ensure that you don't learn any mistakes. When you sync Anki, it should synchronize both with AnkiWeb and with this add-on.

Keeping some information for oneself

You may want to ensure that some fields are never updated. Traditionally, in the AnKingMed decks, there is a field for personal notes; this field is kept during updates, while all other fields are changed. This is done thanks to the special fields add-on[6]. A similar feature is mandatory for shared decks.
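
The idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how the special fields add-on is implemented; the function name `merge_note_fields` and the representation of a note as a dict from field name to content are assumptions.

```python
def merge_note_fields(local, upstream, protected):
    """Take upstream values for every field except the protected ones."""
    merged = dict(upstream)
    for name in protected:
        if name in local:
            merged[name] = local[name]
    return merged


# Hypothetical note: the user fixed nothing, but wrote a personal note.
local = {"Front": "Hart", "Back": "Pumps blood", "Personal notes": "seen in week 1"}
upstream = {"Front": "Heart", "Back": "Pumps blood", "Personal notes": ""}
merged = merge_note_fields(local, upstream, protected={"Personal notes"})
```

Here the typo fix from the shared deck arrives in `Front`, while the content of `Personal notes` stays what the user wrote.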

Moderation

Contributions should be moderated. Someone should have the authority to remove a deck if its content were to be illegal[7], and to cancel modifications which are clearly wrong. Wikipedia actually has bots which ensure that big deletions, and commits which just add some obscene words, are reverted immediately.
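
As a rough illustration of the kind of check such a bot could run on a single field, here is a Python sketch. It is not how Wikipedia's bots work; the function name, the 50% threshold and the word list are all assumptions chosen for the example.

```python
def looks_suspicious(old_text, new_text, banned_words, max_deleted_ratio=0.5):
    """Flag an edit that removes most of a field or introduces a banned word."""
    if old_text:
        deleted_ratio = 1 - len(new_text) / len(old_text)
        if deleted_ratio > max_deleted_ratio:
            return True
    old_words = set(old_text.lower().split())
    new_words = set(new_text.lower().split())
    # Only words that the edit introduces are checked against the list.
    return any(word in banned_words for word in new_words - old_words)


banned = {"badword"}  # placeholder list
typo_fix = looks_suspicious("Pariss", "Paris", banned)
blanking = looks_suspicious("The capital of France is Paris", "x", banned)
vandalism = looks_suspicious("Paris", "Paris badword", banned)
```

A flagged edit would not have to be reverted automatically; it could simply be held for a human moderator.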

Questions to solve

Let me quote The AnKing, who explains why the project is complex.

I think this project is actually far more complicated than it seems on the surface

1. It would need to be extremely simple so anyone using the deck would feel comfortable submitting changes such as spelling errors. It would also need to be extremely simple to update decks (ideally as or more simple than the current version using the Special fields add-on). The current crowdanki is a good example of something way too complicated.

2. I believe it would require a significant amount of customer support, updates, maintenance, etc. I would imagine there would need to be some continual source of income for this to be feasible. I don't know if patreon would be successful enough.

3. Many of the decks could [not] be shared on the website/platform directly because they have copyrighted images. With my medical school deck I share the deck without images, but if it is updated on top of decks shared anonymously on reddit, people can get the images. This issue would have to be resolved somehow (i.e. a way to import just images)

4. This would need to be simple and efficient for deck creators. I have everyone submit changes via a google sheet and then i go through that sheet and put a significant amount of time into making changes requested by everyone (spelling, content errors, etc). Ideally 1 deck creator could have the final say on whether a change actually gets included (i.e. "accepts the pull request"). I have found in my time doing this that about 25% of changes requested are wrong, so there definitely needs to be a way to control this. The conflict resolution part of this app would be absolutely critical for it's success.

5. This would have to support thousands of users. I don't have an exact count but I would guess there are at least 10,000 people using my medical school decks.

Now, let me turn to the questions I considered myself. Note that some of those questions will only make sense if you already know Anki's codebase, or at least the way it saves its data.

Centralizing

An important question is how centralized the solution should be. That is, whether someone should be in charge of the whole collaborative deck process and tools, or whether the process should be distributed on everyone's computer and/or on many personal websites. Currently, with CrowdAnki, the process is as decentralized as git. The problem is that when a person wants to learn how to use CrowdAnki, they must learn how to use git and GitHub, and figure out how to install every tool. On the other hand, if someone wants to participate in a collaborative deck which is simply shared on a website, they only need their browser and Anki. For the remainder of this discussion I'll thus assume that our goal is to have a website which allows sharing collaborative decks. Of course, decentralized services[8] are nice, but they also bring a lot of trouble, and I don't expect that it's necessary to deal with decentralization questions before being sure that people are actually interested in collaborating on decks.

Note that even a centralized website does not forbid users from making modifications offline, in Anki on their own devices.

What to reuse, what to build

A lot of software already exists to deal with Anki's collection, and we should avoid reinventing the wheel. So the question is: what should we reuse and what should we create?

Anki backend

Anki's backend is written in Python and Rust. This backend deals with the collection itself, and rarely considers how to present the collection's content.

It seems quite easy to use Anki's backend on a website. However, a big part of the backend seems to be of little use for collaborative decks. The scheduler and statistics would be useful only if we also allowed users to review[9]. The deck manager would be useful only if we wanted to deal with a whole collection, not with individual decks. And so on. In the end, the only interesting modules in Anki's backend code are mostly the ones dealing with notes and cards.

Wiki

An idea I've seen shared many times is to use a wiki website. At first glance, it would seem to be a good idea. After all, if all changes are made on the wiki, and the computer only synchronizes from the wiki and never uploads changes there, it may be an easy way to deal with modifications of note types. Since there are far more notes than note types, it would probably be okay, especially if most notes use one of the five classic note types (the basic ones or cloze).

However, when I consider the questions I'll list below, it seems hard to see how to adapt a general wiki system to solve them.

Logging all contributions

Should we be able to attribute all changes to someone? On Wikipedia, all changes are attributed either to a user or to an IP address. In general there is no obligation to be logged in. Logging in allows people to vote for moderators and to edit some protected pages, but that's mostly it (unless you're yourself a moderator/admin).

Is there a deck owner

No deck owner

On Wikipedia, no one owns any page. Most pages can be changed by anyone, except in very specific cases. We can go this way. However, in order to do that, Wikipedia needs a discussion page for each page, and a conflict resolution process. Otherwise, edit wars could go on forever and a page would keep changing. Nothing guarantees that similar problems would not occur with decks. For instance, take a geography deck: there is a lot of heat on whether some parts of the world are independent countries or not, and where their borders are.

One particularly important question is: if modifications can be made on a computer and synced later, what should happen if multiple people update the same field of the same note in different ways? Should we only keep the last update? Or should we reject the last one because it's not based on the current deck?
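
The two options can be sketched as follows in Python, assuming the server stores a revision counter next to each field and each edit states which revision it was based on. The names `FieldState`, `StaleEdit` and `apply_edit` are hypothetical, and so is the whole storage model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldState:
    value: str
    revision: int = 0


class StaleEdit(Exception):
    """The edit was written against a revision that is no longer current."""


def apply_edit(state, new_value, base_revision, policy):
    """Return the new field state, or raise StaleEdit under 'reject_stale'."""
    if policy not in ("last_write_wins", "reject_stale"):
        raise ValueError(f"unknown policy {policy!r}")
    if policy == "reject_stale" and base_revision != state.revision:
        raise StaleEdit(
            f"edit based on revision {base_revision}, current is {state.revision}"
        )
    return FieldState(new_value, state.revision + 1)


# Two users edit the same field offline, both starting from revision 0.
server = FieldState("Capital: Pariss")
server = apply_edit(server, "Capital: Paris", base_revision=0, policy="reject_stale")

try:
    apply_edit(server, "Capital city: Pariss", base_revision=0, policy="reject_stale")
    second_edit_rejected = False
except StaleEdit:
    second_edit_rejected = True

overwritten = apply_edit(
    server, "Capital city: Pariss", base_revision=0, policy="last_write_wins"
)
```

Under "last write wins" the first user's typo fix is silently lost; under "reject stale" the second user has to redo their edit on top of the current text. Neither is obviously right, which is why the question is hard.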

With a deck owner

On the other hand, if there is a deck owner, they have the final word about what is and is not accepted in their deck. But then we should decide what happens if the owner disappears. That's already an existing problem with AnkiWeb: some add-ons are not updated anymore, and someone else has to upload a new version with a similar but distinct name. Users of the old version do not know that there is an updated add-on by someone else. I believe that, in such a case, it would be nice to let them know that they can switch to an updated version of the deck. Similarly, if someone believes the deck contains a mistake and the owner does not want to correct it, it would be nice if multiple versions of the same deck could exist with distinct owners. This way, work is not duplicated, copyright is clearly attributed, and both versions are accessible. Furthermore, if the original deck is updated, it would be just as nice if those updates could be ported into the derived one. In developer terms, I am just saying that it would be nice if our system had some standard git features, like allowing fork, merge and maybe even rebase. However, even if we were to allow all of this, it should only be an advanced feature. The most common operations, such as correcting a simple typo or fact, should not even require understanding a concept such as "pull request".

Should we allow note types to change?

It may seem silly to restrict the allowed changes in any way. For example, adding a new field to a note type is not supposed to be difficult. However, in practice, such a small change entirely breaks Anki's synchronization process. The solution Anki chose is that, when you change a note type on your computer, it decides that your computer has the correct official version of your collection, and that all other devices are wrong. So it will copy the content of your computer to all other devices and erase their content. While it is a relatively acceptable solution for Anki[10], it is not acceptable for collaborative decks. If a deck suddenly gets a new field (e.g. if someone wants to add the name of every country in a new language to Ultimate Geography), we can't tell users that they must choose a primary device and that the data on all other devices will be erased and replaced by the data of the primary device.

It means we need to code the update of collaborative decks ourselves, and it is not easy. For example, if a template (a.k.a. card type) changes, some notes may have cards created or deleted; in this case it's not clear how this update should be propagated. In particular, if a user decides to mark a field as not-updatable, what should we do if this field gets deleted? If a field is moved to another position, we have to keep track of the field position at each step of the process, to know what changes should be applied to the fields of each note.
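
To show why positions have to be tracked at each step, here is a small Python sketch that replays a sequence of note type changes on one note's list of field values. The tuple encoding of the changes (`"add"`, `"delete"`, `"move"`) is an assumption made for the example, not a format used by Anki.

```python
def apply_field_changes(values, changes):
    """Apply note type changes, in order, to one note's list of field values."""
    values = list(values)
    for change in changes:
        kind = change[0]
        if kind == "add":        # ("add", position): new empty field
            values.insert(change[1], "")
        elif kind == "delete":   # ("delete", position)
            del values[change[1]]
        elif kind == "move":     # ("move", old_position, new_position)
            values.insert(change[2], values.pop(change[1]))
        else:
            raise ValueError(f"unknown change {kind!r}")
    return values


note = ["France", "Paris", "flag.png"]
changes = [("move", 2, 0), ("delete", 2), ("add", 1)]
updated = apply_field_changes(note, changes)
```

Each position only makes sense relative to the state left by the previous change: `("delete", 2)` removes "Paris" here only because the flag was moved to the front first. A user who skipped an intermediate version, or applied the changes in another order, would end up with content in the wrong fields.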

Some of those problems could be solved very easily if we designed all of Anki from scratch, for example by ensuring that each field and card type has a unique id which is guaranteed never to change. We can easily add such an id, since templates are encoded as a JSON object/Python dict. However, we have no guarantee that such information will be preserved by all add-ons and (third-party) apps dealing with collections.

We could also, of course, accept deletion and addition of fields but refuse to let their names change; however, that seems to make little sense.
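
A minimal Python sketch of both halves of this paragraph follows. The `flds`, `name` and `ord` keys reflect my understanding of how Anki stores the fields of a note type; the `stable_id` key and the `ensure_field_ids` function are hypothetical.

```python
import uuid


def ensure_field_ids(note_type, key="stable_id"):
    """Give every field of a note type a persistent identifier, if it lacks one."""
    for field in note_type["flds"]:
        if key not in field:
            field[key] = uuid.uuid4().hex
    return note_type


note_type = {
    "name": "Country",
    "flds": [{"name": "Country", "ord": 0}, {"name": "Capital", "ord": 1}],
}
ensure_field_ids(note_type)
ids_before = [field["stable_id"] for field in note_type["flds"]]

# Renaming and reordering fields leaves the identifiers untouched.
note_type["flds"][1]["name"] = "Capital city"
note_type["flds"].reverse()
ensure_field_ids(note_type)
ids_after = [field["stable_id"] for field in note_type["flds"]]

# A hypothetical third-party tool that rebuilds fields from the keys it knows.
rebuilt = [{"name": f["name"], "ord": f["ord"]} for f in note_type["flds"]]
```

The identifiers survive renaming and reordering, but the last line shows the weak point: any tool that rewrites the dicts from the keys it knows about silently drops them.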

Deck copyright

What should be the copyright of collaborative decks? The simplest answer would be to imitate Wikipedia and force all content to be under a free license. That may also be too permissive for some users, who want to forbid some changes to their creations.

If the website does not force a single license on all contributions, does it automatically have to deal with all the complexity of licensing, sublicensing, etc., and keep track of who has what rights? It seems like a problem which is going to take far too much time to solve correctly.

Copyfraud

Some shared decks may infringe on some copyright, in particular if they use images, graphs or other content from a book, a website, etc. However, it's far from easy to establish whether a claim makes sense or not. Personally, I make a lot of decks based on mathematics and computer science books. I read a mathematics book and create cards from it. It's going to be hard for the author to assert that they have the copyright on a mathematical definition, a theorem or a library function. However, they can probably claim copyright over the order in which the content is presented, the facts they decided to emphasize, and so on. So I do not even know how much of my own work I could share.

I expect people would be willing to recreate images for collaborative decks. After all, people create images for free for Wikipedia, out of team spirit and to improve the data available on the web. However, it's going to be hard to recreate complex images, important pictures, well-known artworks, etc.

The problem is particularly important when one creates a deck from a book or PDF. Usually, one selects a sentence and applies cloze deletions to it. Since the sentence is taken literally from the original book or PDF, sharing it in a deck would be forbidden as it is copyrighted.

Note that making notes using public domain or Creative Commons (CC) content would be great. In particular, Wikipedia is under CC and it's entirely legal to reuse its content; however, in my experience Wikipedia does not often really help to learn a subject. I consider Wikipedia nice for discovering interesting factoids, such as an exact definition or the creator of a concept, but I have never been able to use it to really understand a complex system conceptually, on a deep level. Hence I fear that Wikipedia is of very limited use for learning complex concepts.

Money

Maintaining an online website is costly. It takes bandwidth, disk space, and time to ensure everything keeps working. While creating a proof of concept for a few users should be doable on a single server with a better version of SQLite, if the service is to be successful, someone will have to pay for its continuation. It is far from clear how to get this kind of money.

The first question is: who should pay? We cannot really ask people to pay in order to share their decks. We want them to share their decks, not to disincentivize them. It would also be hard to ask people to pay in order to download decks. For one thing, we can't stop those people from sharing the decks with other users. Furthermore, if we ask downloaders to pay, we get money from the content of contributors, which is ethically wrong if we don't pay them. To be more precise, I'd be okay with it if the price covers the server and so on, not if the admin is the only one making revenue from the work of others. Note also that paying contributors of decks would be a nightmare in terms of tax declaration.

The standard answer to this kind of problem is advertising; I really hope we can avoid that.

Giving advantages to people who pay

We could also just ask for money in exchange for technical support, for help in improving deck quality (for technical questions), or just for access to regular updates with the corrections of the decks. As far as I understand, this would be similar to what Glutanimate does on his Patreon.

A few ways to fail making money

Of course, we could also ask for donations. For all of my contributions to Anki, my Patreon gets less than $50 a month, so I believe I can say with certainty that it does not work for such a small community.

I am pretty sure a crowdfunding campaign would not work. I ran one for the most wanted feature for Anki, according to a vote, and I had trouble getting 300€, which was really not a lot given the amount of work it required. I doubt you could successfully finance a server this way in the long term. In particular, I should mention that the medical school Anki subreddit prohibits posting about crowdfunding, so participation cannot even be asked for there.

On the other hand, I should note that Anki has at least a million regular users (according to AnkiDroid's Google Analytics, i.e. counting only the people using the Android version), so one cent per year from every user would make a good starting fund. However, I know of no way of getting one cent per person using Anki[11].

TODO:

Here I list features I can imagine that would be nice to have, but are not mandatory to start.

Super decks

I believe that it would be nice to be able to get super decks. You could decide to take a whole collection of decks about medicine, mathematics... or just a single deck about a specific topic.

Translations

It would be nice to allow deck translation. Ultimate Geography is proof that it can work. I believe it would be easier than creating a deck about the same subject in each language.

Other variations over a single deck could also be configured.

Discussion

Allowing discussion about a particular card/note. This can be done on Duolingo and seems to lead to interesting discussions sometimes. This is also done on Wikipedia, and is quite useful when something is not clear, or when there is a potential idea to improve some content but also some uncertainty about it. And as suggested by someone, it would also be good to post memes there, in order to increase the fun of Anki.

Card order

One of the most ridiculous mechanisms of Anki, in my opinion, is that each note has an "id", and that this "id" has multiple meanings. Like any id, this value is unique in a collection, i.e. two distinct notes have distinct ids and two notes with the same id are considered to be the same note. The "id" is the creation time in milliseconds. The catch is that two users will have distinct notes with the same id if, by accident, they created notes during the same millisecond. Given a userbase of more than a million people, it has probably already occurred that two people created a note during the same millisecond. However, most notes are never shared, so I expect that it has never created any trouble in practice. The trouble is that the "id" is also used to display the creation time of the note to the user, and new cards are usually shown to users in the order in which their notes were created. It means that it's almost impossible to change the order in which cards are discovered, because it would require changing a unique id. While it can be done (at some risk) in a single-user database, dealing with this problem in a shared environment is a nightmare.
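
The double role of the id can be sketched in Python as follows. The function names are hypothetical and this is not Anki's code; it only reproduces the scheme described above, an identifier equal to the creation time in milliseconds since the Unix epoch.

```python
import time
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def new_note_id(now_ns=None):
    """Build an identifier from the creation time, in milliseconds since the epoch."""
    if now_ns is None:
        now_ns = time.time_ns()
    return now_ns // 1_000_000


def creation_time(note_id):
    """Read the creation time back out of the identifier."""
    return EPOCH + timedelta(milliseconds=note_id)


# Two users on different computers creating a note during the same millisecond.
same_instant_ns = 1_600_000_000_123_000_000
alice_id = new_note_id(same_instant_ns)
bob_id = new_note_id(same_instant_ns + 400_000)  # 0.4 ms later
```

Both users get the same id for two unrelated notes, and changing either id to resolve the clash would also change the creation time displayed to that user and the order in which the new cards are shown.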

The only good news is that each note also has a "Globally Unique ID" (GUID); this id is usually used only to test equality while importing a deck. The main problem with it is that most people (even devs who have not dealt with importing) do not know that the GUID exists, and so I fear it will lead to strange bugs in all of the code which assumes note ids to be constant.
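
Here is a minimal Python sketch of matching notes on the GUID rather than on the id. It is not Anki's importer; the function name `plan_import` and the representation of notes as dicts with `id`, `guid` and `fields` keys are assumptions.

```python
def plan_import(local_notes, incoming_notes):
    """Split incoming notes into updates (guid already known) and additions."""
    local_by_guid = {note["guid"]: note for note in local_notes}
    updates, additions = [], []
    for note in incoming_notes:
        known = local_by_guid.get(note["guid"])
        if known is not None:
            # Keep the local id; only the content comes from the incoming note.
            updates.append((known["id"], note))
        else:
            additions.append(note)
    return updates, additions


local_notes = [
    {"id": 1, "guid": "a", "fields": ["France", "Pariss"]},
    {"id": 2, "guid": "b", "fields": ["Japan", "Tokyo"]},
]
incoming_notes = [
    {"id": 99, "guid": "a", "fields": ["France", "Paris"]},
    {"id": 1, "guid": "c", "fields": ["Peru", "Lima"]},
]
updates, additions = plan_import(local_notes, incoming_notes)
```

The first incoming note updates local note 1 even though its own id is 99, and the second is treated as new even though its id clashes with a local one, so it would need a fresh local id when added.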

What not to update

It should be possible to let users indicate which content should not be updated. For example, they may enter personal notes in a field, such as "what we saw in first week" or "mom had it in 2017"; this may help the review process for the user adding the note, and be entirely irrelevant to everybody else. The trouble is that, even if users should be able to keep a part of their notes distinct from the notes in the collaborative deck, it is not clear how much choice they must have. You may want to skip updates for one specific field, for all fields of a note type, for a whole deck, for a whole note type, for a specific note... However, allowing that much freedom means that there will be a huge number of buttons just for this particular feature; it seems to be too much.

I thank u/joy_void_joy and Shaddy for proofreading. All mistakes remain mine.

Notes

[1] I started a new job on the first of May, and I expect to have less free time afterwards.

[2] Disclaimer: The AnKing has been a client of mine and helped me find clients. I am thus not neutral in discussing his contributions.

[3] Disclaimer: Google is my new employer.

[4] Disclaimer: I was paid to improve this add-on.

[5] Disclaimer: I slightly contributed to it by adding the feature of resizing images. This feature was paid for by a crowdfunding.

[6] Disclaimer: I was paid to improve this add-on.

[7] Illegal content may of course be copyrighted material. It may also be someone trying to share content which falls under anti-discrimination laws, etc.

[8] Are there still people who prefer Twitter over Mastodon?

[9] Being able to review cards and do everything from iOS/AnkiDroid on a website would be great, but it's not the same subject.

[10] I'm not a fan, but I can't change this without changing AnkiWeb.

[11] And anyway, asking people to accept the terms and conditions of the payment system is already more costly than one cent.
