DEV Community: Magnus Wuttke

Exploration-Optimisation: the systematic improvement of user-generated solutions

Magnus Wuttke — Mon, 21 Oct 2019 08:00:41 +0000

How the shared principles behind cookie speed eating and choosing the right box allow us to make optimum decisions as well as leveraging the power of crowdsourced improvements to open source projects

When I was a child, my friends and I used to come up with challenges like eating a yoghurt hands-free or cookie speed-eating. We all had 30 seconds to think of a strategy and then the challenge started. Cookie crumbs flew left and right and the yoghurt rarely ended up where it was supposed to, landing instead in noses, eyes and even ears. The initial strategies we devised were rarely successful and so the real challenge was to constantly improve on them or invent even better ones. We did so by trial and error while attempting to achieve some success with the best strategy we'd come up with in the meantime. To take the cookie challenge as an example, given the size, amount and dryness of the cookies, it was common to alternate between taking small bites in quick succession and gobbling them down whole in order to determine which strategy was most effective.

In its general form, this approach is used widely to solve any problem with an as yet unknown optimum solution, not just by children playing around. This trial-and-error approach, seeking to maximise the gains over a defined period of time, is a common learning paradigm. It plays a central role in the machine learning discipline of reinforcement learning, and is generally referred to as exploration-exploitation.

Part 1: the principles

Let us consider the following example to illustrate the general problem more closely: you have been invited to participate in a game show and are faced with ten boxes filled with different amounts of money. The game consists of ten rounds. During each one, you must choose one of the boxes to discover and receive its previously unknown contents. The now empty box is filled with the same amount as before and you get to play again. Your goal is evidently to obtain as much money as possible. Intuitively, given the constraint of having only ten rounds, we would seek to choose the combination of boxes which would yield the largest amount of money, but the problem is that we still do not know the amounts in each box. The only way to explore the contents of a box is to actually choose it and discover its contents. How do you know if the chosen box is the one with the highest amount of money in it? Only by knowing the contents of the other boxes. Once you have determined the desired box, you can exploit your knowledge by repeatedly choosing that box.

Exploration-exploitation

Technically speaking, you want to maximise the total gains over a finite amount of time by choosing the right actions from a vast action pool. Every new action (e.g. choose box no. 7) results in a well-defined but previously unknown outcome, its gains (e.g. the amount of money in box no. 7). You want to choose actions so that the sum of their gains is maximised, and we say that an action performs well if it results in higher gains than the majority of the remaining actions. The exploration-exploitation paradigm is a strategy that attempts to maximise the total gains. It performs exploration steps which harvest information about the available actions, i.e. the action pool, and exploitation steps which seek to maximise the gains using the information acquired:

Exploration is defined as choosing unknown actions and thus determining their gains. You discover how well or how poorly an action performs, and you become increasingly able to assess the average gains resulting from actions. This allows you to estimate whether an action performs well or poorly compared to the overall action pool without knowing the gains resulting from every single action.
Exploitation is defined as choosing the action that, in your view, performs the best.

Deciding when to explore and when to exploit is not an easy task, and is a subject of ongoing research in reinforcement learning. Underlying exploration and exploitation is a fundamental trade-off, sometimes termed the exploration-exploitation dilemma: the more you explore, the less you exploit and vice versa. This follows from the fact that in any given step you can either explore or exploit, but cannot do both simultaneously. An entire theory on how to best approach this in different settings exists, but, essentially, it all comes down to intuition: if you're lacking concrete information about the actions, you explore. Once you have enough information and think you have found the best or at least a well-performing action, you exploit.
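To make the intuition concrete, here is a minimal sketch of one simple strategy for the box game, often called explore-then-commit: open a fixed number of unknown boxes first, then keep choosing the best one seen. It is only one of many possible strategies, and the function name, the parameter explore_rounds and the box amounts are assumptions made for illustration.

```python
import random


def explore_then_commit(boxes, rounds, explore_rounds, rng):
    """Open `explore_rounds` distinct boxes, then keep choosing the best one seen."""
    if not 0 < explore_rounds <= min(rounds, len(boxes)):
        raise ValueError("explore_rounds must be between 1 and min(rounds, len(boxes))")

    # Exploration: open previously unknown boxes and note (and receive) their contents.
    explored = rng.sample(range(len(boxes)), explore_rounds)
    known = {i: boxes[i] for i in explored}
    total = sum(known.values())

    # Exploitation: spend every remaining round on the best box found so far.
    best = max(known, key=known.get)
    total += boxes[best] * (rounds - explore_rounds)
    return total


rng = random.Random(0)
boxes = [rng.randint(1, 100) for _ in range(10)]  # hypothetical amounts of money
for k in (1, 3, 5, 10):
    print(k, explore_then_commit(boxes, rounds=10, explore_rounds=k, rng=rng))
```

Exploring only one box means committing blindly, while exploring all ten leaves no rounds for exploiting the best one; the interesting choices lie in between.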

Where to eat? A real-life exploration-exploitation dilemma (Image source: UC Berkeley AI course slide, lecture 11)

Exploration-optimisation

You have received another invitation to participate in the game show, but this time the rules are slightly different: in each round you can either open a new box or choose to increase the amount of money in a box which is already open. If you decide on the latter, a random amount is added to the money in that box. At the end of the ten rounds, you receive the box with the highest amount of money in it (of course only opened boxes are considered). In order to maximise the amount of money you receive at the end of the game, you want to find the box with the highest-value contents as quickly as possible in order to have a maximum timespan left to increase the amount of money in that box.

Generally speaking, you are no longer interested in maximising the total gains, but instead in maximising the individual gains resulting from your actions. This means that actions no longer have fixed gains, but you are able to randomly improve their performance. In addition to the a priori unknown gains resulting from the actions, you are unaware, beforehand, of how good the improvements will be. Therefore, let us define exploration-optimisation as a variant of exploration-exploitation:

Exploration is defined as choosing a new action and determining its gains.
Optimisation is defined as choosing a known action and improving on it.
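The modified game show can be simulated in a few lines. The sketch below assumes a very simple policy (open a fixed number of boxes, then always improve the best opened box) and assumes that each improvement adds a uniformly random whole amount between 0 and max_bonus; the rules above only say that the added amount is random, so both choices are illustrative.

```python
import random


def play(initial_amounts, rounds, explore_rounds, rng, max_bonus=20):
    """Open `explore_rounds` boxes, then improve the best opened box every round."""
    if not 0 < explore_rounds <= min(rounds, len(initial_amounts)):
        raise ValueError("explore_rounds must be between 1 and min(rounds, number of boxes)")

    # Exploration: each round spent here opens one new box.
    order = rng.sample(range(len(initial_amounts)), explore_rounds)
    opened = {i: initial_amounts[i] for i in order}

    # Optimisation: each remaining round adds a random amount to a known box.
    for _ in range(rounds - explore_rounds):
        best = max(opened, key=opened.get)
        opened[best] += rng.randint(0, max_bonus)  # assumed bonus distribution

    # Only opened boxes count; the player receives the fullest one.
    return max(opened.values())


rng = random.Random(1)
amounts = [rng.randint(1, 100) for _ in range(10)]  # hypothetical box contents
for k in (1, 3, 5, 10):
    average = sum(play(amounts, 10, k, rng) for _ in range(1000)) / 1000
    print(f"explore {k} rounds: average payout {average:.1f}")
```

Every round spent opening a box is a round not spent improving one, which is the same trade-off as before, only with a different reward at the end.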

Exploration-optimisation comes with the same trade-off as exploration-exploitation, since one cannot simultaneously explore and improve. The following section investigates the structure of a system which relies on an exploration and optimisation approach in order to achieve its goals. After applying the principle, we will make a connection between the actual system and the principle of exploration-optimisation.

Part 2: in practice

In my last post, Learning to code by creating open source documentation, I introduced an approach describing how to use beneficial synergies to improve the creation of open source code documentation and the learning experience of online coding tutorials. Open source code is integrated into online coding tutorials as reading exercises so that the tutorial's users learn to read and understand code written by other people. In return, the learner's task is to provide documentation relating to the piece of code they have been working on, which can subsequently be integrated into the open source projects to enhance usability and user-friendliness.

Specifically, the intermediary system facilitating this beneficial synergy aims to create a set of different documentation versions for a given piece of software in order to increase the odds of producing accurate and understandable documentation. After obtaining several different versions of documentation for the respective software, we must determine which of these is best. However, it would be unrealistic to assume that these initial documentation versions are already as accurate and comprehensible as we would like them to be, so we seek to improve them further. It follows that we aim to improve existing documentation, simultaneously determining how well it documents the respective code.

The main goals are as follows:

Creation of different versions of documentation for the given code
Systematic improvement on the documentation available
Rating the various documentation versions to determine the highest-performing solution

The intermediary system consists of two distinct parts which include mechanisms designed to achieve the above mentioned goals. The first part is a solution generator responsible for the documentation's initial creation. The second part, a feedback loop, is responsible for rating and improving on the different versions of the documentation. An in-depth look at these two core components of our system and the mechanisms they include is provided below.

Furthermore, we will assume that the Input step has already occurred, consisting of open source code being uploaded to the intermediary platform and tagged appropriately (e.g. programming language, dependencies, level of difficulty, etc.). Later on, the tags are used by the coding tutorials to navigate and filter the available code. Every time a piece of code is requested by a coding tutorial, the Distribution step takes place and the intermediary system has to decide whether to launch the solution generator or the feedback loop.
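As a rough illustration of how such tags could be used for filtering, here is a small sketch. The CodeSnippet record, its field names and the 1 to 5 difficulty scale are hypothetical and not part of any existing platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeSnippet:
    """A piece of uploaded open source code together with its tags (hypothetical)."""
    snippet_id: str
    language: str
    difficulty: int  # assumed scale: 1 (easy) to 5 (hard)
    dependencies: frozenset = frozenset()


def filter_snippets(snippets, language=None, max_difficulty=None, allowed_dependencies=None):
    """Return the snippets matching every criterion that was actually given."""
    result = []
    for snippet in snippets:
        if language is not None and snippet.language != language:
            continue
        if max_difficulty is not None and snippet.difficulty > max_difficulty:
            continue
        if allowed_dependencies is not None and not snippet.dependencies <= frozenset(allowed_dependencies):
            continue
        result.append(snippet)
    return result


catalogue = [
    CodeSnippet("a1", "python", 1),
    CodeSnippet("a2", "python", 4, frozenset({"numpy"})),
    CodeSnippet("b1", "java", 2),
]
beginner_python = filter_snippets(catalogue, language="python", max_difficulty=2)
print([snippet.snippet_id for snippet in beginner_python])
```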

User-generated solutions

If the intermediary system decides to generate new solutions in order to expand the solution space, the Distribution step consists of the intermediary system providing open source code to the requesting coding tutorial. In that case, the solution generator, consisting of the following steps, is launched:

Integration step: online coding tutorials integrate the open source code in the form of reading and documentation exercises.
Creation step: several learners work in parallel on the same pieces of code and create the required documentation to the best of their abilities.
Collection step: the different versions of the user-generated documentation are returned to the intermediary system and stored as part of the solution space.

While the Integration and Creation steps take place remotely at the coding tutorial, the Collection step involves the coding tutorials as well as the intermediary system.

Systematic improvement

If, upon request, the intermediary system possesses enough different documentation versions, the feedback loop is triggered and the Distribution step consists of providing one of the existing documentation versions along with the code. As previously mentioned, the main purpose of the feedback loop is to rate and improve the documentation provided, and it is structured as follows:

Integration step: online coding tutorials integrate the open source code as well as the documentation provided within their learning experience in the form of exercises.
Rating step: the learners are asked to study the code and rate the documentation provided with a performance score (e.g. from 0 to 100), expressing their opinion on how understandable and accurate the documentation is.
Refinement step: the learners are asked to improve on the documentation according to their own needs.
Recollection step: the performance score and the improved documentation are returned to the intermediary platform, where the corresponding documentation is assigned its performance score and is updated to the improved version.
Discarding step: the system searches for documentation that has failed to increase its performance score over the last couple of iterations and discards that documentation.

The Rating and Refinement steps take place remotely at the coding tutorials, while the Recollection step involves both the tutorials and the intermediary system, and the Discarding step takes place internally within the intermediary system once all the information has been re-collected.

Note that we not only use the performance score as the final criterion to determine the best documentation, but also to track the evolution of a given documentation version and its improvements. Each time the documentation is improved on, the performance score is determined and its value sequence indicates whether the performance of the given documentation has increased or stagnated over time. When documentation stagnates, it is discarded and new initial documentation can be generated.
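The bookkeeping behind the Distribution, Recollection and Discarding steps could look roughly like the sketch below. The thresholds MIN_VERSIONS and WINDOW are assumptions standing in for "enough different documentation versions" and "the last couple of iterations", and the stagnation test is just one plausible reading of "failed to increase its performance score".

```python
from dataclasses import dataclass, field


@dataclass
class DocVersion:
    text: str
    scores: list = field(default_factory=list)  # rating history, each from 0 to 100


MIN_VERSIONS = 3  # assumed: how many versions count as "enough"
WINDOW = 2        # assumed: how many iterations "the last couple" means


def choose_mode(versions):
    """Distribution step: create new documentation or refine existing versions."""
    return "feedback_loop" if len(versions) >= MIN_VERSIONS else "solution_generator"


def recollect(version, score, improved_text):
    """Recollection step: store the learner's rating and the improved text."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    version.scores.append(score)
    version.text = improved_text


def has_stagnated(version, window=WINDOW):
    """True if none of the last `window` scores beats the score just before them."""
    if len(version.scores) <= window:
        return False  # not enough history to judge yet
    return max(version.scores[-window:]) <= version.scores[-window - 1]


def discard_stagnant(versions):
    """Discarding step: keep only the versions that are still improving."""
    return [v for v in versions if not has_stagnated(v)]


improving = DocVersion("draft A", [40, 60, 70])
stalled = DocVersion("draft B", [40, 60, 55, 58])
remaining = discard_stagnant([improving, stalled])
print([v.text for v in remaining], choose_mode(remaining))
```

After the stalled version is discarded, fewer versions than the assumed minimum remain, so the next request would launch the solution generator again.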

Theory meets practice

After this brief introduction to the mechanisms used by our intermediary system, let us take a step back in order to relate the abstract concept of exploration-optimisation to the concrete steps of the solution generator and the feedback loop.

By defining our set of different documentation versions as the action pool and the gains of an action as its performance score (describing how well that documentation fits the code), we can, in formal terms, describe our search for effective documentation as an exploration-optimisation problem:

Exploration is implemented by the solution generator, and constitutes the creation of new solutions. This is required as an initialisation process and after poorly-performing solutions have been discarded.
Optimisation is implemented by the feedback loop and primarily constitutes the rating of documentation versions to determine their gains, as well as the improvement of said documentation. Part of the optimisation process is the decision to discard poor solutions in order to allow for increased exploration.

Schematic exploration-optimisation implementation in the form of a solution generator and a feedback loop

In theory, exploration-optimisation can be used to systematically improve on any kind of user-generated solution by defining the action pool appropriately, provided that mechanisms to determine the gains of your actions exist, as well as the means to alter these accordingly. Thus, exploration-optimisation provides a conceptual and abstract framework for building systems which aim to determine optimum solutions for a given problem in an iterative manner. The intermediary system I have described in this article is a concrete example of how to transition from the conceptual framework to a real system. I believe its mechanisms and functionalities can be extrapolated and applied to a vast range of similar problems. I look forward to seeing the solutions and approaches involving exploration-optimisation developed by others in future.

In the spirit of collaboration, if you have comments or ideas for improvements, constructive criticism or any other feedback, please get in touch (magnus.wuttke@librarylab.ethz.ch).

Learning to code by creating open source documentation: a beneficial synergy

Magnus Wuttke — Thu, 01 Aug 2019 08:00:21 +0000

How Duolingo and ReCAPTCHA inspire us to improve open source documentation and the learning experience of coding tutorials

For most people, captchas are not exactly their idea of fun, but rather a necessary evil. Captcha stands for "completely automated public Turing test to tell computers and humans apart" and they are annoying little tasks that consist of reading and typing in scrambled words in order to identify yourself as a non-machine. I always hoped I would get an "easy one" whenever a captcha popped up. Not the kind of illegible word that would make you start all over again and curse the inventor of this frustrating waste of time, especially if it is your 10th captcha in the space of 30 minutes.

Ironically, the idea behind captchas was to confound machines by setting tasks which they perform badly at, while being natural and easy for humans to solve, such as image recognition or reading handwriting. After a while, reCAPTCHA, "Captcha reloaded", as it were, emerged and suddenly you had to type in two words instead of one. Great, more madness!

Admittedly, I exaggerate a little, although there is no denying that I have a point. Of course, it had nothing to do with madness, and later I discovered that something had changed underneath the hood which no longer made captchas a waste of time, but transformed them into a powerful crowdsourcing tool.

Captcha reloaded

With more and more websites using captchas and an increasing number of people completing these little tasks every single day, the developers behind captchas came up with the smart idea of using the massive amount of manpower that goes into solving captchas more constructively. They founded a company called reCAPTCHA and redesigned captchas in such a way that the solving and authenticating process essentially stays the same, but with the difference that the answers given by users are now used to solve big data problems.

As the name implies, big data problems require huge amounts of data in order to be solved, and often the datasets have to be created by hand. Examples of such tasks are digitising handwritten books or labelling pictures for machine learning datasets. The method used to digitise handwriting involves making people read a handwritten word and type it in, and storing the result as the digitised version of the handwritten word, whereas picture recognition tasks simply ask you to click on images which contain certain objects.

Of course, there are several mechanisms to ensure that only correct answers are stored, but otherwise that's all it is. Incredibly simple and efficient. According to Luis von Ahn, one of the developers and founders of reCAPTCHA, they managed to digitise 2.5 million books per year, which in fact has been so efficient that by 2011 the New York Times' archives and the books on Google Books had already been digitised in full.

You can check this out and more in Luis von Ahn's (very funny and inspiring) TED Talk about massive online collaboration. At this point, I would like to move on to another project he mentions in his talk which has become increasingly popular in recent years:

Duolingo

Co-founded by von Ahn, Duolingo is an online language learning platform which teaches you the language of your choice by letting you practise with short exercises every day. It is free to use and gives everyone the opportunity to learn a new language while, to quote the MIT Technology Review, "being an education that pays for itself".

How does Duolingo achieve this? Besides more "classical" options like language tests and obtaining certificates for them, Duolingo provides a very clever service: crowdsourced translation. Companies can submit texts, such as daily news, to Duolingo and request a translation into a specific language. Duolingo will then distribute the text to the students on the platform, who will work on it as part of their learning experience, before returning the translated text to the companies for a fee.

It is easy to see how the principles behind Duolingo's and reCAPTCHA's business models are very similar:

Intelligent crowdsourcing
Combined workflows
Clever usage of existing resources

Synergetic benefits

Let's take a closer look at the genius of these combined principles. Obviously, the problems of identifying a human user and of digitising books, along with solutions to each, already existed before reCAPTCHA. However, only by combining those two problems in order to solve them both simultaneously do you gain a truly elegant and efficient solution that creates added value - essentially out of nothing, simply by combining existing processes. The same goes for Duolingo, where required tasks such as translating the news are completed while serving as examples for language learners. Both the translation and the learning aspects existed before, but only by combining them do you create a synergy that is beneficial for both. Translations become cheaper as they are done through crowdsourcing, and language learners get some hands-on experience and can learn a language for free.

Both reCAPTCHA and Duolingo are very popular, which in my opinion is due to the clever principles or "synergetic benefits" they are built upon:

2 for 1 - solving two problems with one action
Win Win - both sides benefit from the synergy
Recycling - reusing existing processes and solutions

Background

You could also describe reCAPTCHA and Duolingo as "interactive knowledge transfer platforms", and if you think of more traditional ways of transferring knowledge, you can't get around libraries. In fact, I have been looking into "synergetic benefits" as part of my research on open source software development for the ETH Library Lab. The Library Lab aims to rethink and advance information infrastructures and information cycles for science, research and education, which is why I took a closer look at open source software and its increasing importance as a digital public infrastructure.

Here is some background information, so we are all on the same page: the number of users and repositories on GitHub, the most prominent code sharing platform, has been increasing almost exponentially over the last decade, reaching more than 31 million developers and over 96 million repositories according to the State of the Octoverse 2018. An increasing number of companies use or release open source software and according to the 2015 Black Duck/North Bridge "Future of Open Source" survey, 78% of companies operate some or all of their systems on open source software. Moreover, the Stack Overflow developer survey 2019 states that 41% of the developers polled improved their skills or learnt new ones by contributing to open source software.

This increasing amount of open source software development can only be achieved with sufficient developers and maintainers working on it. Statistics show that as technology becomes increasingly present and important in our lives, the number of people wanting to learn coding has increased similarly to the demand for open source software. Surrounded by devices running and executing code all the time, the desire and, of course, the need to understand and design such systems is becoming more prevalent. Nowadays, the go-to place for learning to code is the internet. According to the aforementioned Stack Overflow survey, more than 60% of the developers have taken an online course or worked through online tutorials in order to acquire new coding skills.

All about code

With a growing amount of software being developed, two factors become increasingly important:

The ability to read, navigate and understand foreign code
Code documentation

As software projects become bigger in terms of lines of code and more and more people collaborate on projects, developers need to be increasingly able to understand other people's code. This skill is far from being trivial and takes practice, much like when you were first learning to read. There are some interesting articles and forum posts about techniques and methods on how to navigate through and read foreign code in order to grasp its structure and understand its functionality. Unfortunately, as most online courses and tutorials lack in-depth exercises and guidance on how to do so properly, the transition from online tutorials to real coding projects can be harsh. In my opinion, the integration of code reading tasks and exercises in coding tutorials would be beneficial, but more on this later.

On the topic of code documentation, the Open Source survey states that the most common problem of open source is incomplete or confusing documentation. With an increasing amount of software becoming available thanks to the open source ideology, the concern about software has shifted from the existence of a certain software solution towards the usability of said software solution. Poor documentation raises the problem of software becoming unusable or being used in a wrong or insecure way because it has not been understood correctly. With open source software becoming public infrastructure, it is important that software projects are well documented and maintained. This is particularly crucial for critical projects which a lot of people and companies rely on. Improving the quality and quantity of documentation also improves the maintainability of a project, as developers can join in and understand a project much more quickly. Unfortunately, with developers spending most of their time coding, debugging and solving issues, documentation is often the lowest priority and thus many projects are poorly documented, if at all.

The beneficial synergy

Let's wrap everything up and you will see how all these topics are related. As I am sure you will have noticed, the aforementioned problems of learning to read code and improving on open source code documentation are twofold, and you would like to solve them simultaneously so that both sides benefit. As part of our work at the Lab, we came up with a system based on "synergetic benefits" that aims to solve the two aforementioned problems, and I would like to share it with you as an example of how different work and knowledge flows can be combined in order to generate added value.

The system consists of a distribution platform, to which open source projects and online coding tutorials can connect. Similar to the crowdsourced translation service provided by Duolingo, open source projects can upload code to the platform and request crowdsourced documentation. The platform provides a standard API for online coding tutorials, through which they can pull code and integrate it in the learning experience as code reading exercises for their users. Once the code has been understood, people are asked to provide the documentation they would have needed in order to understand the given piece of code. Through an internal crowdsourced review process, the documentation is improved and then returned to the distribution platform, where it can be retrieved and integrated by the open source project.
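The flow described above can be sketched with a small in-memory stand-in for the distribution platform. This is not the actual API of the system: the class, its method names and the sample data are all hypothetical, and the internal review process is left out.

```python
class DistributionPlatform:
    """In-memory stand-in for the distribution platform (all names hypothetical)."""

    def __init__(self):
        self._code = {}  # snippet_id -> (project name, source code)
        self._docs = {}  # snippet_id -> list of submitted documentation texts

    def upload_code(self, project, snippet_id, source):
        """Open source project side: upload code and request documentation."""
        self._code[snippet_id] = (project, source)
        self._docs.setdefault(snippet_id, [])

    def pull_code(self, snippet_id):
        """Tutorial side: fetch code to use as a reading exercise."""
        return self._code[snippet_id][1]

    def submit_documentation(self, snippet_id, text):
        """Tutorial side: return documentation written by a learner."""
        if snippet_id not in self._code:
            raise KeyError(snippet_id)
        self._docs[snippet_id].append(text)

    def retrieve_documentation(self, snippet_id):
        """Project side: collect the documentation gathered so far."""
        return list(self._docs.get(snippet_id, []))


platform = DistributionPlatform()
platform.upload_code("example-project", "s1", "def add(a, b):\n    return a + b\n")
exercise = platform.pull_code("s1")
platform.submit_documentation("s1", "add(a, b) returns the sum of a and b.")
print(platform.retrieve_documentation("s1"))
```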

Setting up the system in the described way comes with the following direct benefits:

Code learners working on the open source code learn to read code and get used to working with other people's code.
By formulating the documentation for the given piece of code, they can test and consolidate their understanding of the code.
Undocumented open source projects will be provided with documentation, making them more accessible and usable.
Documentation will be improved by being written or extended by people who are not part of the project and thus have an outside view on the matter.
Documentation will become more user-friendly and especially more beginner-friendly.

The scientific community is seriously discussing the possibility of integrating code used for obtaining results in research papers and making it publicly available in order to render results verifiable and speed up research processes. The system presented would indirectly benefit this, since less time would have to be spent on working through poorly or undocumented code and more time could be spent on actually working and experimenting with the code.

This raises the question of why online learning tutorials would not pull open source code directly from GitHub repositories and then return it, and why an extra intermediary platform is needed. Neither open source projects nor coding tutorials are specialised for such interactions, so they would probably remain small in scale and relatively inefficient. An independent intermediary allows for multi-platform interactions, such that any coding platform out there can integrate open source code into its learning experience. Moreover, the platform sorts the received code in order to offer a choice of difficulty levels and to make sure online tutorials receive code that matches their topics (machine-learning code for machine-learning tutorials and Java code for a Java tutorial, for example). I see the biggest potential here in providing a standardised access point that makes the knowledge exchange as easy as possible and allows the approach to scale well.

All about collaboration

This approach can only develop its full potential by building on the many great resources that already exist, which is why the system is entirely based on extending existing structures and collaborating with open source projects and online tutorials. This seems to be a promising field for classical institutions seeking to extend their reach beyond traditional services and remain relevant in the future. I am convinced there are many other problems that can be solved efficiently through the use of "synergetic benefits" and I am looking forward to seeing what other people come up with.

In the spirit of collaboration, if you have remarks or ideas for improvements, constructive criticism or any other feedback, please get in touch (magnus.wuttke@librarylab.ethz.ch).