ProgramCrafter

Originally published at lesswrong.com

User-inclination-guessing algorithms: registering a goal

Reading a post about Community Notes on X reminded me of a recent idea, so I'm posting it here.

It's commonly known that ranking algorithms based on estimating users' views and inclinations can be incredibly useful in recommendation systems, advertising, and content curation. But there must be other applications which do not require creating one's own social network, for instance!

There actually is a task which I believe to be low-hanging fruit: predicting a participant's score in olympiads, preferably in a domain with datasets available on the Internet, a wide scoring range and varied tasks; that is, in informatics.

My thoughts about the prediction system

Olympiad tasks in informatics tend to use a lot of ideas and one or two algorithms, which makes them differ both in topic and in difficulty. This may be represented by a vector in a high-dimensional space (based on the task tags list on Codeforces, there might be at least 24 dimensions), where the projection onto each axis indicates how much the task relies on knowledge of the corresponding topic.
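To make this concrete, here is a minimal sketch of turning a task's tag list into such a vector; the tag list below is a hypothetical stand-in for the real Codeforces tags.

    # Hypothetical tag list; the real one would come from Codeforces task tags (~24 of them).
    TAGS = ["dp", "graphs", "greedy", "math", "strings", "data structures"]

    def task_vector(task_tags):
        """Binary topic vector: component i is 1.0 iff the task uses TAGS[i]."""
        return [1.0 if tag in task_tags else 0.0 for tag in TAGS]

    print(task_vector({"dp", "math"}))  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]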

Let's make user embeddings also vectors in a 24-dimensional space, and measure compatibility between a person and a task as the projection of the person embedding onto the task vector, scaled down by the length of the latter. If $p_i$ is the person's vector and $t_j$ the task vector, then $compat_{i,j} = \frac{p_i \cdot t_j}{\|t_j\|^2}$.

Illustration

I want score prediction to be straightforward, so I map compatibility from the $[0; 1]$ range to a $[0; 100]$ expected score, and clamp everything outside that range to the closest bound.
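As a quick sketch of that mapping (nothing beyond a clamp and a scale), assuming PyTorch tensors as in the example further below:

    import torch

    def expected_score(compat: torch.Tensor) -> torch.Tensor:
        """Clamp compatibility to [0; 1] and scale it to a [0; 100] expected score."""
        return compat.clamp(0.0, 1.0) * 100.0

    print(expected_score(torch.tensor([-0.2, 0.5, 1.3])))  # -> 0, 50 and 100 points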

I'm currently unsure how to model variance, and my guess is that a good idea is to dedicate one of the embedding components to it. Competitions have an element of unpredictability, and I want to be able to distinguish the prediction "the participant will get 40-60 points" from "the participant will get 20-80 points". And while the system is training, some variance is to be expected after all.
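One possible shape for this (purely my assumption, not a settled design): reserve an extra per-person component as a log-spread, and turn it into an interval around the expected score.

    import math

    def predicted_interval(compat: float, log_spread: float):
        """Interval around the expected score; log_spread would be a learned per-person component."""
        centre = min(max(compat, 0.0), 1.0) * 100.0
        spread = math.exp(log_spread) * 100.0
        return max(centre - spread, 0.0), min(centre + spread, 100.0)

    print(predicted_interval(0.5, log_spread=-1.6))  # roughly (29.8, 70.2)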

Calculation options

I'd be glad to go forward with Bayesian updates, but unfortunately the problem space is $\mathbb{R}^{24(people+tasks)}$, and the pieces of evidence are not independent either; so it will be easier to use the usual gradient-based learning algorithms from neural networks.

I've also found out that calculating the scores is quite easy there! Let's demonstrate this with 4-dimensional embeddings, 3 tasks and 2 people.

    >>> import torch
    >>> tsk = torch.tensor([[1,0,0],[0,0,1],[0,3,0],[0,4,1]])
    >>> tsk
    tensor([[1, 0, 0],
            [0, 0, 1],
            [0, 3, 0],
            [0, 4, 1]])
    >>> ppl = torch.tensor([[1,1],[1,0],[1,0],[1,5]])
    >>> ppl
    tensor([[1, 1],
            [1, 0],
            [1, 0],
            [1, 5]])
    >>> (ppl.T @ tsk) * tsk.square().sum(0).reciprocal()
    tensor([[1.0000, 0.2800, 1.0000],
            [1.0000, 0.8000, 2.5000]])

So, for instance, we see that the second person is largely (but not fully, only 80%) capable of solving the second task.
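And here is a minimal training sketch (my own assumption rather than a settled design) of the gradient-based learning mentioned above: keep the task vectors fixed (say, derived from tags) and fit the person embeddings to observed scores by gradient descent, reusing the toy data.

    import torch

    torch.manual_seed(0)

    # Known task vectors (one column per task), as in the session above.
    tsk = torch.tensor([[1., 0., 0.],
                        [0., 0., 1.],
                        [0., 3., 0.],
                        [0., 4., 1.]])
    # Person embeddings to be learned (one column per person).
    ppl = torch.randn(4, 2, requires_grad=True)

    # Hypothetical observed scores (people x tasks) on a 0-100 scale.
    observed = torch.tensor([[100., 28., 100.],
                             [100., 80., 100.]])
    target_compat = observed / 100.0

    opt = torch.optim.Adam([ppl], lr=0.05)
    for _ in range(1000):
        compat = (ppl.T @ tsk) / tsk.square().sum(0)  # same formula as above
        loss = torch.nn.functional.mse_loss(compat, target_compat)
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(loss.item())                                  # should get close to zero on this toy data
    print((compat.clamp(0.0, 1.0) * 100.0).detach())    # recovered expected scores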

Goal and call for help

I expect to create a somewhat-working prediction system, and I'm going to describe it in a separate post sometime this year! Bets on whether I'll actually do this are on Manifold!

I'd like to hear if you have ideas on which ML algorithm to use, or on a domain to test the system on next. Criticism (except that such recommendation algorithms in social networks might supposedly be harmful) is also welcome!
