DEV Community


Wikimedia internship: Modifying Expectations

Liudmila(Jade) K
Aspiring IT student, computational linguistics enthusiast. Outreachy 2020 Wikimedia intern.
・2 min read

For me, planning usually leads to one of two results: either it becomes a good plan to follow, with an optimized schedule and everything, or it ends in total disaster, with nothing going as planned. Of course, you should strive for something between these two extremes, but that almost never happens, at least for me.

And, sure, for a project as big as my Outreachy task, sticking to the plan in every single aspect is not an option: it's too big, it takes a lot of time, and, basically, too many things can go wrong. The first thing that could be considered "wrong" (though not really) is that I'm not the only intern chosen for this project. There are two of us, and working in a group is always different from working alone.

So at the start of my internship, I told myself that, yes, I have some kind of plan, but really I don't. I decided that rather than a plan, I have a trajectory, and I'll stick to it. There are a few main milestones I should head toward, but less strict planning makes it easier to accommodate different issues. This approach has proven quite helpful, but that doesn't mean my expectations weren't bent by reality.

The biggest change for me was realizing that this project, which deals with source code analysis, works with texts whose features are quite different from those of natural language: for example, the size of the vocabulary used in the texts and the greater importance of word order. These facts may look obvious, but understanding them is really important, as they affect your choice of algorithms and approaches.
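To make the word-order point concrete, here is a small, hypothetical illustration (not the project's actual code). It assumes a very rough regex tokenizer and compares two lines of code that compute different things. A bag-of-words representation, common for natural-language texts, sees them as identical, while token bigrams, which keep some local order, tell them apart.

```python
import re
from collections import Counter


def tokens(src: str) -> list[str]:
    # Hypothetical rough tokenizer: identifiers, numbers, or single non-space characters
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)


def bag_of_words(src: str) -> Counter:
    # Counts tokens and ignores their order entirely
    return Counter(tokens(src))


def bigrams(src: str) -> Counter:
    # Counts adjacent token pairs, so local order is preserved
    toks = tokens(src)
    return Counter(zip(toks, toks[1:]))


a = "x = y - z"
b = "z = y - x"  # same tokens, different meaning

print(bag_of_words(a) == bag_of_words(b))  # True: order is lost
print(bigrams(a) == bigrams(b))            # False: order is kept
```

Bigrams are only one possible way to keep order information; the point is simply that a representation that works well for prose can miss differences that matter in code.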

Another important thing I didn't realize when I started my internship is how huge the amount of data I have to work with is. My project is connected to the functions used inside different wikis, and this information is not really on the surface: a regular Wikipedia reader would probably never think about it. But there really are a lot of functions doing different things, from displaying the correct pronunciation to showing various message boxes.

This led to another unexpected result: computational speed is far more important than precision. The end user will be the one checking whether the algorithm's prediction is correct, and there are about two hundred thousand code examples, so computational time is our worst enemy. This is quite unusual, at least for me, as most papers treat prediction accuracy as the key metric.
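A quick back-of-the-envelope calculation shows why the data size dominates. Purely as a hypothetical assumption (the post does not say which algorithm is used), suppose a naive approach compared every pair of code examples. With about 200,000 examples, that is already around 2 × 10^10 comparisons; at an assumed 1 microsecond per comparison, that comes to roughly 5.6 hours for a single pass.

```python
def pair_count(n: int) -> int:
    # Number of unordered pairs among n items: n * (n - 1) / 2
    return n * (n - 1) // 2


n_examples = 200_000           # "about two hundred thousand" code examples
seconds_per_comparison = 1e-6  # hypothetical cost of one comparison

pairs = pair_count(n_examples)
hours = pairs * seconds_per_comparison / 3600

print(pairs)            # 19999900000
print(round(hours, 1))  # 5.6
```

Because the pair count grows quadratically, doubling the data roughly quadruples the time, which is why a faster, slightly less precise method can be the better trade-off when a person checks the results anyway.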

So my vision of the project has changed a lot over these weeks of research and hard work, and I'm quite sure my perspective will be different again by the end of the internship. But the most important thing is that, in the end, the community will get a tool that makes everyone's life a bit easier.
