Re: Why is machine learning 'hard' by S. Zayd Enam

Dawid Laszuk published on September 27, 2020


I saw a few people referring to S. Zayd Enam's article as an explanation of why machine learning (ML) is hard. It is a good read with some visuals, so I'd recommend reading it. However, I don't entirely agree with the author's arguments, so I thought I'd highlight a few nuances.

But before we go throwing stones, let's make sure no one gets hurt. Since I'm late to the party (ML discussion parties rock!), my comments are biased by the "now" perspective. I'd like to talk about the craft rather than the tools, though that's often not possible as they overlap a lot. Also, Zayd is at a disadvantage in that he has published a finite amount of text, thus dropping the anchor, whereas I get to focus on and judge all the details, leaving him little space to clarify nuances.

First of all, I agree with the premise that machine learning is "hard". "Hard" will mean different things to different people, and I could spend the next thousand words trying to explain what exactly it means, but let's do a short, neat trick to speed everything up. Imagine something difficult that would take a day or two to accomplish. ML is not that; it's more difficult and it'll take you longer to accomplish. Ok, sorted.

Regarding the nuances in Zayd's blog post. He compared ML development to programming and not to software development. Programming is an essential part of both machine(!) learning and software development. He somewhat acknowledges this by keeping the "implementation" and "algorithm" axes in the ML problem space. However, the introduction of the "model" axis means that we're focusing on a particular subfield of machine learning that has a model. Given that the neural approach dominates the ML field, that's likely what most people had in mind. But in that case we should also focus on some subfields on the other side. How about game development or peer-to-peer network software? I'm guessing that most people will now go either "well, this is just silly" or "what's your point?". The point is that in these particular cases it might be more obvious that there are additional axes to think about, for example "performance" and "consistency". Hold on, but shouldn't going into details reduce the problem space, not increase it? Yes. Well, actually, there are many, many more axes, infinitely many to be precise, but because of where we stand and our life experience, most of them are trivial or hidden.

There are two points I'd like to highlight from this "observation". Firstly, machine learning in its current form is a relatively new field. It's lacking in tools compared to other popular branches of computer science and its applications because it hasn't been widely used for as long as they have. Necessity is the driver of development, and we can see that there are frameworks building toward easier debugging. The lack of tooling does make ML more difficult right now, but it doesn't mean that one field is inherently more difficult than the other.
The other point is that most ML practitioners/researchers didn't study pure ML (what is "pure ML"?) before approaching it. This forces a mental rotation in the problem domain. We have spent some time crafting our toolkit of approaches, but now it seems that those hammers aren't as useful as before. It's difficult because the path to the goal, and often the goal itself, hasn't been widely studied.

What Zayd kind of mentioned but didn't go into detail about is complexity. Machine learning is complicated. So are other branches of computer science/engineering. However, I'll argue that ML is an outlier on the complexity axis. An integral part of ML is the input data, which don't even make sense… except that our brain tells us otherwise. All data have noise, be it from recording error, the generation process, or quantum wave interaction with the whole universe. This is a problem because we're trying to understand and utilize some processes, but we have little confidence that our observations are general enough. Extrapolation is hard. Robust methods are hard. Understanding is hard. We're teaching machines something that we don't understand ourselves, using the same input we were fed with.
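To make the "extrapolation is hard" point a bit more tangible, here is a minimal sketch, not anything from Zayd's article. Everything in it is an assumption made up for illustration: the hidden process (`true_process`, a simple square), the noise level, the sampling range, and the choice of a straight-line model. The idea is only that a model fitted to noisy observations from a narrow range can look fine inside that range and be badly wrong outside of it.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def true_process(x):
    # Hypothetical "real world" process; the learner never sees this formula.
    return x ** 2


random.seed(0)  # fixed seed so the sketch is reproducible

# We only ever observe the process on [0, 1], and only through noise.
xs = [i / 49 for i in range(50)]
ys = [true_process(x) + random.gauss(0.0, 0.05) for x in xs]

a, b = fit_line(xs, ys)


def abs_error(x):
    """Absolute gap between the fitted line and the hidden process at x."""
    return abs((a * x + b) - true_process(x))


interp_err = abs_error(0.5)  # inside the observed range
extrap_err = abs_error(3.0)  # well outside the observed range

print(f"error inside the observed range:  {interp_err:.3f}")
print(f"error outside the observed range: {extrap_err:.3f}")
```

Within the observed range the line is a tolerable approximation; outside of it the error grows quickly, and nothing in the training data warns us about that.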

All in all, ML is hard. I found it necessary to write out why I don't agree with the author's reasoning because understanding the problem is part of the solution. Maybe someone will write why I am wrong, bringing us a step closer to where we want to be?