The new software: less coding, more data

Software, like everything else, is evolving, but not in the direction I expected. When I was studying computer science at university, I thought the future was parallelism. We had only one class on parallel programming, but multi-core computers were on the rise and it seemed to be the thing to learn. Since then my opinion has changed. There is indeed a need for parallel programmers, but it is not as large as I had foreseen. Most of the libraries and code that need parallelism have already been written. Some basic notions are still required every now and then.

Universities also teach “sequential” (regular) programming. Until now, regular programming has been about giving specific instructions on what actions the program should take. Each line of code contains a specific instruction with a defined goal. The programmer writes code that indicates, at each step, which actions the computer must perform, leaving no room for the computer to improvise. Every case is handled one way or another; if one is not, the program breaks. This is also true for parallel programming, where the code defines the actions to be performed and in which order. The difference is that a single-core program executes all its instructions sequentially, whereas a multi-core program can execute different parts of the code at the same time, and those parts do not necessarily follow the same paths.
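To make the contrast concrete, here is a minimal sketch in Python (the function names are hypothetical). Both versions are explicit instructions: the sequential one walks the list in order, while the parallel one splits the list into chunks that workers process independently before the partial results are combined. It uses ThreadPoolExecutor for simplicity; in CPython, CPU-bound threads do not actually run on separate cores because of the global interpreter lock, so ProcessPoolExecutor would be the drop-in choice for true multi-core execution.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(numbers):
    # Sequential: one instruction after another, in a fixed order.
    total = 0
    for n in numbers:
        total += n * n
    return total


def parallel_sum_of_squares(numbers, workers=4):
    # Split the input into roughly one chunk per worker (at least 1 item per chunk).
    chunk = max(1, (len(numbers) + workers - 1) // workers)
    parts = [numbers[i:i + chunk] for i in range(0, len(numbers), chunk)]
    # Each worker runs the same explicit instructions on its own chunk;
    # the programmer still decides what happens and how results are combined.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(sum_of_squares, parts)
    return sum(partial_sums)


data = list(range(1000))
print(sum_of_squares(data) == parallel_sum_of_squares(data))  # True
```

In both cases every step is spelled out by the programmer; only the order of execution changes.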

In the new paradigm, which Andrej Karpathy made me notice and which I agree with, software is abstract. Software becomes the weights of a neural network, which we, as humans, can neither interpret nor program directly. The goal is therefore to define the desired behavior: the developer specifies that for this set of inputs we want that set of outputs. To get the right behavior, we design a neural network architecture that can extract the relevant information from the domain and then train it, so that the optimization searches the space of possible weights for the best solution. We will no longer address complex problems with explicit code; instead, the machine will figure out the solution by itself.
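A toy illustration of "define the behavior, let the optimizer find the program", sketched in Python under heavy simplification: instead of a neural network, a single linear neuron with two weights, trained by plain gradient descent on a mean squared error. The examples are hypothetical labeled data that happen to follow y = 2x + 1; the programmer never writes that rule, only the input/output pairs.

```python
# Desired behavior, given only as (input, desired output) pairs (hypothetical data).
examples = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]


def train(examples, steps=1000, lr=0.05):
    w, b = 0.0, 0.0  # the learned "program" is just these two weights
    n = len(examples)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Step downhill: the optimizer searches the weight space.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train(examples)
print(round(w, 3), round(b, 3))  # 2.0 1.0
print(round(w * 10 + b, 3))      # 21.0, for an input it never saw
```

Real systems replace the two weights with millions of them and the line with a deep architecture, but the division of labor is the same: humans supply data and a model family, and training finds the weights.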

Software is certainly changing in a new direction. There are many instances where it may be easier to gather more training data than to hard-code an algorithm to perform a specific task. In the new software era, the coders’ tasks are to curate the datasets and monitor the system, while the system itself is optimized to perform the task as accurately as possible. “Old school” programmers will still be needed, in the same way that some people still write low-level languages for a living. They will develop labeling interfaces, create and maintain infrastructure, perform analysis, and so on.

Today, neural networks are the clear winner over hard-coded instructions in many domains. McKinsey states that, with current technology, 30% of companies’ activities can be automated. Machine translation, image recognition, text analysis, and games like chess or even ‘League of Legends’, which requires an advanced understanding of the game’s universe, can be automated by computers. Thanks to TensorFlow and neural networks, Google reduced the code of its translation program from 500,000 lines to only 500. These are classic examples where deep learning is straightforward and can shine, but huge improvements can also occur in other, less intuitive (and less sexy) domains, such as data structures and databases. In one such unglamorous publication, the learned software was up to 70% faster and used an order of magnitude less memory than the traditional software. As a last example, I would like to mention an article in which researchers took this idea to the extreme. They created software that does not even require the user to define the model; instead, it finds the model for the user.
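One way a learned model can beat a classic data structure is to replace part of an index lookup with a prediction. Below is a minimal sketch of that idea in Python, assuming a non-empty, sorted list of unique numeric keys: a single linear regression (not a neural network) predicts a key's position, and the worst prediction error seen during "training" bounds a small window that is then binary-searched. The names build_learned_index and lookup are hypothetical, and this is not the method of any particular publication; practical learned indexes use hierarchies of models.

```python
import bisect


def build_learned_index(keys):
    """Fit position ~= slope * key + intercept over a sorted list of unique keys."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var_k = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
    slope = cov / var_k if var_k else 0.0
    intercept = mean_p - slope * mean_k
    # The worst-case prediction error on the stored keys bounds the search window.
    max_err = max(abs(round(slope * k + intercept) - p) for p, k in enumerate(keys))
    return slope, intercept, max_err


def lookup(keys, model, key):
    """Return the position of key in keys, or None if it is absent."""
    slope, intercept, max_err = model
    # Predict the position, clamped into the valid index range.
    guess = min(max(round(slope * key + intercept), 0), len(keys) - 1)
    lo = max(0, guess - max_err)
    hi = min(len(keys), guess + max_err + 1)
    # Binary search only inside the window the model predicts.
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else None


keys = [x * x for x in range(100)]
model = build_learned_index(keys)
print(lookup(keys, model, 49), lookup(keys, model, 50))  # 7 None
```

When the keys are close to evenly spaced, the error bound shrinks to almost nothing and the model alone nearly pinpoints the key; the less the data fits the model, the wider the window that must be searched.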

In conclusion, traditional software, the kind we have been using until now, will remain. In some fields, such as law and medicine, black boxes are not acceptable and won’t be tolerated. In other cases, it will be cheaper and easier to hard-code the features than to prepare training data and let the model figure them out. In the new software era, coders will not explicitly write the course of action for each case; instead, the neural network will find the best solution for a given input without explicit instructions from a human. This new paradigm will become more prevalent, and new software tools will be developed along with it.