By Helmuth Fuchs
Moneycab: Mr. Paskalev, DeepCode provides a platform that makes suggestions to programmers on how to improve their code. Do you have any quantifiable data to highlight how much existing code can benefit from these improvements?
Boris Paskalev: We have a number of KPIs (Key Performance Indicators). The main one we focus on today is how many serious suggestions we find. Today we find approximately twice as many serious issues as any of the existing tools. In addition, we offer much higher precision, so when we make such a suggestion there is an 80% to 90% probability that it is real and applicable in your code. Other tools rarely reach 50% precision.
«Considering that developers spend on average 30% of their time searching for and fixing bugs, we can save 10% of their time today and much more over the next 1-2 years.» Boris Paskalev, CEO DeepCode
There is a nice graphic, dating back to 1996, that illustrates the time saved converted into money. Not only do we let developers know about potential issues right after they finish coding their ideas, we also provide examples of how the issue should be fixed. The alternative is to have issues identified at a later stage and then returned to the developer, which leads to many hours wasted by multiple people, extra coordination and delays. Considering that developers spend on average 30% of their time searching for and fixing bugs, we can save 10% of their time today and much more over the next 1-2 years.
In order to get good results, your platform uses existing code to learn from and then applies artificial intelligence and machine learning techniques to make recommendations to improve new code. Where do you get the “training code” from and who checks the quality of this code so that your platform learns from good training material?
We look at close to 100’000’000 fixes completed by open source developers in the main Git repositories like GitHub, GitLab and Bitbucket. We have a proprietary pre-sorting of the code bases we use, but ultimately the Machine Learning algorithms we implemented weed out any bad code. When people fix something the wrong way, it is very rare to see many people fixing it in the same bad way, whereas when you fix something the right way, there is a large number of examples where the same correct solution is used. Since our comparison is based on a proprietary semantic representation and not on syntax, we end up having hundreds or thousands of samples backing up each of our suggestions.
«We look at close to 100’000’000 fixes completed by open source developers in the main Git repositories.»
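The filtering idea described in the answer above (a correct fix recurs across many developers, a wrong one rarely does) can be illustrated with a minimal sketch. This is not DeepCode's method: its semantic representation is proprietary and not described in the interview. The sketch assumes a crude stand-in that renames identifiers to placeholders, and the names `normalize_fix`, `well_supported_fixes` and `min_support` are hypothetical.

```python
import re
from collections import Counter

# Words kept verbatim; everything else that looks like an identifier is renamed.
# This small set is an assumption chosen for the example below.
KEYWORDS = {"if", "is", "not", "None", "return", "and", "or"}
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def normalize_fix(before: str, after: str) -> tuple[str, str]:
    """Rename identifiers to v0, v1, ... consistently across both snippets,
    so that fixes differing only in variable names compare equal."""
    names: dict[str, str] = {}

    def rename(match: re.Match) -> str:
        word = match.group(0)
        if word in KEYWORDS:
            return word
        return names.setdefault(word, f"v{len(names)}")

    return IDENTIFIER.sub(rename, before), IDENTIFIER.sub(rename, after)


def well_supported_fixes(fixes, min_support):
    """Count each normalized fix and keep those seen at least min_support times."""
    counts = Counter(normalize_fix(before, after) for before, after in fixes)
    return {fix: n for fix, n in counts.items() if n >= min_support}


# Made-up sample data: three developers apply the same fix, one applies an odd one.
fixes = [
    ("if x == None", "if x is None"),
    ("if user == None", "if user is None"),
    ("if result == None", "if result is None"),
    ("if x == None", "if not x == 0"),
]
supported = well_supported_fixes(fixes, min_support=2)
print(supported)
```

With this sample data only the `== None` to `is None` pattern survives the threshold; the one-off fix is dropped because nobody else made it.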
The biggest gains could be expected if DeepCode were used for code that is distributed to millions of clients, like the business software from giants such as SAP, Oracle or Google. Have you already received any feedback from such companies, and do any of the big software companies intend to use your software?
Yes indeed, we have spoken to many large companies both in terms of their daily needs and in terms of their efforts to implement similar ideas to tackle the problem. My co-founders are leading experts in the space and continuously share best practices and developments with leaders from around the world and from the leading tech companies. We are lucky to have 6+ years of focus on this area, in which we have combined a large body of published and private research into a scalable platform that we have tuned and continuously expand.
One very sensitive area would be the potential security problems in existing software that powers vital infrastructure like airports, energy plants, hospitals, defence systems etc. How could your software help to prevent such problems and are there already plans to specifically provide solutions in that area?
In January 2019 we introduced a dedicated Security analysis for Python. This novel analysis builds on top of our platform, which has already proven able to find security vulnerabilities that no existing systems can find. Our platform automatically reports how many of those exist in the open source community, and the initial version identified more than 35’000 such critical security vulnerabilities from the OWASP Top Ten list.
You are dependent on good training code, for example from good programmers or open source code. Do they share in any form in the revenue you generate with your solution?
The way we give back and enrich the development community is by offering our services free of charge to the open source development community. Any public repository can seamlessly subscribe to our service within seconds.
«The way we give back and enrich the development community is by offering our services free of charge to the open source development community.»
Is there any scenario in the near future in which you could train programming robots instead of helping programmers improve their code, and therefore get rid of faulty coders for good?
Coding is a combination of engineering rigor and artistic creativity. Augmented Intelligence will help developers in many ways, but we, and the state of human knowledge, are far from replacing developers. Sure, there will be very specific areas where we can generate code automatically, but those are, and will be for quite some time, very targeted to a specific domain and limited in their scope. Those capabilities allow developers to deliver faster and better solutions and to work on the next abstraction level of coding. DeepCode aims to release Automatic Code Fixing in 2019 for 60% of the suggestions we provide; this will be the first example of targeted code generation.
One of the main problems programmers face when trying to understand code written by someone else is the often poor, outdated or non-existent documentation of the code. How does your platform deal with that?
Today we do not have a user-facing solution for that. Internally we already label with keywords what each piece of the code does. There are ongoing projects to further that and offer a more comprehensive way to document and explain what code does. We actually had a big customer asking for that already, but exposing our proprietary internal representation is not strategically feasible at the moment.
Currently you support Java, C, C++, JavaScript and Python. How much of your solution is language dependent and how much effort is necessary to add another language?
The only language-specific pieces in our platform are the parsing of a language and the training on a language-specific Big Code data set. Today we can deliver a new language in less than 2 months; this can be shortened if we put two or more engineers to work on a new language at the same time.
«Today we can deliver a new language in less than 2 months.»
As an ETH Spin-Off you are headquartered in Zurich. How important is the proximity to the ETH for the further development and where do you see the biggest opportunities to grow your business in the near future?
ETH is a main source of talent when we talk about our platform. ETH Zurich has a leading role and tradition in the space of Program Analysis and learning from Big Code, and these advantages are still quite visible in the global talent pool. In terms of the less differentiated parts of our technology stack, like the product, UX/UI, front end, integrations, analytics, and DevOps, the global market is well distributed. From a user base perspective, the global community of 30 million developers is also very well distributed around the world. Every single developer can already reap considerable benefits from DeepCode today, and we are working hard to expand the WOW factor rapidly.
You already employ a team of highly specialised IT experts. Where will you look for the talent to fuel the future growth of DeepCode?
Anywhere talent exists. Our team is already very diverse, representing Switzerland, Germany, Bulgaria, Russia, Moldova, Canada, USA, Taiwan, Romania, Spain and The Netherlands.
Which technological developments do you expect to have the biggest influence on your business?
It is hard to predict, but likely candidates are AI 3.0, novel ML (Machine Learning) representations, advances in Deep Learning and growth in processing power (Quantum Computing). In general, we are fueled by the growing demands of the ever-expanding software community, which in turn is in high demand from every single sphere of modern society. Any business and its technology growth therefore fuels demand for DeepCode’s unique software platform with our current and future services.
At the end of the interview, you are granted two wishes. What are these?
I wish I could glance into the future, 10 years from now, and see the impact that DeepCode will have made in shaping the world: alleviating Software Development as a key bottleneck for solving business, societal and global-scale problems.
The second one is that we are able to continue motivating our team and attracting the talent we need, so we can keep on delivering the steps that enable our revolutionary vision.
Boris Paskalev, Co-founder and CEO of DeepCode:
Boris Paskalev has more than 15 years of industry experience in growing technology startups to above 1 billion USD, business/product development, R&D, global team management and lean operations. He has an executive MBA from TRIUM as well as an MSc and a BSc from MIT.
DeepCode:
Zurich-based DeepCode provides a platform for analyzing and improving code. The platform uses Artificial Intelligence and Machine Learning techniques to read public and private GitHub repositories and tells programmers how to fix problems, remain compatible and generally improve their programs.
This interview was made possible with the support of swissICT and the Digital Economy Award.
DeepCode was the winner of the «Next Global Hot Thing»-Award of 2018.