Bot detection and mitigation is the hottest market in our industry. Managing sophisticated, human-like bots is becoming more difficult: attackers now use advanced techniques that imitate human browser activity and bypass the traditional fingerprinting technologies on the market. One revolutionary company working to prevent these kinds of attacks is Unbotify, a bot detection company that uses behavioral biometrics and machine learning to keep bad bot traffic off enterprise websites. We recently had the opportunity to talk with Yaron Oliker, CEO, and Coby Fernandess, Chief Data Scientist of Unbotify, about the bot landscape and their innovative techniques for bot detection.
Is Unbotify a pure play bot mitigation company or something else?
Yaron: Unbotify is all about bot detection. We’re answering one question: are users bots or humans? And the way we detect that is through behavioral biometrics.
Coby: We’re doing behavioral biometrics, which means extensive analysis of data, whatever that data is ultimately used for: bots, transaction fraud, and so on. The conclusions come from data analysis, judged by accuracy, models, and performance. The revolution we bring is in the way we quantify this to make businesses better. That’s our core technology and the added value we bring.
Do you provide client-side software that customers install on their end?
Coby: We have a JavaScript client that listens to the interaction events users generate, whether they move the mouse, type on the keyboard, or use a touch screen. We send these interactions back to our data center (usually AWS), where they are stored and preprocessed. For each interaction, we generate a set of features that characterize it. These features are then analyzed with machine learning tools to determine whether the interaction was generated by a bot or a human. So essentially we build a classification mechanism for each company, dividing users into humans or bots according to the way they behave on the customer’s site. That classification mechanism is very valuable to the market today. That’s the million-dollar question.
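To make the pipeline Coby describes a little more concrete, here is a rough Python sketch of the server-side steps: raw interaction events arrive from the client, are collapsed into per-session features, and a trained classifier labels the session. The event schema, feature choices, and `classify` helper are illustrative assumptions, not Unbotify’s actual implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InteractionEvent:
    session_id: str
    timestamp_ms: int   # when the event fired on the client
    kind: str           # "mousemove", "keydown", "touchstart", ...
    x: float = 0.0      # pointer position, when applicable
    y: float = 0.0

def featurize(events: List[InteractionEvent]) -> List[float]:
    """Collapse one session's raw events into a fixed-length feature vector."""
    moves = [e for e in events if e.kind == "mousemove"]
    keys = [e for e in events if e.kind == "keydown"]
    gaps = [b.timestamp_ms - a.timestamp_ms for a, b in zip(moves, moves[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    # Hypothetical features: event counts plus mean time between mouse moves.
    return [float(len(moves)), float(len(keys)), mean_gap]

def classify(features: List[float], model) -> str:
    """`model` stands in for whatever trained classifier the vendor actually uses."""
    return "bot" if model.predict([features])[0] == 1 else "human"
```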
What is going on in the world of bot attacks?
Yaron: The bot problem we have today is that over 60% of all web traffic is non-human, per Incapsula’s studies. Half of that is good bots, such as search engines or services monitoring uptime. The other half is there for nefarious purposes, like ad fraud, which is probably the biggest cybercrime activity in the world right now. Some studies have shown that one out of every three dollars spent on digital advertising today is wasted on bots faking clicks, ad impressions, lead form completions, and so on.
These are all techniques fraudsters use to siphon ad dollars out of the ecosystem. Bots can also launch layer 7 (application-layer) DDoS attacks, so you have to discern that traffic from real traffic. Bots are also taking over user accounts. They take advantage of the fact that people tend to reuse their passwords across different web services, so when one site is breached, the stolen credentials are traded on the dark web and bots try them on many other websites. The success rate of such attacks is typically around 1-2%, which can lead to massive account takeover and fraud losses.
With the first bot attacks going back to the early 90s, websites have always had some protection against them. Techniques such as IP reputation and IP rate limiting have been around for quite a few years, but these methods are being rendered ineffective by two main developments. The first is IoT botnets like Mirai, where hackers take control of hundreds of thousands of IoT devices on residential IPs. The other is P2P proxy networks, which have millions of users, so attackers can effectively rotate the IP of their bots and render IP-based methods useless.
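For a sense of why IP rotation is so damaging, here is a minimal sketch of the kind of per-IP rate limiting Yaron mentions (the threshold and names are made up): each IP gets its own counter, so a botnet that spreads requests across hundreds of thousands of residential IPs never trips any single one.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100   # hypothetical threshold

_hits = defaultdict(list)       # ip -> timestamps of recent requests

def allow(ip: str) -> bool:
    """Classic sliding-window rate limit keyed on the client IP."""
    now = time.time()
    recent = [t for t in _hits[ip] if now - t < WINDOW_SECONDS]
    _hits[ip] = recent + [now]
    return len(recent) < MAX_REQUESTS_PER_WINDOW

# A botnet rotating across, say, 100,000 residential IPs can send one request
# per IP per minute and never cross this limit, despite huge aggregate volume.
```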
How does Unbotify compare to the competition?
Yaron: Right now, there are a few companies that provide browser fingerprinting, which is a technique that leverages JavaScript and other client-side technologies to detect what’s going on in a browser. They can see plugin information, fonts, etc., to gather a lot of information about the browser. This data is used in two main ways for bot detection. The first is browser integrity checks: trying to see whether the browser looks legitimate, or whether it’s a script or a headless browser trying to disguise itself as Chrome, for example. The techniques for that are JavaScript challenges that only a certain browser will answer in a certain way, along with other integrity checks. The second way is to compile a browser fingerprint that is supposedly stickier than an IP and harder to change, so you can blacklist these fingerprints rather than IPs.
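A toy sketch of that second technique, under the assumption that the fingerprint is simply a hash of whatever attributes the browser reports (the attribute names and blacklist are illustrative, not any vendor’s real schema):

```python
import hashlib
import json

def browser_fingerprint(attrs: dict) -> str:
    """attrs might hold the user agent, plugin list, installed fonts, screen size..."""
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

BLACKLIST = set()   # fingerprints previously observed doing bad things

def is_blocked(attrs: dict) -> bool:
    return browser_fingerprint(attrs) in BLACKLIST
```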
But today, neither approach is effective anymore, because attackers can spoof this data in a completely valid way for each session separately, so there’s actually no way to tie these requests to the same source. There’s also polymorphism, which can confuse lower-level bots by manipulating the underlying HTML of web pages, but smarter bots today use a full-stack browser to render web pages, move the mouse to the element they are attacking, click, and enter text. All the previous solutions fall short of preventing this type of attack, and bot developers who use automated browsers and know how to hide their tracks bypass them effectively.
Our system relies on completely different data points for bot detection – instead of fingerprinting browsers and devices, we are focused on characterizing the way human users interact with their devices, at the biometric level. This is the data point that is the hardest to spoof with automation. This enables us to address the most advanced types of attacks with far better accuracy. We raise the bar for hackers significantly.
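To illustrate what characterizing interaction “at the biometric level” could look like in practice, here are two example signals one might compute from raw mouse traces. These are plausible illustrative features, not Unbotify’s actual feature set: human pointer paths tend to be curved and irregularly timed, while scripted ones are often perfectly straight or replayed with unnaturally uniform gaps.

```python
import math
from statistics import pstdev

def path_straightness(points):
    """Ratio of straight-line distance to actual path length (1.0 = perfectly straight)."""
    if len(points) < 2:
        return 1.0
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    direct = math.dist(points[0], points[-1])
    return direct / path if path else 1.0

def timing_jitter(timestamps_ms):
    """Spread of the gaps between consecutive events; values near zero look robotic."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return pstdev(gaps) if len(gaps) > 1 else 0.0
```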
Regarding ad fraud, does bot detection technology need to understand the entire ad workflow to mitigate bots?
Coby: The key question is bot detection. You can’t always understand the entire flow; all you need to do is stop it. Once you detect the bot, you can reveal the fraud and prevent it.
Yaron: You can’t always tell what the intention of the bot is initially, but you can detect it. What Unbotify is the best in the world at doing is figuring out what is and isn’t automation. After we figure that out, telling whether the attack is DDoS, ad fraud, or scraping is trickier. It can be hard to reveal intent sometimes.
Does your platform capture visitor data and then create a fingerprint?
Yaron: Yes. We capture data on every visitor of a site. The way we typically work to analyze that data is to start with a set of visitors we are certain are human (such as someone who paid for a service) and build our machine learning models around that. We then improve and extend our models constantly to achieve better accuracy.
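One hedged way to read “start with visitors we are certain are human and build our models around that” is as one-class learning: fit a model on confirmed-human feature vectors only, and flag sessions that don’t resemble them. Unbotify’s actual modelling isn’t public, so the sketch below (with placeholder data and scikit-learn’s `OneClassSVM`) is only meant to make the idea concrete.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Feature vectors (e.g. from the featurize() sketch above) for sessions we
# trust were human, such as visitors who completed a verified purchase.
confirmed_human_features = np.random.rand(500, 3)   # placeholder data

detector = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale")
detector.fit(confirmed_human_features)

new_sessions = np.random.rand(10, 3)                # placeholder data
labels = detector.predict(new_sessions)             # +1 = looks human, -1 = suspicious
```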
Are you building fingerprints for each person who visits a site?
Yaron: No. The fingerprint is not unique per user; it characterizes the way human visitors behave on the site. This is a completely different concept than fingerprinting specific visitors. The platform must ingest a large volume of data to handle this kind of activity, and we have a massive amount of data coming into our system every day since our current customers are large. We use advanced infrastructure technologies like Apache Spark clusters and other cluster computing tools to process the data. It’s not easy to build that kind of infrastructure in a scalable way, but we have done it very successfully.
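For a sense of what processing this data with Apache Spark clusters might involve, here is a small PySpark sketch that groups raw interaction events by session and computes per-session aggregates. The column names and storage paths are hypothetical, not Unbotify’s pipeline.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("interaction-features").getOrCreate()

# Hypothetical location of the raw interaction events.
events = spark.read.parquet("s3://example-bucket/interaction-events/")

# Aggregate raw events into one feature row per session.
session_features = (
    events
    .groupBy("session_id")
    .agg(
        F.count("*").alias("event_count"),
        F.countDistinct("event_type").alias("distinct_event_types"),
        (F.max("timestamp_ms") - F.min("timestamp_ms")).alias("session_duration_ms"),
    )
)

session_features.write.mode("overwrite").parquet("s3://example-bucket/session-features/")
```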
Do you provide a reporting dashboard?
Coby: Of course. It is important to give a complete view of the bots on your site. Keep in mind, though, that the dashboard is only a visualization tool; our core technology is about classifying traffic. That is what actually saves customers money previously lost to fraud. For more information, reach out to us here.