Gary McGraw on Building Secure AI Systems and His 20-Year Battle to Improve Software Security
Editor’s note: This post is the latest in Security Outliers, a series of interviews with people who are tackling big security problems while questioning the status quo.
Ten years ago Marc Andreessen famously said, “Software is eating the world.” Today, it’s safe to say that software is wreaking havoc on the world, too. Software supply chain attacks like SolarWinds and NotPetya and the countless data breaches attributed to unpatched vulnerabilities all point to a serious problem with the building blocks for our digital age. But none of this is news to Gary McGraw, who has been a bit of a Cassandra about the need to build security into software for many years.
I interviewed Gary back in 2010 when he was CTO at Cigital and had just released a study called the Building Security In Maturity Model (BSIMM). The research looked at the software engineering practices at 30 of the biggest tech, financial services, and other companies and compared how secure their code was. A lot has changed since then: BSIMM10 now reports on 122 companies, Cigital was acquired by Synopsys in 2016, and Gary has moved on to explore the security implications of the next big frontier: Artificial Intelligence and Machine Learning. He’s semi-retired now but still banging the drum on software security. When he’s not playing fiddle with his band (The Bitter Liberals), blogging about bad hotel bathrooms, or quoting Frank Zappa at Shmoocon, he’s tackling AI development as co-founder of the Berryville Institute of Machine Learning (BIML). I recently caught up with Gary to chat about the state of the software world.
You co-authored the first book on software security 21 years ago. Do you ever think “Why the hell aren’t they reading my book!?”
They are reading it; that’s the good news. The first book I wrote with John Viega, Building Secure Software, was a kind of philosophy book. All these problems turned out to come down to bad software, so I went on to write a bunch of books about how to build secure code. We’ve made a heck of a lot of progress in the last 20 years, but we haven’t yet been able to make everyone do it. That is, we do know what to do, but not everybody is doing it. I’d like to see the ideas behind secure software in more widespread use than they are now.
Companies that know how to build more secure systems will survive and those that don’t will go the way of the dinosaur.
Should we make building secure code mandatory?
Regulations are always tricky. The government lags behind advanced corporations in software security, and you don’t want people leading from the rear. Generally speaking, the regulations would reflect the state of security from a decade or two ago. I don’t think that’s the answer. It’s clear the market by itself hasn’t come up with an appropriate answer either, although many companies have changed their behavior and are trying to build better stuff. If we educate people that software doesn’t have to suck and that systems can be secure if you build them right, that will change the way demand works, and you can use our capitalist system to attain secure software. I also believe companies that know how to build more secure systems will survive and those that don’t will go the way of the dinosaur.
Why aren’t companies doing more to secure their software?
First off, it’s hard. Also, the incentives are a little screwed up: when software is built into a product, it’s not clear that the software is the problem when something bad happens. For example, look at cars. You buy a car based on the brand, its looks, and its performance, but you don’t think about all the software that goes into it. That will change as cars become even more obviously computers on wheels. We have made progress, but spotty progress, and we’re in danger of having setbacks if we don’t keep our eyes on the ball.
I know the Berryville Institute of Machine Learning recently got a $150,000 grant from Open Philanthropy. Congratulations. Tell me what you’re up to at BIML.
Berryville is this tiny speck of a town in a speck of a county in Virginia where we’re doing world-class work in security, mostly focusing on risks that are built into today’s machine learning (ML) models. We published a report that focuses on 78 important risks developers have to keep in mind when designing or adopting ML systems. Right now, we’re taking a look at open source software for ML and trying to integrate runtime and memory protection into some of those systems. We’re focusing on PyTorch (open source ML library), which a lot of people use, and working on developing hands-on training courses. We also want to do a video series of talks with experts in the ML security field, called “Machine Learning in the Barn” (referring to the timber frame barn I built on my property), and include an interactive Q&A from BIML experts.
One of the issues with ML is when you train up a system and it seems to work, you might not understand how it’s doing what it’s doing. If you aren’t able to explain why it’s solving the problem then there are risks that are built in.
So what issues are you seeing crop up as a result of insecure ML? What types of attacks are happening?
One of the issues with ML is that when you train up a system and it seems to work, you might not understand how it’s doing what it’s doing. If you aren’t able to explain why it’s solving the problem, then there are risks that are built in. There have been a few real-world attacks, nothing really spectacular yet. But we’re not in an operational arms race against hackers in ML yet. I see it more as focusing important attention on getting the architecture right and making sure we understand the security implications of what we’re building, rather than building it and seeing what happens.
The type of thing that could happen is that an ML system may slip out from under the constraints you think you’re training it on. A good real-world example is an ML system that was supposed to be trained to discriminate between wolves and dogs. Designers gave it a set of pictures and said, “this is a picture of a wolf, this is a picture of a dog,” for thousands of examples. And it came up with a way of categorizing other pictures of wolves and dogs (and it seemed to be good at this categorization). But actually, every picture of a wolf happened to have a little bit of snow in it and every picture of a dog had none, so if you went on to show it a picture with snow, the ML system would always say “wolf.” Basically, what they actually built was a snow detector. That’s a perfect example of what happens in ML: systems do what we said, but not what we intended.
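To make the snow-detector point concrete, here’s a minimal sketch of the same failure mode. Everything below is invented for illustration (it’s not from the wolf/dog study): a toy dataset where a “snow” feature perfectly correlates with the “wolf” label, and a simple logistic regression that latches onto it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "wolf vs. dog" dataset: each image reduced to two features.
# Feature 0: animal shape (genuinely informative, but noisy).
# Feature 1: presence of snow (spuriously perfect in the training set).
n = 200
labels = rng.integers(0, 2, n)             # 1 = wolf, 0 = dog
shape = labels + rng.normal(0, 1.5, n)     # weak real signal
snow = labels.astype(float)                # snow appears in every wolf photo
X = np.column_stack([shape, snow])

# Fit logistic regression with plain gradient descent.
w = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - labels) / n

print("learned weights [shape, snow]:", w)

# A "dog standing in snow": dog-like shape, but snow is present.
dog_in_snow = np.array([0.0, 1.0])
p_wolf = 1 / (1 + np.exp(-dog_in_snow @ w))
print("P(wolf | dog in snow) =", p_wolf)
```

Because snow separates the training set perfectly, the learned snow weight dwarfs the weight on the genuinely animal-related feature, and a dog photographed in snow gets confidently labeled a wolf: exactly the “snow detector” Gary describes.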
What’s a security-related issue that could crop up?
Well, consider an automated driving system that could be made to confuse stop signs and speed limit signs using a few pieces of reflective tape. By studying a machine learning system, an attacker can create “adversarial input data” — changing a stop sign to show a maximum speed limit instead — to cause an algorithm to misbehave.
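The reflective-tape scenario is an instance of the adversarial-example technique. As a hedged illustration (the model, weights, and numbers below are invented, not taken from any real driving system), here is how a tiny, bounded perturbation can flip a linear classifier; the same mechanism that fast-gradient attacks exploit against deep networks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear "sign classifier": score > 0 means "stop sign",
# score <= 0 means "speed limit sign". The weights stand in for a
# trained model.
w = rng.normal(0, 1, 64)              # weights over 64 pixel features
x = 0.25 * w / np.linalg.norm(w)      # an input the model calls "stop"

def predict(img):
    return "stop" if img @ w > 0 else "speed limit"

print(predict(x))                     # "stop"

# Fast-gradient-style adversarial input: move every pixel a tiny,
# bounded amount against the score gradient (which, for a linear
# model, is just the weight vector) -- the digital analog of a few
# strips of reflective tape.
eps = 0.1
x_adv = x - eps * np.sign(w)

print(predict(x_adv))                 # "speed limit"
print(np.abs(x_adv - x).max())        # each pixel changed by only 0.1
```

No single pixel changes by more than 0.1, yet the summed effect across all of them is enough to push the score past the decision boundary and misread the sign.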
This kind of risk has important implications for weapons systems — like, is it viewing a tank or a cat — or providing incorrect data on how fast centrifuges are spinning in a power plant automation monitoring system. We’re also at risk of seeing ML systems that are subjected to extraction attacks where confidential data are stolen, or systems that can be targeted by attackers through poisoning public data sources, or even systems that learn to be biased because the data they train on is biased. This has serious implications for areas like healthcare, self-driving cars, and facial recognition, to name just three domains.
I interviewed only women for a year and didn’t tell anybody I was only inviting accomplished women to participate. At the end of the year, I pointed that out. There are a lot of women contributing in important ways to our field and we should stop ignoring them.
You hosted the Silver Bullet Security podcast for 13 years. Any highlights from that you want to share?
It was fun to do. The most listened-to episode was one with (Cambridge professor and security engineering expert) Ross Anderson; he was by far the most popular guest. Another fun thing I did was ask (security researcher and firewall inventor) Marcus Ranum the exact same nine questions a decade apart, without telling him I was going to do that. I wanted to see how consistent he was, and he was incredibly consistent in his philosophy! Another thing I did took an entire year. I interviewed only women for a year and didn’t tell anybody I was only inviting accomplished women to participate. At the end of the year, I pointed that out. There are a lot of women contributing in important ways to our field and we should stop ignoring them.
Right on! So, what have we not covered that’s on your mind?
One surprising thing about me is that I’m optimistic that we’re making progress in security. A lot of people, when they get to be gray-bearded like me, become very cynical and think we haven’t learned anything and no progress has been made. I don’t think that’s the case. I think security as a field has matured, broadened, and gotten deeper in the 25 years I’ve been doing it. We’ll have future challenges that are hard to think through, like rampant ML adoption. But we also have some pretty good tools to help us work through these issues. I’m very proud of the progress we’ve made in software security in the last 25 years. I remember in 1999 I had to convince people that software security mattered. At the time, people were overly focused on firewalls, bad packets, and hackers getting in through open ports. My arguably “revolutionary” thinking was, “you’ve got broken stuff here and bad people there, and the plan seems only to put a firewall between them…but, um, why not just fix the broken stuff?!”
One final, very off-topic question… I know first-hand what a cocktail connoisseur you are. Can you describe your favorite drink or the most outrageous cocktail you make for our readers?
Ha! There are way too many data to trawl through to answer this correctly. One of my favorite cocktails of all time is the Negroni. It is a classic cocktail with only three ingredients combined in equal parts that even crappy bars can pull off. I like to have “Negroni hour” as often as possible. Another favorite of mine is one I made up, called the “Old 48.” You can find instructions for making an Old 48 on my Noplasticshowers blog.