DARPA wants to accelerate translation of C code to Rust – and it’s relying on AI to do it


The American Defense Advanced Research Projects Agency (DARPA) wants to use AI to automate the translation of legacy C code to Rust to avoid known memory safety vulnerabilities.   

The C programming language dates from the 1970s, but remains ubiquitous on everything from “smartphones to space vehicles”, DARPA noted.

That's in part because of the popularity of C and C++, but also because of the continued use of legacy systems powered by those programming languages — and DARPA admits that includes systems running at the Department of Defense.

That's a problem for the industry because C and C++ have serious memory safety flaws. First, programmers can manipulate memory directly, making it easy to introduce accidental errors that corrupt memory.

Second, the language leaves many behaviors undefined, so what happens after an unexpected operation is not specified. Accidental errors can therefore introduce serious bugs, and hackers can exploit the same flaws for malicious activity.

DARPA said it saw consensus among the software engineering community that bug-finding tools weren't enough to address these issues. Instead, legacy code needs to be upgraded to programming languages like Rust that don't feature these vulnerabilities.

Memory safety vulnerabilities are the most common subset of disclosed software flaws, according to the US Cybersecurity and Infrastructure Security Agency, a fact that led the US National Security Agency to advise organizations to ditch C and C++ in favor of Rust, Java, or Swift.

Consumer Reports said that as many as seven in ten browser and kernel vulnerabilities can be pinned on C and C++. It called for companies and organizations to shift to safer programming languages and for educators to drop C in favor of safer, modern languages, and acknowledged that regulatory action may be required.

DARPA's rescue TRACTOR looks to fix up code safety

To help, DARPA has unveiled an initiative called TRACTOR, short for Translating All C to Rust. The aim is to take advantage of AI, in particular large language models (LLMs), but also static and dynamic analysis, to automate the translation process as much as possible.

"You can go to any of the LLM websites, start chatting with one of the AI chatbots, and all you need to say is ‘here's some C code, please translate it to safe idiomatic Rust code,’ cut, paste, and something comes out, and it's often very good, but not always," said Dr. Dan Wallach, DARPA program manager for TRACTOR.

"The research challenge is to dramatically improve the automated translation from C to Rust, particularly for program constructs with the most relevance."
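To give a sense of what "safe idiomatic Rust" can mean in practice, here is a minimal hand-written sketch. It is not output from TRACTOR or from any LLM, and the C function in the comment and the Rust function `sum` are hypothetical examples chosen for illustration.

```rust
// Hypothetical C original, shown only as a comment:
//
//   int sum(const int *values, size_t len) {
//       int total = 0;
//       for (size_t i = 0; i < len; i++) total += values[i];
//       return total;
//   }
//
// The C version trusts the caller to pass a `len` that matches the buffer.
// An idiomatic Rust version takes a slice, which carries its own length,
// so the function cannot read past the end of the data.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("{}", sum(&data));
}
```

A purely mechanical translation could instead keep the raw pointer and length and wrap the loop in an `unsafe` block, which would compile but would carry over the original risk. Moving from that style to the slice-based style is one example of the gap between "translated" and "idiomatic".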

DARPA is accepting proposals for how best to achieve this, and will hold public competitions to test LLM-powered solutions.

Why Rust?

Modern programming languages such as Rust handle memory more safely, "thereby eliminating the entire class of memory safety security vulnerabilities in C programs," DARPA said.
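As a rough illustration of that claim, the sketch below shows how safe Rust treats an out-of-range read, which in C could silently return whatever sits next to the buffer in memory. The function name `read_at` and the sample buffer are assumptions made for this example, and the guarantee applies to safe Rust only; code inside `unsafe` blocks can still get memory wrong.

```rust
/// Returns the byte at `index`, or `None` if the index is out of range.
/// `slice::get` performs the bounds check, so no adjacent memory is ever read.
fn read_at(buffer: &[u8], index: usize) -> Option<u8> {
    buffer.get(index).copied()
}

fn main() {
    let buffer = [10u8, 20, 30];

    // In range: the value is returned.
    println!("{:?}", read_at(&buffer, 1)); // prints Some(20)

    // Out of range: the caller gets `None` instead of stray memory.
    println!("{:?}", read_at(&buffer, 7)); // prints None
}
```

Indexing with `buffer[7]` would also be checked, but it would stop the program with a panic rather than return a value; returning `Option` simply makes the failure case explicit for the caller.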

Microsoft Azure CTO Mark Russinovich said Rust should be the successor to C, and the language has been adopted by developers at myriad organizations globally in recent years.

"Rust forces the programmer to get things right," Wallach said. "It can feel constraining to deal with all the rules it forces, but when you acclimate to them, the rules give you freedom. They're like guardrails; once you realize they're there to protect you, you'll become free to focus on more important things."

Beyond safety, it's proven popular with developers, winning "most loved" programming language in Stack Overflow surveys.

On the downside, hackers are already turning to such coding languages. Research from BlackBerry in 2021 found that threat actors were turning to languages such as Rust when coding malware in a bid to avoid detection.

The study noted that security professionals were increasingly seeing malware strains written in Rust, as well as other ‘exotic’ languages such as Go, Nim, and DLang.