In June 2020, OpenAI released version 3 of its Generative Pre-trained Transformer (GPT-3), a natural language transformer that took the tech world by storm with its uncanny ability to generate text seemingly written by humans. But GPT-3 was also trained on computer code, and recently OpenAI released a specialized version of its engine, named Codex, tailored to help - or perhaps even replace - computer programmers.

In a series of blog posts, we explore different aspects of Codex and assess its capabilities with a focus on the security aspects that affect not only regular developers but also malicious users. This is the third part of the series. (Read the first and second parts here and here.)

Being able to automate tasks or programmatically execute them unsupervised is an essential part of both regular and malicious computer usage, so we wondered if a tool like Codex was reliable enough to be scripted and left to run unsupervised, generating the required code.

As it turned out, one could not step into the same river twice: It was immediately apparent that Codex is not a deterministic system, nor a predictable one. This means that the results are not necessarily repeatable. By its very nature, the massive neural network behind GPT-3 and Codex is a black box, the inner workings of which are tuned by feeding it a huge set of training texts from which it "learns" the statistical relationships between words and symbols that ultimately constitute a faithful imitation of users' natural languages. This has several consequences that users should keep in mind while interacting with GPT-3 in general or Codex in particular, such as:

  • Since it is a natural language transformer, all interactions with the system happen in natural language. This is also known as "prompt-based programming" and it basically means that the output of the transformer heavily depends on how the input question is formulated. Even slight variations on what is seemingly the same question can lead to massively different results.
  • Among these varying results, empty output or plain gibberish can also occur, as we experienced especially during our first attempts.
  • Whenever this happens, there is really no indication of a discernible reason as to why the system decided to respond with noise rather than a coherent result.
Figure 1. The same question, asked at different times, leading to dramatically different results

In the two screenshots above, the same question ("generate a list of ani alu") was asked, but the results were completely different. One was just a long sequence of spaces, while the other was legitimate code. No other parameters were changed. (The user input is highlighted in red.)

In another example, we can appreciate the stochastic - that is, random - nature of the system by looking at how two subsequent and apparently identical requests lead to different pieces of code being generated. Only the most attentive reader might spot a space too many in the request prompt.

Figure 2. Two queries that differ only by one space

Essentially the same query ("python code get password router") was used in both cases, except that the latter case had an extra space. (The input fields are highlighted in red.)

When interacting with Codex manually, this behavior is not a major problem, and the workaround is to iterate and simply attempt to formulate the prompt differently. However, this makes it very difficult, if not impossible, to use the language transformer programmatically. Imagine writing a script to perform many requests to Codex to generate, for example, a set of code snippets in an unsupervised manner: One would need some logic dedicated to detecting and fixing or discarding any garbled response.
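As a rough illustration of what that supervision logic might look like, the sketch below wraps each request in a retry loop and discards responses that fail a crude quality check. It uses the pre-1.0 openai Python client that was current when this was written; the engine name, the heuristic threshold, and the helper names are our own assumptions, not an official recipe.

```python
from typing import Optional

import openai  # pre-1.0 OpenAI Python client, as available at the time of writing

openai.api_key = "YOUR_API_KEY"  # placeholder; supply your own key


def looks_garbled(text: str) -> bool:
    """Crude, hypothetical quality check: treat empty output, or output made up
    mostly of whitespace and punctuation rather than code-like tokens, as noise."""
    stripped = text.strip()
    if not stripped:
        return True
    alnum = sum(ch.isalnum() for ch in stripped)
    return alnum / len(stripped) < 0.3  # threshold chosen arbitrarily for illustration


def generate_snippet(prompt: str, retries: int = 3) -> Optional[str]:
    """Request a completion and retry when the output looks garbled."""
    for _ in range(retries):
        response = openai.Completion.create(
            engine="davinci-codex",  # Codex engine name of the era; an assumption here
            prompt=prompt,
            max_tokens=256,
            temperature=0.2,
        )
        text = response["choices"][0]["text"]
        if not looks_garbled(text):
            return text
    return None  # give up and leave this prompt for a human to review


snippet = generate_snippet("# Python code to list all files in a directory\n")
print(snippet if snippet else "No usable completion after retries")
```

Note that a check like this only catches obvious noise; a completion that is syntactically plausible but wrong would still slip through, which is precisely why removing the human from the loop is so difficult.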

Another realization that arose from our various attempts at generating code is that, contrary to a popular misconception, Codex does not behave like a search engine for code. Instead, it tries to play an ad-lib game with the user, aiming to complete whatever input comment is provided with the code that in its "experience" would "go well" with the input prompt. The question it tries to answer is not the one the user asked in the comment itself, and the input should not be treated as such. Rather, the question Codex tries to answer is, "What (code) should I write to best finish the paragraph, given such a beginning?" It is a subtle but important difference that can lead to dramatically different results, as shown in the examples below.

Figure 3. A different formulation of the same request leading to dramatically different results

The query used here was "list soafee". (The inputs are highlighted in red.) These examples show how a small variation in what was asked, merely giving a more descriptive prompt, led to an actual result rather than an empty output.
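To make the difference concrete, the short sketch below contrasts a search-style query with a prompt phrased as the beginning of the very "paragraph" Codex is being asked to finish. The descriptive prompt is hypothetical (it is not the exact wording from our tests), and either string could be passed to a wrapper such as the generate_snippet helper sketched earlier.

```python
# A terse, search-engine-style prompt (the one quoted above) often yields an
# empty or garbled completion:
terse_prompt = "list soafee"

# A hypothetical, more descriptive prompt reads like the start of a program,
# giving Codex a "paragraph" to finish rather than a query to answer:
descriptive_prompt = (
    "# Python script that builds a list of items and prints each one\n"
    "items = "
)
```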

In the end, trying to automate Codex to perform repeated tasks, unsupervised, very often implies having to check the output and filter out all garbled responses. For many types of projects, whether they are malicious or not, this task of filtering and fixing the response might very well end up being more labor-intensive than, say, resorting to a more traditional solution to achieve the same end result. This makes Codex a difficult choice when constant human supervision cannot be guaranteed.

