Writing | Amazon (Nova) | published May 14, 2026

Promptimus: Improving already good LLM prompts with zero manual engineering

By focusing on specific failure points and suggesting targeted solutions, a new automated prompt-engineering framework improves prompt performance without compromising existing functionality.

By Zhengyuan Shen, Yunfei Bai, Sullam Jeoung, Shuai Wang

May 14, 2026

16 min read

Overview by Amazon Nova

Promptimus is an automated method for optimizing well-developed prompts for large language models (LLMs), designed to improve performance without manual engineering. It works through a four-step iteration loop: evaluation, feedback generation, strategy and edit generation, and candidate evaluation. It offers a standard mode and an edit mode, chosen according to the prompt's complexity. Promptimus achieves the best results on 16 of 20 benchmarks, outperforming six leading automatic prompt optimization methods, and demonstrates sample efficiency and model-agnostic generalizability across various LLMs and enterprise tasks.
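The four-step loop named in the overview can be pictured with a minimal Python sketch. This is not the Promptimus implementation: every name here (`evaluate`, `optimize`, `toy_model`, `toy_feedback`, `toy_propose`) is hypothetical, the LLM-backed steps are replaced by toy stand-ins so the example runs on its own, and the rule of accepting a candidate only when its score strictly improves is an assumption about how existing behavior might be protected. The standard and edit modes are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (input text, expected output)


@dataclass
class Candidate:
    prompt: str
    score: float


def evaluate(prompt: str, samples: List[Sample],
             run_model: Callable[[str, str], str],
             metric: Callable[[str, str], float]) -> Tuple[float, List[Sample]]:
    """Steps 1 and 4: score a prompt and collect the samples it fails on."""
    failures, total = [], 0.0
    for text, expected in samples:
        score = metric(run_model(prompt, text), expected)
        total += score
        if score < 1.0:
            failures.append((text, expected))
    return total / len(samples), failures


def optimize(prompt, samples, run_model, metric,
             give_feedback, propose_edits, max_iters=3) -> Candidate:
    score, failures = evaluate(prompt, samples, run_model, metric)   # step 1
    best = Candidate(prompt, score)
    for _ in range(max_iters):
        if not failures:
            break
        feedback = give_feedback(best.prompt, failures)              # step 2
        candidates = propose_edits(best.prompt, feedback)            # step 3
        improved = False
        for cand in candidates:                                      # step 4
            cand_score, cand_failures = evaluate(cand, samples, run_model, metric)
            if cand_score > best.score:  # assumption: keep strict improvements only
                best, failures, improved = Candidate(cand, cand_score), cand_failures, True
        if not improved:
            break
    return best


# Toy stand-ins for the LLM-backed components (all hypothetical).
def toy_model(prompt: str, text: str) -> str:
    return text.upper() if "uppercase" in prompt.lower() else text


def exact_match(output: str, expected: str) -> float:
    return 1.0 if output == expected else 0.0


def toy_feedback(prompt: str, failures: List[Sample]) -> str:
    return f"{len(failures)} sample(s) failed: outputs were not uppercased."


def toy_propose(prompt: str, feedback: str) -> List[str]:
    return [prompt + " Answer in uppercase.", prompt + " Be concise."]


samples = [("ok", "OK"), ("yes", "YES")]
result = optimize("Echo the input.", samples, toy_model, exact_match,
                  toy_feedback, toy_propose)
print(result)
```

In a real setting, `toy_model`, `toy_feedback`, and `toy_propose` would each be backed by LLM calls; the loop structure is the only part this sketch is meant to convey.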

Large language models (LLMs) have become integral to enterprise applications across industries. Under the hood, customers’ inputs to the models are usually augmented with prompts that encode intricate business logic, regulatory requirements, and domain expertise: a healthcare system must use language compliant with the Health Insurance Portability and Accountability Act, for instance, and a financial trading system must follow risk tolerance rules. These prompts are typically crafted by domain experts over weeks or months. Yet business demands continue to push for further performance gains. The challenge, therefore, is not engineering prompts from scratch but rather elevating already strong performance by discovering nuanced, task-specific refinements without compromising domain requirements. In this post, we present Promptimus, a method for automatically optimizing well-developed prompts that has several advantages over its predecessors:

It's model agnostic: It takes a prompt already optimized for a source model, rapidly reoptimizes it for a target model, and compares the optimized prompts across models.

It's driven by performance criteria: It takes the existing prompt template, task-specific data samples, and user-defined performance metrics and generates targeted improvement strategies, iterating repeatedly to achieve domain-specific optimization objectives.

It focuses on exploits: It uses a metric-analyzer AI agent to identify failure points and a debugging helper agent to identify root causes, and it surgically refines prompts relative to failures (rather than along random dimensions) for targeted performance improvement.

It’s fully automated: It analyzes user-defined metrics and uses a code sanitization AI agent to generate debugging checkpoints automatically. Metric functions can be imported as Python code, and…
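To make the idea of a user-defined metric supplied as Python code more concrete, here is a small illustrative sketch. The function names (`token_f1`, `failure_points`), the record layout, and the 0.8 threshold are all assumptions made for this example; the excerpt does not specify the interface Promptimus expects, and the real system is described as using an AI agent, not a fixed threshold, to identify failure points.

```python
from collections import Counter
from typing import Callable, Dict, List


def token_f1(prediction: str, reference: str) -> float:
    """Hypothetical user-defined metric: token-level F1 between two strings."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)  # 1.0 only when both are empty
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def failure_points(records: List[Dict], metric: Callable[[str, str], float],
                   threshold: float = 0.8) -> List[Dict]:
    """Score each record and return those below the threshold, worst first."""
    scored = [dict(r, score=metric(r["prediction"], r["reference"])) for r in records]
    return sorted((r for r in scored if r["score"] < threshold),
                  key=lambda r: r["score"])


# Made-up records standing in for model outputs on task-specific data samples.
records = [
    {"id": 1, "prediction": "refund approved", "reference": "refund approved"},
    {"id": 2, "prediction": "refund denied today", "reference": "refund approved"},
    {"id": 3, "prediction": "escalate", "reference": "refund approved"},
]

failures = failure_points(records, token_f1)
for row in failures:
    print(row["id"], round(row["score"], 2))
```

Keeping the metric as a plain function of `(prediction, reference)` means the same scoring code can rank failures for analysis and score candidate prompts during optimization.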
