Promptimus: Improving already good LLM prompts with zero manual engineering
Amazon Science
By focusing on specific failure points and suggesting targeted solutions, a new automated prompt-engineering framework improves prompt performance without compromising existing functionality.
By Zhengyuan Shen, Yunfei Bai, Sullam Jeoung, Shuai Wang
May 14, 2026
Overview by Amazon Nova
Promptimus is an automated method for optimizing well-developed prompts for large language models (LLMs), designed to improve performance without manual engineering. It works through a four-step iteration loop (evaluation, feedback generation, strategy and edit generation, and candidate evaluation), running in either a standard mode or an edit mode depending on the prompt's complexity. Promptimus achieves the best results on 16 of 20 benchmarks, outperforming six leading automatic prompt optimization methods. It also shows sample efficiency and model-agnostic generalizability across various LLMs and enterprise tasks.
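The four-step loop described in the overview can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Promptimus's implementation:

- The names `evaluate`, `optimize`, `toy_model`, `toy_feedback`, and `toy_edits` are hypothetical.
- In practice, the task model, the feedback generator, and the edit generator would each be LLM calls. Here, deterministic stubs stand in for them so the loop runs offline.
- Scores are assumed to lie in [0, 1], with 1.0 meaning a sample passed.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces; in a real system each of these would wrap an LLM call.
TaskModel = Callable[[str, str], str]             # (prompt, input) -> prediction
Metric = Callable[[str, str], float]              # (prediction, reference) -> score in [0, 1]
Failure = Tuple[str, str, str]                    # (input, reference, prediction)
FeedbackFn = Callable[[str, List[Failure]], str]  # (prompt, failures) -> feedback text
EditFn = Callable[[str, str], List[str]]          # (prompt, feedback) -> candidate prompts


def evaluate(prompt: str, model: TaskModel, metric: Metric,
             samples: Sequence[Tuple[str, str]]) -> Tuple[float, List[Failure]]:
    """Score a prompt on (input, reference) pairs and collect the samples it fails."""
    scores, failures = [], []
    for x, ref in samples:
        pred = model(prompt, x)
        score = metric(pred, ref)
        scores.append(score)
        if score < 1.0:  # assumption: 1.0 means the sample fully passed
            failures.append((x, ref, pred))
    return sum(scores) / len(scores), failures


def optimize(prompt: str, model: TaskModel, metric: Metric,
             samples: Sequence[Tuple[str, str]], feedback_fn: FeedbackFn,
             edit_fn: EditFn, iterations: int = 3) -> Tuple[str, float]:
    # Step 1: evaluate the starting prompt.
    best_prompt = prompt
    best_score, failures = evaluate(prompt, model, metric, samples)
    for _ in range(iterations):
        if not failures:
            break  # nothing left to fix
        # Step 2: turn concrete failures into feedback.
        feedback = feedback_fn(best_prompt, failures)
        # Step 3: propose candidate prompts from that feedback.
        for candidate in edit_fn(best_prompt, feedback):
            # Step 4: keep a candidate only if it beats the current best score.
            score, cand_failures = evaluate(candidate, model, metric, samples)
            if score > best_score:
                best_prompt, best_score, failures = candidate, score, cand_failures
    return best_prompt, best_score


# Toy, deterministic stand-ins so the loop runs without an LLM.
def toy_model(prompt: str, x: str) -> str:
    return x.upper() if "uppercase" in prompt.lower() else x


def exact_match(pred: str, ref: str) -> float:
    return 1.0 if pred == ref else 0.0


def toy_feedback(prompt: str, failures: List[Failure]) -> str:
    return f"{len(failures)} outputs were not uppercase."


def toy_edits(prompt: str, feedback: str) -> List[str]:
    # Edit-style change: append one targeted instruction instead of rewriting the prompt.
    return [prompt + " Respond in UPPERCASE."]


samples = [("hi", "HI"), ("ok", "OK")]
best, score = optimize("Echo the input.", toy_model, exact_match, samples,
                       toy_feedback, toy_edits)
print(best, score)
```

The loop accepts a candidate only when it beats the current best score. This is one simple way to avoid regressing on behavior that already works. The toy edit function adds a single targeted instruction rather than regenerating the whole prompt, loosely in the spirit of the edit mode mentioned above.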
Large language models (LLMs) have become integral to enterprise applications across industries. Under the hood, customers’ inputs to the models are usually augmented with prompts that encode intricate business logic, regulatory requirements, and domain expertise: a healthcare system must use language compliant with the Health Insurance Portability and Accountability Act, for instance, and a financial trading system must follow risk tolerance rules. These prompts are typically crafted by domain experts over weeks or months. Yet business demands continue to push for further performance gains. The challenge, therefore, is not engineering prompts from scratch but rather elevating already strong performance by discovering nuanced, task-specific refinements — without compromising domain requirements. In this post, we present Promptimus, a method for automatically optimizing well-developed prompts that has several advantages over its predecessors:
It’s model agnostic: It takes a prompt already optimized for a source model, rapidly reoptimizes it for a target model, and compares the optimized prompts across models.
It’s driven by performance criteria: It takes the existing prompt template, task-specific data samples, and user-defined performance metrics and generates targeted improvement strategies, iterating repeatedly to achieve domain-specific optimization objectives.
It focuses on exploits: It uses a metric-analyzer AI agent to identify failure points and a debugging helper agent to identify root causes, and it surgically refines prompts relative to failures (rather than along random dimensions) for targeted performance improvement.
It’s fully automated: It analyzes user-defined metrics and uses a code sanitization AI agent to generate debugging checkpoints automatically. Metric functions can be imported as Python code, and…
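The post says Promptimus takes user-defined metrics, which can be supplied as Python code, and uses a metric-analyzer agent to find failure points. The sketch below is a rough, non-authoritative illustration of that idea. It replaces the agent with a plain counter and has two parts:

- `field_metric` is a hypothetical per-criterion metric for an invented JSON extraction task.
- `failure_points` is a hypothetical helper that ranks which criteria fail most often, so that prompt edits could target the most common failures first.

The field names and data are made up for the example.

```python
import json
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def field_metric(pred: str, ref: Dict[str, object]) -> Dict[str, bool]:
    """Hypothetical user-defined metric: pass/fail per criterion for JSON extraction."""
    try:
        parsed = json.loads(pred)
    except json.JSONDecodeError:
        return {"valid_json": False}
    if not isinstance(parsed, dict):
        return {"valid_json": False}
    result = {"valid_json": True}
    for key, expected in ref.items():
        # Strict equality, so "42" (string) does not match 42 (int).
        result[key] = parsed.get(key) == expected
    return result


def failure_points(preds: Sequence[str], refs: Sequence[Dict[str, object]],
                   metric: Callable[[str, Dict[str, object]], Dict[str, bool]]
                   ) -> List[Tuple[str, int]]:
    """Count how often each criterion fails, most frequent first."""
    counts: Counter = Counter()
    for pred, ref in zip(preds, refs):
        for criterion, ok in metric(pred, ref).items():
            if not ok:
                counts[criterion] += 1
    return counts.most_common()


refs = [{"vendor": "Acme", "total": 42},
        {"vendor": "Globex", "total": 7},
        {"vendor": "Initech", "total": 13}]
preds = ['{"vendor": "Acme", "total": "42"}',   # total has the wrong type
         '{"vendor": "Globex", "total": 7}',    # correct
         'Vendor: Initech, total 13']           # not JSON at all
ranked = failure_points(preds, refs, field_metric)
print(ranked)
```

Breaking a score into named criteria, instead of a single number, is what makes a failure-focused edit possible. A prompt change can then address "numbers returned as strings" or "output not in JSON" directly rather than adjusting the prompt along arbitrary dimensions.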