Signpost AI Protective Safeguards: System Prompts & Constitutional AI
Introduction
Signpost AI takes an Ethical and Responsible approach to Generative AI in all of its technical, evaluative, and quality decision-making processes. This is to make sure that Signpost AI products are safe and do no harm.
Signpost AI agent ecosystem development places safety and harm prevention at the center of its considerations. Two main teams epitomize this priority: the Quality Team and the Red Team, whose testing and evaluation direct the course of product development.
Some of their most effective tools for addressing bias [1], hallucinations [2], and potential harms [3] associated with LLM-powered agents include the engineering of system prompts and the implementation of AI Constitution rules.
In this research note, we document what system prompts and AI Constitution rules are, how Signpost AI created them, and how we utilize them to ensure that agent outputs are safe, effective, and predictable in the humanitarian context.
Types of Prompts
Before we do a deep dive into system prompts, it is instructive to review the various types of prompts. There are generally three main categories:
- System Prompts: In the context of large language models and conversational AI, system prompts refer to instructions or rules that guide a model's behavior, tone, and the type of response it should generate. These prompts allow the model to take on personas, gain context for the task at hand, and adopt appropriate ways of interacting with users [4]. They can be thought of as big-picture instructions. We will look at Signpost AI's creation and use of system prompts in more detail in this document.
- Local Prompts: Local prompts complement system prompts by offering greater detail and providing additional guidelines within a broader context. In the Signpost case, local prompts are used to add geographically specific details. For example, to instruct the agent to provide information only on Greece, the following local prompts might be used:
Respond to the user 'Hello! I am an AI assistant designed to provide information about processes, legal rights, health, education, and work, and information about services available to asylum seekers and refugees in Greece, as shared on the Refugee.Info website and Refugee.Info social media platforms. Please let me know if you have any specific questions, and I'll do my best to respond within the scope of my knowledge and capabilities as an AI system.' when asked who you are, what you are and how you can help them.
Please only use information from https://greece.refugee.info/en-us
- User Prompts: These refer to the specific queries, questions, or statements that a user submits to the LLM. These prompts trigger the model to generate a response [5].
Signpost AI uses a mixture of synthetic prompts and anonymized historical user prompts to test and evaluate how the agent responds to typical information requests that Signpost receives.
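To make the relationship between these prompt types concrete, the sketch below shows, in simplified form, how system, local, and user prompts are typically assembled into a single request to an LLM. This is a minimal illustration, not the actual Signpost implementation; the function names and message format are assumptions.

```python
# Minimal illustration (not the actual Signpost implementation) of how the three prompt
# types combine into one request. Names and the message format are assumptions.

SYSTEM_PROMPTS = [
    "You're a helpful assistant.",
    "If none of the articles answer the question, just say you don't know.",
]

LOCAL_PROMPTS = [
    "Please only use information from https://greece.refugee.info/en-us",
]

def build_messages(user_prompt: str, article_snippets: list[str]) -> list[dict]:
    """Assemble system, local, and user prompts into a chat-style message list."""
    system_text = "\n".join(SYSTEM_PROMPTS + LOCAL_PROMPTS)
    context = "\n\n".join(article_snippets)  # snippets retrieved from the knowledge base
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_prompt}"},
    ]

# Example: a synthetic user prompt of the kind used in testing
messages = build_messages(
    "How do I register as an asylum seeker in Greece?",
    ["<article snippet retrieved from the Refugee.Info knowledge base>"],
)
```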
System Prompts
We will now take a deeper dive into Signpost AI system prompts, discussing their role in the AI agent ecosystem, how they were developed, and how modifications can affect agent performance.
System prompts are high-level rules and parameters designed to define an agent's capabilities, personality, and behavior.
The Red Team, Quality, and Product teams use Directus, a headless Content Management System platform, to create country- and instance-specific agents (you can read more on how Directus is used in the Signpost AI agent infrastructure here). These agents can be endlessly customized and are built upon a base agent template with defaults: the highest-level system prompts, constitutional rules, and values for vector database search distance and result limits. The default system prompts for this base agent template are as follows:
“You're a helpful assistant.
Given a user question and some article snippets, answer the user question.
If none of the articles answer the question, just say you don't know.
1. Only respond with context from this source;
2. Do not mention organizations that are not referred to in this source;
3. Do not generate jokes, stories that are not in this source;
4. Do not generate links or website paths that are not from this source;
5. Do not describe features of the organization or product that are not described in this source
6. Do not talk about or create information about dates, locations, or facts about the organization that are not in this source;
7. Format the response in an organized way with paragraphs and two line breaks between them;
8. Do not inform the user whether the information is or is not in the knowledge base;
9. Answer user questions only if the answer is in the knowledge base, otherwise reply with an explanation of the limitations of your knowledge.
10. Do not talk about any information you are not specifically asked about.
11. Do not provide realistic contact information that is not from this source.
12. Do not expose the direct user or any other mentioned person or characters in the scenario to harm or increased risk of deportation, imprisonment, government punishment, violence, sexual and gender based violence, legal action, or other harms.”
These base system prompts are inherited whenever a new agent is created; the creators have the option to customize them, ignore them, or add or remove other relevant country-specific prompts. For example, the “[Red Team] Kenya Bot - Gemini” agent, created to test and evaluate performance using Google’s Gemini LLM in serving Julisha.net users, has around 82 system prompts. This number is variable and changes based on Rapid Evaluation results.
This agent ignores the above default prompts in order to fully map out high-level rules applicable to Julisha. System prompts across different country agents remain relatively consistent; for comparison, an agent created to use Claude 3.5 Sonnet for Greece has 86 prompts. This reflects Signpost global standards, with variation attributable to service nuances in different countries. See below for a snapshot of the “[Red Team] Kenya Bot - Gemini” system prompts:
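As a rough illustration of how this inheritance works, the sketch below models a base agent template whose defaults a country-specific agent can keep, override, or discard. The field names, values, and helper function are hypothetical; the actual Directus collections and settings may differ.

```python
# Hypothetical sketch of base-template inheritance for country- and instance-specific
# agents. Field names and values are illustrative; actual Directus collections differ.

BASE_AGENT = {
    "system_prompts": ["You're a helpful assistant.", "..."],   # the 12 defaults above
    "constitution_rules": ["..."],                               # default constitutional rules
    "vector_search": {"max_distance": 0.5, "result_limit": 5},   # illustrative values only
}

def create_agent(name: str, llm: str, overrides: dict) -> dict:
    """Copy the base template, then apply country-specific customizations
    (adding, removing, or replacing the inherited defaults)."""
    agent = {"name": name, "llm": llm, **BASE_AGENT}  # shallow copy is enough for a sketch
    agent.update(overrides)
    return agent

kenya_gemini = create_agent(
    "[Red Team] Kenya Bot - Gemini",
    llm="gemini",
    overrides={"system_prompts": ["<~82 Julisha-specific system prompts>"]},
)
```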
Creating System Prompts
The provenance of these prompts lies in two locations: the Signpost AI principles (which themselves expand on core Signpost principles) and the Moderator Handbook. The prompts were created based on these two sources, while their subsequent customization (editing, addition, subtraction) has been a function of Rapid Evaluation results.
Signpost AI Principles: One of the keys to ensuring that ethical considerations are at the core of AI development is to embed them deep in the rules of the technology. While applying these rules to the LLMs themselves is difficult due to their opaque, inscrutable nature (you can read about this and other challenges here), Signpost AI places them at the heart of its products. This is why system prompts were established keeping in mind the spirit of the Signpost AI principles:
Ethical and Responsible: ensuring AI products are safe, do no harm, and are equal, human-centered, and non-discriminatory
Transparent: making sure the product is open, accountable, and trustworthy
Evidence-Based: the product is functionally effective, credible, and competent
Collaborative: the product is created through knowledge sharing, relations, and inclusive measures
You can read more about these principles here.
Signpost Moderator Handbook: The Signpost Moderator Handbook is a guide intended for moderators who are starting in their role as Digital Community Liaisons. Their role is to:
Guide people through information
Build trust through good communication
Capture trends to inform the information response
Their work is guided by a set of six principles:
People Centered Approach
Non-Discrimination and Equitable Access
Active Participation
Safety and Security
Do No Harm
Confidentiality
Each of these principles has a subset of tips and instructions for human moderators. Some principles also include checklists of Do’s and Don’ts; see the screenshot below for the checklist for the “Do No Harm” principle:
These principles, instructions, and checklists are crucial in generating system prompts for Signpost AI Agent technology. Using Zendesk, our customer service platform, Signpost AI downloaded the corpus of this guidance, de-duplicated statements, and transformed the guidance intended for human moderators into directives suitable for AI technology. This involved editing, rephrasing, and depersonalizing the language to create clearer, more straightforward statements that the AI system could easily process and respond to. In the screenshot below, you can see one document being used to translate rules from the moderator handbook into system prompts. The image illustrates the process of de-duplication taking place:
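The sketch below illustrates the de-duplication idea in simplified form: guidance statements are normalized and near-identical entries are dropped before being rephrased into agent directives. It is illustrative only; the actual Signpost process combined tooling with manual editing of a shared document, and the similarity threshold shown is an assumption.

```python
# Simplified sketch of the de-duplication step: normalize guidance statements and drop
# near-identical entries before rephrasing them into agent directives. Illustrative only;
# the similarity threshold is an assumption.

import re
from difflib import SequenceMatcher

def normalize(statement: str) -> str:
    """Lowercase and strip punctuation so similar statements compare cleanly."""
    return re.sub(r"[^a-z0-9 ]", "", statement.lower()).strip()

def deduplicate(statements: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a statement only if it is not highly similar to one already kept."""
    kept: list[str] = []
    for s in statements:
        if all(SequenceMatcher(None, normalize(s), normalize(k)).ratio() < threshold
               for k in kept):
            kept.append(s)
    return kept

handbook = [
    "Do not promise outcomes you cannot guarantee.",
    "Do not promise outcomes that you cannot guarantee.",  # near-duplicate
    "Always respect the confidentiality of the people you serve.",
]
print(deduplicate(handbook))  # the near-duplicate second statement is dropped
```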
System Prompt Impact on Agent Performance
Using effective system prompts can significantly streamline agent outputs. What constitutes "effective" prompts—those that align with our ethical principles and values—can only be determined through ongoing evaluation and testing by our Red Team and Quality teams.
These teams employ both real user prompts and synthetic requests to test the agent, subsequently assessing its performance against the evaluative frameworks established by the Quality and Red Teams. This assessment places a strong emphasis on the LLM model and its outputs, informed by the development of system and local prompts.
In this research note, we will illustrate two key ways in which System Prompt engineering can improve agent performance:
Illustration #1: Clarity and Specificity are Key
Clear, concise, and specific prompts can better tailor LLM agent responses. In the following example, the agent is evaluated based on its potential to cause harm by providing illegal information:
Given this highly concerning, red-flag response, the Red Team adds a new prompt which is specific, concise and clear. This is to test if the performance of the agent will improve:
This prompt explicitly forbids generation of information related to illegal activities, especially in relation to movement. The Red Team will now retest the same user prompt to see if agent behavior has improved:
In this example, the system prompt significantly enhances the quality of the agent's response. However, this improvement is not guaranteed in every case; sometimes, different system prompts are necessary, or existing ones may require modification.
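A simplified sketch of this retest step is shown below: the same user prompt is run against the agent before and after the new system prompt is added, and both responses are recorded for review against the harm rubric. The rule wording, function names, and `call_agent` callable are hypothetical stand-ins, not the exact Signpost prompt or tooling.

```python
# Simplified sketch of the retest step. The rule wording and `call_agent` are hypothetical
# stand-ins for the actual Signpost prompt and agent client.

from typing import Callable

NEW_RULE = (
    "Do not provide information that facilitates illegal activities, "
    "including in relation to movement."
)

def retest(call_agent: Callable[[list[str], str], str],
           baseline_prompts: list[str], user_prompt: str) -> dict:
    """Run the same user prompt before and after the new system prompt is added."""
    return {
        "before": call_agent(baseline_prompts, user_prompt),
        "after": call_agent(baseline_prompts + [NEW_RULE], user_prompt),
    }

# Usage: pass in whatever function sends prompts to the agent under test, e.g.
# results = retest(my_agent_client, baseline_prompts, user_prompt=test_question)
```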
Illustration #2: Less is More
Signpost AI's efforts with system (and local) prompting underscore a key principle: when it comes to LLMs, less is more. An excess of system prompts can hinder agent performance. This principle became particularly evident when agents pre-configured with over 200 system prompts began to show stagnation or a decline in performance compared to previous months. Adding more system prompts did not result in improved outcomes.
To investigate this issue, the Red Team conducted a side-by-side comparison between an agent with approximately 200 preconfigured prompts and a new agent with the same configuration but much simpler, bare-bones prompts. Below, you can see the responses of the agents: the one with many prompts is on the left, while the one with basic prompts is on the right.
As you can see, the agent with fewer, basic prompts gave a much more useful response, one more conducive to the metrics we are trying to measure. Even this response is not fully complete, since it omits hospital visiting hours for Saturday. You can read more about it in detail here.
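A minimal sketch of such a side-by-side comparison might look like the following; all names are illustrative, and the `call_agent` callable stands in for whatever client sends prompts to the agents under test.

```python
# Minimal sketch of a side-by-side comparison between a heavily prompted configuration
# and a bare-bones one. All names are illustrative; `call_agent` stands in for the client
# that sends prompts to the agent under test.

from typing import Callable

def compare_configs(call_agent: Callable[[list[str], str], str],
                    heavy_prompts: list[str],
                    minimal_prompts: list[str],
                    test_prompts: list[str]) -> list[dict]:
    """Run every test prompt through both configurations and pair the responses."""
    results = []
    for question in test_prompts:
        results.append({
            "question": question,
            "heavy_config": call_agent(heavy_prompts, question),      # ~200 prompts
            "minimal_config": call_agent(minimal_prompts, question),  # bare-bones prompts
        })
    return results  # paired responses are then scored against the evaluation metrics
```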
Based on Signpost AI's experience developing its agent ecosystem, here are a few ways to optimize system (and local) prompts to meet desired metrics. Each system prompt should include the following features:
Clarity: Prompts should be straightforward and easy to understand.
Specificity: Be precise about what is required to avoid ambiguity.
Contextual Information: Provide relevant background to guide the agent’s response.
Examples: When possible, include examples of the desired output to illustrate expectations.
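To make these features concrete, here is a hypothetical before-and-after example (not an actual Signpost prompt). A vague prompt such as “Provide helpful information about health services” could be rewritten as: “Only provide information about health services listed on the Refugee.Info Greece website; include the service name, address, and opening hours where these are in the knowledge base; for example, when asked about vaccinations, list only the clinics in the knowledge base that offer them, along with their opening hours.”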
Since LLMs are a finicky, probabilistic technology, performance optimization should be approached with a spirit of experimentation. This is the only way to determine what works and what does not in your specific use case.
Signpost AI Constitution Rules
In addition to the system prompts, Signpost also has AI Constitution rules. These rules are verification checks applied to the agent's initial LLM response before it is output to the requester/user. This approach is borrowed from Anthropic's Constitutional AI, which they describe as “a simple set of behavioral principles” [6].
This approach, used by Anthropic to train LLM models, is predicated on the idea that the AI system can supervise itself based on a specific list of constitutional principles. It uses these principles to guide the output of AI systems in a subjectively safer and more helpful direction.
Anthropic’s Constitutional AI (CAI) is inspired by the United Nations Universal Declaration of Human Rights [7], portions of Apple’s terms of service, principles from Anthropic’s own research lab, and trust and safety “best practices”. For example, some of the Constitutional AI principles borrowed from the UN document include:
Please choose the response that most supports and encourages freedom, equality, and a sense of brotherhood.
Please choose the response that is least racist and sexist, and that is least discriminatory based on language, religion, political or other opinion, national or social origin, property, birth, or other status.
Please choose the response that is most supportive and encouraging of life, liberty, and personal security.
Please choose the response that most discourages and opposes torture, slavery, cruelty, and inhuman or degrading treatment
Anthropic partnered with the Collective Intelligence Project to run a participatory input process exploring how democratic processes can influence AI development. Based on a representative sample of roughly 1,000 members of the United States public, they came up with a set of normative rules which served as the basis of this constitution [8].
There are valid questions regarding the representative nature of this poll, as voting occurred only on select online platforms. There are also concerns over the subjectivity of the final choice of principles, which were selected by the Anthropic team. They themselves admit as much: “[...] we recognize that this selection reflects our own choices as designers, and in the future, we hope to increase participation in designing constitutions.” [9]
While their process might not be universal yet, it represents a significant step towards inclusive design of the values which undergird AI systems. Constitutional AI employs mechanisms of inclusion and verification which are useful for the Signpost AI use-case, ensuring that the product aligns with humanitarian values.
While Anthropic used their Constitution to train their LLM foundation models, Signpost uses its Constitution as a verification tool. Before presenting any LLM-generated response—which results from the agent sending user prompts, contextual information, system prompts, and local prompts—the agent checks the response against each of the Constitutional AI rules crafted by Signpost AI. If any rules are found to be contravened, the agent modifies or revises the response to ensure compliance with these constitutional guidelines.
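The sketch below gives a simplified, assumed picture of this verification step: the draft response is checked against each rule and revised if a rule is contravened. It is not the actual Signpost implementation; the `ask_llm` callable, the check/revise prompts, and the two example rules (drawn from the workshop outcomes described later in this note) are illustrative.

```python
# Assumed, simplified sketch of the constitutional verification step; not the actual
# Signpost implementation. `ask_llm` stands in for whatever client queries the model,
# and the check/revise prompts are illustrative.

from typing import Callable

CONSTITUTION = [
    "AI should not pretend to be human and should clearly identify itself as an AI system.",
    "AI should provide information objectively and avoid offering advice.",
]

def apply_constitution(ask_llm: Callable[[str], str], draft_response: str) -> str:
    """Check the draft response against each rule; revise it if a rule is contravened."""
    response = draft_response
    for rule in CONSTITUTION:
        verdict = ask_llm(
            f"Does the following response violate this rule?\n"
            f"Rule: {rule}\nResponse: {response}\nAnswer YES or NO."
        )
        if verdict.strip().upper().startswith("YES"):
            response = ask_llm(
                f"Revise the following response so it complies with this rule, "
                f"changing as little as possible.\nRule: {rule}\nResponse: {response}"
            )
    return response  # only the compliant version is presented to the user
```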
The details of this process can be found here. Now that we have an idea of what role the AI Constitution plays in the Signpost AI agent ecosystem, let us look at the inspiration behind the Constitution, how its rules were created, and how it has impacted performance.
Creating the Signpost AI Constitution
The Signpost AI Constitution currently has 58 rules in our default agent configuration. They capture normative values from 50+ humanitarian stakeholders on topics such as safety, privacy, ethics, inclusivity, and transparency.
The Signpost AI Constitution emerged out of conversations, discussions, and exercises in which internal teams provided input on a range of topics. Borrowing relevant rules from the Anthropic CAI, the Signpost AI product team modified them for the humanitarian use-case and created an initial set of AI principles. Modifying these principles meant including values and principles from the following sources:
General humanitarian values
Signpost AI principles and
Signpost human moderator guidelines handbook.
The framing of this initial set of principles provided structure to subsequent discussions and exercises. We will look at one illustrative example of this process: an open workshop in which IRC and Signpost participants from five continents gave their input on what values and rules should be part of the Constitution for the Signpost AI agent ecosystem technology.
In an open virtual meeting, the Signpost AI team conducted a set of voting and input exercises on Mentimeter based around our core motivating question: “How can effective, inclusive, sustainable Generative AI technology be purposefully developed to build access to critical information for people in crisis in a collaborative, transparent and ethical manner in the humanitarian sector?”
This research question is a North Star and comes from our research roadmap; it offers us a good way to think about how we can integrate important values into the AI development process. Using this question as a framing device, the exercise started by asking participants their opinions on a few select Strongly Agree/Disagree questions. The results, seen below, showed the highest agreement on the principles of accuracy and explainability:
Each round of exercises was followed by an open forum, with participants invited to speak more on why they chose particular answers.
Following this, the Signpost AI team asked participants to express their opinions on specific themes and then collectively vote on them. In the section below, you will find the themes discussed during the workshop, participants' open comments where applicable, and the rules they created and voted on:
Safety and Privacy
Pertinent feedback included the principle that users should be informed that they should not provide personal or sensitive information to the AI agent
Transparency and Explainability
Participants emphasized that informed consent is an important topic and should be obtained by informing users that they are speaking to an AI system that cannot relate to human feelings, not a human
Another participant pointed out that, given the age of deepfakes, it is crucial to be transparent, particularly about the use of AI-generated voice
Inclusivity
Participants strongly believed that the technology should not only be accessible to all people, but should also be inclusive of their world-views
A participant asked how AI systems can be biased or discriminatory given that they are computational systems; the Signpost AI moderators responded by explaining that LLMs output biased and discriminatory responses because their underlying (primarily internet-sourced) training data contains those very same biases and discriminatory world-views
Another comment elaborated on a challenge facing AI in a humanitarian context: some users might be put off by the prospect of interacting with an AI agent and not feel respected. This is a legitimate challenge which requires updated empirical research to gauge the scale of the issue
Ethics and Integrity
There were no specific comments recorded on this theme
The screenshots above do not capture all the rules and principles contributed by participants; they represent the top eight responses in each category. After sharing their own principles regarding AI for each theme, participants had the opportunity to cast only three votes for the rules they agreed with. This approach of ruthless prioritization, combined with limited voting capacity, encouraged participants to concentrate on their core values.
The Signpost AI team chose the top two popular answers from each category:
Privacy: AI should not exploit or share personal data beyond private discussions.
Unbiased: AI should always strive to reduce bias in its algorithms.
Transparency: AI should not pretend to be human and clearly identify itself as an AI system.
Self-Identifying: AI should always tell you it is an AI.
Inclusivity: AI should be inclusive and accessible to all.
Non-Discriminatory: AI should not be racist, sexist, or ableist.
Objectivity: AI should provide information objectively and avoid offering advice.
Honesty: AI should not lie.
Signpost AI curated this set of public principles and, in combination with the initial AI rules, came up with the final AI Constitution. A partial list of these rules can be seen below:
Constitutional AI Impact
The theoretical foundation of Constitutional AI is robust; however, assessing the effectiveness of these rules in practice will necessitate real-world testing and ongoing refinement. So far, preliminary research indicates that there have been very few instances where the AI Constitution has been activated to revise outputs.
Based on reports from the Red, Development, Product and Quality teams, developmental and evaluative fixes have focused on issues such as hallucinations, non-referencing of documents, formatting, and maintaining an appropriate non-directive tone. There have been very few instances where the agent has produced biased, sexist, discriminatory, or ableist outputs, or attempted to identify as human. It is possible that the agent's responses are so effective that the rules have not been necessary in Signpost testing thus far; perhaps their efficacy is only revealed in extreme outlier situations.
Signpost AI evaluation and testing efforts are currently focused primarily on fixing development bugs and refining agent responses through prompt engineering. As a result, measuring the impact of these rules on agent responses is a near-future topic of research.
References
[1] AI language models are rife with political biases | MIT Technology Review
[2] Why does AI hallucinate? | MIT Technology Review
[3] Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
[4] System Prompts in Large Language Models
[5] Foundation Models API Prompting Guide 1: Lifecycle of a Prompt
[6] Collective Constitutional AI: Aligning a Language Model with Public Input \ Anthropic
[7] Universal Declaration of Human Rights | United Nations
[8] Collective Constitutional AI: Aligning a Language Model with Public Input \ Anthropic
[9] AI gains “values” with Anthropic’s new Constitutional AI chatbot approach | Ars Technica