By the end of this module, you will be able to design scalable prompt systems, optimize token costs, implement A/B testing frameworks, and create enterprise-grade AI workflows that comply with AI Act regulations.
You will learn:
- Token cost optimization techniques
- Prompt design systems and templates
- A/B testing and versioning strategies
- Agentic and complex workflow design
- Enterprise integration patterns
- Monitoring and optimization strategies
Exercise 4.1: Token Cost Optimization
Scenario: You have this prompt that costs €0.015 per execution (1500 tokens):
Analyze the following quarterly financial report of the company [COMPANY_NAME]
operating in the [SECTOR] sector for the year [YEAR] and quarter [QUARTER].
Please provide a comprehensive analysis that includes:
1. Financial performance compared to the same quarter of the previous year
2. Comparison with analyst forecasts
3. Profit margin analysis
4. Cash flow health assessment
5. Identification of positive and negative trends
6. Recommendations for investors
Report: [FULL_REPORT_TEXT]
Task: Optimize the prompt to reduce costs by 40% while maintaining quality.
- Apply compression techniques
- Remove redundancies
- Simplify language
- Test with different versions
- Measure quality/cost trade-off
Analyze financial report [COMPANY_NAME] - [QUARTER] [YEAR]:
1. Performance vs previous year
2. Vs analyst forecasts
3. Profit margins
4. Cash flow health
5. Trend +/-
6. Investor recommendations
Report: [REPORT_TEXT]
Format: key points, max 300 words.
Savings: From 1500 to ~900 tokens (40% reduction)
Applied strategies:
- Removal of redundant text
- Use of standard abbreviations
- Concise lists instead of full sentences
- Explicit length limit
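The savings above can be sanity-checked in code. This is a minimal sketch: token counts are estimated with a crude words × 1.3 heuristic (real counts require the model's own tokenizer), and the €0.01-per-1K-token price is derived from the exercise's figures, not a quoted vendor rate.

```python
# Rough comparison of prompt costs before and after compression.
# The words*1.3 token estimate and the per-1K price are illustrative.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~1.3 tokens per whitespace-separated word."""
    return int(len(text.split()) * 1.3)

def cost_eur(tokens: int, price_per_1k: float = 0.01) -> float:
    """Cost in euros at a flat per-1K-token price."""
    return tokens / 1000 * price_per_1k

def savings_pct(before: int, after: int) -> float:
    """Relative cost reduction, in percent."""
    return (before - after) / before * 100

if __name__ == "__main__":
    before, after = 1500, 900  # token counts from the exercise
    print(f"cost before: €{cost_eur(before):.4f}")
    print(f"cost after:  €{cost_eur(after):.4f}")
    print(f"savings:     {savings_pct(before, after):.0f}%")
```

Running this reproduces the exercise's numbers: €0.015 down to €0.009, a 40% reduction.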
Exercise 4.2: Prompt Design System
Task: Design a prompting system for a fintech company that handles:
- Suspicious transaction analysis (compliance)
- Customer responses about products (chat)
- Regulatory report generation
- Credit risk analysis
Requirements:
- 3-layer architecture (input, processing, output)
- Validation and fallback system
- Complete audit trail
- Cost control per department
- Real-time monitoring dashboard
- GDPR and AI Act compliance
Create architectural diagram and template for each module.
FINTECH AI SYSTEM ARCHITECTURE
================================
LAYER 1: INPUT SANITIZATION
├── PII Redaction Module
├── Format Validation
├── Rate Limiting
└── Request Logging
LAYER 2: PROCESSING ORCHESTRATION
├── Router (classifies request → appropriate module)
├── Compliance Module (suspicious transactions)
├── Customer Service Module (FAQ/chat)
├── Reporting Module (regulatory)
├── Risk Assessment Module (credit)
└── Cache Layer (frequent responses)
LAYER 3: OUTPUT VALIDATION
├── PII Leak Check
├── Compliance Check (no financial advice)
├── Fact Verification
├── Format Enforcement
└── Watermarking (audit trail)
TEMPLATE COMPLIANCE MODULE:
----------------------------
System: "You are a compliance system. Analyze transactions for unusual patterns.
Rules: Never suggest actions, only flag anomalies.
If uncertain → flag for human review."
Input: {transaction_data}
Output: JSON {risk_score: 1-100, flags: [], confidence: 0-1}
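The compliance template's JSON contract can be enforced in Layer 3 with a small validator. This is a sketch only: the field names follow the template above, and the human-review fallback mirrors the "if uncertain → flag for human review" rule; a production system would use a full schema library.

```python
# Minimal Layer 3 validator for the compliance module's JSON contract:
# risk_score in 1-100, flags as a list, confidence in 0-1.
import json

def validate_compliance_output(raw: str) -> dict:
    """Parse model output; route to human review on any contract violation."""
    try:
        data = json.loads(raw)
        assert 1 <= data["risk_score"] <= 100
        assert isinstance(data["flags"], list)
        assert 0 <= data["confidence"] <= 1
        return {"status": "ok", **data}
    except (ValueError, KeyError, TypeError, AssertionError):
        # Malformed JSON, missing fields, or out-of-range values:
        # never pass unvalidated output downstream.
        return {"status": "human_review", "reason": "invalid model output"}
```

Failing closed here (defaulting to human review) is what makes the fallback requirement auditable.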
MONITORING:
- Cost per module/department
- Accuracy per task
- Response time P95
- Human override rate
- Compliance audit trail
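The Layer 2 router can be sketched as follows. The keyword rules are a placeholder for what would normally be a trained classifier or an LLM-based dispatcher; the module names match the architecture above, and the unmatched-input fallback follows the same human-review principle as the compliance module.

```python
# Toy Layer 2 router: maps an incoming request to one of the four
# processing modules. Keyword matching stands in for a real classifier.

ROUTES = {
    "compliance":       ("suspicious", "transaction", "aml", "fraud"),
    "customer_service": ("how do i", "help", "faq", "account"),
    "reporting":        ("regulatory", "report", "filing"),
    "risk_assessment":  ("credit", "loan", "score"),
}

def route(request: str) -> str:
    """Return the first module whose keywords match, else escalate."""
    text = request.lower()
    for module, keywords in ROUTES.items():
        if any(k in text for k in keywords):
            return module
    return "human_review"  # fallback: never guess on unmatched input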
Exercise 4.3: A/B Testing and Versioning
Scenario: You have 3 versions of a prompt for generating welcome emails:
- Version A: Formal, structured, security-focused
- Version B: Friendly, personalized, feature-focused
- Version C: Short, direct, next-steps focused
Task: Design an A/B testing system for:
- Defining success metrics (CTR, replies, retention)
- Creating statistically significant test groups
- Implementing random version rotation
- Collecting and analyzing data
- Deciding winning version with >95% confidence
- Creating deployment pipeline for new prompt
Specify sample size, test duration, stopping criteria.
A/B TESTING PLAN - WELCOME EMAIL PROMPTS
========================================
PRIMARY METRICS:
- Click-through Rate (CTR) on main link
- Reply Rate (user responses)
- Day 7 Retention (login after 7 days)
SECONDARY METRICS:
- Time to first action (minutes)
- Positive sentiment (response analysis)
- Unsubscribe rate
SAMPLE SIZE CALCULATION:
- Baseline CTR: 15%
- Minimum Detectable Effect: 2%
- Power: 80%, Confidence: 95%
- Sample per variant: ~5,300 users (two-proportion normal approximation)
- Total (3 variants): ~15,900 users
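The per-variant sample size can be checked with the standard two-proportion normal approximation, using only the standard library. This is a sketch; exact figures vary slightly between formulas (pooled vs. unpooled variance) and tools.

```python
# Per-variant sample size for detecting a CTR lift from p1 to p2,
# two-sided alpha = 0.05, power = 0.80 (unpooled-variance formula).
from math import ceil
from statistics import NormalDist

def sample_size(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Users needed per variant to detect a shift from p1 to p2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
    z_b = NormalDist().inv_cdf(power)           # ~0.84
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * var / (p2 - p1) ** 2)

n = sample_size(0.15, 0.17)  # baseline 15% CTR, MDE +2 points
print(n)  # roughly 5,300 per variant
```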
TEST DURATION: 14 days
STOPPING RULES:
- Variant wins if p-value < 0.05 and lift > 2%
- Early termination if variant has +5% CTR with p < 0.01
DEPLOYMENT PIPELINE:
1. Test on 5% traffic (canary)
2. Gradual rollout 25% → 50% → 100%
3. Metric monitoring for regressions
4. Automatic rollback if CTR drop > 10%
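Steps 1, 2, and 4 of the pipeline can be sketched together: deterministic hash-based bucketing gives each user a stable prompt version across the canary and gradual-rollout stages, and a threshold check implements the automatic rollback. The 10% CTR-drop threshold comes from the pipeline above; the function names are illustrative.

```python
# Rollout gate and rollback check for the deployment pipeline.
import hashlib

def in_rollout(user_id: str, rollout_pct: int) -> bool:
    """Stable bucketing: the same user always lands in the same bucket,
    so raising rollout_pct (5 -> 25 -> 50 -> 100) only adds users."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct

def should_rollback(baseline_ctr: float, current_ctr: float,
                    max_drop: float = 0.10) -> bool:
    """Trigger automatic rollback if CTR falls >10% relative to baseline."""
    return current_ctr < baseline_ctr * (1 - max_drop)
```

Hash-based bucketing (rather than random assignment per request) is what makes the gradual rollout monotonic: users already on the new prompt never flip back as traffic increases.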
VERSION CONTROL:
- Git for prompts (prompt-v1.2.3.md)
- Metadata: author, date, performance metrics
- Changelog: changes and estimated impact
Practical Applications: Agentic & Complex Workflows
Implementations of prompts that use external tools, code execution, and advanced integrations for complex professional scenarios.
"As an analyst, use code_execution to simulate tests on dataset: Variant A vs B. Iterate results, integrate privacy ethics. Output: Report with described charts."
"Build engine: Input user data, use browse_page for benchmarks. Generate dynamic segments, output algorithm pseudocode."
"Analyze reviews via X_semantic_search. Iterate with ML tool, output described dashboard."
"Optimize code: Use code_execution for benchmarks. Iterate for efficiency, integrate green computing."
"Hybrid low-code with custom: Use browse_page for tools, output architecture diagram."
"Contribute to repo: Fork, PR, integrate community feedback."