SHAFI ANWAR
What I Bring

The intersection of technical depth and customer-facing delivery.

Enterprise customer communication
Translating complex technical architecture into clear decisions for non-technical stakeholders - live, under pressure, with real money at stake.
Technical discovery and solution design
Starting with the operational problem, not the technology. Understanding how teams actually work before proposing what to change.
Workflow automation and AI implementation
Designing and shipping production AI systems end-to-end - from architecture through deployment, with real usage and real accountability.
Cross-functional stakeholder alignment
Coordinating across engineering, operations, and program management teams to drive initiatives from design through adoption - not just building, but getting things shipped.
Production ownership
Every system I have built runs with real consequences - 500+ weekly queries, financial decisions, compliance requirements. Not demos.
Independent judgment under ambiguity
Making decisions with incomplete information, surfacing trade-offs clearly, and staying accountable to outcomes rather than just deliverables.

Experience

Three years inside AWS. Every role pointed in the same direction.

The narrative: I started as the human who follows SOPs and handles cases, then became the person who automates those SOPs entirely. I understand both sides - the human experience and the technical solution - which is why my solutions actually work and why I can communicate them to any audience.

Developer - AI Automation Engineer

current role

Amazon Web Services India


June 2025 - Present

Designing and building AI agents that automate real operational workflows - owned end-to-end from architecture through testing and deployment.

Part of the Tool Builder Academy program at AWS India. My work sits at the intersection of solution design and implementation: I identify a manual operational process, design an AI-powered replacement, author the decision logic, build the custom tooling, and validate it before production deployment.

The development approach is deliberate - I use AI-assisted coding (Amazon Q) to implement efficiently, keeping my focus on what determines whether a system actually works: the architecture, the edge case handling, the failure recovery design, and the business outcome. This is the same approach a Solutions Engineer takes with a client - understand the problem deeply, design the right solution, use the best available tools to build it.

Reseller Support Micro-Agent

running in production

An LLM-powered agent that replaced a 30-page manual SOP for handling reseller account queries - account linkage, billing disputes, and solution provider restrictions.

The core challenge was reliable intent classification across queries involving timing, negation, and context that keyword matching cannot handle. I solved this by authoring the SOP as a structured decision tree consumed by the LLM at runtime - so the same logic a senior engineer applies after years of experience runs consistently on every contact.
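To make "SOP as a structured decision tree" concrete, here is a minimal sketch. It is an illustration, not the production agent: the `Node` class, the question text, and the action names are all hypothetical. It shows one branch point stored as data, rendered as text for a prompt, and walked deterministically once an answer has been extracted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One SOP step: a question with branches, or a leaf carrying an action."""
    question: str = ""
    branches: Dict[str, "Node"] = field(default_factory=dict)
    action: str = ""  # set only on leaf nodes


# Hypothetical fragment of a reseller SOP; labels are illustrative only.
SOP_TREE = Node(
    question="Were the disputed charges incurred before or after account linking?",
    branches={
        "before_linking": Node(action="handle_pre_linkage_charges"),
        "after_linking": Node(action="handle_post_linkage_charges"),
    },
)


def render_for_prompt(node: Node, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines that can be placed in an LLM prompt."""
    pad = "  " * depth
    if node.action:
        return [f"{pad}ACTION: {node.action}"]
    lines = [f"{pad}Q: {node.question}"]
    for answer, child in node.branches.items():
        lines.append(f"{pad}  IF {answer}:")
        lines.extend(render_for_prompt(child, depth + 2))
    return lines


def resolve(node: Node, answers: Dict[str, str]) -> str:
    """Walk the tree using already-extracted answers and return the leaf action."""
    while not node.action:
        node = node.branches[answers[node.question]]
    return node.action
```

Keeping the tree as data means the same structure can feed the prompt and be checked by ordinary unit tests.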

I brought direct operational knowledge to this build: as a Cloud Support Associate, I had personally handled hundreds of these exact cases manually. I had memorized the decision tree, understood every edge case, and become the go-to person on my team for reseller complexity. I did not design this agent from a spec. I designed it from experience.

Architecture

Customer query arrives
  → AWS automation platform
    → Amazon Bedrock (LLM) applies SOP decision tree
      → Intent classified: linkage timing / billing
        dispute / provider restrictions / preempt
        → Custom Python tool invoked
          → Real-time account data retrieved
            → Conditional routing applied
              → Resolution delivered to engineer
              → [Cannot classify] → OOD response
                → Human queue (full context preserved)
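The last two branches of the flow above can be sketched as a small routing function. This is an assumption-laden illustration rather than the deployed logic: the `Classification` type, the `CONFIDENCE_FLOOR` value, and the handler names are hypothetical, and the intent labels simply mirror the diagram.

```python
from dataclasses import dataclass

# Intent labels taken from the diagram; the threshold is an assumed value.
KNOWN_INTENTS = {"linkage_timing", "billing_dispute", "provider_restrictions", "preempt"}
CONFIDENCE_FLOOR = 0.8


@dataclass
class Classification:
    intent: str
    confidence: float


def route(query: str, result: Classification) -> dict:
    """Return a routing decision; anything uncertain goes to a human with context."""
    if result.intent in KNOWN_INTENTS and result.confidence >= CONFIDENCE_FLOOR:
        return {"handler": result.intent, "query": query}
    # Out-of-domain or low confidence: keep everything the engineer will need.
    return {
        "handler": "human_queue",
        "query": query,
        "context": {
            "attempted_intent": result.intent,
            "confidence": result.confidence,
        },
    }
```

The fallback branch is deliberately the default: a case is automated only when both conditions hold.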

Key design decisions

  • LLM classification over keyword matching: handles paraphrasing, negation, and temporal context - “charges before linking” routes completely differently than “charges after linking”, something rule-based systems cannot reliably distinguish
  • OOD graceful fallback: when confidence is insufficient, the case releases to a human engineer with full context intact. Design principle: for an SOP that changes customer account state, a wrong automated answer is worse than no answer
  • JSON schema tool contracts: all custom tool input/output defined by explicit schema - makes agent behavior deterministic and testable
  • AI-assisted development (Amazon Q): speeds up implementation so my attention stays on architecture and decision logic, not boilerplate
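As an illustration of what an explicit tool contract buys, the sketch below checks a payload against a small schema before a tool runs. It is a simplified stand-in: the schema fields are hypothetical, and the validator covers only a tiny subset of JSON Schema (`type`, `required`, `properties`), not the full specification.

```python
# Hypothetical input contract for an account-lookup tool (field names are illustrative).
ACCOUNT_LOOKUP_INPUT = {
    "type": "object",
    "required": ["account_id", "intent"],
    "properties": {
        "account_id": {"type": "string"},
        "intent": {"type": "string"},
    },
}

# Mapping from the schema type names used here to Python types.
_PY_TYPES = {"string": str, "object": dict, "number": (int, float), "boolean": bool}


def validate(payload, schema):
    """Check a payload against a tiny subset of JSON Schema; return a list of errors."""
    if not isinstance(payload, _PY_TYPES[schema["type"]]):
        return [f"expected {schema['type']}"]
    errors = []
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name in payload and not isinstance(payload[name], _PY_TYPES[rule["type"]]):
            errors.append(f"{name}: expected {rule['type']}")
    return errors
```

An empty list means the call may proceed; any error can be fed back to the agent or logged, which is what makes tool behavior testable in isolation.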

What I owned

Component | My Contribution
Solution design and architecture | Designed full decision tree, routing logic, and fail-safe strategy
SOP authoring | Wrote the complete branching logic the LLM interprets at runtime
Custom Python tools | Designed logic and built tools using AI-assisted coding (Amazon Q)
Tool specifications | Defined all JSON input/output contracts
Testing and validation | Validated via CLI pipelines in sandbox environments
Stakeholder communication | Presented architecture decisions, implementation trade-offs, implementation risks, and progress updates to senior technical program management, helping align stakeholders across multiple automation initiatives

Python (AI-assisted via Amazon Q) · Amazon Bedrock · Prompt Engineering · REST API Integration · JSON Schema Design · LLM Intent Classification · Conditional Routing · CLI-Based Testing

Impact

  • Thousands of reseller contacts monthly routed consistently and instantly - no manual SOP reading
  • Handle time per case reduced significantly
  • Resolution quality consistent across all experience levels - newer agents get the same outcome as senior
  • Engineers freed for genuinely complex edge cases

Route 53 Service Quota Automation

testing - pre-production

A fully autonomous async workflow that handles Route 53 service quota increase requests end-to-end - from receiving the request through delivering the final resolution to the customer, with zero human intervention on the standard path.

The design challenge was building a system that spans hours or days across multiple independent components without losing state, blocking other engineers, or creating cases that could get permanently stuck. This required designing explicitly for failure - not just the happy path.

Architecture

SQIR request arrives
  → Identify quota type (1 of 7 Route 53 limits)
    → Validate eligibility against threshold criteria
      → Obtain explicit customer consent
        → Create internal service team ticket
          → Lock case (prevent concurrent modification)
            → Schedule 96-hour escape hatch timer
              (Amazon EventBridge Scheduler)
              → Monitor ticket via event streams
                (pub/sub - zero polling)
                → [Resolved] → Validate quota update
                  → Deliver resolution to customer
                → [96 hours elapsed] → Release to
                  human queue with full context
                → [Customer responds while locked]
                  → Release to human queue immediately
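The timer step in the flow above could be expressed as a one-time EventBridge Scheduler schedule. The sketch below only builds the request parameters; the schedule name, the ARNs, and the payload shape are assumptions, and this is not the workflow's actual code.

```python
import json
from datetime import datetime, timedelta, timezone

ESCAPE_HATCH = timedelta(hours=96)


def escape_hatch_schedule(case_id, locked_at, target_arn, role_arn):
    """Build keyword arguments for a one-time EventBridge Scheduler schedule.

    locked_at must be a timezone-aware datetime.
    """
    fire_at = (locked_at + ESCAPE_HATCH).astimezone(timezone.utc)
    return {
        "Name": f"escape-hatch-{case_id}",
        # One-time schedules use at(yyyy-mm-ddThh:mm:ss); UTC when no timezone is set.
        "ScheduleExpression": f"at({fire_at:%Y-%m-%dT%H:%M:%S})",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        # Remove the schedule once it has fired so stale timers do not accumulate.
        "ActionAfterCompletion": "DELETE",
        "Target": {
            "Arn": target_arn,
            "RoleArn": role_arn,
            "Input": json.dumps({"case_id": case_id, "reason": "escape_hatch"}),
        },
    }
```

With boto3 installed and credentials configured, the result would be passed as `boto3.client("scheduler").create_schedule(**kwargs)`. Deleting the schedule by name when the ticket resolves first would keep the timer from firing on a closed case.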

Key design decisions

  • 96-hour escape hatch via Amazon EventBridge Scheduler: a deterministic one-time timer that fires if the service team does not respond within SLA. Prevents cases from remaining locked indefinitely. Design principle: in any distributed async system, always design explicitly for the stuck state - never trust that downstream systems will respond
  • Event-driven monitoring (pub/sub pattern): agent subscribes to ticket lifecycle events via an EventBridge rule - near-real-time resolution detection with zero polling overhead
  • Case locking: prevents concurrent modification of customer account state while async processing is in flight - eliminates race conditions
  • Consent gate before any mutating action: explicit customer approval required before any quota change is submitted - not assumed from the initial request
  • Multiple release triggers: both the 96-hour timer and any customer correspondence while locked release the case to a human immediately - the automation never traps a customer in a waiting state
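The locking and release rules above amount to a small state machine. The sketch below is illustrative only, with hypothetical state and event names. It shows the property that matters: every event a locked case can receive moves it out of the locked state, and late events are harmless.

```python
from enum import Enum


class State(Enum):
    LOCKED = "locked"            # waiting on the service team ticket
    RESOLVED = "resolved"        # quota validated, customer notified
    HUMAN_QUEUE = "human_queue"  # released to an engineer with full context


class Event(Enum):
    TICKET_RESOLVED = "ticket_resolved"
    TIMER_FIRED = "timer_fired"            # the 96-hour escape hatch
    CUSTOMER_REPLIED = "customer_replied"  # correspondence while locked


TRANSITIONS = {
    (State.LOCKED, Event.TICKET_RESOLVED): State.RESOLVED,
    (State.LOCKED, Event.TIMER_FIRED): State.HUMAN_QUEUE,
    (State.LOCKED, Event.CUSTOMER_REPLIED): State.HUMAN_QUEUE,
}


def apply(state: State, event: Event) -> State:
    """Advance the case; events arriving after a terminal state are ignored."""
    return TRANSITIONS.get((state, event), state)
```

Writing the transitions as a table makes the "no stuck state" claim checkable: a test can assert that no event leaves a case in `LOCKED`.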

What I owned

Component | My Contribution
Architecture and LLD | Designed full async workflow, event-driven pattern, escape hatch logic
SOP authoring | Authored complete decision tree covering all 7 Route 53 limit types
Custom Python tools | Built using AI-assisted development (Amazon Q)
Testing and validation | Validating all 7 limit paths including happy path, escape hatch, and correspondence scenarios
Program coordination | Consolidated onboarding status across all builder teams into a single tracking report for senior technical program management - providing visibility into progress, blockers, and dependencies across 10+ services

This Route 53 workflow is one of 10+ AWS services being automated under the same initiative. The async orchestration pattern and escape hatch design are consistent across all of them.

Python (AI-assisted via Amazon Q) · Amazon Bedrock · Amazon EventBridge · Event-Driven Architecture · Async Workflow Orchestration · Multi-Turn Agent Design · Pub/Sub Event Monitoring · Escape Hatch Pattern · REST API Integration · Prompt Engineering

Impact

  • 10-15 minutes of manual work per request eliminated on the standard path
  • Days of async waiting and follow-up removed
  • Consistent handling across all 7 Route 53 quota types
  • Demoed to senior technical program management, April 2026

API Catalyst

poc - evaluated for adoption

A natural language search tool that lets automation builders find relevant internal AWS APIs from a catalog of 2,861 entries - and immediately verify whether each result is ready to use on the automation platform.

Built in response to a friction point I observed directly: builders were spending the first hours of every project manually searching documentation and messaging colleagues to find which APIs existed and whether they were actually available. API Catalyst eliminates that step.

Why LLM over keyword or semantic search: API names are cryptic by nature. Keyword matching fails for natural language queries. The LLM understands intent - a query for 'increase Route 53 limits' correctly maps to the right internal API even with zero word overlap in the name. Higher precision at slightly higher latency - the right trade-off for a discovery tool where accuracy matters more than speed.

How it works

  • Builder submits natural language query describing what they need
  • Amazon Bedrock interprets query and matches against structured JSON catalog of 2,861 APIs
  • Results returned ranked by relevance to the builder's specific use case
  • Platform onboarding status verified in real time - builder knows immediately what is available versus what needs onboarding
  • Prototype UI built with Streamlit
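The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the prototype's code: `complete` stands in for whatever function sends a prompt to the model and returns its text, the catalog entry shape is hypothetical, and a real catalog of this size may need to be chunked or pre-filtered to fit in a prompt.

```python
import json


def build_prompt(query, catalog):
    """Place the builder's request and the catalog in a single instruction."""
    return (
        "Rank the APIs below by relevance to the request. "
        "Reply with a JSON array of API names, best match first.\n"
        f"Request: {query}\n"
        f"Catalog: {json.dumps(catalog)}"
    )


def search(query, catalog, onboarded, complete):
    """Return ranked matches annotated with platform onboarding status."""
    names = json.loads(complete(build_prompt(query, catalog)))
    known = {entry["name"] for entry in catalog}
    return [
        {"name": name, "onboarded": name in onboarded}
        for name in names
        if name in known  # drop any name that is not actually in the catalog
    ]
```

Filtering the model's answer against the catalog keeps an invented API name from ever reaching the builder.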

What I owned

Solo project - idea, architecture, all code, all prompt engineering, prototype UI, and demo presentation. Built working prototype in 7 days.

Python · Amazon Bedrock (Claude) · Streamlit · Natural Language Understanding · JSON Catalog · REST API Integration · Prompt Engineering

Impact

  • Presented at IN Together Summit Hackathon, April 2026
  • Demoed to senior technical program management
  • Currently being evaluated for adoption across all builder teams
  • Addresses the first bottleneck in every automation project at AWS India

Cloud Support Associate

Amazon Web Services India


March 2024 - June 2025

Customer-facing AWS technical support. The role that taught me how enterprises actually experience cloud infrastructure - where they get stuck, what they need to hear, and how to move them forward under pressure.

I supported AWS customers directly - individual developers and small to mid-size businesses - on billing, account management, and service-related issues across EC2, S3, RDS, Lambda, VPC, IAM, CloudWatch, Cost Explorer, and more.

Every contact required me to understand the customer's AWS environment, diagnose their concern, and resolve it - often involving complex pricing structures, multi-region configurations, or billing relationships they did not fully understand. My job was to make AWS simple for customers who were overwhelmed by it.

That is the exact job description of a Solutions Engineer - just pointed at support instead of sales.

Case study

A small business customer's monthly bill jumped from approximately $200 to $1,400 overnight. They were convinced they had been overbilled or compromised.

What I did:

  • Authenticated and de-escalated first - trust before troubleshooting
  • Pulled Cost Explorer - identified spike in EC2 and EBS in a region the customer had forgotten they launched test instances in
  • Diagnosed root cause - they had terminated instances but EBS volumes persisted. A classic AWS pattern that costs customers thousands daily
  • Explained clearly - walked through how AWS regions are independent, how EBS billing continues after instance termination, and why their test resources were still generating charges
  • Applied billing adjustment independently - using my direct adjustment authority ($500 USD per case, no management approval required), I processed a credit for the unintended EBS charges
  • Prevented recurrence - set up AWS Budgets with threshold alerts

15 minutes. Problem diagnosed. Credit applied. Customer educated. Trust earned.
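The root cause in this case, volumes left behind after instance termination, is easy to check for programmatically. The sketch below is an illustration, not a tool I used on the call. It filters a response shaped like the EC2 `DescribeVolumes` output for volumes in the `available` state, which means they are not attached to any instance. The per-GB price is an assumed input, since actual pricing depends on volume type and region.

```python
def unattached_volumes(describe_volumes_response, price_per_gb_month):
    """List volumes in the 'available' state with a rough monthly cost estimate."""
    findings = []
    for vol in describe_volumes_response.get("Volumes", []):
        if vol.get("State") == "available":
            findings.append({
                "volume_id": vol["VolumeId"],
                "size_gb": vol["Size"],
                # Estimate only: size multiplied by the caller-supplied price.
                "est_monthly_cost": round(vol["Size"] * price_per_gb_month, 2),
            })
    return findings
```

Because regions are independent, a check like this has to be run once per region; a forgotten test region is exactly where such volumes hide.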

Real-time technical translation

When a customer is frustrated about an unexpected bill, you have 30 seconds to earn trust. I learned to take complex AWS architecture - multi-region, data transfer pricing, reserved versus on-demand, consolidated billing hierarchies - and explain it to non-technical business owners, live, under pressure. This is the primary skill of a Solutions Engineer.

Operational AWS depth across 20+ services

I did not learn AWS from certifications. I learned it from real customer environments, real billing problems, and real mistakes people make with production systems. I understand not just what services do, but how customers actually use them and where they consistently get stuck.

Independent financial judgment

Direct billing adjustment authority up to $500 USD per case without management approval. Every case had real money at stake - process incorrectly and that is revenue impact; advise wrong and customers could delete production resources. I developed the judgment to know when to act, when to escalate, and when to push back - all while maintaining customer trust.

  • 4.76 / 5: CCR score (target: 4.62)
  • 25–35: customer contacts daily, simultaneous channels
  • Dual-channel: email plus calls or chats simultaneously
  • $500 USD: billing authority per case, independently

HR Operations Associate

Amazon Dev Center India


October 2021 - August 2022

Joined Amazon directly after B.Com. Handled HR operations for Amazon's Employee Resource Center - the centralized HR shared services function supporting all Amazon employees globally.

My team handled leave of absence requests, disability accommodations, medical documentation, and return-to-work processes for employees across every Amazon business unit. Every case involved confidential medical information, strict compliance requirements, and real consequences for the employee on the other end.

The scale: Amazon's ERC supports 1.5 million-plus employees globally. Every process is standardized, measurable, auditable, and built for scale. Working inside it taught me how large organizations design operations that survive complexity - and where they break down.

SOP-to-automation thinking

My current work is converting complex SOPs into AI agents. Where did I first learn to read, interpret, and navigate 30-plus page branching SOPs? Processing leave cases with dozens of conditional paths: Is this FMLA? Does it qualify? What documentation is required? Which team handles this? The mental model is identical to what I now encode into agent decision trees.

Case lifecycle equals agent workflow

ERC runs on: intake, classify, route, act, resolve, document, close. My automation agents follow the exact same pattern. I did not learn this architecture from a textbook - I lived it processing hundreds of employee cases. That is why the workflows I design feel like real operational systems, not theoretical ones.

Knowing when automation should not act

Because I was the human handling sensitive cases, I understand which situations demand human judgment - not as a theory, but from experience. Cases involving active medical crises, disputes requiring nuanced interpretation, or situations where the policy answer and the right answer diverge. This is why every agent I build has a graceful fallback to a human: I know from direct experience what happens when automation tries to handle something it should not.

Education

B.Com (Hons) - Accounting and Finance
Tilka Manjhi Bhagalpur University, 2021