Advisory

For teams building production agents.

I help teams reason through context architecture, agent reliability, evaluation, prompt engineering, and the translation layer between domain experts and engineering. The useful work usually happens before scale: naming the actual failure modes, deciding what context belongs where, and turning expert judgment into something the system can use.

Email muratcan.koylan@outlook.com with a short note: what you are building, who uses it, what currently breaks, and what decision you need to make.

Good Fit

I am useful when

You are building production agents, but quality degrades when real users, messy context, or handoffs enter the system.
Your team has prompts, tools, and documents scattered across the stack, and no clear context architecture.
You need evaluation rubrics that reflect expert judgment instead of generic thumbs-up / thumbs-down QA.
Your non-technical team knows the workflow deeply, but engineering is missing the tacit rules that make the work good.
You are deciding whether to build, buy, fine-tune, or wrap an AI tool around an existing business process.

Tracks

Two different advisory surfaces

For AI engineering and product teams

Context architecture review

Audit what enters the model: instructions, files, memory, tools, retrieved context, user state, and intermediate agent outputs.

Multi-agent system design

Review when to use handoffs, when to use tool-style consultation, and where a single agent with better tools is the simpler answer.

Evaluation and rubrics

Turn expert judgment into scoring frameworks, acceptance criteria, test cases, and feedback loops that can be inspected.

Prompt engineering systems

Review instruction hierarchy, dynamic prompt composition, tool protocols, and prompt patterns that need to hold under production use.

Memory and routing systems

Design what should persist, where it should live, how long it should survive, and what should be ignored.

For marketing and ops teams adopting AI

AI vendor evaluation

Separate useful capability from polished walkthroughs. Define tests around your real data, workflow constraints, and failure tolerance.

Marketing and ops workflow design

Map where AI should assist, where humans should stay in the loop, and which manual steps are worth automating.

Expert knowledge extraction

Interview operators, strategists, or domain experts and translate their implicit rules into usable agent instructions and rubrics.

GTM and research automation

Design research, enrichment, outbound, and reporting workflows that use agents without turning the stack into fragile glue code.

Engineering-to-operator translation

Help technical and non-technical teams define the same problem, constraints, and success criteria before implementation starts.

Engagement Shapes

Ways to work

System review

A focused review of prompts, agent flow, context loading, tools, evaluation, and failure cases. Usually starts with docs, traces, and one live walkthrough.

Advisory sprint

A short working block around one problem: agent reliability, persona design, evaluation, memory routing, vendor assessment, or workflow architecture.

Workshop

A practical session for engineering, marketing, ops, or leadership teams. Best when the team brings a real workflow or agent to dissect.

Ongoing advisory

Periodic review for teams already building. Useful when architecture, product decisions, and operator feedback are moving at the same time.

Grounding

What this is based on

Clinical AI context systems

At Sully.ai, I work on context engineering for AI Scribe and agent harnesses across medical specialties.

Marketing expert agents

At 99Ravens AI, I built multi-agent architecture, persona layers, agent skills, prompt engineering systems, and evaluation rubrics for strategy workflows.

Agent Skills for Context Engineering

Open-source patterns for context, memory, tools, evaluation, and multi-agent systems. Cited in Meta Context Engineering and mapped in the Agent Harness Engineering survey.

Long-horizon task briefs

Specification method for autonomous runs that span hours, multiple context windows, or parallel workers: success predicates, non-counting outcomes, evidence contracts, and fresh-context review.

GTM and research systems

Built AI SDR, research, and enrichment workflows across Clay, Apollo, HubSpot, Firecrawl, Make, and LLM pipelines.

Specialized model work

Fine-tuned an 8B memory-routing model and built reusable pipelines for style transfer and task-specific model training.

Not A Fit

Where I am probably not useful

Generic prompt packs with no workflow, data, or evaluation plan.
Growth hacks where the goal is volume rather than system quality.
Replacing your engineering team. I can review architecture and help shape the system, but implementation still needs owners.
Compliance certification, legal review, or security audit work that requires a formal vendor.