Behind the Wrapper: What You Need to Know About LLM-Powered Tools

This blog is a sample of our weekly newsletter, designed for ABA professionals who are building AI literacy skills. Subscribe here. Every week, we break down the foundational knowledge BCBAs and others in applied behavior analysis need to make informed decisions about the AI tools flooding our field.  

Behind the Curtain of AI Tools in ABA: What Every BCBA Needs to Know

Artificial intelligence is showing up everywhere, from session note summarizers to treatment plan generators, and many of these tools are being marketed directly to behavior analysts. Some claim to reduce clinical workload. Others say they help ensure compliance or automate decision-making. But how many of these tools are actually safe, secure, and appropriate for clinical use in applied behavior analysis (ABA)?


In this blog, we look under the hood. Because when a tool claims to “use AI,” what that may mean is simply this: it takes your input (e.g., session notes, SOAP templates, behavior data) and passes it directly into a commercial large language model (LLM) like GPT, Gemini, or Claude, just as you might in a public-facing chatbot. Unless the vendor has taken specific steps to secure that process, your client’s information might be flying through third-party servers, stored without your knowledge, or even used to improve someone else’s model.

No security. No privacy controls. Done with a few lines of code.

If you’re a BCBA, RBT, clinical director, or a university instructor shaping the next generation of analysts, understanding what an “LLM wrapper” is isn’t optional. It’s an ethical imperative. Competence here starts with one of the most overlooked distinctions at the intersection of behavior analysis and artificial intelligence: the difference between a “wrapper” and a secure system. Let’s unpack that difference, and what you need to ask before using any LLM-powered feature in clinical or organizational work.

What Is a Wrapper Around a Large Language Model?

A large language model (LLM) is a type of artificial intelligence model trained on vast amounts of text data to generate human-like language. It works by predicting the most likely next word or phrase based on patterns it has learned from that data. LLMs don’t “understand” meaning; they generate responses by minimizing statistical error, not by reasoning or comprehension. Common examples include OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude. Many tools claiming to “use AI” are not building their own models. Instead, they simply pass your input (e.g., session notes, SOAP templates, audio transcripts, or other sensitive clinical data) directly into one of these general-purpose models.

These tools rely on a public API (application programming interface), often with minimal technical configuration.

It’s literally as easy as this:

from openai import OpenAI

# "user_input" is whatever the tool collects from you (e.g., raw session notes)
user_input = "Raw session note text goes here"
response = OpenAI().chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": user_input}],
)
print(response.choices[0].message.content)

That’s it. That single snippet of code can take your input and drive an entire “AI-powered” feature.

Your input can be sent, without redaction or any additional safeguards, to a third-party LLM provider, and a text response comes back. These tools are known as wrappers: they wrap a commercial LLM in a branded interface but add no real modification, hosting, or secure infrastructure. The vendor may slap a slick interface and a brand name on it and call it a “behavior plan generator,” “smart note writer,” or “clinical AI assistant.” But under the hood, it’s just a thin wrapper.

So what’s the problem? Precisely because it’s this easy, and because venture capital and enterprise buyers have been chasing “AI solutions,” there’s been a mad rush to build LLM-driven products by individuals and companies who don’t understand data privacy, don’t understand clinical risk, don’t understand how to align model outputs with behavior-analytic principles, and in many cases don’t even understand the models themselves.

What gets marketed as innovation is often just a quick integration, patched together by teams with no background in ethics, security, data science, or behavior science.

And that’s not just lazy. In clinical contexts, it’s dangerous.

 

Why It Matters: Risks to Clients, Providers, and Organizations

When behavior analysts unknowingly use LLM wrappers, especially in clinical settings, several ethical and legal concerns arise:

  1. Third-party exposure: Your data may be transmitted to servers outside of your vendor’s environment, potentially creating instant HIPAA and FERPA violations.
  2. Storage risk: Some LLM providers may retain or log data for debugging or training purposes (unless explicitly disabled).
  3. Lack of auditability: You can’t see what’s happening on the other side of the API, making it hard to ensure HIPAA or FERPA compliance.
  4. Consent violations: If the data includes client identifiers and you haven’t disclosed this processing to caregivers, you may be breaching informed consent or data protection laws.
  5. False assurance: A polished front-end can give the illusion of professionalism or compliance, even if the backend is insecure.

 

What Ethical AI in ABA Looks Like: Hosting Models in a Walled Garden

If thin wrappers represent the quick-and-risky route to deploying AI, privately hosted models represent the safer, more responsible alternative. Rather than sending sensitive data out to a third-party API, some vendors choose to create what’s sometimes referred to as a walled garden. In this secure, contained environment, an LLM runs entirely within the vendor’s own infrastructure. This can be done by deploying a language model (like LLaMA, Mistral, or Azure-hosted OpenAI models) inside a private cloud environment, often within Microsoft Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP). When done correctly, this setup ensures that:

  • All data remains inside the vendor’s cloud tenancy
  • Access is controlled, audited, and encrypted
  • Models are treated like any other enterprise-grade application subject to compliance reviews
  • Vendors can sign and enforce Business Associate Agreements (BAAs)

For example:

  • Microsoft Azure OpenAI Service allows enterprise customers to run GPT models in a region-locked, data-isolated environment, with no model training on inputs by default and options for zero data retention.
  • Amazon Bedrock provides private access to a range of models (like Anthropic’s Claude, Cohere’s models, and Meta’s Llama 3) with full encryption, IAM-based access controls, and integration into existing HIPAA- or HITRUST-compliant workloads.

Vendors can also go one step further and fine-tune their own model instance using behaviorally sound, de-identified training data—ensuring outputs reflect the field’s values, not just Internet-average pretraining. This isn’t easy. It’s more expensive. It requires security expertise and infrastructure planning. But it’s also what clients and their sensitive data deserve. Because if you’re building tools for clinical use, your LLM shouldn’t live in the wild—it should live in a fortress.
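
To make this concrete, here is a minimal sketch of what a walled-garden call might look like using the AzureOpenAI client from the openai Python SDK against a vendor’s own Azure OpenAI deployment. The endpoint, deployment name, API version, and environment variable below are placeholders rather than any specific vendor’s implementation, and a real system would layer on the redaction, logging, and access controls described next.

import os
from openai import AzureOpenAI

# Placeholder endpoint and deployment that live inside the vendor's own Azure
# tenancy; prompts are processed there rather than on a public chatbot service.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # placeholder API version
    azure_endpoint="https://your-private-resource.openai.azure.com",
)

de_identified_note = "De-identified session note text"  # PHI removed before this point
response = client.chat.completions.create(
    model="your-gpt-deployment",  # the deployment name created in that tenancy
    messages=[{"role": "user", "content": de_identified_note}],
)
print(response.choices[0].message.content)

The call looks almost identical to the thin-wrapper snippet above; the difference that matters is where the model runs and who controls the data path.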

Responsible vendors don’t just call the OpenAI API and call it a day. They build infrastructure to protect data before, during, and after model use. Whether you’re building or using documentation tools, note generators, or decision-support systems, these are the best practices every ABA-facing vendor should follow:

  1. Zero Data Retention Settings: Configure API calls to disable storage, logging, or training on user data.
  2. HIPAA-compliant Hosting: Use enterprise LLM instances with business associate agreements (BAAs).
  3. Prompt Redaction: Remove or mask all personally identifiable information (PII) and protected health information (PHI) before calling the model (see the simplified sketch after this list).
  4. Self-hosted LLMs: Run smaller language models entirely within the vendor’s secure environment, keeping all data on their servers.
  5. Encryption in Transit: Use TLS (HTTPS) and encrypted payloads for any data sent to or from APIs.
  6. Audit Logging: Keep a record of when and how LLMs were used, and what outputs were returned.
  7. Clinician-in-the-Loop Design: Ensure that model suggestions are never auto-applied. Instead, outputs are always reviewed by a qualified professional and vetted through a rigorous evaluation framework.
  8. Explicit Consent Language: Update client intake and tech consent forms to include LLM usage details.
  9. Data Segmentation: Keep AI-related data processing separate from core clinical systems unless all systems meet the same security standard.
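
As a rough illustration of items 3 and 6 above, here is a simplified Python sketch of prompt redaction and audit logging. The patterns and helper names are illustrative assumptions, not a complete solution: real PHI/PII de-identification must also handle names, addresses, dates of birth, and other clinical identifiers, and production audit trails belong in secured logging infrastructure.

import hashlib
import logging
import re

logging.basicConfig(filename="llm_audit.log", level=logging.INFO)

# Illustrative patterns only; real de-identification needs much broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    # Mask obvious identifiers before any text leaves your environment.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

def audit(prompt: str, output: str) -> None:
    # Record that a model call happened without storing the content itself.
    logging.info(
        "llm_call prompt_sha256=%s output_sha256=%s",
        hashlib.sha256(prompt.encode()).hexdigest(),
        hashlib.sha256(output.encode()).hexdigest(),
    )

In practice, redact() would run on every prompt before the model call, and audit() would run on every request and response pair, giving you a verifiable record of use without keeping a second copy of sensitive text.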

 

Eleven Questions to Ask Your Vendor

Even if you’re not writing code or training models, as a BCBA you’re responsible for protecting client data and ensuring any tool you use aligns with ethical and legal standards. Fortunately, there are clear signs that distinguish a vendor who’s done this well from one who hasn’t. When you see “AI-powered” in a product you’re using (or considering), ask these 11 questions before uploading any sensitive data:

  1. Do you use a third-party LLM like OpenAI, Anthropic, or Google?
    → If yes, ask which one, and on what plan or instance.

  2. Is any client data sent outside your environment to LLM providers?
    → Ask for written confirmation.

  3. Do you use zero data retention settings in your API calls?
    → If not, your client’s personal data could be used to train someone else’s model (and surface in others’ outputs).

  4. Is the LLM hosted in a HIPAA-compliant environment with a signed BAA?
    → A BAA is a legal requirement for most clinical tools.

  5. Do you redact PHI before calling the model?
    → Raw session data or client identifiers should never be passed in directly.

  6. Can you provide a “model card” or white paper describing how the AI system works?
    → Transparency builds trust—and avoids snake oil.

  7. What are your output evaluation and output alignment frameworks?
    → If they haven’t taken the time to understand what they built, you shouldn’t waste your time with them.

  8. Are outputs reviewed by a BCBA before being shown to clinicians or families?
    → The human-in-the-loop matters.

  9. Do you store the AI input and output, and if so, for how long?
    → If stored, it needs to be secure and governed.

  10. Has your AI system been independently audited for security and bias?
    → This shouldn’t be a “yes” or “no”. They should share the results.

  11. Do your consent forms inform clients and families about AI use?
    → Surprise AI is unethical AI.

If a vendor can’t clearly answer these questions, they shouldn’t be touching your data. Thin wrappers are easy to build, but hard to trust. As a field, we must demand more than functionality. We must demand transparency, security, and alignment with behavioral values. Because AI is not just a product—it’s an amplifier. And what it amplifies depends on who’s behind the curtain.

 

Final Thoughts: Don’t Just Use AI. Analyze It.

Behavior analysts are uniquely positioned to understand how systems shape behavior—and how models, data, and reinforcement histories define outcomes. If we apply that same lens to AI tools, we can move from being passive consumers to active evaluators. The real question isn’t “Should we use AI in ABA?” It’s: “How can we do it safely, transparently, and in ways that serve our clients—not just someone else’s bottom line?” Ethical behavior analysis in the age of AI starts with informed professionals. So start asking hard questions. Start reading the documentation. Start training your team. And remember: tools don’t make ethical decisions. People do.

 

Want to Learn More?

Ready to go deeper? Enroll in the Foundations of AI Literacy for Behavior Analysts course and earn 1.5 BACB® Ethics CEUs. Taught by Dr. David J. Cox, Ph.D., M.S.B., BCBA-D, an internationally recognized leader in behavioral data science, the course builds the critical thinking skills to evaluate AI tools confidently and ethically, without needing a background in computer science. Enrollment also unlocks access to our advanced, role-specific tracks built for RBTs, BCBAs, Clinical Directors, Academics, and Organizational Leaders. Learn more or register here.

 


About the Authors: 

David J. Cox, Ph.D., M.S.B., BCBA-D is a full-stack data scientist and behavior analyst with nearly two decades of experience at the intersection of behavior science, ethics, and AI. Dr. Cox is a thought leader on ethical AI integration in ABA and the author of the forthcoming commentary Ethical Behavior Analysis in the Age of Artificial Intelligence. He has published over 70 journal articles and books on related topics and is, quite possibly, the only person in the field with expertise in behavior analysis, bioethics, and data science.

Ryan O’Donnell, MS, BCBA is a Board Certified Behavior Analyst (BCBA) with over 15 years of experience in the field. He has dedicated his career to helping individuals improve their lives through behavior analysis and is passionate about sharing his knowledge and expertise with others. He oversees The Behavior Academy and helps top ABA professionals create video-based content in the form of films, online courses, and in-person training events. He is committed to providing accurate, up-to-date information about the field of behavior analysis and the various career paths within it.

 
