Glossary Table

Term	Definition	Notes
Foundation Model	Foundation models, sometimes known as base models, are powerful artificial intelligence (AI) models that are trained on massive amounts of data and can be adapted to a wide range of tasks. They serve as the base, or building blocks, for more specialized applications, and building those specialized applications is the main focus of AI engineering teams.	The term "foundation model" was coined in 2021 by researchers at Stanford's Center for Research on Foundation Models (CRFM), part of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). On the Opportunities and Risks of Foundation Models – Stanford HAI (2021); Introducing Foundation Models – Stanford HAI
Frontier Model	A frontier model is an artificial intelligence system at the cutting edge of capability at a given point in time. These highly advanced, general-purpose models push the boundaries of the technology in areas such as reasoning, multimodal understanding, and autonomous task execution.	External references explaining the concept: Frontier Models: The New Frontier of AI – Gen AI Summit EU - Valencia; LLM vs. SLM vs. FM: Choosing the Right AI Model – IBM Technology
LLM	A Large Language Model (LLM) is a specific, advanced type of machine learning model used within the broader field of Natural Language Processing (NLP). LLMs were the technical and conceptual starting point of what later became known as foundation models. NLP is the overall field that enables computers to understand and process human language, while an LLM is a very large model, trained with deep learning on massive datasets, that performs many NLP tasks such as text generation, translation, and summarization.	Modern LLMs are built on the Transformer architecture. Attention Is All You Need – Vaswani et al. (2017), the Transformer paper; What is a Large Language Model (LLM)? – Elastic
LMM	Large Multimodal Models (LMMs), also known as MLLMs (Multimodal LLMs), expand the capabilities of traditional large language models (LLMs), which primarily process and generate text. By integrating multiple types of data, such as text, images, video, and audio, LMMs enable more complex and versatile applications that require synthesizing and interpreting both textual and non-textual information.	This term is related to the concept of multimodal AI, which refers to machine learning models capable of processing and integrating information from multiple modalities, or types of data. These modalities can include text, images, audio, video and other forms of sensory input. A Survey on Multimodal Large Language Models – arXiv (2023); GPT-4V System Card – OpenAI (vision multimodal)
LxM	Large "x" Models family. LxM is an umbrella term covering LLMs, LMMs and related model types, where "x" stands for the kind of model. It is another way of referring to foundation models (FMs).	The Rise of LxMs – Chip Huyen; Intro to Large Language Models – Andrej Karpathy
SLM	Small language models (SLMs), in this context, are AI models with far fewer parameters than large language models, often trained on smaller, more focused datasets. Despite their smaller size, SLMs can perform a variety of tasks, such as text generation, summarization, translation, and classification. While they may not match the breadth of LLM capabilities, SLMs are often more resource-efficient and can be highly effective for specific, targeted applications.	Phi-2: The Surprising Power of Small Language Models – Microsoft Research; SmolLM: Blazingly Fast and Remarkably Powerful – Hugging Face
AI Engineering	The process of building applications on top of foundation models. Several other terms are used to describe this work, including ML engineering, MLOps, AIOps and LLMOps.	AI Engineering (Book) – Chip Huyen, O'Reilly; What You MUST Know About AI Engineering – Chip Huyen
AI Agent	An AI agent is a computer program powered by artificial intelligence (AI) that can perform tasks autonomously to assist human users, even without explicit step-by-step instructions. Unlike other AI-powered software, such as chatbots, AI agents can operate beyond a single prompt-and-response exchange. They can look beyond their training data to gather information from their environment (for example, by searching or calling tools) and then act on that information on their own in pursuit of a larger goal.	What are AI agents? – AWS; AI agents: What they are and how they'll change the way we work – Microsoft Source
Agent Skills	Reusable capabilities that an AI agent can invoke to complete tasks reliably. See the dedicated page for a simple explanation, examples, and implementation references.	Agent Skills page
Agentic Workflow	An agentic workflow is an AI-driven process where autonomous agents plan, make decisions, and execute tasks using tools and feedback loops. Rather than relying on rigid step-by-step instructions or simply answering questions, an agentic workflow is goal-oriented. The AI assesses context, self-corrects errors, and adapts to changes in real time.	Agentic Design Patterns – Anthropic; What are agentic workflows? – IBM
Vibe Coding	Vibe coding is an AI-assisted software development practice in which a developer describes a project or task in a prompt to a large language model (LLM), which then generates the source code automatically. In essence, it means building software through AI coding-assistant agents.	Vibe coding – Merriam-Webster; Andrej Karpathy, who coined the term in 2025
Prompt Engineering	The art and practice of crafting natural-language instructions (prompts) for a generative AI model so that it responds according to our needs.	Prompt Engineering Guide – DAIR.AI; Prompt Engineering Best Practices – OpenAI
Prompt Context	Prompt context is background information or details provided in a prompt that help guide an AI's response to be more relevant and specific.	Providing Context in Prompts – Learn Prompting; Anthropic's Prompt Engineering Guide – Anthropic
Context Window	The "context window" in GenAI refers to the amount of information a model can process and "remember" at one time, similar to a short-term memory. It is measured in tokens, which are pieces of words or segments of images, audio, or video. A larger context window allows the model to handle longer conversations and larger documents, retaining more of the input and output without forgetting the beginning.	Claude 3.5 Sonnet and its 200K context window – Anthropic; Long Context Understanding in Gemini – Google Research
GenAI Token	A token is the smallest unit into which text (or other data) is broken down for an AI model to process. Whether a transformer model is processing text, images, audio clips, videos or another modality, it first converts the data into tokens; this process is known as tokenization. Large language models have varying token limits, which cap how much they can process at once, counting both the input prompt and the output completion. The token limit determines the model's "context window": the amount of information it can consider at any one time. For this reason, the number of tokens a model supports is frequently quoted as one measure of its power.	OpenAI Tokenizer – interactive tokenization tool; Summary of Tokenizers – Hugging Face
Harness Engineering	Harness engineering is the practice of designing and building the scaffolding, tooling, and control infrastructure that surrounds an AI agent or model during execution. A harness manages the lifecycle of long-running agent tasks: orchestrating tool calls, handling retries and failures, enforcing timeouts, capturing intermediate state, and providing observability. Rather than focusing on the model itself, harness engineering focuses on the reliable, repeatable execution environment in which the model operates. In short, it defines which tools the agent can call, where it gets information, how it validates its own decisions, and when it should stop: the guardrails of the AI agent.	Effective Harnesses for Long-Running Agents – Anthropic; Harness Engineering – Martin Fowler
Ralph Loop	The Ralph Loop is an architectural pattern for autonomous software agents that operate in a continuous cycle. The name comes from Geoffrey Huntley's "Ralph Wiggum" technique, which in its simplest form is a shell loop that repeatedly feeds the same prompt to a coding agent. Its phases can be summarized with the RALPH mnemonic: Read (read context and current state), Act (execute actions or invoke tools), Loop (iterate until the goal is reached), Produce (generate an output or artifact), and Harness (control execution within a supervised environment). It is particularly relevant for long-running tasks where the agent must reason, act, and self-correct iteratively with minimal human intervention.	The Ralph Loop – Geoffrey Huntley; The Dual LLM Pattern for building AI agents – Simon Willison
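LLM Temperature	LLM temperature is a sampling hyperparameter that controls the randomness, or "creativity", of a model's responses. The model's next-token scores (logits) are divided by the temperature before being converted into a probability distribution: values below 1 sharpen the distribution and make outputs more predictable, while values above 1 flatten it and make outputs more varied.	https://arxiv.org/html/2506.07295v1 https://www.ibm.com/think/topics/llm-temperature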
Local Model	A local model (often a local LLM, or Large Language Model) is an AI model that you download and run directly on your own hardware (a PC or local server). Unlike cloud-based AI assistants (such as ChatGPT or Claude), a local LLM runs entirely offline once it has been downloaded: your data stays on your device, and no internet connection is required to process requests.	https://ollama.com/ https://tengine.ai/blog/a-guide-to-running-ai-models-locally-what-local-really-means
KV Cache	A Key-Value (KV) cache is a fundamental optimization technique used during Large Language Model (LLM) inference. It stores the intermediate Key (K) and Value (V) vectors of past tokens in the model's attention layers, so the model does not have to recompute them from scratch for every newly generated token. The trade-off is extra memory that grows with the length of the sequence.	https://huggingface.co/blog/not-lain/kv-caching https://arxiv.org/html/2603.20397v1 https://magazine.sebastianraschka.com/p/coding-the-kv-cache-in-llms
LLM Distillation	LLM distillation is a specialized form of Knowledge Distillation (KD) that transfers knowledge from a large LLM (the teacher) into a smaller, faster and more efficient model (the student) while preserving a significant portion of its performance. It enables lightweight models to approximate the capabilities of massive LLMs, making them deployable across a broader range of applications and devices.	Knowledge Distillation of Large Language Models – arXiv (2023); DistilBERT: a distilled version of BERT – Hugging Face
TODO	TODO	TODO
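
The Agentic Workflow, Harness Engineering and Ralph Loop entries all describe an agent that repeatedly reads its context, acts through tools and feeds the result back until it reaches a goal. The Python below is a minimal sketch of such a harness under stated assumptions, not any vendor's implementation: `Action`, `Harness`, `flaky_lookup` and `toy_policy` are hypothetical names, the policy function stands in for the LLM, and a fixed step budget stands in for a real timeout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    tool: str      # name of a registered tool, or "finish"
    argument: str  # tool input, or the final answer when tool == "finish"


@dataclass
class Harness:
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 10   # step budget: a crude stand-in for a timeout
    max_retries: int = 2  # extra attempts after a failed tool call
    log: List[str] = field(default_factory=list)  # observability trail

    def call_tool(self, action: Action) -> str:
        """Run one tool call, retrying when the tool raises an exception."""
        for attempt in range(self.max_retries + 1):
            try:
                return self.tools[action.tool](action.argument)
            except Exception as exc:
                self.log.append(f"attempt {attempt} of {action.tool} failed: {exc}")
        return f"error: {action.tool} failed after {self.max_retries + 1} attempts"

    def run(self, goal: str, policy: Callable[[str, List[str]], Action]) -> Optional[str]:
        """Read -> act -> observe loop until the policy finishes or the budget runs out."""
        history: List[str] = []
        for _ in range(self.max_steps):
            action = policy(goal, history)  # read goal + history, choose the next action
            if action.tool == "finish":
                return action.argument
            if action.tool not in self.tools:  # guardrail: only registered tools may run
                history.append(f"rejected unknown tool: {action.tool}")
                continue
            history.append(self.call_tool(action))  # act, then feed the result back
        return None  # step budget exhausted without an answer


# Hypothetical tool that times out on its first call, to exercise the retry path.
_calls = {"n": 0}


def flaky_lookup(query: str) -> str:
    _calls["n"] += 1
    if _calls["n"] == 1:
        raise TimeoutError("simulated timeout")
    return f"result for {query!r}"


def toy_policy(goal: str, history: List[str]) -> Action:
    # Hypothetical stand-in for the LLM: look something up once, then finish.
    if not history:
        return Action("lookup", goal)
    return Action("finish", history[-1])


harness = Harness(tools={"lookup": flaky_lookup})
answer = harness.run("capital of France", toy_policy)
print(answer)
print(harness.log)
```

The design keeps the decision-maker (the policy) separate from the execution rules (allowed tools, retries, step budget), which is the split the Harness Engineering entry describes.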
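
The GenAI Token and Context Window entries can be illustrated with a toy tokenizer. Real tokenizers (for example, byte-pair encoding) learn their vocabulary from data; here the vocabulary `VOCAB` and the function names are hypothetical, and greedy longest-match splitting is used only because it is short. The second function shows the point both entries make: prompt tokens and output tokens share one context-window budget.

```python
from typing import List, Set


def tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match subword tokenization over a fixed toy vocabulary.

    Characters not covered by the vocabulary fall back to single-character tokens.
    """
    max_len = max((len(piece) for piece in vocab), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until a vocabulary
        # entry (or a single character) matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def fits_in_context(prompt_tokens: List[str], max_output_tokens: int, context_window: int) -> bool:
    """Input and output tokens are counted against the same context window."""
    return len(prompt_tokens) + max_output_tokens <= context_window


VOCAB = {"token", "iz", "ation", " is", " fun"}  # hypothetical vocabulary
tokens = tokenize("tokenization is fun", VOCAB)
print(tokens)  # ['token', 'iz', 'ation', ' is', ' fun']
print(fits_in_context(tokens, max_output_tokens=10, context_window=16))  # True (5 + 10 <= 16)
```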
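
For the LLM Temperature entry: with logits z_i and temperature T, the probability of token i is p_i = exp(z_i / T) / sum_j exp(z_j / T). The sketch below applies that formula to three hypothetical logits and samples from the result; the function names are illustrative, and real inference engines combine temperature with other sampling settings (such as top-k or top-p) that are not modeled here.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Turn raw next-token scores (logits) into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Lower temperatures push almost all probability onto the top-scoring token; higher temperatures spread it out, which is why outputs become more varied.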
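
To make the KV Cache entry concrete, the sketch below runs single-head causal self-attention over five random token embeddings twice: once recomputing every key and value at each step, and once appending only the newest token's K and V to a cache. It is a pure-Python toy under stated assumptions (one head, one layer, random weights, hypothetical helper names), but it shows that the cache changes the amount of work, not the result.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the given keys and values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def decode_without_cache(xs: List[Vector], Wq: Matrix, Wk: Matrix, Wv: Matrix) -> List[Vector]:
    """At step t, recompute K and V for tokens 0..t from scratch."""
    outputs = []
    for t in range(len(xs)):
        keys = [matvec(Wk, x) for x in xs[: t + 1]]
        values = [matvec(Wv, x) for x in xs[: t + 1]]
        outputs.append(attend(matvec(Wq, xs[t]), keys, values))
    return outputs


def decode_with_cache(xs: List[Vector], Wq: Matrix, Wk: Matrix, Wv: Matrix) -> List[Vector]:
    """At step t, compute K and V only for token t; earlier ones come from the cache."""
    k_cache: List[Vector] = []
    v_cache: List[Vector] = []
    outputs = []
    for x in xs:
        k_cache.append(matvec(Wk, x))
        v_cache.append(matvec(Wv, x))
        outputs.append(attend(matvec(Wq, x), k_cache, v_cache))
    return outputs


rng = random.Random(0)
d = 4  # hypothetical embedding size


def random_matrix(rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


Wq, Wk, Wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
xs = random_matrix(5, d)  # five hypothetical token embeddings
cached = decode_with_cache(xs, Wq, Wk, Wv)
uncached = decode_without_cache(xs, Wq, Wk, Wv)
print(all(math.isclose(a, b) for ca, ua in zip(cached, uncached) for a, b in zip(ca, ua)))
```

For n tokens, the uncached loop performs n(n+1)/2 key projections (15 for n = 5) versus n with the cache, at the cost of keeping every past K and V vector in memory.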
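
The LLM Distillation entry can be sketched with the classic soft-target loss from Hinton et al. (2015), "Distilling the Knowledge in a Neural Network": the student is trained to match the teacher's temperature-softened output distribution, measured with KL divergence and scaled by T² so that gradient magnitudes stay comparable across temperatures. The logits and function names below are hypothetical; in practice this term is usually combined with an ordinary cross-entropy loss on the true labels, and LLM-specific distillation methods vary.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax (higher temperature gives a softer distribution)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """Soft-target loss: T^2 * KL(teacher_T || student_T)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


teacher = [3.0, 1.0, 0.2]        # hypothetical teacher logits for one position
close_student = [2.8, 1.1, 0.3]  # roughly agrees with the teacher
far_student = [0.2, 1.0, 3.0]    # prefers a different token
print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

Minimizing this loss over many examples pushes the student's whole output distribution, not just its top choice, toward the teacher's.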