Term Definition Notes
Foundation Model Foundation models, sometimes known as base models, are powerful artificial intelligence (AI) models that are trained on massive amounts of data and can be adapted to a wide range of tasks. They serve as the base, or building blocks, for more specialized applications, and building those specialized applications is the main task of AI Engineering teams. The term "foundation model" was coined by the Stanford Institute for Human-Centered Artificial Intelligence (HAI) in 2021.
Frontier Model A frontier model is an artificial intelligence system that represents the absolute cutting edge of capability at any given time. These highly advanced, general-purpose models push the boundaries of technology in areas like reasoning, multimodal understanding, and autonomous task execution. External references explaining the concept.
LLM A Large Language Model (LLM) is a specific, advanced type of machine learning model used within the broader field of Natural Language Processing (NLP). Conceptually, LLMs are where the idea of Foundation Models began. NLP is the overall field that enables computers to understand and process human language, while an LLM is a very large model, trained with deep learning on massive datasets, that can perform many NLP tasks such as text generation, translation, and summarization. Modern LLMs are built on the Transformer neural network architecture.
LMM Large Multimodal Models (LMMs, also known as MLLMs or Multimodal LLMs) extend the capabilities of traditional large language models (LLMs), which primarily process and generate text. By integrating multiple types of data, such as text, images, video, and audio, LMMs enable more complex and versatile applications that require synthesizing and interpreting both textual and non-textual information. The term is related to the concept of Multimodal AI, which refers to machine learning models capable of processing and integrating information from multiple modalities, or types of data. These modalities can include text, images, audio, video and other forms of sensory input.
LxM Large "x" Models family. LxM is an umbrella term covering LLMs, LMMs and similar "large x model" terms; it is another way of referring to Foundation Models (FMs).
SLM Small language models (SLMs), in this context, are AI language models with far fewer parameters than large language models, typically trained on smaller, more focused datasets. Despite their smaller size, SLMs can perform a variety of tasks, such as text generation, summarization, translation, and classification. While they may not match the breadth of LLM capabilities, SLMs are often more resource-efficient and can be highly effective for specific, targeted applications.
AI Engineering The process of building applications on top of foundation models. Several overlapping terms are used to describe this work, including ML engineering, MLOps, AIOps and LLMOps.
AI Agent An AI agent is a computer program powered by artificial intelligence (AI) that can perform tasks autonomously to assist human users, even without explicit step-by-step instructions. Unlike other AI-powered software, such as chatbots, AI agents are not limited to responding within a single prompt-based exchange. They can look beyond their training data, for example by searching the web or calling tools, to gather information, and then act on that information on their own in pursuit of a larger goal.
Agent Skills Reusable capabilities that an AI agent can invoke to complete tasks reliably. See the dedicated page for a simple explanation, examples, and implementation references.
Agentic Workflow An agentic workflow is an AI-driven process where autonomous agents plan, make decisions, and execute tasks using tools and feedback loops. Rather than relying on rigid step-by-step instructions or simply answering questions, an agentic workflow is goal-oriented. The AI assesses context, self-corrects errors, and adapts to changes in real time.
Vibe Coding Vibe coding is an AI-assisted software development practice in which the developer describes a project or task in a prompt to a large language model (LLM), which then generates the source code automatically. Basically, it means creating code using coding assistant agents.
Prompt Engineering The art of writing natural-language instructions and questions for a Generative AI model so that it responds according to our needs.
Prompt Context Prompt context is background information or details provided in a prompt that help guide an AI's response to be more relevant and specific.
Context Window The "context window" in GenAI refers to the amount of information a model can process and "remember" at one time, similar to short-term memory. It is measured in tokens, which represent pieces of words (or pieces of images, audio, or video). A larger context window lets the model handle longer conversations and larger documents, because it can retain more of the input and output without "forgetting" the beginning.
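A minimal sketch of what "forgetting the beginning" looks like in practice: an application keeps only the most recent messages that fit in the window, after reserving room for the model's output. It assumes one whitespace-separated word per token (a crude stand-in for the model's real tokenizer); the function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word = one token (real models use their own tokenizer).
    return len(text.split())


def fit_to_context(messages, context_window, reserved_for_output):
    """Keep the newest messages whose token total fits the window minus the output reservation."""
    budget = context_window - reserved_for_output
    if budget <= 0:
        raise ValueError("context_window must exceed reserved_for_output")
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break  # this and all older messages no longer fit: they are "forgotten"
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["hello there", "tell me about tokens please", "tokens are pieces of text", "thanks"]
print(fit_to_context(history, context_window=10, reserved_for_output=3))
# prints ['tokens are pieces of text', 'thanks']
```

Reserving output tokens up front matters because, as the token entry notes, the input prompt and the output completion share the same limit.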
GenAI Token A token is the smallest unit into which data (text or other modalities) is broken down for an AI model to process. Whether a transformer model is processing text, images, audio clips, videos or another modality, it first converts the data into tokens; this process is known as tokenization. Large language models have different token limits, which cap how much they can process at once, counting both the input prompt and the output completion. The token limit defines the model's "context window": the amount of information it can consider at any one time. For this reason, the maximum number of tokens a model supports is frequently quoted as a measure of its capability.
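To make tokenization concrete, here is a toy sketch that maps text to token IDs. Real LLM tokenizers (for example, byte-pair encoding) learn a vocabulary of many thousands of sub-word pieces from data; the tiny vocabulary and the greedy longest-match rule below are illustrative assumptions, not any particular model's tokenizer.

```python
# Hypothetical toy vocabulary: sub-word piece -> token ID.
TOY_VOCAB = {"token": 0, "ization": 1, "s": 2, " ": 3, "are": 4, "fun": 5}


def tokenize(text, vocab):
    """Greedy longest-match: at each position, take the longest vocabulary piece that matches."""
    max_len = max(len(piece) for piece in vocab)
    ids, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:  # no piece matched at this position
            raise ValueError(f"no vocabulary piece matches at position {i}: {text[i]!r}")
    return ids


print(tokenize("tokenization", TOY_VOCAB))    # prints [0, 1]
print(tokenize("tokens are fun", TOY_VOCAB))  # prints [0, 2, 3, 4, 3, 5]
```

Note that "tokens are fun" becomes six tokens, not three words: token counts (and therefore context-window usage) rarely equal word counts.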
Harness Engineering Harness engineering refers to the practice of designing and building the scaffolding, tooling, and control infrastructure that surrounds an AI agent or model during execution. A harness manages the lifecycle of long-running agent tasks: orchestrating tool calls, handling retries and failures, enforcing timeouts, capturing intermediate state, and providing observability. Rather than focusing on the model itself, harness engineering focuses on the reliable, repeatable execution environment in which the model operates. In simple words: harness engineering is the discipline of designing the execution environment around an autonomous AI agent. It defines which tools the agent can call, where it gets information, how it validates its own decisions, and when it should stop. In short, the harness provides the guardrails for AI agents.
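A minimal sketch of a harness, assuming tools are plain Python callables: it enforces an allow-list of tools, a step budget, a wall-clock budget and per-call retries, and records every attempt for observability. The `Harness` class and its parameters are hypothetical, not the API of any particular agent framework.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Hypothetical minimal harness: allow-listed tools, budgets, retries and a trace."""
    tools: dict                   # tool name -> callable; the only tools the agent may call
    max_steps: int = 10           # stop condition: cap on the number of tool calls
    max_retries: int = 2          # extra attempts for a failing tool call
    time_budget_s: float = 30.0   # wall-clock budget, checked before each call
    trace: list = field(default_factory=list)  # every attempt: (name, args, attempt, outcome)

    def __post_init__(self):
        self._start = time.monotonic()
        self._steps = 0

    def call(self, name, *args):
        if name not in self.tools:
            raise PermissionError(f"tool {name!r} is not allowed")
        if self._steps >= self.max_steps:
            raise RuntimeError("step budget exhausted")
        # Note: this only blocks new calls; it does not interrupt a tool that is already running.
        if time.monotonic() - self._start > self.time_budget_s:
            raise TimeoutError("time budget exhausted")
        self._steps += 1
        last_error = None
        for attempt in range(1 + self.max_retries):
            try:
                result = self.tools[name](*args)
                self.trace.append((name, args, attempt, "ok"))
                return result
            except Exception as exc:  # record the failure, then retry
                last_error = exc
                self.trace.append((name, args, attempt, f"error: {exc}"))
        raise RuntimeError(f"tool {name!r} failed after retries") from last_error


calls = {"n": 0}


def flaky_add(a, b):
    """Hypothetical tool that fails once with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient failure")
    return a + b


h = Harness(tools={"add": flaky_add}, max_steps=3)
print(h.call("add", 2, 3))  # prints 5 (after one retry)
```

Keeping these controls outside the model means the same guardrails apply no matter which model drives the agent.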
Ralph Loop The Ralph Loop is an architectural pattern for autonomous software agents that operate in a continuous cycle. The RALPH acronym represents the following phases: Read (read context and current state), Act (execute actions or invoke tools), Loop (iterate until the goal is reached), Produce (generate an output or artifact) and Harness (control execution within a supervised environment). It is particularly relevant for long-running tasks where the agent must reason, act and self-correct iteratively with minimal human intervention.
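The phases listed above can be sketched as a plain loop. This is a hypothetical illustration, assuming the agent's state is a dictionary and that `act` and `goal_reached` stand in for a real model or tool call and a real goal check; the task list is invented for the example.

```python
def ralph_loop(state, act, goal_reached, max_iterations=20):
    """Hypothetical Read-Act-Loop-Produce-Harness cycle; returns a result dict."""
    log = []
    for iteration in range(max_iterations):      # Harness: hard cap on iterations
        snapshot = dict(state)                   # Read: shallow copy of the current state
        if goal_reached(snapshot):               # goal reached: leave the loop early
            return {"status": "done", "state": snapshot, "log": log}  # Produce
        state = act(snapshot)                    # Act: one step, e.g. a model or tool call
        log.append(f"iteration {iteration}: {state}")  # trail for supervision
        # Loop: the next iteration re-reads the updated state
    status = "done" if goal_reached(state) else "stopped by harness"
    return {"status": status, "state": state, "log": log}  # Produce


def work_on_next_task(state):
    """Hypothetical action: move the first pending task to the done list."""
    todo, done = list(state["todo"]), list(state["done"])
    done.append(todo.pop(0))
    return {"todo": todo, "done": done}


def all_tasks_done(state):
    return not state["todo"]


start = {"todo": ["write tests", "fix bug", "update docs"], "done": []}
result = ralph_loop(start, work_on_next_task, all_tasks_done)
print(result["status"], result["state"]["done"])
# prints: done ['write tests', 'fix bug', 'update docs']
```

Lowering `max_iterations` to 2 makes the harness stop the run with one task left, which is the supervised-environment part of the pattern.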
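LLM Temperature LLM temperature is a hyperparameter that controls the randomness and creativity of a model's responses by scaling the model's scores (logits) before they are turned into the probability distribution of the predicted next token. Lower values make the output more focused and deterministic; higher values make it more diverse.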
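The usual way temperature T is applied is to divide each logit by T before the softmax, so that the probability of token i becomes exp(z_i / T) / Σ_j exp(z_j / T). A minimal sketch, with hypothetical logits for three candidate tokens:

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use greedy argmax for T = 0)")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng=random):
    """Sample a token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At T = 0.5 the top token takes most of the probability mass; at T = 2.0 the distribution flattens, so less likely tokens are sampled more often.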
Local Model A local model, or local LLM (Large Language Model), is an AI model that you download and run directly on your own hardware (a PC or a local server). Unlike cloud-based AI assistants (such as ChatGPT or Claude), a local LLM can operate entirely offline: your data stays on your device, and no internet connection is required to process requests.
KV Cache A Key-Value (KV) cache is a fundamental optimization technique used during Large Language Model (LLM) inference. It stores the intermediate Key (K) and Value (V) vectors of past tokens in the model's attention layers, preventing the model from recomputing them from scratch for every newly generated token.
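A minimal single-head sketch of the idea in pure Python: with the cache, each new token's K and V are projected once and appended; without it, K and V for every earlier token are projected again at each step. The 2x2 projection weights and the embeddings are hypothetical toy values, and real models apply this per layer and per head.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(len(values[0]))]


# Hypothetical 2x2 projection weights for a single attention head.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.9]]
W_V = [[1.0, 0.3], [0.0, 0.7]]


class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x):
        """Project K and V for the new token only, append them, attend over everything cached."""
        self.keys.append(matvec(W_K, x))
        self.values.append(matvec(W_V, x))
        return attend(matvec(W_Q, x), self.keys, self.values)


def step_without_cache(tokens_so_far):
    """Re-project K and V for every previous token at each step (the work the cache avoids)."""
    keys = [matvec(W_K, x) for x in tokens_so_far]
    values = [matvec(W_V, x) for x in tokens_so_far]
    return attend(matvec(W_Q, tokens_so_far[-1]), keys, values)


embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical token embeddings
cache = KVCache()
for t, x in enumerate(embeddings):
    cached = cache.step(x)
    recomputed = step_without_cache(embeddings[: t + 1])
    print(t, cached == recomputed)  # prints True at every step: same output, less work
```

Both paths give the same attention output; the difference is that the cached path does one K/V projection per step instead of one per token seen so far, at the cost of memory that grows with sequence length.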
LLM Distillation LLM Distillation is a specialized form of Knowledge Distillation (KD) that compresses large LLMs into smaller, faster and more efficient models while preserving a significant portion of their performance. It enables lightweight models to approximate the capabilities of massive LLMs, making them deployable in a broader range of applications and on more devices.
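One common way to train the smaller model is the classic soft-target loss from knowledge distillation (Hinton et al., 2015): soften both the teacher's and the student's logits with a temperature T, measure the KL divergence from teacher to student, and scale it by T². This is only one of several LLM distillation approaches; the logits and the mixing weight `alpha` below are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher_T || student_T) over temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


def total_loss(student_logits, teacher_logits, true_label, temperature=2.0, alpha=0.5):
    """Mix the soft-target loss with ordinary cross-entropy on the true label."""
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * distillation_loss(student_logits, teacher_logits, temperature) + (1 - alpha) * hard


teacher = [3.0, 1.0, 0.2]  # hypothetical teacher logits for one next-token prediction
print(distillation_loss([2.9, 1.1, 0.2], teacher))  # small: student close to teacher
print(distillation_loss([0.2, 1.0, 3.0], teacher))  # larger: student disagrees with teacher
```

The softened teacher distribution carries more information than the single correct label (how plausible each alternative is), which is what lets a small student approximate a much larger model.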