Research Papers

Date | Keywords | Institute | Paper
1955-08 | AI Proposal | Dartmouth College | A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence
2017-06 | Transformers | Google | Attention Is All You Need
2018-06 | GPT 1.0 | OpenAI | Improving Language Understanding by Generative Pre-Training
2018-10 | BERT | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019-02 | GPT 2.0 | OpenAI | Language Models are Unsupervised Multitask Learners
2019-09 | Megatron-LM | NVIDIA | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2019-10 | T5 | Google | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
2019-10 | ZeRO | Microsoft | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
2020-01 | Scaling Law | OpenAI | Scaling Laws for Neural Language Models
2020-05 | GPT 3.0 | OpenAI | Language Models are Few-Shot Learners
2021-01 | Switch Transformers | Google | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
2021-08 | Codex | OpenAI | Evaluating Large Language Models Trained on Code
2021-08 | Foundation Models | Stanford | On the Opportunities and Risks of Foundation Models
2021-09 | Zero-Shot Prompting | Google | Finetuned Language Models Are Zero-Shot Learners
2021-10 | T0 | HuggingFace et al. | Multitask Prompted Training Enables Zero-Shot Task Generalization
2021-12 | GLaM | Google | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
2021-12 | WebGPT | OpenAI | WebGPT: Browser-assisted question-answering with human feedback
2021-12 | Retro | DeepMind | Improving language models by retrieving from trillions of tokens
2021-12 | Gopher | DeepMind | Scaling Language Models: Methods, Analysis & Insights from Training Gopher
2022-01 | COT | Google | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
2022-01 | LaMDA | Google | LaMDA: Language Models for Dialog Applications
2022-01 | Minerva | Google | Solving Quantitative Reasoning Problems with Language Models
2022-01 | Megatron-Turing NLG | Microsoft & NVIDIA | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
2022-03 | InstructGPT | OpenAI | Training language models to follow instructions with human feedback
2022-04 | PaLM | Google | PaLM: Scaling Language Modeling with Pathways
2022-04 | Chinchilla | DeepMind | Training Compute-Optimal Large Language Models
2022-05 | OPT | Meta | OPT: Open Pre-trained Transformer Language Models
2022-05 | UL2 | Google | Unifying Language Learning Paradigms
2022-06 | Emergent Abilities | Google | Emergent Abilities of Large Language Models
2022-06 | BIG-bench | Google | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
2022-06 | METALM | Microsoft | Language Models are General-Purpose Interfaces
2022-06 | JEPA | Meta | A Path Towards Autonomous Machine Intelligence
2022-09 | Sparrow | DeepMind | Improving alignment of dialogue agents via targeted human judgements
2022-10 | Flan-T5/PaLM | Google | Scaling Instruction-Finetuned Language Models
2022-10 | GLM-130B | Tsinghua | GLM-130B: An Open Bilingual Pre-trained Model
2022-11 | HELM | Stanford | Holistic Evaluation of Language Models
2022-11 | BLOOM | BigScience | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
2022-11 | Galactica | Meta | Galactica: A Large Language Model for Science
2022-12 | OPT-IML | Meta | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
2023-01 | Flan 2022 Collection | Google | The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
2023-02 | LLaMA | Meta | LLaMA: Open and Efficient Foundation Language Models
2023-02 | Kosmos-1 | Microsoft | Language Is Not All You Need: Aligning Perception with Language Models
2023-03 | LRU | DeepMind | Resurrecting Recurrent Neural Networks for Long Sequences
2023-03 | PaLM-E | Google | PaLM-E: An Embodied Multimodal Language Model
2023-03 | GPT 4 | OpenAI | GPT-4 Technical Report
2023-04 | LLaVA | UW-Madison & Microsoft | Visual Instruction Tuning
2023-04 | Pythia | EleutherAI et al. | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
2023-05 | Dromedary | CMU et al. | Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
2023-05 | PaLM 2 | Google | PaLM 2 Technical Report
2023-05 | RWKV | Bo Peng | RWKV: Reinventing RNNs for the Transformer Era
2023-05 | DPO | Stanford | Direct Preference Optimization: Your Language Model is Secretly a Reward Model
2023-05 | ToT | Google & Princeton | Tree of Thoughts: Deliberate Problem Solving with Large Language Models
2023-07 | LLaMA2 | Meta | Llama 2: Open Foundation and Fine-Tuned Chat Models
2023-10 | Mistral 7B | Mistral | Mistral 7B
2023-12 | Mamba | CMU & Princeton | Mamba: Linear-Time Sequence Modeling with Selective State Spaces
2024-01 | DeepSeek-V2 | DeepSeek | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
2024-02 | OLMo | Ai2 | OLMo: Accelerating the Science of Language Models
2024-05 | Mamba2 | CMU & Princeton | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
2024-05 | Llama3 | Meta | The Llama 3 Herd of Models
2024-06 | FineWeb | HuggingFace | The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
2024-09 | OLMoE | Ai2 | OLMoE: Open Mixture-of-Experts Language Models
2024-12 | Qwen2.5 | Alibaba | Qwen2.5 Technical Report
2024-12 | DeepSeek-V3 | DeepSeek | DeepSeek-V3 Technical Report
2025-01 | DeepSeek-R1 | DeepSeek | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
2025-02 | Research Platform | OpenReview | OpenReview - the standard and official platform for publishing and reviewing papers in top conferences (ICLR, NeurIPS, and others)
2025-03 | Preprint Repository | Cornell University | arXiv (cs.LG/cs.CL) - the primary source of cutting-edge preprints in AI/LLMs
2025-04 | Benchmarks & SOTA | Meta AI | Papers with Code - tracking papers, benchmarks, and reproducible state-of-the-art results
2025-05 | NLP Proceedings | ACL | ACL Anthology - official repository of NLP/LLM papers from ACL, EMNLP, NAACL, and more
2025-06 | Research Discovery | Allen Institute for AI | Semantic Scholar - indexing, discovery, and tracking of relevant AI research papers
2025-07 | AI Conference | ICLR | ICLR (International Conference on Learning Representations) - one of the top annual conferences in ML/LLMs with high impact
2025-08 | AI Conference | ICML | ICML (International Conference on Machine Learning) - a leading annual global conference in ML and foundation models
2025-08 | Research post | Hugging Face | The Smol Training Playbook
2025-09 | AI Conference | NeurIPS Foundation | NeurIPS (Conference on Neural Information Processing Systems) - a global benchmark conference for cutting-edge AI advances
2025-10 | AI Conference | AAAI | AAAI Conference on Artificial Intelligence - a leading annual international AI conference
2025-11 | AI Conference | ACL | ACL (Annual Meeting of the Association for Computational Linguistics) - a flagship conference for NLP and LLM research
2025-12 | AI Conference | EMNLP | EMNLP - a top annual conference in applied NLP and recent LLM advances
2026-01 | AI Conference | NAACL | NAACL - a top-tier recurring conference in NLP and language models
2026-02 | AI Conference | IJCAI | IJCAI (International Joint Conference on Artificial Intelligence) - a historic and globally recognized reference conference in AI
2026-02 | Agent Skills | Anthropic | Agent Skills: A Data-Driven Analysis of Claude Skills for Extending Large Language Model Functionality
2026-03 | LeWorldModel | Yann LeCun | LeWorldModel: Stable End-to-End JEPA from Pixels
2026-05 | Agentic Search | Zhuofeng Li | Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
2026-06 | Research post | OpenAI | Dreaming: Better memory for a more helpful ChatGPT
2026-06 | Research post | Thariq Shihipar (Anthropic) | The Unreasonable Effectiveness of HTML

AI Books

Book | Description | Publication Date
The LLM Engineering Handbook (Paul Iusztin & Maxime Labonne) | Production-focused guide to RAG, evaluation, deployment, observability, and optimization for real-world AI systems. | 2024
AI Engineering (Chip Huyen) | Practical book by Chip Huyen on building and shipping reliable applications with foundation models. | 2025
Designing Machine Learning Systems (Chip Huyen) | End-to-end treatment of the ML lifecycle, from data and modeling to deployment, monitoring, and scaling. | 2022
Building LLMs for Production (Louis-François Bouchard & Louie Peters) | Focuses on architecture, evaluation, latency, reliability, and deployment for customer-facing LLM products. | 2024
Build a Large Language Model (From Scratch) (Sebastian Raschka) | Hands-on walkthrough of tokenization, embeddings, transformers, training pipelines, and inference in PyTorch. | 2024
Hands-On Large Language Models (Jay Alammar & Maarten Grootendorst) | Project-oriented coverage of embeddings, fine-tuning, retrieval, prompt design, evaluation, and deployment. | 2024
Prompt Engineering for LLMs (John Berryman) | Advanced prompting methods including Chain-of-Thought, ReAct, few-shot prompting, and optimization patterns. | 2024
Building Agentic AI Systems (Anjanava Anand) | Guide to agent architectures, tool use, memory, planning, orchestration, and multi-agent workflows. | 2025
Prompt Engineering for Generative AI (James Phoenix & Mike Taylor) | Practical frameworks to improve reliability and quality of outputs across modern generative AI applications. | 2024
The AI Engineering Bible (Comprehensive AI Engineering Reference) | Broad reference on AI engineering workflows, tools, deployment strategies, and production best practices. | 2025