Yannic Kilcher

Videos
About

May 02, 2024

OUTLINE:
0:00 - Intro
0:19 - Our next-generation Meta Training and Inference Accelerator
01:39 - ALOHA Unleashed
03:10 - Apple Inks $50M Deal with Shutterstock for AI Training Data
04:28 - OpenAI Researchers, Including Ally of Sutskever, Fired for Alleged Leaking
05:01 - Adobe's Ethical Firefly AI was Trained on Midjourney Images
05:52 - Trudeau announces $2.4billion for AI-related investments
06:48 - RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
07:15 - CodeGemma - an official Google release for code LLMs
07:24 - Mistral AI: Cheaper, Better, Faster, Stronger
08:08 - Vezora/Mistral-22B-v0.1
09:00 - WizardLM-2, next generation state-of-the-art-LLM
09:31 - Idefics2, the strongest Vision-Language-Model (VLM) below 10B!
10:14 - BlinkDL/rwkv-6-world
10:50 - Pile-T5: Trained T5 on the Pile
11:35 - Model Card for Zephyr 141B-A39B
12:42 - Parler TTS
13:11 - RHO-1: Not all tokens are what you need
14:59 - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

References:
https://twitter.com/ayzwah/status/1780263768968273923
https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/?utm_source=twitter
https://twitter.com/soumithchintala/status/1778087952964374854?t=Mb-mQvm4YIZ35pVpEijs6g&s=09
https://deepnewz.com/tech/apple-inks-50m-deal-shutterstock-ai-training-data
https://twitter.com/TolgaBilge_/status/1778598047821291793?t=zInlPDRZzozcz7-pjFSnyA&s=09
https://twitter.com/javilopen/status/1778821749792034911?t=oGLiMj6GQdKTuM6GbiYrAg&s=09
https://twitter.com/paulg/status/1781329523155357914?t=vCQT2mJf5BbtjdN1BMFYFQ&s=09
https://twitter.com/RichardSocher/status/1776706907295846628
https://www.cbc.ca/news/politics/federal-government-ai-investment-1.7166234
https://arxiv.org/pdf/2404.07839
https://huggingface.co/blog/codegemma
https://mistral.ai/news/mixtral-8x22b/
https://twitter.com/MistralAILabs/status/1780606904273702932?t=JlSCcYulpJL74pNJbtSZag&s=09
https://huggingface.co/Vezora/Mistr..

May 02, 2024

65 33:25

May 02, 2024

ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)

Paper: https://arxiv.org/abs/2403.07691

Abstract:
While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building on this foundation, we introduce a straightforward and innovative reference model-free monolithic odds ratio preference optimization algorithm, ORPO, eliminating the necessity for an additional preference alignment phase. We demonstrate, both empirically and theoretically, that the odds ratio is a sensible choice for contrasting favored and disfavored styles during SFT across the diverse sizes from 125M to 7B. Specifically, fine-tuning Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) with ORPO on the UltraFeedback alone surpasses the performance of state-of-the-art language models with more than 7B and 13B parameters: achieving up to 12.20% on AlpacaEval2.0 (Figure 1), 66.19% on IFEval (instruction-level loose, Table 6), and 7.32 in MT-Bench (Figure 12). We release code and model checkpoints for Mistral-ORPO-α (7B) and Mistral-ORPO-β (7B).

Authors: Jiwoo Hong, Noah Lee, James Thorne

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out the content :)

May 02, 2024

76 37:00

Apr 30, 2024

TransformerFAM: Feedback attention is working memory

Paper: https://arxiv.org/abs/2404.09173

Abstract:
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM requires no additional weights, enabling seamless integration with pre-trained models. Our experiments show that TransformerFAM significantly improves Transformer performance on long-context tasks across various model sizes (1B, 8B, and 24B). These results showcase the potential to empower Large Language Models (LLMs) to process sequences of unlimited length.

Authors: Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Apr 30, 2024

85 17:46

Apr 29, 2024

[ML News] Devin exposed | NeurIPS track for high school students

OUTLINE:
0:00 - Intro
0:21 - Debunking Devin: "First AI Software Engineer" Upwork lie exposed!
07:24 - NeurIPS 2024 will have a track for papers from high schoolers.
13:29 - Opus can operate as a Turing machine.
13:47 - An AI-Powered, Self-Running Propaganda Machine for $105
14:27 - TechScape: How cheap, outsourced labour in Africa is shaping AI English
16:25 - Is ChatGPT Transforming Academics' Writing Style?

References:
https://news.ycombinator.com/item?id=40008109&s=09
https://www.youtube.com/watch?v=tNmgmwEtoWE
https://www.youtube.com/watch?v=xE2fxcETP5E
https://twitter.com/itsandrewgao/status/1779369373737668669?t=omW3DvRNmZyce8oo0Ehf1g&s=09
https://twitter.com/0interestrates/status/1779268441226256500?t=tGwngUpChSD2YZ0VQDJHAA&s=09
https://twitter.com/thegautamkamath/status/1778580754785550819?t=Qq1nLUIOyfRfBbZ6BHdXPw&s=09
https://twitter.com/vipul_1011/status/1778619720964419930?t=225aakPnHb-ojIjveaWkkg&s=09
https://twitter.com/avt_im/status/1778913195408626110?t=UPtduAKTX1uvq8Wa_EQOWg&s=09
https://arxiv.org/pdf/2402.05120.pdf
https://twitter.com/ctjlewis/status/1779740038852690393?t=AhIQM4rBUim-IWEkXL7OVQ&s=33
https://www.wsj.com/politics/how-i-built-an-ai-powered-self-running-propaganda-machine-for-105-e9888705
https://twitter.com/ylecun/status/1780728376283521191?t=rbTfUT7IWzXy83fvr-f4hw&s=09
https://www.futureofhumanityinstitute.org/
https://www.google.com/search?q=alex+hern+guardian+delve&oq=alex+hern+guardian+delve&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRigATIHCAIQIRigATIHCAMQIRigATIHCAQQIRiPAtIBCDQ5NTVqMGo0qAIAsAIB&sourceid=chrome&ie=UTF-8
https://www.theguardian.com/technology/2024/apr/16/techscape-ai-gadgest-humane-ai-pin-chatgpt
https://arxiv.org/pdf/2404.08627.pdf

Apr 29, 2024

91 37:16

Apr 27, 2024

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Google researchers achieve supposedly infinite context attention via compressive memory.

Paper: https://arxiv.org/abs/2404.07143

Abstract:
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.

Authors: Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal

If you want to support me, the best thing to do is to share out the content :)

Apr 27, 2024

85 31:18

Apr 25, 2024

[ML News] Llama 3 changes the game

Meta's Llama 3 is out. New model, new license, new opportunities.

Apr 25, 2024

118 18:00

Apr 18, 2024

Hugging Face got hacked

If you want to support me, the best thing to do is to share out the content :)

Apr 18, 2024

103 9:54

Apr 16, 2024

[ML News] Microsoft to spend 100 BILLION DOLLARS on supercomputer (& more industry news)

Some updates from industry in the Machine Learning world

If you want to support me, the best thing to do is to share out the content :)

Apr 16, 2024

108 27:31

Apr 14, 2024

[ML News] Jamba, CMD-R+, and other new models (yes, I know this is like a week behind 🙃)

A flurry of new models continues to appear.

If you want to support me, the best thing to do is to share out the content :)

Apr 14, 2024

116 56:15

Apr 09, 2024

Flow Matching for Generative Modeling (Paper Explained)

Flow matching is a more general method than diffusion and serves as the basis for models like Stable Diffusion 3.

Paper: https://arxiv.org/abs/2210.02747

Abstract:
We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.

Authors: Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matt Le

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Subscrib..

Apr 09, 2024

123 44:04

Apr 08, 2024

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (Searchformer)

Paper: https://arxiv.org/abs/2402.14083

Abstract:
While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard A∗ search. Searchformer is an encoder-decoder Transformer model trained to predict the search dynamics of A∗. This model is then fine-tuned via expert iterations to perform fewer search steps than A∗ search while still generating an optimal plan. In our training method, A∗'s search dynamics are expressed as a token sequence outlining when task states are added and removed into the search tree during symbolic planning. In our ablation studies on maze navigation, we find that Searchformer significantly outperforms baselines that predict the optimal plan directly with a 5-10× smaller model size and a 10× smaller training dataset. We also demonstrate how Searchformer scales to larger and more complex decision making tasks like Sokoban with improved percentage of solved tasks and shortened search dynamics.

Authors: Lucas Lehnert, Sainbayar Sukhbaatar, Paul Mcvay, Michael Rabbat, Yuandong Tian

If you want to support me, the best thing to do is to share out the content :)

Apr 08, 2024

161 26:59

Mar 27, 2024

[ML News] Grok-1 open-sourced | Nvidia GTC | OpenAI leaks model names | AI Act

OUTLINE:
0:00 - Intro
0:15 - XAI releases Grok-1
2:00 - Nvidia GTC
4:45 - Comment of the Week
5:35 - Brute-forcing OpenAI model names
7:30 - Inflection AI gets eaten by Microsoft
9:25 - EU AI Act moving forward
11:45 - Advances in Robotics
14:00 - India retracts controversial advisory
14:30 - OpenSora
15:20 - Improved Gemma fine-tuning
16:20 - Decoding encrypted LLM traffic
17:45 - Varia

References:
https://x.ai/blog/grok-os
https://github.com/xai-org/grok-1
https://finance.yahoo.com/news/nvidia-debuts-next-generation-blackwell-ai-chip-at-gtc-2024-205825161.html?guccounter=1&guce_referrer=aHR0cHM6Ly9uZXdzLmdvb2dsZS5jb20v&guce_referrer_sig=AQAAAHYRVePPrDnH3HxPV8smDzUiia_ztWttteAmHKxy-x_Z75lqq2trR4Exwq2sFyjNQojO_95xWvqQFHkV3NI_IKmw9W8XZ7d52qBsdvqaDRkdNzBSzQhnskzUE_E-nDo6OFG0LmrM0ygvjqLgJyhMDnraaGHrUsb98kknjn7-83MJ
https://spectrum.ieee.org/nvidia-gr00t-ros
https://twitter.com/anshelsag/status/1769989302552031473?t=DYAFhri4cu55LMwJV4V99A&s=09
https://twitter.com/ibab_ml/status/1769770983924142475
https://twitter.com/arthurmensch/status/1769842867621581299?t=sYPy011kN9KxzdnA11M4yQ&s=09
https://twitter.com/arithmoquine/status/1770136393563378082?t=FgH3-TABR73QVUQuP5wq2g&s=09
https://files.catbox.moe/od9pyb.txt
https://techcrunch.com/2024/03/19/after-raising-1-3b-inflection-got-eaten-alive-by-its-biggest-investor-microsoft/
https://archive.ph/p4W1N#selection-2463.23-2463.114
https://www.instagram.com/reel/C4df3DZg1wj/?igsh=MWQ1ZGUxMzBkMA%3D%3D
https://techcrunch.com/2024/03/15/mercedes-begins-piloting-apptronik-humanoid-robots/
https://www.axios.com/2024/03/14/humanoid-robot-army-agility-digit-amazon-warehouse
https://techcrunch.com/2024/03/15/india-drops-plan-to-require-approval-for-ai-model-launches/
https://github.com/hpcaitech/Open-Sora
https://www.reddit.com/r/LocalLLaMA/comments/1bd18y8/gemma_finetuning_should_be_much_better_now/
https://twitter.com/felix_red_panda/status/1769363356094230837?t=JMMb3OldqfhhCH8X5e7ljA&s=09
https://twitter.co..

Mar 27, 2024

166 26:49

Mar 18, 2024

[ML News] Devin AI Software Engineer | GPT-4.5-Turbo LEAKED | US Gov't Report: Total Extinction

Your weekly dose of ML News

OUTLINE:
0:00 - Intro
0:15 - Devin: AI software engineer
5:50 - Mira Murati on Sora training data
6:50 - Inflection accused of copying Claude
9:00 - Tools & papers
16:30 - GPT-4.5-turbo mystery
17:30 - US government report: total extinction by AI
19:20 - Various other news

References:
https://www.cognition-labs.com/introducing-devin
https://twitter.com/cognition_labs/status/1767548763134964000?t=ZECIn-uqbguwHtY8X_Gvtw&s=09
https://news.google.com/stories/CAAqNggKIjBDQklTSGpvSmMzUnZjbmt0TXpZd1NoRUtEd2lWMUwyU0N4RnVWM3pSRWhWX01pZ0FQAQ?hl=en-US&gl=US&ceid=US%3Aen
https://www.bloomberg.com/news/articles/2024-03-12/cognition-ai-is-a-peter-thiel-backed-coding-assistant?embedded-checkout=true
https://www.bloomberg.com/authors/AQWHkoPod9g/ashlee-vance
https://www.bloomberg.com/news/articles/2024-03-12/cognition-ai-is-a-peter-thiel-backed-coding-assistant?srnd=undefined&embedded-checkout=true
https://www.bloomberg.com/news/newsletters/2024-03-12/cognition-ai-s-devin-assistant-can-build-websites-videos-from-a-prompt?srnd=undefined&embedded-checkout=true
https://archive.ph/5LZV9
https://github.com/opendevin/opendevin
https://twitter.com/MetaGPT_/status/1767965444579692832?t=dsYKmPfOBVGCFCwvPtZVWQ&s=09
https://docs.deepwisdom.ai/main/en/DataInterpreter/detail.html?id=AppleStockPriceAnalysisAndPrediction
https://docs.deepwisdom.ai/main/en/guide/use_cases/agent/interpreter/intro.html
https://github.com/geekan/MetaGPT/tree/main/examples/di
https://inflection.ai/inflection-2-5
https://twitter.com/seshubon/status/1765870717844050221
https://twitter.com/inflectionAI/status/1766173427441049684
https://www.mlxserver.com/
https://huggingface.co/spaces/mlabonne/AutoMerger
https://github.com/microsoft/aici
https://github.com/google-research/google-research/tree/master/fax
https://github.com/stanfordnlp/pyvene
https://arxiv.org/pdf/2403.06634.pdf
https://twitter.com/mattshumer_/status/1767606938538295757?t=1dYect5ylg9xrWSS4sL38Q&s=p;s=..

Mar 18, 2024

162 53:14

Mar 11, 2024

[ML News] Elon sues OpenAI | Mistral Large | More Gemini Drama

#mlnews #ainews #openai

OUTLINE:
0:00 - Intro
0:20 - Elon sues OpenAI
14:00 - Mistral Large
16:40 - ML Espionage
18:30 - More Gemini Drama
24:00 - Copilot generates spicy images
26:55 - Gemma bugs
28:45 - Varia

References: https://gist.github.com/yk/0c065cdc8e414738abfaae4f8e417e00

Thumbnail pictures: Wikipedia

If you want to support me, the best thing to do is to share out the content :)

Mar 11, 2024

156 0:59

Mar 09, 2024

On Claude 3

Mar 09, 2024

163 15:11

Mar 07, 2024

No, Anthropic's Claude 3 is NOT sentient

No, Anthropic's Claude 3 is not conscious or sentient or self-aware.

References:
https://www.anthropic.com/news/claude-3-family
https://twitter.com/_akhaliq/status/1764673955313459560?t=gkBx2uTXfrxLl-5_mL7Btg&s=09
https://twitter.com/idavidrein/status/1764675668175094169?t=pJfbN3LtKaxsU8egz83Mvg&s=09
https://twitter.com/TolgaBilge_/status/1764754012824314102?t=9bakXDnVMC1oAEyZFoKimA&s=09
https://twitter.com/karinanguyen_/status/1764670019743690757?t=gkBx2uTXfrxLl-5_mL7Btg&s=09
https://twitter.com/alexalbert__/status/1764722513014329620
https://www.lesswrong.com/posts/pc8uP4S9rDoNpwJDZ/claude-3-claims-its-conscious

If you want to support me, the best thing to do is to share out the content :)

Mar 07, 2024

161 42:33

Mar 03, 2024

[ML News] Groq, Gemma, Sora, Gemini, and Air Canada's chatbot troubles

Your dose of ML News!

OUTLINE:
0:00 - Intro
0:20 - Gemma & Gemini
3:40 - Groq
6:30 - Nvidia EOS Supercomputer
7:15 - Gpulist.ai
8:20 - Demis Hassabis on scale
10:10 - Hardware wars
12:05 - Sora
15:10 - Gemini 1.5 Pro & Long Context
18:45 - Air Canada must pay for chatbot mistake
23:30 - Giant Rat Balls
26:25 - Various News

Mar 03, 2024

178 17:35

Feb 26, 2024

Gemini has a Diversity Problem

Google turned the anti-bias dial up to 11 on their new Gemini Pro model.

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a..

Feb 26, 2024

161 50:02

Feb 21, 2024

V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video (Explained)

#vjepa #meta #unsupervisedlearning

V-JEPA is a method for unsupervised representation learning of video data by using only latent representation prediction as objective function.

Weights & Biases course on Structured LLM Outputs: https://wandb.me/course-yannic

OUTLINE:
0:00 - Intro
1:45 - Predictive Feature Principle
8:00 - Weights & Biases course on Structured LLM Outputs
9:45 - The original JEPA architecture
27:30 - V-JEPA Concept
33:15 - V-JEPA Architecture
44:30 - Experimental Results
46:30 - Qualitative Evaluation via Decoding

Blog: https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/
Paper: https://ai.meta.com/research/publications/revisiting-feature-prediction-for-learning-visual-representations-from-video/

Abstract:
This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datasets and are evaluated on downstream image and video tasks. Our results show that learning by predicting video features leads to versatile visual representations that perform well on both motion and appearance-based tasks, without adaption of the model’s parameters; e.g., using a frozen backbone, our largest model, a ViT-H/16 trained only on videos, obtains 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet1K.

Authors: Adrien Bardes Quentin Garrido Xinlei Chen Michael Rabbat Yann LeCun Mido Assran Nicolas Ballas Jean Ponce

If you want..

Feb 21, 2024

150 1:23:58

Feb 21, 2024

What a day in AI! (Sora, Gemini 1.5, V-JEPA, and lots of news)

Your regularly irregular dose of Machine Learning News!

W&B Course on LLM Structured Outputs: https://wandb.me/course-yannic

OUTLINE:
0:00 - OpenAI Sora
3:25 - Gemini 1.5 with 1 Million Tokens context window
4:50 - V-JEPA
6:50 - Sam Altman raises 7 TRILLION dollars for AI chips
9:30 - Sponsor: Weights & Biases course on Structure Output from LLMs
11:30 - Bard becomes Gemini
13:55 - GOODY-2: The world's most responsible model
16:05 - miqu-1-70b leaked from Mistral
18:25 - Zuckerberg on Meta's open approach to AI models
21:40 - 1X advances robotics
23:30 - Questions around Bard's arena leaderboard position
27:00 - Various other news

References:
https://gist.github.com/yk/65fe3d582a43540a61718b9e4b0706d0
(they were too long for this description)

If you want to support me, the best thing to do is to share out the content :)

Feb 21, 2024

185 54:23

Feb 05, 2024

Lumiere: A Space-Time Diffusion Model for Video Generation (Paper Explained)

#lumiere #texttovideoai #google

LUMIERE by Google Research tackles globally consistent text-to-video generation by extending the U-Net downsampling concept to the temporal axis of videos.

OUTLINE:
0:00 - Introduction
8:20 - Problems with keyframes
16:55 - Space-Time U-Net (STUNet)
21:20 - Extending U-Nets to video
37:20 - Multidiffusion for SSR prediction fusing
44:00 - Stylized generation by swapping weights
49:15 - Training & Evaluation
53:20 - Societal Impact & Conclusion

Paper: https://arxiv.org/abs/2401.12945
Website: https://lumiere-video.github.io/

Abstract:
We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.

Authors: Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Disc..

Feb 05, 2024

191 35:26

Jan 22, 2024

AlphaGeometry: Solving olympiad geometry without human demonstrations (Paper Explained)

#deepmind #alphageometry #llm

AlphaGeometry is a combination of a symbolic solver and a large language model by Google DeepMind that tackles IMO geometry questions without any human-generated trainind data.

OUTLINE:
0:00 - Introduction
1:30 - Problem Statement
7:30 - Core Contribution: Synthetic Data Generation
9:30 - Sampling Premises
13:00 - Symbolic Deduction
17:00 - Traceback
19:00 - Auxiliary Construction
25:20 - Experimental Results
32:00 - Problem Representation
34:30 - Final Comments

Paper: https://www.nature.com/articles/s41586-023-06747-5

Abstract:
Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning1,2,3,4, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges1,5, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 20..

Jan 22, 2024

188 34:31

Jan 14, 2024

Mixtral of Experts (Paper Explained)

#mixtral #mistral #chatgpt

OUTLINE:
0:00 - Introduction
3:00 - Mixture of Experts
6:00 - Classic Transformer Blocks
11:15 - Expert Routing
17:00 - Sparse Expert Routing
22:00 - Expert Parallelism
25:00 - Experimental Results
31:30 - Routing Analysis
33:20 - Conclusion

Paper: https://arxiv.org/abs/2401.04088

Abstract:
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.

Authors: Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitt..

Jan 14, 2024

199 3:39

Jan 11, 2024

Until the Litter End

https://litter.ykilcher.com

If you want to support me, the best thing to do is to share out the content :)

Jan 11, 2024

187 31:45

Jan 08, 2024

LLaMA Pro: Progressive LLaMA with Block Expansion (Paper Explained)

Note: The H800 is a variant of the H100 for the Chinese market

OUTLINE:
0:00 - Introduction
5:30 - Adding new blocks to LLaMA
15:00 - Block expansion
27:40 - Experiments
30:40 - Conclusion

Paper: https://arxiv.org/abs/2401.02415
Other Paper: https://proceedings.mlr.press/v162/shen22f/shen22f.pdf

Abstract:
Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on the corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance among various benchmarks, demonstrating superiority over existing open models in the LLaMA family and the immense potential of reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.

Authors: Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, Ying Shan

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar...

Jan 08, 2024

Created 4 years, 11 months ago.

406 videos

Category Science & Technology

Yannic Kilcher

Most Viewed