How Prompt Size Directly Impacts LLM Response Latency
Understanding the mechanics of Time to First Token (TTFT) and why those extra tokens may lead to a poor user experience (UX).
Our inboxes contain dozens (if not hundreds) of newsletters we subscribed to during moments of curiosity, but we seldom read most of them. Manually unsubscribing is tedious: open each email, scroll to the bottom, click unsubscribe, confirm … repeat 50+ times.
This post covers a personal project: developing an AI agent that uses the ReAct pattern to analyse the newsletters I have subscribed to and recommend which ones to unsubscribe from, based on my reading behaviour.
While large language models (LLMs), such as GPT-4 and Claude, are capable of extracting structured information from text, small language models (SLMs) have historically struggled to do so reliably. Previously, the only viable approach was to fine-tune a larger open-weights model using distillation. A week ago, there was an announcement that appears to offer an alternative.
Researchers have identified a cyber threat known as slopsquatting, also referred to as package hallucination, in which malicious actors exploit the tendency of large language models (LLMs) to generate non-existent package names during code generation. When attackers register these hallucinated package names and attach malware payloads, they create a new vector for software supply chain attacks, particularly within AI-assisted development workflows.
Large language models (LLMs) are made up of billions of parameters, which makes it challenging to load them into GPU memory for inference or fine-tuning. This post briefly explains the challenges and describes a solution for loading Mixtral 8x7B, a state-of-the-art (SOTA) LLM, onto consumer-grade GPUs, followed by using the model for NLP tasks such as Named Entity Recognition (NER), Sentiment Analysis, and Text Classification.