<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Large Language Models | Jevi Waugh</title><link>https://jevi-waugh.github.io/tags/large-language-models/</link><atom:link href="https://jevi-waugh.github.io/tags/large-language-models/index.xml" rel="self" type="application/rss+xml"/><description>Large Language Models</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 01 Jan 2025 00:00:00 +0000</lastBuildDate><image><url>https://jevi-waugh.github.io/media/icon_hu7729264130191091259.png</url><title>Large Language Models</title><link>https://jevi-waugh.github.io/tags/large-language-models/</link></image><item><title>Fine-Tuning FLAN-T5 for Biomedical Lay Summarisation (BioLaySumm 2025)</title><link>https://jevi-waugh.github.io/project/flan-t5/</link><pubDate>Wed, 01 Jan 2025 00:00:00 +0000</pubDate><guid>https://jevi-waugh.github.io/project/flan-t5/</guid><description>&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">This page provides a &lt;strong>high-level overview&lt;/strong> only.&lt;br>
For full methodology, experiments, training scripts, and results, see the &lt;strong>GitHub repository&lt;/strong>.&lt;/span>
&lt;/div>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>This project focuses on &lt;strong>biomedical lay summarisation&lt;/strong>, translating expert-level
radiology reports into language accessible to non-experts. The work fine-tunes
instruction-tuned &lt;strong>FLAN-T5&lt;/strong> models on the &lt;strong>BioLaySumm 2025&lt;/strong> dataset and
systematically evaluates different adaptation strategies.&lt;/p>
&lt;h2 id="methods">Methods&lt;/h2>
&lt;p>Three optimisation strategies are explored:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Full Fine-Tuning (FFT)&lt;/strong>: updates all model parameters.&lt;/li>
&lt;li>&lt;strong>LoRA (PEFT)&lt;/strong>: freezes the pretrained weights and trains only small
low-rank adapter matrices (~2–3% of parameters).&lt;/li>
&lt;li>&lt;strong>Evolution Strategies (ES)&lt;/strong>: gradient-free optimisation via population-based
parameter perturbations.&lt;/li>
&lt;/ul>
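&lt;p>As a concrete illustration of the LoRA idea above, here is a minimal PyTorch sketch of a low-rank adapter wrapped around a frozen linear layer. The class name, rank, and dimensions are illustrative choices only, not the project's actual PEFT configuration:&lt;/p>

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Freeze a pretrained linear layer and add a trainable low-rank
    update: y = Wx + (alpha / r) * B A x. B is zero-initialised, so the
    wrapped layer starts out identical to the original."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path plus scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# A 768-dimensional projection, roughly FLAN-T5-Base's hidden size.
layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.1%}")
```

&lt;p>Because &lt;code>B&lt;/code> starts at zero, the adapter is initially a no-op and training begins exactly at the pretrained model's behaviour; for this layer the adapter accounts for roughly 2% of parameters, in line with the ~2–3% figure above.&lt;/p>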
&lt;p>Performance is evaluated using &lt;strong>ROUGE-1/2/L/Lsum&lt;/strong>, with analysis of compute cost,
convergence behaviour, and parameter efficiency.&lt;/p>
&lt;h2 id="key-findings">Key Findings&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>LoRA achieves comparable or slightly higher ROUGE scores&lt;/strong> than full fine-tuning
while training only ~2–3% of parameters.&lt;/li>
&lt;li>&lt;strong>FLAN-T5-Base + LoRA&lt;/strong> provides the best balance of quality and efficiency.&lt;/li>
&lt;li>Evolution Strategies offer fast iteration but underperform gradient-based methods
for this task.&lt;/li>
&lt;/ul>
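&lt;p>For reference, the population-based ES update compared against gradient-based methods above can be sketched in a few lines of NumPy. The fitness function, dimensionality, and hyperparameters here are toy stand-ins, not the project's actual setup:&lt;/p>

```python
import numpy as np

def es_step(theta, fitness, pop_size=50, sigma=0.1, lr=0.02, rng=None):
    """One ES update: sample a population of Gaussian perturbations,
    score each perturbed parameter vector, and move along the
    fitness-weighted average direction (a gradient-free estimate)."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.standard_normal((pop_size, theta.size))
    rewards = np.array([fitness(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad_est = eps.T @ rewards / (pop_size * sigma)
    return theta + lr * grad_est

# Toy fitness: negative squared distance to a target vector.
target = np.ones(10)
fitness = lambda th: -np.sum((th - target) ** 2)

theta = np.zeros(10)
rng = np.random.default_rng(42)
for _ in range(200):
    theta = es_step(theta, fitness, rng=rng)
```

&lt;p>No backpropagation is needed, which is what makes ES cheap to iterate on, but the update direction is only a noisy estimate, consistent with the finding above that ES underperforms gradient-based fine-tuning on this task.&lt;/p>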
&lt;h2 id="example-visuals">Example Visuals&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img alt="LoRA adapter integration" srcset="
/project/flan-t5/lora_hu11866794988470897071.webp 400w,
/project/flan-t5/lora_hu15588838430717327006.webp 760w,
/project/flan-t5/lora_hu9012181816615325759.webp 1200w"
src="https://jevi-waugh.github.io/project/flan-t5/lora_hu11866794988470897071.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img alt="Training curves comparison" srcset="
/project/flan-t5/lora_vs_fft_comparison_3epochs_hu14915178362955497661.webp 400w,
/project/flan-t5/lora_vs_fft_comparison_3epochs_hu12861419809013145994.webp 760w,
/project/flan-t5/lora_vs_fft_comparison_3epochs_hu12000932080310515424.webp 1200w"
src="https://jevi-waugh.github.io/project/flan-t5/lora_vs_fft_comparison_3epochs_hu14915178362955497661.webp"
width="760"
height="381"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="reproducibility--code">Reproducibility &amp;amp; Code&lt;/h2>
&lt;p>All experiments were run on &lt;strong>NVIDIA A100 GPUs&lt;/strong> using PyTorch and Hugging Face
Transformers. Training scripts, hyperparameters, datasets, and seeds are fully
documented in the repository.&lt;/p>
&lt;p>👉 &lt;strong>Full code and documentation:&lt;/strong>&lt;br>
&lt;a href="https://github.com/Jevi-Waugh/BioLaySumm-Flan-T5/tree/topic-recognition/recognition/FLAN-T5-Jevi-Waugh" target="_blank" rel="noopener">https://github.com/Jevi-Waugh/BioLaySumm-Flan-T5/tree/topic-recognition/recognition/FLAN-T5-Jevi-Waugh&lt;/a>&lt;/p></description></item></channel></rss>