
GenAI Essentials – Full Course for Beginners - Ep179

This is just a visualization that I've seen a lot, and it's how I kind of remember what quantization is. Imagine you have this wave line, see the blue wave line: it has a lot of data because it's nice and smooth, and as we reduce the precision we get this blocky kind of thing. But the idea is that you still have the same shape, so it should perform the same way. Now of course we're not working with signals here, but hopefully that visualization helps you remember what quantization is.

Why would we want to quantize our model? Smaller models mean a smaller size footprint and faster inference. Not always, though. I learned this by actually doing quantization for real: when I quantized a model, I noticed the files didn't get smaller and the inference wasn't faster. However, sometimes quantization will greatly reduce the amount of resources used, like RAM, and you might cut RAM totally in half, so quantization is worth it. The disadvantage is a potential loss in quality, because you are compressing the quality of the data. Just think of a JPEG: if you lower the quality you get a smaller file that loads quicker, but it's not as accurate as the original. Examples of quantization would be QLoRA and also GGUF files. But often when I see quantization it looks like a big mathematical formula that I personally couldn't do; only someone who is a data scientist can do it, and quantization techniques vary greatly, so one does not look like another. It's not something for us mere mortals; it's something that maybe Rola can do, and if I had time I'd have a follow-up video, because she knows quantization very well. But yeah, that's quantization.

Let's take a look at knowledge distillation. The reason I want to talk about this is because you'll come across models that are called distilled models, and that implies they've gone through this knowledge distillation process. This is when you transfer knowledge from a large model to a smaller model so that the smaller model performs the same task faster and at a lower resource cost. The goal of knowledge distillation is generally to produce a small language model, because you're making a faster, smaller, more efficient model. It is a complicated process, so the greatest simplification I can give you for knowledge distillation is this: you have predictions that are made by the larger model, and you have ground-truth data. Between these two things, which we call soft targets and hard targets, we use that information as we train the smaller model to do something that looks similar to the teacher model. One example, performance-wise, is Nemotron. Nemotron is a model that has been turned into Minitron, which is the knowledge-distilled version of it, going from the 15 billion parameter one to an 8 billion parameter one to then a 4 billion parameter one, greatly shrinking the size of that model. But the idea is that it's supposed to perform as well as Nemotron. So Minitron is something you might want to take a look at; it uses knowledge distillation and pruning, and pruning can also be part of the process, along with knowledge distillation, to make those smaller models.
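To make the precision idea above a bit more concrete, here is a minimal, illustrative sketch of int8 weight quantization in plain NumPy. It only shows the generic "scale and round" idea; real schemes such as those used in GGUF files or QLoRA are more elaborate (block-wise scales, 4-bit types, and so on), so treat this as a teaching example rather than any particular library's method.

```python
import numpy as np

# Minimal sketch of weight quantization: map float32 weights to int8.
# This is the generic "scale + round" idea, not the exact math used by
# GGUF or QLoRA, which use more elaborate block-wise schemes.

def quantize_int8(weights: np.ndarray):
    """Symmetric int8 quantization: store one float scale per tensor."""
    scale = np.abs(weights).max() / 127.0          # largest value maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights (some precision is lost)."""
    return q.astype(np.float32) * scale

weights = np.random.randn(1_000_000).astype(np.float32)   # stand-in for model weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

print("float32 size:", weights.nbytes / 1e6, "MB")   # ~4.0 MB
print("int8 size:   ", q.nbytes / 1e6, "MB")         # ~1.0 MB
print("max error:   ", np.abs(weights - restored).max())
```

The size drops by 4x while the values stay close to the originals, which is the same "same shape, less precision" intuition as the blocky wave.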
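Likewise, the soft-target / hard-target idea behind knowledge distillation can be sketched as a loss function. This is a generic, hedged example in PyTorch; the temperature and the 50/50 blend are common illustrative choices, not the specific recipe used to distill Nemotron into Minitron.

```python
import torch
import torch.nn.functional as F

# Sketch of a distillation loss: the student is trained on a blend of
# (a) soft targets  - the teacher model's output distribution, and
# (b) hard targets  - the ground-truth labels.
# Temperature T and the alpha weighting are illustrative, not a fixed recipe.

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: match the teacher's softened probability distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random tensors standing in for real model outputs.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(loss.item())
```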
Okay. Hey everyone, it's Andrew Brown, and Rola is back, and we're talking about the most popular topic of all: quantization. We are going to jump right into it, and Rola has prepared some slides as a talking point here. Rola, would you like to kick us off?

Yeah, so we're talking about quantization, which is the process of reducing the precision of the model weights, and that helps us reduce the RAM needed for training and the storage needed. The reason that's important is because of the size of these models, so we're going to go through this to give you some idea of how big these things are and why it matters to quantize.

And yeah, it is a very hard word to say. So Rola has some GenAI size comparisons, or considerations, that are going to help us out here. Do I click on through, am I ready to go? And for those who are watching, Rola could not share her screen, so she sent me the slides and I'm controlling them, but it all works out because we're coordinated here. So, first one here: ML models are often sized by the number of parameters, and that equals the model weights. What are we talking about here?

So machine learning models are a mathematical model. Most LLMs are a neural network, and what that is, if you want to think about it abstractly, is a complex mathematical model. Part of that mathematical model are what we call parameters, and that's one of the ways that we size models. So when you talk about a 7 billion parameter model, that means it has 7 billion parameters, and when you talk about a 70 billion parameter model, that's what we're talking about: the number of parameters that we can change in a model.

Okay, and just to word it another way: if it says seven billion, there are seven billion numbers that can be changed? Yes, yes. A parameter is a number, it's a tunable number. Think of it as a knob that gets focused or tuned with the data seen, and that's what really encapsulates the learning. And those tunable parameters, we call them model weights, because they're weighted; they're tunable to a number. Yeah, they are a number, but they change: if the same exact mathematical model, or model architecture, sees two different sets of data, it'll come up with two different sets of parameters.

So the next thing we have is: size ranges from one parameter to about two trillion. Is that two trillion, about two trillion? Yeah, the GPTs have about 1.8 trillion to be exact; I rounded it up to two, but we're right there. How do you run those? Because I'm like...
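To make the parameter-count discussion concrete, here is a rough back-of-the-envelope estimate of how much memory it takes just to hold the weights at different precisions. The byte widths are the standard sizes for fp32, fp16, and int8; the 1.8 trillion figure is the estimate mentioned in the conversation, and the calculation ignores activations, KV cache, and optimizer state.

```python
# Rough back-of-the-envelope memory needed just to hold the weights,
# ignoring activations, KV cache, and optimizer state.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for name, n in [("7B", 7e9), ("70B", 70e9), ("1.8T (GPT estimate)", 1.8e12)]:
    line = ", ".join(f"{d}: {weight_memory_gb(n, d):,.0f} GB" for d in BYTES_PER_PARAM)
    print(f"{name:>20}  {line}")
```

A 7 billion parameter model is roughly 28 GB of weights at fp32 but only about 7 GB at int8, which is why quantization is what makes many of these models runnable on ordinary hardware.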
