speaker 1: My name is Insop. Today we would like to go over agentic AI, agentic language models, as a progression of language model usage. Here is the outline of today's talk. We'll go over an overview of language models and how we use them, then the common limitations, then some of the methods that improve on those common limitations. And then we'll transition into what an agentic language model is and its design patterns.

A language model is a machine learning model that predicts the next word given the input text. As in this example, if the input is "the students opened their", then the language model can predict the most likely word to come next. If the language model is trained on a large corpus, it generates a probability distribution over the next word. In this example, as you can see, "books" and "laptops" have a higher probability than other words in the vocabulary. So the completion of the whole sentence could be "the students opened their books", and if you want to keep generating, you can take that sentence as the new input, feed it back into the language model, and the language model keeps generating the next word (see the short sketch below).

These language models are trained largely in two parts: a pre-training part and a post-training part. In the pre-training portion, language models are trained on a large corpus of text collected from the Internet, books, and other publicly available sources, with a next-token (next-word) prediction objective. Once a model finishes this pre-training stage, it is fairly good at predicting whatever word comes next given the input. However, the pre-trained model itself is not easy to use, and hence the post-training steps. The post-training stage includes instruction-following training as well as reinforcement learning from human feedback. What this training stage means is that we prepare a dataset pairing specific instructions or questions with the answers or outputs a user would expect, so the generated output is more relevant to the question. That's how the models are trained to be easier to use and to respond in specific styles. Once that's done, an additional training method aligns the model to human preference by using reinforcement learning from human feedback, which uses human preference data to align the model through a reward scheme.

Let's take a really quick look at an instruction dataset. This is the template we would use to train the model in the instruction-following training phase. As you can see, a specific instruction is substituted in, and the expected output is substituted in, and this is fed to the model; the model is only trained on the response part, that is, generating the output based on the given instruction.

All right. So a language model trained through the pre-training stage as well as the post-training stage is quite capable of generating text given an instruction. Essentially it has a lot of world knowledge and can easily generate outputs. And these models are rapidly developing.
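To make the next-word prediction loop just described concrete, here is a minimal sketch using the Hugging Face transformers library; the choice of GPT-2 and greedy decoding are illustrative assumptions, not something from the talk.

```python
# Minimal sketch of iterative next-token generation (assumes the Hugging Face
# `transformers` library and the small GPT-2 model, chosen for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The students opened their", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(5):  # generate five more tokens, one at a time
        logits = model(input_ids).logits              # a score for every vocab entry
        probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token
        next_id = torch.argmax(probs)                 # greedy: take the most likely token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Each pass predicts one token; the chosen token is appended to the input and the loop repeats, which is exactly the "feed the output back in" behavior described above.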
These models are used in various application domains in our day-to-day work, such as AI coding assistants, domain-specific AI copilots, or, most widely known, ChatGPT and related conversational interfaces. In order to use these kinds of models in your applications or tools, you can use cloud-based API calls to a model provider or model server, or you can host the models on your local machines, or even on mobile devices for models small enough to run in those compute-constrained environments.

So what does it mean to use API calls? Let's step back. The language model takes natural language input text and generates output. That means we need to prepare free-form natural language text as an instruction or question, put it in a specific format, and make an API call to the model provider. The model provider takes that, usually in a cloud environment, generates the output, and responds to your API call with the generated output. Then your software built around this model parses the output and uses it as is, or perhaps makes follow-up LLM API calls to further refine the output.

The input to the model is, again, free-form text, so how you prepare your input, also known as prompting, is critical. There are well-known best practices for preparing your prompt, and here are some of them. Write clear, descriptive, detailed instructions; that will help the model generate the output you want. Include a couple of examples of the form you want to see, in terms of style or format. Provide references or context, so that the model relies on the context you provide. Instead of asking the model to answer right away, give the model time to think, for example by enabling reasoning or using a chain-of-thought method. Next, instead of asking the model to do a really complex task in one shot, break it down and chain the prompts in sequence. And the last one is simply good engineering practice: systematic tracing and logging will help you, and automated evaluation is essential to making progress on your application.

Let's take a quick look at each item to get more familiar with what it means. Write clear and descriptive instructions: as in the example on the left, instead of making a short request, describe it in detail so the model knows what you are asking, because the model cannot read your mind. You need to describe what you want the model to generate for you. This is always useful when using a language model in general. Include few-shot examples, meaning give the model the example inputs and outputs you would expect; say you want output in some consistent style. What is that consistent style? You provide an example input and an example output, and then finally you ask your original question, and the model will generate output following your examples. Few-shot examples are always helpful for generating the kind of output you want. A minimal sketch of such a prompt and API call follows.
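Here is a minimal sketch of a cloud API call carrying a detailed instruction plus one few-shot example. It assumes the OpenAI Python client; the model name is illustrative, and any provider with a chat-style API works the same way.

```python
# Minimal sketch of an API call with a clear instruction and a few-shot example
# (assumes the OpenAI Python client; the model name is an illustrative choice).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # A clear, detailed instruction instead of a vague request.
    {"role": "system",
     "content": "Summarize support emails in exactly two bullet points: "
                "the customer's problem, then the requested action."},
    # One few-shot example showing the style we expect.
    {"role": "user",
     "content": "My order #123 arrived broken. Please send a replacement."},
    {"role": "assistant",
     "content": "- Problem: order #123 arrived damaged\n- Action: send a replacement"},
    # The actual request.
    {"role": "user",
     "content": "I was charged twice this month. I'd like a refund for one charge."},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)  # the generated text your software parses
```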
The next item: provide relevant context and references. This is really helpful in many cases where you're generating text based on factual information. An LLM can easily generate incorrect output, also known as hallucination, on topics it doesn't know or isn't confident about. For those cases, providing context or references always helps. Here is an example prompt template you might use in a setup such as retrieval-augmented generation, which we'll look at in the following slides: answer based only on the articles provided. You substitute in your relevant references and instruct the model to answer based only on those references, and, if it cannot find the answer, to just say it cannot find the answer. Then the model will likely generate an answer grounded in your references.

This next part is important: give the model time to think. In other words, instead of asking a direct question, ask the model to think it through or come up with its own solution, and only then compare and generate the final output. This is also known as chain of thought. Here is one example that might not work with some medium-sized models: you ask the model to evaluate whether a student's solution is correct or not, providing the problem description and then the student's solution. If the system prompt or your request just says to answer whether it is correct or wrong, the model might not get it right. However, the same model can get the right answer if you prepare your prompt so that it first works out its own solution to the problem and then compares its solution to the student's. By doing this, the model generates its own solution, and as it does so it has the opportunity to attend to all of the original input as well as the output it has generated, and to arrive at the right answer. So reasoning, chain of thought, is always helpful for generating the output you want to see.

Here's an interesting one that is probably easy to implement in your application. Instead of sending one request that includes multiple tasks, prepare your prompt in small, simple stages. You prepare a simple prompt and generate the output, prepend that output to the second-stage prompt and generate the output, then again carry the output from the previous stage into a third stage, like here, until you finally generate the output you want to see. You may do this manually, or it can be done by an LLM, as we'll see in the following slides, but having one simple, clear task per request is a good way to go about it; a short sketch of this chaining pattern appears below.

Okay, this one may not be obvious, but as with many engineering applications, having a good way of keeping traces and logs will definitely help your development, both for debugging and for auditing. The same principle applies to language-model-based development: keeping track of logs is always good. That also relates to having automated evaluation from the early stage of your development, which will definitely help you. In other words, you prepare question-and-answer pairs with ground-truth answers, so that you can compare them against the generated output.
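Here is the prompt-chaining sketch mentioned above: each stage does one simple, clear task, and its output is prepended to the next stage's prompt. It reuses the OpenAI client from the earlier sketch; the model name and the three-stage task are illustrative assumptions.

```python
# Minimal sketch of prompt chaining: small stages, each output feeding the next.
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

document = "..."  # your input text

# Stage 1: one simple task -- extract the key facts.
facts = call_llm(f"List the key facts in this text:\n\n{document}")
# Stage 2: the previous output is prepended to the next prompt.
summary = call_llm(f"Key facts:\n{facts}\n\nWrite a one-paragraph summary.")
# Stage 3: refine for a specific audience.
final = call_llm(f"Summary:\n{summary}\n\nRewrite this for a non-technical reader.")
print(final)
```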
Back to evaluation: you could use a human to evaluate those outputs, but that's usually costly and time consuming. So you may use a language model as a judge, meaning you ask a language model to evaluate the model-generated output against the ground-truth output, so that it can score the quality of the generated output, and you can use that score against the application you are currently developing (see the short sketch after this exchange). This is very important, because language models are improving continuously and rapidly, and the methodology and tools you use to develop with language models are also developing rapidly. In other words, without clear evaluation it is hard to make forward progress, or even to switch to a different model. Models developing rapidly also means that some models are rapidly deprecated, so you may be forced to change the language model used in your application. Having a good evaluation methodology from the beginning will definitely help.

All right, this is a simple idea but helpful for many applications. Instead of taking the input prompt as is and processing it, you can have some software or a model that detects the intent and routes the request to different prompt handlers; this is also known as a prompt router. Based on the input query type, you might use a simple prompt together with a simple language model. This helps both with operating costs and with generating more appropriate output, by pairing each query type with a more relevant prompt and a model that is more capable for that kind of query.

All right. Petra, this may be a good moment to take a question, if there are any. Thank you so much. speaker 2: Thank you, Insop. Such an inspirational talk already. We will get to more details about agentic AI shortly; we did want to provide you with some background on the progression of what has been going on in the field. But maybe let's ask one question that came up. It's a little bit specific, but it might be what more people are wondering about: is there any optimal amount of data to perform good training, or anything you could advise people regarding the data available or the data being used? speaker 1: Okay, maybe I'll be short on this. I assume "training" here means fine-tuning the LLM, that is, additional training on top of an open-source language model. It definitely varies depending on your task, so it's hard to say one way or the other. But if you have enough text of the kind you want to see, you can turn it into a simple question-and-answer pair or instruction-following dataset format, and you can also use a language model to generate more if you need it. I would start with, say, tens of samples first and see whether that makes the model behave the way you actually want it to behave. Then, based on the results or signals from that initial quick test, you can add more dataset samples, possibly using a language model to augment them, or synthetic data that you create. Great question. speaker 2: Thank you so much. I see questions have started coming in, which is wonderful.
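Here is the LLM-as-a-judge sketch promised above: a language model scores a generated answer against a ground-truth answer. The model name and the 1-to-5 rubric are illustrative assumptions, and the client is the same one used in the earlier sketches.

```python
# Minimal sketch of LLM-as-a-judge: score generated output against ground truth.
from openai import OpenAI

client = OpenAI()

def judge(question: str, ground_truth: str, generated: str) -> str:
    prompt = (
        "You are grading an AI assistant's answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {ground_truth}\n"
        f"Assistant answer: {generated}\n"
        "Score the assistant answer from 1 (wrong) to 5 (matches the reference), "
        "then explain the score in one sentence."  # illustrative rubric
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(judge("What is the capital of France?", "Paris", "It is Paris."))
```

Run over a set of question/ground-truth pairs, this gives the automated score you can track as your application, prompts, or underlying model change.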
I think we will pause the questions for now, and we will try to get to as many as we can at the end. But please keep them coming; it definitely makes this session more engaging and also lets us know what you are interested in. Thank you. speaker 1: Thank you, Petra.

All right. So far we've looked at an overview of language models: very powerful models, many of them out there, and how we use them. However, the available models still have limitations, listed here. Hallucination is a well-known issue: models can oftentimes generate incorrect information, particularly when it's related to computation or certain specific areas, and that is a problem we want to avoid in your application domain. Next is the knowledge cutoff from dataset preparation: model providers and creators prepare datasets, but at some point they need to cut off their data collection and use it, so the model may not have seen recent information or news as part of its pre-training data. Lack of attribution: a model can answer a lot of general world-knowledge questions, but it's not going to tell you which particular data source it drew the answer from. Data privacy: model creators prepare their datasets from publicly available sources, which means the model has not seen the proprietary data of your organization or particular domain. And limited context length: although context lengths are rapidly increasing, it's always a fine balance, because a longer context gives the model more information but comes with operational cost as well as slower latency in text generation.

To address these common limitations, retrieval-augmented generation is one approach. It can reduce hallucination by using actually relevant references. It addresses citation, because you know where each reference came from. It allows you, as an application or system developer, to build systems that use your own proprietary data or text. And it makes good use of a limited context length, because it selects only the relevant data. Here's how it works: you pre-index your own data or text by splitting it into smaller chunks, converting them into an embedding space using an embedding model, and staging them in a database or vector database. When a request or query comes in, you convert the query into the same embedding space so you can do a nearest-neighbor search, select the top-k relevant text chunks, and place them in your prompt, as in the slide we saw previously, where you put the references in the prompt and the model makes use of only those references (a minimal sketch follows below). This is one good way to make use of your own proprietary data. A similar method can be used for AI search, where instead of an indexed dataset you rely on web search or other kinds of search to provide the information. One more thing to mention here: there are many methods and ideas for retrieval-augmented generation.
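As promised, here is a minimal retrieval-augmented generation sketch: embed the chunks, pick the top-k nearest to the query, and build a grounded prompt. It assumes the sentence-transformers library and numpy; the embedding model name, the sample chunks, and k=2 are illustrative choices.

```python
# Minimal RAG sketch: index chunks as embeddings, retrieve top-k, build a prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

# Pre-index: embed your own text chunks once and store them (a vector DB in practice).
chunks = [
    "Refunds are available within 30 days of purchase.",
    "Shipping takes 3-5 business days.",
    "Support is available by email, 9am-5pm on weekdays.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

# Query time: embed the query into the same space, nearest-neighbor search.
query = "How long do I have to return a product?"
query_vec = embedder.encode([query], normalize_embeddings=True)[0]
scores = chunk_vecs @ query_vec          # cosine similarity (vectors are normalized)
top_k = np.argsort(scores)[::-1][:2]     # indices of the 2 most relevant chunks

context = "\n".join(chunks[i] for i in top_k)
prompt = (f"Answer using only the references below. If the answer is not there, "
          f"say you cannot find it.\n\nReferences:\n{context}\n\nQuestion: {query}")
print(prompt)  # this grounded prompt is what gets sent to the language model
```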
The most commonly used method is the one we've just talked about: turn the text chunks into an embedding space and do a nearest-neighbor search. But there are many others; you can also use a knowledge graph. If you can generate a knowledge graph from your text source, that can also help extract more relevant information; GraphRAG is one example of this. There are many methods, and you may need to look into which one is right for your case and make use of it.

All right, tool usage. The most widely used form of a language model is text in, text out, which means it can answer many, many types of queries. However, it's not going to execute anything or extract information from the external world. That's where tool usage, also known as function calling, comes to the rescue. With this method you can get real-time information, or actually do a computation by generating software code. What does that mean? Let's look at an example (a short sketch follows below). Say you have a chatbot, and you ask: what is the weather in, say, San Francisco? The model will not know this by itself. However, if you have told the model beforehand, as part of the prompt, that when it is asked a weather-related question it should generate output in a form that the software parsing the output can turn into an API call, then the model will generate output signaling: hey, this is a case for tool usage. It generates output in the form shown here, get_weather, where the input argument to this API or function call is the place we asked about. The software receives this text output from the model, parses it, actually makes the API call to the weather provider, gets the weather information, and provides it back to the language model. The language model then generates a more human-friendly, helpful response based on this API result. In some cases, a model can also generate software code that is executed in a sandbox outside the language model, by the software coordinating all these activities.

All right, agentic language models. There can be many definitions. One definition is that an agentic language model can interact with an environment. Compared to simple language model usage, text input and text output as you've seen here, in agentic usage the language model can do something to the environment by generating tool-usage or retrieval requests. The environment, anything outside the language model, can then provide output or information that is fed back to the language model as an observation. The whole agentic language model, which includes the language model at its core with software around it, processes this and also puts it in memory, together with the conversational history, which can itself be treated as memory. That is one way of defining an agentic language model. The other way to look at it is that agentic language model usage can be defined as the ability to reason as well as to take action, also called ReAct, for reasoning and acting.
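Before moving on, here is the weather tool-usage flow sketched in code: the prompt tells the model to emit JSON when a tool is needed, and the surrounding software parses it and dispatches. The get_weather function is a stand-in for a real weather API, and the model name is an illustrative assumption.

```python
# Minimal sketch of tool usage (function calling) via a JSON protocol.
import json
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def get_weather(city: str) -> str:
    return f"62F and foggy in {city}"  # stand-in for a real weather API call

SYSTEM = ('If the user asks about weather, reply ONLY with JSON like '
          '{"tool": "get_weather", "args": {"city": "..."}}. '
          'Otherwise answer normally.')
user = "What is the weather in San Francisco?"
reply = call_llm(f"{SYSTEM}\n\nUser: {user}")

try:
    call = json.loads(reply)                       # the model chose the tool path
    result = get_weather(**call["args"])
    # Feed the tool result back so the model writes a human-friendly answer.
    print(call_llm(f"User asked: {user}\nTool result: {result}\nAnswer the user."))
except (json.JSONDecodeError, KeyError, TypeError):
    print(reply)                                   # the model answered directly
```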
The reasoning part is something you encourage the model to do, using a method such as chain of thought. The action part uses the methods we saw in previous slides: retrieval or a search engine, actually using a calculator through an API call, other kinds of API calls such as the weather API we just saw, or generating Python code to run in your sandboxes. By combining reasoning and action, the model can do far more complex tasks than the simple input-output type of interaction.

Let's look at this in a little more detail. What does reasoning plus action mean? For the reasoning part, instead of directly doing the task that was asked, you prepare your prompt so that the model breaks down the task and makes a plan. Instead of breaking down the task yourself, as we saw in the earlier prompt-chaining slide, you ask the model to break it down, in other words, to plan the actions. Based on that breakdown, the model can take different actions by making API calls or using tools, so that it can extract or collect additional information from the external world. It combines all of this and puts it in memory, so the model knows what has been happening, and based on that it finally produces an answer for you.

Let's look at a concrete example. Suppose you have a customer service AI agent; here's how it might work. A customer asks, "Can I get a refund for this product?" The agentic system breaks this request down into four different actions: check the refund policy, check the customer information, check the product, and finally collect everything and decide what to do. At each step the language model emits API calls so that it can collect information. For example, to check the refund policy, the language model can query a retrieval system against the pre-indexed company refund policy, retrieve the information, and put it into its own context. Using that, it also requests the customer's order information: it can either ask the customer back in the chat to collect more information, or look it up in the system, depending on how the chat system is set up. The same goes for the product, so that it collects more information, finally draws a conclusion based on the policy, the product information, and the customer order information, sends a request to the follow-up system as an API call, and prepares, say, a response draft, which is then handled by a final approval step.

All right, so the workflow generally looks like this. In a sense, in an agentic language model system, the language model is making iterative calls: reviewing documents or text, making external tool calls, and so on. As an example, if you want to do research on a certain matter, you can prepare your agent to do a web search or other types of search, keep summarizing the results iteratively, and finally prepare a report for you or for your system.
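Here is a minimal sketch of that iterative workflow, mirroring the research-agent example: plan, act via a tool, observe, remember, and finally answer. The web_search function is a stand-in for a real search API, and the model name is an illustrative assumption.

```python
# Minimal sketch of an agentic loop: plan -> act -> observe -> remember -> answer.
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def web_search(query: str) -> str:
    return "..."  # stand-in for a real search API

task = "Summarize recent developments in small language models."
memory = []  # observations collected across iterations

# Planning: ask the model itself to break the task into simple steps.
plan = call_llm(f"Break this task into 3 short web search queries, one per line:\n{task}")

# Acting + observing: run each step and keep the results in memory.
for query in plan.splitlines():
    if query.strip():
        memory.append(f"Search '{query}': {web_search(query)}")

# Final answer drawn from everything accumulated in memory.
report = call_llm(f"Task: {task}\n\nNotes:\n" + "\n".join(memory) +
                  "\n\nWrite a short report based only on these notes.")
print(report)
```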
Another example could be a software assistant agent. You ask the software agent about a certain software bug or issue; the agent looks up and reviews the issue, collects the relevant pieces of code or files, reviews them, and proposes a fix. It can also execute code in its sandbox environment to test the fix, get the output, iteratively try to find a working fix, and finally propose the pull request or the changes to the users or developers. These are ways we can use language models in agent form, through iterative language model calls.

The main reason agentic language model usage is becoming more widely used is this: given the same model, a direct request may be more than the model can handle, but if you put your task into this kind of agentic format or pattern, the model can do more complex tasks, even a model that couldn't do them otherwise. That's one reason agentic language models are pushing the boundaries of what we can do with AI, toward more complex tasks and more domains we can rely on.

All right, here are some real-world applications. Software development, code generation, and bug fixing are widely investigated and researched by different organizations, and there are companies trying to provide these as services, as shown on the right side. Research and analysis: gather information, synthesize it, and provide a summary for the user. And task automation is another area where agentic methods can be used.

To make it more concrete, here are some design patterns you can use with agentic language models. Planning is critical: by asking a model to break down the task into simpler, clearer tasks, the model can later make the right API calls or tool usage. Reflection is something the model generates itself: the next model call criticizes the output that came from the same model, and by doing this the output can be improved. Tool usage is for anything outside the language model, when you need real-time or other external information. And multi-agent collaboration is another way to handle complex tasks.

Reflection is a pattern that is quick to implement and leads to good performance. Let's use a concrete example (sketched below). Say you want to refactor some programming code. Instead of asking the model to improve it right away, you follow this pattern: as in this example, you first ask the model, "here is the code; check the code and provide constructive feedback." Then you take that feedback into a second prompt: "here is the code and the feedback" (which came from the model itself) "now refactor it." This kind of reflection will likely generate better output, better fixes for the code you are asking the model about.
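Here is the reflection pattern just described, as a minimal sketch: one call critiques the code, and a second call revises it using that critique. It reuses the client from the earlier sketches; the model name and the toy code are illustrative assumptions.

```python
# Minimal sketch of the reflection pattern: critique first, then refactor.
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

code = """
def avg(xs):
    return sum(xs) / len(xs)   # crashes on an empty list
"""

# Call 1: ask the model to criticize the code.
feedback = call_llm(f"Review this code and provide constructive feedback:\n{code}")

# Call 2: feed the model's own feedback back in and ask for the refactor.
improved = call_llm(f"Code:\n{code}\nFeedback:\n{feedback}\n"
                    "Refactor the code, applying the feedback.")
print(improved)
```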
Tool usage is something we've seen before: ask the model to generate output matching an API pattern, so you can use that function prototype to make an actual call; or, if the task involves actual computation or some other form, ask the model to generate a program as its output. You can then run that program in a safe sandbox environment that your software scaffolding around the model can execute, and provide the execution output back to the model so the model can synthesize it.

All right, multi-agent is an interesting way to implement or accomplish a complex task. You split up the task and assign the pieces to different agents, each dedicated to a specific task. An agent, in this context, can be just a different prompt or a different persona: the prompt usually starts with "you are a helpful AI agent," and you can change that to a different persona for each agent. You may or may not use the same model for each, or use different models depending on the task. Let's use a concrete example. If you build a multi-agent system for smart home automation, you could create different agents: a climate control agent, a lighting control agent, and so on. Each is a piece of software that includes a different prompt with a persona, as well as handling for external triggers. A coordinator, essentially a model prompt together with software scaffolding around it, coordinates the whole activity; a minimal sketch follows below.

All right, that brings us to our summary. Agentic language model usage is a progression and extension of existing language model usage methods. The best practices you have used with language models for simple cases mostly still apply. On top of them, you can use additional methods such as retrieval, search, and tool usage, and prepare different kinds of prompts and workflows, so that you use the language model at its core as a reasoner, a smart intern, and use tools and retrieval methods to interact with the external world, combining the results in such a way that you can achieve complex tasks instead of simple input-output language model usage. And with that, Petra, thank you so much.
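Here is the multi-agent sketch mentioned above: each "agent" is just a different persona prompt, and a coordinator, itself one LLM call, routes the request. The agent names, personas, and model name are illustrative assumptions, reusing the client from the earlier sketches.

```python
# Minimal sketch of multi-agent collaboration: persona prompts plus a coordinator.
from openai import OpenAI

client = OpenAI()

AGENTS = {
    "climate": "You are a climate control agent. You manage thermostats.",
    "lighting": "You are a lighting control agent. You manage the lights.",
}

def run_agent(name: str, request: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; agents may use different models
        messages=[{"role": "system", "content": AGENTS[name]},
                  {"role": "user", "content": request}],
    )
    return resp.choices[0].message.content

def coordinator(request: str) -> str:
    # The coordinator picks the right agent for the request.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system",
                   "content": "Reply with exactly one word, 'climate' or "
                              "'lighting', naming the agent for this request."},
                  {"role": "user", "content": request}],
    )
    agent = resp.choices[0].message.content.strip().lower()
    return run_agent(agent if agent in AGENTS else "climate", request)

print(coordinator("Dim the living room lights to 30 percent."))
```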
speaker 2: This has been really great, so much information; hopefully this is useful for everybody. We keep collecting the questions, and we got so many; we will try to get to as many as we can, but please feel free to keep them coming. Maybe the first question for you, Insop, and let's focus on agentic AI, is about evaluation. Do you have recommendations for a good strategy for evaluating agents, beyond just using an LLM as a judge? It seems it should be a little more complicated to do evaluation on agents; at least that's the general notion in the questions, and people are wondering how that could be done. speaker 1: Right, I think this is a great question. Just for quick context, LLM-as-a-judge is a commonly used method where you use an LLM to evaluate the model-generated output against a ground-truth answer or some type of preference information, and it works great. We use it as well. One thing I've recently tried is an agentic judging method, meaning I use a reflection-type pattern like the one we saw previously. Instead of asking one question right away to the LLM judge, I first ask an LLM to provide an initial evaluation and feedback, and then I make another LLM call with a different prompt saying: hey, this is feedback from your junior engineer; if you are a senior engineer, how would you compare the junior engineer's evaluation against the output you are evaluating? I found that this reflection pattern gave better evaluations than a one-shot LLM-as-a-judge output. However, I think there can be even more creative ways to improve your evaluation stage using agent patterns, because evaluation is really, really important. I can't emphasize it enough, because it will help you advance fast, change models, change prompts, and so on and so forth. So I think that was a great question. speaker 2: Thank you so much, Insop. We got a few questions about augmenting AI agents for specific uses and making sure they get shaped the way you need for your application; on the technology side, what is there to do, what is there to use? We got many questions along these lines, so anything you can share would be helpful. speaker 1: Right, again, I think this is a great question. The first thing that comes to mind is: if you have a task that is already simple, just use the language model in the simple way. However, if you have a somewhat more involved or complex task, experiment with a simple agentic approach; agentic language model usage can be defined in many, many different ways, but even just doing iterative language model calls can improve your output. It all depends on your actual task and application domain. But instead of looking for a task you could apply a language model to, turn it around: see how far the task gets with simple language model usage first, and then try to improve it using the different patterns. The same goes, slightly tangentially, for fine-tuning: if you have a task you want to solve, try using the model as-is first and see whether it makes sense. From there you can decide how to proceed: prepare a small data sample, try it, and make progress in really quick iterations instead of investing too much up front. speaker 2: Thank you. A few questions came in that together make up a kind of giant question on their own, so I will let you choose how you want to respond. A lot of questions about ethical considerations: how to avoid hallucinations, how to avoid using data that might be unethical or might have something problematic behind it. What would be your recommendation? Again, this question is so big, but maybe there's something you would say about it. speaker 1: All right, yeah, another great question.
Yes, due to its probabilistic generation nature, hallucination is always there, although a lot of people are working on it. So that's a problem, and the contents of the output can also be concerning. Model providers themselves check generated output across different categories, and as an application builder you can also add guardrails: checking the output using a small language model so you can quickly screen it, or using some kind of classifier against your criteria, so that you can actually filter things out. This can happen either at the final generation stage, or at the input stage where the query comes in, so that you can avoid problems up front. This can backfire, but as an enterprise or business you may want to be on the safer side, making sure both the generated output and the incoming queries are safe and reasonable. This is evolving, but fundamentally it comes down to checking the output using some type of classifier, or even a decoder-type model. speaker 2: Thank you so much. And a little plug: we do have a generative AI program that also covers a lot of the ethical aspects of LLMs. It's not a technical program, but I'm reaching out to the people asking these very important questions. Maybe a last question for you, Insop. And you know we are not going to be endorsing anything, of course, at Stanford, but the question is: how do you get started? Are there any open-source models to recommend? Is there anything people can do to start testing this out, to start playing with this? speaker 1: Great question. One take-home message here is: start simple, then experiment and iterate. Agentic language models sound complex and fancy, but they are a progression and extension of language model usage. To start, there are many language model usage frameworks and agentic frameworks you could use; however, I would suggest starting in a playground-type environment. Model providers generally have a playground where you can type your prompt or input and see the output right away, so you can experiment really quickly. Once you're familiar with that, use the API to make calls from your program and see what's going on; that way you gain insight as well as practice the best practices for prompt preparation. Once you have that, you can intelligently decide whether it makes sense to continue with your own code base or to make use of widely available libraries. In short: start simple, work in a playground first, then make simple API calls, and then decide whether you want to adopt more extensive libraries or continue on your own. This applies to plain language model usage as well as agentic language model usage, because there are also many agentic language model frameworks out there. speaker 2: Thank you so much. And to close out this session, in the last minute, Insop: the field is progressing so fast, there is so much happening; basically every week we see news about something new in any of these areas. Are there any resources you study or follow, anything you would recommend people keep track of to stay up to date on LLMs and agentic AI in this field?
speaker 1: Difficult question, but a great question. I think it's a little hard, but I've picked some experts who are well known in this field, and I follow them, either on Twitter or through their YouTube channels; from there I get updated information and then do my own digging. Finding a good starting point is the key thing. And the references here, you can screenshot them, can be a good starting point, because they include some agentic usage courses as well as other good courses, from Stanford as well as other places. So yeah, thank you so much. speaker 2: Insop, this has been really great, so much helpful information, and you're absolutely the best person to hear this from. Thank you so much for taking the time. Thank you also to everyone who joined us live. Thank you for your time.