Community and Artificial Intelligence at Google I/O Connect
In the vibrant city of Amsterdam, an event has brought together more than 500 developers from all over the world for an unrivalled experience: I/O Connect. This event, which has captured the attention of the technology industry, focuses on one of the most exciting and cutting-edge topics of our time: artificial intelligence. I/O Connect has been the perfect stage for technology minds to exchange knowledge, make valuable connections and discover the latest trends that will drive the future of innovation.
We have had the opportunity to explore how this technology is transforming business and society at large. From machine learning algorithms to natural language processing, we've discovered Google's newest and most exciting AI tools. In addition, we were given access to live demonstrations and interacted with the latest technological creations.
Why LLMs?
We focus today on LLMs - an acronym for Large Language Models - for two reasons: the many challenges they impact, and the importance of taking the first steps towards technical control, moving from demos to robust products.
LLMs are Deep Learning models trained on datasets containing huge amounts of text: they can cover a wide range of challenges, such as summarisation, translation, text and code generation, and even complementing search and recommendation engines.
Bringing it back to reality, one of the workshops covered PaLM and the parameters that control its API, offering a first step in moving from a demo to a minimum viable product: controlling those parameters through the Python SDK to call the Vertex AI PaLM API. In addition, we explored the Responsible AI principles that this API puts at your disposal to mitigate possible hallucinations and offensive generations.
The objective is to understand the possibilities the API offers so we can adapt it to the use cases we are dealing with.
Vertex AI PaLM API models:
The API offers six different language models, grouped by a functional criterion: whether they are language-oriented or code-oriented. The language-oriented ones are:
- Text-bison: The model associated with most natural language processing tasks. It supports over 8,000 input tokens, which gives it a decent context window, and its training data is current as of February 2023 (by comparison, ChatGPT's most widely used version at the time had a training cutoff in 2021). This model supports fine-tuning, or retraining.
- Chat-bison: A model retrained from Text-bison for conversation. It supports less initial context, with generation quality similar to its analogue.
- Textembedding-gecko: Produces embeddings, transformations from text to a numerical vector representation, which serve as building blocks for many downstream language tasks; a minimal sketch of how to call it follows this list.
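As a first, hedged illustration, this is how Textembedding-gecko might be called through the Vertex AI Python SDK; the project ID, location, and model version suffix are placeholders and assumptions, not values from the workshop:

```python
import vertexai
from vertexai.language_models import TextEmbeddingModel
# Note: older SDK versions expose this under vertexai.preview.language_models.

# Placeholders: replace with your own project and region.
vertexai.init(project="your-project-id", location="us-central1")

# The "@001" version suffix is an assumption for illustration.
model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

# get_embeddings takes a list of texts and returns one embedding per text.
embeddings = model.get_embeddings(["Hello PaLM"])
vector = embeddings[0].values  # a list of floats representing the text
print(len(vector))
```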
For a first approximation of PaLM, we load the model, pass a prompt, and generate a response.
Figure 1. "Hello PaLM2. The three essential parts of API usage: we load a model, generate a prompt and a response. It is worth noting that the method of loading the model has the same semantics as the Huggin Face library of Transformers.
API control parameters
The control parameters are passed as arguments to the call that generates the response. The text-bison model offers four parameters, to be explored depending on the use case (see the sketch after Figure 2):
- Temperature (range 0-1, default 0): used for sampling during response generation; it controls the level of stochasticity in token selection. Values close to 0 work well for prompts that require more deterministic, less open-ended responses, while values closer to 1 can lead to more "creative" or diverse results. A temperature of 0 is deterministic: the answer with the highest probability is always selected. For most use cases, it is recommended to start with a value of 0.2. Responsible AI tip: while higher values may give more creative results, they may also generate meaningless or inappropriate text.
- Max_output_tokens (range 1-1024, default 128): the maximum number of tokens that can be generated in the response. Specify a lower value for shorter responses and a higher value for longer ones. A token can be smaller than a word and is approximately four characters long; 100 tokens correspond to roughly 60-80 words. It is essential to consider token size, as the models have limits on the number of input and output tokens.
- Top_p (range 0 to 1, default 0.95): controls the diversity of the generated text by changing, at a low level, how the model selects tokens for output. A higher top_p produces more "diverse" and "interesting" results, as the model is allowed to choose from a wider set of possibilities; a lower top_p results in more predictable output, as the model is limited to a smaller set of possible tokens. Specify a lower value to reduce randomness.
- Top_k (range 0 to 40, default 40): changes how the model selects tokens for output. A top_k of 1 means the selected token is the most likely of all tokens in the model's vocabulary (also known as greedy decoding), while a top_k of 3 means the next token is selected from the 3 most likely tokens (using temperature). At each selection step, the top_k tokens with the highest probabilities are sampled; these are then further filtered by top_p, and the final token is chosen via temperature sampling.
Figure 2. A call with the control parameters set, aimed at answering the question: was it Frodo who destroyed the ring? Documentation on the parameters can be found here.
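A sketch of such a call, reusing the model loaded earlier; the parameter values are illustrative, not prescriptive:

```python
# Illustrative values; ranges and defaults are described in the list above.
response = model.predict(
    "Was it Frodo who destroyed the ring?",
    temperature=0.2,        # mostly deterministic, slightly varied
    max_output_tokens=128,  # cap the length of the answer
    top_p=0.95,             # nucleus-sampling threshold
    top_k=40,               # candidate pool size at each step
)
print(response.text)
```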
Sample Notebook here.
Conclusions: So far, we have loaded a model and learned about the parameters that help us control it. To find out more about these parameters, you can consult the documentation here. We now move on to the design of the inputs, or prompts, with a series of best practices recommended by the Google Cloud team for getting the most out of the API.
Use case: Response generation with Vertex AI within the Question-Answer scenario
As mentioned above, there are multiple use cases that the pre-trained model can solve. Based on the different examples in the repository, we focus on question-answering problems: these models can address tasks associated with customer service, website chats, forums, and so on.
However, in addition to choosing the model, giving it an optimal prompt can significantly influence the results. For this reason, several good practices are presented. The first two key ideas: the prompt must be specific, concise, and rich in context, as well as free of grammatical errors, and it should ask only one question per prompt. We then classify the question by domain:
- Open domain: questions whose answers are available on the public Internet. They can belong to any category, such as history, geography, countries, politics, chemistry, etc. These include trivia or general-knowledge questions, such as: Q. Who won the Olympic gold medal in swimming? Q. Who is the president of [particular country]? Q. Who wrote [specific book]? Be aware of the training cutoff of generative models, as questions involving information more recent than the date the model was trained may give incorrect or imaginative answers.
- Closed domain: specific questions that correspond to an internal knowledge base not available on the public Internet. If correctly stated, the model is more likely to answer from the context provided and less likely to stray beyond it. If, for example, you want to build a question-and-answer bot based on the full documentation of a product, you can pass that documentation to the model and ask it to answer only on that basis.
In both open and closed domains, we can supply one or several example questions, depending on the specificity of the domain. In the case of closed domains, we also add a string as context, as in the sketch below.
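A minimal closed-domain sketch, reusing the model loaded earlier; the product and its policies are invented for illustration:

```python
# Hypothetical context: an invented product and policy, for illustration.
context = (
    "ExampleBox ships with a 2-year warranty. "
    "Returns are accepted within 30 days of purchase."
)

prompt = f"""Answer the question using only the context below.

Context: {context}

Question: How long is the warranty on ExampleBox?
Answer:"""

response = model.predict(prompt, temperature=0.2, max_output_tokens=64)
print(response.text)
```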
Figure 3. Within the open-domain context, we can pass a series of example questions to give the model context; in this case, geography and history questions.
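In the same spirit as Figure 3, a hedged sketch of a few-shot open-domain prompt; the question-answer pairs are invented examples:

```python
# Invented few-shot examples; the final answer is left for the model.
prompt = """Q: What is the capital of France?
A: Paris

Q: In which year did World War II end?
A: 1945

Q: What is the longest river in South America?
A:"""

response = model.predict(prompt, temperature=0.2, max_output_tokens=32)
print(response.text)
```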
There are many particularities that you can explore in the open source repository that the Google Cloud team makes available to you.
In addition to learning, it has been a unique experience that has given us the opportunity to connect with the community. Congratulations to the organising team and to all attendees for their kindness and energy!