---
title: "image-annotation"
output: rmarkdown::html_vignette
author: Maximilian Weber
vignette: >
  %\VignetteIndexEntry{image-annotation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Ollama also supports multimodal models, which can interact with (but not create) images.

We start by loading the package:

``` r
library(rollama)
```

After loading the package, we need to pull a model that can handle images, such as the [llava](https://llava-vl.github.io/) model. Calling `pull_model("llava")` downloads the model, or just loads it if it has already been downloaded.

``` r
pull_model("llava")
#> ✔ model llava pulled succesfully
```

We can use textual and visual input together. For instance, we can ask a question and provide either a link to a picture or a local file path, such as `images = "/home/user/Pictures/IMG_4561.jpg"`.

In the first example, we ask the model to describe the logo of this package:

``` r
query("Excitedly describe this logo", model = "llava",
      images = "https://raw.githubusercontent.com/JBGruber/rollama/master/man/figures/logo.png")
#>
#> ── Answer from llava ─────────────────────────────────────────────────
#> The image you've shared is a vibrant and playful logo. At the center
#> of the design, there's an animated character that appears to be a
#> white, cat-like creature with blue eyes and ears. This character
#> seems to be in a relaxed state, laying on its stomach with its head
#> resting comfortably on one arm while the other arm is stretched out,
#> adding to the overall whimsical feel of the logo.
#>
#> Above this character, there's a blue circular element with some sort
#> of design or text, but it's not clear enough for me to describe.
#> Below the character, the word "ROLLAM" is prominently displayed in
#> bold black letters, suggesting that this could be the name of the
#> entity represented by the logo.
#>
#> The background of the logo features a light blue color, providing a
#> soft contrast to the central character and text elements. The overall
#> design of the logo suggests it might be for a gaming or
#> entertainment-related company or product, given the animated
#> character and playful aesthetic.
```

The second example asks a classification question:

``` r
query("Which animal is in this image: a llama, dog, or walrus?",
      model = "llava",
      images = "https://raw.githubusercontent.com/JBGruber/rollama/master/man/figures/logo.png")
#>
#> ── Answer from llava ─────────────────────────────────────────────────
#> The image features a character that appears to be a llama wearing a
#> blue helmet, lying on grass.
```
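
The classification answer comes back as a full sentence rather than a bare label, so using it for annotation usually needs a small post-processing step. The following is a minimal sketch of one way to do that in base R. The helper `extract_label()` is hypothetical and not part of rollama, and the `answer` string is simply copied from the output above. How you obtain the text from the return value of `query()` depends on the rollama version, so check `?query`.

``` r
# Hypothetical helper (not part of rollama): map a free-text answer onto
# exactly one of the allowed labels, or NA if none or several are mentioned.
extract_label <- function(answer, labels) {
  mentioned <- vapply(
    labels,
    function(label) {
      # match the label as a whole word, ignoring case
      grepl(paste0("\\b", label, "\\b"), answer, ignore.case = TRUE, perl = TRUE)
    },
    logical(1)
  )
  hits <- labels[mentioned]
  if (length(hits) == 1) hits else NA_character_
}

labels <- c("llama", "dog", "walrus")

# Answer text copied from the llava output shown above
answer <- paste(
  "The image features a character that appears to be a llama wearing a",
  "blue helmet, lying on grass."
)

extract_label(answer, labels)
#> [1] "llama"
```

Returning `NA` when no label or more than one label is mentioned keeps ambiguous answers visible for manual review instead of silently picking one.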