---
title: "image-annotation"
output: rmarkdown::html_vignette
author: Maximilian Weber
vignette: >
  %\VignetteIndexEntry{image-annotation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


Ollama also supports multimodal models, which can interact with (but not create) images.

We start by loading the package:


``` r
library(rollama)
```

After loading the package, we need to pull a model that can handle images, for example the [llava](https://llava-vl.github.io/) model.
Calling `pull_model("llava")` downloads the model, or simply loads it if it has already been downloaded.


``` r
pull_model("llava")
#> ✔ model llava pulled succesfully
```

We can use textual and visual input together.
For instance, we can ask a question and provide a link to a picture or a local file path, such as `images = "/home/user/Pictures/IMG_4561.jpg"`.
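
For the local-file case, a minimal sketch might look like the following. The helper `describe_local_image()` and its default prompt are hypothetical and not part of rollama; the path is the placeholder from the sentence above. The helper only checks that the file exists before passing it to `query()`:

``` r
# Hypothetical helper (not part of rollama): ask llava about a local image,
# but only if the file actually exists.
describe_local_image <- function(path, prompt = "Describe this image") {
  if (!file.exists(path)) {
    message("Image not found: ", path)
    return(invisible(NULL))
  }
  # Requires the rollama package and a running Ollama server
  rollama::query(prompt, model = "llava", images = path)
}

# Placeholder path from the text above; replace it with a real file
describe_local_image("/home/user/Pictures/IMG_4561.jpg")
```

Checking the path first means a mistyped file name produces a clear message instead of an error from the request itself.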

In the first example, we ask the model to describe the logo of this package:


``` r
query("Excitedly describe this logo", model = "llava",
      images = "https://raw.githubusercontent.com/JBGruber/rollama/master/man/figures/logo.png")
#> 
#> ── Answer from llava ─────────────────────────────────────────────────
#> The image you've shared is a vibrant and playful logo. At the center
#> of the design, there's an animated character that appears to be a
#> white, cat-like creature with blue eyes and ears. This character
#> seems to be in a relaxed state, laying on its stomach with its head
#> resting comfortably on one arm while the other arm is stretched out,
#> adding to the overall whimsical feel of the logo.
#> 
#> Above this character, there's a blue circular element with some sort
#> of design or text, but it's not clear enough for me to describe.
#> Below the character, the word "ROLLAM" is prominently displayed in
#> bold black letters, suggesting that this could be the name of the
#> entity represented by the logo.
#> 
#> The background of the logo features a light blue color, providing a
#> soft contrast to the central character and text elements. The overall
#> design of the logo suggests it might be for a gaming or
#> entertainment-related company or product, given the animated
#> character and playful aesthetic.
```

The second example asks a classification question:


``` r
query("Which animal is in this image: a llama, dog, or walrus?",
      model = "llava",
      images = "https://raw.githubusercontent.com/JBGruber/rollama/master/man/figures/logo.png")
#> 
#> ── Answer from llava ─────────────────────────────────────────────────
#> The image features a character that appears to be a llama wearing a
#> blue helmet, lying on grass.
```