uform-gen2-qwen-500m Beta
Image-to-Text • unum
UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering. The model was pre-trained on an internal image-captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.
Usage
Workers - TypeScript
```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Fetch a sample image and read its raw bytes.
    const res = await fetch("https://cataas.com/cat");
    const blob = await res.arrayBuffer();

    const input = {
      // The image is passed as an array of 8-bit unsigned integers (0 to 255).
      image: [...new Uint8Array(blob)],
      prompt: "Generate a caption for this image",
      max_tokens: 512,
    };

    const response = await env.AI.run("@cf/unum/uform-gen2-qwen-500m", input);

    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
```

Parameters
Input
One of:
-  0 (string): Binary string representing the image contents.
-  1 (object):
   -  temperature (number): Controls the randomness of the output; higher values produce more random results.
   -  prompt (string): The input text prompt for the model to generate a response.
   -  raw (boolean, default false): If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
   -  image (required, one of):
      -  0 (array): An array of integers that represent the image data, constrained to 8-bit unsigned integer values.
         -  items (number): A value between 0 and 255.
      -  1 (string): Binary string representing the image contents.
   -  max_tokens (integer, default 512): The maximum number of tokens to generate in the response.
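The Worker example builds the `image` field by spreading the fetched bytes into a plain array, which corresponds to the array form of `image` listed above. A minimal sketch of that step as a reusable helper, so the same image bytes can be paired with either a captioning prompt or a visual-question-answering prompt. `ModelInput` and `buildModelInput` are hypothetical names for illustration, not part of the Workers AI API.

```typescript
// Hypothetical shape for the object form of the input (subset of fields).
interface ModelInput {
  image: number[];
  prompt: string;
  max_tokens: number;
}

// Hypothetical helper: turns raw image bytes into the `image` array form
// (one integer per byte, each in 0..255) and attaches a prompt.
function buildModelInput(
  bytes: ArrayBufferLike,
  prompt: string,
  maxTokens = 512, // same value as the documented default for max_tokens
): ModelInput {
  // Viewing the buffer as a Uint8Array guarantees each element is a byte value.
  const image = Array.from(new Uint8Array(bytes));
  return { image, prompt, max_tokens: maxTokens };
}

// Usage: one set of bytes, two prompts (captioning vs. a question about the image).
const bytes = new Uint8Array([137, 80, 78, 71]).buffer; // first four bytes of a PNG file
const caption = buildModelInput(bytes, "Generate a caption for this image");
const question = buildModelInput(bytes, "What animal is in this picture?", 64);
console.log(caption.image, question.max_tokens);
```

In the Worker above, the inline `input` object could then be replaced with `buildModelInput(blob, "Generate a caption for this image")`.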
Output
-  description (string)
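The output schema declares a single `description` string, which the Worker example serializes with `JSON.stringify`. A minimal sketch of a type guard for reading that field safely, assuming the response may lack it (the schema does not list `description` as required). `UformOutput`, `isUformOutput`, and `getDescription` are hypothetical names for illustration.

```typescript
// Hypothetical type mirroring the output schema.
interface UformOutput {
  description?: string;
}

// Type guard: true when `value` is an object whose `description`, if present, is a string.
function isUformOutput(value: unknown): value is UformOutput {
  if (typeof value !== "object" || value === null) return false;
  const d = (value as { description?: unknown }).description;
  return d === undefined || typeof d === "string";
}

// Returns the generated text, or `fallback` when the response has no usable description.
function getDescription(value: unknown, fallback = ""): string {
  if (isUformOutput(value) && typeof value.description === "string") {
    return value.description;
  }
  return fallback;
}

console.log(getDescription({ description: "A cat sitting on a windowsill." }));
console.log(getDescription({ result: 42 }, "(no description)"));
```

Inside the Worker, `new Response(getDescription(response))` would return the caption as plain text rather than a JSON object.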
API Schemas
The following schemas are based on JSON Schema.

Input

```json
{
  "oneOf": [
    {
      "type": "string",
      "format": "binary",
      "description": "Binary string representing the image contents."
    },
    {
      "type": "object",
      "properties": {
        "temperature": {
          "type": "number",
          "description": "Controls the randomness of the output; higher values produce more random results."
        },
        "prompt": {
          "type": "string",
          "description": "The input text prompt for the model to generate a response."
        },
        "raw": {
          "type": "boolean",
          "default": false,
          "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."
        },
        "image": {
          "oneOf": [
            {
              "type": "array",
              "description": "An array of integers that represent the image data constrained to 8-bit unsigned integer values",
              "items": {
                "type": "number",
                "description": "A value between 0 and 255"
              }
            },
            {
              "type": "string",
              "format": "binary",
              "description": "Binary string representing the image contents."
            }
          ]
        },
        "max_tokens": {
          "type": "integer",
          "default": 512,
          "description": "The maximum number of tokens to generate in the response."
        }
      },
      "required": [
        "image"
      ]
    }
  ]
}
```

Output

```json
{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "description": {
      "type": "string"
    }
  }
}
```
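The input schema is a top-level `oneOf`: either a bare binary string, or an object whose only required field is `image`. A minimal sketch of TypeScript types mirroring that schema, plus a client-side check of the object form before calling the model. `UformInput`, `UformRequest`, `validateInput`, and `validateRequest` are hypothetical names, and requiring a positive `max_tokens` is an assumption (the schema only says integer).

```typescript
// Hypothetical types mirroring the input JSON schema.
interface UformInput {
  image: number[] | string; // byte array (0 to 255) or binary string
  prompt?: string;
  raw?: boolean; // schema default: false
  temperature?: number;
  max_tokens?: number; // schema default: 512
}
type UformRequest = string | UformInput; // top-level oneOf

// Returns a list of problems; an empty list means the object passes these checks.
function validateInput(input: UformInput): string[] {
  const errors: string[] = [];
  if (Array.isArray(input.image)) {
    // Each item must be an 8-bit unsigned integer.
    input.image.forEach((v, i) => {
      if (!Number.isInteger(v) || v < 0 || v > 255) {
        errors.push(`image[${i}] = ${v} is not a byte value`);
      }
    });
  } else if (typeof input.image !== "string") {
    errors.push("image must be a byte array or a binary string");
  }
  // Assumption: max_tokens should be positive; the schema only requires an integer.
  if (
    input.max_tokens !== undefined &&
    (!Number.isInteger(input.max_tokens) || input.max_tokens <= 0)
  ) {
    errors.push("max_tokens must be a positive integer");
  }
  return errors;
}

// A bare binary string has no fields to check; objects go through validateInput.
function validateRequest(req: UformRequest): string[] {
  return typeof req === "string" ? [] : validateInput(req);
}

console.log(validateRequest({ image: [0, 128, 255], prompt: "Describe this image" }));
console.log(validateRequest({ image: [300, 1.5], max_tokens: 0 }));
```

The checks only restate constraints visible in the schema; the service itself remains the authority on what it accepts.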