Multimodal prompting


Multimodal prompting: How to prompt AI with text, images, documents, and data

Prompting started with text.

People typed questions, instructions, and requests into AI tools and received written answers. They asked AI to write emails, explain concepts, summarise articles, generate ideas, and improve documents.

But AI is no longer limited to text.

Many modern AI tools can now work with multiple types of input, including text, images, PDFs, screenshots, spreadsheets, charts, audio, video, and data files. Tools with this capability are called multimodal AI. The word “multimodal” simply means that the AI can understand or generate more than one type of content.

For example, you can ask AI to:

summarise a PDF,
explain a chart,
analyse a screenshot,
describe an image,
extract key points from a document,
review a spreadsheet,
create an image from a text description,
suggest improvements to a slide,
compare two visuals,
or turn a table into a written explanation.

This changes how prompting works.

A simple text prompt may ask: Explain online learning.

A multimodal prompt may ask: Analyse this uploaded report on online learning. Summarise the key findings, identify risks, extract important statistics, and create five recommendations for school leaders.

The second prompt uses a file as context. The AI is not only responding to your words. It is also using the uploaded material.

This article explains how to use multimodal prompting effectively. It covers prompting with documents, images, screenshots, spreadsheets, charts, slides, audio, video, and AI-generated visuals. It also explains common mistakes, privacy concerns, and best practices.

LLM KNOWLEDGE CENTRE PROMPTING KNOWLEDGE CENTRE AI KNOWLEDGE CENTRE AI CAREER CENTRE

NATURAL INTELLIGENCE UPDATES TOOLS MODELS AI TURBOCHARGER - NEW COHORT - ENROL NOW

1. What is multimodal prompting?

Multimodal prompting means giving AI more than one kind of input.

Instead of using only text, you may provide:

a written instruction,
a PDF,
a photo,
a chart,
a table,
a spreadsheet,
a screenshot,
a slide deck,
an image,
an audio transcript,
or a video summary.

The AI then uses your prompt and the uploaded material together.

For example: Look at this image and describe what is happening.

Or: Read this PDF and summarise it for a beginner.

Or: Analyse this spreadsheet and identify trends, outliers, and possible errors.

Or: Review this slide and suggest how to make it clearer for a business audience.

In each case, the prompt tells the AI what to do with the material.

The uploaded file gives the AI the content. The quality of the result depends on both the file and the prompt.

If you upload a document and only say: Summarise this.

You may get a basic summary.

But if you say: Summarise this document for senior managers. Focus on decisions required, risks, financial implications, and next steps. Keep the summary under 500 words.

You are likely to get a more useful result.

That is the heart of multimodal prompting: the file provides information, but the prompt provides direction.

2. Why multimodal prompting matters

Multimodal prompting matters because much of real work is not purely text.

Professionals work with reports, slides, charts, emails, spreadsheets, dashboards, images, scanned documents, screenshots, forms, diagrams, and videos. Students work with textbooks, notes, diagrams, question papers, handwritten material, and lecture slides. Teachers work with lesson plans, worksheets, images, rubrics, presentations, and student submissions. Business users work with invoices, customer feedback, sales data, product images, marketing creatives, contracts, proposals, and performance reports.

Multimodal AI can help make these materials easier to understand, analyse, transform, and reuse.

For example, you can use it to:

convert a long report into an executive summary,
explain a complex diagram in simple language,
identify confusing parts of a slide,
extract action items from meeting notes,
turn a chart into key insights,
compare two versions of a design,
identify patterns in spreadsheet data,
create captions for images,
generate quiz questions from a document,
or convert rough notes into a structured article.

This is powerful because it reduces the gap between information and action.

However, multimodal prompting requires care. AI may misread a chart, miss small text, misunderstand a visual, or make assumptions about a document. You should verify important outputs.

3. The basic multimodal prompt formula

A useful formula for multimodal prompting is:

Input + task + context + audience + output format + constraints

Let us understand each part.

1. Input

This is the file, image, document, table, screenshot, or data you provide.

Example:

I have uploaded a PDF report.

Or:

Here is a screenshot of a website page.

Or:

This spreadsheet contains monthly sales data.

2. Task

This is what you want the AI to do.

Examples:

summarise,
explain,
extract,
compare,
classify,
review,
analyse,
improve,
convert,
or create.

3. Context

This explains why the task matters.

Example:

I need this for a board presentation.

Or:

I am using this to teach class 9 students.

Or:

I want to understand whether the campaign is working.

4. Audience

This tells AI who the output is for.

Examples:

students,
senior managers,
customers,
teachers,
founders,
policymakers,
investors,
or beginners.

5. Output format

This tells AI how to present the answer.

Examples:

bullet points,
table,
executive summary,
checklist,
slide outline,
report,
email,
FAQ,
or action plan.

6. Constraints

These are rules and limits.

Examples:

keep it under 500 words,
use simple language,
do not invent missing information,
mention uncertainty,
quote only from the document,
separate facts from assumptions,
or identify what needs verification.

Master formula

I have uploaded [input]. Act as a [role]. Your task is to [task]. The context is [context]. The audience is [audience]. Present the output as [format]. Follow these constraints: [constraints].

Example:

I have uploaded a PDF report. Act as a business analyst. Your task is to summarise the report for senior managers. Focus on key findings, risks, decisions required, and next steps. Present the output as an executive summary under 500 words. Do not invent facts. Mention anything that needs verification.

This formula works for most multimodal tasks.
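If you reuse the master formula often, it can be kept as a fill-in template. The sketch below is one possible way to do that in Python using only the standard library. The names `MASTER_FORMULA` and `build_prompt` and the sample context value are illustrative assumptions, not part of any AI tool.

```python
from string import Template

# The master formula, with each bracketed slot turned into a $placeholder.
MASTER_FORMULA = Template(
    "I have uploaded $input. Act as a $role. Your task is to $task. "
    "The context is $context. The audience is $audience. "
    "Present the output as $format. Follow these constraints: $constraints."
)


def build_prompt(**parts: str) -> str:
    """Fill every slot of the master formula.

    Template.substitute raises KeyError when a slot is missing, which acts
    as a reminder that one part of the formula was forgotten.
    """
    return MASTER_FORMULA.substitute(**parts)


prompt = build_prompt(
    input="a PDF report",
    role="business analyst",
    task="summarise the report for senior managers",
    context="a quarterly review meeting",  # hypothetical context for illustration
    audience="senior managers",
    format="an executive summary under 500 words",
    constraints="do not invent facts; mention anything that needs verification",
)
print(prompt)
```

The resulting text is what you would paste or send alongside the uploaded file. Leaving out a part, such as the audience, stops the function with an error instead of silently producing a vaguer prompt.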

4. Prompting with documents and PDFs

Documents and PDFs are among the most common multimodal inputs.

You can ask AI to summarise, explain, extract, compare, reorganise, or review documents.

Basic document summary prompt

Summarise this document for [audience]. Include key points, important details, risks, decisions needed, and recommended next steps. Keep it under [word count] words.

Example:

Summarise this PDF for senior business leaders. Include key findings, business implications, risks, and recommended next steps. Keep it under 600 words.

Document extraction prompt

Extract the following from this document: main topics, important dates, names, numbers, commitments, risks, action items, and unanswered questions. Present the answer in a table.

Beginner explanation prompt

Explain this document to a beginner. Use simple language, short sections, and examples. Avoid jargon. Mention any parts that are unclear or need expert review.

Policy or legal document prompt

Review this document and identify key obligations, deadlines, responsibilities, risks, and clauses that need expert review. Do not give legal advice. Present the answer as a checklist.

Research paper prompt

Summarise this research paper. Include research question, method, sample, key findings, limitations, and relevance. Do not invent information not present in the paper.

Important caution

Some PDFs are scanned images, not selectable text. Some may contain tables, charts, footnotes, or small print that AI may miss or misread. For important documents, always check the original.

Use prompts such as:

Mention any sections, tables, or figures you could not read clearly.

Or:

Identify which points should be checked manually in the original document.

5. Prompting with images

AI can analyse images in many useful ways.

You can ask it to describe an image, identify objects, interpret a scene, review a design, create captions, suggest improvements, or explain visual content.

Image description prompt

Describe this image in clear and simple language. Mention the main objects, setting, people, actions, mood, and any visible text.

Educational image prompt

Explain this image to students of [grade or level]. Use simple language and highlight the main learning points.

Design review prompt

Review this image as a visual design. Comment on clarity, layout, message, colour use, readability, audience fit, and possible improvements.

Social media caption prompt

Create five caption options for this image for [platform]. The audience is [audience]. Use a [tone] tone. Keep each caption under [word count] words.

Accessibility prompt

Write alt text for this image. Keep it accurate, concise, and useful for someone using a screen reader.

Image comparison prompt

Compare these two images. Identify similarities, differences, strengths, weaknesses, and which one is more suitable for [purpose].

Important caution

AI may misidentify people, places, objects, brands, or small details. It may also infer emotions or intentions from appearance. Be careful with sensitive interpretations.

Avoid asking AI to make unsupported claims about a person’s identity, health, character, emotions, or intentions from an image.

A safer prompt is:

Describe only what is visibly present in the image. Do not infer identity, emotions, or intentions unless clearly shown.

6. Prompting with screenshots

Screenshots are useful because they capture real interfaces, web pages, dashboards, errors, forms, social media posts, or app screens.

AI can help review screenshots for clarity, usability, errors, missing information, or design improvement.

Website screenshot prompt

Review this website screenshot. Comment on clarity, first impression, message, layout, readability, call to action, trust signals, and improvements. The target audience is [audience].

App screen prompt

Analyse this app screen for user experience. Identify confusing elements, missing labels, readability issues, possible user friction, and improvement suggestions.

Error screenshot prompt

Look at this error screenshot. Explain what the error likely means, possible causes, and safe next steps. Do not assume anything not visible.

Dashboard screenshot prompt

Analyse this dashboard screenshot. Identify key metrics, visible trends, possible concerns, and questions I should ask. Mention anything that is unclear or difficult to read.

Presentation screenshot prompt

Review this slide screenshot. Suggest improvements for clarity, visual hierarchy, text length, layout, and audience impact.

Screenshots are very useful, but they can be incomplete. The AI sees only what is visible in the screenshot. It may not know what happened before or after.

Always provide context.

Instead of:

What is wrong here?

Use:

This is a screenshot from my course website checkout page. I want to know whether the page is clear for first-time buyers. Review the layout, instructions, trust signals, and possible confusion points.

7. Prompting with spreadsheets and tables

Spreadsheets contain structured data. AI can help summarise, analyse, clean, explain, and interpret data.

Depending on the tool, you may upload a spreadsheet or paste a table.

Data summary prompt

Analyse this spreadsheet. Summarise the key trends, unusual values, missing data, possible errors, and important questions. Present the answer in bullet points.

Sales data prompt

This table shows monthly sales data. Identify trends, best-performing months, weak months, possible seasonality, and questions for further analysis. Present insights in a table.

Data cleaning prompt

Review this table for data quality issues. Identify missing values, duplicates, inconsistent labels, unusual entries, and formatting problems. Suggest cleaning steps.

Business insight prompt

Analyse this data for business insights. Focus on revenue, customer segments, growth patterns, risks, and possible actions. Clearly separate observations from recommendations.

Chart suggestion prompt

Based on this table, suggest the best charts to visualise the data. For each chart, explain what insight it would show and why it is suitable.

Important caution

AI can help identify patterns, but it may make calculation mistakes if the data is complex or incomplete. For financial, scientific, legal, operational, or high-stakes analysis, verify calculations using proper tools.

Use prompts such as:

Show the calculation steps for any numerical conclusion.

Or:

Mention which insights are directly supported by the data and which are assumptions.
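One way to verify numerical conclusions is to recompute the key figures yourself and compare them with what the AI reported. The sketch below does this for a small monthly sales table using only the Python standard library. The data, the column names `month` and `sales`, and the function name `summarise_sales` are made up for illustration.

```python
import csv
import io
from statistics import mean

# Hypothetical monthly sales data; in practice, read this from your own file.
CSV_TEXT = """month,sales
Jan,1200
Feb,950
Mar,
Apr,1800
May,1500
"""


def summarise_sales(csv_text: str) -> dict:
    """Recompute basic figures so they can be compared with the AI's claims."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Months with an empty sales cell are reported rather than treated as zero.
    missing = [row["month"] for row in rows if not row["sales"].strip()]
    values = {
        row["month"]: float(row["sales"]) for row in rows if row["sales"].strip()
    }
    return {
        "missing_months": missing,
        "total": sum(values.values()),
        "average": mean(values.values()),
        "best_month": max(values, key=values.get),
        "weakest_month": min(values, key=values.get),
    }


summary = summarise_sales(CSV_TEXT)
for name, value in summary.items():
    print(f"{name}: {value}")
```

If the AI's summary names a different best month or a different total, that is a signal to check both the data and the AI's reading of it before using the result.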

8. Prompting with charts and graphs

Charts are common in reports, dashboards, research papers, and presentations.

AI can help explain charts, identify trends, write interpretations, and turn visuals into summaries.

Chart explanation prompt

Explain this chart in simple language. Identify the title, axes, categories, visible trend, highest and lowest values, and main takeaway.

Executive insight prompt

Interpret this chart for senior managers. Focus on what changed, why it may matter, risks, and decisions that may be needed.

Student explanation prompt

Explain this graph to students. Describe what the graph shows, how to read it, and what conclusions can be drawn carefully.

Chart critique prompt

Review this chart for clarity. Comment on title, labels, scale, colours, readability, missing context, and whether the chart could be misleading.

Caution prompt

Analyse this chart, but do not overclaim. Clearly separate what is visible in the chart from what would require additional data.

This last instruction is important.

A chart may show correlation but not causation. It may show a trend but not explain why the trend happened. It may be based on limited data that does not represent the full population.

AI should be prompted to avoid overinterpretation.

9. Prompting with slide decks

Slide decks are common in business, education, consulting, training, and policy work.

AI can help review, summarise, restructure, and improve slides.

Slide summary prompt

Summarise this slide deck for [audience]. Include main message, key points, important data, recommendations, and questions for the presenter.

Slide improvement prompt

Review this slide deck for clarity, flow, structure, visual consistency, text overload, and audience fit. Suggest improvements slide by slide.

Presentation outline prompt

Convert this document into a [number]-slide presentation outline. For each slide, include title, key message, bullet points, suggested visual, and speaker notes.

Speaker notes prompt

Create speaker notes for this slide deck. Use a clear and conversational tone. Keep each slide’s notes under [word count] words.

Executive deck prompt

Improve this deck for a senior leadership audience. Focus on concise messaging, decision points, risks, business implications, and next steps.

Teaching deck prompt

Convert this content into a teaching presentation. Include learning objectives, explanations, examples, activities, discussion questions, and recap slides.

When prompting with slides, mention the audience and purpose. A training deck, investor deck, board deck, and classroom deck need very different styles.

10. Prompting with audio and video material

Some AI tools can work directly with audio or video. Others work with transcripts.

You can use AI to summarise lectures, meetings, interviews, webinars, podcasts, training videos, or recorded discussions.

Audio transcript summary prompt

Summarise this transcript. Include main topics, key points, decisions, action items, questions raised, and follow-up tasks.

Interview analysis prompt

Analyse this interview transcript. Identify themes, quotes worth reviewing, participant concerns, motivations, pain points, and research insights. Do not invent anything not in the transcript.

Lecture summary prompt

Summarise this lecture for students. Include key concepts, definitions, examples, and five revision questions.

Podcast repurposing prompt

Turn this podcast transcript into a blog outline, five social media posts, a newsletter summary, and ten key takeaways.

Video review prompt

Review this video transcript for clarity, structure, audience engagement, repetition, and missing points. Suggest improvements for the next recording.

Important caution

Transcripts may contain errors. Speaker names may be wrong. Context may be missing. If tone, body language, or visuals matter, a transcript alone may not be enough.

Use prompts such as:

Identify points that may require checking against the original recording.

Or:

Mention where the transcript appears unclear or incomplete.

11. Prompting AI to generate images

Multimodal prompting is not only about analysing files. It can also involve creating images from text.

AI image generation can help create:

illustrations,
concept art,
educational visuals,
social media graphics,
website images,
story scenes,
diagrams,
icons,
posters,
and visual metaphors.

A weak image prompt is:

Create an image about AI.

A better prompt is:

Create a minimalist, bright, watercolour-style illustration showing a teacher and students using AI as a learning assistant in a modern classroom. Use lots of white space, soft colours, and a hopeful mood. The image should feel educational, human-centred, and non-technical.

This prompt gives subject, style, mood, composition, and purpose.

Image generation prompt formula

Create an image of [subject] in [style]. The scene should include [details]. The mood should be [mood]. The composition should be [composition]. The image is for [purpose or audience]. Avoid [things to avoid].

Example:

Create an image of a professional using AI to organise ideas on a digital board. Use a clean, minimalist illustration style with soft colours and white space. The mood should be calm, productive, and thoughtful. The image is for an article on prompting at work. Avoid robots, dark backgrounds, and overly futuristic elements.

Important caution

When generating images, be careful with copyrighted characters, real people, logos, misleading visuals, stereotypes, and sensitive topics.

For educational and business use, prefer original, respectful, clear, and purpose-driven visuals.

12. Prompting with diagrams and flowcharts

Diagrams and flowcharts are useful for explaining systems, processes, workflows, and relationships.

AI can help interpret existing diagrams or create new diagram descriptions.

Diagram explanation prompt

Explain this diagram in simple language. Describe each part, how the parts are connected, and what the overall process means.

Flowchart review prompt

Review this flowchart for clarity, logical sequence, missing steps, confusing labels, and improvement suggestions.

Create flowchart prompt

Create a flowchart for [process]. Include the main steps, decision points, inputs, outputs, and possible exceptions. Present it as a numbered flow that can be converted into a visual diagram.

Example:

Create a flowchart for handling customer complaints in an online course business. Include complaint receipt, classification, response, escalation, resolution, follow-up, and learning from complaints.

Concept map prompt

Create a concept map for [topic]. Include main concepts, sub-concepts, relationships, examples, and possible learning sequence.

AI can help design the structure of diagrams even if the final visual is created in another tool.
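As an illustration of how a numbered flow can be carried into another tool, the sketch below turns a list of steps into Graphviz DOT text, which many diagram tools can render. This is only one possible route, and the function name `steps_to_dot` is an assumption. It handles a simple linear sequence only. Decision points and exceptions would need extra edges added by hand.

```python
def steps_to_dot(steps: list[str]) -> str:
    """Chain the steps in order as a left-to-right Graphviz digraph."""
    lines = ["digraph flow {", "  rankdir=LR;", "  node [shape=box];"]
    for number, step in enumerate(steps, start=1):
        label = step.replace('"', '\\"')  # escape quotes inside labels
        lines.append(f'  s{number} [label="{number}. {label}"];')
    # Connect each step to the one that follows it.
    for number in range(1, len(steps)):
        lines.append(f"  s{number} -> s{number + 1};")
    lines.append("}")
    return "\n".join(lines)


complaint_steps = [
    "Complaint receipt",
    "Classification",
    "Response",
    "Escalation",
    "Resolution",
    "Follow-up",
    "Learning from complaints",
]
dot_text = steps_to_dot(complaint_steps)
print(dot_text)
```

The printed text can be saved to a file and opened in a tool that understands DOT, where the layout and styling can then be adjusted.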

13. Prompting across multiple files

Sometimes you may need AI to compare or combine information from multiple files.

For example:

compare two reports,
review old and new policy documents,
combine notes from several meetings,
compare two versions of a proposal,
analyse multiple customer feedback files,
or summarise a folder of learning material.

Multi-document comparison prompt

Compare these two documents. Identify similarities, differences, new additions, removed points, contradictions, risks, and questions that need clarification. Present the answer in a table.

Version comparison prompt

Compare version 1 and version 2 of this document. Identify what changed, whether the changes improve clarity, and what issues remain.

Multi-file synthesis prompt

Synthesise the uploaded files into one summary for [audience]. Identify recurring themes, differences, key insights, risks, and recommended next steps.

Meeting notes synthesis prompt

Combine these meeting notes into a single project summary. Include decisions, action items, owners, deadlines, risks, repeated issues, and unresolved questions.

When working with multiple files, clearly name or describe each file.

For example:

Treat document A as the old policy and document B as the revised policy.

This reduces confusion.
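When there are several files, the labelling lines can be generated rather than typed each time. The sketch below is a minimal example. The function name `label_documents` and the file names are hypothetical.

```python
from string import ascii_uppercase


def label_documents(roles: dict[str, str]) -> str:
    """Turn {file name: role} into one 'Treat document X as ...' line per file."""
    if len(roles) > len(ascii_uppercase):
        raise ValueError("this sketch only supports up to 26 files")
    lines = []
    for letter, (filename, role) in zip(ascii_uppercase, roles.items()):
        lines.append(f"Treat document {letter} ({filename}) as {role}.")
    return "\n".join(lines)


labels = label_documents(
    {
        "policy_2023.pdf": "the old policy",      # hypothetical file names
        "policy_2024.pdf": "the revised policy",
    }
)
print(labels)
```

Placing these lines at the top of the prompt lets the rest of the instructions refer to "document A" and "document B" without ambiguity.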

14. Common mistakes in multimodal prompting

Multimodal prompting is powerful, but users often make mistakes.

Mistake 1: Uploading a file without clear instructions

Weak prompt:

See this.

Better prompt:

Review this document for clarity, risks, missing information, and next steps. Present the answer as a table.

Mistake 2: Not defining the audience

Weak prompt:

Summarise this report.

Better prompt:

Summarise this report for senior managers who need decisions, risks, and next steps.

Mistake 3: Asking AI to infer too much

Weak prompt:

What is the problem here?

Better prompt:

This is a screenshot of my checkout page. Review it for possible user confusion, missing trust signals, and unclear instructions.

Mistake 4: Ignoring unreadable details

AI may miss small text, unclear scans, or complex tables.

Better prompt:

Mention any parts of the image or document that are unclear, unreadable, or require manual checking.

Mistake 5: Treating visual interpretation as certainty

AI may misread charts, images, or screenshots.

Better prompt:

Separate what is directly visible from what you are inferring.

Mistake 6: Sharing sensitive files carelessly

Users may upload private contracts, customer data, student records, employee files, or financial documents without thinking.

Better approach:

Remove or anonymise sensitive data before uploading. Ask AI to work with generalised or masked information where possible.

15. Privacy and safety in multimodal prompting

Multimodal prompting often involves files, images, and data. This can create privacy risks.

Before uploading anything, ask:

Does this file contain personal information?
Does it include names, phone numbers, addresses, emails, or IDs?
Does it contain customer, student, employee, patient, or client data?
Is it confidential business information?
Is it legally sensitive?
Does my organisation allow uploading this type of file?
Can I remove sensitive details first?

Be especially careful with:

contracts,
legal notices,
medical records,
financial statements,
student records,
HR files,
customer lists,
private emails,
unpublished strategy documents,
passwords,
identification documents,
and internal company data.

Safer prompting approach

Instead of uploading a real customer complaint with personal details, anonymise it:

A customer complained that their course access did not work after payment. The customer is upset and wants urgent help. Write a polite support response asking for order details through the official support channel and promising to investigate.

You do not need to expose personal information to get useful help.

Responsible multimodal prompting includes protecting privacy.
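For text you plan to paste into a prompt, a small script can mask obvious contact details first. The sketch below uses two simple regular expressions, which are assumptions chosen for illustration. They catch common email and phone formats only. They do not detect names, addresses, or ID numbers, so a manual check is still needed.

```python
import re

# Deliberately simple patterns: they cover common formats, not every case.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_contact_details(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)  # emails first, so digits in them stay intact
    return PHONE.sub("[PHONE]", text)


complaint = (
    "My course access did not work after payment. "
    "Reach me at priya@example.com or +91 98765 43210 about order 4521."
)
masked = mask_contact_details(complaint)
print(masked)
```

Short numbers such as the order number are left alone because the phone pattern requires at least nine digits or separators. Whether an order number counts as sensitive depends on your organisation's rules.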

16. Verification in multimodal prompting

AI may make mistakes when interpreting files, images, charts, and data.

It may:

miss small text,
misread numbers,
misunderstand a chart,
overlook footnotes,
confuse labels,
make calculation errors,
infer too much from an image,
or summarise a document too broadly.

For important tasks, use verification prompts.

Verification prompts

Identify any parts of the file that are unclear or may need manual checking.

Separate what is directly stated in the document from your interpretation.

List all numbers, dates, and names that should be verified.

Do not invent missing information. If something is not visible, say so.

Mention the confidence level of your interpretation.

Suggest what I should check in the original file before using this output.

High-stakes caution

Be especially careful with multimodal AI outputs in:

legal work,
medical information,
finance,
contracts,
academic research,
public policy,
safety-related decisions,
engineering,
and compliance.

AI can assist, but humans must verify.

17. A master prompt for multimodal analysis

Here is a reusable master prompt:

I have uploaded [type of file or image]. Act as a [role].

My goal is [goal].
The audience is [audience].
The context is [context].

Please analyse the file and provide:

a short summary,
key details,
important numbers, names, dates, or visual elements,
insights or interpretation,
risks or concerns,
recommended next steps,
unclear parts or missing information,
and items that need verification.

Present the answer as [format]. Do not invent information. Clearly separate what is visible or stated from what you are inferring.

Example:

I have uploaded a sales dashboard screenshot. Act as a business analyst.

My goal is to understand sales performance for the last quarter.
The audience is the sales leadership team.
The context is that we need to identify growth areas and weak segments.

Please analyse the screenshot and provide a short summary, key metrics, visible trends, risks or concerns, recommended next steps, unclear parts, and items that need verification.

Present the answer as a table. Do not invent information. Clearly separate what is visible from what you are inferring.

This prompt is useful because it combines context, analysis, caution, and verification.

18. A checklist for multimodal prompting

Before sending a multimodal prompt, ask:

What type of file or image am I giving?
What exactly do I want AI to do with it?
Who is the output for?
What context does AI need?
What format do I want?
What should AI avoid?
Are there sensitive details I should remove?
Could the file contain unclear text, charts, or numbers?
Do I need AI to separate facts from assumptions?
What must I verify manually?

This checklist helps you avoid vague, risky, or careless multimodal prompting.

Conclusion: The future of prompting is multimodal

Prompting is no longer only about typing text into a box.

AI can now work with documents, images, charts, screenshots, slides, spreadsheets, audio, video, and data. This makes AI far more useful for real-world work and learning.

But multimodal AI still needs clear human direction.

A weak multimodal prompt says: Summarise this.

A better multimodal prompt says: Summarise this report for senior managers. Focus on key findings, risks, decisions required, and next steps. Do not invent information. Mention anything that needs verification.

The difference is clarity. The file gives AI information, but the prompt gives AI purpose.

Good multimodal prompting requires you to define the input, task, context, audience, format, and constraints. It also requires caution, especially when dealing with private data, complex charts, financial information, legal documents, screenshots, and images of people.

AI can help you understand and transform many kinds of information. It can summarise documents, explain charts, review slides, analyse screenshots, interpret images, and generate visuals.

But it should not replace human judgment. Use multimodal AI to see more clearly, organise faster, and communicate better. Then verify what matters.

That is the responsible way to prompt in a world where AI can read, see, analyse, and create across many forms of information.

Insights - Billion Hopes
