Multimodal prompting: How to prompt AI with text, images, documents, and data
Prompting started with text.
People typed questions, instructions, and requests into AI
tools and received written answers. They asked AI to write emails, explain
concepts, summarise articles, create ideas, and improve documents.
But AI is no longer limited to text.
Many modern AI tools can now work with multiple types of input, including text, images, PDFs, screenshots, spreadsheets, charts, audio, video, and data files. This is called multimodal AI. The word “multimodal” simply means that the AI can understand or generate more than one type of content.
For example, you can ask AI to:
- summarise a PDF,
- explain a chart,
- analyse a screenshot,
- describe an image,
- extract key points from a document,
- review a spreadsheet,
- create an image from a text description,
- suggest improvements to a slide,
- compare two visuals,
- or turn a table into a written explanation.
This changes how prompting works.
A simple text prompt may ask: Explain online learning.
A multimodal prompt may ask: Analyse this uploaded report on online learning. Summarise the key findings, identify risks, extract important statistics, and create five recommendations for school leaders.
The second prompt uses a file as context. The AI is not only
responding to your words. It is also using the uploaded material.
This article explains how to use multimodal prompting
effectively. It covers prompting with documents, images, screenshots,
spreadsheets, charts, slides, audio, video, and AI-generated visuals. It also
explains common mistakes, privacy concerns, and best practices.
1. What is multimodal prompting?
Multimodal prompting means giving AI more than one kind of
input.
Instead of using only text, you may provide:
- a written instruction,
- a PDF,
- a photo,
- a chart,
- a table,
- a spreadsheet,
- a screenshot,
- a slide deck,
- an image,
- an audio transcript,
- or a video summary.
The AI then uses your prompt and the uploaded material
together.
For example: Look at this image and describe what is happening.
Or: Read this PDF and summarise it for a beginner.
Or: Analyse this spreadsheet and identify trends, outliers, and possible errors.
Or: Review this slide and suggest how to make it clearer for a business audience.
In each case, the prompt tells the AI what to do with the
material.
The uploaded file gives the AI the content. The quality of the result depends on both.
If you upload a document and only say: Summarise this.
You may get a basic summary.
But if you say: Summarise this document for senior managers. Focus on decisions required, risks, financial implications, and next steps. Keep the summary under 500 words.
You will usually get a more useful result.
That is the heart of multimodal prompting: the file provides
information, but the prompt provides direction.
2. Why multimodal prompting matters
Multimodal prompting matters because much of real work is
not purely text.
Professionals work with reports, slides, charts, emails, spreadsheets, dashboards, images, scanned documents, screenshots, forms, diagrams, and videos. Students work with textbooks, notes, diagrams, question papers, handwritten material, and lecture slides. Teachers work with lesson plans, worksheets, images, rubrics, presentations, and student submissions. Business users work with invoices, customer feedback, sales data, product images, marketing creatives, contracts, proposals, and performance reports.
Multimodal AI can help make these materials easier to
understand, analyse, transform, and reuse.
For example, you can use it to:
- convert a long report into an executive summary,
- explain a complex diagram in simple language,
- identify confusing parts of a slide,
- extract action items from meeting notes,
- turn a chart into key insights,
- compare two versions of a design,
- identify patterns in spreadsheet data,
- create captions for images,
- generate quiz questions from a document,
- or convert rough notes into a structured article.
This is powerful because it reduces the gap between
information and action.
However, multimodal prompting requires care. AI may misread
a chart, miss small text, misunderstand a visual, or make assumptions about a
document. You should verify important outputs.
3. The basic multimodal prompt formula
A useful formula for multimodal prompting is:
Input + task + context + audience + output format + constraints
Let us understand each part.
1. Input
This is the file, image, document, table, screenshot, or
data you provide.
Example:
I have uploaded a PDF report.
Or:
Here is a screenshot of a website page.
Or:
This spreadsheet contains monthly sales data.
2. Task
This is what you want the AI to do.
Examples:
- summarise,
- explain,
- extract,
- compare,
- classify,
- review,
- analyse,
- improve,
- convert,
- or create.
3. Context
This explains why the task matters.
Example:
I need this for a board presentation.
Or:
I am using this to teach class 9 students.
Or:
I want to understand whether the campaign is working.
4. Audience
This tells AI who the output is for.
Examples:
- students,
- senior managers,
- customers,
- teachers,
- founders,
- policymakers,
- investors,
- or beginners.
5. Output format
This tells AI how to present the answer.
Examples:
- bullet points,
- table,
- executive summary,
- checklist,
- slide outline,
- report,
- email,
- FAQ,
- or action plan.
6. Constraints
These are rules and limits.
Examples:
- keep it under 500 words,
- use simple language,
- do not invent missing information,
- mention uncertainty,
- quote only from the document,
- separate facts from assumptions,
- or identify what needs verification.
Master formula
I have uploaded [input]. Act as a [role]. Your task is to
[task]. The context is [context]. The audience is [audience]. Present the
output as [format]. Follow these constraints: [constraints].
Example:
I have uploaded a PDF report. Act as a business analyst.
Your task is to summarise the report for senior managers. Focus on key
findings, risks, decisions required, and next steps. Present the output as an
executive summary under 500 words. Do not invent facts. Mention anything that
needs verification.
This formula works for most multimodal tasks.
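If you reuse the formula often, it can help to assemble the prompt from named parts so that no part is forgotten. The sketch below is one possible way to do that in Python. The function name build_multimodal_prompt and its parameter names are illustrative assumptions, not part of any AI tool, and the resulting text still has to be sent together with the uploaded file in whatever tool you use.

```python
def build_multimodal_prompt(input_desc, role, task, context, audience,
                            output_format, constraints):
    """Assemble a prompt string from the parts of the master formula.

    All names here are hypothetical; adapt them to your own workflow.
    """
    parts = [
        f"I have uploaded {input_desc}.",
        f"Act as a {role}.",
        f"Your task is to {task}.",
        f"The context is {context}.",
        f"The audience is {audience}.",
        f"Present the output as {output_format}.",
    ]
    # Constraints are optional; join them into one closing sentence.
    if constraints:
        parts.append("Follow these constraints: " + "; ".join(constraints) + ".")
    return " ".join(parts)


prompt = build_multimodal_prompt(
    input_desc="a PDF report",
    role="business analyst",
    task="summarise the report",
    context="a board presentation",
    audience="senior managers",
    output_format="an executive summary under 500 words",
    constraints=["do not invent facts",
                 "mention anything that needs verification"],
)
print(prompt)
```

Keeping each part as a separate argument makes it obvious when the audience or the constraints have been left out.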
4. Prompting with documents and PDFs
Documents and PDFs are among the most common multimodal
inputs.
You can ask AI to summarise, explain, extract, compare,
reorganise, or review documents.
Basic document summary prompt
Summarise this document for [audience]. Include key points,
important details, risks, decisions needed, and recommended next steps. Keep it
under [word count] words.
Example:
Summarise this PDF for senior business leaders. Include key
findings, business implications, risks, and recommended next steps. Keep it
under 600 words.
Document extraction prompt
Extract the following from this document: main topics,
important dates, names, numbers, commitments, risks, action items, and
unanswered questions. Present the answer in a table.
Beginner explanation prompt
Explain this document to a beginner. Use simple language,
short sections, and examples. Avoid jargon. Mention any parts that are unclear
or need expert review.
Policy or legal document prompt
Review this document and identify key obligations,
deadlines, responsibilities, risks, and clauses that need expert review. Do not
give legal advice. Present the answer as a checklist.
Research paper prompt
Summarise this research paper. Include research question,
method, sample, key findings, limitations, and relevance. Do not invent
information not present in the paper.
Important caution
Some PDFs are scanned images, not selectable text. Some may
contain tables, charts, footnotes, or small print that AI may miss or misread.
For important documents, always check the original.
Use prompts such as:
Mention any sections, tables, or figures you could not read
clearly.
Or:
Identify which points should be checked manually in the
original document.
5. Prompting with images
AI can analyse images in many useful ways.
You can ask it to describe an image, identify objects,
interpret a scene, review a design, create captions, suggest improvements, or
explain visual content.
Image description prompt
Describe this image in clear and simple language. Mention
the main objects, setting, people, actions, mood, and any visible text.
Educational image prompt
Explain this image to students of [grade or level]. Use
simple language and highlight the main learning points.
Design review prompt
Review this image as a visual design. Comment on clarity,
layout, message, colour use, readability, audience fit, and possible
improvements.
Social media caption prompt
Create five caption options for this image for [platform].
The audience is [audience]. Use a [tone] tone. Keep each caption under [word
count] words.
Accessibility prompt
Write alt text for this image. Keep it accurate, concise,
and useful for someone using a screen reader.
Image comparison prompt
Compare these two images. Identify similarities,
differences, strengths, weaknesses, and which one is more suitable for
[purpose].
Important caution
AI may misidentify people, places, objects, brands, or small
details. It may also infer emotions or intentions from appearance. Be careful
with sensitive interpretations.
Avoid asking AI to make unsupported claims about a person’s
identity, health, character, emotions, or intentions from an image.
A safer prompt is:
Describe only what is visibly present in the image. Do not
infer identity, emotions, or intentions unless clearly shown.
6. Prompting with screenshots
Screenshots are useful because they capture real interfaces,
web pages, dashboards, errors, forms, social media posts, or app screens.
AI can help review screenshots for clarity, usability,
errors, missing information, or design improvement.
Website screenshot prompt
Review this website screenshot. Comment on clarity, first
impression, message, layout, readability, call to action, trust signals, and
improvements. The target audience is [audience].
App screen prompt
Analyse this app screen for user experience. Identify
confusing elements, missing labels, readability issues, possible user friction,
and improvement suggestions.
Error screenshot prompt
Look at this error screenshot. Explain what the error likely
means, possible causes, and safe next steps. Do not assume anything not
visible.
Dashboard screenshot prompt
Analyse this dashboard screenshot. Identify key metrics,
visible trends, possible concerns, and questions I should ask. Mention anything
that is unclear or difficult to read.
Presentation screenshot prompt
Review this slide screenshot. Suggest improvements for
clarity, visual hierarchy, text length, layout, and audience impact.
Screenshots are very useful, but they can be incomplete. The
AI sees only what is visible in the screenshot. It may not know what happened
before or after.
Always provide context.
Instead of:
What is wrong here?
Use:
This is a screenshot from my course website checkout page. I
want to know whether the page is clear for first-time buyers. Review the
layout, instructions, trust signals, and possible confusion points.
7. Prompting with spreadsheets and tables
Spreadsheets contain structured data. AI can help summarise,
analyse, clean, explain, and interpret data.
Depending on the tool, you may upload a spreadsheet or paste
a table.
Data summary prompt
Analyse this spreadsheet. Summarise the key trends, unusual
values, missing data, possible errors, and important questions. Present the
answer in bullet points.
Sales data prompt
This table shows monthly sales data. Identify trends,
best-performing months, weak months, possible seasonality, and questions for
further analysis. Present insights in a table.
Data cleaning prompt
Review this table for data quality issues. Identify missing
values, duplicates, inconsistent labels, unusual entries, and formatting
problems. Suggest cleaning steps.
Business insight prompt
Analyse this data for business insights. Focus on revenue,
customer segments, growth patterns, risks, and possible actions. Clearly
separate observations from recommendations.
Chart suggestion prompt
Based on this table, suggest the best charts to visualise
the data. For each chart, explain what insight it would show and why it is
suitable.
Important caution
AI can help identify patterns, but it may make calculation
mistakes if the data is complex or incomplete. For financial, scientific,
legal, operational, or high-stakes analysis, verify calculations using proper
tools.
Use prompts such as:
Show the calculation steps for any numerical conclusion.
Or:
Mention which insights are directly supported by the data
and which are assumptions.
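One way to verify numerical conclusions is to recompute the basic figures yourself. The sketch below uses only the Python standard library. The sales figures are invented for illustration, and the "more than twice the median" rule for flagging unusual values is an arbitrary assumption, not a statistical standard.

```python
import csv
import io
import statistics

# Hypothetical monthly sales data, invented for illustration only.
CSV_TEXT = """month,sales
Jan,120
Feb,135
Mar,128
Apr,
May,410
Jun,131
"""


def check_sales(csv_text):
    """Recompute simple figures to compare against an AI-generated summary."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Months with an empty sales cell are reported as missing, not as zero.
    missing = [r["month"] for r in rows if not r["sales"].strip()]
    values = {r["month"]: float(r["sales"]) for r in rows if r["sales"].strip()}
    nums = list(values.values())
    median = statistics.median(nums)
    # Arbitrary illustrative rule: flag values above twice the median.
    outliers = [month for month, v in values.items() if v > 2 * median]
    return {
        "total": sum(nums),
        "mean": statistics.mean(nums),
        "missing": missing,
        "outliers": outliers,
    }


result = check_sales(CSV_TEXT)
print(result)
```

If the AI's summary reports a different total, or does not mention the missing month, that is a signal to check the original spreadsheet before using the output.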
8. Prompting with charts and graphs
Charts are common in reports, dashboards, research papers,
and presentations.
AI can help explain charts, identify trends, write
interpretations, and turn visuals into summaries.
Chart explanation prompt
Explain this chart in simple language. Identify the title,
axes, categories, visible trend, highest and lowest values, and main takeaway.
Executive insight prompt
Interpret this chart for senior managers. Focus on what
changed, why it may matter, risks, and decisions that may be needed.
Student explanation prompt
Explain this graph to students. Describe what the graph
shows, how to read it, and what conclusions can be drawn carefully.
Chart critique prompt
Review this chart for clarity. Comment on title, labels,
scale, colours, readability, missing context, and whether the chart could be
misleading.
Caution prompt
Analyse this chart, but do not overclaim. Clearly separate
what is visible in the chart from what would require additional data.
This last instruction is important.
A chart may show correlation but not causation. It may show
a trend but not explain why the trend happened. It may show limited data but
not represent the full population.
AI should be prompted to avoid overinterpretation.
9. Prompting with slide decks
Slide decks are common in business, education, consulting,
training, and policy work.
AI can help review, summarise, restructure, and improve
slides.
Slide summary prompt
Summarise this slide deck for [audience]. Include main
message, key points, important data, recommendations, and questions for the
presenter.
Slide improvement prompt
Review this slide deck for clarity, flow, structure, visual
consistency, text overload, and audience fit. Suggest improvements slide by
slide.
Presentation outline prompt
Convert this document into a [number]-slide presentation
outline. For each slide, include title, key message, bullet points, suggested
visual, and speaker notes.
Speaker notes prompt
Create speaker notes for this slide deck. Use a clear and
conversational tone. Keep each slide’s notes under [word count] words.
Executive deck prompt
Improve this deck for a senior leadership audience. Focus on
concise messaging, decision points, risks, business implications, and next
steps.
Teaching deck prompt
Convert this content into a teaching presentation. Include
learning objectives, explanations, examples, activities, discussion questions,
and recap slides.
When prompting with slides, mention the audience and
purpose. A training deck, investor deck, board deck, and classroom deck need
very different styles.
10. Prompting with audio and video material
Some AI tools can work directly with audio or video. Others
work with transcripts.
You can use AI to summarise lectures, meetings, interviews,
webinars, podcasts, training videos, or recorded discussions.
Audio transcript summary prompt
Summarise this transcript. Include main topics, key points,
decisions, action items, questions raised, and follow-up tasks.
Interview analysis prompt
Analyse this interview transcript. Identify themes, quotes
worth reviewing, participant concerns, motivations, pain points, and research
insights. Do not invent anything not in the transcript.
Lecture summary prompt
Summarise this lecture for students. Include key concepts,
definitions, examples, and five revision questions.
Podcast repurposing prompt
Turn this podcast transcript into a blog outline, five
social media posts, a newsletter summary, and 10 key takeaways.
Video review prompt
Review this video transcript for clarity, structure,
audience engagement, repetition, and missing points. Suggest improvements for
the next recording.
Important caution
Transcripts may contain errors. Speaker names may be wrong.
Context may be missing. If tone, body language, or visuals matter, a transcript
alone may not be enough.
Use prompts such as:
Identify points that may require checking against the
original recording.
Or:
Mention where the transcript appears unclear or incomplete.
11. Prompting AI to generate images
Multimodal prompting is not only about analysing files. It
can also involve creating images from text.
AI image generation can help create:
- illustrations,
- concept art,
- educational visuals,
- social media graphics,
- website images,
- story scenes,
- diagrams,
- icons,
- posters,
- and visual metaphors.
A weak image prompt is:
Create an image about AI.
A better prompt is:
Create a minimalist, bright, watercolour-style illustration
showing a teacher and students using AI as a learning assistant in a modern
classroom. Use lots of white space, soft colours, and a hopeful mood. The image
should feel educational, human-centred, and non-technical.
This prompt gives subject, style, mood, composition, and
purpose.
Image generation prompt formula
Create an image of [subject] in [style]. The scene should
include [details]. The mood should be [mood]. The composition should be
[composition]. The image is for [purpose or audience]. Avoid [things to avoid].
Example:
Create an image of a professional using AI to organise ideas
on a digital board. Use a clean, minimalist illustration style with soft
colours and white space. The mood should be calm, productive, and thoughtful.
The image is for an article on prompting at work. Avoid robots, dark
backgrounds, and overly futuristic elements.
Important caution
When generating images, be careful with copyrighted
characters, real people, logos, misleading visuals, stereotypes, and sensitive
topics.
For educational and business use, prefer original,
respectful, clear, and purpose-driven visuals.
12. Prompting with diagrams and flowcharts
Diagrams and flowcharts are useful for explaining systems,
processes, workflows, and relationships.
AI can help interpret existing diagrams or create new
diagram descriptions.
Diagram explanation prompt
Explain this diagram in simple language. Describe each part,
how the parts are connected, and what the overall process means.
Flowchart review prompt
Review this flowchart for clarity, logical sequence, missing
steps, confusing labels, and improvement suggestions.
Create flowchart prompt
Create a flowchart for [process]. Include the main steps,
decision points, inputs, outputs, and possible exceptions. Present it as a
numbered flow that can be converted into a visual diagram.
Example:
Create a flowchart for handling customer complaints in an
online course business. Include complaint receipt, classification, response,
escalation, resolution, follow-up, and learning from complaints.
Concept map prompt
Create a concept map for [topic]. Include main concepts,
sub-concepts, relationships, examples, and possible learning sequence.
AI can help design the structure of diagrams even if the
final visual is created in another tool.
13. Prompting across multiple files
Sometimes you may need AI to compare or combine information
from multiple files.
For example:
- compare two reports,
- review old and new policy documents,
- combine notes from several meetings,
- compare two versions of a proposal,
- analyse multiple customer feedback files,
- or summarise a folder of learning material.
Multi-document comparison prompt
Compare these two documents. Identify similarities,
differences, new additions, removed points, contradictions, risks, and
questions that need clarification. Present the answer in a table.
Version comparison prompt
Compare version 1 and version 2 of this document. Identify
what changed, whether the changes improve clarity, and what issues remain.
Multi-file synthesis prompt
Synthesise the uploaded files into one summary for
[audience]. Identify recurring themes, differences, key insights, risks, and
recommended next steps.
Meeting notes synthesis prompt
Combine these meeting notes into a single project summary.
Include decisions, action items, owners, deadlines, risks, repeated issues, and
unresolved questions.
When working with multiple files, clearly name or describe
each file.
For example:
Treat document A as the old policy and document B as the
revised policy.
This reduces confusion.
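For version comparisons of plain text, a deterministic diff can serve as a cross-check on the changes the AI reports. The sketch below uses Python's standard difflib module. The two policy texts are invented examples, and list_changes is a hypothetical helper name.

```python
import difflib

# Hypothetical old and revised policy lines, invented for illustration.
document_a = [
    "Refunds are available within 14 days.",
    "Support replies within 2 working days.",
    "Courses can be shared with one colleague.",
]
document_b = [
    "Refunds are available within 7 days.",
    "Support replies within 2 working days.",
]


def list_changes(old_lines, new_lines):
    """Return (removed, added) lines between two versions of a text."""
    removed, added = [], []
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("- "):      # line only in the old version
            removed.append(line[2:])
        elif line.startswith("+ "):    # line only in the new version
            added.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are skipped.
    return removed, added


removed, added = list_changes(document_a, document_b)
print("Removed:", removed)
print("Added:", added)
```

A diff like this only shows which lines changed. Whether a change improves clarity is still a question for the AI and for your own judgement.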
14. Common mistakes in multimodal prompting
Multimodal prompting is powerful, but users often make
mistakes.
Mistake 1: Uploading a file without clear instructions
Weak prompt:
See this.
Better prompt:
Review this document for clarity, risks, missing
information, and next steps. Present the answer as a table.
Mistake 2: Not defining the audience
Weak prompt:
Summarise this report.
Better prompt:
Summarise this report for senior managers who need
decisions, risks, and next steps.
Mistake 3: Asking AI to infer too much
Weak prompt:
What is the problem here?
Better prompt:
This is a screenshot of my checkout page. Review it for
possible user confusion, missing trust signals, and unclear instructions.
Mistake 4: Ignoring unreadable details
AI may miss small text, unclear scans, or complex tables.
Better prompt:
Mention any parts of the image or document that are unclear,
unreadable, or require manual checking.
Mistake 5: Treating visual interpretation as certainty
AI may misread charts, images, or screenshots.
Better prompt:
Separate what is directly visible from what you are
inferring.
Mistake 6: Sharing sensitive files carelessly
Users may upload private contracts, customer data, student
records, employee files, or financial documents without thinking.
Better approach:
Remove or anonymise sensitive data before uploading. Ask AI
to work with generalised or masked information where possible.
15. Privacy and safety in multimodal prompting
Multimodal prompting often involves files, images, and data.
This can create privacy risks.
Before uploading anything, ask:
- Does this file contain personal information?
- Does it include names, phone numbers, addresses, emails, or IDs?
- Does it contain customer, student, employee, patient, or client data?
- Is it confidential business information?
- Is it legally sensitive?
- Does my organisation allow uploading this type of file?
- Can I remove sensitive details first?
Be especially careful with:
- contracts,
- legal notices,
- medical records,
- financial statements,
- student records,
- HR files,
- customer lists,
- private emails,
- unpublished strategy documents,
- passwords,
- identification documents,
- and internal company data.
Safer prompting approach
Instead of uploading a real customer complaint with personal
details, anonymise it:
A customer complained that their course access did not work
after payment. The customer is upset and wants urgent help. Write a polite
support response asking for order details through the official support channel
and promising to investigate.
You do not need to expose personal information to get useful
help.
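For text you plan to paste into a prompt, part of this masking can be automated. The sketch below replaces email addresses and phone-like digit sequences with placeholders using Python's re module. The patterns are deliberately loose assumptions and will not catch names, addresses, IDs, or every phone format, so a manual review is still needed before uploading. The sample complaint and contact details are invented.

```python
import re

# Simple illustrative patterns; real data needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# An optional "+", then at least 8 digits, optionally separated
# by single spaces or hyphens.
PHONE_RE = re.compile(r"\+?\d(?:[\s-]?\d){7,}")


def mask_sensitive(text):
    """Replace emails and phone-like numbers with neutral placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


complaint = (
    "My course access fails after payment. "
    "Reach me at asha@example.com or +91 98765 43210."
)
print(mask_sensitive(complaint))
```

Because the phone pattern matches any long run of digits, it may also mask order numbers or amounts, which is usually the safer direction to err in.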
Responsible multimodal prompting includes protecting
privacy.
16. Verification in multimodal prompting
AI may make mistakes when interpreting files, images,
charts, and data.
It may:
- miss small text,
- misread numbers,
- misunderstand a chart,
- overlook footnotes,
- confuse labels,
- make calculation errors,
- infer too much from an image,
- or summarise a document too broadly.
For important tasks, use verification prompts.
Verification prompts
Identify any parts of the file that are unclear or may need manual checking.
Separate what is directly stated in the document from your interpretation.
List all numbers, dates, and names that should be verified.
Do not invent missing information. If something is not visible, say so.
Mention the confidence level of your interpretation.
Suggest what I should check in the original file before using this output.
High-stakes caution
Be especially careful with multimodal AI outputs in:
- legal work,
- medical information,
- finance,
- contracts,
- academic research,
- public policy,
- safety-related decisions,
- engineering,
- and compliance.
AI can assist, but humans must verify.
17. A master prompt for multimodal analysis
Here is a reusable master prompt:
I have uploaded [type of file or image]. Act as a [role].
My goal is [goal].
The audience is [audience].
The context is [context].
Please analyse the file and provide:
- a short summary,
- key details,
- important numbers, names, dates, or visual elements,
- insights or interpretation,
- risks or concerns,
- recommended next steps,
- unclear parts or missing information,
- and items that need verification.
Present the answer as [format]. Do not invent information.
Clearly separate what is visible or stated from what you are inferring.
Example:
I have uploaded a sales dashboard screenshot. Act as a
business analyst.
My goal is to understand sales performance for the last
quarter.
The audience is the sales leadership team.
The context is that we need to identify growth areas and weak segments.
Please analyse the screenshot and provide a short summary,
key metrics, visible trends, risks or concerns, recommended next steps, unclear
parts, and items that need verification.
Present the answer as a table. Do not invent information.
Clearly separate what is visible from what you are inferring.
This prompt is useful because it combines context, analysis,
caution, and verification.
18. A checklist for multimodal prompting
Before sending a multimodal prompt, ask:
- What type of file or image am I giving?
- What exactly do I want AI to do with it?
- Who is the output for?
- What context does AI need?
- What format do I want?
- What should AI avoid?
- Are there sensitive details I should remove?
- Could the file contain unclear text, charts, or numbers?
- Do I need AI to separate facts from assumptions?
- What must I verify manually?
This checklist helps you avoid vague, risky, or careless
multimodal prompting.
Conclusion: The future of prompting is multimodal
Prompting is no longer only about typing text into a box.
AI can now work with documents, images, charts, screenshots,
slides, spreadsheets, audio, video, and data. This makes AI far more useful for
real-world work and learning.
But multimodal AI still needs clear human direction.
A weak multimodal prompt says: Summarise this.
A better multimodal prompt says: Summarise this report for senior managers. Focus on key findings, risks, decisions required, and next steps. Do not invent information. Mention anything that needs verification.
The difference is clarity. The file gives AI information, but the prompt gives AI purpose. Good multimodal prompting requires you to define the input, task, context, audience, format, and constraints. It also requires caution, especially when dealing with private data, complex charts, financial information, legal documents, screenshots, and images of people.
AI can help you understand and transform many kinds of
information. It can summarise documents, explain charts, review slides, analyse
screenshots, interpret images, and generate visuals.
But it should not replace human judgment. Use multimodal AI to see more clearly, organise faster, and communicate better. Then verify what matters.
That is the responsible way to prompt in a world where AI
can read, see, analyse, and create across many forms of information.