State of the State: Generative AI – the Second Installment

Generative AI is undoubtedly the hottest topic right now, and not just within marketing circles; it’s dominating legislative, scientific, and cultural discussions at large. But this space is also in a state of constant evolution, which means that keeping up with all its twists and turns is no small task.

That is the purpose of our AI POV series, in which we distill the state of the state each month, outlining critical trends, advancements, conversation topics, and related marketing takeaways. This installment unpacks:

The ever-growing list of new generative AI technologies
The transition of AI tools from focus to feature
The problem of transparency in generative AI

Here’s what we know as of May 1, 2023.

New technologies: the arms race continues

Since the last iteration of our generative AI POV, the race to engineer new technologies has only become more competitive. And now, tech giants are redirecting resources to build generative AI tools.

Here’s a quick snapshot of the ever-evolving landscape (including a helpful repository of all the AI tech currently available):

DuckDuckGo announced DuckAssist, a search results feature that summarizes information for users
Amazon unveiled several new tools for developers, including Bedrock, which will allow companies to create their own generative AI tools
Bloomberg announced the development of its own large language model for finance, BloombergGPT
Elon Musk has founded a new AI company called X.AI Corp
Both Google and Meta have teased new generative AI tools for advertisers
Atlassian has announced the integration of generative AI tools into Jira and Confluence
TikTok will reportedly launch AI-generated avatars for users

Key takeaway:

Models may develop different strengths. The companies unveiling new large language models (LLMs) and generative tools are building these systems from their own datasets, including indexed websites, reviews, social posts, user profiles, and hosted content. Eventually, brands looking to leverage a chatbot might be able to select one that was trained on data more relevant to their business.

Generative AI: from focus to expected feature

Over the last month, we’ve observed a change in how generative AI is being reported on and viewed. What was once a buzzworthy novelty is already becoming table stakes for the largest technology providers, and the expectation of offering AI innovation is already forcing business changes. Google will reportedly integrate an AI chatbot into search, a move that follows rumors that Samsung would switch its default mobile search engine to Microsoft’s Bing to make chat interactions easier for device users.

And while the novelty of asking ChatGPT to perform entertaining tasks, such as writing songs about self-love in the style of Taylor Swift, might be wearing off, the possible applications of generative AI for businesses are exploding. To that end: Microsoft has already begun integrating AI into its Microsoft 365 apps. Apple is working on an AI health coach. Grammarly unveiled GrammarlyGO, an assistant for its popular browser extension. And generative AI advertising tools are on the horizon for digital marketers.

Key takeaway:

AI expectations. As we settle into a new world of ubiquitous AI integrations, it will be crucial for marketers and business leaders to regularly assess the available tools and consider how they might be leveraged. Using AI to accelerate processes and enhance consumer experiences will soon be expected.

Large language model mysteries

While generative AI outputs have impressed users, few model developers have disclosed the sources of the text data used to train their LLMs. Without some insight into that training data, it will be difficult to fully trust the output of chatbots.

To its credit, Google has made the websites in its dataset available to researchers. A Washington Post analysis, however, showed that the company failed to remove sites dedicated to conspiracy theories, white supremacy, and anti-government rhetoric, which could affect the outputs of chatbots built from this data. The web domains in Google’s dataset were categorized and ranked by the Washington Post in a searchable directory containing millions of sites.

The volume and variety of sites housed within Google’s dataset also raise the question of whether site owners should be allowed to opt out of participation or be compensated for contributing to LLM training. Reddit, for example, announced that it would start charging for API access to its database of human-generated content and conversations.

Key takeaways:

Transparency builds trust. Pressure to disclose datasets should create more transparency in the LLM industry. As additional models and tools are developed, trust in the training and data could be both a differentiator and a market advantage.
Pay to train. It is likely that large, text-filled domains like Reddit and Wikipedia will be able to charge for API access, as the information housed within these sites is invaluable to language model developers. What remains unclear, however, is whether website owners will have any say in how their text data are crawled and used.