WordProof is one of the 23 finalists selected out of the 178 applications received for the 5 x €1.000.000 EIC Horizon Prize on ‘Blockchains for Social Good’. Monday, February 10th, was Finalists’ Day. 23 initiatives gave a pitch and demo. Here’s the full transcript of WordProof’s pitch, presenting the impact of our solutions on social good aspects and the underlying economic model.
Continue reading “3-minute WordProof Pitch at European Commission’s ‘Blockchains for Social Good’ 🇪🇺”
Powerful public AI tools like ChatGPT are great tools for content creation. But as with any technology, there are limits. We are starting to feel those limitations: from the quality of output to risks, such as copyright. So, what now? Welcome to the fascinating world of private AI! This is artificial intelligence trained only on your own data. We dove deep into this over the past six months and are happy to share our findings.
This article is a translation of an article on private AI that I wrote for FrankWatching.com, a leading Dutch marketing blog.
The challenges of public AI
Artificial intelligence has now established itself as a powerful tool for content creation. But traditional AI also brings challenges:
- Outdated data: most AI systems work with datasets that age quickly. For example, a model trained on data from September 2021 misses more recent developments, which can lead to outdated or irrelevant content.
- Dilution due to lots of bad information: public models like ChatGPT’s are trained on practically all the information on the Internet, so a lot of the training data is of low quality. Because the output is effectively an average of that huge amount of data, in many cases you end up with inauthentic content.
- Quality concerns: the well-known phrase “rubbish in, rubbish out” is very applicable to artificial intelligence. If an AI system is fed low-quality data, it is likely to produce low-quality output.
- Copyright risks: without clear source attribution or transparency, AI can generate content that infringes copyright. This can expose companies to legal risks.
- Black box: many people consider traditional AI a “black box,” because it is not always clear how decisions are made. This lack of transparency can create trust issues.
The solution: private AI
That’s where private AI comes in. To address many of these issues, you can set up your own private AI trained only on your own data. With real-time training data and training on at least 1,000 articles per topic, you can produce authentic and accurate content.
With open source components, you can arrive at a transparently private AI solution. In doing so, it is even possible to trace from the output which original sources led to the generated piece of content. With much experimentation, we have arrived at private AI pilots that meet the following specifications and functionality:
Real-time training data
In addition to the original training data, the model is re-trained daily with the newly generated content (if desired). As a result, you always generate content based on up-to-date information, which is very relevant for news, for example. Per use case, it is a strategic choice which content you do or do not use for re-training the model. For example, you can choose: we don’t use the AI-generated articles for re-training, but we do use the largely manually written content.
Instead of being overwhelmed with all available information, the AI is trained on specific, factual input.
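To make the re-training choice above concrete, here is a minimal sketch of how such a selection rule could look. The `Article` class, the `origin` labels and the function name are hypothetical and only serve as illustration; the actual pipeline is not described in this article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    published: date
    # Hypothetical label: "human", "mostly_human" or "ai_generated"
    origin: str


def select_for_retraining(articles, allowed_origins=("human", "mostly_human")):
    """Keep only articles whose origin is allowed into the re-training set."""
    return [a for a in articles if a.origin in allowed_origins]


archive = [
    Article("Council approves new budget", date(2023, 10, 2), "human"),
    Article("Budget explainer (auto-draft)", date(2023, 10, 2), "ai_generated"),
    Article("Budget Q&A, edited by the news desk", date(2023, 10, 3), "mostly_human"),
]

retraining_set = select_for_retraining(archive)
print([a.title for a in retraining_set])
```

Changing `allowed_origins` per use case is one way to express the strategic choice of which content does or does not flow back into the model.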
Transparency for sources and SEO on steroids
Users can see exactly which part of which article led to a particular output. With the ability to generate traceable content, a well-designed private AI offers the chance to take your SEO to the next level. Namely, if you know which sources from your own site(s) led to the new content, then you can link from those original articles to the new articles. This makes your private AI an extremely powerful tool for achieving strong, relevant internal links.
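As an illustration of the internal-linking idea, the sketch below inverts a source trace into link suggestions. It assumes the private AI exposes, per generated article, the list of source articles it drew on; the `traces` structure and all URLs are made up for the example.

```python
from collections import defaultdict

# Hypothetical trace data: generated article URL -> source article URLs
traces = {
    "/guides/solar-panels-2024": ["/news/solar-subsidy", "/reviews/panel-test"],
    "/guides/heat-pumps": ["/news/solar-subsidy"],
}


def suggest_internal_links(traces):
    """Invert the trace: for each source article, list new articles to link to."""
    links = defaultdict(list)
    for generated_url, source_urls in traces.items():
        for source_url in source_urls:
            links[source_url].append(generated_url)
    # Sort targets so the suggestions are stable between runs
    return {source: sorted(targets) for source, targets in links.items()}


suggestions = suggest_internal_links(traces)
for source, targets in sorted(suggestions.items()):
    print(source, "->", targets)
```

Each original article then receives a list of new, topically related pages it could link to.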
Authentic AI is an appropriate solution when it comes to the fine balance between quality and quantity of content creation. It is a valuable tool for content creation, answering your questions based on your own carefully generated archive.
In practice: setting up private AI
Here is an in-depth look at how to set up private AI in practice and the building blocks involved.
Building Block 1. Training data
- Training with your own archive: by training the AI with your own archive, the output can be tailored specifically to your corporate culture, terminology and style guide. This ensures seamless integration with existing content.
- Training with external information: this involves feeding the AI with data from reliable external sources, such as scientific articles, professional journals or even other sources that are your property. This expands the AI’s knowledge base and allows it to cover a wider range of topics.
- Real-time retraining: this means constantly updating the AI model with new information. While this ensures up-to-date knowledge, it is essential to be careful. Overtraining or training with inaccurate data can lead to “hallucinations” or inaccurate output.
Building Block 2. Content formulas – powerful tools for scalable output
- Customized content formulas: private AI allows you to use customized formulas that can answer specific questions or “prompts” at scale. By using formulas fed from a database, large-scale prompts can be answered efficiently.
- Prompt Query Language: a specially developed language designed to allow the AI to ask precise queries to both private and public AI, and combine multiple prompts to arrive at in-depth long-reads. This can be enormously useful for complex data requests or to generate specific content formats.
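A content formula fed from a database can be pictured as a prompt template that is filled in once per database row. The sketch below uses Python’s standard `string.Template` for this; the formula text, the field names and the rows are hypothetical, and it does not attempt to reproduce the Prompt Query Language mentioned above.

```python
from string import Template

# Hypothetical content formula with placeholders for database fields
formula = Template(
    "Write a $length-word profile of $company, based in $city, "
    "focusing on $angle. Use only facts from our own archive."
)

# Stand-in for rows that would normally come from a website database
rows = [
    {"company": "Acme Bakery", "city": "Utrecht", "angle": "sustainability", "length": 300},
    {"company": "Delta Bikes", "city": "Delft", "angle": "export growth", "length": 300},
]

# One prompt per row; substitute() raises KeyError if a field is missing
prompts = [formula.substitute(row) for row in rows]

for prompt in prompts:
    print(prompt)
```

Because `substitute()` fails loudly on a missing field, incomplete rows surface before any prompt reaches the model.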
Building Block 3. Feeding Prompts
Once you’ve trained your model, and arrived at various content formulas, it’s time to put your model and formulas to work. This can be done manually, article by article. But with smart setup, you can also do this at scale, such as using already existing databases in your website(s). For inspiration:
- Example company website: use company-specific databases to provide relevant and contextual information to the AI.
- Comparing companies: using various data sources, the AI can compare companies on various parameters such as revenue, location, demographics, etc.
- Example recipe site: the feed can consist of all kinds of data sets, such as demographic information, customer feedback, sales figures, etc.
- Searches as a feed: by entering common or trending searches, the AI can better respond to current needs and queries.
With private AI, the power of advanced content creation and data processing is in your hands. Whether it’s personalized content strategies or in-depth data analysis, with the right building blocks and training, your AI model can work wonders.
Our experience with private AI: reflection on 100 days since GO LIVE
After 100 days of intensive work with private AI, we gained a range of exciting experiences and insights. Our collaborative projects ranged from partnerships with commercial organizations to demos with publishers. Here’s an overview of what we learned:
Pay attention to input quality
The “rubbish in, rubbish out” principle proved true time and again. The quality of the training data largely determines the quality of the outcome.
Avoid intertwining opinions and facts
When we trained a model with 5,000 articles from a newspaper, it generally produced well-researched, objective articles. But one particular paragraph was unexpectedly opinionated and activist in nature. This was because we had included opinion articles in our training. Now we either adjust our training process to exclude such articles or we make sure that specific training data is not used in specific questions.
The challenge of writing prompts
Writing good prompts is an art in itself. Regardless of whether you are working with public or private AI: without significant experience, you will not easily reach a desired outcome. As a guideline, it’s advisable to have more than 100 hours of experience with prompts or work with an expert who can help write effective prompts.
Volume of training data
We have found that you need between 1,000 and 2,500 articles per topic for optimal results. Although we have experimented with archives of up to 100,000 articles, we are confident that we can handle up to 1,000,000 articles with our current approach.
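A quick way to check an archive against this guideline is to count articles per topic and flag the topics that fall short. The sketch below assumes every article already carries a single topic label; the labels and counts are invented for the example.

```python
from collections import Counter

MIN_ARTICLES = 1_000  # lower bound of the range mentioned above


def topics_below_minimum(topic_labels, minimum=MIN_ARTICLES):
    """Return the topics that have fewer articles than the minimum."""
    counts = Counter(topic_labels)
    return {topic: n for topic, n in counts.items() if n < minimum}


# Invented archive: one topic label per article
labels = ["politics"] * 1800 + ["sports"] * 1200 + ["science"] * 350

thin_topics = topics_below_minimum(labels)
print(thin_topics)
```

Topics that show up here are candidates for extra training material or for exclusion from the model’s scope.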
Pay attention to balance between new and old
For example, if you are a newspaper and you want to embrace modern writing styles but at the same time make use of your historical archives, it is essential to keep this balance in mind when designing your private AI.
Provide diversity in content
It is crucial to have multiple content formulas or formats. Repeatedly generating one type of content can lead to monotony.
Working with private AI is something you learn by doing. It has helped us understand how powerful, but also how nuanced AI-based content creation can be. We look forward to further experiments in this area with end users and content agencies.
Private AI as the next step
Private AI, in contrast to public AI, marks a significant leap into a future where content creation is more authentic, nuanced and efficient, while battling many of the risks of public AI. This technology is inherently customizable. Each application is unique, given the variation in training data, formulas and prompts. As a result, it requires a combination of technical expertise and strategic content marketing insights. This, coupled with the fact that there is a lot of trial-and-error involved, means that private AI is still a vast, unexplored area, brimming with opportunity.
The header image was generated with DALL-E 3 by Sebastiaan van der Lans.
In addition, here’s a podcast I recorded with Joost de Valk (founder Yoast SEO) on AI and Search, earlier this year:
Publishers are constantly seeking innovative ways to optimize their content and improve their search engine rankings. Private AI models offer a groundbreaking solution that can revolutionize the publishing industry. By training AI models on their own data, publishers can unlock the power of personalization, streamline content creation, and maximize SEO and structured data opportunities. In this blog, I’ll explore how private AI models can turn a publisher’s content archive into a valuable SEO goldmine.
Personalization and Streamlined Content Creation
Private AI models enable publishers to create personalized drafts and content suggestions that align perfectly with their brand identity. By analyzing historical data, these models capture the publisher’s unique tone of voice and historical patterns. This level of personalization streamlines content creation, saves time, and ensures consistency and authenticity across all published content.
SEO and Structured Data Optimization
When combined with deeply structured data, private AI models have the potential to transform a publisher’s SEO strategy. By integrating timestamps and provenance technology, publishers can enhance search engine visibility, improve Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T) signals, and drive organic traffic growth. Timestamping content establishes credibility and signals freshness, while provenance technology provides transparent data about the origin and history of content.
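As a rough illustration of what timestamped, verifiable structured data could look like, the sketch below hashes an article body and places the hash next to standard schema.org `Article` properties. The `contentHash` field is a hypothetical name chosen for this example, not a schema.org property and not necessarily the format any timestamping product uses; anchoring the hash in an external ledger is not shown.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_provenance_record(headline, body, published_at):
    """Combine schema.org Article fields with a SHA-256 fingerprint of the body."""
    content_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published_at.isoformat(),
        # Hypothetical field: fingerprint that changes if the body changes
        "contentHash": content_hash,
    }


record = build_provenance_record(
    "Private AI for publishers",
    "Publishers are constantly seeking innovative ways...",
    datetime(2023, 11, 1, 9, 0, tzinfo=timezone.utc),
)
print(json.dumps(record, indent=2))
```

Any later edit to the body produces a different hash, which is what makes the timestamp meaningful once the hash has been recorded somewhere the publisher cannot silently alter.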
Expertise and Collaboration
As experts in the SEO, policy, provenance and content spaces, we are uniquely positioned to drive the success of publishers leveraging private AI models. Our team’s extensive experience in developing tools, coupled with our expertise in timestamps and provenance technology, ensures that publishers receive the best possible guidance and results. We understand the industry’s best practices and search engine guidelines to help publishers maximize their SEO benefits.
We are excited to collaborate with forward-thinking publishers on implementing private AI models to unlock the full potential of their content archives. This transformative solution offers the opportunity to boost SEO, enhance E-E-A-T signals, and establish publishers as authoritative sources of information. Join us in harnessing the power of private AI models to drive SEO growth and revolutionize the publishing industry.
Private AI models present an exceptional opportunity for publishers to optimize their content archives and boost their SEO performance. By leveraging personalization, streamlining content creation, and integrating timestamps and provenance technology, publishers can turn their archives into valuable SEO goldmines. With our expertise and collaborative approach, we are ready to empower publishers to achieve unprecedented SEO success and elevate their online presence.
Contact us today to explore how private AI models can transform your content archive into an SEO goldmine. Together, we can revolutionize your publishing strategy and drive sustainable growth.
DUBLIN, October 28th, 2020 – I was invited to deliver a keynote on Blockchain & Media on behalf of WordProof. Here you’ll find the slides with links to all resources mentioned as many elements in the slides are clickable.
If you find the presentation helpful or relevant to what you work on, feel free to connect!
- Connect on LinkedIn
- Follow WordProof and me on Twitter
- Get Started with WordProof by watching a demo or right away via WordPress
Furthermore, here’s a recent interview on Media and Blockchain at the Dutch Media Week together with Joost de Valk (founder of Yoast SEO, 11 million sites use their tools), Vincent Everts, and the creative director of a leading Dutch media outlet.
Here’s an article, based on this interview: Will Europe Lead the ‘Trusted Web’ after GDPR?
A truthful internet makes a trustworthy society. That’s what we firmly believe at WordProof, and as winners of Europe’s Blockchains for Social Good, I dare to say that our visions align: we need a better internet for all citizens. Europe has a reputation for fighting for a better internet, hence GDPR. This article focuses on what policymakers could do to lead the way towards a ‘Trusted Web’. TL;DR: there’s a massive opportunity for policymakers today!
Even though the internet has brought us many good things, it has a deep-rooted issue: trust. On the internet, citizens suffer fraud, manipulation, and theft on a daily basis. Policymakers can play a role in guiding the world towards a trustworthy internet. That’s what Europe tried to do with GDPR, and that’s what organizations like Europe’s NGI – Next Generation Internet – work on every day.
Continue reading “Will Europe Lead the ‘Trusted Web’ after GDPR?”
Today, WordProof officially received the € 1 million prize as the winner of Europe’s Blockchains for Social Good competition. “How did WordProof win that € 1 million from the European Commission?” is a question we frequently get, as winning prizes is an interesting alternative (or addition) to searching for investors: it doesn’t cost you equity. I sat down with Frank van Dalen, who was the initiator of WordProof’s submission, to share our lessons learned about winning prizes to finance and grow your start-up!
Frank started as WordProof’s angel investor and is actively involved in the company. He has a very (!) rich track record in both entrepreneurship and politics. We took the time to write down our lessons learned, based on Frank’s track record and our collaboration as a team at WordProof over the last 15 months.
Although a prize is not the same as a subsidy or funding application, there are many similarities. In this blog, we share 8 insights that enabled WordProof to leave 175 other blockchain companies behind in Europe’s Blockchains for Social Good competition.
Continue reading “How to win € 1 million? 8 tips for winning prizes to grow your start-up 🏆”
In the last decade, WordPress’s market share grew from a little over ten percent to over one-third of the web. This makes it the most used Content Management System (CMS) by a large margin. Can WordPress still grow, and where will that growth come from? Is WordPress your best bet as a CMS? Here’s why I firmly believe that WordPress will cross the magic 50% market share mark before the end of this decade while being the best choice for individuals, businesses and enterprises!
At the time of writing (June 2020), WordPress’s market share is at a staggering 37.3%. At the start of 2011, its market share was 13.1%, so it grew by an average of 2.47 percentage points per year.
If WordPress continues at this rate, we’ll end up with over 60% market share on January 1st, 2030 (hitting 50% at the end of 2025). However, as WordPress is already used by so many websites, where will those new users come from and why will WordPress be interesting to them?
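The projection is a simple linear extrapolation, which can be reproduced in a few lines. The sketch below takes the 37.3% starting point and the 2.47 percentage points per year from the text as given, and assumes that June 2020 to January 1st, 2030 is roughly 9.5 years.

```python
SHARE_JUNE_2020 = 37.3   # percent, figure from the text
GROWTH_PER_YEAR = 2.47   # percentage points per year, figure from the text


def projected_share(years_ahead):
    """Linear extrapolation of market share, in percent."""
    return SHARE_JUNE_2020 + GROWTH_PER_YEAR * years_ahead


def years_until(target_share):
    """Years after June 2020 until the target share is reached."""
    return (target_share - SHARE_JUNE_2020) / GROWTH_PER_YEAR


# Assumption: June 2020 to January 1st, 2030 is about 9.5 years
share_2030 = projected_share(9.5)
years_to_half = years_until(50.0)

print(f"Projected share on January 1st, 2030: {share_2030:.1f}%")
print(f"Years after June 2020 until 50%: {years_to_half:.1f}")
```

Under these assumptions the share ends up a little under 61% at the start of 2030 and crosses 50% a little over five years after June 2020, so in the course of 2025. A straight line is of course a strong simplification, since market share cannot keep growing linearly forever.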
“In the coming years, 3 billion new people will be connected to the internet. WordPress is free to use and translated into over 50 languages. As translations are a community effort, they don’t need to make sense from a business perspective,” said WordPress founder Matt Mullenweg in this 2017 interview.
I agree that this will be a logical driver for adoption from a macro perspective. However, I will look at it from an enterprise perspective, presenting you five drivers that will fuel further WordPress adoption this decade:
- WordPress at Scale: Enterprises use WordPress too!
- The WordPress Ecosystem and Its Economics
- Marketing is Changing and WordPress Fits in Perfectly
- Google Loves WordPress
- WordPress and e-Commerce
After exploring these drivers, you might agree that just half of the web at the end of this decade, or even by 2025, is a conservative estimate. Joost de Valk, the founder of Yoast SEO, made a bold guess: “WordPress has reached critical mass in multiple ways and is on its way to 50%, maybe even within the next two years”.
Continue reading “Why WordPress Will Empower Half of the Web Soon”
One of the many highlights of WordCamp Europe 2020, the biggest online WordPress event in history, was the conversation with Matt Mullenweg on the future of WordPress. First, he and Matías Ventura looked at upcoming features for Gutenberg, WordPress’s revolutionary new editor. Some of those will be shipped around August, in WordPress 5.5. After that, they did a 40-minute Q&A. I cherry-picked some of the many highlights and transcribed those insights for you.
- Gutenberg Demo
- The State of Gutenberg and WordPress 5.5
- How Gutenberg’s License might affect Adoption
- Moving Humanity Forward with Open-Source WordPress
- Is WordPress a Monopoly?
- HTML in Gutenberg
- A Paywall Site in Gutenberg Without Coding
- Are Freelancers and Agencies in Danger as WordPress Gets so User-friendly?
- Images and Image APIs in Gutenberg
- How to Train Users in Using Gutenberg?
- WordPress, e-Commerce, and Shopify
- Pay it forward; Grow WordPress through Radical Generosity
Ever needed to convince someone to choose WordPress? And why is WordPress so popular anyway? Recently I wrote an article explaining why WordPress’s market share will soon cross the magical 50% mark. From that article, I extracted five compelling arguments to convince anyone to choose WordPress as a CMS:
- WordPress is not just a Blog! Enterprises use it too
Is WordPress a tool for blogs and small businesses? Yes, it is! But it’s not just for them. 2,645 of the top 10,000 sites on the web are built with WordPress. There’s a reason why TED, News Corp, Disney, New York Post, and many more are using it. If you think of WordPress as a toy for amateurs which is not suitable for B2B or enterprise, you should probably reconsider.
- Explain the WordPress Ecosystem: its economics, technology, and community
Often, open-source communities and software have the reputation of being slow-evolving, insecure, and not enterprise-ready. WordPress is different. We matured massively on three important aspects: technology, finance, and community.
- Structure Matters and WordPress’s new editor outputs perfectly structured content
In 2016, WordPress started to fully redesign its editor, called Gutenberg. The idea of Gutenberg is that every component will be modular until everything is a block, and blocks are structure! To stay the dominant search engine, Google needs robust structured markup on every single website. Due to its modular nature, Gutenberg is the perfect foundation for WordPress to benefit from the changing marketing dynamics on the internet.
- Google loves WordPress (more than any other CMS)
It’s in Google’s interest to create a level playing field where every small business can win. Google realized that they can only fix the web if they work on the underlying technologies that power those websites, which is why they are now deeply invested in the WordPress community. By just updating your WordPress plugins, your website also gets better over time. It automatically adopts these new features. This fact alone makes the business case for using WordPress a no-brainer, albeit only for the content part.
- WordPress wins massive market share in e-Commerce too
With the support for Magento M1 ending, and the acquisition of the open-source Magento by Adobe, Magento’s market share is decreasing rapidly. This creates space for other solutions to grow, and WooCommerce is positioned well to fill this gap.
These are crazy times. Yesterday I stumbled upon this motivational speech by Ryan Serhant, which resonated a lot with me. He’s a New York-based real estate broker whose first day in this business was the day Lehman Brothers filed for bankruptcy: September 15, 2008. For all entrepreneurs out there: here’s Ryan’s powerful metaphor on how to swim in the storm like a professional swimmer!
“I know, I am at the office, I’m at work, I’m not supposed to be here. There are executive orders … I am supposed to be at home … quarantine,” says Ryan. “But listen … I’m the captain of a very large ship. People look to me … people depend on me. These are unlike any times out there.”
In this article, I transcribed the very powerful metaphor he used in this video. Here you have it.
Continue reading “Be Safe, Be Healthy, Stay Happy, Stay Productive”
Stuart Haber, one of the two inventors of the blockchain, tells the story of the technology, explaining how it works and describing applications that decentralize maintenance of the integrity of digital records. What started as a solution for a time-stamping problem developed into something that is already transforming entire industries.
Here’s the Whitepaper he describes in his talk.
The Longest Running Blockchain Started in 1995
Here’s Stuart Haber, showing the back page of the NY Times. On a weekly basis, they published the unique fingerprint of the recent state of Surety’s chain.
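To give a feel for why publishing a single fingerprint is enough, here is a minimal sketch of a linked hash chain. It is a deliberately simplified linear chain using SHA-256, not Surety’s actual construction; the records and function names are made up for the example.

```python
import hashlib


def link(previous_fingerprint, record):
    """Fingerprint of the chain after adding one record to it."""
    data = previous_fingerprint.encode("utf-8") + hashlib.sha256(record).digest()
    return hashlib.sha256(data).hexdigest()


def chain_fingerprint(records, genesis="0" * 64):
    """Fold all records into one fingerprint, each step depending on the last."""
    fingerprint = genesis
    for record in records:
        fingerprint = link(fingerprint, record)
    return fingerprint


records = [b"contract v1", b"lab notebook page 12", b"press release"]
published = chain_fingerprint(records)

# Changing any earlier record changes the final fingerprint
tampered = [b"contract v2", b"lab notebook page 12", b"press release"]
print(published)
print(chain_fingerprint(tampered) == published)
```

Because every step depends on the previous fingerprint, printing only the latest value in a widely distributed newspaper commits to every record that came before it.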
About Stuart Haber
Dr. Haber serves as Chief Scientist for Auditchain. As a young cryptographer at Bellcore (Bell Communications Research), in 1990 Dr. Haber co-invented the blockchain technique for ensuring the integrity of digital records. He was cofounder with Scott Stornetta of Surety, which was spun off from Bellcore in 1993. Surety offers digital time-stamping services and is the first commercial deployment of a blockchain. Dr. Haber’s work in cryptographic time-stamping was later adopted by Satoshi Nakamoto as the basic mechanism for data integrity in Bitcoin.
Here you see the last page of Bitcoin’s whitepaper, referring to Stuart Haber’s work: