Databricks releases Dolly 2.0: The First Open, Instruction-Following Large Language Model


Databricks has unveiled Dolly 2.0, the latest version of its large language model (LLM), which offers ChatGPT-like human interactivity (that is, instruction-following). The release comes just two weeks after the original Dolly.

The company says Dolly 2.0 is the first open-source, instruction-following LLM fine-tuned on a transparent, freely available dataset that is itself open-sourced for commercial use. This means Dolly 2.0 can be used for commercial purposes without paying for API access or sharing data with third parties.

Ali Ghodsi, the CEO of Databricks, acknowledged that other language models can be used commercially, but said:

“They won’t talk to you like Dolly 2.0.”

He also noted that users can modify and improve the training data, since it is shared under an open-source license:

“So you can make your own version of Dolly.”

Databricks has also released the dataset Dolly 2.0 was fine-tuned on, databricks-dolly-15k, for open-source use. The corpus contains more than 15,000 records written by Databricks employees, and it is reported to be the biggest natural language dataset of its kind ever released by a tech company.

Ali Ghodsi went on to say:

“[The] first open source, human-generated instruction corpus specifically designed to enable large language [models] to exhibit the magical interactivity of ChatGPT.”
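To make the idea of an "instruction record" more concrete, here is a minimal Python sketch that reads records in JSON Lines form and tallies them by task category. It assumes each record is a JSON object with `instruction`, `context`, `response`, and `category` fields, which is how the public databricks-dolly-15k file is laid out as far as we know; treat the field names as an assumption and check them against the file you actually download. The helper names `load_records` and `count_by_category` are hypothetical, and the two sample records are invented for illustration, not taken from the dataset.

```python
import json
from collections import Counter

# Assumed field names for each databricks-dolly-15k record (verify against the real file).
REQUIRED_FIELDS = ("instruction", "context", "response", "category")


def load_records(lines):
    """Parse JSON Lines text into dicts, skipping blank lines and
    rejecting records that lack any of the assumed fields."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        records.append(record)
    return records


def count_by_category(records):
    """Tally how many records fall into each task category."""
    return Counter(r["category"] for r in records)


# Two invented sample records (hypothetical, not from the real dataset).
SAMPLE = """
{"instruction": "Summarize the paragraph.", "context": "Dolly 2.0 is an instruction-following model.", "response": "Dolly 2.0 follows instructions.", "category": "summarization"}
{"instruction": "Name three colors.", "context": "", "response": "Red, green, and blue.", "category": "brainstorming"}
"""

records = load_records(SAMPLE.splitlines())
print(len(records), dict(count_by_category(records)))
```

Because `load_records` accepts any iterable of lines, the same function should work on an open file handle, for example `with open("databricks-dolly-15k.jsonl") as f: load_records(f)`, where the filename is likewise an assumption. Records that need no background passage simply carry an empty `context`.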


Over the past two months, there has been a surge of open-source, ChatGPT-like large language model (LLM) releases. These include Meta’s LLaMA, which inspired similar models such as Alpaca, Koala, Vicuna, and Databricks’ Dolly 1.0.

Ghodsi noted that many “open” models were subject to “industrial capture,” because they had been trained on datasets whose terms restrict commercial use. As an example, he cited the 52,000-record question-and-answer dataset from the Stanford Alpaca project, which was created from ChatGPT output. He pointed out that OpenAI’s terms of use prohibit anyone from using its services to build products that compete with OpenAI.

Dolly 2.0 is Databricks’ answer to this problem. It is a 12-billion-parameter language model based on EleutherAI’s open-source Pythia model family, fine-tuned exclusively on a small, open-source corpus of instruction records (databricks-dolly-15k) written by Databricks employees. The license terms for this dataset permit anyone to use, modify, or extend it for any purpose, academic or commercial. Until now, models trained on ChatGPT output have existed in a legal gray area.

Ali Ghodsi went on to say:

“The whole community has been tiptoeing around this and everybody’s releasing these models, but none of them could be used commercially.”

“So that’s why we’re super excited.”

Dolly 2.0: Small But Powerful – Unlock New Possibilities

The Databricks blog post noted that Dolly 2.0, while not state of the art, shows surprisingly capable instruction-following behavior given its relatively small training corpus. Databricks suggested this means that building powerful AI technologies takes far less effort and cost than previously assumed.

Ghodsi continued:

“Everyone else wants to go bigger, but we’re actually interested in smaller.”

“Second, it’s high quality. We looked over all the answers.”

Ghodsi said he expects Dolly 2.0 to set off a chain reaction, with others in the AI community building on it and developing their own alternatives.

He described the restriction on commercial use as a major obstacle, one that Databricks has now found a way around. He added that people can apply the 15,000 questions to other existing models and find that those models suddenly become interactive.


Databricks’ Dolly 2.0 represents a significant advance in NLP and AI: an open, instruction-following LLM that can be used commercially. Its capabilities could transform a range of industries and applications, letting businesses apply advanced language understanding in new ways.

As with any AI technology, Dolly 2.0 should be deployed responsibly and with ethical considerations in mind. Used carefully, it could pave the way for more advanced and user-friendly interactions with AI systems and drive AI-powered solutions across many domains.

Source: VentureBeat
