Google’s Bard, OpenAI’s ChatGPT, and Microsoft’s Bing depend heavily on the data provided by legal blogs and open law for their AI development.
This morning, Robert Ambrogi – a veteran tech blogger and legal journalist – wrote a piece about it.
Though these tools are built on large language models (LLMs), little is known about the data on which the models are trained.
According to Ambrogi, a report by The Washington Post has now opened this black box.
Using Google’s C4 data set, a compilation of 15 million websites, the Allen Institute for AI analyzed the data used to train high-profile English-language AIs such as Google’s T5 and Facebook’s LLaMA.
The analysis categorized websites according to the kind of content they produce (e.g., journalism or entertainment). It then ranked them by the number of tokens each site contributed to the data set. Tokens are small pieces of text used for understanding and processing unstructured information.
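To make the ranking step concrete, here is a minimal sketch of how sites could be ordered by token count. It is an illustration only, not the method used in the analysis: the domain names and sample text are hypothetical, and splitting on whitespace is a simplifying assumption, since real language models use subword tokenizers.

```python
from collections import Counter

# Hypothetical sample: (domain, text) pairs standing in for scraped pages.
pages = [
    ("example-lawblog.com", "The court ruled on two cases today"),
    ("example-news.com", "Local news and weather"),
    ("example-lawblog.com", "Another court decision"),
]

def count_tokens(text):
    # Assumption: whitespace splitting approximates tokenization.
    return len(text.split())

def rank_domains(pages):
    # Sum the token counts of every page belonging to the same domain.
    totals = Counter()
    for domain, text in pages:
        totals[domain] += count_tokens(text)
    # most_common() orders domains from most to fewest tokens.
    return [
        (rank, domain, tokens)
        for rank, (domain, tokens) in enumerate(totals.most_common(), start=1)
    ]

ranking = rank_domains(pages)
for rank, domain, tokens in ranking:
    print(rank, domain, tokens)
```

A site's rank in this sketch depends only on how much text it contributes, which is why a long-running blog with many posts can place well ahead of a larger but less text-heavy site.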
Ambrogi discovered that his blog, LawSites, was ranked 63,769th among the websites used to build the dataset. Searching for terms related to law and legal matters, such as “court” and “cases,” he found a variety of influential legal blogs among the sources.
The use of AI in legal research is an exciting development that has the potential to transform the way legal information is accessed and understood. By leveraging the power of AI to extract insights from law blogs and open law, legal sites can provide more accurate and comprehensive information to their users and help to level the playing field in the legal system. However, it is important to approach AI in law cautiously and prioritize ethical considerations to ensure the benefits are realized without unintended consequences.
Source: AI in Publishing