James Tsang

ARTS Check-in Day 7

A: 290. Word Pattern#

Given a pattern and a string s, determine if s follows the same pattern.
Here, "follow" means a full match, such that there is a one-to-one correspondence between a letter in pattern and a non-empty word in s.
Example 1:
Input: pattern = "abba", s = "dog cat cat dog"
Output: true
Example 2:
Input: pattern = "abba", s = "dog cat cat fish"
Output: false
Example 3:
Input: pattern = "aaaa", s = "dog cat cat dog"
Output: false

function wordPattern(pattern: string, s: string): boolean {
  const arr = s.split(' ')
  // A full match is impossible if the word count differs from the pattern length
  if (pattern.length !== arr.length) {
    return false
  }
  // Maps each word to the pattern letter assigned to it
  const map = new Map()
  // Pattern letters, deduplicated in order of first appearance
  const patternAppearArr = Array.from(new Set(pattern.split('')))
  let resultPattern = ''
  let index = -1
  for (let i = 0; i < arr.length; i += 1) {
    const word = arr[i]
    if (map.has(word)) {
      // Reuse the letter already assigned to this word
      resultPattern += map.get(word)
    } else {
      // Assign the next unused pattern letter; if there are more distinct words
      // than letters, `undefined` is appended and the final comparison fails
      index += 1
      map.set(word, patternAppearArr[index])
      resultPattern += map.get(word)
    }
  }
  return resultPattern === pattern
}

Submission result:

41/41 cases passed (68 ms)
Your runtime beats 44.12 % of typescript submissions
Your memory usage beats 73.53 % of typescript submissions (42.1 MB)

The idea: collect the letters that appear in the pattern, deduplicated in order of first appearance. Then walk through the words of the string and assign each word a letter, storing the assignment in a map. If a word has been seen before, reuse its recorded letter; otherwise advance to the next unused pattern letter and record it. Finally, compare the pattern reconstructed from the string against the original pattern.
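As a quick sanity check, the function can be run directly against the three examples from the problem statement:

console.log(wordPattern('abba', 'dog cat cat dog'))  // true
console.log(wordPattern('abba', 'dog cat cat fish')) // false
console.log(wordPattern('aaaa', 'dog cat cat dog'))  // false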

R: How to Match LLM Patterns to Problems#

The author previously wrote an article discussing the patterns for building LLM systems and applications, and received some questions about how to match specific problems with patterns. This article further explores the potential issues people may encounter when applying these patterns.

External or Internal Models, Strong or Weak Data Dependencies#

External models are models that we cannot fully control. We cannot fine-tune them and they are subject to limitations such as call speed and context length. We may also be concerned about sending confidential or proprietary data to them. Nevertheless, external models currently perform at a leading level.

Internal models are models that we develop and deploy ourselves. They do not have the limitations of external models and are generally trained using open-source models. However, the performance of these models often lags behind the commercial models of third-party companies by several months or even years.

To decide which patterns to apply to an LLM application, we need to understand the role data plays in the scenario: is data a primary component, a byproduct, or largely irrelevant?

For example, model evaluation and fine-tuning depend strongly on data, while caching, defensive UX measures that protect the user experience, and guardrail patterns that ensure output quality are more about infrastructure.

RAG (Retrieval-Augmented Generation) and collecting user feedback fall in between. RAG fills retrieved content into the prompt for in-context learning, but it also relies on a retrieval index service. Fine-tuning with user feedback data requires designing the user interface that collects the feedback, along with the data pipelines and analysis it depends on.
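As a rough sketch of the prompt-filling half of RAG (the retrieval index service itself is out of scope here), the searchIndex function below is a hypothetical stand-in, not an API from the article:

// Hypothetical shape of a retrieval result; real index services will differ
interface RetrievedDoc {
  title: string
  snippet: string
}

// Fill retrieved context into the prompt for in-context learning.
// `searchIndex` is an assumed retrieval service supplied by the caller.
async function buildRagPrompt(
  question: string,
  searchIndex: (query: string, topK: number) => Promise<RetrievedDoc[]>
): Promise<string> {
  const docs = await searchIndex(question, 3)
  const context = docs
    .map((doc, i) => `[${i + 1}] ${doc.title}: ${doc.snippet}`)
    .join('\n')
  return `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}\nAnswer:`
}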

Matching Patterns to Problems#

Let's look at which patterns apply to specific problems:

  • Lack of performance measurement for specific tasks: Whether the model is external or internal, when we change prompts, fine-tune models, or improve the RAG pipeline, we need a way to measure how much improvement was achieved and to run regression tests. We also need to measure whether users like or dislike new model features and how our adjustments affect them. For these problems, we can use the "evaluation" and "collect user feedback" patterns.
  • Poor performance of external models: This may be due to outdated model training data, lack of proprietary data for the model, or insufficient context during generation. For these problems, RAG and evaluation can be used. Evaluation is used to measure the performance improvement achieved after retrieval.
  • Poor performance of internal models: The model may generate non-factual responses, off-topic responses, or responses that are not fluent enough in tasks such as extraction and summarization. In this case, fine-tuning and fine-tuning with user feedback can be considered.
  • Limited by external models: This may be due to technical constraints such as rate limits and context length, to being unable to send confidential data outside, or to the cost of API calls. In such cases, it is worth contacting the LLM vendor about local deployment, or fine-tuning an internal model and evaluating it with user feedback.
  • Delay exceeds user experience requirements: Some use cases require the model to respond within a few hundred milliseconds, including the time spent on quality checks. Streaming output can improve the perceived experience, but it does not suit every scenario, for example non-chatbot use cases. Here caching and guardrail patterns can be applied (a caching sketch follows this list).
  • Ensuring customer experience: An LLM will not always produce the accurate output we want, so the user experience needs safeguards for handling errors, such as setting correct expectations and making mistakes easy to dismiss or correct. It is also important to acknowledge and mitigate the impact of errors and fall back to a fault-tolerant path. This calls for defensive UX and fine-tuning with user feedback to understand and address the issues.
  • Lack of visibility into the impact on users: Sometimes we ship an LLM application and its real-world effect quietly degrades. To tell whether the effect has improved or worsened, we need monitoring and user feedback collection.
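As a minimal illustration of the caching pattern mentioned above, an in-memory cache keyed by the exact prompt can sit in front of the model call; callModel here is an assumed stand-in for whatever LLM client is in use:

// Responses cached by the exact prompt string; a production cache would
// also need eviction and normalization of near-identical prompts.
const responseCache = new Map<string, string>()

async function cachedCompletion(
  prompt: string,
  callModel: (prompt: string) => Promise<string>
): Promise<string> {
  const hit = responseCache.get(prompt)
  if (hit !== undefined) {
    return hit // cache hit: no model latency at all
  }
  const answer = await callModel(prompt)
  responseCache.set(prompt, answer)
  return answer
}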

T: pyannote-audio Speech Annotation#

pyannote-audio is a speaker diarization toolkit that distinguishes the different speakers in a recording and annotates the time intervals in which each speaker talks. It can be used to segment an audio file, with the segments then fed into the whisper model for speech-to-text, turning a multi-speaker dialogue recording into text.

S: A Technique for Speed Reading#

When reading, try to avoid backtracking; this forces your thinking to keep pace with the article and shift along with it. It may be hard at first, but for material that does not demand deep understanding, the parts that were not fully grasped earlier can usually be recovered from later context.


Reference:

  1. ARTS Weekly Challenge