A:268. Missing Number#
Given an array nums containing n distinct numbers in the range [0, n], return the only number in that range that is missing from the array.
Example 1:
Input: nums = [3,0,1]
Output: 2
Explanation: n = 3, since there are 3 numbers, all numbers are in the range [0,3]. 2 is the missing number because it does not appear in nums.
Example 2:
Input: nums = [0,1]
Output: 2
Explanation: n = 2, since there are 2 numbers, all numbers are in the range [0,2]. 2 is the missing number because it does not appear in nums.
Example 3:
Input: nums = [9,6,4,2,3,5,7,0,1]
Output: 8
Explanation: n = 9, since there are 9 numbers, all numbers are in the range [0,9]. 8 is the missing number because it does not appear in nums.
Example 4:
Input: nums = [0]
Output: 1
Explanation: n = 1, since there is 1 number, all numbers are in the range [0,1]. 1 is the missing number because it does not appear in nums.
There isn't much time left for the check-in, so the writing is a bit rushed and not very good.
function missingNumber(nums: number[]): number {
  const maxLength = nums.length
  // Build the full expected range [0, n]
  const allNums: number[] = new Array(maxLength).fill(0).map((_, index) => index)
  allNums.push(maxLength)
  // Keep only the numbers that do not appear in nums (O(n^2) because of includes)
  const missNums = allNums.filter(num => !nums.includes(num))
  if (missNums.length === 1) {
    return missNums[0]
  }
  throw new Error('no result')
}
The submission result is:
122/122 cases passed (700 ms)
Your runtime beats 5.49 % of typescript submissions
Your memory usage beats 21.98 % of typescript submissions (45.8 MB)
The official solution is more efficient: it uses a hash set, where both adding and looking up elements take O(1) time:
function missingNumber(nums: number[]): number {
  const set = new Set<number>()
  const n: number = nums.length
  // Record every element of nums in the hash set: O(1) per insertion
  for (let i = 0; i < n; i++) {
    set.add(nums[i])
  }
  let missing: number = -1
  // Check 0..n in order; the first number not in the set is the answer
  for (let i = 0; i <= n; i++) {
    if (!set.has(i)) {
      missing = i
      break
    }
  }
  return missing
}
First, every element is recorded in the Set; then the numbers from 0 to n are checked against the Set in order, and the first one that is not present is the missing number. This is where the benefits of a hash set show.
R:Exploring Generative AI#
Due to the recent popularity of large language models (LLMs), the author has become curious about them and wants to know how they will impact work and how to use LLMs in software delivery practices. This is a record of her findings after some exploration and practice.
Toolchain#
For a technology that is still evolving, it is necessary to establish a mental model to understand how it works, which helps in processing the massive influx of information. What types of problems does it solve? What parts need to be pieced together to solve the problem? How do they combine?
In this mental model, the tools that support programming work are categorized along the following dimensions:
- Task type: quickly looking up information in context; generating code; reasoning about code (explaining it or identifying issues); transforming code (e.g., producing documentation or diagrams)
- Interaction mode: chat window; inline assistant (like GitHub Copilot); command line
- Prompt composition: user input from scratch; combining user input and context
- Model attributes: whether the model was trained for code generation tasks; which languages it can generate code in; when it was trained and how fresh its knowledge is; its parameter count; its context length limit; how its content filtering works and who operates it
- Hosting method: a product hosted by a commercial company; an open-source tool connected to a large language model service; a self-built tool connected to a large language model service; a self-built tool backed by a fine-tuned, self-hosted large language model API
The author analyzed GitHub Copilot, GitHub Copilot Chat, ChatGPT, and similar tools along these dimensions.
Median Function - The Story of Three Functions#
This is a story about generating a median function, which can illustrate the usefulness and limitations of LLM assistance.
Typically, the author searches for "median JS function" to implement it, but this time she tried using GitHub Copilot for assistance. Copilot first generated the correct function signature and then provided three different implementations.
The first implementation is: first sort, then take the middle number; if it's an even sequence, take the average of the two middle numbers. The goal is achieved, but the problem is that the sort method is not immutable and will change the order of the original array, which may introduce hard-to-trace bugs.
The second implementation is: slice the original array and then use the first implementation, which does not change the original array, so there are no issues.
The third implementation is: slice and sort the original array, then directly take the element at Math.floor(sorted.length / 2), which causes issues when the array length is even, because it does not average the two middle numbers.
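To make the comparison concrete, here is a minimal sketch along the lines of the second implementation described above (the function name and details are a reconstruction, not the exact Copilot output): copy the array before sorting so the input is not mutated, and average the two middle values when the length is even.
function median(values: number[]): number {
  // slice() copies the array, so sort() does not mutate the caller's data
  const sorted = values.slice().sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  // even length: average the two middle values; odd length: take the middle one
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}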
Judging by these three implementations, it is still important to understand what our function is supposed to do and to write sufficient test cases to verify the generated code.
Is using Copilot to generate code different from searching and then copying and pasting? When we search and copy-paste, we know the source of the code, allowing us to judge the reliability of the pasted code through votes and comments on platforms like Stack Overflow, but with Copilot-generated code, we lack a basis for judgment.
Should we generate test cases, code, or both? The author used Copilot to generate test cases for this median function, and the results were indeed good. For a task of this complexity, she is willing to use Copilot to generate both cases and code. However, for more complex functions, she prefers to write test cases herself to ensure quality and better organize the structure of the test cases, avoiding omissions even if part of the content is given to Copilot for generation.
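As a rough illustration of what such generated test cases might cover (assuming a Jest-style runner and the median sketch above; these are not the exact cases from the original article):
describe('median', () => {
  it('returns the middle value for an odd-length array', () => {
    expect(median([3, 1, 2])).toBe(2)
  })
  it('averages the two middle values for an even-length array', () => {
    expect(median([4, 1, 3, 2])).toBe(2.5)
  })
  it('does not mutate the input array', () => {
    const input = [3, 1, 2]
    median(input)
    expect(input).toEqual([3, 1, 2])
  })
})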
Can Copilot help fix the errors in the generated code? When asked to refactor, Copilot did provide some reasonable suggestions; giving the code to ChatGPT even got the errors pointed out directly. However, all of this is predicated on the author already being aware that the code needs improvement and correction.
Conclusion:
- You must clearly know what you are doing to judge how to handle the generated code. Just like in the above example, one must understand what a median is and what edge cases to consider to obtain reasonable test cases and code.
- Copilot itself can improve problematic code again, which raises the question of whether we need to engage in a dialogue with AI tools while using them.
- Even if there are doubts about the quality of the generated test cases and code, we can choose not to adopt the code itself, using the generated output only to cover scenarios our test cases missed or to cross-check the code we write ourselves.
When is an inline code assistant more useful?#
For inline code assistants, opinions vary on their usefulness, depending on the specific context and expectations.
What does "useful" specifically mean? Here, useful means that after using it, problems get solved faster with comparable quality. "Faster" counts not only the time spent writing code but also subsequent manual review and rework, as well as any quality issues that surface later.
Factors Affecting the Usefulness of Generated Content#
The following factors are relatively objective, but the author elaborated on each point with subjective insights in the original text; please read the original for more details.
More Popular Technology Stack
The more popular the technology stack used, the richer the dataset about that technology stack in the model, which means that data for Java and JS is more abundant than for Lua. However, some colleagues have also achieved good testing results in languages like Rust, where data is less abundant.
Simple and Common Problems
Simple and common problems generally include the following examples: the problem itself is very simple; there are common solution patterns in context; templated code; repetitive patterns.
This is helpful for scenarios where repetitive code is often handwritten, but for those familiar with advanced IDE features, shortcuts, and multi-cursor operations, the reduction of repetitive work by Copilot may not be as significant, and it may even reduce motivation for refactoring.
Smaller Problems
Smaller problems make the generated code easier to review. As the scale grows, both the problem and the code become harder to understand; solutions often require multiple steps, which raises the risk of insufficient test coverage and of unnecessary content creeping in.
More Experienced Developers
Experienced developers can better judge the quality of generated code and use it efficiently.
Greater Error Tolerance
Judging the quality and correctness of generated code is very important, but how much tolerance a given scenario has for quality and correctness problems also matters. Where error tolerance is high, suggestions can be accepted more readily; in low-tolerance scenarios, such as security policies like Content-Security-Policy HTTP headers, it is much harder to adopt Copilot's suggestions.
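As a hypothetical illustration of such a low-tolerance spot (the server setup and policy values are invented, not from the original article), a strict Content-Security-Policy header is easy to weaken by accepting a plausible-looking completion that adds 'unsafe-inline' or a wildcard source:
import { createServer } from 'node:http'

// A deliberately strict policy: every directive here is security-relevant,
// so a suggested "convenience" change deserves careful review rather than a quick Tab.
const CSP = "default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'self'"

createServer((req, res) => {
  res.setHeader('Content-Security-Policy', CSP)
  res.end('ok')
}).listen(3000)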
Conclusion:
Using inline code assistants has suitable applications, but many factors affect their usefulness, and the techniques for using them cannot be fully explained through a training course or blog post. Only through extensive use, even exploring beyond useful boundaries, can we better utilize this tool.
When can an inline code assistant become a hindrance?#
Having discussed when Copilot is useful, it naturally follows that there are situations where it is not useful. For example:
Amplifying Bad or Outdated Practices
Since Copilot references any content in the associated context, such as open files in the same language, it may also bring over bad code examples.
For instance, if we want to refactor a codebase, but Copilot keeps adopting old patterns because the old pattern code is still widely present in the codebase, the author calls this "poisoned context," and there is currently no good solution for this situation.
Conclusion:
Enriching the prompt context with code from the codebase cuts both ways, and this is one of the reasons some developers stop trusting Copilot.
Review Fatigue from Generated Code
Using Copilot means repeatedly reviewing the small chunks of generated code. Typically, the flow of programming involves continuously writing the solutions we have in mind. With Copilot, we need to continuously read and review the generated code, which is a different cognitive approach and lacks the enjoyment of continuous code production, leading to review fatigue and a feeling of interrupted flow. If we do not address this review fatigue, we may start to judge the generated code carelessly.
Additionally, there may be other impacts:
- Automation Bias: Once we have a good experience with generative AI, we may overtrust it.
- Sunk Cost: Once we spend time using Copilot to generate some locally problematic code, we may be more inclined to spend 20 minutes getting that code to work with Copilot rather than rewriting it ourselves in 5 minutes.
- Anchoring Effect: The suggestions given by Copilot are likely to anchor our thinking, influencing our subsequent thoughts. Therefore, it is also important to free ourselves from this cognitive influence and not be anchored by it.
Conclusion:
It is important not to let Copilot limit our thinking; we need to break free from its constraints. Otherwise we may end up like drivers who follow their navigation system straight into a lake.
Code Assistants Cannot Replace Pair Programming#
Although inline code assistants and chatbots can interact with developers in a human-like manner to a large extent, the author does not believe that this practice can replace pair programming.
The belief that programming assistants and chatbots can replace pair programming may stem from some conceptual misunderstandings. The advantages of pair programming are summarized in a diagram (not reproduced here):
Programming assistants can have a significant impact in the first area of that diagram, "1+1>2": they can help us overcome difficulties, get started faster, and achieve workable results, allowing us to focus more on designing the overall solution while also sharing more knowledge with us.
However, pair programming is not just about sharing explicit knowledge in code; it also includes implicit knowledge such as the evolution history of the codebase, which cannot be obtained from large language models. Additionally, pair programming can improve team workflows, avoid wasted time, and make continuous integration easier. It also helps us develop communication, empathy, and feedback skills. It provides valuable opportunities for teams to connect in remote work.
Conclusion:
Programming assistants can only cover a small portion of the goals and benefits of pair programming, because pair programming helps not only individuals but also the overall performance of teams: it raises the whole team's level of communication and collaboration, improves workflows, and strengthens a sense of code ownership. Moreover, it comes without the drawbacks of programming assistants described above.
Using GitHub Copilot for TDD#
After adopting AI programming assistants, do we no longer need tests? Is TDD outdated? To answer this, let us revisit the two benefits that TDD brings to software development: 1. providing good feedback; 2. solving problems with a divide-and-conquer approach.
Providing Good Feedback
Good feedback needs to be fast and accurate. Neither manual testing, documentation, nor code reviews can provide feedback as quickly as unit tests. Therefore, whether it is manually written code or AI-generated code, quick feedback is needed to verify correctness and quality.
Divide and Conquer
Divide and conquer is a quick approach to solving large problems. It also enables the implementation of continuous integration, trunk-based development, and continuous delivery.
Even with AI-generated code, an iterative development model still applies. More importantly, chain-of-thought (CoT) style prompts, which work through a problem step by step, are said to improve the quality of LLM outputs, and this aligns well with the principles TDD advocates.
Tips for Using GitHub Copilot for TDD#
- Start
Starting from an empty test file does not mean starting from an empty context; there are usually related user story notes and discussions with pair programming partners.
These are all things that Copilot "cannot see"; on its own it can only help with spelling, syntax, and similar surface-level issues. Therefore, we need to provide it with this context:
- Provide mocks
- Write down acceptance criteria
- Guiding assumptions: for example, that no GUI is needed, or whether to use object-oriented or functional programming
Additionally, Copilot uses open files as context, so we need to keep both the test file and implementation file open.
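A minimal sketch of what that up-front context might look like at the top of a test file (the domain, file names, and acceptance criteria here are invented for illustration):
// shipping.test.ts (hypothetical file, kept open next to shipping.ts)
// Acceptance criteria: shipping is free when the order subtotal reaches a configurable
// threshold; otherwise a flat fee applies. Amounts are integers in cents.
// Assumptions: no GUI; plain functions rather than classes; tests use a Jest-style runner.
import { shippingCost } from './shipping'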
- Red
Start by writing a descriptive test case name; the more descriptive the name, the better Copilot's generated test code will perform.
The Given-When-Then structure can help us in three ways: first, it reminds us to provide business context; second, it gives Copilot the opportunity to generate expressive case names; finally, it allows us to see Copilot's understanding of the problem.
For example, when naming the test case, if Copilot suggests "Assuming the user... clicks the purchase button," it indicates that Copilot does not fully understand our intent. We can add context descriptions at the top of the file, such as "Assuming no GUI is needed" or "This is an API test suite for a Python Flask application."
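For instance, sticking with the hypothetical shipping example above, a descriptive Given-When-Then name gives Copilot enough business context to propose a reasonable test body:
it('given an order above the free-shipping threshold, when the shipping cost is calculated, then it is zero', () => {
  // Copilot can now complete the arrange/act/assert steps from the name and the file context
  expect(shippingCost({ items: [{ priceInCents: 6000 }] })).toBe(0)
})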
- Green
Now we can start implementing the code. An existing, expressive, and readable test case can maximize Copilot's potential. At this point, Copilot has more input to work with and does not need to "learn to walk" like a baby.
Filling in test cases: at this point, Copilot is likely to generate larger blocks of code rather than "taking small steps," and that code may not be fully covered by tests. In that case, we can go back and backfill the test cases. Although this is not the standard TDD process, it does not seem to cause major issues at the moment.
Deleting and regenerating: For code that needs to be re-implemented, the best way to get effective results from Copilot is to delete the implementation and let it rewrite it, as there are test cases in place, making even a rewrite relatively safe. If this fails, deleting the content and writing comments step by step may help. If it still does not work, we may need to turn off Copilot and write it ourselves.
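The "delete and write comments step by step" fallback mentioned above might look like the following sketch (types, names, and values are invented): numbered comments guide Copilot one small step at a time, the existing tests keep the rewrite safe, and the code under each comment is the kind of completion it might produce.
interface Order {
  items: { priceInCents: number }[]
}

const FREE_SHIPPING_THRESHOLD = 5000
const FLAT_SHIPPING_FEE = 500

export function shippingCost(order: Order): number {
  // 1. Sum the item prices in cents
  const subtotal = order.items.reduce((sum, item) => sum + item.priceInCents, 0)
  // 2. Shipping is free at or above the threshold
  if (subtotal >= FREE_SHIPPING_THRESHOLD) {
    return 0
  }
  // 3. Otherwise charge the flat fee
  return FLAT_SHIPPING_FEE
}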
- Refactor
In TDD, refactoring means making incremental changes to improve the maintainability and scalability of the code while keeping all behaviors consistent.
For this, Copilot may have some limitations. Consider the following two scenarios:
- "I know what refactoring I want to do": Using IDE shortcuts and features, such as multi-cursor, function extraction, renaming, etc., may be faster than refactoring with Copilot.
- "I don't know where to start refactoring": For small, localized refactoring, Copilot can provide suggestions, but it still cannot handle large-scale refactoring well.
In some scenarios where we know what we want to do but just can't remember the syntax and API, Copilot can be very helpful, automatically completing tasks that would otherwise require a search.
Conclusion:
As the saying goes, "garbage in, garbage out," this applies to data engineers and generative AI LLMs alike. In the author's practice, TDD ensures high quality in the codebase, and this high-quality input allows Copilot to perform better. Therefore, it is recommended to use TDD when using Copilot.
T:Umami Website Visit Statistics Analysis#
A self-hostable framework for website visit analytics. xLog blogs currently support configuring Umami, and this site uses a self-deployed instance.
S:Reading “Li Xiaolai: My Reading Experience” Notes#
Reading has two fundamental values:
• Recognizing reality and thinking about the future;
• Preferring knowledge that can reproduce itself, i.e., knowledge that generates further knowledge.
Reference: