The Challenge: Building wc
I recently took on the Write Your Own wc Tool challenge from John Crickett's Coding Challenges. The goal is simple: recreate the classic Unix wc utility. While it sounds basic, implementing it correctly in Go taught me a lot about performance, Unicode, and idiomatic Go design.
The First Attempt: The "Java" Way
My initial implementation followed a very straightforward, almost "Java-like" approach. I used bufio.Reader to read the file line-by-line and strings.Fields to count words.

    // Snippet of the first version
    for {
    	line, err := fileReader.ReadString('\n')
    	// ... handling EOF
    	numBytes += len([]byte(line))
    	numLines++
    	numWords += len(strings.Fields(line))
    }

What was wrong?
- Memory Inefficiency: ReadString('\n') reads the entire line into memory. If a file contains an extremely long line (or no newlines at all), memory use grows with that line, and the program could crash with an OOM (Out of Memory) error.
- Unnecessary Allocations: []byte(line) can copy the string just to get its length, even though len(line) already returns the byte count, and strings.Fields allocates a whole new slice of strings for every single line.
- Unicode Confusion: In my first pass, I wasn't fully distinguishing between bytes (-c) and characters (-m).
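To make the byte versus character distinction concrete, here is a minimal sketch. The helper name byteAndCharCount is made up for illustration and is not part of the tool; it relies only on the built-in len and on utf8.RuneCountInString from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndCharCount is a hypothetical helper for illustration only.
// It returns the byte length of s (what -c reports) and the number of
// Unicode code points in s (what -m reports).
func byteAndCharCount(s string) (nBytes, nChars int) {
	// len on a string is its length in bytes; no []byte conversion is needed.
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// U+00E9 (e with acute accent) takes two bytes in UTF-8.
	b, c := byteAndCharCount("h\u00e9llo")
	fmt.Println(b, c) // prints: 6 5
}
```

For pure ASCII input the two numbers are identical, which is why the difference is easy to miss in a first pass.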
The Evolution: Thinking in Go
After some review and research into how the original Linux wc (written in C) works, I realized I needed a single-pass, character-based approach.
Enter: The rune
Go's rune is an alias for int32 and represents a Unicode Code Point. Instead of reading lines, I switched to reading runes. This allowed me to handle multi-byte characters (like emojis) correctly and efficiently.
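As a rough sketch of what reading runes looks like in isolation: bufio.Reader.ReadRune returns the decoded rune together with its size in bytes, so both counters can be updated from one call. The names countRunes and countRunesInString are hypothetical and exist only for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countRunes is a hypothetical helper: it reads r one rune at a time and
// returns the number of runes seen and the number of bytes they occupied.
func countRunes(r io.Reader) (int, int, error) {
	br := bufio.NewReader(r)
	chars, nBytes := 0, 0
	for {
		// ReadRune reports the encoded size of each rune in bytes.
		_, size, err := br.ReadRune()
		if err == io.EOF {
			return chars, nBytes, nil
		}
		if err != nil {
			return chars, nBytes, err
		}
		chars++
		nBytes += size
	}
}

// countRunesInString is a convenience wrapper for trying the helper out.
func countRunesInString(s string) (int, int, error) {
	return countRunes(strings.NewReader(s))
}

func main() {
	// U+1F600 (a grinning face emoji) is one rune encoded as four bytes.
	chars, nBytes, err := countRunesInString("go \U0001F600")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(chars, nBytes) // prints: 4 7
}
```

Because the reader is buffered, the per-rune calls do not translate into one system call per character.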
Optimization: The inWord State Machine
Instead of splitting strings, I implemented a simple state machine. A single boolean tracks whether we are currently inside a word, and the word count goes up only on the transition from whitespace to a non-space character. This is the same basic idea the real wc uses to avoid the overhead of string manipulation.
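Stripped of file handling, the state machine can be sketched on a plain string. This is a simplified illustration rather than the tool's code; countWordsInString is a name invented for the example, and it treats anything unicode.IsSpace reports as a separator.

```go
package main

import (
	"fmt"
	"unicode"
)

// countWordsInString is a hypothetical helper that counts
// whitespace-separated words with a two-state machine: we are either
// inside a word or between words.
func countWordsInString(s string) int {
	words := 0
	inWord := false
	for _, r := range s { // range over a string yields runes, not bytes
		if unicode.IsSpace(r) {
			inWord = false
		} else if !inWord {
			// Transition from "between words" to "inside a word".
			words++
			inWord = true
		}
	}
	return words
}

func main() {
	fmt.Println(countWordsInString("  the quick\tbrown\nfox  ")) // prints: 4
}
```

No substrings or slices are built along the way; the only state carried between characters is one boolean and one integer.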
The Final Code
I refactored the code to use a Count struct and methods for better encapsulation. I also ensured that the output column order matches the standard wc (Lines, Words, Chars, Bytes) and that a "total" line is only shown when multiple files are processed.

    package main

    import (
    	"bufio"
    	"flag"
    	"fmt"
    	"io"
    	"os"
    	"unicode"
    )

    // Count holds the counters collected for a single input.
    type Count struct {
    	Lines, Words, Bytes, Chars int
    }

    // Add accumulates other into c; it is used to build the "total" line.
    func (c *Count) Add(other Count) {
    	c.Lines += other.Lines
    	c.Words += other.Words
    	c.Bytes += other.Bytes
    	c.Chars += other.Chars
    }

    func main() {
    	showLines := flag.Bool("l", false, "Show number of lines")
    	showWords := flag.Bool("w", false, "Show number of words")
    	showBytes := flag.Bool("c", false, "Show number of bytes")
    	showChars := flag.Bool("m", false, "Show number of characters")
    	flag.Parse()

    	// With no flags, behave like plain wc: lines, words, and bytes.
    	if !*showBytes && !*showLines && !*showWords && !*showChars {
    		*showBytes, *showLines, *showWords = true, true, true
    	}

    	files := flag.Args()

    	// No file arguments: read from standard input.
    	if len(files) == 0 {
    		count, err := countWords(os.Stdin)
    		if err != nil {
    			fmt.Fprintln(os.Stderr, err)
    			os.Exit(1)
    		}
    		printWordCount(count, "", *showLines, *showWords, *showBytes, *showChars)
    		return
    	}

    	var totalCount Count
    	exitCode := 0
    	for _, fileName := range files {
    		file, err := os.Open(fileName)
    		if err != nil {
    			// Report the error and keep going with the remaining files.
    			fmt.Fprintf(os.Stderr, "%s wc error: %v\n", fileName, err)
    			exitCode = 1
    			continue
    		}
    		count, err := countWords(file)
    		file.Close()
    		if err != nil {
    			fmt.Fprintf(os.Stderr, "%s wc error: %v\n", fileName, err)
    			exitCode = 1
    			continue
    		}
    		printWordCount(count, fileName, *showLines, *showWords, *showBytes, *showChars)
    		totalCount.Add(count)
    	}

    	// The "total" line only makes sense for more than one file.
    	if len(files) > 1 {
    		printWordCount(totalCount, "total", *showLines, *showWords, *showBytes, *showChars)
    	}
    	os.Exit(exitCode)
    }

    // countWords reads reader once, rune by rune, and updates every counter
    // in the same pass.
    func countWords(reader io.Reader) (Count, error) {
    	var count Count
    	bufferReader := bufio.NewReader(reader)
    	isCurrentlyOnWord := false
    	for {
    		r, size, err := bufferReader.ReadRune()
    		if err != nil {
    			if err == io.EOF {
    				break
    			}
    			return count, err
    		}
    		count.Bytes += size
    		count.Chars++
    		if r == '\n' {
    			count.Lines++
    		}
    		if unicode.IsSpace(r) {
    			isCurrentlyOnWord = false
    		} else if !isCurrentlyOnWord {
    			// First non-space rune after whitespace starts a new word.
    			count.Words++
    			isCurrentlyOnWord = true
    		}
    	}
    	return count, nil
    }

    // printWordCount prints the selected counters in the standard wc column
    // order (lines, words, chars, bytes), followed by the file name if any.
    func printWordCount(count Count, fileName string, showLines, showWords, showBytes, showChars bool) {
    	if showLines {
    		fmt.Printf("%9d", count.Lines)
    	}
    	if showWords {
    		fmt.Printf("%9d", count.Words)
    	}
    	if showChars {
    		fmt.Printf("%9d", count.Chars)
    	}
    	if showBytes {
    		fmt.Printf("%9d", count.Bytes)
    	}
    	if len(fileName) > 0 {
    		fmt.Printf(" %s", fileName)
    	}
    	fmt.Println()
    }

Key Lessons Learned
- Scope and Naming: I learned that Go developers prefer short variable names (like r for reader, c for count) when the scope is small. It reduces visual noise and keeps the focus on the logic.
- Locality of Reasoning: By moving the counting logic into its own function that accepts an io.Reader, the code became much easier to test and reason about.
- Single Pass is King: For CLI utilities that stream their input, reading the data once and updating every counter along the way avoids repeated I/O and intermediate allocations.
- Safety in Types: Unlike C, Go's treatment of strings as UTF-8 byte sequences, together with the rune type, makes Unicode support almost transparent.
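The testability point can be sketched as follows. This is not the tool's test suite: countLines is a deliberately simplified, hypothetical stand-in for the counting function, and checkCountLines is an invented table-driven check. The idea is only that a function taking an io.Reader can be fed in-memory strings through strings.NewReader, with no files involved.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countLines is a simplified, hypothetical stand-in for the counting
// function: it counts '\n' bytes read from r.
func countLines(r io.Reader) (int, error) {
	br := bufio.NewReader(r)
	lines := 0
	for {
		b, err := br.ReadByte()
		if err == io.EOF {
			return lines, nil
		}
		if err != nil {
			return lines, err
		}
		if b == '\n' {
			lines++
		}
	}
}

// checkCountLines runs a few table-driven cases against in-memory input
// and returns the first mismatch, or nil if every case passes.
func checkCountLines() error {
	cases := []struct {
		in   string
		want int
	}{
		{"", 0},
		{"no newline", 0},
		{"one\n", 1},
		{"a\nb\nc\n", 3},
	}
	for _, tc := range cases {
		got, err := countLines(strings.NewReader(tc.in))
		if err != nil {
			return err
		}
		if got != tc.want {
			return fmt.Errorf("countLines(%q) = %d, want %d", tc.in, got, tc.want)
		}
	}
	return nil
}

func main() {
	if err := checkCountLines(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

In a real project the same table would normally live in a _test.go file and run under go test.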
Building this tool was a great way to bridge the gap between "writing code that works" and "writing code that is production-ready." If you're looking to sharpen your Go skills, I highly recommend this challenge!
