Why Your AI Can't Parse Your Mobile Code (And How Tree-sitter Fixes It)

You have used AI coding assistants on your Python or TypeScript projects. They work. You search for "authentication middleware" and the right files appear. You ask about the database connection logic and the AI finds it. Then you switch to your Android or iOS project and everything falls apart.

Search results are garbage. The AI returns files that mention the word "user" somewhere but misses the actual UserRepository class you need. It finds half of a method signature. It completely ignores the Kotlin extension function that does exactly what you asked for. This is not your imagination. AI coding tools genuinely perform worse on mobile codebases, and there is a technical reason why.

The Regex Trap

Most AI code tools do not actually parse your code. They use regex patterns and heuristics to identify functions, classes, and methods. For Python, this works reasonably well. A function starts with def, has a predictable structure, and ends when indentation returns to the previous level. TypeScript is similar: curly braces, standard keywords, consistent patterns.

Mobile languages are different. Java and Kotlin are annotation-heavy. Swift has property wrappers. All three nest generics deeply, and Kotlin and Swift add lambdas with implicit parameters (it in Kotlin, $0 in Swift) plus other syntactic sugar that makes regex-based parsing unreliable.

Consider what happens when a regex-based parser encounters this Kotlin code:

@HiltViewModel
class UserProfileViewModel @Inject constructor(
    private val userRepository: UserRepository,
    private val analyticsTracker: AnalyticsTracker,
    @IoDispatcher private val ioDispatcher: CoroutineDispatcher
) : ViewModel() {

    private val _uiState = MutableStateFlow<UiState<UserProfile>>(UiState.Loading)
    val uiState: StateFlow<UiState<UserProfile>> = _uiState.asStateFlow()

    fun loadProfile(userId: String) = viewModelScope.launch(ioDispatcher) {
        userRepository.getUser(userId)
            .catch { e -> _uiState.value = UiState.Error(e.message ?: "Unknown error") }
            .collect { user ->
                _uiState.value = UiState.Success(user)
            }
    }
}

A regex parser looking for "class X {" will get confused by the constructor injection syntax. It might think the class ends at the first ). It will struggle with the nested generics in MutableStateFlow<UiState<UserProfile>>. The lambda inside catch might be extracted as a separate entity. The collect block's relationship to loadProfile becomes unclear.
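
To make this concrete, here is a minimal Python sketch of two heuristics of the kind described above, run against a trimmed version of the ViewModel. Both patterns are hypothetical illustrations, not taken from any particular tool.

```python
import re

KOTLIN_SOURCE = '''@HiltViewModel
class UserProfileViewModel @Inject constructor(
    private val userRepository: UserRepository
) : ViewModel() {
    fun loadProfile(userId: String) { }
}
'''

# Hypothetical heuristic 1: expect "class <Name> {" with nothing in between.
NAIVE_CLASS = re.compile(r"class\s+(\w+)\s*\{")

# Hypothetical heuristic 2: accept anything after the name up to the first ")".
STOP_AT_PAREN = re.compile(r"class\s+(\w+)[^)]*\)")

naive_hit = NAIVE_CLASS.search(KOTLIN_SOURCE)
loose_hit = STOP_AT_PAREN.search(KOTLIN_SOURCE)

# The strict pattern finds nothing: "@Inject constructor(" sits between name and "{".
print(naive_hit)
# The loose pattern stops at the constructor's ")" and never reaches the class body.
print(loose_hit.group(0))
```

The first pattern misses the class entirely, and the second treats the end of the constructor as the end of the class, so loadProfile never lands in the extracted chunk.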

The result: when you search for "user profile loading," the AI might return the class declaration but not the loadProfile method. Or it returns a chunk that starts mid-function. Or it misses the whole file because the heuristics broke on the annotation chain.

Specific Failure Modes

Nested Generics

Java and Kotlin use generics extensively. Map<String, List<Pair<Int, UserData>>> is normal. A plain regular expression cannot track nesting depth, so a pattern that looks for the closing bracket either stops at the first > or runs on to the last one on the line. It grabs too little or too much, producing code chunks that are semantically incomplete.
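
A short Python sketch shows the difference between matching the first closing bracket and tracking depth. The depth counter here is only an illustration of why nesting needs state. It is not how tree-sitter works internally, and in real source it would also trip over < used as a comparison operator.

```python
import re

TYPE_TEXT = "Map<String, List<Pair<Int, UserData>>>"

# Naive heuristic: everything from the first "<" to the first ">".
naive = re.search(r"<[^>]*>", TYPE_TEXT).group(0)

def balanced_generic(text: str) -> str:
    """Return the span from the first '<' to its matching '>' by counting depth."""
    start = text.index("<")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "<":
            depth += 1
        elif text[i] == ">":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    raise ValueError("unbalanced angle brackets")

print(naive)                        # cut off after UserData>
print(balanced_generic(TYPE_TEXT))  # the full type argument list
```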

Kotlin Data Classes with Complex Types

A simple data class like data class User(val name: String) parses fine. But real Android code has defaults, nullability, and nested types:

data class NetworkResponse<T>(
    val data: T? = null,
    val error: ErrorWrapper? = null,
    val metadata: Map<String, Any> = emptyMap(),
    val timestamp: Instant = Clock.System.now()
)

The default values with function calls, the nullable types, and the generic parameter combine to break naive parsing. The chunk boundary might land between properties, producing an incomplete definition that confuses downstream semantic search.
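
As a sketch of how this goes wrong, the hypothetical pattern below captures a data class's parameter list up to the first closing parenthesis. The default value emptyMap() supplies that parenthesis early, so the last property is silently dropped.

```python
import re

DATA_CLASS = '''data class NetworkResponse<T>(
    val data: T? = null,
    val error: ErrorWrapper? = null,
    val metadata: Map<String, Any> = emptyMap(),
    val timestamp: Instant = Clock.System.now()
)
'''

# Hypothetical heuristic: the parameter list runs to the first ")".
PATTERN = re.compile(r"data class (\w+)(?:<[^>]*>)?\((.*?)\)", re.DOTALL)

match = PATTERN.search(DATA_CLASS)
captured = match.group(2)
properties = re.findall(r"val (\w+)", captured)

# The capture ends inside emptyMap(), so timestamp is missing from the list.
print(properties)
```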

Swift Property Wrappers

Swift's property wrappers add another layer of complexity:

@MainActor
class ProfileViewModel: ObservableObject {
    @Published private(set) var profile: UserProfile?
    // @AppStorage cannot store Date out of the box, so keep seconds since 1970
    @AppStorage("lastSyncDate") var lastSync: Double = 0
    @Environment(\.dismiss) private var dismiss

    func refresh() async throws {
        profile = try await networkService.fetchProfile()
    }
}

Each @ symbol introduces an attribute or property wrapper that modifies the declaration it is attached to. A regex parser might see @Published and private(set) as separate things, or miss that @AppStorage creates a persisted property. The actual semantics of what this class does get lost.
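
The Python sketch below, using declarations modeled on the class above, shows a hypothetical "modifier, then var" heuristic matching none of them. It is followed by a small hand-written scanner that peels the leading attributes off first. The scanner is an illustration only: it would still be fooled by a ")" inside a string argument, which is exactly the kind of case a real grammar handles.

```python
import re

# Declarations modeled on the Swift class above.
SWIFT_LINES = [
    r'@Published private(set) var profile: UserProfile?',
    r'@AppStorage("lastSyncDate") var lastSync: Double = 0',
    r'@Environment(\.dismiss) private var dismiss',
]

# Hypothetical heuristic: optional access modifier, then "var <name>".
NAIVE_VAR = re.compile(r"^(?:private |public |internal )?var (\w+)")

def split_attributes(line):
    """Peel leading @Attribute or @Attribute(...) tokens off a declaration."""
    attrs, rest = [], line.strip()
    while rest.startswith("@"):
        end = 1
        while end < len(rest) and (rest[end].isalnum() or rest[end] == "_"):
            end += 1
        if end < len(rest) and rest[end] == "(":
            depth = 0
            for i in range(end, len(rest)):
                if rest[i] == "(":
                    depth += 1
                elif rest[i] == ")":
                    depth -= 1
                    if depth == 0:
                        end = i + 1
                        break
        attrs.append(rest[:end])
        rest = rest[end:].lstrip()
    return attrs, rest

naive_names = [m.group(1) for m in map(NAIVE_VAR.match, SWIFT_LINES) if m]
print(naive_names)  # empty: every line starts with "@"
for line in SWIFT_LINES:
    print(split_attributes(line))
```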

Android ViewBinding and Compose

Modern Android uses generated code extensively. ViewBinding creates accessor classes. Jetpack Compose has deeply nested @Composable functions with lambdas:

@Composable
fun ProfileScreen(
    viewModel: ProfileViewModel = hiltViewModel(),
    onNavigateToSettings: () -> Unit
) {
    val uiState by viewModel.uiState.collectAsStateWithLifecycle()

    Scaffold(
        topBar = {
            TopAppBar(
                title = { Text("Profile") },
                actions = {
                    IconButton(onClick = onNavigateToSettings) {
                        Icon(Icons.Default.Settings, contentDescription = "Settings")
                    }
                }
            )
        }
    ) { padding ->
        when (val state = uiState) {
            is UiState.Loading -> LoadingIndicator()
            is UiState.Success -> ProfileContent(state.data, Modifier.padding(padding))
            is UiState.Error -> ErrorMessage(state.message)
        }
    }
}

This is a single function, but it contains nested lambdas, inline composables, a when expression, and a delegated property (val uiState by ...). A regex parser cannot tell where one logical unit ends and another begins. The semantic meaning of "profile screen" spans the entire function, but naive chunking might split it arbitrarily.
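
To see what "split it arbitrarily" looks like, the Python sketch below cuts a shortened version of the screen into fixed four-line windows and checks whether each window closes everything it opens. It then compares that with splitting only where nesting depth returns to zero. The depth counter is a stand-in for a real parser and ignores braces inside strings and comments. The window size and function names are assumptions for illustration.

```python
SOURCE = '''@Composable
fun ProfileScreen(onNavigateToSettings: () -> Unit) {
    Scaffold(
        topBar = {
            TopAppBar(title = { Text("Profile") })
        }
    ) { padding ->
        ProfileContent(Modifier.padding(padding))
    }
}

fun helper() {
    println("done")
}
'''

def fixed_chunks(text, size):
    lines = text.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

def is_self_contained(chunk):
    """True when brackets never close before opening and all are closed at the end."""
    depth = 0
    for ch in chunk:
        if ch in "{(":
            depth += 1
        elif ch in "})":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def depth_change(line):
    return sum(ch in "{(" for ch in line) - sum(ch in "})" for ch in line)

def top_level_units(text):
    """Split only where nesting depth is back at zero after a closing brace."""
    units, current, depth = [], [], 0
    for line in text.splitlines():
        if depth == 0 and not line.strip():
            continue  # skip blank lines between declarations
        current.append(line)
        depth += depth_change(line)
        if depth == 0 and line.rstrip().endswith("}"):
            units.append("\n".join(current))
            current = []
    return units

chunks = fixed_chunks(SOURCE, 4)
broken = [c for c in chunks if not is_self_contained(c)]
units = top_level_units(SOURCE)
print(f"{len(broken)} of {len(chunks)} fixed-size chunks cut through a block")
print(f"{len(units)} boundary-aware units, all self-contained: "
      f"{all(is_self_contained(u) for u in units)}")
```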

What Tree-sitter Actually Does

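Tree-sitter is a parser generator and incremental parsing library. Instead of using regex patterns to guess at code structure, it uses an actual grammar for each language to build a syntax tree. Strictly speaking, this is a concrete syntax tree that keeps every token; this article uses the more familiar term Abstract Syntax Tree (AST) for it. The difference from regex matching is fundamental.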

When tree-sitter parses the Kotlin ViewModel example from earlier, it does not pattern-match against "class" followed by text. It recognizes:

  • An annotation (@HiltViewModel) attached to a class declaration
  • A primary constructor with injected parameters, each with their own annotations
  • A class body containing property declarations and function definitions
  • Each property's type, including nested generic parameters
  • Each function's full scope, including lambdas and flow operators

Tree-sitter is a syntax parser, not a type checker, so it does not infer that loadProfile returns a Job. What it does know is that loadProfile is a function whose body is the viewModelScope.launch call, that this body extends through the collect block, and that the nested lambdas are part of this function, not separate entities.

This matters for code search because it means we can extract semantically complete units. When we chunk the codebase for embedding, we get whole functions, whole classes, whole meaningful blocks. The embedding model sees complete code, not arbitrary slices.
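
The idea of cutting chunks at syntax-tree node boundaries can be tried without installing anything. The sketch below does not use tree-sitter. As a stand-in, it uses Python's built-in ast module on a small Python snippet, because the principle is the same: every function or class node knows exactly where its source begins and ends.

```python
import ast

SOURCE = '''
def load_profile(user_id):
    try:
        user = fetch(user_id)
    except LookupError as exc:
        return {"error": str(exc)}
    return {"user": user}

class Repo:
    def get(self, key):
        return key
'''

tree = ast.parse(SOURCE)

# Map each function or class name to the exact source text of its node.
units = {}
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        units[node.name] = ast.get_source_segment(SOURCE, node)

# The chunk for load_profile includes its error handling and both returns.
print(units["load_profile"])
```

A tree-sitter based indexer would walk class and function nodes of a Kotlin or Swift tree in the same way, using each node's byte range instead of ast.get_source_segment.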

What This Means for Semantic Search

Semantic search works by embedding code chunks into vector space, where similar code clusters together. The quality of search depends entirely on what gets embedded. If your chunks are broken, your search is broken.

With regex-based parsing, you might embed:

fun loadProfile(userId: String) = viewModelScope.launch(ioDispatcher) {
        userRepository.getUser(userId)

That is an incomplete function. The embedding captures "load profile, get user" but misses the error handling, the state updates, and the flow collection. When you search for "error handling in profile loading," this chunk will not match well because the error handling is in a different chunk.

With tree-sitter parsing, you embed the complete function. The embedding captures the full semantic content: loading a profile, handling errors, updating UI state. Search queries find the right code because the right code is actually in the index.
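
A real embedding model is out of scope for a short example, so the Python sketch below uses plain query-term overlap as a crude stand-in for embedding similarity. The scoring function is an assumption for illustration only. The point it makes carries over: a chunk cannot match a concept whose code was cut off.

```python
import re

TRUNCATED = '''fun loadProfile(userId: String) = viewModelScope.launch(ioDispatcher) {
        userRepository.getUser(userId)
'''

FULL = '''fun loadProfile(userId: String) = viewModelScope.launch(ioDispatcher) {
    userRepository.getUser(userId)
        .catch { e -> _uiState.value = UiState.Error(e.message ?: "Unknown error") }
        .collect { user -> _uiState.value = UiState.Success(user) }
}
'''

def tokens(text):
    """Lowercase word tokens, splitting camelCase identifiers into their parts."""
    return {w.lower() for w in re.findall(r"[A-Za-z][a-z]*", text)}

def coverage(query, chunk):
    """Fraction of query terms that appear in the chunk (stand-in for similarity)."""
    q = tokens(query)
    return len(q & tokens(chunk)) / len(q)

QUERY = "error handling in profile loading"
print(coverage(QUERY, TRUNCATED))  # matches only "profile"
print(coverage(QUERY, FULL))       # also matches "error"
```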

The same principle applies to imports and dependencies. Import declarations are explicit nodes in the syntax tree, so the indexer can read exactly which classes each file imports and boost results that share dependencies with your current file. If you are in a ViewModel and search for "user data," files that import the same repository classes rank higher.
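
One simple way to express such a boost is sketched below in Python. The Jaccard overlap, the weight, the file names, and the scores are all assumptions chosen for illustration. They are not Pyckle's actual ranking formula.

```python
def import_overlap(current, candidate):
    """Jaccard similarity between two sets of imported class names."""
    union = current | candidate
    return len(current & candidate) / len(union) if union else 0.0

def rerank(results, current_imports, weight=0.5):
    """results: list of (path, base_score, imports). Returns (path, score), best first."""
    scored = [
        (path, base * (1 + weight * import_overlap(current_imports, imports)))
        for path, base, imports in results
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical imports of the file currently open in the editor.
current_file_imports = {
    "com.app.data.UserRepository",
    "kotlinx.coroutines.flow.StateFlow",
}

# Hypothetical search results with their base similarity scores.
results = [
    ("SettingsScreen.kt", 0.80, {"androidx.compose.material3.Scaffold"}),
    ("UserProfileViewModel.kt", 0.75, {
        "com.app.data.UserRepository",
        "kotlinx.coroutines.flow.StateFlow",
    }),
]

ranked = rerank(results, current_file_imports)
for path, score in ranked:
    print(f"{score:.3f}  {path}")
```

In this made-up example the ViewModel starts with the lower base score but overtakes the settings screen because it shares both imports with the current file.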

How Pyckle Uses Tree-sitter

When you run index_codebase() on a mobile project, Pyckle detects Java, Kotlin, and Swift files and routes them through tree-sitter parsers instead of the regex-based fallback. The indexer:

  1. Parses each file into an AST using language-specific grammars
  2. Extracts semantic units: classes, functions, extensions, protocols, composables
  3. Preserves relationships: which class contains which method, which function calls which dependency
  4. Chunks intelligently: respecting AST boundaries so no chunk splits a function or class mid-definition
  5. Embeds complete units: each vector represents a whole, meaningful piece of code
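
The routing and the "preserves relationships" step can be pictured with a small data structure. The Python sketch below is a guess at the general shape, not Pyckle's code: the Chunk fields, the choose_parser function, and the "regex-fallback" label are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping from file extension to a tree-sitter grammar name.
GRAMMAR_BY_EXTENSION = {
    ".java": "tree-sitter-java",
    ".kt": "tree-sitter-kotlin",
    ".swift": "tree-sitter-swift",
}

@dataclass(frozen=True)
class Chunk:
    kind: str                     # e.g. "class", "function", "extension"
    name: str
    path: str
    start_line: int
    end_line: int
    parent: Optional[str] = None  # containing class, so the relationship survives

def choose_parser(path: str) -> str:
    """Pick a grammar by extension, falling back to the regex-based path."""
    suffix = PurePath(path).suffix.lower()
    return GRAMMAR_BY_EXTENSION.get(suffix, "regex-fallback")

method = Chunk(
    kind="function",
    name="loadProfile",
    path="UserProfileViewModel.kt",
    start_line=29,
    end_line=35,
    parent="UserProfileViewModel",
)
print(choose_parser(method.path), method.parent)
```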

The result is search that actually works on mobile codebases. When you search for "network retry logic," you get the retry interceptor, not a random line that happens to contain "retry." When you search for "compose animation," you get complete @Composable functions, not fragments.

The Technical Details

Tree-sitter grammars are available for most languages. We use:

  • tree-sitter-java for Java files
  • tree-sitter-kotlin for Kotlin, including Compose
  • tree-sitter-swift for Swift and SwiftUI

Parsing is fast. Tree-sitter was designed for editor use cases where the syntax tree is updated on every keystroke. Indexing a 100,000-line mobile codebase takes seconds, not minutes. Tree-sitter can also re-parse incrementally, reusing the unchanged parts of a file's previous tree after an edit, and re-indexing only re-parses the files that were modified.
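
Skipping unmodified files is file-level change detection, which is separate from tree-sitter's in-file incremental parsing. A minimal Python sketch of the file-level part, assuming the indexer keeps a content hash per path from its previous run (the function names and storage format are hypothetical):

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def files_to_reparse(files: dict, previous_hashes: dict) -> list:
    """files: path -> current text. previous_hashes: path -> hash from the last run.

    Returns the paths that are new or whose content changed, in sorted order.
    """
    return sorted(
        path for path, text in files.items()
        if previous_hashes.get(path) != content_hash(text)
    )

# State saved by a previous indexing run.
previous = {
    "A.kt": content_hash("class A"),
    "B.kt": content_hash("class B"),
}

# Current state: A.kt untouched, B.kt edited, C.kt newly added.
current = {
    "A.kt": "class A",
    "B.kt": "class B { fun run() {} }",
    "C.kt": "class C",
}

print(files_to_reparse(current, previous))
```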

The AST extraction is configurable. By default, we extract:

  • Top-level classes and interfaces
  • All methods and functions
  • Extension functions (Kotlin) and extensions (Swift)
  • Companion objects and nested classes
  • Property declarations with complex types

Small utility functions group together. Large classes split at logical boundaries. The goal is chunks that are individually searchable and semantically coherent.
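
The grouping half of that policy could look something like the greedy pass below. This is a sketch under assumed thresholds (min_lines and max_lines are hypothetical parameters), and it does not cover the splitting of large classes.

```python
def group_small_chunks(chunks, min_lines=5, max_lines=20):
    """Merge runs of adjacent small chunks; leave larger chunks on their own.

    chunks: list of (name, line_count) in source order.
    Returns a list of groups, each a list of chunk names.
    """
    groups, pending, pending_lines = [], [], 0
    for name, lines in chunks:
        if lines >= min_lines:
            # A normal-sized chunk closes any open group and stands alone.
            if pending:
                groups.append(pending)
                pending, pending_lines = [], 0
            groups.append([name])
            continue
        if pending and pending_lines + lines > max_lines:
            groups.append(pending)
            pending, pending_lines = [], 0
        pending.append(name)
        pending_lines += lines
    if pending:
        groups.append(pending)
    return groups

# Hypothetical functions from one utility file, with their line counts.
example = [
    ("isEmail", 2),
    ("isPhone", 3),
    ("loadProfile", 12),
    ("trim", 1),
    ("capitalize", 2),
]
print(group_small_chunks(example))
```

Only adjacent helpers are merged here, on the assumption that functions defined next to each other tend to be related, which keeps each group coherent for search.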

What You Get

Mobile developers who switch from general-purpose AI tools to Pyckle consistently report the same thing: search actually returns what they were looking for. The Android developer searching for "permission handling" finds the permission request flow. The iOS developer searching for "SwiftUI navigation" finds the navigation stack implementation. The code is there, correctly chunked, correctly embedded, correctly retrieved.

This is not magic. It is just correct parsing. Most AI tools took shortcuts because mobile languages are harder to parse, and the majority of their users write Python or JavaScript. Those shortcuts break mobile code. Tree-sitter does not.

If your AI coding tool cannot understand your mobile codebase, the problem is probably in the parser. Switching to a tool that uses real AST parsing is the fix.

Tree-sitter parsing for Java, Kotlin, and Swift is available on Pyckle Pro. Connect your mobile codebase at pyckle.co/pricing.html and see the difference correct parsing makes.
