# Core ML Transformer Models: The LLM Revolution on iOS
The transformer architecture has changed the face of AI. Models such as ChatGPT, BERT, and T5 are built on transformers. So can we run these powerful models on iOS devices, entirely on-device? Yes! Thanks to Apple's Core ML framework and the Neural Engine, transformer models can run efficiently on mobile hardware. In this guide we take a deep look at the transformer architecture, model conversion, and iOS integration.
"On-device LLMs are the foundation of next-generation mobile AI, delivering privacy and performance together." — Apple ML Research
Table of Contents
- What Is the Transformer Architecture?
- Transformer Models for iOS
- Model Conversion and Optimization
- BERT Integration
- Text Generation (GPT-style)
- Summarization and Translation
- Apple Intelligence and Foundation Models
- Performance Benchmarks
- Conclusion
1. What Is the Transformer Architecture?
The transformer is a deep learning architecture introduced in the 2017 paper "Attention Is All You Need". Thanks to its self-attention mechanism, it computes the relationship between every element of a sequence and every other element in parallel.
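To make the self-attention computation concrete, here is a minimal scaled dot-product attention sketch in pure Python. It is illustrative only: real implementations operate on batched tensors with learned projection matrices, and all names here are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    Q, K, V are lists of d-dimensional vectors, one per token."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Context vector: weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Every token attends to every other token; the loop over queries is
# what runs in parallel on real hardware.
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = attention(Q, K, V)
```

Each output row is a mixture of all value vectors, which is exactly why attention captures long-range dependencies without recurrence.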
Transformer Types
| Model Type | Examples | Task | Parameters |
|---|---|---|---|
| Encoder-only | BERT, DistilBERT | Classification, NER | 66M-340M |
| Decoder-only | GPT-2, LLaMA | Text generation | 117M-70B |
| Encoder-decoder | T5, BART | Translation, summarization | 60M-11B |
| Vision Transformer | ViT, DeiT | Image classification | 86M-632M |
| Multimodal | CLIP, LLaVA | Image + text | 150M-13B |
For on-device iOS, smaller models are usually preferred: DistilBERT (66M), TinyLLaMA (1.1B), Phi-2 (2.7B).
2. Transformer Models for iOS
Starting with iOS 17, Apple began supporting transformer models more fully. The Neural Engine includes optimizations specifically for the attention mechanism.
Supported Model Sizes
| Device | Max Model (RAM) | Recommendation |
|---|---|---|
| iPhone 13 (4GB) | ~500MB | DistilBERT, TinyBERT |
| iPhone 14 Pro (6GB) | ~1.5GB | BERT-base, GPT-2 small |
| iPhone 15 Pro (8GB) | ~3GB | Phi-2, TinyLLaMA |
| iPad Pro M4 (16GB) | ~8GB | LLaMA 7B (4-bit) |
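The RAM ceilings above follow directly from parameter count times bytes per weight. A quick way to estimate the weight footprint of a checkpoint at different precisions (a rough sketch; it ignores activations, the KV cache, and runtime overhead, so real usage is higher):

```python
def model_size_mb(params, bits_per_weight):
    """Approximate weight size in megabytes: params * bits / 8 bytes."""
    return params * bits_per_weight / 8 / (1024 ** 2)

# DistilBERT, 66M parameters
fp16 = model_size_mb(66e6, 16)   # ~126 MB
int8 = model_size_mb(66e6, 8)    # ~63 MB

# TinyLLaMA, 1.1B parameters quantized to 4-bit
tiny4 = model_size_mb(1.1e9, 4)  # ~524 MB
```

This is why 4-bit quantization is what makes a 7B model fit on a 16GB iPad at all.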
3. Model Conversion and Optimization
PyTorch and Hugging Face models are converted to Core ML using coremltools.
```swift
// Loading the converted BERT model on the Swift side
import CoreML

class BERTModelLoader {
    private var model: MLModel?

    func loadModel() throws {
        let config = MLModelConfiguration()
        config.computeUnits = .all

        // Load the compiled model
        guard let modelURL = Bundle.main.url(
            forResource: "DistilBERT",
            withExtension: "mlmodelc"
        ) else {
            throw TransformerError.modelNotFound
        }

        self.model = try MLModel(contentsOf: modelURL, configuration: config)
    }

    func predict(inputIDs: MLMultiArray, attentionMask: MLMultiArray) throws -> MLMultiArray {
        guard let model = model else {
            throw TransformerError.modelNotLoaded
        }

        let input = try MLDictionaryFeatureProvider(dictionary: [
            "input_ids": MLFeatureValue(multiArray: inputIDs),
            "attention_mask": MLFeatureValue(multiArray: attentionMask)
        ])

        let output = try model.prediction(from: input)

        guard let logits = output.featureValue(for: "logits")?.multiArrayValue else {
            throw TransformerError.predictionFailed
        }

        return logits
    }
}

enum TransformerError: Error {
    case modelNotFound
    case modelNotLoaded
    case predictionFailed
    case tokenizationFailed
}
```
4. BERT Integration
BERT (Bidirectional Encoder Representations from Transformers) is ideal for text classification, question answering, and NER.
```swift
import CoreML

class BERTClassifier {
    private let model: MLModel
    private let tokenizer: BERTTokenizer
    private let maxLength = 128

    init() throws {
        let config = MLModelConfiguration()
        config.computeUnits = .all

        guard let modelURL = Bundle.main.url(
            forResource: "BERTClassifier",
            withExtension: "mlmodelc"
        ) else {
            throw TransformerError.modelNotFound
        }

        self.model = try MLModel(contentsOf: modelURL, configuration: config)
        self.tokenizer = BERTTokenizer()
    }

    func classify(text: String) throws -> ClassificationResult {
        // Tokenization
        let tokens = tokenizer.tokenize(text)
        let inputIDs = try createMultiArray(from: tokens.inputIDs)
        let attentionMask = try createMultiArray(from: tokens.attentionMask)

        // Prediction
        let input = try MLDictionaryFeatureProvider(dictionary: [
            "input_ids": MLFeatureValue(multiArray: inputIDs),
            "attention_mask": MLFeatureValue(multiArray: attentionMask)
        ])

        let output = try model.prediction(from: input)

        guard let logits = output.featureValue(for: "logits")?.multiArrayValue else {
            throw TransformerError.predictionFailed
        }

        // Softmax and result
        let probabilities = softmax(logits)
        let maxIndex = argmax(probabilities)

        return ClassificationResult(
            label: labels[maxIndex],
            confidence: probabilities[maxIndex]
        )
    }

    private func createMultiArray(from array: [Int]) throws -> MLMultiArray {
        let mlArray = try MLMultiArray(shape: [1, NSNumber(value: maxLength)], dataType: .int32)
        for (index, value) in array.prefix(maxLength).enumerated() {
            mlArray[index] = NSNumber(value: value)
        }
        return mlArray
    }

    private func softmax(_ logits: MLMultiArray) -> [Double] {
        var values: [Double] = []
        for i in 0..<logits.count {
            values.append(logits[i].doubleValue)
        }
        let maxVal = values.max() ?? 0
        let expValues = values.map { exp($0 - maxVal) }
        let sumExp = expValues.reduce(0, +)
        return expValues.map { $0 / sumExp }
    }

    private func argmax(_ array: [Double]) -> Int {
        var maxIdx = 0
        var maxVal = array[0]
        for (i, val) in array.enumerated() where val > maxVal {
            maxVal = val
            maxIdx = i
        }
        return maxIdx
    }

    private let labels = ["Positive", "Negative", "Neutral"]
}

struct ClassificationResult {
    let label: String
    let confidence: Double
}
```
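The `BERTTokenizer` used above is assumed rather than shown. BERT uses WordPiece tokenization, whose core is greedy longest-match-first splitting of each word against a fixed vocabulary, with `##` marking continuation pieces. A toy sketch (the mini-vocabulary is hypothetical; a real tokenizer also adds [CLS]/[SEP] and handles casing and punctuation):

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub   # continuation pieces carry the ## prefix
            if sub in vocab:
                piece = sub        # longest matching piece from this position
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]       # nothing matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

vocab = {"play", "##ing", "##ed", "un", "##play"}
tokens = wordpiece("playing", vocab)  # → ["play", "##ing"]
```

The resulting piece IDs, padded or truncated to `maxLength`, are what fill the `input_ids` array in the Swift code.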
5. Text Generation
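The Swift generator below applies temperature scaling and then top-k sampling at each step. The numeric core of that sampling step, sketched in Python for clarity (illustrative names, not the article's API):

```python
import math, random

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_sample(logits, k, temperature=0.7, rng=random.random):
    """Divide logits by temperature, keep the k best, sample proportionally."""
    scaled = [l / temperature for l in logits]
    # (token_index, scaled_logit) pairs, best first
    ranked = sorted(enumerate(scaled), key=lambda p: p[1], reverse=True)[:k]
    probs = softmax([l for _, l in ranked])
    # Inverse-CDF sampling over the k surviving tokens
    r, cum = rng(), 0.0
    for (idx, _), p in zip(ranked, probs):
        cum += p
        if r < cum:
            return idx
    return ranked[0][0]

token = top_k_sample([2.0, 0.5, -1.0, 3.0], k=2)
```

Lower temperature sharpens the distribution toward the top token; smaller k cuts off the unlikely tail entirely.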
```swift
import CoreML

class TextGenerator {
    private let model: MLModel
    private let tokenizer: GPT2Tokenizer

    init(model: MLModel, tokenizer: GPT2Tokenizer) {
        self.model = model
        self.tokenizer = tokenizer
    }

    func generate(
        prompt: String,
        maxTokens: Int = 100,
        temperature: Double = 0.7,
        topK: Int = 40
    ) throws -> String {
        var inputTokens = tokenizer.encode(prompt)
        var generatedTokens: [Int] = []

        for _ in 0..<maxTokens {
            let inputArray = try createInputArray(inputTokens)
            let output = try model.prediction(from: inputArray)

            guard let logits = output.featureValue(for: "logits")?.multiArrayValue else {
                break
            }

            // Take the logits for the last token position
            let lastLogits = extractLastTokenLogits(logits, seqLength: inputTokens.count)

            // Temperature scaling
            let scaledLogits = lastLogits.map { $0 / temperature }

            // Top-K sampling
            let nextToken = topKSample(scaledLogits, k: topK)

            // EOS token check
            if nextToken == tokenizer.eosTokenID {
                break
            }

            generatedTokens.append(nextToken)
            inputTokens.append(nextToken)
        }

        return tokenizer.decode(generatedTokens)
    }

    private func topKSample(_ logits: [Double], k: Int) -> Int {
        let indexed = logits.enumerated().sorted { $0.element > $1.element }
        let topK = Array(indexed.prefix(k))
        let probs = softmax(topK.map { $0.element })

        // Weighted random sampling
        let random = Double.random(in: 0..<1)
        var cumulative = 0.0
        for (i, prob) in probs.enumerated() {
            cumulative += prob
            if random < cumulative {
                return topK[i].offset
            }
        }
        return topK[0].offset
    }

    private func softmax(_ values: [Double]) -> [Double] {
        let maxVal = values.max() ?? 0
        let expVals = values.map { exp($0 - maxVal) }
        let sum = expVals.reduce(0, +)
        return expVals.map { $0 / sum }
    }

    private func extractLastTokenLogits(_ logits: MLMultiArray, seqLength: Int) -> [Double] {
        // Assumes a [1, seqLength, vocabSize] logits layout; read the last row
        let vocabSize = logits.shape.last!.intValue
        let offset = (seqLength - 1) * vocabSize
        var values: [Double] = []
        for i in 0..<vocabSize {
            values.append(logits[offset + i].doubleValue)
        }
        return values
    }

    private func createInputArray(_ tokens: [Int]) throws -> MLDictionaryFeatureProvider {
        let array = try MLMultiArray(shape: [1, NSNumber(value: tokens.count)], dataType: .int32)
        for (i, token) in tokens.enumerated() {
            array[i] = NSNumber(value: token)
        }
        return try MLDictionaryFeatureProvider(dictionary: [
            "input_ids": MLFeatureValue(multiArray: array)
        ])
    }
}
```
6. Summarization and Translation
Encoder-decoder models (T5, mBART) can be used for text summarization and translation. Since these models tend to be large, their distilled versions should be preferred on mobile.
7. Apple Intelligence and Foundation Models
With iOS 18 and later, Apple Intelligence provides foundation models that run on-device. Developers gain access to these models through the Foundation Models framework.
8. Performance Benchmarks
| Model | Size | iPhone 15 Pro | iPad Pro M4 | Task |
|---|---|---|---|---|
| DistilBERT | 250MB | 15ms | 8ms | Classification |
| BERT-base | 420MB | 35ms | 18ms | Classification |
| GPT-2 Small | 500MB | 50ms/token | 25ms/token | Generation |
| TinyLLaMA 1.1B | 650MB | 80ms/token | 35ms/token | Generation |
| Phi-2 (4-bit) | 1.6GB | 120ms/token | 50ms/token | Generation |
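Per-token latencies translate directly into end-to-end response times, which is usually the number that matters for UX. A quick conversion helper (figures taken from the table above; real times also include prompt encoding, which this ignores):

```python
def tokens_per_second(ms_per_token):
    """Generation throughput implied by a per-token latency."""
    return 1000 / ms_per_token

def response_time_s(ms_per_token, n_tokens):
    """Rough wall-clock seconds to generate n_tokens."""
    return ms_per_token * n_tokens / 1000

# TinyLLaMA on iPhone 15 Pro: 80 ms/token
tps = tokens_per_second(80)      # 12.5 tokens/s
t100 = response_time_s(80, 100)  # 8.0 s for a 100-token reply
```

At these rates, streaming tokens to the UI as they are generated matters far more than shaving a few milliseconds off a single step.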
9. Conclusion and Recommendations
Transformer models are opening a new era on iOS. Apple's Neural Engine grows more capable with every generation, and on-device LLMs are now a reality. Start with small models (DistilBERT, TinyLLaMA), measure the user experience, and scale up from there. Keep an eye on the Apple Intelligence APIs as well; more capable on-device models are on the way.

